
The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix

Authors

  • Newberg Lee A (NYSDOH Wadsworth Center & Rensselaer Polytechnic Institute Department of Computer Science)
  • McCue Lee Ann (NYSDOH Wadsworth Center)
  • Lawrence Charles E (NYSDOH Wadsworth Center & Brown University)

Abstract

Approaches that use sequence weights to construct a nucleotide position weight matrix from aligned inputs are popular, but little effort has been expended to measure their quality. We derive optimal sequence weights that minimize the sum of the variances of the estimators of base frequency parameters for sequences related by a phylogenetic tree. Using these weights, we find that approaches based upon sequence weights can perform very poorly in comparison with a theoretically optimal maximum-likelihood method in inferring the parameters of a position weight matrix. Specifically, we find that among a collection of primate sequences, even an optimal sequence-weights approach is only 51% as efficient as the maximum-likelihood approach in inferences of base frequency parameters. We also show how to employ the variance estimators to obtain a greedy ordering of species for sequencing. Applying this ordering for the weighted estimators to a primate collection yields a curve with a long plateau that is not observed with maximum-likelihood estimators. This plateau indicates that the use of weighted estimators on these data seriously limits the utility of obtaining the sequences of more than two or three additional species.
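
The two ideas in the abstract, variance-minimizing weights and a greedy ordering of species, can be illustrated with a small sketch. This is not the authors' derivation: it uses the generic minimum-variance weighting for correlated observations, w = C⁻¹1 / (1ᵀC⁻¹1), and the covariance matrix `toy_cov` is invented for illustration (in the paper the dependence between sequences comes from a phylogenetic tree, which is not reproduced here). All function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def optimal_weights(cov):
    """Weights that sum to 1 and minimize w' cov w (assumed generic form)."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [x / total for x in z]


def weighted_variance(cov, w):
    """Variance of the weighted average, w' cov w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


def greedy_order(cov):
    """Add one species at a time, always the one giving the lowest variance."""
    chosen, remaining, curve = [], list(range(len(cov))), []

    def variance_with(candidate):
        idx = chosen + [candidate]
        sub = [[cov[i][j] for j in idx] for i in idx]
        return weighted_variance(sub, optimal_weights(sub))

    while remaining:
        best = min(remaining, key=variance_with)
        curve.append(variance_with(best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, curve


# Invented toy covariance: species 0 and 1 are close relatives (highly
# correlated); species 2 is independent of both.
toy_cov = [[1.0, 0.9, 0.0],
           [0.9, 1.0, 0.0],
           [0.0, 0.0, 1.0]]

weights = optimal_weights(toy_cov)
equal = [1.0 / 3] * 3
order, curve = greedy_order(toy_cov)
```

In this toy setting the independent species receives the largest weight, and once it has been added the close relative contributes very little further reduction in variance. That flattening is the kind of behavior the abstract describes as a plateau, although the sketch says nothing about the maximum-likelihood comparison.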

Suggested Citation

  • Newberg Lee A & McCue Lee Ann & Lawrence Charles E, 2005. "The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 4(1), pages 1-18, June.
  • Handle: RePEc:bpj:sagmbi:v:4:y:2005:i:1:n:13
    DOI: 10.2202/1544-6115.1135
