Author
Listed:
- D. Jarquin
- A. Roy
- B. Clarke
- S. Ghosal
Abstract
Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here ‘main traits’) of these cultivars are categorical and difficult to measure directly. It is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or ‘phenotypes’) that are easy to measure. Our goal is to improve prediction of main traits with interpretable relations by combining the two data types using variable selection techniques. However, the genomic characteristics can overwhelm the set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation from both the secondary traits and the genotypic variables for optimal prediction. When two data types (markers and secondary traits) are available, we achieve improved prediction of a binary trait by two steps that are designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before accounting for extra effects of genotypes. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals to obtain the effects of phenotypic variables as adjusted by the genotypic variables. Then, we develop a sparse logistic classifier using the markers and residuals so that the adjusted phenotypes may be selected first to avoid being overwhelmed by the genotypic variables due to their numerical advantage. This classifier uses forward selection aided by a penalty term and can be computed effectively by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
Suggested Citation
D. Jarquin & A. Roy & B. Clarke & S. Ghosal, 2024.
"Combining phenotypic and genomic data to improve prediction of binary traits,"
Journal of Applied Statistics, Taylor & Francis Journals, vol. 51(8), pages 1497-1523, June.
Handle:
RePEc:taf:japsta:v:51:y:2024:i:8:p:1497-1523
DOI: 10.1080/02664763.2023.2208773
Download full text from publisher
As the access to this document is restricted, you may want to search for a different version of it.
Corrections
All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:taf:japsta:v:51:y:2024:i:8:p:1497-1523. See general information about how to correct material in RePEc.
If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.
We have no bibliographic references for this item. You can help adding them by using this form .
If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.
For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Longhurst (email available below). General contact details of provider: http://www.tandfonline.com/CJAS20 .
Please note that corrections may take a couple of weeks to filter through
the various RePEc services.