
Gene Set Analysis Using Spatial Statistics

Author

Listed:
  • Angela L. Riffo-Campos

    (Centro de Excelencia de Modelación y Computación Científica, Universidad de La Frontera, Temuco 4780000, Chile;
    Departamento de Estadística e Investigación Operativa, Universidad de Valencia, Avda. Vicent Andrés Estellés, 1, 46100 Burjasot, Spain)

  • Guillermo Ayala

    (Departamento de Estadística e Investigación Operativa, Universidad de Valencia, Avda. Vicent Andrés Estellés, 1, 46100 Burjasot, Spain)

  • Francisco Montes

    (Departamento de Estadística e Investigación Operativa, Universidad de Valencia, Avda. Vicent Andrés Estellés, 1, 46100 Burjasot, Spain)

Abstract

Differential gene expression analysis studies the possible association between gene expression, measured with technologies such as DNA microarrays or RNA-Seq, and the phenotype. It can be performed marginally for each gene (differential gene expression) or for a collection of gene sets (gene set analysis). A preliminary (marginal) per-gene analysis of differential expression is usually performed to obtain a set of significant genes or marginal p-values, which are used later to study the association between phenotype and gene expression. This paper proposes using methods of spatial statistics to test gene set differential expression with paired samples of RNA-Seq counts. The approach does not rely on a preliminary per-gene differential expression analysis. Instead, we compare the paired counts within each sample/control pair using a binomial test. Each pair yields one p-value per gene, so a gene expression profile is transformed into a vector of p-values, which is treated as an event of a point pattern. This is the first component of a bivariate point pattern. The second component is generated by applying two different randomization distributions to the correspondence between samples and treatment. The self-contained null hypothesis considered in gene set analysis can then be formulated as a random labeling of this bivariate point pattern. The gene sets were defined by Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The proposed methodology was tested on four RNA-Seq datasets of colorectal cancer (CRC) patients, and the results were compared with those obtained using the edgeR-GOseq pipeline. The proposed methodology proved consistent at the biological and statistical levels, in particular when using the Cuzick and Edwards test with one realization of the second component and the between-pair distribution.
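
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the binomial test is two-sided with success probability 0.5 (no library-size adjustment), the second component is produced by re-pairing each case sample with a control from a randomly permuted pair (one possible reading of "between-pair"), distances between p-value vectors are Euclidean, and the toy counts and all function names are hypothetical. The statistic is the standard Cuzick and Edwards count of type-1 points among the k nearest neighbours of each type-1 point, with significance assessed by Monte Carlo random labeling.

```python
import random
from math import comb, dist


def binom_two_sided(k, n):
    """Exact two-sided binomial test of k successes in n trials, p = 0.5 (assumed)."""
    if n == 0:
        return 1.0
    m = min(k, n - k)
    tail = sum(comb(n, i) for i in range(m + 1))  # exact integer lower tail
    return min(1.0, 2 * tail / 2 ** n)


def pvalue_points(case, control):
    """case[g][j], control[g][j]: counts of gene g in pair j.
    Returns one p-value vector (a point) per gene."""
    return [
        tuple(binom_two_sided(a, a + b) for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(case, control)
    ]


def cuzick_edwards(points, labels, k=1):
    """T_k: over all type-1 points, count type-1 points among the k nearest neighbours."""
    total = 0
    for i, p in enumerate(points):
        if labels[i] != 1:
            continue
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        total += sum(labels[j] for j in others[:k])
    return total


def random_labelling_test(points, labels, k=1, n_sim=999, seed=0):
    """Monte Carlo p-value of T_k under random labeling (labels permuted over points)."""
    rng = random.Random(seed)
    observed = cuzick_edwards(points, labels, k)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        if cuzick_edwards(points, shuffled, k) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Toy data (hypothetical): 4 genes of one gene set, 3 case/control pairs.
case = [[120, 95, 130], [15, 22, 18], [60, 58, 61], [200, 40, 90]]
control = [[40, 50, 45], [14, 25, 17], [20, 70, 59], [80, 100, 30]]

# First component: p-value vectors from the true pairing.
first = pvalue_points(case, control)

# Second component (assumed scheme): re-pair cases with permuted controls.
rng = random.Random(1)
perm = list(range(3))
rng.shuffle(perm)
second = pvalue_points(case, [[row[j] for j in perm] for row in control])

points = first + second
labels = [1] * len(first) + [0] * len(second)
t_obs, p_value = random_labelling_test(points, labels, k=1, n_sim=199)
print(t_obs, p_value)
```

In this sketch a small Monte Carlo p-value would suggest that the observed p-value vectors cluster together more than random labeling allows. With real RNA-Seq counts the exact tail sum becomes slow, so a library routine for the binomial test would be a more practical choice.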

Suggested Citation

  • Angela L. Riffo-Campos & Guillermo Ayala & Francisco Montes, 2021. "Gene Set Analysis Using Spatial Statistics," Mathematics, MDPI, vol. 9(5), pages 1-13, March.
  • Handle: RePEc:gam:jmathe:v:9:y:2021:i:5:p:521-:d:509315

    Download full text from publisher

    File URL: https://www.mdpi.com/2227-7390/9/5/521/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/2227-7390/9/5/521/
    Download Restriction: no
