Retroviral Integration Process in the Human Genome: Is It Really Non-Random? A New Statistical Approach

Author

Listed:
  • Alessandro Ambrosi
  • Claudia Cattoglio
  • Clelia Di Serio

Abstract

Retroviral vectors are widely used in gene therapy to introduce therapeutic genes into patients' cells, since, once delivered to the nucleus, the genes of interest are stably inserted (integrated) into the target cell genome. There is now compelling evidence that integration of retroviral vectors follows non-random patterns in the mammalian genome, with a preference for active genes and regulatory regions. In particular, Moloney Leukemia Virus (MLV)–derived vectors tend to integrate in the proximity of the transcription start site (TSS) of genes, occasionally resulting in the deregulation of gene expression and, where proto-oncogenes are targeted, in tumor initiation. This has drawn the attention of the scientific community to the molecular determinants of the retroviral integration process as well as to statistical methods for evaluating the genome-wide distribution of integration sites. In recent approaches, the observed distribution of MLV integration distances (IDs) from the TSS of the nearest gene is assumed to be non-random by empirical comparison with a random distribution generated by computational simulation procedures. To provide a statistical procedure to test the randomness of the retroviral insertion pattern, we propose a probability model (Beta distribution) based on IDs between two consecutive genes. We apply the procedure to a set of 595 unique MLV insertion sites retrieved from human hematopoietic stem/progenitor cells. The goodness-of-fit test shows that this distribution is suitable for the observed data. Our statistical analysis confirms the preference of MLV-based vectors for integrating in promoter-proximal regions.

Author Summary: Understanding how retroviral vectors (such as Moloney Leukemia Virus–based vectors) integrate in the human genome has become a major safety issue in the field of gene therapy, since a concrete risk of developing tumors associated with the integration process was identified in the clinical setting. Moloney Leukemia Virus–based vectors are apparently characterized by a non-random integration pattern, with a preference for the vicinity of active gene transcription start sites. We approach the problem of non-random retroviral integration from a probabilistic point of view. We model a normalized integration distance from the transcription start site of the nearest upstream or downstream gene. From this model, we derive a simple and straightforward testing procedure to estimate whether the transcription start site of a given gene attracts integration events. Our approach overcomes the issues of differing gene length, gene orientation, and gene density, which are often critical in analyzing integration distances from transcription start sites. The approach is tested on real experimental data retrieved from human hematopoietic stem/progenitor cells.
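
The idea of a normalized integration distance can be illustrated with a minimal Python sketch. This is not the authors' procedure: it assumes a simplified setting in which each integration site is scaled by the length of the interval between its two flanking TSSs, ignores gene orientation, and takes Uniform(0, 1), which is the Beta(1, 1) distribution, as the reference for random integration. The function names, coordinates, and the use of a Kolmogorov-Smirnov distance are all illustrative assumptions.

```python
from bisect import bisect_right


def normalized_distance(site, tss_sorted):
    """Scale a site's position within its flanking TSS interval to [0, 1).

    `tss_sorted` is a sorted list of TSS coordinates on one chromosome
    (hypothetical input; gene orientation is ignored in this sketch).
    """
    i = bisect_right(tss_sorted, site)
    if i == 0 or i == len(tss_sorted):
        raise ValueError("site is not flanked by two TSSs")
    left, right = tss_sorted[i - 1], tss_sorted[i]
    return (site - left) / (right - left)


def ks_statistic_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical CDF of `values`
    and the Uniform(0, 1) CDF, i.e. Beta(1, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(xs))


# Made-up coordinates, chosen so that sites fall close to a TSS.
tss = [1000, 5000, 9000]
sites = [1200, 4800, 5100, 8900]
scaled = [normalized_distance(s, tss) for s in sites]
d = ks_statistic_uniform(scaled)
print(scaled, d)
```

In this toy input the scaled values pile up near 0 and 1, the two ends of each interval, which is the pattern one would expect if TSSs attract integration events. A large distance from the uniform reference only suggests non-randomness; a formal test would need a proper null distribution and the full data set.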

Suggested Citation

  • Alessandro Ambrosi & Claudia Cattoglio & Clelia Di Serio, 2008. "Retroviral Integration Process in the Human Genome: Is It Really Non-Random? A New Statistical Approach," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-6, August.
  • Handle: RePEc:plo:pcbi00:1000144
    DOI: 10.1371/journal.pcbi.1000144

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000144
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1000144&type=printable
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Benjamin-Fink, Nicole & Reilly, Brian K., 2017. "A road map for developing and applying object-oriented bayesian networks to “WICKED” problems," Ecological Modelling, Elsevier, vol. 360(C), pages 27-44.
    2. J. H. Smid & A. N. Swart & A. H. Havelaar & A. Pielaat, 2011. "A Practical Framework for the Construction of a Biotracing Model: Application to Salmonella in the Pork Slaughter Chain," Risk Analysis, John Wiley & Sons, vol. 31(9), pages 1434-1450, September.
    3. Juan A G Ranea & Ian Morilla & Jon G Lees & Adam J Reid & Corin Yeats & Andrew B Clegg & Francisca Sanchez-Jimenez & Christine Orengo, 2010. "Finding the “Dark Matter” in Human and Yeast Protein Network Prediction and Modelling," PLOS Computational Biology, Public Library of Science, vol. 6(9), pages 1-14, September.
    4. Paula Laccourreye & Concha Bielza & Pedro Larrañaga, 2022. "Explainable Machine Learning for Longitudinal Multi-Omic Microbiome," Mathematics, MDPI, vol. 10(12), pages 1-23, June.
    5. Jumeniyaz Seydehmet & Guang Hui Lv & Ilyas Nurmemet & Tayierjiang Aishan & Abdulla Abliz & Mamat Sawut & Abdugheni Abliz & Mamattursun Eziz, 2018. "Model Prediction of Secondary Soil Salinization in the Keriya Oasis, Northwest China," Sustainability, MDPI, vol. 10(3), pages 1-22, February.
    6. Yishai Shimoni & Marc Y Fink & Soon-gang Choi & Stuart C Sealfon, 2010. "Plato's Cave Algorithm: Inferring Functional Signaling Networks from Early Gene Expression Shadows," PLOS Computational Biology, Public Library of Science, vol. 6(6), pages 1-13, June.
    7. Kaghazchi, Afsaneh & Hashemy Shahdany, S. Mehdy & Roozbahani, Abbas, 2021. "Simulation and evaluation of agricultural water distribution and delivery systems with a Hybrid Bayesian network model," Agricultural Water Management, Elsevier, vol. 245(C).
