Mining mammalian transcript data for functional long non-coding RNAs

Amit N. Khachane, Paul M. Harrison

    Research output: Contribution to journal › Article › Peer-review

    130 Citations (Scopus)

    Abstract

    Background: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has attracted increasing interest in recent years. Sequencing projects such as FANTOM3 for mouse and H-InvDB for human have generated abundant data on the transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of 'transcription noise'. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner.

    Principal Findings: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequence in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases the flanking regions have no homology at all, and in the remaining cases their degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) conserved and stable secondary-structure motifs. We further identified the orthologous protein-coding genes that contribute to the pool of lncRNAs; among these, genes implicated in carcinogenesis are significantly over-represented.

    Conclusion: Our comparative mammalian genomics approach, coupled with evolutionary analysis, identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that among the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
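    The abstract reports that cancer-associated genes are "significantly over-represented" among the orthologous protein-coding genes that produce lncRNAs, but it does not say which statistical test was used. One common way to test such an over-representation claim is a one-sided hypergeometric test. The sketch below illustrates that test only as an assumption; it is not presented as the authors' method, and the function name `hypergeom_upper_tail` and the numbers in the usage line are hypothetical, not data from the paper.

```python
from math import comb


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    Hypothetical illustration (not the paper's stated method):
      population: genes in the background set
      successes:  background genes annotated as cancer-related
      draws:      genes in the subset that produce lncRNAs
      k:          cancer-related genes observed in that subset
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("invalid hypergeometric parameters")
    total = comb(population, draws)
    upper = min(successes, draws)
    # Sum the probability mass for every outcome at least as extreme as k.
    # comb(n, r) returns 0 when r > n, so impossible outcomes add nothing.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Invented numbers: 20 background genes, 5 cancer-related,
    # 4 lncRNA-producing genes, 3 of which are cancer-related.
    print(hypergeom_upper_tail(3, 20, 5, 4))
```

    A small p-value means that seeing this many cancer-related genes in the lncRNA-producing subset would be unlikely if the subset were a random draw from the background. A real analysis would also have to choose the background gene set and the cancer-gene annotation carefully, because both change the result.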
    Original language: English
    Article number: e10316
    Number of pages: 9
    Journal: PLoS One
    Volume: 5
    Issue number: 4
    Publication status: Published - 2010

    Open Access - Access Right Statement

    Copyright: 2010 Khachane, Harrison. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
