PXD013892

PXD013892 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Discovery of lincRNA-encoded Peptides: An Integrated Transcriptomics, Proteomics and Bioinformatics Approach
Description: Long noncoding RNA (lncRNA) refers to the family of RNA transcripts that are more than 200 nucleotides in length but do not encode proteins. Long intergenic noncoding RNA (lincRNA) is a subset of lncRNA that does not overlap with known genes. Increasing evidence shows that some of these transcripts do in fact contain open reading frames (ORFs) that code for short peptides, and that these peptides have significant functional roles within cells. However, many of these peptides remain unannotated and uncharacterized. This study proposes a workflow integrating proteomics, transcriptomics and bioinformatics specifically for lincRNA-encoded peptide discovery. The workflow was tested on the mouse kidney inner medulla (IM), a region that contains the collecting duct system responsible for regulated water transport. In brief, short proteins (from 2 to 20 kDa) were enriched on a tricine protein gel, digested in-gel with trypsin into peptides, and then analyzed using high-resolution mass spectrometry. However, matching fragment ion mass spectra to peptide sequences requires a reference peptide sequence database, which is not available for noncoding transcripts and must be generated de novo for the sample of interest. We modified the RNA-Seq mapping workflow by filtering out coding reads first to better quantify noncoding transcript expression. In addition, a rule-based ORF prediction was implemented to select the single best predicted ORF per noncoding transcript for constructing the peptide library. Candidates were further evaluated using several quality control criteria and bioinformatics tools. Three candidates, conserved in rat and human, passed all criteria and may be truly novel coding genes. In summary, we present a workflow based on modern transcriptomics and proteomics technologies for lincRNA-encoded peptide discovery. A key computational challenge is generating a hypothetical lincRNA-encoded peptide database for matching peptides to mass spectra.
With this workflow, we discovered three previously unannotated peptides in the mouse kidney inner medulla. The same workflow can be applied to any cell or tissue type of interest to quickly advance this research field.
HostingRepository: PRIDE
AnnounceDate: 2020-05-12
AnnouncementXML: Submission_2020-05-11_23:14:28.xml
DigitalObjectIdentifier:
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: CHIN-RANG YANG
SpeciesList: scientific name: Mus musculus (Mouse); NCBI TaxID: 10090;
ModificationList: monohydroxylated residue; deamidated residue; iodoacetamide derivatized residue
Instrument: Orbitrap Fusion
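The description mentions a rule-based ORF prediction that keeps one best ORF per noncoding transcript, but it does not state the rules. The sketch below is only an illustration under a common assumption: scan the three forward reading frames for ATG-to-stop ORFs and keep the longest one, breaking ties by the earliest start. The function names, the `min_codons` threshold, and the "longest wins" rule are hypothetical and are not taken from the dataset's actual pipeline.

```python
# Hypothetical sketch: pick one "best" ORF per transcript.
# Assumed rule (not from the dataset): longest ATG..stop ORF on the
# forward strand, ties broken by the earliest start position.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return (start, end, frame) for each forward-strand ORF.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts codons from ATG up to (not including) the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def best_orf(seq, min_codons=10):
    """Return the longest ORF as (start, end, frame), or None if none qualify."""
    orfs = find_orfs(seq, min_codons)
    if not orfs:
        return None
    return max(orfs, key=lambda o: (o[1] - o[0], -o[0]))
```

Applying `best_orf` to each noncoding transcript sequence and translating the selected ORFs would give one candidate entry per transcript for a search database. A real pipeline would likely add further rules (for example reverse-strand scanning or start-codon context), which are omitted here.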
Dataset History
Revision  Datetime             Status        ChangeLog Entry
0         2019-05-17 03:18:39  ID requested
1         2020-05-11 23:14:28  announced
Publication List
Dataset with its publication pending
Keyword List
submitter keyword: noncoding, lncRNA, lincRNA, lincRNA peptide, micropeptide, sORF, ORF Prediction
Contact List
CHIN-RANG YANG
contact affiliation: NHLBI, NIH
contact email: chin-rang.yang@nih.gov
lab head
CHIN-RANG YANG
contact affiliation: National Institutes of Health, USA
contact email: chin-rang.yang@nih.gov
dataset submitter
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have now discontinued native support for FTP access within the browser window. But you can usually install another FTP app (we recommend FileZilla) and configure your browser to launch the external application when you click on this FTP link. Or otherwise, launch an app that supports FTP (like FileZilla) and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD013892
PRIDE project URI
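For readers who prefer a script to a graphical FTP client, the following is a minimal sketch that lists the files at the FTP address given above using Python's standard `ftplib`. It assumes the server accepts anonymous logins and that the directory still exists at that path; the helper names `split_ftp_url` and `list_dataset_files` are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address quoted in the note above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/05/PXD013892"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the file names in the dataset directory.

    Assumption: anonymous login is permitted by the server.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_dataset_files()` requires network access. Individual files could then be fetched with `ftp.retrbinary`, which is omitted to keep the sketch short.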