
PXD019486

PXD019486 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Long noncoding RNA (lncRNA), smORF encoded polypeptides (SEPs), NONCODE database, enrichment, mass spectrometry
Description: Many small open reading frames (smORFs) embedded in lncRNA transcripts have been shown to encode biologically functional polypeptides (smORF-encoded polypeptides, SEPs) in different organisms. Despite significant advances in genomics, bioinformatics and proteomics that largely enabled the discovery of novel SEPs, their identification across different biological samples is still hampered by their poor predictability, diminutive size and low relative abundance. Here, we take advantage of NONCODE, a repository containing the most complete collection and annotation of lncRNA transcripts from different species, to build a novel database that attempts to maximize the collection of SEPs from human and mouse lncRNA transcripts. To further improve SEP discovery, we implemented two effective and complementary polypeptide enrichment strategies: a 30 kDa MWCO filter and a C8 SPE column. These combined strategies enabled us to discover 357 SEPs from 8 human cell lines and 409 SEPs from 3 mouse cell lines and 8 mouse tissues. Importantly, nineteen of the identified SEPs were then verified through in vitro expression, immunoblotting, parallel reaction monitoring (PRM) and synthetic peptides. Subsequent bioinformatic analysis revealed that some of the physical and chemical properties of these novel SEPs, including amino acid composition and codon usage, differ from those commonly found in canonical proteins. Intriguingly, nearly 65% of the identified SEPs were found to be initiated with non-AUG start codons. Overall, the strategy presented in this study encompasses an efficient workflow that enabled us to identify 766 novel SEPs across multiple cell lines and tissues, which probably represents the largest number of SEPs detected by mass spectrometry reported to date. These novel SEPs might not only provide new clues for the annotation of noncoding elements in the genome but can also serve as a valuable resource for the functional characterization of individual SEPs.
HostingRepository: PRIDE
AnnounceDate: 2021-06-15
AnnouncementXML: Submission_2021-06-14_18:26:11.050.xml
DigitalObjectIdentifier:
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: Qing Zhang
SpeciesList: scientific name: Mus musculus (Mouse); NCBI TaxID: 10090; scientific name: Homo sapiens (Human); NCBI TaxID: 9606;
ModificationList: No PTMs are included in the dataset
Instrument: Q Exactive
Dataset History
Revision  Datetime             Status        ChangeLog Entry
0         2020-06-01 03:10:44  ID requested
1         2021-06-14 18:26:11  announced
Publication List
Dataset with its publication pending
Keyword List
submitter keyword: Long noncoding RNA (lncRNA), smORF encoded polypeptides (SEPs), NONCODE database, enrichment, mass spectrometry
Contact List
Fuquan Yang (lab head)
contact affiliation: Institute of Biophysics, Chinese Academy of Sciences
contact email: fqyang@ibp.ac.cn
Qing Zhang (dataset submitter)
contact affiliation: Institute of Biophysics, Chinese Academy of Sciences
contact email: zhangqing14@mails.ucas.ac.cn
Full Dataset Link List
Dataset FTP location: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/06/PXD019486
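Since browsers no longer open FTP links directly, the dataset directory can also be reached from a script. The following is a minimal sketch using Python's standard `ftplib`; it assumes the server accepts anonymous logins and that the directory layout matches the address above. The function names `split_ftp_url` and `list_dataset_files` are illustrative and not part of any PRIDE or ProteomeXchange tooling.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address taken from the dataset record above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/06/PXD019486"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the entry names in the dataset directory (needs network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login; assumed to be permitted by the server
        ftp.cwd(path)    # move into the dataset directory
        return ftp.nlst()


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```

Individual files could then be fetched with `FTP.retrbinary`, or with any standalone FTP client pointed at the same address.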
PRIDE project URI
Repository Record List