
PXD029360

PXD029360 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Generation of ENSEMBL-based proteogenomics databases boost the identification of novel peptides
Description: A novel bioinformatics tool, pypgatk, and the pgdb workflow are presented in this study to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs, and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD, and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, notably optimized target/decoy generation by the DecoyPyrat algorithm. Finally, we reanalyze four public datasets in PRIDE by generating cell-type specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to more than 10% of the total number of peptides identified (43,501 out of 402,512).
HostingRepository: PRIDE
AnnounceDate: 2021-10-26
AnnouncementXML: Submission_2021-10-26_13:27:02.081.xml
DigitalObjectIdentifier: https://dx.doi.org/10.6019/PXD029360
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Supported dataset by repository
PrimarySubmitter: Yasset Perez-Riverol
SpeciesList: scientific name: Homo sapiens (Human); NCBI TaxID: 9606
ModificationList: phosphorylated residue; monohydroxylated residue
Instrument: Q Exactive HF; Q Exactive
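
The three-frame translation mentioned in the description can be illustrated with a minimal sketch. This is not the pypgatk implementation: the function names `translate` and `three_frame_translate` are hypothetical, the standard genetic code is assumed, and only the forward strand is translated.

```python
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a nucleotide sequence codon by codon.

    Stop codons become '*', codons with unknown bases become 'X',
    and a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def three_frame_translate(seq):
    """Return the translations of the three forward reading frames."""
    return [translate(seq[offset:]) for offset in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translate("ATGGCCTAA"), start=1):
        print(f"frame {frame}: {protein}")
```

A real proteogenomics database would go further, for example by splitting each frame at stop codons and discarding short fragments; those steps are left out here.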
Dataset History
Revision  Datetime             Status        ChangeLog Entry
0         2021-10-26 04:13:09  ID requested
1         2021-10-26 13:27:02  announced
Publication List
Dataset with its publication pending
Keyword List
submitter keyword: proteogenomics, ENSEMBL, reanalysis
Contact List
Yasset Perez-Riverol (lab head)
contact affiliation: EMBL-EBI
contact email: yperez@ebi.ac.uk

Yasset Perez-Riverol (dataset submitter)
contact affiliation: EBI
contact email: yperez@ebi.ac.uk
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers no longer support FTP natively. Use an FTP client (such as FileZilla) to open this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2021/10/PXD029360
PRIDE project URI