
PXD042958-1

PXD042958 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Large-scale proteogenomics characterization of the Mycobacterium tuberculosis hidden proteome
Description: Traditional genome annotation methods exclude open reading frames shorter than 100 codons (small ORFs, or smORFs), which leaves a substantial portion of the proteome overlooked. Proteogenomics is a multi-omics approach that integrates genomics, transcriptomics and proteomics to identify proteoforms and unannotated proteins from mass spectrometry data. Here, we applied our recently developed proteogenomics pipeline to aid genome annotation and identified hundreds of novel microproteins encoded by smORFs in the genome of Mycobacterium tuberculosis (Mtb). To overcome sensitivity limitations, we took a large-scale approach using 680 mass spectrometry experiments, which allowed us to classify the identifications into different confidence levels with our machine learning model. After integrating the results with RNA-Seq datasets, we explored the biological relevance of the novel sequences and showed that they are differentially expressed upon starvation and antibiotic treatment and are co-expressed with many annotated genes that are vital for bacterial virulence. Moreover, some smORFs are located within essential genomic segments and could be attractive targets for the development of new drugs. Altogether, our results should improve the current annotation of the Mtb proteome and guide future studies that characterize these microproteins in depth.
Hosting Repository: PRIDE
Announce Date: 2025-05-06
Announcement XML: Submission_2025-05-06_10:59:18.981.xml
Digital Object Identifier:
Review Level: Peer-reviewed dataset
Dataset Origin: Original dataset
Repository Support: Unsupported dataset by repository
Primary Submitter: Eduardo Vieira de Souza
Species List: scientific name: Mycobacterium tuberculosis H37Rv; NCBI TaxID: 83332
Modification List: iodoacetamide derivatized residue
Instrument: LTQ Orbitrap
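The description defines smORFs by an ORF-length cutoff. As an illustration only (this is not the authors' proteogenomics pipeline), the minimal sketch below scans the forward strand of a DNA sequence for ORFs (a start codon followed in frame by a stop codon) and flags those below a length cutoff as smORFs. The function names `find_orfs` and `is_smorf` are hypothetical, and the defaults are assumptions: ATG-only starts, standard stop codons, forward strand only, and a 100-codon cutoff (the threshold conventionally used to define smORFs). Mtb also uses alternative start codons such as GTG and TTG, which can be passed via `start_codons`.

```python
# Minimal smORF-flagging sketch (hypothetical helpers, not the authors' pipeline).
# Assumptions: forward strand only, standard stop codons, 100-codon cutoff.

STOP_CODONS = {"TAA", "TAG", "TGA"}
SMORF_MAX_CODONS = 100  # assumed cutoff; counts codons excluding the stop codon


def find_orfs(seq: str, start_codons=("ATG",)) -> list[tuple[int, int, int]]:
    """Return (start, end, n_codons) for forward-strand ORFs.

    `end` is exclusive and includes the stop codon; `n_codons` excludes the stop.
    Within each frame, the first start codon after the previous stop is used,
    so each stop codon yields at most one (the longest) ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in start_codons:
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append((start, i + 3, (i - start) // 3))
                start = None  # ORFs without a stop codon are discarded
    return sorted(orfs)


def is_smorf(n_codons: int, cutoff: int = SMORF_MAX_CODONS) -> bool:
    """Flag an ORF as a smORF when it is shorter than the cutoff."""
    return n_codons < cutoff


if __name__ == "__main__":
    # Toy sequence: ATG + 10 x GCT + TAA -> one ORF of 11 codons.
    toy = "ATG" + "GCT" * 10 + "TAA"
    for start, end, n in find_orfs(toy):
        print(start, end, n, "smORF" if is_smorf(n) else "ORF")
```

Real annotation pipelines also scan the reverse complement, handle alternative and nested start codons, and rely on experimental evidence (here, mass spectrometry and RNA-Seq) rather than length alone.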
Dataset History
Revision  Datetime             Status        ChangeLog Entry
0         2023-06-13 13:46:54  ID requested
1         2025-05-06 10:59:19  announced
Publication List
DOI: 10.1038/s41598-024-82465-w
de Souza EV, Dalberto PF, Miranda AC, Saghatelian A, Pinto AM, Basso LA, Machado P, Bizarro CV. Large-scale proteogenomics characterization of microproteins in Mycobacterium tuberculosis. Sci Rep. 2024;14(1):31186.
Keyword List
submitter keyword: proteogenomics, microproteins, tuberculosis, small ORFs
Contact List
Cristiano Valim Bizarro
contact affiliation: Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Programa de Pós-Graduação em Biologia Celular e Molecular - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil
contact email: cristiano.bizarro@pucrs.br
lab head
Eduardo Vieira de Souza
contact affiliation: Salk Institute for Biological Studies
contact email: vdseduardo@gmail.com
dataset submitter
Full Dataset Link List
Dataset FTP location: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/05/PXD042958
NOTE: Most web browsers no longer support FTP natively. To access this location, use an FTP client such as FileZilla, or configure your browser to open FTP links in an external application.
PRIDE project URI
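As an alternative to a graphical FTP client, the dataset files can be listed and downloaded with Python's standard-library `ftplib`. The minimal sketch below uses anonymous login against the FTP location listed for this dataset. The helper names are hypothetical, and the directory pattern `/pride/data/archive/<year>/<month>/<accession>` is an assumption inferred from this single dataset's path (here matching its 2025-05 announce date); check the path shown for any other dataset rather than relying on the pattern.

```python
# Minimal sketch for fetching PRIDE dataset files over anonymous FTP.
# Assumption: archive paths follow /pride/data/archive/<YYYY>/<MM>/<accession>,
# inferred from this dataset's listed FTP location.
import os
from ftplib import FTP

PRIDE_FTP_HOST = "ftp.pride.ebi.ac.uk"


def pride_archive_path(accession: str, year: int, month: int) -> str:
    """Build the (assumed) archive directory path for a dataset."""
    return f"/pride/data/archive/{year:04d}/{month:02d}/{accession}"


def pride_ftp_url(accession: str, year: int, month: int) -> str:
    """Build the full ftp:// URL for a dataset directory."""
    return f"ftp://{PRIDE_FTP_HOST}{pride_archive_path(accession, year, month)}"


def list_dataset_files(accession: str, year: int, month: int) -> list[str]:
    """Return the file names in the dataset directory (requires network access)."""
    with FTP(PRIDE_FTP_HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(pride_archive_path(accession, year, month))
        return ftp.nlst()


def download_file(accession: str, year: int, month: int,
                  filename: str, dest_dir: str = ".") -> str:
    """Download one file in binary mode and return its local path."""
    dest = os.path.join(dest_dir, filename)
    with FTP(PRIDE_FTP_HOST) as ftp:
        ftp.login()
        ftp.cwd(pride_archive_path(accession, year, month))
        with open(dest, "wb") as fh:
            ftp.retrbinary(f"RETR {filename}", fh.write)
    return dest


if __name__ == "__main__":
    print(pride_ftp_url("PXD042958", 2025, 5))
    # for name in list_dataset_files("PXD042958", 2025, 5):
    #     print(name)
```

Downloading is kept as a separate call so that large raw files are fetched only when they are explicitly requested after the directory has been listed.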
Repository Record List