PXD042958 is an original dataset announced via ProteomeXchange.
Dataset Summary
Title | Large-scale proteogenomics characterization of the Mycobacterium tuberculosis hidden proteome |
Description | Traditional genome annotation methods exclude open reading frames shorter than 300 codons (smORFs), which leaves a substantial portion of the proteome overlooked. Proteogenomics is a multi-omics approach that merges genomics, transcriptomics and proteomics to identify proteoforms and unannotated proteins from mass spectrometry data. Here, we applied our recently developed proteogenomics pipeline to aid genome annotation and identify hundreds of novel microproteins encoded by smORFs in the genome of Mycobacterium tuberculosis (Mtb). To overcome sensitivity limitations, we analyzed 680 mass spectrometry experiments in a large-scale approach, which let us classify the findings by degree of confidence using our machine learning model. After integrating the results with RNA-Seq datasets, we explored the biological relevance of the novel sequences and showed that they are differentially expressed upon starvation and antibiotic treatment and are co-expressed with many annotated genes that are vital for bacterial virulence. Moreover, some smORFs are located inside essential genomic segments and could be attractive targets for the development of new drugs. Altogether, our results should improve the current annotation of the Mtb proteome and guide future studies that characterize these microproteins in depth. |
HostingRepository | PRIDE |
AnnounceDate | 2025-05-06 |
AnnouncementXML | Submission_2025-05-06_10:59:18.981.xml |
DigitalObjectIdentifier | |
ReviewLevel | Peer-reviewed dataset |
DatasetOrigin | Original dataset |
RepositorySupport | Unsupported dataset by repository |
PrimarySubmitter | Eduardo Vieira de Souza |
SpeciesList | scientific name: Mycobacterium tuberculosis H37Rv; NCBI TaxID: 83332; |
ModificationList | iodoacetamide derivatized residue |
Instrument | LTQ Orbitrap |
Dataset History
Revision | Datetime | Status | ChangeLog Entry |
0 | 2023-06-13 13:46:54 | ID requested | |
1 | 2025-05-06 10:59:19 | announced | |
Publication List
10.1038/s41598-024-82465-w; |
de Souza EV, Dalberto PF, Miranda AC, Saghatelian A, Pinto AM, Basso LA, Machado P, Bizarro CV. Large-scale proteogenomics characterization of microproteins in Mycobacterium tuberculosis. Sci Rep, 14(1):31186 (2024) |
Keyword List
submitter keyword: proteogenomics, microproteins, tuberculosis, small orfs |
Contact List
Cristiano Valim Bizarro |
contact affiliation | Centro de Pesquisas em Biologia Molecular e Funcional (CPBMF), Instituto Nacional de Ciência e Tecnologia em Tuberculose (INCT-TB), Programa de Pós-Graduação em Biologia Celular e Molecular - Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil |
contact email | cristiano.bizarro@pucrs.br |
lab head | |
Eduardo Vieira de Souza |
contact affiliation | Salk Institute for Biological Studies |
contact email | vdseduardo@gmail.com |
dataset submitter | |
Full Dataset Link List
Dataset FTP location | ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/05/PXD042958 |
PRIDE project URI |
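Because the dataset is reached over FTP rather than through the browser, a short script can stand in for a graphical FTP client. The following is a minimal sketch using only the Python standard library. It assumes the server accepts anonymous logins and that the FTP address listed above is a directory; the function names `split_ftp_url` and `list_dataset_files` are illustrative and not part of any PRIDE or ProteomeXchange tooling.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address copied from this record.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/05/PXD042958"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the names found in the dataset directory (needs network access)."""
    host, directory = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()         # anonymous login; assumed to be allowed by the server
        ftp.cwd(directory)  # assumes the URL path is a directory
        return ftp.nlst()
```

Calling `list_dataset_files()` on a machine with network access should return the file names in the dataset directory; individual files could then be fetched with `FTP.retrbinary`.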
Repository Record List
- PRIDE
- PXD042958
- Label: PRIDE project
- Name: Large-scale proteogenomics characterization of the Mycobacterium tuberculosis hidden proteome