
PXD062120

PXD062120 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Propensity for proto-gene emergence in bacteria
Description: The birth of new genes from non-coding sequences has been postulated to be preceded by a proto-gene phase, in which a sequence is translated into protein but does not exhibit hallmarks of a clear function. Despite the abundance of such proto-genes in bacterial genomes, the frequency of their emergence and whether they actually act as precursors of new genes in natural populations are still open questions. To address these issues, we applied a combination of transcriptomic, proteomic and comparative genomic approaches to identify and analyze hundreds of novel bacterial protein-coding genes that have previously escaped annotation. These novel proteins, including many that are widely conserved across genera, display sequence properties indistinguishable from the non-coding regions of the genome, suggesting that the vast majority are evolving neutrally. We provide evidence of de novo emergence of three proto-genes within the history of the E. coli species; however, most such elements are formed via the mutational modification of existing open reading frames. Contrary to expectations, we discover that proto-genes emerge at a uniform rate across distant bacterial taxa despite significant differences in their genomic characteristics, suggesting the presence of taxon-specific mechanisms that regulate their origination and persistence.
HostingRepository: PRIDE
AnnounceDate: 2025-09-19
AnnouncementXML: Submission_2025-09-19_10:01:37.758.xml
DigitalObjectIdentifier: https://dx.doi.org/10.6019/PXD062120
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Supported dataset by repository
PrimarySubmitter: Md Hassan uz-Zaman
SpeciesList: scientific name: Escherichia coli; NCBI TaxID: 562; scientific name: Bacteria; NCBI TaxID: NCBITaxon:2;
ModificationList: monohydroxylated residue; iodoacetamide derivatized residue
Instrument: Thermo Scientific instrument model
Dataset History
Revision | Datetime | Status | ChangeLog Entry
0 | 2025-03-21 08:06:04 | ID requested |
1 | 2025-09-19 10:01:38 | announced |
Publication List
10.6019/PXD062120
Keyword List
submitter keyword: Proto-genes, mass spectrometry, bacteria, De novo gene evolution
Contact List
Howard Ochman
contact affiliation: Molecular Biosciences, University of Texas at Austin
contact email: howard.ochman@austin.utexas.edu
lab head
Md Hassan uz-Zaman
contact affiliation: Postdoctoral Fellow, Molecular Biosciences, University of Texas at Austin
contact email: h.uzzaman@utexas.edu
dataset submitter
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers no longer support FTP natively. To open this link, install a separate FTP client (we recommend FileZilla) and either configure your browser to launch it when you click the link, or open the client directly and enter this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/09/PXD062120
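As an alternative to a graphical FTP client, the directory at the address above can also be listed from a script. The following is a minimal sketch using Python's standard `ftplib`. It assumes the server accepts anonymous logins and that the machine has network access. The helper names `split_ftp_url` and `list_dataset_files` are illustrative and not part of any PRIDE or ProteomeXchange tool.

```python
from ftplib import FTP
from urllib.parse import urlparse

# Address quoted in the note above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/09/PXD062120"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parts.hostname, parts.path or "/"


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the entry names in the dataset directory (needs network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # anonymous login; assumed to be permitted by the server
        ftp.cwd(path)  # change into the dataset directory
        return ftp.nlst()


if __name__ == "__main__":
    for name in list_dataset_files():
        print(name)
```

Run as a script, this prints one entry name per line. Downloading individual files could be done with `ftp.retrbinary`, which is left out here to keep the sketch short.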
PRIDE project URI