PXD056915 is an original dataset announced via ProteomeXchange.
Dataset Summary
Title | PeptideForest: Semi-supervised machine learning integrating multiple search engines for peptide identification |
Description | We introduce PeptideForest, a semi-supervised machine learning approach that integrates peptide-to-spectrum assignments from multiple search algorithms to train a random forest classifier, thereby combining the results of different search engines. On samples containing mixed HEK and Escherichia coli proteomes, PeptideForest increases the number of peptide-to-spectrum matches (PSMs) with a q-value below 1% by 25.2 ± 1.6% compared to MS-GF+. However, an increase in quantity does not necessarily reflect an increase in quality, so we devised a novel approach to assess the quality of the assigned spectra through TMT quantification of samples with known ground truths. With this approach, we showed that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality; PeptideForest therefore offers a way to gain deeper insights in bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms. |
HostingRepository | PRIDE |
AnnounceDate | 2025-06-02 |
AnnouncementXML | Submission_2025-06-02_15:06:56.058.xml |
DigitalObjectIdentifier | |
ReviewLevel | Peer-reviewed dataset |
DatasetOrigin | Original dataset |
RepositorySupport | Unsupported dataset by repository |
PrimarySubmitter | Stefan Schulze |
SpeciesList | scientific name: Escherichia coli; NCBI TaxID: 562; scientific name: Homo sapiens (Human); NCBI TaxID: 9606; |
ModificationList | TMT6plex-126 reporter+balance reagent acylated residue; iodoacetamide derivatized residue |
Instrument | Q Exactive HF |
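The Description reports gains as the number of PSMs below 1% q-value. As a rough illustration of what that count means, the sketch below assumes a target-decoy strategy with the simple decoys/targets FDR estimator, which is common in PSM post-processing but is not stated in this record; PeptideForest's own estimator may differ. The names `target_decoy_qvalues`, `count_target_psms` and `relative_increase` are hypothetical, and score ties are not grouped.

```python
from typing import List, Tuple

# Hypothetical PSM representation: (score, is_decoy); higher score = better match.
PSM = Tuple[float, bool]


def target_decoy_qvalues(psms: List[PSM]) -> List[float]:
    """Return a q-value for each PSM, in input order (ties are not grouped)."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:  # walk from best to worst score
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        # assumed estimator: decoys / targets at this score threshold
        fdr[i] = decoys / max(targets, 1)
    # q-value = smallest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of FDR taken from the worst score upwards
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


def count_target_psms(psms: List[PSM], threshold: float = 0.01) -> int:
    """Count target PSMs whose q-value is below the threshold (default 1%)."""
    qvalues = target_decoy_qvalues(psms)
    return sum(
        1 for (_, is_decoy), q in zip(psms, qvalues) if not is_decoy and q < threshold
    )


def relative_increase(new_count: int, baseline_count: int) -> float:
    """Percentage gain of new_count over baseline_count."""
    return 100.0 * (new_count - baseline_count) / baseline_count
```

With purely hypothetical counts, `relative_increase(1252, 1000)` gives a 25.2% gain, the same kind of figure the Description reports relative to MS-GF+; the counts themselves are not taken from this dataset.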
Dataset History
Revision | Datetime | Status | ChangeLog Entry |
0 | 2024-10-17 11:44:11 | ID requested | |
1 | 2025-06-02 15:06:56 | announced | |
Publication List
10.1021/acs.jproteome.4c00686; |
Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C, PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification. J Proteome Res, 24(2):929-939(2025) |
Keyword List
submitter keyword: Human, machine learning, E. coli, statistical post-processing, peptide identification |
Contact List
Christian Fufezan |
contact affiliation | Cellzome A GSK Company, Heidelberg 69117, Germany |
contact email | christian@fufezan.net |
lab head | |
Stefan Schulze |
contact affiliation | University of Pennsylvania |
contact email | sschulze@sas.upenn.edu |
dataset submitter | |
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers no longer support FTP access natively. To download the files, use an FTP client such as FileZilla (or configure your browser to open FTP links in such a client) with this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/06/PXD056915 |
PRIDE project URI |
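Instead of a graphical FTP client, the dataset directory can also be listed from a script. The sketch below uses only Python's standard `ftplib` and `urllib.parse`; it assumes the PRIDE server accepts anonymous login and that the files sit directly in the directory given by the FTP address above. The helper names `split_ftp_url` and `list_dataset_files` are hypothetical.

```python
from ftplib import FTP
from typing import List, Tuple
from urllib.parse import urlparse

# FTP address taken from the dataset record above.
DATASET_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/06/PXD056915"


def split_ftp_url(url: str) -> Tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path


def list_dataset_files(url: str = DATASET_URL) -> List[str]:
    """List file names in the dataset directory (assumes anonymous access)."""
    host, path = split_ftp_url(url)
    with FTP(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        ftp.cwd(path)
        return ftp.nlst()
```

Calling `list_dataset_files()` requires network access; individual files could then be fetched with `FTP.retrbinary`.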
Repository Record List
- PRIDE
- PXD056915
- Label: PRIDE project
- Name: PeptideForest: Semi-supervised machine learning integrating multiple search engines for peptide identification