
PXD056915

PXD056915 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: PeptideForest: Semi-supervised machine learning integrating multiple search engines for peptide identification
Description: We introduce PeptideForest, a semi-supervised machine learning approach that integrates the assignment of peptides to mass spectra from multiple algorithms to train a random forest classifier, thereby combining the results from different search engines. PeptideForest increases the number of peptide-to-spectrum matches that exhibit a q-value lower than 1% by 25.2 ± 1.6% compared to MS-GF+ data on samples containing mixed HEK and Escherichia coli proteomes. However, an increase in quantity does not necessarily reflect an increase in quality and this is why we devised a novel approach to determine the quality of the assigned spectra through TMT quantification of samples with known ground truths. Thereby, we could show that the increase in PSMs below 1% q-value does not come with a decrease in quantification quality and as such PeptideForest offers a possibility to gain deeper insights into bottom-up proteomics. PeptideForest has been integrated into our pipeline framework Ursgal and can therefore be combined with a wide array of algorithms.
HostingRepository: PRIDE
AnnounceDate: 2025-06-02
AnnouncementXML: Submission_2025-06-02_15:06:56.058.xml
DigitalObjectIdentifier:
ReviewLevel: Peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: Stefan Schulze
SpeciesList: scientific name: Escherichia coli; NCBI TaxID: 562; scientific name: Homo sapiens (Human); NCBI TaxID: 9606;
ModificationList: TMT6plex-126 reporter+balance reagent acylated residue; iodoacetamide derivatized residue
Instrument: Q Exactive HF
Dataset History
Revision | Datetime | Status | ChangeLog Entry
0 | 2024-10-17 11:44:11 | ID requested |
1 | 2025-06-02 15:06:56 | announced |
Publication List
DOI: 10.1021/acs.jproteome.4c00686
Ranff T, Dennison M, Bédorf J, Schulze S, Zinn N, Bantscheff M, van Heugten JJRM, Fufezan C. PeptideForest: Semisupervised Machine Learning Integrating Multiple Search Engines for Peptide Identification. J Proteome Res, 24(2):929-939 (2025)
Keyword List
submitter keyword: Human, machine learning, E. coli, statistical post-processing, peptide identification
Contact List
Christian Fufezan
contact affiliation: Cellzome A GSK Company, Heidelberg 69117, Germany
contact email: christian@fufezan.net
lab head
Stefan Schulze
contact affiliation: University of Pennsylvania
contact email: sschulze@sas.upenn.edu
dataset submitter
Full Dataset Link List
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. To open this link, install a separate FTP application (we recommend FileZilla) and either configure your browser to launch it when you click the FTP link, or open the application directly and use this address: ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/06/PXD056915
PRIDE project URI
Repository Record List
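As an alternative to a graphical FTP client, the FTP address given in the note above can in principle be reached from a script. The following is a minimal sketch using Python's standard `ftplib`; the helper names `split_ftp_url` and `list_dataset_files` are hypothetical, and the sketch assumes the server accepts anonymous logins, which this page does not state.

```python
from ftplib import FTP
from urllib.parse import urlparse

# Address copied from the FTP note on this page.
FTP_URL = "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2025/06/PXD056915"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path). Hypothetical helper."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url=FTP_URL, timeout=30):
    """Return the entry names in the dataset directory.

    Assumption: the server permits anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no arguments means an anonymous login
        ftp.cwd(path)  # change into the dataset directory
        return ftp.nlst()
```

Calling `list_dataset_files()` would attempt a network connection and return whatever entries the server reports for that directory; individual files could then be fetched with `FTP.retrbinary`.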