PXD056205

PXD056205 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Benchmarking Peptide Spectral Library Search Dataset
Description: Spectral library search (SLS) is a major approach for peptide identification from tandem mass spectrometry data, complementing conventional database search. Moreover, with the emergence of spectrum prediction models, proteomics database search is progressively becoming more like spectral library search of predicted peptide spectra. The performance of peptide identification algorithms thus frequently depends on how well the underlying Spectrum-Spectrum Matching (SSM) scoring functions distinguish true and false positive matches. However, detailed comparative studies evaluating the performance of SSM scoring functions remain limited by the absence of comprehensive benchmark datasets. We propose new methods to build benchmarks that assess the effectiveness and robustness of SSM scoring functions. The resulting benchmark dataset is composed of (i) a set of 476,063 precursors used to construct 8 query spectrum sets with different levels of noise added to "ideal" and real experimental spectra, and (ii) three spectral libraries with different spectra for the same 3,065,819 precursors: experimental spectra, annotated/de-noised spectra and predicted spectra. The benchmark set was then used to evaluate 9 common spectrum preprocessing scenarios, followed by the evaluation of 3 standard SSM scoring functions, Cosine, Projected-Cosine (commonly used for the analysis of chimeric/mixture spectra), and Jensen-Shannon divergence, and 2 additional scoring functions used in state-of-the-art SLS tools: SpectraST and EntropyScore. The results revealed that scoring spectrum-spectrum matches is still an important open problem, with the best recall for typical SLS searches still poor at just ~70% at the typical 1% error rate. Overall, SpectraST performed best for spectra with little-to-no noise, but JS-divergence performed better in some cases, as it was found to be the most resistant to noise. Conversely, the performance of Cosine and EntropyScore was found to be generally lower than previously reported, with Projected-Cosine performing especially poorly in most cases. However, the performance of the SSM scoring functions was also found to depend substantially on the minimum number of matching peaks required for each SSM, with benchmark results showing that the scoring functions' performance and relative ranking can be strongly affected by how this parameter is set. The resulting benchmark dataset can be used to test and support the development of SSM scoring functions, and the proposed benchmark construction approach provides a foundation that can be extended to additional types of spectrum-spectrum matching.
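To make the three standard scoring functions named in the description concrete, here is a minimal Python sketch. It is not the submitters' implementation. It assumes each spectrum has already been reduced to a dictionary mapping an m/z bin to an intensity, so that two peaks "match" when they share a bin. The Projected-Cosine variant shown here (discard query peaks absent from the library spectrum, then take the cosine) is one common reading of the term and is an assumption, as are all function names.

```python
import math


def normalize(spectrum):
    """Scale intensities to unit Euclidean norm (empty dict if all zero)."""
    norm = math.sqrt(sum(v * v for v in spectrum.values()))
    return {k: v / norm for k, v in spectrum.items()} if norm > 0 else {}


def matching_peaks(query, library):
    """Number of m/z bins present in both spectra."""
    return len(query.keys() & library.keys())


def cosine(query, library):
    """Cosine similarity over shared bins of unit-normalized spectra."""
    q, lib = normalize(query), normalize(library)
    return sum(q[k] * lib[k] for k in q.keys() & lib.keys())


def projected_cosine(query, library):
    """Cosine after dropping query peaks that the library spectrum lacks.

    Assumed definition: the query is projected onto the library's peaks,
    so unexplained (e.g. co-fragmented) query peaks are not penalized.
    """
    projected = {k: v for k, v in query.items() if k in library}
    return cosine(projected, library)


def js_divergence(query, library):
    """Jensen-Shannon divergence (base 2) of intensity distributions.

    0 means identical distributions, 1 means no shared peaks.
    """
    def to_prob(spectrum):
        total = sum(spectrum.values())
        if total <= 0:
            raise ValueError("spectrum has no intensity")
        return {k: v / total for k, v in spectrum.items()}

    p, q = to_prob(query), to_prob(library)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in p.keys() | q.keys()}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture distribution
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

A search tool would typically skip any candidate pair for which `matching_peaks` falls below a chosen threshold before scoring it, which is the parameter the description reports as strongly affecting the ranking of the scoring functions.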
HostingRepository: MassIVE
AnnounceDate: 2024-10-02
AnnouncementXML: Submission_2024-10-02_08:56:33.071.xml
DigitalObjectIdentifier:
ReviewLevel: Non peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Unsupported dataset by repository
PrimarySubmitter: Hao Xu
SpeciesList: scientific name: Homo sapiens; common name: human; NCBI TaxID: 9606;
ModificationList: 2-pyrrolidone-5-carboxylic acid (Gln); deamidated L-asparagine; deamidated L-glutamine; L-methionine sulfoxide; S-carboxamidomethyl-L-cysteine; N-acetylated residue; carbamoylated residue
Instrument: Q Exactive
Dataset History
Revision  Datetime             Status        ChangeLog Entry
0         2024-09-24 19:52:29  ID requested
1         2024-10-02 08:56:33  announced
Publication List
no publication
Keyword List
submitter keyword: Spectral library search, Benchmark dataset, MassIVE-KB, Predicted mass spectra, Noise resistance
Contact List
Nuno Bandeira
contact affiliation: UCSD
contact email: bandeira@ucsd.edu
lab head
Hao Xu
contact affiliation: UCSD
contact email: hax019
dataset submitter
Full Dataset Link List
MassIVE dataset URI
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can usually install a separate FTP application (we recommend FileZilla) and configure your browser to launch it when you click on this FTP link, or open an FTP-capable application directly and use this address: ftp://massive.ucsd.edu/v08/MSV000095946/
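As an alternative to a graphical FTP client, the dataset directory can also be listed from a script. The following is a minimal sketch using Python's standard `ftplib`. It assumes the server accepts anonymous logins, which this page does not state, and the function names are illustrative only.

```python
from ftplib import FTP
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parts.hostname, parts.path


def list_dataset_files(url, timeout=30):
    """Return the entry names in the directory the FTP URL points to.

    Assumption: the server permits anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # no arguments means an anonymous login
        ftp.cwd(path)    # change to the dataset directory
        return ftp.nlst()
```

Calling `list_dataset_files("ftp://massive.ucsd.edu/v08/MSV000095946/")` would open a network connection and return the entry names in the dataset's top-level directory, provided the assumption about anonymous access holds.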