PXD056205
PXD056205 is an original dataset announced via ProteomeXchange.

Dataset Summary
| Field | Value |
|---|---|
| Title | Benchmarking Peptide Spectral Library Search Dataset |
| Description | Spectral library search (SLS) is a major approach for peptide identification from tandem mass spectrometry data and complements conventional database search. Moreover, with the emergence of spectrum prediction models, proteomics database search is progressively becoming more like spectral library search of predicted peptide spectra. The performance of peptide identification algorithms therefore frequently depends on how well the underlying spectrum-spectrum matching (SSM) scoring functions distinguish true from false positive matches. However, detailed comparative studies of SSM scoring functions remain limited by the absence of comprehensive benchmark datasets. We propose new methods to build benchmarks that assess the effectiveness and robustness of SSM scoring functions. The resulting benchmark dataset is composed of (i) a set of 476,063 precursors used to construct 8 query spectrum sets with different levels of noise added to "ideal" and real experimental spectra, and (ii) three spectral libraries containing different spectra for the same 3,065,819 precursors: experimental spectra, annotated/de-noised spectra, and predicted spectra. The benchmark set was then used to evaluate 9 common spectrum preprocessing scenarios, followed by an evaluation of 3 standard SSM scoring functions (Cosine; Projected-Cosine, commonly used for the analysis of chimeric/mixture spectra; and Jensen-Shannon divergence) and 2 additional scoring functions used in state-of-the-art SLS tools (SpectraST and EntropyScore). The results revealed that scoring spectrum-spectrum matches is still an important open problem: the best recall for typical SLS searches was only ~70% at the typical 1% error rate. Overall, SpectraST performed best for spectra with little-to-no noise, but JS-divergence performed better in some cases because it was the most resistant to noise. Conversely, the performance of Cosine and EntropyScore was generally lower than previously reported, and Projected-Cosine performed especially poorly in most cases. However, SSM scoring performance also depended significantly on the minimum number of matching peaks required for each SSM: the benchmark results show that both the performance and the relative ranking of the scoring functions can be strongly affected by how this parameter is set. The resulting benchmark dataset can be used to test and support the development of SSM scoring functions, and the proposed benchmark construction approach provides a foundation that can be extended to additional types of spectrum-spectrum matching. |
| HostingRepository | MassIVE | 
| AnnounceDate | 2024-10-02 | 
| AnnouncementXML | Submission_2024-10-02_08:56:33.071.xml | 
| DigitalObjectIdentifier | |
| ReviewLevel | Non peer-reviewed dataset | 
| DatasetOrigin | Original dataset | 
| RepositorySupport | Unsupported dataset by repository | 
| PrimarySubmitter | Hao Xu | 
| SpeciesList | scientific name: Homo sapiens; common name: human; NCBI TaxID: 9606; | 
| ModificationList | 2-pyrrolidone-5-carboxylic acid (Gln); deamidated L-asparagine; deamidated L-glutamine; L-methionine sulfoxide; S-carboxamidomethyl-L-cysteine; N-acetylated residue; carbamoylated residue | 
| Instrument | Q Exactive | 
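The description names the scoring functions and the minimum-matched-peaks parameter but does not define them. The following is a minimal sketch, under stated assumptions, of how Cosine, Projected-Cosine and a Jensen-Shannon-based similarity are commonly computed for a query/library spectrum pair. It is not the benchmark authors' code, nor the SpectraST or EntropyScore implementations. Assumptions: spectra are lists of (m/z, intensity) pairs; peaks are matched greedily within a hypothetical absolute tolerance `tol=0.02` Da (a simplification of optimal one-to-one matching); Projected-Cosine is taken to mean cosine restricted to library peaks, so unmatched query peaks (e.g. from a co-fragmented peptide) are ignored; `min_matched` is a hypothetical name and default for the minimum number of matching peaks.

```python
import math
from typing import Callable, List, Tuple

Peak = Tuple[float, float]        # (m/z, intensity); intensities assumed > 0
Pairs = List[Tuple[float, float]]  # aligned (query intensity, library intensity)


def align_peaks(query: List[Peak], library: List[Peak], tol: float = 0.02) -> Tuple[Pairs, int]:
    """Greedy two-pointer matching within an absolute m/z tolerance (Da).

    Returns aligned intensity pairs (0.0 marks a missing peak) and the
    number of matched peaks. A simplification, not an optimal assignment.
    """
    q, lib = sorted(query), sorted(library)
    pairs: Pairs = []
    matched, i, j = 0, 0, 0
    while i < len(q) and j < len(lib):
        dmz = q[i][0] - lib[j][0]
        if abs(dmz) <= tol:            # peaks match: pair their intensities
            pairs.append((q[i][1], lib[j][1]))
            matched += 1
            i += 1
            j += 1
        elif dmz < 0:                   # query peak has no library partner
            pairs.append((q[i][1], 0.0))
            i += 1
        else:                           # library peak has no query partner
            pairs.append((0.0, lib[j][1]))
            j += 1
    pairs.extend((p[1], 0.0) for p in q[i:])
    pairs.extend((0.0, p[1]) for p in lib[j:])
    return pairs, matched


def cosine(pairs: Pairs) -> float:
    """Normalized dot product over all aligned peaks."""
    dot = sum(a * b for a, b in pairs)
    nq = math.sqrt(sum(a * a for a, _ in pairs))
    nl = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (nq * nl) if nq > 0 and nl > 0 else 0.0


def projected_cosine(pairs: Pairs) -> float:
    """Cosine over library peaks only: unmatched query peaks are dropped."""
    return cosine([(a, b) for a, b in pairs if b > 0.0])


def js_similarity(pairs: Pairs) -> float:
    """1 - Jensen-Shannon divergence (log base 2, so the result is in [0, 1])."""
    sq = sum(a for a, _ in pairs)
    sl = sum(b for _, b in pairs)
    if sq <= 0 or sl <= 0:
        return 0.0
    jsd = 0.0
    for a, b in pairs:
        p, r = a / sq, b / sl          # intensities normalized to distributions
        m = 0.5 * (p + r)
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if r > 0:
            jsd += 0.5 * r * math.log2(r / m)
    return 1.0 - jsd


def score(query: List[Peak], library: List[Peak],
          method: Callable[[Pairs], float],
          tol: float = 0.02, min_matched: int = 6) -> float:
    """Score an SSM, returning 0.0 when too few peaks match."""
    pairs, matched = align_peaks(query, library, tol)
    if matched < min_matched:
        return 0.0
    return method(pairs)
```

For example, adding one unmatched "noise" peak to a query lowers its plain cosine against the clean library spectrum but leaves the projected cosine at 1.0, and raising `min_matched` above the number of matched peaks forces a score of 0.0. This illustrates why the minimum-matching-peaks setting can change how scoring functions rank.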
Dataset History
| Revision | Datetime | Status | ChangeLog Entry | 
|---|---|---|---|
| 0 | 2024-09-24 19:52:29 | ID requested | |
| 1 | 2024-10-02 08:56:33 | announced | |
Publication List 
| no publication | 
Keyword List 
| submitter keyword: Spectral library search, Benchmark dataset, MassIVE-KB, Predicted mass spectra, Noise resistance | 
Contact List 
| Name | Role | Affiliation | Email |
|---|---|---|---|
| Nuno Bandeira | lab head | UCSD | bandeira@ucsd.edu |
| Hao Xu | dataset submitter | UCSD | hax019 |
Full Dataset Link List 
| Link | Location |
|---|---|
| MassIVE dataset URI | |
| Dataset FTP location | ftp://massive.ucsd.edu/v08/MSV000095946/ (most web browsers no longer support FTP natively; use an FTP client such as FileZilla) |



