PXD020407
PXD020407 is an original dataset announced via ProteomeXchange.
Dataset Summary
Title | Retention Time Prediction Using Neural Networks Increases Identifications in Crosslinking Mass Spectrometry |
Description | Abstract: Crosslinking mass spectrometry (Crosslinking MS) has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information contained in spectra particularly limits the identification of heteromeric protein-protein interactions (PPIs) from the many theoretically possible PPIs. Here, we successfully leveraged chromatographic retention time (RT) to complement the mass spectrometry-centric identification process. For this, we first made crosslinked peptides amenable to RT prediction, through a Siamese neural network, and then added RT information to the identification process. Our multi-task machine learning model xiRT achieved highly accurate predictions in a multi-dimensional separation experiment of crosslinked E. coli lysate conducted for this study. We combined strong cation exchange (SCX), hydrophilic strong anion exchange (hSAX) and reversed-phase (RP) chromatography and reached an R^2 of 0.94 in RP and a margin of error of 1 fraction in 94% of cases for hSAX and 85% of cases for SCX. Importantly, supplementing the search engine score with retention time features led to a 1.4-fold increase in PPIs at 1% PPI false discovery rate (FDR). We also demonstrated the value of this approach for the more routine analysis of multiprotein complexes. In the Fanconi anaemia monoubiquitin ligase complex, an increase of 1.7-fold in heteromeric residue-pairs was achieved at 1% residue-pair FDR, solely using reversed-phase RT. Retention times therefore proved to be a powerful complement to mass spectrometric information to improve the identification of crosslinked peptides. We envision xiRT to supplement search engines in their scoring routines to increase the sensitivity of Crosslinking MS analyses especially for protein-protein interactions.
Conclusion: Using a Siamese network architecture, we succeeded in bringing RT prediction into the Crosslinking MS field, independent of separation setup and search software. Our open source application xiRT introduces the concept of multi-task learning to achieve multi-dimensional chromatographic retention time prediction, and may use any peptide sequence-dependent measure including for example collision cross section or isoelectric point. The black-box character of the neural network was reduced by means of interpretable machine learning that revealed individual amino acid contributions towards the separation behavior. The RT predictions – even when using only the RP dimension – complement mass spectrometric information to enhance the identification of heteromeric crosslinks in multiprotein complex and proteome-wide studies. Overfitting does not account for this gain as known false target matches from an entrapment database did not increase. Leveraging additional information sources may help to address the mass-spectrometric identification challenge of heteromeric crosslinks. |
HostingRepository | jPOST |
AnnounceDate | 2021-04-19 |
AnnouncementXML | Submission_2022-09-18_03:36:05.619.xml |
DigitalObjectIdentifier | https://dx.doi.org/10.6019/PXD020407 |
ReviewLevel | Non peer-reviewed dataset |
DatasetOrigin | Original dataset |
RepositorySupport | Supported dataset by repository |
PrimarySubmitter | Sven Giese |
SpeciesList | scientific name: Escherichia coli; NCBI TaxID: 562; |
ModificationList | S-carboxamidomethyl-L-cysteine; L-methionine sulfoxide |
Instrument | Q Exactive |
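The Description above reports two kinds of prediction quality: an R^2 value for the continuous RP retention time and the share of predictions within 1 fraction for the fractionated hSAX and SCX dimensions. The following is a minimal sketch of how such metrics could be computed, not the xiRT implementation. It assumes R^2 means the usual coefficient of determination, and the function names are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination between observed and predicted values.

    Assumption: this is the standard 1 - SS_res / SS_tot definition; the
    dataset record does not state which R^2 variant was used.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def share_within_tolerance(observed, predicted, tolerance=1):
    """Share of predictions at most `tolerance` fractions from the observed fraction."""
    hits = sum(abs(o - p) <= tolerance for o, p in zip(observed, predicted))
    return hits / len(observed)
```

With made-up fraction numbers, `share_within_tolerance([1, 2, 3, 4], [1, 3, 5, 4])` counts three of four predictions as within 1 fraction.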
Dataset History
Revision | Datetime | Status | ChangeLog Entry |
---|---|---|---|
0 | 2020-07-16 19:46:54 | ID requested | |
1 | 2021-04-19 12:00:46 | announced | |
2 | 2022-09-18 03:36:05 | announced | 2022-09-18: Updated FTP location. |
Publication List
Dataset with its publication pending |
Keyword List
submitter keyword: proteomics, machine learning, retention time prediction, PPI |
Contact List
Juri Rappsilber | |
---|---|
lab head | |
Sven Giese | |
contact affiliation | TU Berlin |
dataset submitter |
Full Dataset Link List
jPOST dataset URI |
Dataset FTP location: ftp://ftp.jpostdb.org/JPST000916/ |
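The FTP address above can also be used from a script instead of a graphical FTP client. The following is a minimal sketch using Python's standard `ftplib`. It assumes the server accepts anonymous logins, which this record does not confirm, and the helper names `split_ftp_url` and `list_dataset_files` are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP location as given in this dataset record.
DATASET_URL = "ftp://ftp.jpostdb.org/JPST000916/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp":
        raise ValueError(f"not an FTP URL: {url}")
    return parsed.hostname, parsed.path or "/"


def list_dataset_files(url=DATASET_URL, timeout=30):
    """Return the entry names in the dataset directory.

    Assumption: the server permits anonymous login.
    """
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no arguments means anonymous login
        ftp.cwd(path)  # change into the dataset directory
        return ftp.nlst()
```

Calling `list_dataset_files()` needs network access. Individual files could then be fetched with `FTP.retrbinary`.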