
PXD020407-1

PXD020407 is an original dataset announced via ProteomeXchange.

Dataset Summary
Title: Retention Time Prediction Using Neural Networks Increases Identifications in Crosslinking Mass Spectrometry
Description: Abstract: Crosslinking mass spectrometry (Crosslinking MS) has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information contained in spectra particularly limits the identification of heteromeric protein-protein interactions (PPIs) among the many theoretically possible PPIs. Here, we leveraged chromatographic retention time (RT) to complement the mass spectrometry-centric identification process. To do so, we first made crosslinked peptides amenable to RT prediction through a Siamese neural network and then added RT information to the identification process. Our multi-task machine learning model xiRT achieved highly accurate predictions in a multi-dimensional separation experiment of crosslinked E. coli lysate conducted for this study. We combined strong cation exchange (SCX), hydrophilic strong anion exchange (hSAX) and reversed-phase (RP) chromatography and reached an R^2 of 0.94 in RP, with predictions falling within a margin of error of 1 fraction in 94% of cases for hSAX and 85% of cases for SCX. Importantly, supplementing the search engine score with retention time features led to a 1.4-fold increase in PPIs at 1% PPI false discovery rate (FDR). We also demonstrated the value of this approach for the more routine analysis of multiprotein complexes. In the Fanconi anaemia monoubiquitin ligase complex, a 1.7-fold increase in heteromeric residue-pairs was achieved at 1% residue-pair FDR, using reversed-phase RT alone. Retention times therefore proved to be a powerful complement to mass spectrometric information for improving the identification of crosslinked peptides. We envision xiRT supplementing search engines in their scoring routines to increase the sensitivity of Crosslinking MS analyses, especially for protein-protein interactions.
Conclusion: Using a Siamese network architecture, we succeeded in bringing RT prediction into the Crosslinking MS field, independent of separation setup and search software. Our open source application xiRT introduces the concept of multi-task learning to achieve multi-dimensional chromatographic retention time prediction, and can use any peptide sequence-dependent measure, for example collision cross section or isoelectric point. The black-box character of the neural network was reduced by means of interpretable machine learning, which revealed individual amino acid contributions to the separation behavior. The RT predictions, even when using only the RP dimension, complement mass spectrometric information to enhance the identification of heteromeric crosslinks in multiprotein-complex and proteome-wide studies. Overfitting does not account for this gain, as known false target matches from an entrapment database did not increase. Leveraging additional information sources may help to address the mass-spectrometric identification challenge of heteromeric crosslinks.
HostingRepository: jPOST
AnnounceDate: 2021-04-19
AnnouncementXML: Submission_2021-04-19_12:00:46.172.xml
DigitalObjectIdentifier: https://dx.doi.org/10.6019/PXD020407
ReviewLevel: Non peer-reviewed dataset
DatasetOrigin: Original dataset
RepositorySupport: Supported dataset by repository
PrimarySubmitter: Sven Giese
SpeciesList: scientific name: Escherichia coli; NCBI TaxID: 562
ModificationList: S-carboxamidomethyl-L-cysteine; L-methionine sulfoxide
Instrument: Q Exactive
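The description above quotes two kinds of accuracy figures: an R^2 for the continuous RP retention times, and the share of hSAX and SCX predictions that land within one fraction of the observed fraction. The following is a minimal sketch of how such figures can be computed. It assumes the standard coefficient of determination (the record does not say which R^2 definition was used), and the function names are hypothetical and not part of xiRT.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def within_one_fraction(observed: Sequence[int], predicted: Sequence[int]) -> float:
    """Share of predictions at most one fraction away from the observed fraction."""
    hits = sum(abs(o - p) <= 1 for o, p in zip(observed, predicted))
    return hits / len(observed)
```

Under this reading, "a margin of error of 1 fraction in 94% of cases" corresponds to `within_one_fraction(...)` returning 0.94 for the hSAX predictions.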
Dataset History
Revision | Datetime | Status | ChangeLog Entry
0 | 2020-07-16 19:46:54 | ID requested |
1 | 2021-04-19 12:00:46 | announced |
2 | 2022-09-18 03:36:05 | announced | 2022-09-18: Updated FTP location.
Publication List
Dataset with its publication pending
Keyword List
submitter keyword: proteomics, machine learning, retention time prediction, PPI
Contact List
Juri Rappsilber
lab head
Sven Giese
contact affiliation: TU Berlin
dataset submitter
Full Dataset Link List
jPOST dataset URI
Dataset FTP location
NOTE: Most web browsers have discontinued native support for FTP access within the browser window. You can usually install a separate FTP application (we recommend FileZilla) and configure your browser to launch it when you click on this FTP link. Alternatively, open an application that supports FTP (such as FileZilla) and use this address: ftp://ftp.biosciencedbc.jp/archive/jpostrepos/JPST000916
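As a scripted alternative to a graphical FTP client, the address above can also be read with Python's standard library. This is a minimal sketch, not an official access method: it assumes the server accepts anonymous logins and that the path is a directory, and the helper names `split_ftp_url` and `list_dataset_files` are hypothetical.

```python
from ftplib import FTP
from urllib.parse import urlparse

# FTP address quoted in the note above.
DATASET_URL = "ftp://ftp.biosciencedbc.jp/archive/jpostrepos/JPST000916"


def split_ftp_url(url: str) -> tuple[str, str]:
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    if parsed.scheme != "ftp" or not parsed.hostname:
        raise ValueError(f"not an FTP URL: {url!r}")
    return parsed.hostname, parsed.path or "/"


def list_dataset_files(url: str = DATASET_URL, timeout: float = 30.0) -> list[str]:
    """Return the names in the dataset directory (requires network access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()    # no credentials: anonymous login (assumed to be allowed)
        ftp.cwd(path)  # change into the dataset directory
        return ftp.nlst()
```

Calling `list_dataset_files()` with no arguments would attempt to list the directory for this dataset. Individual files could then be fetched with `FTP.retrbinary`.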