The classification of samples on a molecular level has many applications, ranging from cancer biology, where the goal is to classify patients according to effective treatments, to phylogenetics, where the goal is to identify evolutionary relationships between species. In such scenarios, modern molecular methods are based on the alignment of DNA or amino acid sequences, often restricted to selected parts of the genome, although genome-wide sequence comparisons are also performed. Recently, proteomics-based approaches have become popular. An established method for the identification of peptides and proteins is liquid chromatography-tandem mass spectrometry (LC-MS/MS). For samples with known genome-wide sequence information, this technique identifies protein sequences from tandem mass spectra by means of database searches, after which sequence-based methods can be applied. Alternatively, de novo peptide sequencing algorithms annotate MS/MS spectra and deduce peptide and protein information without the need for a database. A newer approach, which requires no additional information, is to compare unidentified tandem mass spectra directly. The challenge then is to compute the distance between pairs of MS/MS runs, each consisting of thousands of spectra.