In this study, we faced the challenge of deciphering a protein that was designed and expressed in E. coli such that its amino acid sequence encodes two concatenated English sentences. The sequence carries unknown modifications and cannot be found online. The letters ‘O’ and ‘U’ are both replaced by ‘K’ in the protein. To solve the challenge, we developed a workflow consisting of shotgun proteomics, de novo sequencing, and a bioinformatic tool that searches the identified sequences for words. Using this workflow, we assembled the first complete English sentence and validated it by searching against a customized sequence database.
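
The word-search step can be illustrated with a minimal sketch. It is not the tool used in this study; it only assumes that each dictionary word is first mapped onto the protein alphabet (‘O’ and ‘U’ become ‘K’) and then matched as a substring against de novo peptide sequences. The function names `encode_word` and `find_words`, the peptides, and the vocabulary are all hypothetical.

```python
def encode_word(word: str) -> str:
    """Map an English word onto the protein alphabet: O and U become K."""
    return word.upper().replace("O", "K").replace("U", "K")


def find_words(peptides, vocabulary):
    """Return {word: [peptides that contain the word's encoded form]}."""
    hits = {}
    for word in vocabulary:
        encoded = encode_word(word)
        matched = [p for p in peptides if encoded in p]
        if matched:
            hits[word] = matched
    return hits


# Hypothetical de novo peptide sequences and a toy vocabulary
# (illustrative only, not data from the study).
peptides = ["THEQKICK", "CKLDWATER"]
vocabulary = ["the", "quick", "cold", "water", "salt"]

hits = find_words(peptides, vocabulary)
for word, matched in hits.items():
    print(word, "->", matched)
```

Encoding the dictionary forward, rather than decoding the peptides, sidesteps the ambiguity that a ‘K’ in the protein may stand for ‘K’, ‘O’, or ‘U’.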