Monday, February 22, 2010

Short notations for α-amino acids and peptides based on one-letter code (1LC)

A three-letter code (3LC) is typically used to encode molecular structures that contain sequences of the proteinogenic amino acids. Using one-letter codes (1LCs) instead shortens such notations by about 75%, partly because the hyphens between residues are omitted as well. Both the 3LCs and the 1LCs of the proteinogenic amino acids can be looked up by their names in several languages (English, French, German, Italian, Portuguese and Spanish) and in the context of thermodynamic property links.
The synthetic pentapeptide pentigetide, whose 3LC-based notation Asp-Ser-Asp-Pro-Arg was demonstrated previously (see Short notations for α-amino acids and peptides based on three-letter code (3LC)), shrinks to DSDPR in the 1LC system.
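Converting a 3LC notation into a 1LC notation is a simple table lookup. The following minimal Python sketch (the function names `to_one_letter` and `shortening` are hypothetical) assumes hyphen-separated 3LC input and covers only the 22 proteinogenic amino acids, without derivatives or modified residues:

```python
# Lookup table from three-letter codes (3LC) to one-letter codes (1LC) for the
# 22 proteinogenic amino acids (20 standard ones plus selenocysteine and pyrrolysine).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Sec": "U", "Pyl": "O",
}


def to_one_letter(notation_3lc: str) -> str:
    """Hypothetical helper: convert a hyphenated 3LC sequence ('Asp-Ser') to 1LC ('DS')."""
    return "".join(THREE_TO_ONE[residue] for residue in notation_3lc.split("-"))


def shortening(notation_3lc: str) -> float:
    """Hypothetical helper: fraction of characters saved by switching from 3LC to 1LC."""
    return 1 - len(to_one_letter(notation_3lc)) / len(notation_3lc)


pentigetide = "Asp-Ser-Asp-Pro-Arg"
print(to_one_letter(pentigetide))        # DSDPR
print(f"{shortening(pentigetide):.0%}")  # 74%
```

For an unmodified sequence of n residues, the hyphenated 3LC notation has 4n - 1 characters and the 1LC notation has n, so the saving is (3n - 1)/(4n - 1): 74% for pentigetide, approaching 75% for longer peptides.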
1LC-based notations are very efficient for sequence search and similarity-based modeling of peptides and their derivatives in large libraries.
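To illustrate why 1LC strings suit library searches, the sketch below treats each peptide as an ordinary string: motif search becomes substring matching, and similarity is scored with a Jaccard index over overlapping dipeptides (k-mers with k = 2). The toy library, the function names and the choice of k-mer Jaccard similarity are assumptions made for illustration only; practical work often relies on alignment scores with substitution matrices instead.

```python
def kmers(seq: str, k: int = 2) -> set[str]:
    """Return the set of overlapping k-letter substrings of a 1LC sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 2) -> float:
    """Jaccard similarity (0.0 to 1.0) of the k-mer sets of two 1LC sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 1.0  # two sequences shorter than k are treated as identical
    return len(ka & kb) / len(ka | kb)


# Hypothetical toy library of 1LC peptide notations (illustration only).
library = ["DSDPR", "GDSDPRK", "AAGKL", "DSEPR"]

# Exact motif search: plain substring matching on the 1LC strings.
hits = [seq for seq in library if "SDP" in seq]
print(hits)  # ['DSDPR', 'GDSDPRK']

# Rank the library entries by dipeptide similarity to pentigetide (DSDPR).
query = "DSDPR"
for seq in sorted(library, key=lambda s: jaccard(query, s), reverse=True):
    print(seq, round(jaccard(query, seq), 2))
```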

Note: An encoding system for derivatives of amino acid sequences requires a syntax that distinguishes amino acid 1LCs from one-letter chemical element symbols. Ambiguities arise for the following letters:
  • C: cysteine (carbon)
  • F: phenylalanine (fluorine)
  • H: histidine (hydrogen)
  • I: isoleucine (iodine)
  • K: lysine (potassium)
  • N: asparagine (nitrogen)
  • O: pyrrolysine (oxygen)
  • P: proline (phosphorus)
  • S: serine (sulfur)
  • U: selenocysteine (uranium)
  • V: valine (vanadium)
  • W: tryptophan (tungsten)
  • Y: tyrosine (yttrium)
Finally, the 1LCs for aspartic acid (D) and threonine (T) may conflict with the symbols for the hydrogen isotopes deuterium and tritium, respectively.
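The clashing letters can be found mechanically by intersecting the 1LC alphabet with the set of single-letter element symbols. This is a minimal sketch: the helper `ambiguous_letters` is hypothetical, and it assumes that the tritium symbol T should be flagged alongside deuterium (D). Note that the intersection also contains W (tryptophan vs. tungsten).

```python
# One-letter codes of the 22 proteinogenic amino acids.
AMINO_ACID_1LC = set("ACDEFGHIKLMNOPQRSTUVWY")

# All chemical element symbols that consist of a single letter.
ONE_LETTER_ELEMENTS = set("BCFHIKNOPSUVWY")

# Symbols of the hydrogen isotopes deuterium (D) and tritium (T); including T is an assumption.
HYDROGEN_ISOTOPES = {"D", "T"}

# Letters that can be read either as an amino acid or as an element/isotope symbol.
CLASHES = AMINO_ACID_1LC & (ONE_LETTER_ELEMENTS | HYDROGEN_ISOTOPES)


def ambiguous_letters(seq_1lc: str) -> list[tuple[int, str]]:
    """Hypothetical helper: return (position, letter) pairs of a 1LC sequence
    that could also be read as an element or hydrogen-isotope symbol."""
    return [(i, ch) for i, ch in enumerate(seq_1lc) if ch in CLASHES]


print(sorted(AMINO_ACID_1LC & ONE_LETTER_ELEMENTS))
# ['C', 'F', 'H', 'I', 'K', 'N', 'O', 'P', 'S', 'U', 'V', 'W', 'Y']
print(ambiguous_letters("DSDPR"))
# [(0, 'D'), (1, 'S'), (2, 'D'), (3, 'P')]
```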


