Nomenclature and Symbolism for Amino Acids and Peptides

3AA-1 and 3AA-2

Contents of 3AA-1 and 3AA-2.

3AA-1 Names of Common α-Amino Acids

3AA-2 Formation of Semisystematic Names for Amino Acids and Derivatives

References for 3AA-1 and 3AA-2

Continued in 3AA-3 to 3AA-5


Part 1. Nomenclature

Part 1, Section A: AMINO-ACID NOMENCLATURE

3AA-1. NAMES OF COMMON α-AMINO ACIDS

The trivial names of the α-amino acids that are commonly found in proteins and are represented in the genetic code, together with their symbols, systematic names [14] and formulas, are given in Table 1. Some other common amino acids are listed in the Appendix.

When the phrase 'amino acid' is a qualified noun it contains no hyphen; a hyphen is inserted when it becomes an adjective so as to join its components in qualifying another noun, e.g. amino-acid sequence.


Table 1. α-Amino acids incorporated into protein under mRNA direction.

The systematic names and formulas given refer to hypothetical forms in which amino groups are unprotonated and carboxyl groups are undissociated. This convention is useful to avoid various nomenclatural problems but should not be taken to imply that these structures represent an appreciable fraction of the amino-acid molecules.

Trivial nameᵃ | Symbolsᵇ | Systematic nameᶜ | Formula
Alanine | Ala, A | 2-Aminopropanoic acid | CH3-CH(NH2)-COOH
Arginine | Arg, R | 2-Amino-5-guanidinopentanoic acid | H2N-C(=NH)-NH-[CH2]3-CH(NH2)-COOH
Asparagine | Asnᵈ, Nᵈ | 2-Amino-3-carbamoylpropanoic acid | H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid | Aspᵈ, Dᵈ | 2-Aminobutanedioic acid | HOOC-CH2-CH(NH2)-COOH
Cysteine | Cys, C | 2-Amino-3-mercaptopropanoic acid | HS-CH2-CH(NH2)-COOH
Glutamine | Glnᵈ, Qᵈ | 2-Amino-4-carbamoylbutanoic acid | H2N-CO-[CH2]2-CH(NH2)-COOH
Glutamic acid | Gluᵈ, Eᵈ | 2-Aminopentanedioic acid | HOOC-[CH2]2-CH(NH2)-COOH
Glycine | Gly, G | Aminoethanoic acid | CH2(NH2)-COOH
Histidine | His, H | 2-Amino-3-(1H-imidazol-4-yl)propanoic acid | (structural formula not reproduced)
Isoleucine | Ile, I | 2-Amino-3-methylpentanoic acidᵉ | C2H5-CH(CH3)-CH(NH2)-COOH
Leucine | Leu, L | 2-Amino-4-methylpentanoic acid | (CH3)2CH-CH2-CH(NH2)-COOH
Lysine | Lys, K | 2,6-Diaminohexanoic acid | H2N-[CH2]4-CH(NH2)-COOH
Methionine | Met, M | 2-Amino-4-(methylthio)butanoic acid | CH3-S-[CH2]2-CH(NH2)-COOH
Phenylalanine | Phe, F | 2-Amino-3-phenylpropanoic acid | C6H5-CH2-CH(NH2)-COOH
Proline | Pro, P | Pyrrolidine-2-carboxylic acid | (structural formula not reproduced)
Serine | Ser, S | 2-Amino-3-hydroxypropanoic acid | HO-CH2-CH(NH2)-COOH
Threonine | Thr, T | 2-Amino-3-hydroxybutanoic acidᵉ | CH3-CH(OH)-CH(NH2)-COOH
Tryptophan | Trp, W | 2-Amino-3-(1H-indol-3-yl)propanoic acid | (structural formula not reproduced)
Tyrosine | Tyr, Y | 2-Amino-3-(4-hydroxyphenyl)propanoic acid | (structural formula not reproduced)
Valine | Val, V | 2-Amino-3-methylbutanoic acid | (CH3)2CH-CH(NH2)-COOH
Unspecified amino acid | Xaa, Xᶠ | |

a The trivial name refers to the L or D or DL-amino acid; for those that are chiral only the L-amino acid is used for protein biosynthesis.

b Use of the one-letter symbols should be restricted to the comparison of long sequences (3AA-20).

c The fully systematic forms ethanoic, propanoic, butanoic and pentanoic may alternatively be called acetic, propionic, butyric and valeric, respectively. Similarly, butanedioic = succinic, 3-carbamoylpropanoic = succinamic, pentanedioic = glutaric, and 4-carbamoylbutanoic = glutaramic.

d The symbol Asx denotes Asp or Asn; likewise B denotes N or D. Glx and Z likewise represent glutamic acid or glutamine or a substance, such as 4-carboxyglutamic acid, Gla (3AA-15.2.6), or 5-oxoproline, Glp (3AA-16.5), that yields glutamic acid on acid hydrolysis of peptides.

e See 3AA-3 and -4 for stereochemical designation.

f See Addendum for alternative use of X.

g See Newsletter 1999 for use of U.
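
The symbol columns of Table 1, together with the ambiguity symbols of footnote d, lend themselves to a simple lookup. The following is a minimal sketch only; the names THREE_TO_ONE and to_one_letter are illustrative assumptions and are not part of the recommendations.

```python
# Three-letter -> one-letter symbols, transcribed from Table 1.
# Asx/B and Glx/Z come from footnote d; Xaa/X is the unspecified amino acid.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}


def to_one_letter(residues):
    """Convert a sequence of three-letter symbols to a one-letter string.

    Raises KeyError for a symbol that is not listed in Table 1.
    """
    return "".join(THREE_TO_ONE[symbol] for symbol in residues)


if __name__ == "__main__":
    print(to_one_letter(["Gly", "Ala", "Asx", "Xaa"]))
```

As footnote b advises, the one-letter form is best reserved for the comparison of long sequences.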

3AA-2. FORMATION OF SEMISYSTEMATIC NAMES FOR AMINO ACIDS AND DERIVATIVES

3AA-2.1. Principles of Forming Names

Semisystematic names of substituted α-amino acids are formed according to the general principles of organic nomenclature [14], by attaching the name of the substituent group to the trivial name of the amino acid. The position of the substitution is indicated by locants (see 3AA-2.2). The configuration, if known, should be indicated (see 3AA-3, 3AA-4).

New trivial names should not be coined for newly discovered α-amino acids unless there are compelling reasons. When they are needed (e.g. because the substance is important and its semisystematic name is cumbersome), the name should be constructed according to the general principles for naming natural products [15], including either some element of its chemical structure or reference to its biological origin. It is important to use no elements in the trivial name that imply an incorrect structure; when a new trivial name is used, it is essential that it be defined by a correctly constructed systematic or semisystematic name. A number of existing trivial names are given in the Appendix, and an extensive list has been published previously [6].

3AA-2.2. Designation of Locants

Note. The atom numbering given below is the normal chemical system for designating locants. A somewhat different system has been recommended for describing polypeptide conformations [16], in which Greek letters are used irrespective of the nature of the atom (unless it is hydrogen), so that in lysine N-6 becomes Nζ, and in phenylalanine C-1, C-2 and C-6 become Cγ, Cδ1 and Cδ2 respectively.

2.2.1. Acyclic Amino Acids

In acyclic amino acids, the carbon atom of the carboxyl group next to the carbon atom carrying the amino group is numbered 1. Alternatively, Greek letters may be used, with C-2 being designated α. This practice is not encouraged for locants, although terms like 'α-amino acids' and 'α-carbon atom' are retained. Example:

A heteroatom has the same number as the carbon atom to which it is attached, e.g. N-2 is on C-2. When such numerals are used as locants they may be written as N6- or as 6-N, e.g. N6-acetyllysine.
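
The relation between numerical and Greek-letter locants for the carbon chain, and the two ways of writing a heteroatom locant, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the Greek-letter series is taken to continue in alphabetical order from α at C-2, and the superscript numeral of N6 is written on the line as plain text. It covers the chemical numbering only, not the conformational system of [16].

```python
# Greek letters in chain order; per the text, alpha corresponds to C-2.
# Continuing the series alphabetically (beta = C-3, ...) is an assumption.
GREEK_LETTERS = "αβγδεζ"


def greek_for_carbon(number: int) -> str:
    """Return the Greek-letter designation of carbon atom C-<number>."""
    index = number - 2  # C-2 is alpha, C-3 is beta, and so on
    if not 0 <= index < len(GREEK_LETTERS):
        raise ValueError(f"no Greek letter defined here for C-{number}")
    return GREEK_LETTERS[index]


def heteroatom_locant(element: str, carbon_number: int,
                      leading_numeral: bool = False) -> str:
    """Format a heteroatom locant as 'N6' or, alternatively, as '6-N'.

    The heteroatom takes the number of the carbon atom it is attached to.
    """
    if leading_numeral:
        return f"{carbon_number}-{element}"
    return f"{element}{carbon_number}"


def substituted_name(substituent: str, amino_acid: str,
                     element: str, carbon_number: int) -> str:
    """Build a name such as 'N6-acetyllysine' (hypothetical helper)."""
    locant = heteroatom_locant(element, carbon_number)
    return f"{locant}-{substituent}{amino_acid}"


if __name__ == "__main__":
    print(greek_for_carbon(2))
    print(substituted_name("acetyl", "lysine", "N", 6))
```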

The carbon atoms of the methyl groups of valine are numbered 4 and 4'; likewise those of leucine are 5 and 5'. Isoleucine is numbered as follows:

The word 'methyl' can be italicized for use as a locant for substitution on (or isotopic modification {Section H in [14]} of) the methyl group of methionine, e.g. [methyl-14C]methionine. The nitrogen atoms of arginine are designated as shown for the arginine (1+)cation:

It should be noted that the ω and ω' atoms of this cation are equivalent because of resonance. The carbon atom in the guanidino group may be called guanidino-C (it may be needed as a locant for isotopic replacement although it cannot carry a substituent).

2.2.2. Proline

The carbon atoms in proline are numbered as in pyrrolidine, the nitrogen atom being numbered 1, and proceeding towards the carboxyl group.

2.2.3. Aromatic Rings

The carbon atoms in the aromatic rings of phenylalanine, tyrosine and tryptophan are numbered as in systematic nomenclature, with 1 (or 3 for tryptophan) designating the carbon atom bearing the aliphatic chain. The carbon atoms of this chain are designated α (for the carbon atom attached to the amino and carboxyl groups) and β (for the atom attached to the ring system).

Note. This numbering should also be used for decarboxylated products (e.g. tryptamine).

2.2.4. Histidine

The nitrogen atoms of the imidazole ring of histidine are denoted by pros ('near', abbreviated π) and tele ('far', abbreviated τ) to show their position relative to the side chain. This recommendation [6,10] arose from the fact that two different systems of numbering the atoms in the imidazole ring of histidine had both been used for a considerable time (biochemists generally numbering as 1 the nitrogen atom adjacent to the side chain, and organic chemists designating it as 3). The carbon atom between the two ring nitrogen atoms is numbered 2 (as in imidazole), and the carbon atom next to the τ nitrogen is numbered 5. The carbon atoms of the aliphatic chain are designated α and β as in 2.2.1 and 2.2.3 above. This numbering should also be used for the decarboxylation product histamine and for substituted histidine.
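
The correspondence between the pros/tele designations and the two older numbering systems can be written out as a small lookup. This is a sketch only: the table and function names are hypothetical, and the numbers for the tele nitrogen are inferred from the fact that the two ring nitrogen atoms of imidazole occupy positions 1 and 3.

```python
# Ring nitrogen atoms of histidine under the two older numbering systems.
# "biochemical": the nitrogen adjacent to the side chain was numbered 1.
# "organic": the same nitrogen was designated 3.
# The tele entries are inferred (the other ring nitrogen takes the other number).
HISTIDINE_RING_NITROGENS = {
    "pros": {"symbol": "π", "biochemical": 1, "organic": 3},
    "tele": {"symbol": "τ", "biochemical": 3, "organic": 1},
}


def pros_or_tele(number: int, convention: str) -> str:
    """Translate an old ring-nitrogen number into 'pros' or 'tele'."""
    if convention not in ("biochemical", "organic"):
        raise ValueError(f"unknown convention: {convention!r}")
    for label, info in HISTIDINE_RING_NITROGENS.items():
        if info[convention] == number:
            return label
    raise ValueError(f"N-{number} is not a ring nitrogen of imidazole")


if __name__ == "__main__":
    # The same numeral maps to different atoms under the two conventions,
    # which is the ambiguity that pros and tele remove.
    print(pros_or_tele(1, "biochemical"), pros_or_tele(1, "organic"))
```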

2.2.5. Definition of Side Chain

When amino acids are combined in proteins and peptides, C-1, C-2 and N-2 of each residue (the numbering being that of aliphatic amino acids) form the repeating unit of the main chain ('backbone') and the remainder forms a 'side chain'. Hence the words 'side chain' refer to C-3 and higher numbered carbon atoms and their substituents.
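
As an illustration of this definition, a locant can be assigned to the main chain or the side chain from its number alone. The sketch below assumes locants written in the form 'C-3' or 'N-6' and uses a hypothetical function name; treating every atom numbered 1 or 2 as main chain is a simplifying assumption, since the text names only C-1, C-2 and N-2 explicitly.

```python
import re

# Locants of the form "<element>-<number>", e.g. "C-3" or "N-6" (assumed format).
LOCANT_PATTERN = re.compile(r"([A-Z][a-z]?)-(\d+)")


def chain_part(locant: str) -> str:
    """Return 'main chain' or 'side chain' for an acyclic-numbering locant.

    Atoms numbered 3 or higher (C-3 onwards and their substituents, which
    share the number of the carbon they are attached to) count as side chain.
    """
    match = LOCANT_PATTERN.fullmatch(locant)
    if match is None:
        raise ValueError(f"unrecognised locant: {locant!r}")
    number = int(match.group(2))
    return "side chain" if number >= 3 else "main chain"


if __name__ == "__main__":
    for locant in ("C-1", "N-2", "C-3", "N-6"):
        print(locant, chain_part(locant))
```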

3AA-2.3. Use of the Prefix 'homo'

An α-amino acid that is otherwise similar to one of the common ones (Table 1), but that contains one more methylene group in the carbon chain, may be named by prefixing 'homo' to the name of that common amino acid. 'Homo' in the sense of a higher homologue (F-4.5 of [15]) is commonly used for homoserine (2-amino-4-hydroxybutanoic acid) and homocysteine (2-amino-4-mercaptobutanoic acid).

3AA-2.4. Use of the Prefix 'nor'

The prefix 'nor' denotes removal of a methylene group (Sections F-4.2 and F-4.4 of [15]), but this is not the sense in which it has been used in the names 'norvaline' and 'norleucine'. Such names, although widely used, may therefore be misinterpreted, so we cannot recommend them, especially since the systematic names for the compounds intended, 2-aminopentanoic acid and 2-aminohexanoic acid, are short.
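
The guidance of 3AA-2.3 and 3AA-2.4 can be summarised as a lookup from the four names discussed to their systematic equivalents, with a flag recording whether the prefixed name is accepted. The table and function names below are illustrative assumptions; only the four entries given in the text are included.

```python
# name -> (systematic name, whether the prefixed name is accepted in the text)
PREFIXED_NAMES = {
    "homoserine": ("2-amino-4-hydroxybutanoic acid", True),
    "homocysteine": ("2-amino-4-mercaptobutanoic acid", True),
    "norvaline": ("2-aminopentanoic acid", False),   # 'nor' name not recommended
    "norleucine": ("2-aminohexanoic acid", False),   # 'nor' name not recommended
}


def preferred_name(name: str) -> str:
    """Return the name to use: the 'homo' name itself, or the systematic
    name in place of a 'nor' name that is not recommended."""
    systematic, accepted = PREFIXED_NAMES[name.lower()]
    return name.lower() if accepted else systematic


if __name__ == "__main__":
    print(preferred_name("norleucine"))
    print(preferred_name("homoserine"))
```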

References

6. IUPAC Commission on the Nomenclature of Organic Chemistry (CNOC) and IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Nomenclature of α-Amino Acids, Recommendations 1974, Biochem. J. 149, 1-16 (1975); Biochemistry, 14, 449-462 (1975); Eur. J. Biochem. 53, 1-14 (1975); also pp. 64-77 in [7].

7. International Union of Biochemistry (1978) Biochemical Nomenclature and Related Documents, The Biochemical Society, London.

10. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Symbols for Amino-Acid Derivatives and Peptides, Recommendations 1971, Arch. Biochem. Biophys. 150, 1-8 (1972); Biochem. J. 126, 773-780 (1972), corrected 135, 9 (1973); Biochemistry 11, 1726-1732 (1972); Biochim. Biophys. Acta, 263, 205-212 (1972); Eur. J. Biochem. 27, 201-207 (1972), corrected 45, 2 (1974); J. Biol. Chem. 247, 977-983 (1972); Pure Appl. Chem. 40, 315-331 (1974); also pp. 78-84 in [7].

14. International Union of Pure and Applied Chemistry (1979) Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, Pergamon Press, Oxford.

15. IUPAC Commission on the Nomenclature of Organic Chemistry (CNOC). Nomenclature of Organic Chemistry, Section F: Natural Products and Related Compounds, Recommendations 1976, Eur. J. Biochem. 86, 1-8 (1978); also pp. 19-26 in [7] and pp. 491-511 in [14]. [See also Biochemical Nomenclature and Related Documents, 2nd edition, Portland Press, 1992, pages 19-26.]

16. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Abbreviations and Symbols for the Description of the Conformation of Polypeptide Chains, 1969, Arch. Biochem. Biophys. 145, 405-421 (1971); Biochem. J. 121, 577-585 (1971); Biochemistry, 9, 3471-3479 (1970); Biochim. Biophys. Acta, 229, 1-17 (1971); Eur. J. Biochem. 17, 193-201 (1970); J. Biol. Chem. 245, 6489-6497 (1970); Mol. Biol. 7, 289-303 (1973) (in Russian); Pure Appl. Chem. 40, 291-308 (1974); also pp. 94-102 in [7].

