Continued from 3AA-17
Contents of 3AA-18 and 3AA-19
3AA-18 Symbols for Substituents
3AA-19 Peptide Symbolism
Continued in 3AA-20 and 3AA-21
Groups substituted for hydrogen or hydroxyl may be indicated by their formulas or by symbols or by a combination of both, e.g.
Trifluoroacetylglycine CF3CO-Gly (Table 3, Note ii)
Note: the symbol Bz is often used for benzoyl in organic chemistry, and Bzl for benzyl, but because these symbols are so similar, the alternatives PhCO and PhCH2 are preferable.
Suggestions for symbols to designate substituent (or protecting) groups common in peptide and protein chemistry are given in Tables 2, 3 and 4.
Table 2. Nitrogen substituents (protecting groups) of the urethane type
Group | Symbol |
---|---|
Benzyloxycarbonyl- | Z- or Cbz- |
2-(p-Biphenylyl)isopropyloxycarbonyl- [strictly 1-(biphenyl-4-yl)-1-methylethoxycarbonyl-] | Bpoc- |
p-Bromobenzyloxycarbonyl- | Z(Br)- |
t-Butoxycarbonyl- | Boc- or ButOCO- or t-BuOCO- or Me3C-OCO- |
α,α-Dimethyl-3,5-dimethoxybenzyloxycarbonyl- | Ddz- |
Fluoren-9-ylmethoxycarbonyl- | Fmoc- |
p-Methoxybenzyloxycarbonyl- | Z(OMe)- |
p-Nitrobenzyloxycarbonyl- | Z(NO2)- |
p-Phenylazobenzyloxycarbonyl- | Pz- |
Table 3. Non-urethane substituents for nitrogen, oxygen or sulfur
Group | Symbol |
---|---|
Acetamidomethyl- | Acm- |
Acetyl- | Ac- |
Benzoyl- (C6H5-CO-) | PhCO- (or Bz-; see note in 3AA-18.1) |
Benzyl- (C6H5-CH2-) | PhCH2- (or Bzl-; see note in 3AA-18.1) |
Carbamoyl- | NH2CO- (preferred to Cbm-) |
(3-Carboxy-4-nitrophenyl)thio- | Nbs- (see 3AA-18.2) |
3-Carboxypropanoyl- (HOOC-CH2-CH2-CO-) | Suc- (see Note i) |
Dansyl-, 5-(dimethylamino)naphth-1-ylsulfonyl- | Dns- |
2,4-Dinitrophenyl- | Dnp- or N2ph- (see Note ii) |
Formyl- | HCO- or For- (see Note iii) |
4-Iodophenylsulfonyl- (pipsyl-) | Ips- |
Maleoyl- (-OC-CH=CH-CO-) | -Mal- or Mal< (C-404.1 of [14]) |
Maleyl- (HOOC-CH=CH-CO-) | Mal- |
2-Nitrophenylthio- | NpS- (Nps- often used) |
Phenyl(thiocarbamoyl)- | PhNHCS- or Ptc- |
Phthaloyl- | -Pht- or Pht< |
Phthalyl- (o-carboxybenzoyl-) | Pht- |
Succinyl- (-OC-CH2-CH2-CO-) | -Suc- or Suc< (see Note i) |
Tosyl- | Tos- |
Trifluoroacetyl- | CF3CO- |
Trityl- (triphenylmethyl-) | Ph3C- or Trt- |
Notes
(i) In organic nomenclature (C-404.1 of [14]), 'succinyl' signifies the bivalent group formed from succinic acid by removal of both hydroxyl groups, but in biochemical usage it usually signifies the 3-carboxypropanoyl group, e.g. succinyl-CoA.
(ii) The use of D for 'di' and T for 'tri' and 'tetra' is discouraged if these apply to atoms or groups for which simple symbols exist, e.g. in CF3CO-, Me3Si and H4folate. We feel less strongly when their avoidance involves giving unusual meanings to symbols, e.g. N for nitro, so Dnp and N2ph are offered as alternative symbols for dinitrophenyl. See also Note ii of 3AA-15.2.5.
(iii) The symbol HCO- is preferred to CHO- for the formyl group, because CHO- has sometimes been used to indicate the attachment of carbohydrate.
Table 4. Substituents at the carboxyl group
Group | Symbol | Name of glycine derivative (see note) |
---|---|---|
Benzotriazol-1-yloxy | -OBt | 1-(Glycyloxy)benzotriazole |
Benzyloxy | -OCH2Ph (or -OBzl, see note in 3AA-18.1) | Glycine benzyl ester |
tert-Butoxy | -OCMe3 or -OBut | Glycine t-butyl ester |
Diphenylmethoxy | -OCHPh2 or -OBzh | Glycine diphenylmethyl ester (or benzhydryl ester) |
Ethoxy | -OEt | Glycine ethyl ester |
Methoxy | -OMe | Glycine methyl ester |
4-Nitrobenzyloxy | -ONb | Glycine 4-nitrobenzyl ester |
4-Nitrophenoxy | -ONp | Glycine 4-nitrophenyl ester |
4-Nitrophenylthio | -SNp | Thioglycine S-(4-nitrophenyl ester) |
Pentachlorophenoxy | -OPcp | Glycine pentachlorophenyl ester |
Phenylthio | -SPh | Thioglycine S-(phenyl ester) |
Quinolin-8-yloxy | -OQu | Glycine quinolin-8-yl ester |
Succinimido-oxy | -ONSu or -OSu | N-(Glycyloxy)succinimide |
2,4,5-Trichlorophenyloxy | -OTcp | Glycine 2,4,5-trichlorophenyl ester |
Note. Carboxyl substituents will not normally appear as prefixes in the names of derivatives of amino acids or peptides, so the prefix name of the group, given in column 1, is little used in naming compounds. Column 3 is therefore given to show how derivatives containing the group are named (by one of the alternative methods of 3AA-9.1).
See Addendum for substituents on a terminal amide group.
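As a rough illustration of how the substituent symbols of Tables 2 to 4 combine with residue symbols, the following Python sketch assembles a one-line formula. The function name `derivative_symbol` and its parameters are hypothetical and are not part of the recommendations. It assumes that N-substituent symbols are written with their trailing hyphen and carboxyl substituent symbols with their leading hyphen, as they appear in the tables.

```python
def derivative_symbol(residues, n_substituent="", c_substituent=""):
    """Join residue symbols with hyphens and attach terminal substituent symbols.

    n_substituent carries its trailing hyphen (e.g. "Z-"),
    c_substituent carries its leading hyphen (e.g. "-OMe").
    """
    if not residues:
        raise ValueError("at least one residue symbol is required")
    if n_substituent and not n_substituent.endswith("-"):
        raise ValueError("N-substituent symbols end with a hyphen, e.g. 'Z-'")
    if c_substituent and not c_substituent.startswith("-"):
        raise ValueError("carboxyl substituent symbols start with a hyphen, e.g. '-OMe'")
    return n_substituent + "-".join(residues) + c_substituent


print(derivative_symbol(["Gly"], n_substituent="CF3CO-"))  # CF3CO-Gly
print(derivative_symbol(["Gly"], c_substituent="-OMe"))    # Gly-OMe
print(derivative_symbol(["Ala", "Gly"], "Z-", "-OEt"))     # Z-Ala-Gly-OEt
```

The last call is only an illustrative combination of table entries; the hyphen checks reflect the convention that the hyphen is part of the symbol and marks the point of attachment.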
3AA-18.2. Principles of Symbolizing Substituent Groups and Reagents
Many reagents used in peptide and protein chemistry for modifying (often protecting) amino, carboxyl and side-chain groups in amino-acid residues have been designated by a variety of acronymic abbreviations, too numerous to list here. Extensive and indiscriminate use of such abbreviations is discouraged, especially when the accepted trivial name of the reagent is short, e.g. tosyl chloride, trityl chloride, etc.
It can be useful to symbolize a reagent in such a way that the group transferred retains its identity in a reaction, e.g.
For this reason Dns-Cl is usually preferred to DNS for dansyl chloride (although the full name is short enough for most textual use), and Dnp-F to the original FDNB for 1-fluoro-2,4-dinitrobenzene, and similarly Nbs2 in place of DTNB for 3,3'-dithiobis(6-nitrobenzoic acid) (Ellman's reagent) and (PriO)2PO-F or Dip-F for diisopropyl fluorophosphate.
Symbols constructed from known elements are more readily understood than arbitrary abbreviations, e.g. Tos-Arg-OMe rather than TAME for tosylarginine methyl ester, and Tos-Phe-CH2Cl rather than TPCK for 'tosylphenylalanine chloromethyl ketone'. The latter name is used for tosylphenylalanylchloromethane (3AA-10.2) but is misleading because it specifies the carbonyl group twice.
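The substitutions recommended in this section can be collected in a small lookup table. The Python sketch below is only an illustration: the names `PREFERRED_SYMBOL` and `preferred_symbol` are hypothetical, and the table lists only the acronyms mentioned in the text above.

```python
# Acronyms named in 3AA-18.2 and the structure-based symbols preferred to them.
PREFERRED_SYMBOL = {
    "DNS": "Dns-Cl",          # dansyl chloride
    "FDNB": "Dnp-F",          # 1-fluoro-2,4-dinitrobenzene
    "DTNB": "Nbs2",           # Ellman's reagent
    "TAME": "Tos-Arg-OMe",    # tosylarginine methyl ester
    "TPCK": "Tos-Phe-CH2Cl",  # tosylphenylalanylchloromethane
}


def preferred_symbol(abbreviation):
    """Return the preferred symbol, or the input unchanged if it is not listed."""
    return PREFERRED_SYMBOL.get(abbreviation, abbreviation)


print(preferred_symbol("TPCK"))  # Tos-Phe-CH2Cl
print(preferred_symbol("Gly"))   # Gly (not an acronym in the table)
```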
The amino-acid symbols were developed for representing peptide sequences (3AA-16). Peptides containing bonds other than between C-1 and N-2 of adjacent residues are also easily represented (3AA-16 to 3AA-18). Examples:
Glycylglycine | Gly-Gly |
N-α-Glutamylglycine | Glu-Gly |
N-γ-Glutamylglycine | |
Thyroliberin | Glp-His-Pro-NH2 |
Angiotensin II | Asp-Arg-Val-Tyr-Ile-His-Pro-Phe |
Glutathione |
N2-α-Glutamyllysine | Glu-Lys |
N6-α-Glutamyllysine | |
N2-γ-Glutamyllysine | |
N6-γ-Glutamyllysine |
Symbols for modified residues or names of compounds may be used in such formulas. Thus a peptide with a C-terminal aldehyde may be shown using either a name or a symbol constructed according to 3AA-16.3. Example:
(If the second method is used, the symbol should be explained to avoid confusion.)
If part of a sequence is unknown, but its composition can be specified, this may be indicated by parentheses, with commas between the residues listed as present, e.g. Ala-Lys-(Ala,Gly3,Val2)-Glu-Val.
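The parenthesized composition can be generated mechanically. The following Python sketch is one possible way to do so; the function names `composition_block` and `partial_sequence` are hypothetical, and the order of residues inside the parentheses (first appearance in the input) is an assumption, since the text does not prescribe one.

```python
from collections import Counter


def composition_block(residues):
    """Format residues of unknown order, e.g. '(Ala,Gly3,Val2)'.

    A count above one is appended to the symbol; a single residue
    is written without a number.
    """
    counts = Counter(residues)  # keeps first-appearance order (Python 3.7+)
    parts = [sym if n == 1 else f"{sym}{n}" for sym, n in counts.items()]
    return "(" + ",".join(parts) + ")"


def partial_sequence(known_start, unknown, known_end):
    """Join known residues and one block of known composition with hyphens."""
    segments = list(known_start) + [composition_block(unknown)] + list(known_end)
    return "-".join(segments)


print(partial_sequence(
    ["Ala", "Lys"],
    ["Ala", "Gly", "Gly", "Gly", "Val", "Val"],
    ["Glu", "Val"],
))  # Ala-Lys-(Ala,Gly3,Val2)-Glu-Val
```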
If a peptide must be written on more than one line, we advise placing a hyphen at the end of each line to be continued (where it has its usual meaning of a continuation symbol), and also at the start of the next line (where it represents the peptide bond), e.g.
In diagrams the two lines can usually be joined, but such a break may also be needed in textual material where this is not possible.
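The line-break convention for textual material can be sketched in Python as follows. The function name `wrap_peptide` and the `per_line` parameter are hypothetical; the sketch simply breaks after a fixed number of residues and uses the angiotensin II sequence given earlier as input.

```python
def wrap_peptide(residues, per_line):
    """Split a peptide over several lines.

    Each continued line ends with a hyphen (continuation symbol) and
    each following line starts with one (the peptide bond).
    """
    if per_line < 1:
        raise ValueError("per_line must be at least 1")
    chunks = [residues[i:i + per_line] for i in range(0, len(residues), per_line)]
    lines = []
    for index, chunk in enumerate(chunks):
        line = "-".join(chunk)
        if index > 0:
            line = "-" + line   # peptide bond to the previous line
        if index < len(chunks) - 1:
            line = line + "-"   # continuation mark
        lines.append(line)
    return lines


angiotensin_ii = ["Asp", "Arg", "Val", "Tyr", "Ile", "His", "Pro", "Phe"]
for line in wrap_peptide(angiotensin_ii, 4):
    print(line)
# Asp-Arg-Val-Tyr-
# -Ile-His-Pro-Phe
```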
3AA-19.2. Use of Configurational Prefixes
Residue symbols written in a sequence denote the L configuration for chiral amino acids, unless otherwise indicated (3AA-14.5). A D residue is shown by inserting a D before the symbol, separated from it by a hyphen (which may be omitted to make the number of residues appear more clearly).
The symbol DL signifies a racemic mixture, so should not occur in the designation of peptides with more than one chiral residue; coupling of a DL-amino acid with a chiral peptide leads to a mixture of diastereoisomeric products whose ratio may depend on the conditions of the reaction and will not in general be unity. To indicate that both are present, ambo may be used (3AA-13.2), and thus the mixture of products formed by acylating L-leucine with DL-alanine may be represented as ambo-Ala-Leu, and a mixture of Phe-Ala-Leu and Phe-D-Ala-Leu may be represented as Phe-ambo-Ala-Leu.
A residue of unknown configuration may be indicated by the prefix ξ (Greek xi), e.g. ξ-Ala.
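The prefix rules of this section can be summarized in a short Python sketch. The function names and the `config` and `compact` parameters are hypothetical; `compact=True` stands for the option of omitting the hyphen after D, and L residues are written without a prefix, as stated above.

```python
def residue_symbol(symbol, config="L", compact=False):
    """Return a residue symbol with its configurational prefix.

    config is 'L' (no prefix), 'D', 'ambo' (both configurations present)
    or 'xi' (unknown configuration).
    """
    if config == "L":
        return symbol
    if config == "D":
        return "D" + symbol if compact else "D-" + symbol
    if config == "ambo":
        return "ambo-" + symbol
    if config == "xi":
        return "ξ-" + symbol
    raise ValueError(f"unknown configuration: {config!r}")


def peptide(residues, compact=False):
    """residues is a list of (symbol, config) pairs."""
    return "-".join(residue_symbol(s, c, compact) for s, c in residues)


print(peptide([("Phe", "L"), ("Ala", "D"), ("Leu", "L")]))                # Phe-D-Ala-Leu
print(peptide([("Phe", "L"), ("Ala", "D"), ("Leu", "L")], compact=True))  # Phe-DAla-Leu
print(peptide([("Phe", "L"), ("Ala", "ambo"), ("Leu", "L")]))             # Phe-ambo-Ala-Leu
```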
3AA-19.3. Representation of Charges on Peptides
It is usually convenient to use the same abbreviated formula for a peptide regardless of its state of ionization. To indicate or stress the charges on a peptide, plus and minus signs may be placed over residues with charged side chains and on either side of the formula to represent charged termini, e.g.
If, however, it is desired to indicate charge by formal modification of the symbols for residues, this may be done as follows.
(i) Protonation of the N-terminus. The sign +H is placed beside the symbol for the N-terminal residue without a hyphen between (since a hyphen would signify removal of H). This gives, for example, +HGly-. We prefer this to the alternative recommendation [10] of adding +H2-, to give, for example, +H2-Gly-, because it seems artificial to remove one hydrogen before adding two, and because the hyphen here fails to represent a single bond.
(ii) Deprotonation of the C-terminus. The symbol -O- (a hyphen followed by O bearing a negative charge) is placed on the right of the C-terminal residue. Its hyphen signifies removal of -OH from the carboxyl group, so this is replaced by -O-.
(iii) Protonation of Side-Chain Basic Groups. 'H+' is placed above the amino-acid symbol in the two-line representation, or after it, e.g. LysH+, in the one-line system. No lines or parentheses are used, since they would imply removal of H. In earlier recommendations [10] 'H2+' was added with a vertical line or parentheses, but again (cf. i) the line represented no single bond.
(iv) Deprotonation of Side-Chain Acidic Groups. The symbols Asp and Glu may have O- placed at the end of a vertical line above or below them, or in parentheses after them (cf. ii), since O- replaces the OH removed. Other acidic residues, e.g. Cys, have the charge alone at the end of the vertical line or in parentheses, since the group removed here is H.
Hence the two ionic forms shown above for a peptide could be drawn as
3AA-19.4. Peptides Substituted at N-2 (see 3AA-16.2 and 3AA-17.1)
Glycylnitrosoglycine | |
Glycylsarcosine (see Appendix) | |
Glycyl-N-acetylglycine | |
N,N-diglycylglycine |
3AA-19.5.1. Homodetic Cyclic Peptides
Cyclic peptides in which the ring consists solely of amino-acid residues in eupeptide linkage may be called homodetic cyclic peptides. Three representations are possible:
(i) The sequence is formulated in the usual manner but placed in parentheses and preceded by 'cyclo'. Example: gramicidin S
cyclo(-Val-Orn-Leu-D-Phe-Pro-Val-Orn-Leu-D-Phe-Pro-)
or (see 3AA-19.2, sentence 2)
cyclo(-Val-Orn-Leu-DPhe-Pro-Val-Orn-Leu-DPhe-Pro-)
(ii) The sequence is again written in one line, but the residues at each end of the line are joined by a lengthened bond, e.g.
or (3AA-19.2, sentence 2)
(iii) The residues are written on two lines, so that the sequence is reversed on one of them. Hence the CO to NH direction within the peptide bond must be indicated by arrows (3AA-16.2 and 3AA-16.3). Hence gramicidin S may be written (using the option of 3AA-19.2, sentence 2):
3AA-19.5.2. Heterodetic Cyclic Peptides
Heterodetic cyclic peptides are peptides consisting only of amino-acid residues, but the linkages forming the ring are not solely eupeptide bonds; one or more is an isopeptide, disulfide, ester, or other bond.
Their symbolic representation follows logically from that of substituted amino acids (3AA-16.4). Examples:
Cyclic ester of threonylglycylglycylglycine or (3AA-17.6)
Depsipeptides are oligomers formed from amino acids and other bifunctional acids, usually hydroxy acids. They are often cyclic. In symbolic representation, any special symbols used for the hydroxy acids should be defined.
Analogues of peptides in which the -CO-NH- group that joins residues is replaced by another grouping may be indicated [25] by placing a Greek psi, followed by the replacing group in parentheses, between the residue symbols where the change occurs. Examples:
3AA-19.8. Alignment of Peptide and Nucleic-Acid Sequences
Although hyphens between residues are important in representing peptide sequences (3AA-16), they may be omitted (I) if it is necessary to align sequences with those of nucleic acids; this is an alternative to separating triplets (II):
(I)
   MetSerIleGlnHis
AGTATGAGTATTCAACAT
TCATACTCATAAGTTGTA

(II)
    Met-Ser-Ile-Gln-His
AGT ATG AGT ATT CAA CAT
TCA TAC TCA TAA GTT GTA
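Both alignment styles can be produced with a few lines of Python. The sketch below is only illustrative: the function names and the `start` parameter (the index of the first translated nucleotide) are hypothetical, and it assumes three-letter residue symbols and a strand whose triplets are counted from its first nucleotide.

```python
def align_compact(residues, strand, start=0):
    """Style (I): hyphens omitted so each three-letter symbol sits over its triplet."""
    return [" " * start + "".join(residues), strand]


def align_spaced(residues, strand, start=0):
    """Style (II): hyphens kept and triplets separated by spaces."""
    if start % 3:
        raise ValueError("start must be a multiple of 3")
    triplets = [strand[i:i + 3] for i in range(0, len(strand), 3)]
    offset = (start // 3) * 4  # each triplet plus its space occupies four columns
    return [" " * offset + "-".join(residues), " ".join(triplets)]


residues = ["Met", "Ser", "Ile", "Gln", "His"]
strand = "AGTATGAGTATTCAACAT"
for line in align_compact(residues, strand, start=3):
    print(line)
for line in align_spaced(residues, strand, start=3):
    print(line)
```

Style (II) works because a residue symbol plus its hyphen and a triplet plus its space both occupy four columns.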
7. International Union of Biochemistry (1978) Biochemical Nomenclature and Related Documents, The Biochemical Society, London.
10. IUPAC-IUB Commission on Biochemical Nomenclature (CBN), Symbols for Amino-Acid Derivatives and Peptides, Recommendations 1971, Arch. Biochem. Biophys. 150, 1-8 (1972); Biochem. J. 126, 773-780 (1972), corrected 135, 9 (1973); Biochemistry 11, 1726-1732 (1972); Biochim. Biophys. Acta, 263, 205-212 (1972); Eur. J. Biochem. 27, 201-207 (1972), corrected 45, 2 (1974); J. Biol. Chem. 247, 977-983 (1972); Pure Appl. Chem. 40, 315-331 (1974); also pp. 78-84 in [7].
14. International Union of Pure and Applied Chemistry (1979) Nomenclature of Organic Chemistry, Sections A, B, C, D, E, F and H, Pergamon Press, Oxford.
25. Morley, J. S. (1981) Neuropeptides, 1, 231-235.