The emergence of the genetic code. The concept of a gene, genetic code


The genetic code, expressed in codons, is a system for encoding information about the structure of proteins that is common to all living organisms on the planet. Deciphering it took a decade, although science had suspected its existence for almost a century. The universality, specificity, unidirectionality and especially the degeneracy of the genetic code are of great biological significance.

History of discoveries

The problem of coding has always been central to biology. Science moved rather slowly towards understanding the template (matrix) structure of the genetic code. The discovery of the double-helical structure of DNA by J. Watson and F. Crick in 1953 opened the stage of unraveling the structure of the code itself and inspired admiration for the ingenuity of nature. The linear structure of proteins and the equally linear structure of DNA implied the existence of a genetic code as a correspondence between two texts written in different alphabets. The alphabet of proteins was already known, so the signs of DNA became the subject of study by biologists, physicists and mathematicians.

There is no point in describing every step in solving this riddle. A direct experiment confirming a clear and consistent correspondence between DNA codons and the amino acids of a protein was carried out in 1964 by C. Yanofsky and S. Brenner. Then came the period of deciphering the genetic code in vitro (in a test tube), using protein synthesis in cell-free systems.

The fully deciphered code of E. coli was made public in 1966 at a symposium of biologists in Cold Spring Harbor (USA). It was then that the redundancy (degeneracy) of the genetic code was discovered. What this means is explained quite simply.

Decoding continues

Obtaining data on deciphering the hereditary code was one of the most significant events of the last century. Today, science continues to study in depth the mechanisms of molecular encoding, its systemic features and the excess of signs that expresses the degeneracy of the genetic code. A separate branch of study is the emergence and evolution of the system for coding hereditary material. Evidence of the connection between polynucleotides (DNA) and polypeptides (proteins) gave impetus to the development of molecular biology, which in turn stimulated biotechnology, bioengineering and discoveries in breeding and plant growing.

Dogmas and rules

The central dogma of molecular biology states that information is transferred from DNA to messenger RNA, and then from messenger RNA to protein. Transfer is also possible in the reverse direction, from RNA to DNA, and from RNA to another RNA.

But DNA always remains the template (matrix), or basis. All other fundamental features of information transfer reflect this template nature: transfer takes place through the synthesis of new molecules on a template, and these molecules in turn become the structure that reproduces the hereditary information.

Genetic code

The linear structure of protein molecules is encoded by codons (triplets) of nucleotides, of which there are only four types: adenine, guanine, cytosine and thymine (uracil in RNA). When one nucleic acid chain is synthesized on another, the equal number and chemical complementarity of the nucleotides is the main condition for the synthesis. When a protein molecule is formed, however, there is no such match between the number and kind of monomers (the monomers of DNA are nucleotides, those of proteins are amino acids). The natural hereditary code is therefore a system for recording the sequence of amino acids in a protein as a sequence of nucleotides (codons).

The genetic code has several properties:

  • Triplet nature.
  • Unambiguity.
  • Directionality.
  • Non-overlapping.
  • Redundancy (degeneracy) of the genetic code.
  • Universality.

Let us describe each briefly, focusing on its biological significance.

Triplet nature, continuity and the presence of stop signals

Each of the 61 sense triplets (codons) of nucleotides corresponds to one amino acid. Three triplets carry no amino acid information and are stop codons. Each nucleotide in the chain is part of a triplet and does not exist on its own. The chain of nucleotides responsible for one protein begins with a start codon and ends with a stop codon; these start and stop translation (the synthesis of a protein molecule).

Specificity, non-overlap and unidirectionality

Each codon (triplet) codes for only one amino acid. Each triplet is independent of its neighbor and does not overlap with it: one nucleotide can belong to only one triplet in the chain. Protein synthesis always proceeds in only one direction, from the start codon to the stop codon.

Redundancy of the genetic code

Each triplet of nucleotides codes for one amino acid. There are 64 triplets in total, of which 61 encode amino acids (sense codons) and three are nonsense, that is, they do not encode an amino acid (stop codons). The redundancy (degeneracy) of the genetic code shows up in the substitutions that can be made in each triplet: radical ones (the amino acid is replaced by one of a different class) and conservative ones (the class of the amino acid does not change). The count is easy: each of the three positions in a triplet can be replaced by 4 - 1 = 3 other nucleotides, which gives 3 x 3 = 9 single-nucleotide substitutions per triplet, so the 61 sense triplets allow 61 x 9 = 549 substitutions in total.

The degeneracy of the genetic code is manifested in the fact that 549 variants are far more than are needed to encode information about 20 amino acids. Of these 549 substitutions, 23 produce stop codons, 134 leave the amino acid unchanged, 230 are conservative and 162 are radical.
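The counts above can be checked with a short script. This is a minimal sketch, assuming the standard codon table written in the usual compact form (bases in the order U, C, A, G, with `*` marking a stop codon); the names `CODE` and `classify_substitutions` are illustrative. It separates substitutions into nonsense, synonymous and amino-acid-changing ones, but does not split the last group into conservative and radical, because that requires a classification of amino acids that the text does not spell out.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitutions(code):
    """Count single-nucleotide substitutions in the sense codons by their effect."""
    counts = {"total": 0, "nonsense": 0, "synonymous": 0, "missense": 0}
    for codon, aa in code.items():
        if aa == "*":
            continue  # only the 61 sense codons are mutated
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts["total"] += 1
                if code[mutant] == "*":
                    counts["nonsense"] += 1    # sense codon becomes a stop codon
                elif code[mutant] == aa:
                    counts["synonymous"] += 1  # amino acid unchanged
                else:
                    counts["missense"] += 1    # amino acid replaced
    return counts


COUNTS = classify_substitutions(CODE)
print(COUNTS)
```

The 392 amino-acid-changing substitutions reported by the script correspond to the 230 conservative plus 162 radical substitutions quoted in the text.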

The degeneracy rule and its exceptions

If two codons share the same first two nucleotides and their third nucleotides belong to the same class (both purines or both pyrimidines), then they carry information about the same amino acid. This is the rule of degeneracy, or redundancy, of the genetic code. The two exceptions are AUA and UGA: the first encodes isoleucine, although by the rule it should encode methionine like AUG, and the second is a stop codon, although by the rule it should encode tryptophan like UGG.
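The rule can be tested mechanically against the standard codon table. The sketch below is illustrative only: it assumes the standard code in compact UCAG order (`*` is a stop codon), and the function name `rule_exceptions` is hypothetical. For every pair of first two nucleotides it compares the two codons ending in a pyrimidine (U, C) and the two ending in a purine (A, G), and collects the pairs whose meanings differ.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rule_exceptions(code):
    """Return codon pairs that break the degeneracy rule."""
    exceptions = []
    for first, second in product(BASES, repeat=2):
        for third_class in ("UC", "AG"):  # pyrimidines, then purines
            a, b = (first + second + third for third in third_class)
            if code[a] != code[b]:
                exceptions.append((a, b))
    return exceptions


EXCEPTIONS = rule_exceptions(CODE)
print(EXCEPTIONS)
```

In the standard code only the pairs UGA/UGG (stop versus tryptophan) and AUA/AUG (isoleucine versus methionine) fail the comparison.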

The meaning of degeneracy and universality

It is these two properties of the genetic code that have the greatest biological significance. All the properties listed above are characteristic of the hereditary information of all forms of living organisms on our planet.

The degeneracy of the genetic code has adaptive significance: the code for one amino acid is duplicated several times. It also means that the third nucleotide in a codon carries less weight (is degenerate). This minimizes mutational damage to DNA that would otherwise lead to gross disturbances in the structure of the protein. It is a defense mechanism of living organisms on the planet.

The genetic code is a unified system for recording hereditary information in nucleic acid molecules as a sequence of nucleotides. It is based on an alphabet of only four letters, the nucleotides, which are distinguished by their nitrogenous bases: A, T, G, C.

The main properties of the genetic code are as follows:

1. The genetic code is triplet. A triplet (codon) is a sequence of three nucleotides encoding one amino acid. Since proteins contain 20 amino acids, it is obvious that each of them cannot be encoded by a single nucleotide (DNA has only four types of nucleotide, so 16 amino acids would remain unencoded). Two nucleotides are not enough either, since only 16 amino acids could be encoded that way. Hence the smallest number of nucleotides encoding one amino acid is three. (In this case, the number of possible nucleotide triplets is 4^3 = 64.)

2. Redundancy (degeneracy) of the code is a consequence of its triplet nature and means that one amino acid can be encoded by several triplets (since there are 20 amino acids and 64 triplets). The exceptions are methionine and tryptophan, which are encoded by only one triplet each. In addition, some triplets perform specific functions. In the mRNA molecule, three of them (UAA, UAG, UGA) are stop codons, i.e. signals that terminate the synthesis of the polypeptide chain. The triplet corresponding to methionine (AUG), when located at the beginning of the coding sequence, also performs the function of initiating reading.

3. Along with redundancy, the code is characterized by the property of unambiguity, which means that each codon corresponds to only one specific amino acid.

4. The code is collinear, i.e. the sequence of nucleotides in a gene exactly matches the sequence of amino acids in a protein.

5. The genetic code is non-overlapping and compact, that is, it does not contain “punctuation marks.” This means that the reading process does not allow codons (triplets) to overlap, and that, starting at a certain codon, reading proceeds continuously, triplet after triplet, until a stop signal (termination codon). For example, in mRNA the sequence of nitrogenous bases AUGGUGCUUAAUGUG will be read only as the triplets AUG, GUG, CUU, AAU, GUG, and not as AUG, UGG, GGU, GUG, etc., or AUG, GGU, UGC, CUU, etc., or in some other way (for example, codon AUG, punctuation mark G, codon UGC, punctuation mark U, etc.).

6. The genetic code is universal, that is, the nuclear genes of all organisms encode information about proteins in the same way, regardless of the level of organization and systematic position of these organisms.
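The difference between the non-overlapping reading described in property 5 and the rejected overlapping readings can be sketched in a few lines. This is an illustration, not a model of the ribosome: the helper `read_triplets` and its `step` parameter are hypothetical, and the sequence is the one spelled out by the triplets AUG, GUG, CUU, AAU, GUG.

```python
def read_triplets(mrna, step=3):
    """Cut mrna into three-letter words, moving `step` letters forward each time."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, step)]


SEQ = "AUGGUGCUUAAUGUG"

NON_OVERLAPPING = read_triplets(SEQ)         # the actual reading: step of 3
OVERLAP_BY_TWO = read_triplets(SEQ, step=1)  # neighbours share two nucleotides
OVERLAP_BY_ONE = read_triplets(SEQ, step=2)  # neighbours share one nucleotide

print(NON_OVERLAPPING)     # ['AUG', 'GUG', 'CUU', 'AAU', 'GUG']
print(OVERLAP_BY_TWO[:4])  # ['AUG', 'UGG', 'GGU', 'GUG']
print(OVERLAP_BY_ONE[:4])  # ['AUG', 'GGU', 'UGC', 'CUU']
```

Only the first reading, with a step of exactly three nucleotides, matches how the code is read; the other two reproduce the overlapping alternatives that the text rules out.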

Through transcription and translation in the cell, information is transferred from DNA to protein: DNA - mRNA - protein. The genetic information in DNA and mRNA is contained in the sequence of nucleotides in these molecules. How is information transferred from the “language” of nucleotides to the “language” of amino acids? This translation is carried out using the genetic code. A code, or cipher, is a system of symbols for translating one form of information into another. The genetic code is a system for recording information about the sequence of amino acids in proteins using the sequence of nucleotides in messenger RNA. How important the order of the same elements (the four nucleotides of RNA) is for understanding and preserving the meaning of information can be seen in a simple example: by rearranging the letters in the word code, we get a word with a different meaning, deco. What properties does the genetic code have?

1. The code is triplet. RNA consists of 4 nucleotides: A, G, C, U. If we tried to designate one amino acid with one nucleotide, then 16 out of 20 amino acids would remain unencoded. A two-letter code would encode 16 amino acids (four nucleotides can be used to create 16 different combinations of two nucleotides each). Nature has created a three-letter, or triplet, code. This means that each of the 20 amino acids is encoded by a sequence of three nucleotides, called a triplet or codon. From 4 nucleotides you can create 64 different combinations of 3 nucleotides each (4*4*4=64). This is more than enough to encode 20 amino acids, and it would seem that 44 codons are superfluous. However, this is not the case.

2. The code is degenerate. This means that each amino acid is encoded by more than one codon (from two to six). The exceptions are the amino acids methionine and tryptophan, each of which is encoded by only one triplet. (This can be seen in the genetic code table.) The fact that methionine is encoded by the single triplet AUG has a special meaning that will become clear to you later.

3. The code is unambiguous. Each codon codes for only one amino acid. In all healthy people, in the gene carrying information about the beta chain of hemoglobin, the triplet GAA or GAG in sixth position encodes glutamic acid. In patients with sickle cell anemia, the second nucleotide in this triplet is replaced by U. As can be seen from the table, the triplets GUA or GUG that are formed in this case encode the amino acid valine. You already know what such a replacement leads to from the section on DNA.

4. There are “punctuation marks” between genes. In printed text there is a period at the end of each sentence, and several related sentences make up a paragraph. In the language of genetic information, such a paragraph is an operon and the mRNA complementary to it. Each gene in the operon encodes one polypeptide chain, a sentence. Since in some cases several different polypeptide chains are synthesized in succession on one mRNA template, they must be separated from one another. For this purpose the genetic code has three special triplets, UAA, UAG and UGA, each of which signals the termination of the synthesis of one polypeptide chain. These triplets thus function as punctuation marks. They are found at the end of every gene.

5. There are no “punctuation marks” inside a gene. Since the genetic code resembles a language, let us examine this property using a phrase made up of three-letter words: the fat cat ate the big rat. The meaning is clear even without punctuation marks. If we remove one letter from the first word (one nucleotide in the gene) but still read in triplets of letters, the result is nonsense: hef atc ata tet heb igr at. The meaning is likewise destroyed when one or two nucleotides are lost from a gene. The protein read from such a damaged gene will have nothing in common with the protein encoded by the normal gene.

6. The code is universal. The genetic code is the same for all creatures living on Earth. In bacteria and fungi, wheat and cotton, fish and worms, frogs and humans, the same triplets encode the same amino acids.
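The effect of losing a single letter when there are no punctuation marks inside a gene can be reproduced with ordinary text. The sketch below is only an analogy: the English phrase made of three-letter words and the helper `read_in_threes` are hypothetical stand-ins for a gene and its triplet-by-triplet reading.

```python
def read_in_threes(text):
    """Drop the spaces and regroup the letters into consecutive triplets."""
    letters = text.replace(" ", "")
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


PHRASE = "THE FAT CAT ATE THE BIG RAT"

INTACT = read_in_threes(PHRASE)       # reading frame preserved
SHIFTED = read_in_threes(PHRASE[1:])  # first letter lost: every triplet shifts

print(INTACT)   # THE FAT CAT ATE THE BIG RAT
print(SHIFTED)  # HEF ATC ATA TET HEB IGR AT
```

Every word after the deletion is changed, which is why the loss of one or two nucleotides alters the whole protein downstream rather than a single amino acid.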

The genetic code is usually understood as a system of signs that records the order of nucleotides in DNA and RNA and corresponds to another sign system, the sequence of amino acids in a protein molecule.

It is important!

When scientists studied the properties of the genetic code, universality was recognized as one of the main ones. Strange as it may sound, all living things are united by one universal, common genetic code. It took shape over a long period of time, and the process ended about 3.5 billion years ago. Traces of its evolution, from its inception to the present day, can therefore be found in the structure of the code.

When we talk about the order of elements in the genetic code, we mean that it is far from chaotic and follows a strictly defined sequence. This also largely determines the properties of the genetic code. It is equivalent to the arrangement of letters and syllables in words: once we break the usual order, most of what we read on the pages of books or newspapers turns into ridiculous gobbledygook.

Basic properties of the genetic code

A code usually contains information encrypted in a special way. In order to decipher the code, you need to know its distinctive features.

So, the main properties of the genetic code are:

  • triplet nature;
  • degeneracy, or redundancy;
  • unambiguity;
  • continuity;
  • the universality already mentioned above.

Let's take a closer look at each property.

1. Triplet nature

Three consecutive nucleotides within a molecule (DNA or RNA) form a unit. Each such triplet encodes one amino acid and thereby its position in the peptide chain.

Codons (also called code words) differ in the order and in the type of the nitrogenous bases (nucleotides) that make them up.

In genetics, it is customary to distinguish 64 types of codon. They are the possible combinations of four types of nucleotide taken three at a time, which is equivalent to raising the number 4 to the third power. Thus 64 nucleotide combinations are possible.
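The arithmetic behind the figure of 64 can be made explicit. The following is a small sketch, assuming an RNA alphabet of four letters and 20 amino acids to be encoded (the figure used elsewhere in the text); the variable names are illustrative.

```python
from itertools import product

BASES = "ACGU"
AMINO_ACIDS_TO_ENCODE = 20

# Number of distinct code words of length 1, 2 and 3 over a 4-letter alphabet.
WORDS_BY_LENGTH = {n: len(BASES) ** n for n in (1, 2, 3)}

# Enumerate every triplet explicitly instead of only counting them.
ALL_TRIPLETS = ["".join(t) for t in product(BASES, repeat=3)]

# Shortest word length that offers at least 20 distinct code words.
SHORTEST = min(n for n in range(1, 10) if len(BASES) ** n >= AMINO_ACIDS_TO_ENCODE)

print(WORDS_BY_LENGTH)    # {1: 4, 2: 16, 3: 64}
print(len(ALL_TRIPLETS))  # 64
print(SHORTEST)           # 3
```

One-letter and two-letter words give only 4 and 16 combinations, fewer than 20, so three is the shortest word length that can cover all the amino acids.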

2. Redundancy of the genetic code

This property means that one amino acid is encoded by several codons, usually from two to six. Only methionine and tryptophan are each encoded by a single triplet.
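How many codons each amino acid receives can be counted directly from the codon table. This is a minimal sketch, assuming the standard code in its compact one-letter form (UCAG order, `*` for stop codons); `DEGENERACY` and `SINGLE_CODON` are illustrative names.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of sense codons per amino acid (stop codons are left out).
DEGENERACY = Counter(aa for aa in CODE.values() if aa != "*")

# Amino acids encoded by exactly one codon.
SINGLE_CODON = sorted(aa for aa, n in DEGENERACY.items() if n == 1)

print(dict(DEGENERACY))
print(SINGLE_CODON)  # ['M', 'W'] (methionine and tryptophan)
```

In the one-letter notation used here, L, S and R (leucine, serine, arginine) have six codons each, and I (isoleucine) is the only amino acid with three.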

3. Unambiguity

This property is also an indicator of healthy genetic inheritance. For example, the GAA triplet in sixth position of the hemoglobin gene tells doctors that the hemoglobin, and hence the blood, is normal: this triplet encodes one particular amino acid of hemoglobin and nothing else. In a person with sickle cell anemia, one nucleotide of this triplet is replaced by another letter of the code, U, which is a sign of the disease.
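The hemoglobin example can be traced through the codon table. The sketch below assumes the standard code in compact one-letter form and that the substitution affects the second nucleotide of the triplet; the helper `substitute` is hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
NAMES = {"E": "glutamic acid", "V": "valine"}


def substitute(codon, position, base):
    """Return codon with the nucleotide at `position` (0-based) replaced by `base`."""
    return codon[:position] + base + codon[position + 1:]


NORMAL = "GAA"
MUTANT = substitute(NORMAL, 1, "U")  # second nucleotide A -> U

print(NORMAL, NAMES[CODE[NORMAL]])  # GAA glutamic acid
print(MUTANT, NAMES[CODE[MUTANT]])  # GUA valine
```

Because each codon has exactly one meaning, the single-letter change unambiguously swaps glutamic acid for valine; the same holds for GAG, which becomes GUG.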

4. Continuity

When describing this property of the genetic code, remember that codons, like links in a chain, are not spaced apart but follow one another directly in the nucleic acid chain, and within a gene this chain is read without interruption.

5. Universality

We should never forget that everything living on Earth is united by a common genetic code. In primates and humans, in insects and birds, in a hundred-year-old baobab tree and in a blade of grass that has barely emerged from the ground, the same triplets encode the same amino acids.

It is in genes that the basic information about the properties of a particular organism is contained, a kind of program that the organism inherits from those who lived earlier and which exists as a genetic code.

Nucleotides line up in chains and thus form sequences of genetic letters.

Genetic code

The proteins of almost all living organisms are built from only 20 types of amino acids. These amino acids are called canonical. Each protein is a chain or several chains of amino acids connected in a strictly defined sequence. This sequence determines the structure of the protein, and therefore all its biological properties.

Fragment of the codon table (first nucleotide C, second nucleotide U): CUU, CUC, CUA and CUG all encode leucine (Leu/L).

In some proteins, non-standard amino acids, such as selenocysteine and pyrrolysine, are inserted by a ribosome reading a stop codon, depending on the sequences in the mRNA. Selenocysteine is now considered to be the 21st, and pyrrolysine the 22nd, amino acid making up proteins.

Despite these exceptions, the genetic codes of all living organisms share common features: a codon consists of three nucleotides, of which the first two are decisive; codons are translated by tRNA and ribosomes into a sequence of amino acids.

Deviations from the standard genetic code

Example | Codon | Normal meaning | Read as
Some yeasts of the genus Candida | CUG | Leucine | Serine
Mitochondria, in particular in Saccharomyces cerevisiae | CU(U, C, A, G) | Leucine | Serine
Mitochondria of higher plants | CGG | Arginine | Tryptophan
Mitochondria (in all studied organisms without exception) | UGA | Stop | Tryptophan
Mitochondria in mammals, Drosophila, S. cerevisiae and many protozoa | AUA | Isoleucine | Methionine = Start
Prokaryotes | GUG | Valine | Start
Eukaryotes (rare) | CUG | Leucine | Start
Eukaryotes (rare) | GUG | Valine | Start
Prokaryotes (rare) | UUG | Leucine | Start
Eukaryotes (rare) | ACG | Threonine | Start
Mammalian mitochondria | AGC, AGU | Serine | Stop
Drosophila mitochondria | AGA | Arginine | Stop
Mammalian mitochondria | AG(A, G) | Arginine | Stop
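A deviant code can be modelled as the standard table with a few entries overridden. The sketch below is illustrative: it takes only three rows of the table above that apply to mammalian mitochondria (UGA, AUA, AGA/AGG), the mRNA string is a made-up example, and `translate` is a hypothetical helper that reads triplets until the first stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Mammalian mitochondrial deviations taken from the table:
# UGA reads as tryptophan, AUA as methionine, AGA and AGG as stop.
MITOCHONDRIAL = {**STANDARD, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna, code):
    """Translate mrna triplet by triplet, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


MRNA = "AUGAUAUGAAGAGGG"  # made-up sequence: AUG AUA UGA AGA GGG

print(translate(MRNA, STANDARD))       # MI  (UGA terminates translation)
print(translate(MRNA, MITOCHONDRIAL))  # MMW (UGA is read, AGA terminates)
```

The same nucleotide sequence therefore yields different peptides depending on which table the translation machinery follows.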

History of ideas about the genetic code

In the early 1960s, new data revealed the inconsistency of the “code without commas” hypothesis. Experiments then showed that codons considered meaningless by Crick could provoke protein synthesis in vitro, and by 1965 the meaning of all 64 triplets had been established. It turned out that some codons are simply redundant, that is, a number of amino acids are encoded by two, four or even six triplets.

Wikimedia Foundation. 2010.