Central Dogma of Molecular Biology: DNA, RNA, and Protein Interactions | Lecture notes Biogenetics and Computers

Central dogma of molecular biology

The term “central dogma of molecular biology” is patterned after religious

terminology. However, it refers to a process that is subject to the changes in

understanding that are associated with any scientific research. The most simplified

form of the central dogma is that the flow of information is from DNA → RNA

→ Protein. This concept has been subject to alterations as our understanding of the

processes involved has changed.

The processes involved are transcription (DNA to RNA) and translation (RNA to protein). It was understood when the

dogma was proposed that some modifications to this pathway were necessary. The most obvious is

that DNA is used as the template for DNA replication.
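
The two steps named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names and the example sequence are hypothetical, and the sketch assumes the standard genetic code and a DNA coding strand read from its first base.

```python
# Standard genetic code, built compactly from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCTTGGTAA"  # hypothetical coding strand, chosen only for illustration
mrna = transcribe(dna)
print(mrna)             # AUGGCUUGGUAA
print(translate(mrna))  # MAW
```

Real transcription copies the template strand into its complement; replacing T with U on the coding strand gives the same mRNA sequence and keeps the sketch short.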

More recently, RNA viruses have been discovered; in many of these, DNA is never

involved in the life cycle. Others are retroviruses, in which RNA is used as a template

for DNA synthesis in a process called “reverse transcription”.

Other modifications to the simple scheme of information flow include the finding that proteins act as

gene transcriptional regulators, the discovery that some information is stored in

methylation patterns of the DNA, and the discovery of prions (proteins that can

transmit information to other proteins).

In addition, the existence of introns and exons means that the information stored in

the DNA is not always reflected in the mRNA and protein products. A gene is a

stretch of DNA containing both a template for RNA synthesis and sequences that

allow the control of RNA production from the template region. However, in many

cases, more than one protein can be produced from a DNA sequence, and the coding

sequence is not necessarily linearly contiguous within the DNA.
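
The point about non-contiguous coding sequences can be illustrated with a short sketch. Everything here is hypothetical (the sequence, the exon coordinates, and the `splice` helper); it only shows how joining different sets of exons from one primary transcript gives different mRNAs.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments (0-based, end-exclusive) of a primary transcript."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCU" + "guaaguuuag" + "UGGAAA" + "guaccccag" + "GAUUAA"
exon1, exon2, exon3 = (0, 6), (16, 22), (31, 37)

# Using all three exons gives one mRNA; skipping exon 2 gives a shorter one.
full_length = splice(pre_mrna, [exon1, exon2, exon3])
exon2_skipped = splice(pre_mrna, [exon1, exon3])

print(full_length)    # AUGGCUUGGAAAGAUUAA
print(exon2_skipped)  # AUGGCUGAUUAA
```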

More unusual exceptions to the central dogma include the process for expressing

the ApoB gene in humans. The human liver expresses the full length ApoB protein;

however, the human intestine edits a C to a U in the ApoB mRNA to create a stop

codon, and therefore synthesize a shorter protein product although the DNA is not

affected. Trypanosomes (the parasitic organism responsible for sleeping sickness)

insert additional U nucleotides into some of their mRNA to produce proteins that

are not directly coded by the DNA.
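
The effect of a C to U change in an mRNA can be sketched as follows. The sequence below is hypothetical and is not the ApoB message; the sketch only assumes that a CAA codon (glutamine) becomes the stop codon UAA when its first base is changed, which shortens the protein without touching the DNA.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_stop(mrna: str) -> int:
    """Count codons read from the start of the mRNA before the first stop codon."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def edit_c_to_u(mrna: str, position: int) -> str:
    """Return a copy of the mRNA with the C at `position` replaced by U."""
    if mrna[position] != "C":
        raise ValueError("expected a C at the edited position")
    return mrna[:position] + "U" + mrna[position + 1:]


unedited = "AUGGCUCAAGGUUGGUAA"     # hypothetical mRNA: AUG GCU CAA GGU UGG UAA
edited = edit_c_to_u(unedited, 6)   # CAA -> UAA at the third codon

print(codons_before_stop(unedited))  # 5
print(codons_before_stop(edited))    # 2
```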

How big is DNA?

The amount of DNA required to provide the genetic information for an organism

varies fairly dramatically depending on the organism. DNA molecules are usually

described in terms of the number of paired monomer units in the double stranded

helix, with these units called base-pairs (abbreviated bp). Because DNA molecules

tend to be large, larger units such as kb (kilobase pairs) or Mb (megabase

pairs) are also frequently used. More recently, the term Gb (gigabase pairs) has

been used to refer to the cellular DNA content of higher eukaryotes.
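
As a small illustration of these units, the helper below converts a base-pair count into bp, kb, Mb, or Gb. The function name and the two-decimal formatting are arbitrary choices for this sketch, not a convention from the notes.

```python
def format_bp(n_bp: int) -> str:
    """Express a base-pair count in the largest unit that fits (bp, kb, Mb, Gb)."""
    for factor, unit in ((10**9, "Gb"), (10**6, "Mb"), (10**3, "kb")):
        if n_bp >= factor:
            return f"{n_bp / factor:.2f} {unit}"
    return f"{n_bp} bp"


print(format_bp(200))            # 200 bp
print(format_bp(5_000))          # 5.00 kb
print(format_bp(4_639_221))      # 4.64 Mb
print(format_bp(3_400_000_000))  # 3.40 Gb
```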

Genome sizes vary over a wide range. Viruses tend to have small genomes, from 5 to

200 kb, but viruses are not free living organisms. Other genomes are listed below.

Organisms with gene counts listed have had their genomes sequenced, although


some of the genome projects are incomplete. The number of genes and the genome sizes for these organisms are subject to revision as more information becomes available. Note that genome size is not necessarily related to complexity of the organism; the organism with the largest genome in this table is a single-celled organism. However, prokaryotes typically have smaller genomes than eukaryotes (the reasons for this are discussed below in the section on replication).

Species                   Type of organism                             Genome size (bp)  Genes
Haemophilus influenzae    pathogenic bacterium                         1,830,138         1740
Escherichia coli          enteric bacterium                            4,639,221         4,
Escherichia coli O157:H7  pathogenic variant                           5,440,000         5,
Saccharomyces cerevisiae  baker’s yeast                                12,067,280        6034
Arabidopsis thaliana      smallest plant genome (the plant is a weed)
Drosophila melanogaster   fruit fly                                    180,000,000       13,
Caenorhabditis elegans    nematode worm                                100,000,000       19,
Gallus gallus             chicken                                      1,200,000,000     ?
Mus musculus              mouse                                        3,454,200,000     ?
Pan troglodytes           chimpanzee                                   3,600,000,000     ?
Homo sapiens              human                                        3,400,000,000     30, to 45,
Pinus resinosa            pine tree                                    68,000,000,000    ?
Amoeba dubia              amoeba                                       670,000,000,000   ?

Note: for most of the eukaryotic organisms, the genome size listed corresponds to the haploid genome; most eukaryotic cells are diploid and have twice this amount of nuclear DNA. The amoeba may have a polyploid genome and probably has a smaller amount of unique DNA sequence.

Humans have 46 chromosomes (23 in each haploid set), and therefore the size of the average human chromosome is ~145 million base pairs. Clearly these molecules are much larger than the chromosomes of the bacterial organisms.

DNA molecules are the largest biological molecules. A very large protein has a molecular weight of ~10^6. By comparison, the E. coli chromosome, a moderately small DNA molecule, has a molecular weight of ~3 x 10^9.

The process of DNA synthesis

Proper replication requires a large number of different enzymes. One obvious enzyme is the DNA polymerase that actually incorporates nucleotides into the new DNA strand. Most organisms have a number of DNA polymerases. The majority of these enzymes are used to proofread the newly synthesized DNA, and to repair mistakes incorporated during synthesis or as a result of damage to the DNA. One of the DNA polymerase types, called DNA polymerase III in E. coli, is the specialized replication polymerase. The replication polymerase is highly processive: it is capable of synthesizing >500,000 bases without dissociating from the template.
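
The size estimates quoted earlier in this section (the average human chromosome and the molecular weight of the E. coli chromosome) can be checked with simple arithmetic. The sketch below uses the genome sizes from the table; the figure of about 650 daltons per base pair is an assumption introduced here as a commonly used average, not a value from the notes.

```python
HUMAN_HAPLOID_BP = 3_400_000_000   # haploid human genome size from the table
HUMAN_HAPLOID_CHROMOSOMES = 23     # 46 chromosomes in a diploid cell
ECOLI_BP = 4_639_221               # E. coli genome size from the table
DALTONS_PER_BP = 650               # ASSUMPTION: rough average mass of one base pair

# Average chromosome size: haploid genome divided by haploid chromosome number.
avg_chromosome_bp = HUMAN_HAPLOID_BP / HUMAN_HAPLOID_CHROMOSOMES

# Approximate molecular weight of the E. coli chromosome.
ecoli_molecular_weight = ECOLI_BP * DALTONS_PER_BP

print(f"{avg_chromosome_bp / 1e6:.0f} million bp")   # 148 million bp
print(f"{ecoli_molecular_weight / 1e9:.1f} x 10^9")  # 3.0 x 10^9
```

The first result is close to the ~145 million base pairs quoted above; the small difference comes from rounding of the genome size.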

bases, predominantly the N^6 of adenines and the 5-carbon of cytidines. The methylation allows the DNA synthesis machinery to differentiate the old and new strands. The methylation occurs at positions in these bases that do not interfere with proper base pairing.

[Figure: chemical structures of 5-methylcytidine, N6-methyladenosine, thymidine, adenosine, cytidine, and guanosine]

Topoisomerases

DNA coiling must be controlled. Relaxed DNA forms the standard double helix. Separating the strands tightens the coiling of the DNA that has not yet been replicated. In addition, the cell often needs to change the coiling of the DNA. Replication of circular DNA results in interlocking rings, which must be separated in order to allow the daughter cells to each have a copy. Finally, DNA can “tangle”: anyone who has ever worked with thread or string knows about tangles, and DNA molecules are, in effect, very, very long strings. These problems are solved by topoisomerases, enzymes that can cleave DNA and alter the tightness of the winding, and can pass one strand through another.

Eukaryotic DNA replication

Eukaryotic cells are much more complex than prokaryotic cells, and contain much more DNA. Human cells contain more than 1000 times more DNA than E. coli. The polymerase responsible for human DNA replication is about 10-fold slower than the E. coli enzyme.

Replication must copy the entire genome once and only once. Prokaryotic organisms manage this by having a single replication origin on each DNA molecule. However,

to allow replication to occur in a reasonable amount of time (i.e. the 4-5 hours of S phase), eukaryotes must simultaneously replicate DNA at many locations in their genome. Cell cycle control proteins coordinate the initiation of replication; this still leaves the potential problem of missing some regions. One mechanism for checking is DNA methylation, but the precise mechanism that the cell uses to copy every base exactly one time for each cell division is incompletely understood.

The actual replication process (i.e. DNA strand separation and polymerization, with a leading and lagging strand) is similar in most respects to the process used in prokaryotic cells. The enzymes are somewhat different, but eukaryotes use the same general procedure. Humans do not have exact analogs of the E. coli DNA polymerase III; the mammalian replication polymerase is a large complex with at least three separate polymerases: pol α, pol δ, and pol ε. Pol δ is the major highly processive polymerase, but all three proteins are involved in mammalian DNA replication.

Structural analysis of the DNA replication polymerases indicates that the polymerases form a circular structure that completely surrounds the DNA strand. This structure prevents the polymerase from dissociating from the DNA unless the DNA strand is broken or the polymerase complex is disrupted.

Telomeres

A major difference between prokaryotic and eukaryotic replication occurs at the ends of the chromosomes. In contrast to circular prokaryotic genomes, eukaryotic chromosomes are linear molecules. This makes it difficult to synthesize the lagging strand at the end of the chromosome, because there is no place to put the primer required to initiate Okazaki fragment synthesis. As a result, each eukaryotic cell division produces chromosomes that are slightly (50-100 bp) shorter than those in the parent cell. The ends of the chromosomes are called telomeres; telomeres are repeats of a 6 bp DNA sequence (TTAGGG).

At least in part as a result of the shortening of the telomeres, most eukaryotic cells are only capable of dividing a limited number of times. Eventually, the telomeres become too short, and the cell is no longer capable of cell division. (This is not, however, the only method for controlling cessation of cell division.)

Some cells must continue dividing indefinitely. These cells avoid problems with telomere shortening by expressing an enzyme called telomerase. Telomerase is a reverse transcriptase-like enzyme. It contains both RNA and protein subunits; the RNA acts as a template for the synthesis of the telomere 6 bp repeat, while the protein contains the catalytic activity. Telomerase is capable of lengthening the telomeres, and therefore preventing damage to the ends of the chromosomes.
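
The link between telomere shortening and a limited number of divisions can be made concrete with a rough calculation. The loss of 50-100 bp per division and the TTAGGG repeat come from the text above; the starting telomere length of 10,000 bp is an assumption chosen only for illustration, and the sketch ignores telomerase and every other mechanism that limits cell division.

```python
REPEAT = "TTAGGG"  # telomere repeat given in the notes


def divisions_until_exhausted(telomere_bp: int, loss_per_division_bp: int) -> int:
    """Number of whole divisions before the telomere would be fully consumed."""
    return telomere_bp // loss_per_division_bp


starting_telomere_bp = 10_000  # ASSUMPTION: illustrative starting length only

for loss in (50, 100):
    divisions = divisions_until_exhausted(starting_telomere_bp, loss)
    print(f"{loss} bp lost per division: about {divisions} divisions")

# Number of complete 6 bp repeats in a telomere of the assumed length.
print(starting_telomere_bp // len(REPEAT))  # 1666
```

Under these assumptions the telomere would last on the order of 100 to 200 divisions; the text notes that cells stop dividing once telomeres become too short, which happens before they are fully consumed.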

Another part of the reason is that mutations that occur within important regions may kill one cell, but that cell can (at least in most tissues) be readily replaced. In addition, many cells have redundant pathways; if one gene is inactivated, others may be able to substitute for it.
