Gene Expression: Transcription and Translation, Study notes of Genetics

An overview of gene expression, focusing on the processes of transcription and translation. These notes explain the central dogma of molecular biology, detailing how the genetic information in DNA is transcribed into mRNA and then translated into proteins. They cover the roles of mRNA, rRNA and tRNA, as well as the function of RNA polymerase and promoters, and discuss post-transcriptional modifications and the coupling of transcription and translation in prokaryotes. This material is suitable for university students studying molecular biology, genetics or biochemistry.

Typology: Study notes

Pre 2010

Uploaded on 07/20/2025

nguyenyenbinh92

CSLS / THE UNIVERSITY OF TOKYO

Part I Relationship between Cells and Genetic Information

Chapter 3

Expression of Genes

A protein gene is a piece of DNA that determines the amino acid sequence of a protein, and the synthesis of a protein based on genetic information is called gene expression. Specifically, genetic information refers to the nucleotide (base) sequence of DNA strands. mRNA is synthesized using DNA as a template, so that the genetic information of the DNA is transcribed as the base sequence of the mRNA. The base sequence of mRNA is read as a series of genetic codes, and these codes are used to synthesize proteins on cytoplasmic granules called ribosomes. Each code in mRNA corresponds to one amino acid, and the amino acids are linked together in the order of the codes to form a protein. Protein synthesis is called translation, since the mechanism can be compared to a translation from information in one language (i.e., a base sequence) to that in another language (i.e., an amino acid sequence).

I. Transcription and Translation of Genes

Central Dogma

The genetic information of a protein specifically refers to the information that determines its primary structure (i.e., the amino acid sequence) and, at the substance level, to the nucleotide sequence (the base sequence) of DNA. The genetic information of DNA is copied to mRNA (messenger RNA; see part II of this chapter), which is synthesized using DNA as a template, and is consequently converted to the amino acid sequence of a protein. The concept of genetic information flowing in one direction from DNA to mRNA to proteins is called the central dogma of molecular biology (Fig. 3-1). This concept is a basic principle common to all organisms, both prokaryotes and eukaryotes, including bacteria and humans. mRNA synthesis means the transcription of the genetic information in DNA (the base sequence) to the base sequence of RNA, while protein synthesis refers to the translation of information in one language (the sequence of mRNA) into that of another (the amino acid sequence) (Fig. 3-2).

Figure 3-1 Central dogma

Genetic Codes

Specifically, the genetic information of DNA is its base sequence. The genetic code, on the other hand, is defined on the base sequence of mRNA transcribed using DNA as a template: a particular three-base sequence known as a codon corresponds to one amino acid. There are 4³ = 64 codons, encoding 20 amino acids (Fig. 3-2). As an example, the mRNA codon 5’-AUG-3’ corresponds to the amino acid methionine (Fig. 3-3). A protein consisting of 400 amino acids linked together is derived from 1,200 DNA bases and 1,200 mRNA bases (400 × 3). Here, the 1,200-base section of the entire DNA is the gene for this protein. AUG encodes methionine in addition to serving as the initiation codon for protein synthesis. Following the determination of the first amino acid, the next three-base sequence determines the next amino acid, and so on. Figure 3-2 includes three termination codons. When protein synthesis reaches a termination codon (which does not encode any amino acid), synthesis is terminated. The region between the initiation codon and the termination codon is called the coding region.
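The codon arithmetic and the reading of a coding region can be sketched in Python. This is a minimal illustration rather than part of the original notes: the table is the standard genetic code written in the usual U, C, A, G order with one-letter amino acid symbols, and the function name `translate` and the sample sequence are assumptions made for this example.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# "*" marks the three termination codons.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG until a termination codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: stop without adding a residue
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: 5' extra bases, AUG UUU GGC, termination codon UAA, 3' extra bases.
example = "GGAUGUUUGGCUAAGG"
print(len(CODON_TABLE))    # number of distinct codons
print(translate(example))  # one-letter amino acid sequence
```

The dictionary has 64 entries (4 × 4 × 4), of which three are termination codons and the rest map onto the 20 amino acids.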

Sense Strand of DNA

Of the double strands of DNA, the one complementary to the template strand for RNA synthesis is called the sense strand (Fig. 3-3). The base sequence of mRNA can be obtained by replacing the Ts in the sense strand with Us. Codons on the sense strand are almost the same as those on mRNA, e.g., ATG on the sense

Figure 3-2 Genetic code table

Figure 3-3 Genes and genetic information
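The relationship between the sense strand, the template strand and mRNA can be illustrated with a short Python sketch. The function names and the sample sequence are hypothetical, and all sequences are written 5’ to 3’.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Base pairing used when RNA is built on a DNA template (A pairs with U).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_sense(sense):
    """Return the template strand (5'->3'), the reverse complement of the sense strand."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(sense))


def mrna_from_sense(sense):
    """Shortcut described in the text: replace every T in the sense strand with U."""
    return sense.replace("T", "U")


def mrna_from_template(template):
    """Build the mRNA (5'->3') by pairing bases along the template, read 3'->5'."""
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))


sense = "ATGTTTGGC"  # hypothetical sense-strand fragment
template = template_from_sense(sense)
print(template)
print(mrna_from_template(template))
print(mrna_from_sense(sense))
```

Both routes give the same mRNA, which is why codons can be read directly from the sense strand once T is replaced with U.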

❖ mRNA

mRNA (messenger RNA) carries a transcribed copy of the genetic information for the primary structure of a protein to the protein synthesis system. There are as many types of mRNA as there are genes, and since the size of proteins varies, the size of mRNA also varies greatly. mRNA makes up less than 1% of the total amount of RNA in a cell.

❖ rRNA

rRNA (ribosomal RNA) in prokaryotes consists of three types: 5S, 16S and 23S*1 (Table 3-1), while that in eukaryotes includes 5S, 5.8S, 18S and 28S. Approximately 95% of the RNA found in a cell is rRNA, and it forms complexes called ribosomes with many proteins. Ribosomes function as sites for protein synthesis.

❖ tRNA

tRNA (transfer RNA) is a small type of RNA with a size of around 4S, consisting of fewer than 100 nucleotides. There are 40 to 50 known types, which represent approximately 5% of RNA overall. During protein synthesis, they bind to amino acids and carry them to the site of protein synthesis. A particular tRNA binds to a particular amino acid; for example, the tRNA that binds phenylalanine is denoted as tRNA^Phe, and the tRNA that binds methionine is denoted as tRNA^Met.

Figure 3-4 Roles of the three RNA types

*1 S: The Svedberg unit, which describes the rate of sedimentation in an ultracentrifuge. Although higher molecular weights result in higher S values, there is no linear relationship between the molecular weight and the S value (e.g., a doubling of the molecular weight does not mean a doubled S value).

Cells contain these RNA types, and RNA not translated into the primary structure of a protein (i.e., that other than mRNA) is collectively referred to as non-coding RNA. Although the regulation characteristics of RNA synthesis vary among RNA types, the basic method of synthesis is common to all RNA types.

Characteristics of Transcription

In DNA synthesis, the entire sequence of the parental DNA strand is accurately copied from one end to the other, and the entire DNA region is passed on from the parent cell to the daughter cells. RNA transcription, on the other hand, occurs for gene regions only, rather than for the whole DNA (Fig. 3-5). The DNA region shown in Figure 3-5A has five genes (a to e), meaning that five mRNAs are synthesized. Genes c and d in the figure show that the other DNA strand is read in the reverse direction. In fact, RNA transcription covers the sections containing information for the amino acid sequence (i.e., the coding regions) as well as the extra portions on both sides of these sections (Fig. 3-5B). A promoter is a DNA region to which RNA polymerase attaches (discussed later).

Figure 3-5 Transcription of RNA (panels A and B)

❖ Binding of Polymerase to a Promoter

Promoter regions in eukaryotes (Fig. 3-6) include unique base sequences, such as TATA boxes and CCAAT boxes*4, that are recognized by general transcription factors*3 (proteins that promote transcription). Prokaryotes have several types of protein called σ-factors that promote the binding of RNA polymerase to a particular promoter. The processes generally referred to as recognition and binding mean that a protein and a DNA molecule come close and, if their surface structures fit, connect with each other. Eukaryotes have a more complex mechanism, with a higher number of gene types and many kinds of promoter sequence; however, the basic mechanism of eukaryotes and prokaryotes is similar in that both have frequently used basic promoter sequences to which transcription factors bind, thereby recruiting RNA polymerase.

❖ Roles of Promoters and the Initiation of Transcription

An important role of promoters is to determine the binding location and direction of RNA polymerase. Since RNA is synthesized in the 5’ to 3’ direction, the template DNA strand is read by RNA polymerase in the 3’ to 5’ direction. The basal transcription factors and RNA polymerase complex bound to DNA separates the DNA double strands, initiating RNA synthesis.

❖ Elongation of Transcription

The 5’-triphosphate of the first nucleotide in the synthesized RNA strand stays connected, and the 5’ end of the RNA is either pppA or pppG. The basal transcription factors involved in the binding of RNA polymerase do not move with the enzyme, and only the polymerase moves on DNA. The RNA strand synthesized is immediately released from DNA, and the two unwound DNA strands reform their original double strand on completion of RNA synthesis.

Figure 3-6 Structure of a promoter

*3 Basal transcription factors: Proteins needed when RNA polymerase binds to a promoter (trans-elements). These factors bind to a particular sequence on the promoter, which recruits RNA polymerase to DNA, thereby initiating RNA synthesis.

*4 TATA and CCAAT boxes: DNA sequences in eukaryotes that are necessary when basal transcription factors bind to DNA. TATA boxes have the sequence TATAAA, while CCAAT boxes have the sequence GGCCAATCT, and there are transcription factors that recognize each of the two boxes. Many other sequences also exist.
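As a rough illustration of how such promoter elements could be located in a sequence, the following Python sketch searches a DNA string for exact copies of the two box sequences quoted above. It is a simplification under stated assumptions: real promoter elements vary around these sequences, and the helper `find_motif` and the promoter string are made up for the example.

```python
TATA_BOX = "TATAAA"
CCAAT_BOX = "GGCCAATCT"


def find_motif(dna, motif):
    """Return every 0-based position at which an exact copy of motif starts."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)  # continue searching after this hit
    return positions


# Made-up promoter fragment (sense strand, 5'->3') containing one copy of each box.
promoter = "CCGGCCAATCTGGAGCTTCGTATAAAGGCAGC"
print(find_motif(promoter, CCAAT_BOX))
print(find_motif(promoter, TATA_BOX))
```

Exact string matching is the simplest possible choice here; a search that tolerates mismatches would be closer to how such sequences are recognized in practice.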

❖ Termination of Transcription

A DNA sequence that signals the termination of transcription in prokaryotes is called a terminator. Several mechanisms by which the RNA dissociates are known; in one of them, the synthesized RNA forms a double-stranded region (a hairpin structure) within itself and thereby detaches from the template DNA. The termination mechanism in eukaryotes is not clearly understood.

❖ Genes for rRNA and tRNA

The number of functioning (transcribed) genes in E. coli is over 2,000, a figure believed to be much higher in humans. Based on the information of the mRNA transcribed from these genes, the proteins (gene products) are all synthesized on ribosomes. There must therefore be a large number of protein synthesis systems in order to deal with the translation of the numerous mRNA molecules generated by all genes. This requires a large number of rRNA and tRNA molecules within cells. These molecules are therefore actively transcribed, and there are many genes for them. It can be said that these genes have been amplified to serve this purpose.

Column: The Possible Existence of More Non-coding RNA in Eukaryotes

Until very recently, it had been commonly thought that only genes (rather than other regions) were transcribed from genomic DNA. This idea is correct for prokaryotes, since their genomic DNA mostly consists of genes. Although one human cell contains approximately 1,000 times as much DNA as E. coli, humans have only five times as many genes as E. coli; genes represent only a small portion of genomic DNA in humans. However, it was recently reported as a major revelation that most of the genome of eukaryotes is transcribed. According to a paper published in Science magazine in September 2005, a comprehensive analysis of transcription products in mice, in which the transcription origin of 4.5 million RNA molecules was investigated, showed that they were transcribed from 70% of the entire DNA. It was surprising that so many RNA molecules were transcribed from DNA regions previously not thought to be genes; if this is the case in mice, it would also be expected to hold true for humans. This RNA is believed to be non-coding RNA that functions in the regulation of expression.

Base Modification

After the formation of an RNA strand, rRNA and tRNA undergo base modification. mRNA also undergoes base modification, but to a lesser extent. The main modification made to rRNA is methylation, in which the methyl group of S-adenosylmethionine is transferred. tRNA receives many types of base modification, and compounds known as minor bases*5 (such as pseudouridine, 4-thiouridine, thymidine, dihydrouridine and 1-methylguanosine) are generated as a result. Minor bases are necessary for tRNA to function. Another important modification type is the enzymatic addition of a three-base sequence, CCA, to the 3’ end of tRNA in eukaryotes. tRNA in prokaryotes has CCA at the 3’ end from the beginning; the 3’ end of tRNA in both eukaryotes and prokaryotes therefore has CCA.

mRNA Processing in Eukaryotes

mRNA in eukaryotes is first transcribed from DNA as pre-mRNA (Fig. 3-8), which becomes complete mRNA after going through the following three main changes (processing):

❖ Capping (Cap Formation)

A special structure with a phosphate bond between 5’ and 5’ is added to the 5’ end of mRNA. No other linkage that joins two nucleotides through their 5’ positions is known. This is called the cap structure (Fig. 3-9). It is essential when mRNA is used for protein synthesis, because the mRNA binds to ribosomes via special proteins attached to the cap. mRNA from prokaryotes, which has no cap structure, would not function in the protein synthesis apparatus of eukaryotes.

❖ Addition of Poly-A

A poly-A signal sequence (e.g., AAUAAA) is located near the 3’ end of pre-mRNA, and following enzymatic cleavage at a site approximately 20 bases downstream from the sequence, many As (adenosines) are added to the end. The number of nucleotides added can be from several dozen to thousands. This synthesis does not require a template. Since even mRNA molecules of the same type have different poly-A lengths, the size of the complete mRNA varies. It is suggested that the poly-A strand is necessary for the initiation of protein synthesis and the inhibition of mRNA degradation. In an experiment, mRNA with poly-A can be concentrated and purified by letting it bind in a complementary fashion to oligo dTs attached to the surface of a fine resin.

Figure 3-8 Modification leading up to the completion of mRNA in eukaryotic cells

*5 Minor bases: In addition to the five main base types, high-molecular-weight DNA and RNA also contain other bases. These are known as minor bases, and are thought to play important functions despite their small quantities.
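The cleavage and poly-A addition steps described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `add_poly_a`, the default tail length of 200 and the toy pre-mRNA are hypothetical, and the cleavage site is placed exactly 20 bases downstream of the signal although the text gives this distance only as approximate.

```python
POLY_A_SIGNAL = "AAUAAA"


def add_poly_a(pre_mrna, tail_length=200, cleavage_offset=20):
    """Cleave downstream of the last poly-A signal and append a poly-A tail.

    The tail is added without a template, so it is simply a run of A residues.
    """
    signal_start = pre_mrna.rfind(POLY_A_SIGNAL)  # signal closest to the 3' end
    if signal_start == -1:
        return pre_mrna  # no signal found: leave the transcript unchanged
    cleavage_site = signal_start + len(POLY_A_SIGNAL) + cleavage_offset
    cleaved = pre_mrna[:cleavage_site]  # everything downstream is discarded
    return cleaved + "A" * tail_length


# Toy pre-mRNA: body, signal, 20 spacer bases, then a 3' stretch that is cut off.
pre_mrna = "GGCUCC" + POLY_A_SIGNAL + "C" * 20 + "GUGUGU"
mature_3_end = add_poly_a(pre_mrna, tail_length=50)
print(mature_3_end)
```

Calling the function with different `tail_length` values on the same transcript mirrors the point that mRNA molecules of the same type can differ in overall size.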

❖ Splicing

The most remarkable part of processing in eukaryotes is splicing. Genes in eukaryotes consist of exons, which contain amino acid sequence information (codes), and introns, which do not, and pre-mRNA containing both exons and introns is synthesized first. In splicing, the introns are removed from the synthesized pre-mRNA, and the remaining exons are connected to form mRNA (Fig. 3-8). To connect two distant exons left by the removal of an intron, a spliceosome (a complex containing non-coding snRNA, or small nuclear RNA) binds near the two breakpoints and pulls them together (Fig. 3-10). During the process of splicing, some introns may be retained, for example, or two introns and the exon between them may be removed together, thereby creating several types of complete mRNA. This mechanism is called alternative splicing. As a result, several types of protein with different amino acid sequences can be synthesized from such mRNA, each exhibiting different functions. By exploiting this mechanism, one gene can produce several protein types, thus functioning as

Figure 3-9 Cap structure

Figure 3-10 Mechanism of splicing
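Splicing and alternative splicing can be modeled very simply by treating pre-mRNA as an ordered list of labeled segments. The sketch below is a hypothetical Python illustration: the segment names, the sequences and the function `splice` are invented for the example, and it ignores how the spliceosome actually recognizes intron boundaries.

```python
def splice(segments, skip_exons=(), retain_introns=()):
    """Join exons in order, optionally skipping exons or retaining introns.

    segments is a list of (kind, name, sequence) tuples in 5'->3' order,
    where kind is either "exon" or "intron".
    """
    kept = []
    for kind, name, sequence in segments:
        if kind == "exon" and name not in skip_exons:
            kept.append(sequence)
        elif kind == "intron" and name in retain_introns:
            kept.append(sequence)  # intron retention
    return "".join(kept)


# Made-up pre-mRNA with three exons and two introns.
pre_mrna = [
    ("exon", "E1", "AUGGCU"),
    ("intron", "I1", "GUAAGUUUUCAG"),
    ("exon", "E2", "GAUCCA"),
    ("intron", "I2", "GUGAGUCCCCAG"),
    ("exon", "E3", "UGGUAA"),
]

print(splice(pre_mrna))                         # all introns removed
print(splice(pre_mrna, skip_exons={"E2"}))      # I1, E2 and I2 removed together
print(splice(pre_mrna, retain_introns={"I1"}))  # intron I1 retained
```

Each call yields a different complete mRNA from the same pre-mRNA, which is the essence of alternative splicing.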

Ribosomes

❖ What is a ribosome?

Ribosomes are the places where protein synthesis occurs. In both prokaryotes and eukaryotes, a ribosome is a pairing of one large subunit and one small subunit (Column Fig. 3-1). Each subunit is a complex consisting of rRNA and many types of protein. Since the RNA is larger and the number of protein types is higher in eukaryotes, prokaryotes have 70S ribosomes and eukaryotes have 80S ribosomes. Ribosomes contain many types of protein, but quantitatively they are rich in RNA, with proteins covering only parts of the surface (two thirds are RNA and one third is proteins). In particular, the space between the two subunits, which is the place where protein synthesis occurs, consists almost entirely of RNA. Ribosomes bind to mRNA and interact with aminoacyl-tRNA, activating both of them, and perform enzymatic reactions such as cleaving ester bonds between tRNA and peptide chains and forming peptide bonds between peptides and amino acids. These important functions of ribosomes are carried out by rRNA. Ribosomes are therefore considered to be ribozymes, that is, RNA with enzymatic activity.

Column: Structure of E. coli Ribosomes

Ribosomes are schematically drawn in the shape of a flattened snowman consisting of large and small subunits, but their actual shape is complex (Column Fig. 3-1). The ribosomal RNA of eukaryotes is larger, and their ribosomes contain a higher number of proteins.

Column Figure 3-1 Ribosomes of E. coli

Column: Initiation of Translation

The initiation of translation is in fact a complex reaction (Column Fig. 3-2). First, initiation factors (IFs) dissociate the large and small subunits, and the small subunit is bound to Met-tRNA (fMet-tRNA in prokaryotes) with mRNA and IFs attached. The large subunit then binds to it, forming a complex consisting of a ribosome, mRNA and Met-tRNA. This is known as an initiation complex.

Column Figure 3-2 Formation of initiation complexes in eukaryotic organisms. An “e” as the first letter in the names of initiation factors represents eukaryotes.

Structure of mRNA

In both prokaryotes and eukaryotes, the functional structure of mRNA schematically consists of a 5’ non-coding region, a coding region and a 3’ non-coding region arranged side by side (Fig. 3-12). AUG, the translation initiation codon, is located at the first part of the coding region. The 5’ non-coding region in prokaryotes often contains a sequence complementary to 16S rRNA, to which ribosomes bind. In eukaryotes, no such sequence exists; instead, there are proteins that bind to the cap structure at the 5’ end, forming an appropriate bond between mRNA and ribosomes. The 3’ non-coding region of mRNA in eukaryotes has a sequence related to the degradation rate of mRNA. The coding region between the two non-coding regions encodes an amino acid sequence of a protein.
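The three-part layout of mRNA described above can be sketched in Python by locating the initiation codon and the first in-frame termination codon. The function name `split_mrna` and the sample sequence are hypothetical, and the sketch assumes, as a convention chosen for this example only, that the termination codon is counted as the last codon of the coding region.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Split mRNA into (5' non-coding, coding, 3' non-coding) regions.

    Returns None if there is no AUG or no in-frame termination codon.
    """
    start = mrna.find("AUG")  # the first AUG is taken as the initiation codon
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # include the termination codon in the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


# Made-up mRNA: 5' non-coding region, AUG AAA CCC UAA, 3' non-coding region.
mrna = "GGAGGU" + "AUGAAACCCUAA" + "CCGUUU"
print(split_mrna(mrna))
```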

Figure 3-12 Structure of mRNA

Column: Termination of Translation

The termination reaction occurs when the next codon of mRNA is a termination codon (Column Fig. 3-4). A release factor (RF) involved in this reaction enters the A site corresponding to the termination codon of mRNA. The peptidyl-tRNA moves to the P site, and the bond between the peptide and the tRNA is hydrolyzed by the enzymatic action of rRNA, which releases the protein and subsequently the tRNA and mRNA, thereby terminating translation.

Column Figure 3-4 Termination reaction of translation

Protein Synthesis

During protein synthesis, a reaction continuously occurs in which three bases of mRNA (a codon) and three bases of aminoacyl-tRNA (an anticodon) form pairs. In this reaction, amino acids are arranged by tRNA in accordance with the order of the mRNA codes (Fig. 3-13), the amino acids and tRNA are dissociated and the amino acids are linked. Through this process, amino acids are connected following the order of the mRNA codes. The series of reactions that occur on ribosomes is known as translation. One strand of mRNA has multiple ribosomes attached that concurrently synthesize proteins, and longer strands of mRNA have more ribosomes attached to them. Clusters of ribosomes bound to mRNA are called polysomes (or polyribosomes), and cells that actively synthesize proteins have many polysomes. The rate at which amino acids are linked is thought to be around 20 per second in prokaryotes. Assuming that the average molecular weight of amino acids is 114, one minute is needed to synthesize a protein with a molecular weight of approximately 135,000. This means that most proteins, with their molecular weights being around 50,000, are generated within 30 seconds. The rate is slower in eukaryotes at around two amino acids per second. See the Column for more details on the rather complex processes of initiation, elongation and termination of translation.
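The timing estimates in this paragraph follow from a simple calculation, shown here as a small Python sketch. The function name `synthesis_time_seconds` is an assumption made for the example; the average residue weight of 114 and the rates of 20 and 2 amino acids per second are the figures quoted in the text.

```python
AVERAGE_AMINO_ACID_MW = 114  # average molecular weight per amino acid, as assumed in the text


def synthesis_time_seconds(protein_mw, amino_acids_per_second):
    """Estimate the time needed to synthesize a protein of a given molecular weight."""
    number_of_amino_acids = protein_mw / AVERAGE_AMINO_ACID_MW
    return number_of_amino_acids / amino_acids_per_second


# Prokaryotes: about 20 amino acids per second.
print(synthesis_time_seconds(135_000, 20))  # roughly one minute
print(synthesis_time_seconds(50_000, 20))   # well under 30 seconds

# Eukaryotes: about 2 amino acids per second.
print(synthesis_time_seconds(50_000, 2))
```

A protein of molecular weight 135,000 corresponds to about 1,180 amino acids, which at 20 per second takes just under 60 seconds; a typical protein of molecular weight 50,000 (about 440 amino acids) takes about 22 seconds.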

Figure 3-13 Schematic diagram of protein synthesis

Summary: Chapter 3

  • Protein genes are DNA regions that determine the amino acid sequences of proteins.
  • rRNA and tRNA are categorized as non-coding RNA, which carries no protein information and functions as RNA. This RNA is also transcribed, from rRNA and tRNA genes on DNA.
  • The information unit of protein genes is a three-base sequence on a DNA strand, which corresponds to one amino acid.
  • Gene function (or gene expression) refers to the process by which RNA is synthesized based on genetic information and a protein is then synthesized using the RNA information.
  • In RNA synthesis, the base sequence of a gene is read using one of the DNA double strands as a template, thereby synthesizing an RNA strand with a sequence complementary to the DNA strand.
  • RNA synthesis is known as transcription because DNA sequence information is copied to the RNA sequence.
  • An enzyme that synthesizes RNA is called an RNA polymerase.
  • A DNA region to which RNA polymerase binds is called a promoter.
  • The roles of promoters are to recruit RNA polymerase and to determine the initiation point of transcription and the DNA strand to be used as the template.
  • A sequence containing a gene with protein information is transcribed to an mRNA sequence. mRNA types are as numerous as gene types, and both correspond to the number of protein types.
  • In prokaryotes, transcription and translation are coupled.
  • mRNA in eukaryotes is first transcribed in the form of precursors called pre-mRNA, which undergo modifications in the nucleus (such as capping, poly-A addition and splicing) to become complete mRNA. This is then transferred to the cytoplasm, where it is used for protein synthesis.
  • A three-base set on an mRNA strand transcribed from DNA that corresponds to one amino acid is called a codon.
  • The first AUG on mRNA encodes methionine, and is also the initiation codon for protein synthesis.
  • Protein synthesis occurs on granules called ribosomes.
  • A three-base set of mRNA (i.e., a codon) and a three-base set (an anticodon) of aminoacyl-tRNA (tRNA bound to an amino acid) form pairs on a ribosome, through which amino acids are arranged by tRNA in accordance with the order of the mRNA codes.
  • A reaction is continuously repeated in which an amino acid and tRNA are dissociated and amino acids are then connected together. As a result, amino acids are linked following the order of the mRNA codes, thus forming proteins.
  • There are three types of code (known as termination codons) that do not correspond to any amino acids. Protein synthesis stops at one of the termination codons on mRNA.

Problems

(Answers on p.251)

[1] Briefly explain the characteristics shared and not shared by the processes of replication and transcription.

[2] Using the terms below, briefly outline how the genetic information of genomic DNA is eventually converted to proteins: Codon, messenger RNA (mRNA), amino acid, aminoacyl-tRNA (AA-tRNA), nucleus, cytoplasm, ribosome.

[3] Briefly outline the process that occurs during mRNA synthesis in eukaryotes between the transcription and completion of mRNA.

[4] In humans, one chromosomal genome set is inherited from each of the mother and father.

  1. Consider a case in which a DNA sequence inherited from a parent has a mutation, resulting in illness. If a disease manifests itself as a phenotype only when mutation occurs in both copies of a gene derived from both the mother and father, is it a dominant or recessive hereditary disease?
  2. For the scenario in 1), it is assumed that the mutation occurs at a site that encodes an amino acid. What kind of mutation is generally considered to occur in this case?
  3. If a disease manifests itself as a phenotype when a mutation occurs in one of the two copies of a gene (derived from either the mother or the father), is it a dominant or recessive hereditary disease?
  4. For the scenario in 3), it is assumed that the mutation occurs at a site that encodes an amino acid. What kind of mutation is generally considered to occur in this case?