Genomics and Gene Recognition - Bioinformatics - Lecture Slides, Slides for Bioinformatics. English and Foreign Languages University

Main points of this lecture are: Genomics and Gene Recognition, Eukaryotes, Prokaryotic Genomes, Gene Structure, Single Circular Chromosome, Promoter Elements, Assemblies of Proteins, Specific Nucleotide Sequences, Polyc...
Genomics and Gene Recognition

• How do we recognize the genes given the raw sequence data?

• Two different cases:
  – Prokaryotes: relatively easy
  – Eukaryotes: relatively difficult
    • Much “junk DNA” to search through

• Signals determine the beginnings and ends of genes
  – Need to find the signals

Prokaryotic Genomes

• Genomic information of prokaryotes is dedicated mainly to basic tasks
  – Make and replicate DNA
  – Make new proteins
  – Obtain and store energy

• Over 60 prokaryotic genomes have been completely sequenced since the mid-1990s

Prokaryotic Genomes

• Recall - prokaryotes have a single circular chromosome

• Also - no cell nucleus, therefore no splicing out of introns

• Therefore, prokaryotic gene structure is quite simple

[Figure: prokaryotic gene structure, with labels for the promoter region, operator sequence, transcriptional start site, translational start site (AUG), open reading frame, translational stop site, and transcriptional stop site]

Promoter Elements

• Gene expression begins with transcription
  – RNA copy of a gene made by an RNA polymerase
  – Prokaryotic RNA polymerases are assemblies of several different proteins
    • β’ protein binds to DNA template
    • β protein links nucleotides
    • α protein holds subunits together
    • σ protein recognizes specific nucleotide sequences


Promoter Elements

• β’, β and α often very similar from one bacterial species to another

• σ can vary (less well conserved)
  – Several variants often found in a cell
  – The ability to use several different σ factors allows a cell to turn on or off expression of whole sets of genes

• For example, σ32 turns on expression of genes associated with heat shock, σ54 does the same for nitrogen stress, and genes that always need to be expressed are transcribed by polymerases with σ70

Promoter Elements

• Each σ factor recognizes a particular sequence of nucleotides upstream from the gene
  – σ70 looks for -35 sequence TTGACA and -10 sequence TATAAT
  – Other σ factors look for other -35 and -10 sequences
    • The match need not always be exact
    • The better the match, the more likely transcription will be initiated
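
The idea that "the better the match, the more likely transcription will be initiated" can be sketched as a simple position-by-position comparison against the two σ70 consensus hexamers. This is only an illustration: the function names `match_count` and `best_hit` are hypothetical, and real promoter finders usually use weighted scoring and also check the spacing between the -35 and -10 elements, which this sketch ignores.

```python
# Consensus hexamers recognized by sigma-70, as given on the slide.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def match_count(site, consensus):
    """Count positions at which a candidate site agrees with the consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site.upper(), consensus))


def best_hit(seq, consensus):
    """Return (score, offset) of the best-matching window in seq.

    Ties are broken in favor of the leftmost window. Raises ValueError
    if seq is shorter than the consensus.
    """
    k = len(consensus)
    scores = [(match_count(seq[i:i + k], consensus), i)
              for i in range(len(seq) - k + 1)]
    return max(scores, key=lambda s: (s[0], -s[1]))


if __name__ == "__main__":
    upstream = "GGTTGACAGG"  # made-up sequence for illustration
    print(best_hit(upstream, MINUS_35))
```

A score of 6 is an exact match; lower scores correspond to weaker candidate promoters under this simplified model.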

Promoter Elements

• Protein products from some genes are always used in tandem with those from some other genes
  – These related genes may share a single promoter in prokaryotic genomes and be arranged in an operon
  – When one gene is transcribed, so are all of the others: one polycistronic RNA molecule is produced
  – The lactose operon contains three genes involved in metabolism of the sugar lactose in bacterial cells


• The protein encoded by the regulatory gene (pLacI) can bind to lactose or to the operator sequence of the operon
  – So when lactose is abundant, pLacI is less likely to bind to the operator sequence
    • When it does bind, it blocks transcription, thus acting as a negative regulator
    • Even without negative regulation, the operon is expressed only at low levels because its promoter is a poor match to the consensus sequence for the σ factor
  – A positive regulator (CRP) promotes expression

Open Reading Frames

• Recall - 3 of the 64 codons are stop codons (UAA, UAG, UGA) - they cause translation to stop

• Most prokaryotic proteins are longer than 60 amino acids
  – Since on average we expect to find a stop codon once in every 21 codons of random sequence (3 of the 64 codons are stops, and 64/3 ≈ 21), the presence of a run of 30 or more codons with no stop codons (an Open Reading Frame - ORF) is good evidence that we are looking at the coding sequence of a prokaryotic gene
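
The arithmetic behind this argument can be checked with a short calculation. The sketch below assumes, as a simplification that is not stated on the slide, that all 64 codons are equally likely and independent of one another; the function names are hypothetical.

```python
# Fraction of codons that are stops (UAA, UAG, UGA) under the
# simplifying assumption that all 64 codons are equally likely.
STOP_FRACTION = 3 / 64


def expected_stop_spacing():
    """Average number of codons between stops in random sequence."""
    return 1 / STOP_FRACTION


def prob_stop_free(n_codons):
    """Chance that n_codons consecutive random codons contain no stop."""
    return (1 - STOP_FRACTION) ** n_codons


if __name__ == "__main__":
    print(round(expected_stop_spacing(), 1))
    for n in (30, 60, 100):
        print(n, round(prob_stop_free(n), 3))
```

Under these assumptions a stop-free run of 30 codons still arises by chance roughly a quarter of the time from any given starting codon, and a run of 60 codons roughly 6% of the time, which is one reason a longer threshold gives stronger evidence.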

Open Reading Frames

• AUG is a start codon
  – Defines where translation begins
  – If no likely promoter sequence is found between the start codon of an ORF and the end of the preceding ORF, assume the two genes are part of an operon whose promoter sequence is further upstream
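
A minimal sketch of ORF scanning, assuming the input is a DNA string (so the start codon AUG appears as ATG and the stops as TAA, TAG, TGA) and that only the given strand is searched. The function name `find_orfs` and the default threshold of 60 codons are illustrative choices; a fuller tool would also scan the reverse complement.

```python
# Stop codons written in DNA form (UAA, UAG, UGA in the mRNA).
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=60):
    """Find ATG-initiated ORFs in the three forward reading frames.

    Returns a sorted list of (start, end) pairs, where start is the index
    of the A in ATG and end is one past the last base of the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return sorted(orfs)


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"  # made-up sequence
    print(find_orfs(toy, min_codons=5))
```

The gap between the end of one reported ORF and the start of the next is the region in which a promoter search, as described above, would be carried out.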

Termination Sequence

• Most prokaryotic operons contain specific signals for the termination of transcription called intrinsic terminators
  – Must have a sequence of nucleotides that includes an inverted repeat followed by a run of roughly six uracils
  – The inverted repeat allows the RNA to form a loop structure that greatly slows down RNA synthesis
    • Together with the chemical properties of uracil, this is enough to end transcription
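
The terminator signal described above can be sketched as a pattern search on the DNA coding strand, where the uracil run appears as a run of T. Everything numeric here is an assumption made for illustration (a perfect 6-base stem, a loop of 3 to 8 bases, exactly six T's immediately after the stem); real terminators tolerate mismatches and vary in size, and the name `find_terminators` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_terminators(dna, stem=6, min_loop=3, max_loop=8, t_run=6):
    """Return (start, end) spans of stem, loop, stem, then a run of T.

    The second stem arm must be the exact reverse complement of the
    first, so the transcript could fold into a hairpin.
    """
    dna = dna.upper()
    hits = []
    last_start = len(dna) - (2 * stem + min_loop + t_run)
    for i in range(last_start + 1):
        left = dna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop                 # start of second stem arm
            right = dna[j:j + stem]
            tail = dna[j + stem:j + stem + t_run]
            if right == reverse_complement(left) and tail == "T" * t_run:
                hits.append((i, j + stem + t_run))
                break                           # report shortest loop only
    return hits


if __name__ == "__main__":
    toy = "AA" + "GCCGCC" + "AAAA" + "GGCGGC" + "TTTTTT" + "AA"  # made up
    print(find_terminators(toy))
```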

GC Content in Prokaryotic Genomes

• For every G within a double-stranded DNA genome there must be a C - likewise an A for every T
  – Only constraint on fraction of nucleotides that are G/C as opposed to A/T is that the two must add to 100%
  – Can use genomic GC content to identify bacterial species (ranges from 25% to 75%)
  – Can also use GC content to identify genes that have been obtained from other bacteria by horizontal gene transfer
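
GC content is straightforward to compute, and a sliding window gives a rough way to spot regions whose composition differs from the rest of the genome, as horizontally transferred genes may. The sketch below is illustrative only; the function names, window size, and step are arbitrary assumptions, and bases other than A, C, G, T (such as N) are ignored.

```python
def gc_content(dna):
    """Fraction of A/C/G/T bases in dna that are G or C."""
    dna = dna.upper()
    counted = sum(dna.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (dna.count("G") + dna.count("C")) / counted


def windowed_gc(dna, window, step):
    """GC content of successive windows as (start, fraction) pairs."""
    return [(i, gc_content(dna[i:i + window]))
            for i in range(0, len(dna) - window + 1, step)]


if __name__ == "__main__":
    toy = "GGGGCCCC" + "ATATATAT"  # made-up sequence for illustration
    print(gc_content(toy))
    print(windowed_gc(toy, window=8, step=8))
```

Windows whose GC fraction departs sharply from the genome-wide value are candidates for closer inspection, not proof of transfer.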

Prokaryotic Gene Densities

• Gene density within prokaryotic genomes is very high
  – Between 85% and 88% of the nucleotides are typically associated with coding regions of genes
  – Just as large portions of chromosomes can be acquired, they can also be deleted
    • Portions left are those which code for essential genes

Gene Recognition in Prokaryotes

• Long ORFs (60 or more codons)

• Matches to simple promoter sequences

• Recognizable transcriptional termination signal (inverted repeats followed by run of uracils)

• Comparison with nucleotide (or amino acid) sequences of known protein coding regions from other organisms

Eukaryotic Genomes

• Much more complex
  – Internal membrane-bound compartments allow a wide variety of chemical environments in each cell
  – Multicellular organisms
    • Each cell type has distinct gene expression
  – Size of genome may be larger
    • Allows for “junk DNA”

• Gene expression more complex and flexible than in prokaryotes

Promoter Elements

• Each different cell type requires different gene expression
  – Therefore eukaryotes have elaborate mechanisms for starting transcription
  – Prokaryotes have a single RNA polymerase - eukaryotes have three
    • RNA polymerase I - Ribosomal RNAs
    • RNA polymerase II - Protein-coding genes
    • RNA polymerase III - tRNAs, other small RNAs

Promoter Elements

• Most RNA polymerase II promoters contain a set of sequences known as a basal promoter where an initiation complex is assembled and transcription begins

• Also have several upstream promoter elements (typically at least 5) to which other proteins bind
  – Without the proteins binding upstream, initiation complex assembly is difficult

Promoter Elements

• RNA polymerase II does not directly recognize the basal sequences of promoters
  – Basal transcription factors including a TATA-binding protein (TBP) and at least 12 TBP-associated factors bind to the promoter in a specific order, facilitating binding of RNA polymerase

• TATA-box 5’-TATAWAW-3’ (W is A or T) at -25 relative to transcriptional start site

• Initiator sequence 5’-YYCARR-3’ (Y is C or T and R is G or A) at transcriptional start site
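
The degenerate symbols in these two motifs (W, Y, R) translate directly into regular-expression character classes. The following sketch searches a DNA string for both patterns; the helper names are hypothetical, only the symbols used on the slide are supported, and the positional constraint (TATA-box near -25 relative to the start site) is not checked.

```python
import re

# Degenerate nucleotide symbols used on the slide, as regex classes.
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]",  # A or T
    "Y": "[CT]",  # C or T
    "R": "[AG]",  # G or A
}


def motif_to_regex(motif):
    """Translate a motif such as 'TATAWAW' into a regex source string."""
    return "".join(SYMBOLS[symbol] for symbol in motif)


TATA_BOX = motif_to_regex("TATAWAW")
INITIATOR = motif_to_regex("YYCARR")


def motif_positions(dna, pattern):
    """0-based start positions of all matches, overlapping ones included."""
    # A lookahead is used so that overlapping occurrences are reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", dna.upper())]


if __name__ == "__main__":
    print(motif_positions("GGTATAAATGG", TATA_BOX))   # made-up sequence
    print(motif_positions("AACTCAGGAA", INITIATOR))   # made-up sequence
```

Because both motifs are short and degenerate, matches will occur by chance; hits are more convincing when a TATA-box match lies about 25 bases upstream of an initiator match.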

Regulatory Protein Binding Sites

• Transcription initiation in eukaryotes relies heavily on positive regulation
  – Constitutive factors work on many genes and don’t respond to external signals
  – Regulatory factors act on a limited number of genes and respond to external signals
    • Response factors (e.g. heat shock)
    • Cell-specific factors (e.g. pituitary cells only)
    • Developmental factors (e.g. early embryo organization)

Open Reading Frames

• Before translation, a heterogeneous nuclear RNA (hnRNA) is transformed into mRNA by being
  – Capped
    • 5’ end chemically altered
  – Spliced
    • Various splicings can occur
  – Polyadenylated
    • Long stretch of A’s added at 3’ end

Introns and Exons

• The introns are spliced out of the hnRNA
  – Protein-coding genes conform to the GU-AG rule
    • These are the nucleotides at the 5’ and 3’ ends of the intron

• Other nucleotides are examined as well
  – Most of these are inside the intron
  – These signals constrain introns to be at least 60 bp long - but there is no upper limit
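
The two constraints stated here, the GU-AG boundary rule and the 60 bp minimum length, can be expressed as a simple check on a candidate intron. This is a necessary-condition filter only: it ignores the internal signals mentioned above, and the function name `obeys_gu_ag` is hypothetical. DNA input is accepted by treating T as U.

```python
def obeys_gu_ag(intron, min_length=60):
    """True if a candidate intron starts with GU, ends with AG,
    and is at least min_length nucleotides long.

    Accepts RNA or DNA; T is converted to U before checking.
    """
    rna = intron.upper().replace("T", "U")
    return (len(rna) >= min_length
            and rna.startswith("GU")
            and rna.endswith("AG"))


if __name__ == "__main__":
    candidate = "GU" + "A" * 60 + "AG"  # made-up 64-nt intron
    print(obeys_gu_ag(candidate))
    print(obeys_gu_ag("GUAAAG"))        # too short
```

Since GU and AG are each expected about once every 16 dinucleotides in random sequence, passing this check is weak evidence on its own.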

Alternative Splicing

• About 20% of human genes give rise to more than one type of mRNA sequence due to alternative splicing

• Splice junctions can be masked, causing an exon to be spliced out

• The following slide shows how alternative splicing based on different splicing factors (proteins) can stop a useful protein from being produced

Alternative Splicing

[Figure: alternative splicing directed by different splicing factors]

GC Content

• Overall GC content between different genomes does not vary as much in eukaryotes as in prokaryotes
  – However variations in GC content within a genome can help us to recognize genes
  – Of all of the pairs of nucleotides, statistically, CG is found only at 20% of its expected value
    • No other pair is under or over represented
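
The "expected value" for the CG pair can be made concrete with the commonly used observed/expected ratio: the number of CG dinucleotides seen, divided by the number expected if C and G occurred independently (count of C times count of G, divided by sequence length). The sketch below is an illustration under that definition; the function name `cg_obs_exp` is hypothetical.

```python
def cg_obs_exp(dna):
    """Observed/expected ratio for the CG dinucleotide on one strand.

    expected = (#C * #G) / length, i.e. what independence would predict.
    """
    dna = dna.upper()
    n = len(dna)
    c_count = dna.count("C")
    g_count = dna.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("sequence needs at least one C and one G")
    observed = sum(1 for i in range(n - 1) if dna[i:i + 2] == "CG")
    expected = c_count * g_count / n
    return observed / expected


if __name__ == "__main__":
    print(cg_obs_exp("CGCGCGCG"))  # made-up, CG-rich
    print(cg_obs_exp("CCGG"))      # made-up, exactly as expected
```

A ratio near 0.2 would correspond to the depletion described above, while stretches with a ratio much closer to 1 stand out against that background.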
