.introducing dna sequence.v.0.1, Study notes of Biogenetics and Computers

DNA sequencing in Bio-Informatics

Typology: Study notes

2014/2015

Uploaded on 02/02/2015

Qazi.Masud

Partial preview of the text

Exploring Bioinformatics
DIU and Team Bio-Bio-1
January 24, 2015

Contents

2 Introducing DNA Sequence
2.1 DNA Sequencing
2.2 History of DNA Sequencing
2.3 Methods of DNA Sequencing
2.4 DNA Sequencing Process
2.5 Automated DNA Sequencing
2.6 Computer Analysis
2.7 DNA Sequencing in Real Time
2.8 Next Generation DNA Sequencing
2.9 Sequencing Larger DNA Sequences
2.10 Complete Genome Sequencing
2.11 Shotgun Sequencing
2.12 Challenges of DNA Sequencing
2.13 Applications of DNA Sequencing
2.14 DNA Sequencing: Where to Next
2.15 Case Study: Human Genome Project (HGP)
2.16 All Life depends on 3 critical molecules
2.17 DNA Replication
2.18 DNA Polymerase
2.19 DNA Replication Process
2.20 RNA
2.20.1 RNA vs DNA
2.20.2 Major RNA Types
2.21 Sequence Formats
2.21.1 GenBank
2.21.2 Fasta
2.22 Some Important Concepts
2.22.1 Prokaryotes vs Eukaryotes
2.22.2 Genome
2.22.3 Gene
2.22.4 Gene Coding Region
2.22.5 Exon
2.22.6 Intron
2.22.7 Proteins

List of Tables

2.1 Prokaryotes vs Eukaryotes

Chapter 2
Introducing DNA Sequence

Fokhruzzaman
Saddam Hossain
Nazmun Nessa Moon

2.1 DNA Sequencing

Suppose the single mother of all humans is Eve and the single father of all humans is Adam. How could we explore this hypothesis without knowing the human code that is encoded in DNA and that undergoes evolutionary change in the form of mutations? DNA is representative of a species. Nowadays it is a routine job for any biomolecular lab to read some DNA to find out its code, which is called the DNA sequence.

Definition 1. DNA Sequence: A DNA sequence is the ordered arrangement of the nucleotides that make up a DNA molecule. The length of a DNA sequence is measured as the number of nucleotides in it; the usual unit is the base pair (bp).

Definition 2. DNA Sequencing: DNA sequencing is the process of determining the complete ordered sequence of nucleotides (A, T, C, and G) of all or part of an organism's DNA.

Figure 2.1 illustrates DNA and a DNA sequence. The sequence shown has been extracted from the DNA as the output of DNA sequencing.
• DNA sequencing is the biochemical process of determining the exact order of the chemical building blocks (called bases and abbreviated A, T, C, and G) in a DNA oligonucleotide
• DNA serves as the blueprint of life
• Determining the DNA sequence is basic to understanding biological processes

2.4 DNA Sequencing Process

DNA sequencing is a very long process. It comprises the following steps, which together yield a sequenced DNA fragment. These steps are followed across much of the bioinformatics industry.

Extraction of Genomic DNA and Sample Preparation
The first step is to extract high-quality DNA from the organism to be sequenced. Different kits and protocols are available to extract clean genomic DNA efficiently from the organism in question. Whether starting with whole-genome DNA or with targeted gene fragments, the initial step is a universal library preparation for any sample.

Genome Fragmentation and DNA Amplification
The genome map is a prerequisite for the DNA sequencing task. From the genome map, the region of the genome to be sequenced is identified first. Because whole chromosomes are too large to deal with, the DNA from the genome is then broken into manageably sized, overlapping segments. A set of clones from this region is identified and selected; these are the mapped clones. To sequence a piece of DNA, we first need to amplify it. This is sometimes done by a process called polymerase chain reaction (PCR). An alternative to PCR is to clone the piece of DNA by inserting it into the DNA of a bacterium. Replicating the bacterium then replicates the DNA.

Library Creation
The amplified DNAs, generated by either PCR or cloning, are gathered in a library. This pool of clones acts as a clone library for further sequencing work. The DNA fragments are already denatured (i.e. melted), so the two strands have split apart.
Template Preparation
In this stage DNA is purified from smaller clones. The denatured DNA is added to the reaction using the wet-lab setup for the chosen sequencing chemistry (any of the methods discussed above).

Gel Electrophoresis
The fragments are separated on a gel by gel electrophoresis and their lengths are calculated. Working out the DNA sequence is now just like solving a jigsaw puzzle.

Sequence Assembly Using Genome Mapping
As noted earlier, the genome map is a prerequisite for DNA sequencing, and it is needed again in this step. In the previous steps, the chromosomes were cut into large pieces that were cloned into bacteria, creating a whole library of DNA segments. The segments were cut open to look for common sequence landmarks in overlapping fragments. These landmarks were used to fingerprint the fragments, so that the position of each fragment in the chromosome is known; this is called mapping. The fragments were then cut into smaller pieces, the process was repeated, and the small fragments were sequenced. Finally, the whole sequence is built up by assembling the fragments using the corresponding genome maps and fingerprints.

If shotgun sequencing is applied, it dispenses with the need for mapping and so is much faster. It involves chopping the DNA into fragments of about 2,000 base pairs (bp) and 10,000 bp and sequencing the first and last 500 bp of each fragment. Computer algorithms then assemble the sequence of the initial DNA from the sequenced fragments. Shotgun sequencing is much faster: it takes a matter of months to obtain a draft sequence of the fruit fly, Drosophila melanogaster (135 Mbp), while a conventional sequencing effort may take several years to achieve a similar level of completion.
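The fragment-and-read step of shotgun sequencing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `shotgun_end_reads`, its parameters, and the uniform random choice of fragment positions are hypothetical and are not taken from the source. For simplicity both end reads are taken from the same strand, whereas a real protocol reads the far end from the opposite strand.

```python
import random


def shotgun_end_reads(genome, fragment_sizes=(2000, 10000), read_len=500,
                      n_fragments=100, seed=0):
    """Sample random fragments and return the end reads of each one.

    Returns a list of (start, left_read, right_read) tuples, where start is
    the 0-based position of the fragment in the genome.
    Assumption: fragments are placed uniformly at random.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_fragments):
        size = rng.choice(fragment_sizes)
        if size > len(genome):
            continue  # this fragment size does not fit in the toy genome
        start = rng.randrange(len(genome) - size + 1)
        fragment = genome[start:start + size]
        # Only the first and last read_len bases of the fragment are sequenced.
        reads.append((start, fragment[:read_len], fragment[-read_len:]))
    return reads


# Toy genome of random bases, used only to exercise the function.
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(50000))
toy_reads = shotgun_end_reads(toy_genome)
```

The middle of each fragment is never read directly; the known fragment size tells the assembler roughly how far apart the two end reads must lie.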
Assembly of the pieces for a large genome such as the human genome (3 × 10^9 bp), however, requires very powerful computers. Repetitive DNA, which is common in eukaryotic genomes, also causes great difficulty in the assembly process, and the assembler may get it wrong.

Pre-finishing
Special sequencing techniques are used to produce high-quality sequences. This is a crucial step, because this is where the DNA sequence is cleaned up and made less error-prone. Genomes must be sequenced several times over on average, both to ensure that complete coverage of the genome is achieved and because sequencing data is somewhat error-prone.

Finishing
This is the stage where the final sequenced DNA product is obtained; it is the final in-laboratory quality pass. The quality of sequence data may vary depending on the purity and concentration of the template DNA, the presence of extra PCR bands, and the quality of the dye-terminators, electrophoresis matrix, and other reagents. Once the required level of quality is ensured, the sequenced DNA is ready to be published as DNA sequence data for further use.

Data Editing and DNA Annotation
To make the sequenced DNA available for subsequent bioinformatics research, it needs to be stored in a library or genome bank. Before submission to public databases, some quality assurance, verification, and biological annotation steps are needed.

2.5 Automated DNA Sequencing

In automated DNA sequencing, we don't even have to "read" the sequence from the gel: the computer does that for us! A computer read-out of the gel generates a "false color" image in which each color corresponds to a base. The intensities are then translated into peaks that represent the sequence. This is a plot of the colors detected in one "lane" of a gel (one sample), scanned from the smallest fragments to the largest.
The computer even interprets the colors by printing the nucleotide sequence across the top of the plot. This is just a fragment of the entire file, which would span around 700 or so nucleotides of accurate sequence.

Figure 2.2: DNA Sequencing Process
Figure 2.3: Automated DNA Sequence Readouts

Automated DNA sequencing relies on laser dyes, capillary gel electrophoresis, and fluorescent detection. This is just an example of an "old" autoradiogram with radioactive products. A fluorescent dye coupled to the reaction allows di-deoxy termination events to be visualized by a laser that detects the colored product. This shows four different reactions, as done with the old manual sequencing.

2.6 Computer Analysis

• The computer displays the information as an electropherogram, a trace of the signal received by the photodetector at different wavelengths
• The computer assigns false colors to each of the four tracings
• It also prints the letter of the appropriate base below each peak

Figure 2.6: Sequencing Larger DNA Sequences

… process. As no machine can sequence a large DNA molecule (> 5,000 bp) in one stretch, the current principle of complete genome sequencing is to break the genome into smaller pieces, sequence the smaller DNA fragments, and reconstruct the complete sequence from these sequenced fragments. The genome is fragmented based on the genome map. The fragments are then cloned to build a clone library. These DNA fragments are sequenced first (using the methodologies discussed previously). After that, computational models and software packages are used to identify overlapping clones with common restriction fragments and to assemble them into a contig. These contigs are edited and aligned to the genome map. Gaps between clones are filled with other clones (such as fosmids) in this step, or by generating PCR products from BAC clones or genomic DNA. In this way the contigs are assembled into the complete genome.
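The idea of joining overlapping fragments into a contig can be illustrated with a small greedy merge in Python. This is a minimal sketch, not the method used by any particular assembly package: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the greedy strategy itself are assumptions made for illustration. It works on exact suffix-prefix overlaps between strings, ignores sequencing errors and reverse strands, and can be misled by repeats.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the list of contigs left when no overlap of at least min_len
    remains. Fragments fully contained in another fragment are dropped first.
    """
    frags = list(dict.fromkeys(fragments))  # remove exact duplicates
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return frags
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]


contigs = greedy_assemble(["ATGGCGTA", "GCGTACGTT", "ACGTTAGC"])
```

With the three toy fragments above, the merges collapse everything into a single contig. When fragments do not overlap enough, several contigs remain, which corresponds to the gaps that the text says are filled with additional clones or PCR products.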
Based on the principles discussed above, there are three major strategies for complete genome sequencing.

i) Hierarchical or clone-by-clone, where the genome is broken into many long pieces. Each long piece is mapped onto the genome and then sequenced with the shotgun approach. This strategy has been applied to yeast, worm, human, rat, etc.
ii) Walking, which is the online version of (i). Here the genome is broken into many long pieces, shotgun sequencing of each piece is started, and the map is constructed afterwards. The rice genome has been sequenced in this manner.
iii) Whole genome shotgun, in which one large shotgun pass is made over the whole genome. The genomes of many organisms, such as Drosophila, human (Celera), Neurospora, mouse, rat, and Fugu, have been sequenced using this strategy.

2.11 Shotgun Sequencing

• A bottom-up approach that determines the sequence solely from overlaps among a large number of random sequences
• Assembles a linear map from sub-clone sequences without knowing their order on the chromosome
• Contigs are assembled in the computer based on alignments of all possible sequence pairs
• Up to tenfold redundancy is used to ensure correctness
• First used for H. influenzae. Now routinely used for microbial genomes and genome fragments
• Can shotgun sequencing be used for genomes with repetitive sequences?
· Celera Genomics used it for Drosophila after removing repetitive regions
· Some other methods were used to avoid the problem

Figure 2.7: Whole Genome Shotgun Sequencing and Hierarchical Shotgun Sequencing

2.12 Challenges of DNA Sequencing

The main challenge of DNA sequencing is that no machine takes long DNA as input and gives the complete sequence as output. The available methods can only sequence around 500 letters (base pairs) at a time. Increasing the sensitivity of current instruments (in terms of sequence length) is essential.
In the chemistry lab, there is a need for additional fluor combinations to enable reaction multiplexing, which can save time and money. Lowering the cost of sequencing is another challenge ahead of us, along with increasing the throughput. Historically, most cost decreases have been incremental rather than monumental. Here, however, large cost decreases are needed, which may require some revolutionary approaches. Making the applications and the related instruments available is another concern in DNA sequencing. One statistic says that current set-ups (laboratory standards) for DNA sequencing (e.g. the 3100 Genetic Analyzer) can sequence around 100 samples in one day on average. 16 to 20 samples make up one run, there are 6 to 10 runs in a plate, and 2 plates are loaded at once. So the average daily sequencing capacity is about 200 samples. This throughput is very low compared with the current and upcoming demand for sequenced DNA. A well-maintained machine is also vital to a successful sequence. The final challenge of DNA sequencing is the analysis of the data.

2.13 Applications of DNA Sequencing

There are about 100 million species, and each individual has different DNA. Even within an individual, some cells have different DNA (e.g. in cancer). How many sequences are there? They all need to be sequenced before the population can be studied from any direction. If we want to explore which genes are switched on, when, and in which cell, we need to know the sequence first. Where do molecules bind to DNA? This study also needs sequenced DNA. The …

Figure 2.8: Tiling Path

… teams of biologists, chemists, engineers, and computational scientists, among others. A sampling follows of some research challenges in genetics: what we still don't know, even with the full human DNA sequence in hand.
2.16 All Life depends on 3 critical molecules

• DNAs
· Hold information on how the cell works
• RNAs
· Act to transfer short pieces of information to different parts of the cell
· Provide templates for protein synthesis
• Proteins
· Form enzymes that send signals to other cells and regulate gene activity
· Form the body's major components (e.g. hair, skin, etc.)
· Are life's laborers!

Figure 2.9: DNA Replication

2.17 DNA Replication

• DNA synthesis is the process of copying a double-stranded DNA molecule.
• Each strand of the double helix can serve as a template for synthesis of the other strand
• Two new DNA molecules are synthesized. Each contains:
· One original DNA strand
· One newly synthesized strand
• This is semi-conservative replication

2.18 DNA Polymerase

• An enzyme that catalyzes the replication and/or repair of new DNA and RNA from an existing strand of DNA or RNA
• Activity
· All: direct the synthesis in the 5' to 3' direction
· Many: possess 3' to 5' exonuclease activity (a sort of "proof-reading")
· Some: possess 5' to 3' exonuclease activity (helps to join discontinuous DNA fragments in the lagging strand)

2.19 DNA Replication Process

Three phases of DNA replication:

1. Initiation
• Starts at specific points, the origins of replication (rich in A-T pairs), which can be single or multiple (20,000 in human)
• Two replication forks extend
• Promotes the melting of the DNA helix
• RNA primase initiates the synthesis
• Other proteins and enzymes are involved

2. Elongation
• DNA polymerase can replicate (in the 5' to 3' direction)
• Leading strand is synthesized in a continuous manner
• Lagging strand forms a loop to invert the orientation
• Okazaki fragments (1000 to 2000 bp)
• DNA ligase joins the fragments

3. Termination
• Correct termination is important
• For a circular genome, the termination point should be halfway round the circle
• Terminator sequence: a defined DNA sequence

2.20 RNA

• RNA is transcribed from DNA
• It is usually only a single strand.
• Some forms of RNA can form secondary structures by "pairing up" with themselves. This can change their properties dramatically.

Figure 2.12: Fasta Sequence Format

• An optional ∗ indicating the end of the sequence

2.22 Some Important Concepts

2.22.1 Prokaryotes vs Eukaryotes

Prokaryotes include Archaea ("ancient ones") and bacteria. Eukaryotes form the domain Eukarya and include plants, animals, fungi, and certain algae.

Table 2.1: Prokaryotes vs Eukaryotes

Prokaryotes                                | Eukaryotes
Single cell                                | Single or multi cell
No nucleus                                 | Nucleus
No organelles                              | Organelles
One piece of circular DNA (plasmid)        | Chromosomes
No mRNA post-transcriptional modification  | Exons/introns splicing

2.22.2 Genome

The genome is an organism's genetic material.

• An organism's complete set of DNA.
• A bacterium contains about 600,000 DNA base pairs
• Human and mouse genomes have some 3 billion.
• The human genome has 24 distinct chromosomes.
• Each chromosome contains many genes

Figure 2.13: Gene Structure

2.22.3 Gene

A gene is an interval (or collection of intervals) of DNA whose nucleotides are transcribed into mRNA and eventually expressed in the cell's function by being translated into protein. Because the creation of proteins determines the organism's function, genes are also viewed as units of heredity; alleles are encoded by (often slight) variations in genes.

• Basic physical and functional units of heredity.
• Specific sequences of DNA bases that encode instructions on how and when to make proteins.

Not every nucleotide of a gene must eventually be translated into protein; the pre-mRNA molecule formed by transcription is made up of interspersed introns and exons. The introns are then excised during RNA splicing to form the final mRNA molecule. As a result, exons are genetically more important than introns, and the exons belonging to a gene form that gene's coding region.
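The excision of introns described above amounts to deleting intervals from a string and joining what is left. A minimal Python sketch follows; the function name `splice` and the representation of introns as 0-based, half-open (start, end) intervals are assumptions for illustration, and the toy sequence is not a real transcript.

```python
def splice(pre_mrna, introns):
    """Remove intron intervals from a pre-mRNA string and join the exons.

    introns is a list of (start, end) pairs, 0-based and half-open.
    Assumption: intervals do not overlap and lie inside the sequence.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos or end > len(pre_mrna) or start >= end:
            raise ValueError("introns must be non-overlapping and inside the sequence")
        exons.append(pre_mrna[pos:start])  # exon before this intron
        pos = end                          # skip over the intron
    exons.append(pre_mrna[pos:])           # last exon
    return "".join(exons)


# Toy pre-mRNA: exon "AUGGCC", intron "GUAAGUAG", exon "UUUUAA".
toy_pre_mrna = "AUGGCC" + "GUAAGUAG" + "UUUUAA"
toy_mrna = splice(toy_pre_mrna, [(6, 14)])
```

Real splicing is guided by sequence signals at the intron boundaries; here the intron positions are simply given as input.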
Genes form only about a quarter of the human genome, and exons make up only about 1% of the total. The rest of the genome serves a variety of purposes.

Figure 2.14: Eukaryotic Gene Structure
Figure 2.15: Eukaryotic Gene Structure (Zoom-In)
Figure 2.19: Promoter Elements

Coding Sequence is the base-pair sequence that includes the coding information for the protein specified by the gene. It is also known as the Open Reading Frame (ORF).

Terminator is the base-pair sequence that specifies the end of the mRNA transcript.

Transcriptional Unit

• Start codon: triplet of DNA bases (codon): ATG
• End codon: triplet of DNA bases (codon): TGA, TAA, TAG
• Coding sequence: at least 100 codons (300 bp)
• Others: binding sites for RNA polymerase and other enzymes

2.22.9 TATA Box

A short sequence (with consensus TATAAA) that often occurs in eukaryotic promoters. The TATA-box is found in 70% of promoters.

2.22.10 GC Box

A GC box is a distinct pattern of nucleotides found in the promoter region of some eukaryotic genes, upstream of the TATA-box and approximately 110 bases upstream of the transcription initiation site. It has the consensus sequence GGGCGG, which is position dependent and orientation independent. The GC elements are bound by transcription factors and have functions similar to enhancers.

Figure 2.20: Reverse Complement

2.22.11 CAAT Box

A CCAAT-box (also sometimes abbreviated as CAAT-box or CAT-box) is a distinct pattern of nucleotides with the consensus sequence GGCCAATCT that occurs 75 to 80 bases upstream of the transcription initiation site. The CAAT-box signals the binding site for the RNA transcription factor, and is typically accompanied by a conserved consensus sequence.

2.22.12 Reverse Complement

The reverse complement of a DNA string is formed by reversing the string and taking the complement of each symbol.
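The definition just given translates directly into code. The following Python sketch is one simple way to do it; the function name `reverse_complement` and the choice to reject any symbol other than A, C, G, and T are assumptions for illustration.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string.

    Assumption: only the symbols A, C, G and T are accepted (case-insensitive).
    """
    dna = dna.upper()
    unexpected = set(dna) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    # Complement every symbol, then reverse the whole string.
    return dna.translate(COMPLEMENT)[::-1]


example = reverse_complement("AAAACCCGGT")
```

For the input AAAACCCGGT, the complement is TTTTGGGCCA, and reversing it gives ACCGGGTTTT.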
We must reverse the string in addition to taking complements because of the directionality of DNA: in DNA replication and transcription the template strand is read from the 3′ end to the 5′ end, and the 3′ end of one strand lies opposite the 5′ end of the complementary strand. Thus, if we simply took complements, we would be reading the second strand in the wrong direction.

2.22.13 Reverse Palindrome

A DNA string that is equal to its reverse complement.

2.22.14 Mutation

A mutation is a change in the nucleotide sequence of the genome of an organism, virus, or extrachromosomal genetic element. Mutations result from unrepaired damage to DNA or to RNA genomes (typically caused by radiation or chemical mutagens), from errors in the process of replication, or from the insertion or deletion of segments of DNA by mobile genetic elements.

2.22.15 Single Nucleotide Polymorphism (SNP)

A Single Nucleotide Polymorphism (SNP, pronounced snip; plural snips) is a DNA sequence variation occurring commonly within a population (e.g. 1%) in which a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a biological species or between paired chromosomes. For …

Figure 2.21: Five Types of Chromosomal Mutations
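The SNP definition in 2.22.15 can be made concrete with a small Python sketch that scans a set of sequences column by column. Everything here is an assumption for illustration: the function name `snp_positions`, the requirement that the sequences are already aligned and of equal length, and the rule that a position counts as a SNP when at least two different bases each reach the minimum frequency (1% by default, following the figure quoted in the text).

```python
from collections import Counter


def snp_positions(aligned, min_freq=0.01):
    """Find positions where at least two bases are each common enough.

    aligned is a list of equal-length sequences, one per individual.
    Returns a dict mapping 0-based position to a Counter of the bases seen.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(aligned)
    result = {}
    for pos, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        # Bases whose frequency in this column reaches the threshold.
        common = [base for base, c in counts.items() if c / n >= min_freq]
        if len(common) >= 2:
            result[pos] = counts
    return result


toy_population = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTAA"]
toy_snps = snp_positions(toy_population, min_freq=0.25)
```

In the toy population of four sequences, a variant present in one individual has a frequency of 25%, so it is reported with `min_freq=0.25` and ignored with any higher threshold. Real SNP calling also has to deal with alignment, sequencing errors, and much larger samples.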