



















































A comprehensive overview of key concepts in genomics and genetic mapping, focusing on techniques for distinguishing genetic variations and mapping genes. It covers sequencing methods (Sanger, Maxam-Gilbert), genetic markers (RFLP, SSLP, SNP), and mapping strategies (restriction mapping, FISH, STS). The document also discusses sequence alignment algorithms and scoring matrices used in genomics. It is useful for students studying genetics, molecular biology, and bioinformatics, and its definitions and explanations of genetic and genomic concepts make it a resource for exam preparation.




















































Distinguishing benign from pathogenic genetic variation - ANSWER large sample size of sick and healthy patients
Size of human genome - ANSWER 3 billion base pairs
Personalized/precision medicine - ANSWER use genetic info to tailor treatments, anticipate onset of disease, manage health
Hybrid crops - ANSWER superior to either parental lineage; hard to maintain
Challenge with hybrid crops - ANSWER epistatic effects of multiple alleles
Genomics - ANSWER Discipline of mapping, sequencing, analyzing, and comparing genomes; molecular biology, genetics, bioinformatics
Goals of genomics - ANSWER sequence/assemble genomes, identify functional elements, maintenance/evolutionary processes, diversity, expression, phenotypes, alter/design living organisms
Statistical significance - ANSWER calculated as P(mutation | disease)
Confidence that mutation causes disease - ANSWER depends on occurrence of that mutation in the general population
Koch's postulates - ANSWER 1. Microbe found in sick, not healthy organisms
Limitations of Koch's postulates - ANSWER 1. can isolate from healthy & diseased patients
Sequencing by synthesis - ANSWER uses DNA polymerase, sequence template, oligo primer, 4 dNTPs; add one nucleotide type at a time and quantify incorporation into synthesized strand
Sanger-Coulson method - ANSWER 1 or 3 nucleotides added to synthesis reactions to stop synthesis at specific locations; requires 8 reactions and 8 electrophoresis lanes
Minus - ANSWER polymerase stops at missing base
Plus - ANSWER T4 DNA polymerase 3' exonuclease stalled by dNTP
Maxam-Gilbert method - ANSWER ds DNA chemically altered to cleave fragments at specific bases; 4 reactions & 4 electrophoresis lanes required; A+G, G, C, C+T
Problems with Maxam-Gilbert method - ANSWER cumbersome, limited resolution, repeats of same nucleotide difficult to resolve
Sanger Sequencing chain termination - ANSWER dNTPs = DNA elongation, ddNTPs = chain terminators, causes polymerization to stop randomly but at specific base
DNA sequencing by chain-termination - ANSWER family of fragments with different sizes recovered from reaction based on final nucleotide; can view repeats
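The chain-termination idea can be sketched in a few lines of Python. This is a toy model, not a lab protocol: it assumes one fragment is recovered for every position of the synthesized strand, each labeled by the ddNTP that terminated it, and the function names are made up for illustration.

```python
def chain_termination_fragments(synthesized):
    """Toy model: one terminated fragment per position of the synthesized
    strand, recorded as (fragment length, terminal base)."""
    return [(i + 1, base) for i, base in enumerate(synthesized)]


def read_from_fragments(fragments):
    """Order fragments by size (as electrophoresis does) and read the
    terminal base of each fragment from shortest to longest."""
    return "".join(base for _, base in sorted(fragments))


fragments = chain_termination_fragments("GATTACA")
# Fragment order in the tube is arbitrary; sizing them recovers the sequence.
print(read_from_fragments(list(reversed(fragments))))
```

Because every fragment differs in length by one base, runs of the same nucleotide (TT in the example) are still resolved.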
Polyacrylamide gels - ANSWER can resolve DNA fragment sizes w/ a resolution of a single base; ~200 nucleotides read with high accuracy
Sanger sequencing improvements - ANSWER capillaries allow for faster electrophoresis, smaller footprint, fluorescently labeled ddNTPs allow for laser detection; automate and parallelize for higher throughput; ~1,000 bases
Limitations of Sanger sequencing - ANSWER large amount of template DNA, need flanking DNA of known sequences, expensive ddNTPs, have to sequence 10x to get high confidence
Clone library - ANSWER convenient way to prep template DNA for sequencing
How are clone libraries made? - ANSWER DNA fragments cloned into vectors, transformed into bacterial/yeast hosts that replicate and archive the DNA
How are clone libraries used? - ANSWER isolate unique fragments from single colonies, sequence them, then map them
EST - ANSWER Expressed Sequence Tag library that contains transcribed genes used to sequence full large genomes
What do mRNAs represent? - ANSWER small fraction of the genome but contain protein coding sequences (exons; introns are spliced out)
3 laws of genetics - ANSWER allele segregation, independent assortment, allele dominance
Genetic linkage - ANSWER alleles on the same chromosome are inherited together
Partial linkage - ANSWER some alleles are neither independent nor fully linked;
digestion
FISH (fluorescent in situ hybridization) - ANSWER hybridization of fluorescently labeled DNA probes that can reveal physical locations of specific sequences on chromosome
STS (sequence tagged site mapping) - ANSWER map positions of short DNA sequences unique in the genome using PCR/hybridization
What is needed for STS? - ANSWER collection of overlapping DNA fragments from a single chromosome
How does STS work? - ANSWER if 2 fragments have same STS, then must overlap; if 1 fragment has two different STSs, then those two STSs must be near each other
Shotgun sequencing of small genomes - ANSWER genome assembly from small fragments relies on ability to find overlapping sequences; requires efficient computational methods
Sequence alignment - ANSWER arranging two or more sequences to identify regions of similarity; identify matches, insertions/deletions, substitutions
Dot plots - ANSWER provide visual overview of possible alignments but do not determine optimal alignment
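A dot plot is simple enough to sketch directly. The helper below is illustrative only (the name `dot_plot` is an assumption): it marks every cell where the two sequences carry the same residue, so shared regions show up as diagonal runs.

```python
def dot_plot(seq_a, seq_b):
    """Return a grid with '*' wherever seq_a[i] == seq_b[j], ' ' otherwise."""
    return [["*" if x == y else " " for y in seq_b] for x in seq_a]


for row in dot_plot("GATTACA", "GATCACA"):
    print("".join(row))
```

The plot shows where alignments are possible, but picking the best one still requires a scoring scheme and an alignment algorithm.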
Needleman-Wunsch algorithm - ANSWER recursive alignment process that aligns each position relative to best alignment; only considers subset of all possible alignments but guarantees to find optimal mathematical solution
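A minimal sketch of Needleman-Wunsch global alignment follows. The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions, not values given in these notes; real protein alignments would use a substitution matrix such as BLOSUM62.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.
    Returns (best score, aligned a, aligned b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```

Each cell is computed once from its three neighbors, which is why the method avoids enumerating every possible alignment while still guaranteeing the optimal score for the chosen scoring scheme.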
Dynamic programming for optimal alignment - ANSWER get alignment with best score; not always biologically meaningful
What determines best scoring alignment? - ANSWER depends on scoring matrix used; calculate statistical significance of alignment score
Sequence homology - ANSWER two similar sequences found in genomes of two different organisms may be related through evolution
Selective pressure - ANSWER random mutations make sequences diverge but selective pressure to preserve function will impose sequence conservation at critical positions
Multiple sequence alignment (MSA) - ANSWER identify conserved sequences across organisms; create evolutionary trees
Problems with MSA - ANSWER computationally expensive to find optimal alignments; have to use heuristic approach that doesn't guarantee optimal alignments
Optimal pairwise alignment - ANSWER to find biologically meaningful alignments, can't use percent identity because not all AA substitutions are equally likely
P(ab) - ANSWER probability of substitution; determined empirically from trusted set of aligned sequences
PAM (point accepted mutations) - ANSWER observed mutations in closely related proteins, does not work well for aligning distantly related proteins
BLOSUM (blocks substitution matrix) - ANSWER observed mutations in distantly related proteins, only use blocks of conserved positions to calculate substitution rates
BLOSUM62 - ANSWER blocks database of trusted alignments that are highly conserved and gap free sequence blocks, keeps alignments with < 62% identity, calculate background frequency of each residue, calculate substitution rate for each residue pair
What does scoring matrix choice depend on? - ANSWER source of sequences to be aligned and evolutionary distance between sequences
What do specialized scoring matrices do? - ANSWER take into account protein structure, develop sequence context to improve alignment accuracy and biological relevance
Common amino acids vs rare amino acids - ANSWER low weights vs high weights
Positive vs negative - ANSWER more likely substitutions vs less likely substitutions
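The sign and weight of a matrix entry follow from a log-odds ratio: the observed frequency of a residue pair in trusted alignments divided by the frequency expected from the background frequencies alone. The sketch below uses the half-bit scaling commonly associated with BLOSUM matrices; the function name and all frequencies are made-up illustrations, not values from a real matrix.

```python
import math


def log_odds_score(p_ab, q_a, q_b, scale=2.0):
    """Substitution score = scale * log2(observed / expected), rounded.
    p_ab: observed pair frequency; q_a, q_b: background frequencies."""
    return round(scale * math.log2(p_ab / (q_a * q_b)))


# Hypothetical numbers: a pair seen 8x more often than chance scores positive,
# a pair seen 10x less often than chance scores negative.
print(log_odds_score(0.02, 0.05, 0.05))
print(log_odds_score(0.001, 0.1, 0.1))
```

The same ratio explains the weighting of rare residues: a small background frequency makes the expected term small, so an observed match between rare amino acids earns a higher score than a match between common ones.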
Sequence databases - ANSWER relies on sequence alignments, never perfect because of mutations/sequencing errors, have to return imperfect matches and provide statistical significance
BLAST (Basic Local Alignment Search Tool) - ANSWER compares a sequence of interest with all other DNA or protein sequences deposited in sequence databases; seed and expand alignment approach
Score - ANSWER pairwise alignment score calculated from the scoring matrix (ie BLOSUM62)
Query cover - ANSWER percentage overlap between query sequence and database hit
E-value - ANSWER expected number of hits returned by chance given size of the query sequence and the database (false positives)
Max identity - ANSWER percentage of identical positions between aligned sequences
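The E-value can be sketched with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length, n the database length, and S the raw alignment score. K and lambda depend on the scoring system; the defaults below are placeholder values chosen only for illustration, and the function name is hypothetical.

```python
import math


def expected_hits(score, query_len, db_len, K=0.13, lam=0.32):
    """Expected number of chance alignments scoring at least `score`.
    K and lam are illustrative placeholders, not values for a specific matrix."""
    return K * query_len * db_len * math.exp(-lam * score)


small_db = expected_hits(score=60, query_len=300, db_len=1_000_000)
large_db = expected_hits(score=60, query_len=300, db_len=2_000_000)
print(small_db, large_db)
```

Doubling the database doubles the E-value for the same score, while a higher score lowers it exponentially, which is why the same hit becomes less significant as databases grow.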
GenBank - ANSWER NIH database that is an annotated collection of all publicly available DNA sequences
Hamiltonian cycle - ANSWER represent sequencing results as nodes in a graph, connect overlapping sequences with arrow, find the path that travels through every node exactly once and ends at the starting node
Why is the Hamiltonian cycle difficult? - ANSWER requires trillions of pairwise alignments; no efficient algorithm to find it with millions of nodes; feasible for microbial genomes
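The overlap-graph idea can be illustrated with a brute-force sketch. It makes two simplifying assumptions: it looks for a Hamiltonian path (a linear contig) rather than a cycle, and it uses exact suffix/prefix matching instead of pairwise alignment. The reads and function names are invented for the example.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(reads, min_len=3):
    """Try every ordering of the reads (every candidate Hamiltonian path) and
    return the shortest contig in which consecutive reads all overlap."""
    best = None
    for order in permutations(reads):
        contig = order[0]
        for prev, nxt in zip(order, order[1:]):
            k = overlap(prev, nxt, min_len)
            if k == 0:
                break  # this ordering has a missing edge in the overlap graph
            contig += nxt[k:]
        else:
            if best is None or len(contig) < len(best):
                best = contig
    return best


print(assemble(["CGTACGT", "ACGTTAGC", "ATGCGTA"]))
```

Trying all orderings costs n! for n reads, which is workable for three reads and hopeless for millions; that growth is the practical difficulty the card describes.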
What do transcription factors do? - ANSWER control activity of RNA polymerase
Transcription terminator loop - ANSWER formation of hairpin loop made from complementary bases followed by poly-U that destabilizes RNA polymerase; can recognize palindromic sequence
Non-coding tRNA - ANSWER conserved structure of tRNA can be used to identify tRNA genes
Why is it harder to detect genes in eukaryotes? - ANSWER introns interrupt reading frame and can be large, complex promoters, pseudogenes, variable RNA pol/TF binding sites
How can you discover exons and introns? - ANSWER map RNA molecule sequence back to genome sequence; can identify alternative splicing
Problem with RNA seq - ANSWER not all genes transcribed all the time in all tissues
Physical separation of DNA limitation - ANSWER 1 template per reaction, labor intensive to create BAC libraries, fragment can be lost or incompatible with host
Template amplification limitation - ANSWER mistakes in amplification, hard to amplify some regions, need reagents and to know flanking sequences
Sequencing limitations - ANSWER synthesis/sequencing are separate steps, expensive fluorescent terminators, large reaction volume, limited read length
Template library with water-oil emulsion setup - ANSWER ligate template to universal primers and attach to beads that are separated using water-oil emulsion, where each droplet contains a bead
Water-oil emulsion sequencing - ANSWER amplify in each droplet, create large library with different fragments in each drop, sequence on glass surface/wells
Benefits of water-oil emulsion - ANSWER no cloning host so don't lose incompatible sequences
Pyrosequencing - ANSWER after a base is added, pyrophosphate (PPi) is released, used as substrate to generate ATP by ATP sulfurylase, which is used by luciferase to create detectable light
Advantages of pyrosequencing - ANSWER natural nucleotides reduces sequencing errors
Ion torrent - ANSWER release of protons from the addition of a nucleotide can be detected with a voltage meter
Solid phase template amplification - ANSWER attach DNA template to solid surface covered with universal primers; bridge amplification; high density
Cyclic reversible termination - ANSWER solid phase chemiluminescent signal not strong enough to detect, determine sequence using reversible blocking group that allows elongation after signal detection
3' blocked reversible terminators - ANSWER unincorporated nucleotides washed away, 4 color imaging acquired, cleavage step removes terminating group and fluorescent label to restore 3'-OH
How are solid phase sequences read? - ANSWER scan glass slide after each cycle of fluorescent terminator addition which creates light dot; short reads because of errors
Single molecule real time long read sequencing - ANSWER DNA pol anchored to well, sequences phospholinked hexaphosphate nucleotides, read fluorescent intensity
What is machine learning used for? - ANSWER annotate functional elements, generate new testable hypotheses, combine to build predictive models
Supervised learning - ANSWER requires a set of known examples (training set) to build a predictive model, which is used to make predictions on a new data set (testing set); can continue cycle to improve
Example of supervised learning - ANSWER created a position frequency matrix from an alignment of promoter sequences to determine probability that a sequence is a promoter
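A position frequency matrix of this kind can be sketched as follows. The aligned sites are invented examples, the pseudocount is an assumption added so unseen bases do not get probability zero, and the function names are hypothetical.

```python
def position_frequency_matrix(aligned, alphabet="ACGT", pseudocount=1.0):
    """One dict per alignment column giving the frequency of each base."""
    pfm = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        pfm.append({b: (column.count(b) + pseudocount) / total for b in alphabet})
    return pfm


def sequence_probability(seq, pfm):
    """Probability of seq under the model, treating positions as independent."""
    p = 1.0
    for base, column in zip(seq, pfm):
        p *= column[base]
    return p


# Training set: made-up aligned promoter-like sites.
training = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pfm = position_frequency_matrix(training)
print(sequence_probability("TATAAT", pfm), sequence_probability("GCGCGC", pfm))
```

The training set builds the model and new sequences are scored against it, which is the supervised pattern described in the cards above.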
Unsupervised learning - ANSWER discover structures unlikely to appear by chance; may reveal undiscovered biological functions
Iterative improvement of models - ANSWER train model/discover unlikely signal, predict or generate new hypotheses, validate predictions and use them to continue to train model
Generative model - ANSWER build from all features characterizing a class (Hidden Markov, Position Frequency Matrix), interpretable description of class and quantifiable variability; generates example; requires more training
Discriminative models - ANSWER focuses on distinctive features, more efficient but not useful when context changes
Precision - ANSWER true positive / (true positive + false positive); how many selected items are actually the item of interest
Sensitivity (recall) - ANSWER true positive / (true positive + false negative); how many of the relevant items are selected
Specificity - ANSWER true negative / (true negative + false positive); how many of the non-relevant items are correctly left unselected
Accuracy - ANSWER (true positive + true negative) / everything; how often is the model actually correct
Precision recall curve - ANSWER tradeoff between precision and sensitivity (recall); high precision means few false positives, high sensitivity means few false negatives
Genome replication in bacteria - ANSWER single origin of replication, proceeds in two forks in opposite direction, termination happens when ter elements trap replication forks
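The four measures can be computed side by side from a confusion matrix. The counts in the example are invented, and the function name is an assumption for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, sensitivity (recall), specificity, and accuracy
    from true/false positive and negative counts."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical predictor evaluated on 100 labeled sequences.
print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

Note that precision and specificity both penalize false positives but use different denominators, so they can diverge sharply when true negatives greatly outnumber true positives.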
DNA pol III - ANSWER main polymerase for DNA synthesis
DNA pol I and II - ANSWER mainly used for DNA repair
Origin of replication - ANSWER cluster of DnaA binding sites
DnaA - ANSWER protein that binds and loops genomic DNA to promote unwinding at AT rich region
DnaB - ANSWER helicase binds melted region with the help of DnaC to initiate genome replication
RNA primase - ANSWER make RNA primer for DNA to extend
DNA polymerase I - ANSWER excise RNA primer, fills gaps to synthesize remaining DNA
Tus protein - ANSWER binds ter sequences, inhibits helicase activity when polymerase hits site facing the opposite direction and stops replication
leads to aging/cell death
Telomerase enzyme - ANSWER catalytic enzyme with a built-in RNA template; attaches to 3' end of chromosome and elongates it
Why are telomeres repeated sequences? - ANSWER easy to create repeated short sequence with telomerase
G1 stage - ANSWER initiate replication at multiple locations
S phase - ANSWER replication forks move outward and replicate genome
G2 phase - ANSWER fully replicate genome, prep for mitosis
How were yeast cells synchronized? - ANSWER checkpoint deficient mutants, limited DNA replication to regions close to origin of replication
Marker frequency analysis - ANSWER quantify number of copies in a cell; localize origin of replications and calculate speed of replication forks along the chromosome
How to determine copy number of each genomic region - ANSWER count the number of times fragments were sequenced with shotgun sequencing and mapping approach; should see regions close to origin replicated 2x as often
Why normalize marker frequencies against non-replicating cells? - ANSWER raw sequence efficiency not equal throughout genome; some sequences are more difficult to read
What does a peak mean in a relative copy number profile? - ANSWER corresponds to number of times the fragment was seen; means replication started there
How can dynamics of replication be characterized? - ANSWER time course experiment on synchronized cells, look at relative copy number to determine time it takes to replicate
Asynchronous experiment - ANSWER genomic regions close to the origin of replications have copy number higher than regions further away
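Marker frequency analysis can be sketched as a normalization followed by peak finding. The read counts per genomic bin are invented, the scaling by total read depth is an assumed convention, and the helper names are hypothetical.

```python
def relative_copy_number(replicating, non_replicating):
    """Divide read counts from replicating cells by counts from non-replicating
    cells, after scaling both samples to the same total depth."""
    scale = sum(non_replicating) / sum(replicating)
    return [r * scale / n for r, n in zip(replicating, non_replicating)]


def find_peaks(profile):
    """Indices of bins higher than both neighbors (candidate origins)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]


# Bin 4 is easy to sequence (high counts in both samples), not an origin.
replicating = [100, 150, 200, 150, 260, 120, 100]
non_replicating = [100, 100, 100, 100, 200, 100, 100]
print(find_peaks(replicating))                                         # raw counts
print(find_peaks(relative_copy_number(replicating, non_replicating)))  # normalized
```

In this made-up data the raw counts suggest two peaks, but the second disappears after normalization because that bin is simply over-represented in every sample.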
Where do more errors occur? - ANSWER lagging DNA strand; more complex
DNA polymerase error rate - ANSWER 1 error in 10^7 nucleotides; overall rate reduced to 1 in 10^10 because of repair mechanisms (1 in every 2000 genome copies)
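The "1 in every 2000 genome copies" figure follows from multiplying the per-nucleotide error rate by genome size. The sketch below assumes a bacterial-sized genome of 5 million base pairs, which is an assumption chosen to match that figure rather than a number stated in these notes; the human genome size is the 3 billion base pairs given earlier.

```python
def expected_errors_per_copy(genome_size_bp, error_rate_per_nt):
    """Expected number of replication errors each time the genome is copied."""
    return genome_size_bp * error_rate_per_nt


bacterial = expected_errors_per_copy(5_000_000, 1e-10)   # assumed 5 Mb genome
human = expected_errors_per_copy(3_000_000_000, 1e-10)   # 3 billion bp
print(f"one error per {1 / bacterial:.0f} bacterial genome copies")
print(f"{human:.1f} errors per human genome copy")
```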
Transition - ANSWER purine to purine (A to G) or pyrimidine to pyrimidine (C to T)
Transversion - ANSWER Purine to pyrimidine (A/G to C/T)
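The two substitution classes reduce to a check of whether both bases belong to the same chemical class. A minimal sketch for DNA bases, with a hypothetical function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Return 'transition' if both bases are purines or both are pyrimidines,
    otherwise 'transversion'."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("A", "G"), classify_substitution("A", "C"))
```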
When is a mutation propagated? - ANSWER when the error is replicated
What happens when bases are mismatched? - ANSWER polymerase is slowed, exonuclease is faster (normally it is the opposite)
Proof reading - ANSWER competition between polymerase and exonuclease; when mismatch occurs, polymerase activity reduced, mismatched nucleotide excised
Alternative nucleotide tautomers - ANSWER normally biased towards correct structural isoform, but occasional shift in replication causes incorporation of wrong nucleotide
Base analogs - ANSWER incorporated into DNA and cause mutations; 5-bromouracil
Enol form of thymine - ANSWER base pairs G not A
detected by protein complex; recognition of stalled RNA pol at damaged nucleotides
How does excision machinery work? - ANSWER entire segment of a dozen nucleotides containing damaged nucleotides excised and resynthesized using DNA pol and ligase
Most common mechanism to repair damaged DNA - ANSWER excision repair
DNA adenine methylase (Dam) - ANSWER adds methyl group to adenine bases at GATC site
Strand specific mismatch repair - ANSWER repair machinery uses methylation patterns to distinguish newly synthesized strand (won't have methyl group) from old template, mismatches repaired on new strand
Double strand breaks - ANSWER most catastrophic, difficult to repair
Repairing double strand breaks - ANSWER can rejoin directly, but leads to major chromosomal rearrangements if there are multiple breaks
Homologous recombination - ANSWER if multiple copies of the genome are available, repair breaks with this method, single strand of non-damaged DNA invades double strand break and allows replication to occur
What does homologous recombination often lead to? - ANSWER resolution of branch migration can lead to a crossover
Deinococcus radiodurans - ANSWER one of the most radiation resistant organisms; efficient DNA repair; each cell has 4 copies of genome in stationary phase; 8-10 during replication
How does D. radiodurans reconnect chromosome fragments? - ANSWER single strand annealing; homologous recombination to reconstruct full genome from multiple pieces
What is needed to map double strand breaks into genome? - ANSWER proximal (linked to biotin) and distal DNA linkers w/ hairpin structures, barcodes, deep sequencing
Aphidicolin - ANSWER DNA polymerase inhibitor that stalls replication fork, increases DSBs
Neocarzinostatin - ANSWER induces DSBs directly, more randomly
What did mapping of DSBs show? - ANSWER distribution not uniform in genome
Where are DSBs enriched? - ANSWER satellite repeats that form hairpins, pericentromeric and centromeric regions; transcribed regions with high density protein coding
Autopolyploidy - ANSWER more than two sets of chromosomes; common in plants
What does autopolyploidy allow? - ANSWER potential for gene expansion/diversification because extra genes can get mutations without strong selective pressure
Allopolyploidy - ANSWER interbreeding between two different species with viable hybrid
What can homologous DNA recombination cause? - ANSWER generate large genome rearrangements, fragment losses, sequence duplications
Segmental duplication - ANSWER homologous/non-homologous recombination underlie large chromosomal rearrangements that cause gene duplication/deletion/fusion
Unequal crossing over - ANSWER repeat sequences in homologous chromosome pairs
Gene transfer agents - ANSWER prophages under control of host, transfer random pieces of genome
Intracellular transfer - ANSWER endosymbiont can exchange genetic material with eukaryotic hosts; mitochondria and plastids, gene transfer through phagotrophy
Web of Life - ANSWER ubiquity of HGT in microbes makes it difficult to determine lineages
How to detect HGT - ANSWER disagreements between tree structures
Reference tree construction - ANSWER use genes unlikely to be transferred (ribosomal proteins, core processes), multi-locus sequence analysis
Pathogenicity islands - ANSWER large genomic regions that contain genes involved in the synthesis of virulence factors; probably acquired by HGT; found in pathogenic strains of species
Sequence characteristic evidence for HGT - ANSWER GC content, codon use; regions whose composition differs from the rest of the genome probably came from somewhere else
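A sliding-window GC scan is one way to sketch this kind of evidence. The window size, threshold, toy genome, and function names below are all arbitrary assumptions for illustration; real analyses use much larger windows and also compare codon usage.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=10, threshold=0.2):
    """Return (start, GC) for non-overlapping windows whose GC content
    differs from the genome-wide value by more than `threshold`."""
    overall = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, window):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Toy genome: AT-rich background with a GC-rich insert at position 30.
genome = "ATATATGCAT" * 3 + "GCGCGCGCGC" + "ATATATGCAT" * 3
print(atypical_windows(genome))
```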
HGT frequency - ANSWER occur more frequently between closely related organisms; sequence similarity promotes homologous recombination
Genome reduction in symbiotic bacteria - ANSWER live in stable/predictable conditions, need fewer functions, without selective pressure, lose genes through large chromosomal rearrangements
Core genome - ANSWER gene families present in all genomes of a group
Pan genome - ANSWER collection of all genes present in members of a group; shows wide range of biological function; represents set of genes potentially available via HGT
Accessory genome - ANSWER genes present in only one or a few members
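Core, pan, and accessory genomes map naturally onto set operations. The strain names and gene families below are invented, and the sketch treats the accessory genome as everything in the pan genome that is not core, which is a common simplification of "present in only one or a few members".

```python
def pan_core_accessory(genomes):
    """genomes: dict mapping strain name -> iterable of gene family names.
    Returns (pan, core, accessory) as sets."""
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set.union(*gene_sets)            # every gene seen in any member
    core = set.intersection(*gene_sets)    # genes shared by all members
    accessory = pan - core                 # simplification: non-core genes
    return pan, core, accessory


strains = {
    "strain_A": ["rpoB", "gyrA", "toxin1"],
    "strain_B": ["rpoB", "gyrA", "phageX"],
    "strain_C": ["rpoB", "gyrA"],
}
pan, core, accessory = pan_core_accessory(strains)
print(sorted(pan), sorted(core), sorted(accessory))
```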
Original phylogeny basis - ANSWER look at physical features, limited to what you can see, difficult/impossible for microbes
Molecular phylogeny - ANSWER look at rRNA, use sequence information, each position in the gene is a trait
Phylogenetic tree inference - ANSWER comes from available data and assumptions about the mechanisms of evolution; many possible different trees
Maximum parsimony - ANSWER minimizes number of changes to explain evolution, no evolutionary basis
Drawbacks to maximum parsimony - ANSWER does not correct for multiple substitutions at the same site, sequence evolution not always parsimonious
Distance based phylogeny - ANSWER calculate distances between sequences, create matrix, construct tree based on pairwise distances, dependent on data scoring
Neighbor-joining method - ANSWER find nodes close to each other, far from everyone else, join together and recalculate distance to remaining nodes
Agglomerative clustering method - ANSWER look at matrix, find smallest distance between neighbors and link
Benefits of neighbor-joining method - ANSWER computationally efficient, useful for large datasets
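One step of neighbor joining can be sketched with the standard Q criterion, Q(i, j) = (n - 2) * d(i, j) - sum of d(i, k) - sum of d(j, k): the pair with the lowest Q is close to each other and far from everyone else. The five-taxon distance matrix is an illustrative example, and the function name is hypothetical; a full implementation would then merge the pair, recompute distances, and repeat.

```python
def nj_pair_to_join(names, dist):
    """Return (Q, name_i, name_j) for the pair minimizing the neighbor-joining
    Q criterion on a symmetric distance matrix."""
    n = len(names)
    totals = [sum(row) for row in dist]  # total distance of each node to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, names[i], names[j])
    return best


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(nj_pair_to_join(names, dist))
```

In this example d and e have the smallest raw distance (3), yet a and b are joined first because they are also far from the remaining nodes; simple agglomerative clustering on the smallest distance would have picked d and e instead.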