Genomics and Genetic Mapping: Techniques and Applications, Exams of Advanced Education

A comprehensive overview of key concepts in genomics and genetic mapping, focusing on techniques for distinguishing genetic variations and mapping genes. It covers sequencing methods (Sanger, Maxam-Gilbert), genetic markers (RFLP, SSLP, SNP), and mapping strategies (restriction mapping, FISH, STS), and discusses the sequence alignment algorithms and scoring matrices used in genomics. It is useful for students of genetics, molecular biology, and bioinformatics, covering both theoretical and practical aspects of genome analysis and mapping. It provides definitions and explanations of genetic and genomic concepts for exam preparation.

Typology: Exams

2024/2025

Available from 07/13/2025

Prof-Cornel

MMG 433 EXAM WITH COMPLETE SOLUTION

Distinguishing benign from pathogenic genetic variation - ANSWER large sample size of sick and healthy patients

Size of human genome - ANSWER 3 billion base pairs

Personalized/precision medicine - ANSWER use genetic info to tailor treatments, anticipate onset of disease, manage health

Hybrid crops - ANSWER superior to either parental lineage; hard to maintain

Challenge with hybrid crops - ANSWER epistatic effects of multiple alleles

Genomics - ANSWER Discipline of mapping, sequencing, analyzing, and comparing genomes; molecular biology, genetics, bioinformatics

Goals of genomics - ANSWER sequence/assemble genomes, identify functional elements, maintenance/evolutionary processes, diversity, expression, phenotypes, alter/design living organisms

Statistical significance - ANSWER calculated as P(mutation | disease)

Confidence that mutation causes disease - ANSWER depends on occurrence of that mutation in the general population

Koch's postulates - ANSWER

  1. Microbe found in sick, not healthy, organisms
  2. Must be able to isolate from diseased organism
  3. Microbe causes disease when introduced to healthy organism
  4. Reisolate from inoculated, diseased host

Limitations of Koch's postulates - ANSWER

  1. can isolate from healthy & diseased patients
  2. can't always grow in pure culture
  3. differing susceptibilities; can't infect humans

Sequencing by synthesis - ANSWER uses DNA polymerase, sequence template, oligo primer, 4 dNTPs; add one nucleotide type at a time and quantify incorporation into synthesized strand
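
The "add one nucleotide type at a time and quantify incorporation" idea can be sketched as a toy simulation. This is only an illustration: the function names, the flow order "TACG", and the example sequence are assumptions, not part of the course material.

```python
def flowgram(new_strand, flow_order="TACG", cycles=2):
    """Simulate flowing one nucleotide type at a time over a growing strand.

    new_strand is the sequence the polymerase will synthesize (5'->3').
    Returns a list of (base, number_incorporated) signals, one per flow.
    """
    position = 0
    signals = []
    for _ in range(cycles):
        for base in flow_order:
            incorporated = 0
            # A run of identical bases is incorporated within a single flow.
            while position < len(new_strand) and new_strand[position] == base:
                incorporated += 1
                position += 1
            signals.append((base, incorporated))
    return signals


def decode(signals):
    """Rebuild the synthesized sequence from the per-flow signals."""
    return "".join(base * count for base, count in signals)


signals = flowgram("TTAGGC", cycles=2)
print(signals)
print(decode(signals))
```

The signal size for each flow is proportional to the number of bases incorporated, which is why long runs of the same nucleotide are harder to quantify accurately.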

Sanger-Coulson method - ANSWER 1 or 3 nucleotides added to synthesis reactions to stop synthesis at specific locations; requires 8 reactions and 8 electrophoresis lanes

Minus - ANSWER polymerase stops at missing base

Plus - ANSWER T4 DNA polymerase 3' exonuclease stalled by dNTP

Maxam-Gilbert method - ANSWER ds DNA chemically altered to cleave fragments at specific bases; 4 reactions & 4 electrophoresis lanes required; A+G, G, C, C+T

Problems with Maxam-Gilbert method - ANSWER cumbersome, limited resolution, repeats of same nucleotide difficult to resolve

Sanger Sequencing chain termination - ANSWER dNTPs = DNA elongation, ddNTPs = chain terminators, causes polymerization to stop randomly but at specific base

DNA sequencing by chain-termination - ANSWER family of fragments with different sizes recovered from reaction based on final nucleotide; can view repeats

Polyacrylamide gels - ANSWER can resolve DNA fragment sizes w/ a resolution of a single base; ~200 nucleotides read with high accuracy

Sanger sequencing improvements - ANSWER capillaries allow for faster electrophoresis, smaller footprint, fluorescently labeled ddNTPs allow for laser detection; automate and parallelize for higher throughput; ~1,000 bases

Limitations of Sanger sequencing - ANSWER large amount of template DNA, need flanking DNA of known sequences, expensive ddNTPs, have to sequence 10x to get high confidence

Clone library - ANSWER convenient way to prep template DNA for sequencing

How are clone libraries made? - ANSWER DNA fragments cloned into vectors, transformed into bacterial/yeast hosts to replicate and archive the DNA

How are clone libraries used? - ANSWER isolate unique fragments from single colonies, sequence them, then map them

EST - ANSWER Expressed Sequence Tag; a library of short sequences from transcribed genes, used to help locate genes when sequencing large genomes

What do mRNAs represent? - ANSWER small fraction of the genome, but contain the protein coding sequences (introns are spliced out of mature mRNA)

3 laws of genetics - ANSWER allele segregation, independent assortment, allele dominance

Genetic linkage - ANSWER alleles on the same chromosome are inherited together

Partial linkage - ANSWER some alleles are neither independent nor fully linked;

digestion

FISH (fluorescent in situ hybridization) - ANSWER hybridization of fluorescently labeled DNA probes that can reveal physical locations of specific sequences on chromosome

STS (sequence tagged site mapping) - ANSWER map positions of short DNA sequences unique in the genome using PCR/hybridization

What is needed for STS? - ANSWER collection of overlapping DNA fragments from a single chromosome

How does STS work? - ANSWER if 2 fragments have same STS, then must overlap; if 1 fragment has two different STSs, then those two STSs must be near each other
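
The overlap rule for STS mapping can be written down as a short sketch. The fragment and STS names below are hypothetical, and the function name is an assumption made for illustration.

```python
from itertools import combinations


def overlapping_fragments(sts_hits):
    """Return pairs of fragments that share at least one STS.

    sts_hits maps a fragment name to the set of STSs detected on it
    (for example by PCR). Sharing an STS implies the fragments overlap.
    """
    pairs = []
    for a, b in combinations(sorted(sts_hits), 2):
        if sts_hits[a] & sts_hits[b]:
            pairs.append((a, b))
    return pairs


# Hypothetical PCR results: fragment -> STSs found on it.
hits = {
    "frag1": {"sts_A", "sts_B"},
    "frag2": {"sts_B", "sts_C"},
    "frag3": {"sts_D"},
}
print(overlapping_fragments(hits))
```

Here frag1 and frag2 must overlap because both carry sts_B, and sts_A and sts_B must lie near each other because both sit on frag1.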

Shotgun sequencing of small genomes - ANSWER genome assembly from small fragments relies on ability to find overlapping sequences; requires efficient computational methods

Sequence alignment - ANSWER arranging two or more sequences to identify regions of similarity; identify matches, insertions/deletions, substitutions

Dot plots - ANSWER provide visual overview of possible alignments but do not determine optimal alignment
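
A dot plot is simple to build by hand. The following minimal sketch prints a text version; the function name and the two example sequences are assumptions chosen for illustration.

```python
def dot_plot(seq_a, seq_b):
    """Return a text dot plot: rows follow seq_a, columns follow seq_b.

    A '*' marks a position where the two residues are identical.
    """
    rows = []
    for a in seq_a:
        rows.append("".join("*" if a == b else "." for b in seq_b))
    return rows


for row in dot_plot("GATTACA", "GATACA"):
    print(row)
```

Diagonal runs of '*' suggest candidate alignments, but the plot itself does not pick the best one.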

Needleman-Wunsch algorithm - ANSWER recursive alignment process that aligns each position relative to best alignment; only considers subset of all possible alignments but guarantees to find optimal mathematical solution

Dynamic programming for optimal alignment - ANSWER get alignment with best score; not always biologically meaningful
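
A minimal sketch of Needleman-Wunsch global alignment follows. The scoring values (match +1, mismatch -1, gap -1) are assumptions picked for illustration; a real analysis would use a substitution matrix and tuned gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(top)
print(bottom)
```

Each cell is filled from three neighbors only, so the table has (n+1)(m+1) entries instead of enumerating every possible alignment. The result is optimal for the chosen scores, which is not the same as biologically meaningful.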

What determines best scoring alignment? - ANSWER depends on scoring matrix used; calculate statistical significance of alignment score

Sequence homology - ANSWER two similar sequences found in genomes of two different organisms may be related through evolution

Selective pressure - ANSWER random mutations make sequences diverge but selective pressure to preserve function will impose sequence conservation at critical positions

Multiple sequence alignment (MSA) - ANSWER identify conserved sequences across organisms; create evolutionary trees

Problems with MSA - ANSWER computationally expensive to find optimal alignments; have to use heuristic approach that doesn't guarantee optimal alignments

Optimal pairwise alignment - ANSWER to find biologically meaningful alignments, can't use percent identity because not all AA substitutions are equally likely

P(ab) - ANSWER probability of substitution; determined empirically from trusted set of aligned sequences

PAM (point accepted mutations) - ANSWER observed mutations in closely related proteins, does not work well for aligning distantly related proteins

BLOSUM (block substitution matrix) - ANSWER observed mutations in distantly related proteins, only use blocks of conserved positions to calculate substitution rates

BLOSUM62 - ANSWER built from the BLOCKS database of trusted alignments (highly conserved, gap-free sequence blocks); sequences that are >= 62% identical are clustered so substitutions are counted between sequences with < 62% identity; calculate background frequency of each residue, calculate substitution rate for each residue pair

What does scoring matrix choice depend on? - ANSWER source of sequences to be aligned and evolutionary distance between sequences

What do specialized scoring matrices do? - ANSWER take into account protein structure, develop sequence context to improve alignment accuracy and biological relevance

Common amino acids vs rare amino acids - ANSWER low weights vs high weights

Positive vs negative - ANSWER more likely substitutions vs less likely substitutions
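
The sign and size of a matrix entry follow from a log-odds ratio: the observed substitution probability divided by what chance alone would give from the background frequencies. The sketch below assumes p_ab is the probability of the ordered pair (a, b) and uses a scale of 2 (half-bit units, as in BLOSUM matrices); the numeric values are made up for illustration.

```python
import math


def log_odds_score(p_ab, q_a, q_b, scale=2.0):
    """Rounded log-odds score for aligning residue a with residue b.

    p_ab: observed probability of the aligned pair (assumed input)
    q_a, q_b: background frequencies of each residue
    """
    return round(scale * math.log2(p_ab / (q_a * q_b)))


# Pair seen twice as often as chance predicts: positive score.
print(log_odds_score(0.02, 0.1, 0.1))
# Pair seen half as often as chance predicts: negative score.
print(log_odds_score(0.005, 0.1, 0.1))
```

Because the background frequencies sit in the denominator, matching two rare residues earns a higher score than matching two common ones.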

Sequence databases - ANSWER searches rely on sequence alignments; matches are never perfect because of mutations/sequencing errors, so searches have to return imperfect matches and provide statistical significance

BLAST (Basic Local Alignment Search Tool) - ANSWER compares a sequence of interest with all other DNA or protein sequences deposited in sequence databases; seed and expand alignment approach

Score - ANSWER pairwise alignment score calculated from the scoring matrix (e.g., BLOSUM62)

Query cover - ANSWER percentage overlap between query sequence and database hit

E-value - ANSWER expected number of hits returned by chance given size of the query sequence and the database (false positives)

Max identity - ANSWER percentage of identical positions between aligned sequences

GenBank - ANSWER NIH database that is an annotated collection of all publicly available DNA sequences

Hamiltonian cycle - ANSWER represent sequencing results as nodes in a graph, connect overlapping sequences with arrow, find the path that travels through every node exactly once and ends at the starting node

Why is the Hamiltonian cycle difficult? - ANSWER requires trillions of pairwise alignments; no efficient algorithm to find it with millions of nodes; feasible for microbial genomes
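
The graph idea can be sketched on a toy read set. This brute-force version looks for a Hamiltonian path (a linear contig) rather than a closed cycle, and the reads, function names, and minimum overlap of 3 are all assumptions for illustration.

```python
from itertools import permutations


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def hamiltonian_assembly(reads, min_len=3):
    """Try every read order; return the first contig that uses each read once."""
    for order in permutations(reads):
        overlaps = [overlap(x, y, min_len) for x, y in zip(order, order[1:])]
        if all(overlaps):
            contig = order[0]
            for read, k in zip(order[1:], overlaps):
                contig += read[k:]  # append only the non-overlapping tail
            return contig
    return None


print(hamiltonian_assembly(["CTAAT", "ATGGC", "GGCTA"]))
```

Trying all orders costs n! path checks for n reads, which is why this approach breaks down long before millions of reads.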

What do transcription factors do? - ANSWER control activity of RNA polymerase

Transcription terminator loop - ANSWER formation of hairpin loop made from complementary bases followed by poly-U that destabilizes RNA polymerase; can recognize palindromic sequence

Non-coding tRNA - ANSWER conserved structure of tRNA can be used to identify tRNA genes

Why is it harder to detect genes in eukaryotes? - ANSWER introns interrupt reading frame and can be large, complex promoters, pseudogenes, variable RNA pol/TF binding sites

How can you discover exons and introns? - ANSWER map RNA molecule sequence back to genome sequence; can identify alternative splicing

Problem with RNA seq - ANSWER not all genes transcribed all the time in all tissues

Physical separation of DNA limitation - ANSWER 1 template per reaction, labor intensive to create BAC libraries, fragment can be lost or incompatible with host

Template amplification limitation - ANSWER mistakes in amplification, hard to amplify some regions, need reagents and to know flanking sequences

Sequencing limitations - ANSWER synthesis/sequencing are separate steps, expensive fluorescent terminators, large reaction volume, limited read length

Template library with water-oil emulsion setup - ANSWER ligate template to universal primers and attach to beads that are separated using water-oil emulsion, where each droplet contains a bead

Water-oil emulsion sequencing - ANSWER amplify in each droplet, create large library with different fragments in each drop, sequence on glass surface/wells

Benefits of water-oil emulsion - ANSWER no cloning host so don't lose incompatible sequences

Pyrosequencing - ANSWER after a base is added, pyrophosphate (PPi) is released and used as substrate by ATP sulfurylase to generate ATP, which is used by luciferase to create detectable light

Advantages of pyrosequencing - ANSWER natural nucleotides reduces sequencing errors

Ion torrent - ANSWER release of protons from the addition of a nucleotide can be detected with a voltage meter

Solid phase template amplification - ANSWER attach DNA template to solid surface covered with universal primers; bridge amplification; high density

Cyclic reversible termination - ANSWER solid phase chemiluminescent signal not strong enough to detect, determine sequence using reversible blocking group that allows elongation after signal detection

3' blocked reversible terminators - ANSWER unincorporated nucleotides washed away, 4 color imaging acquired, cleavage step removes terminating group and fluorescent dye to restore 3'-OH

How are solid phase sequences read? - ANSWER scan glass slide after each cycle of fluorescent terminator addition which creates light dot; short reads because of errors

Single molecule real time long read sequencing - ANSWER DNA polymerase anchored to well, incorporates phospholinked hexaphosphate nucleotides, read fluorescent intensity

What is machine learning used for? - ANSWER annotate functional elements, generate new testable hypotheses, combine to build predictive models

Supervised learning - ANSWER requires a set of known examples (training set) to build a predictive model, which is used to make predictions on a new data set (testing set); can continue cycle to improve

Example of supervised learning - ANSWER created a position frequency matrix from an alignment of promoter sequences to determine probability that a sequence is a promoter
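
A position frequency matrix can be built in a few lines. The sketch below is an assumption-laden illustration: the aligned "promoter" sites are invented, a pseudocount of 1 is added to avoid zero probabilities, and positions are treated as independent.

```python
def position_frequency_matrix(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        matrix.append(
            {base: (column.count(base) + pseudocount) / total for base in alphabet}
        )
    return matrix


def sequence_probability(seq, matrix):
    """Probability of seq under the matrix, assuming independent positions."""
    prob = 1.0
    for base, column in zip(seq, matrix):
        prob *= column[base]
    return prob


# Hypothetical training set of aligned promoter elements.
training_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pfm = position_frequency_matrix(training_sites)
print(sequence_probability("TATAAT", pfm))
print(sequence_probability("GCGCGC", pfm))
```

The training set supplies the known examples; scoring new sequences with the matrix is the prediction step.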

Unsupervised learning - ANSWER discover structures unlikely to appear by chance; may reveal undiscovered biological functions

Iterative improvement of models - ANSWER train model/discover unlikely signal, predict or generate new hypotheses, validate predictions and use them to continue to train model

Generative model - ANSWER build from all features characterizing a class (Hidden Markov, Position Frequency Matrix), interpretable description of class and quantifiable variability; generates example; requires more training

Discriminative models - ANSWER focuses on distinctive features, more efficient but not useful when context changes

Precision - ANSWER true positive / (true positive + false positive); how many selected items are actually the item of interest

Sensitivity (recall) - ANSWER true positive / (true positive + false negative); how many of the relevant items are selected

Specificity - ANSWER true negative / (true negative + false positive); how many of the non-relevant items are correctly left unselected

Accuracy - ANSWER (true positive + true negative) / everything; how often is the model actually correct

Precision recall curve - ANSWER tradeoff between precision and sensitivity (recall); high precision means few false positives, high sensitivity means few false negatives
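
The four metrics above can be computed directly from confusion-matrix counts. The counts in this sketch are made up for illustration, and the function name is an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute precision, sensitivity, specificity, and accuracy from counts."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical evaluation of a promoter predictor on 100 sequences.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(name, round(value, 3))
```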

Genome replication in bacteria - ANSWER single origin of replication, proceeds in two forks in opposite directions, termination happens when ter elements trap replication forks

DNA pol III - ANSWER main polymerase for DNA synthesis

DNA pol I and II - ANSWER mainly used for DNA repair

Origin of replication - ANSWER cluster of DnaA binding sites

DnaA - ANSWER protein that binds and loops genomic DNA to promote unwinding at AT rich region

DnaB - ANSWER helicase binds melted region with the help of DnaC to initiate genome replication

RNA primase - ANSWER make RNA primer for DNA to extend

DNA polymerase I - ANSWER excise RNA primer, fills gaps to synthesize remaining DNA

Tus protein - ANSWER binds ter sequences, inhibits helicase activity when polymerase hits site facing the opposite direction and stops replication

leads to aging/cell death

Telomerase enzyme - ANSWER catalytic enzyme with a built-in RNA template, attaches to 3' end of chromosome and elongates it

Why are telomeres repeated sequences? - ANSWER easy to create repeated short sequence with telomerase

G1 stage - ANSWER initiate replication at multiple locations

S phase - ANSWER replication forks move outward and replicate genome

G2 phase - ANSWER fully replicate genome, prep for mitosis

How were yeast cells synchronized? - ANSWER checkpoint deficient mutants, limited DNA replication to regions close to origin of replication

Marker frequency analysis - ANSWER quantify number of copies in a cell; localize origins of replication and calculate speed of replication forks along the chromosome

How to determine copy number of each genomic region - ANSWER count the number of times fragments were sequenced with shotgun sequencing and mapping approach; should see regions close to origin replicated 2x as often

Why normalize marker frequencies against non-replicating cells? - ANSWER raw sequence efficiency not equal throughout genome; some sequences are more difficult to read

What does a peak mean in a relative copy number profile? - ANSWER corresponds to number of times the fragment was seen; means replication started there
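
The normalization step can be sketched with plain lists. The read counts below are invented, the function name is an assumption, and each list holds counts for the same ordered genomic bins.

```python
def relative_copy_number(replicating, reference):
    """Normalize read counts from replicating cells against non-replicating cells.

    Each count is first converted to a fraction of its sample's total reads,
    so differences in sequencing depth cancel out.
    """
    rep_total = sum(replicating)
    ref_total = sum(reference)
    return [
        (rep / rep_total) / (ref / ref_total)
        for rep, ref in zip(replicating, reference)
    ]


# Hypothetical counts per bin; bin 0 is hard to sequence in both samples.
replicating_counts = [100, 150, 100, 150]
reference_counts = [50, 100, 100, 100]
profile = relative_copy_number(replicating_counts, reference_counts)
print([round(value, 2) for value in profile])
print("peak at bin", profile.index(max(profile)))
```

In the raw counts bin 0 looks unremarkable, but after dividing by the non-replicating reference it becomes the peak, which is the reason for normalizing.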

How can dynamics of replication be characterized? - ANSWER time course experiment on synchronized cells, look at relative copy number to determine time it takes to replicate

Asynchronous experiment - ANSWER genomic regions close to the origins of replication have copy number higher than regions further away

Where do more errors occur? - ANSWER lagging DNA strand; more complex

DNA polymerase error rate - ANSWER 1 error in 10^7 nucleotides; overall rate reduced to 1 in 10^10 because of repair mechanisms (1 in every 2000 genome copies)
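
The "1 in every 2000 genome copies" figure follows from the per-nucleotide rate once a genome size is fixed. The sketch below assumes a genome of about 5 million base pairs (roughly bacterial size); that value is an assumption, not stated in the card.

```python
def genome_copies_per_error(error_rate, genome_size):
    """Average number of genome copies made per replication error."""
    errors_per_copy = error_rate * genome_size
    return 1 / errors_per_copy


# Assumed genome size of 5 Mb.
print(round(genome_copies_per_error(1e-10, 5_000_000)))
print(round(genome_copies_per_error(1e-7, 5_000_000)))
```

Without repair (1 in 10^7), the same genome would pick up an error about once every two copies.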

Transition - ANSWER purine to purine (A to G) or pyrimidine to pyrimidine (C to T)

Transversion - ANSWER Purine to pyrimidine (A/G to C/T)
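
The two definitions above translate into a small classifier. The function and constant names are assumptions made for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Label a single-base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    # Both bases in the same chemical class means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```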

When is a mutation propagated? - ANSWER when the error is replicated

What happens when bases are mismatched? - ANSWER polymerase is slowed, exonuclease is faster (normally it is the opposite)

Proof reading - ANSWER competition between polymerase and exonuclease; when mismatch occurs, polymerase activity reduced, mismatched nucleotide excised

Alternative nucleotide tautomers - ANSWER normally biased towards correct structural isoform, but occasional shift in replication causes incorporation of wrong nucleotide

Base analogs - ANSWER incorporated into DNA and cause mutations; 5-bromouracil

Enol form of thymine - ANSWER base pairs G not A

detected by protein complex; recognition of stalled RNA pol at damaged nucleotides

How does excision machinery work? - ANSWER entire segment of a dozen nucleotides containing damaged nucleotides excised and resynthesized using DNA pol and ligase

Most common mechanism to repair damaged DNA - ANSWER excision repair

DNA adenine methylase (Dam) - ANSWER adds methyl group to adenine bases at GATC site

Strand specific mismatch repair - ANSWER repair machinery uses methylation patterns to distinguish newly synthesized strand (won't have methyl group) from old template, mismatches repaired on new strand

Double strand breaks - ANSWER most catastrophic, difficult to repair

Repairing double strand breaks - ANSWER can rejoin directly, but leads to major chromosomal rearrangements if there are multiple breaks

Homologous recombination - ANSWER if multiple copies of the genome are available, repair breaks with this method, single strand of non-damaged DNA invades double strand break and allows replication to occur

What does homologous recombination often lead to? - ANSWER resolution of branch migration can lead to a crossover

Deinococcus radiodurans - ANSWER one of the most radiation resistant organisms; efficient DNA repair; each cell has 4 copies of genome in stationary phase; 8-10 during replication

How does D. radiodurans reconnect chromosome fragments? - ANSWER single strand annealing; homologous recombination to reconstruct full genome from multiple pieces

What is needed to map double strand breaks into genome? - ANSWER proximal (linked to biotin) and distal DNA linkers w/ hairpin structures, barcodes, deep sequencing

Aphidicolin - ANSWER DNA polymerase inhibitor that stalls replication fork, increases DSBs

Neocarzinostatin - ANSWER induces DSBs directly, more randomly

What did mapping of DSBs show? - ANSWER distribution not uniform in genome

Where are DSBs enriched? - ANSWER satellite repeats that form hairpins, pericentromeric and centromeric regions; transcribed regions with high density protein coding

Autopolyploidy - ANSWER more than two sets of chromosomes; common in plants

What does autopolyploidy allow? - ANSWER potential for gene expansion/diversification because extra genes can get mutations without strong selective pressure

Allopolyploidy - ANSWER interbreeding between two different species with viable hybrid

What can homologous DNA recombination cause? - ANSWER generate large genome rearrangements, fragment losses, sequence duplications

Segmental duplication - ANSWER homologous/non-homologous recombination underlie large chromosomal rearrangements that cause gene duplication/deletion/fusion

Unequal crossing over - ANSWER repeat sequences in homologous chromosome pairs misalign during recombination, giving a duplication on one chromosome and a deletion on the other

Gene transfer agents - ANSWER prophages under control of host, transfer random pieces of genome

Intracellular transfer - ANSWER endosymbiont can exchange genetic material with eukaryotic hosts; mitochondria and plastids, gene transfer through phagotrophy

Web of Life - ANSWER ubiquity of HGT in microbes makes it difficult to determine lineages

How to detect HGT - ANSWER disagreements between tree structures

Reference tree construction - ANSWER use genes unlikely to be transferred (ribosomal proteins, core processes), multi-locus sequence analysis

Pathogenicity islands - ANSWER large genomic regions that contain genes involved in the synthesis of virulence factors; probably acquired by HGT; found in pathogenic strains of species

Sequence characteristic evidence for HGT - ANSWER GC content, codon use, regions that are different content probably came from somewhere else
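
A sliding-window GC scan is one simple way to look for regions with atypical composition. This is a toy sketch: the window size, threshold, function names, and the artificial sequence are all assumptions, and real genomes need much larger windows.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window=10, step=10, threshold=0.2):
    """Return (start, gc) for windows whose GC content deviates from the genome's."""
    overall = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - overall) > threshold:
            flagged.append((start, gc))
    return flagged


# Artificial AT-rich "genome" with one GC-rich insert.
toy_genome = "ATATATATAT" * 4 + "GCGCGCGCGC" + "ATATATATAT" * 4
print(atypical_windows(toy_genome))
```

A flagged window is only a hint that the region may have come from elsewhere; codon usage and tree disagreements give independent evidence.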

HGT frequency - ANSWER occur more frequently between closely related organisms; sequence similarity promotes homologous recombination

Genome reduction in symbiotic bacteria - ANSWER live in stable/predictable conditions, need fewer functions, without selective pressure, lose genes through large chromosomal rearrangements

Core genome - ANSWER gene families present in all genomes of a group

Pan genome - ANSWER collection of all genes present in members of a group; shows wide range of biological function; represents set of genes potentially available via HGT

Accessory genome - ANSWER genes present in only one or a few members
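
Core, pan, and accessory genomes map naturally onto set operations. The strain names and gene families below are hypothetical, and the accessory genome is taken here as everything in the pan genome that is not core, which is a simplifying assumption.

```python
def genome_partitions(genomes):
    """Split gene families into core, pan, and accessory sets.

    genomes maps a strain name to the set of gene families it carries.
    """
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)   # present in every genome
    pan = set.union(*gene_sets)           # present in at least one genome
    accessory = pan - core                # missing from at least one genome
    return core, pan, accessory


strains = {
    "strain1": {"rpoB", "gyrA", "toxA"},
    "strain2": {"rpoB", "gyrA", "blaT"},
    "strain3": {"rpoB", "gyrA"},
}
core, pan, accessory = genome_partitions(strains)
print(sorted(core))
print(sorted(pan))
print(sorted(accessory))
```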

Original phylogeny basis - ANSWER look at physical features, limited to what you can see, difficult/impossible for microbes

Molecular phylogeny - ANSWER look at rRNA, use sequence information, each position in the gene is a trait

Phylogenetic tree inference - ANSWER comes from available data and assumptions about the mechanisms of evolution; many possible different trees

Maximum parsimony - ANSWER minimizes number of changes to explain evolution, no evolutionary basis

Drawbacks to maximum parsimony - ANSWER does not correct for multiple substitutions at the same site, sequence evolution not always parsimonious

Distance based phylogeny - ANSWER calculate distances between sequences, create matrix, construct tree based on pairwise distances, dependent on data scoring

Neighbor-joining method - ANSWER find nodes close to each other, far from everyone else, join together and recalculate distance to remaining nodes

Agglomerative clustering method - ANSWER look at matrix, find smallest distance between neighbors and link

Benefits of neighbor-joining method - ANSWER computationally efficient, useful for large datasets
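
The first step shared by distance-based methods, building the pairwise distance matrix and picking the closest pair, can be sketched as follows. The p-distance (fraction of differing aligned sites) is used as an assumed distance measure, the sequences are invented, and this sketch does not apply the neighbor-joining correction for nodes that are far from everyone else.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Pairwise distances keyed by the unordered pair of sequence names."""
    return {
        frozenset((m, n)): p_distance(seqs[m], seqs[n])
        for m, n in combinations(seqs, 2)
    }


def closest_pair(dist):
    """Pair with the smallest distance: the first pair to link when clustering."""
    return tuple(sorted(min(dist, key=dist.get)))


aligned = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "TCGAACTA"}
dist = distance_matrix(aligned)
print(closest_pair(dist))
```

Agglomerative clustering would merge this pair, recompute distances to the merged node, and repeat until one tree remains.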