




Material Type: Assignment; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Spring 2007;
Typology: Assignments





[Figure: original DNA goes through shearing, sequencing, and assembly; the reads tile a contig, with a coverage axis running from 1 to 6.]
Imagine raindrops on a sidewalk
L = read length
T = minimum overlap
G = genome size
N = number of reads
c = coverage (NL / G)
$\sigma = 1 - T/L$
$E(\#\text{islands}) = N e^{-c\sigma}$
$E(\text{island size}) = L\left(\frac{e^{c\sigma} - 1}{c} + 1 - \sigma\right)$
contig = island with 2 or more reads
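The island statistics above can be evaluated numerically. The following is a minimal sketch, with sigma denoting 1 - T/L and the expected island size taken as L((e^(c*sigma) - 1)/c + 1 - sigma). The function name `lander_waterman` and the example parameter values are illustrative assumptions, not part of the original material.

```python
import math

def lander_waterman(L, T, G, N):
    """Return (c, sigma, expected_islands, expected_island_size).

    L: read length, T: minimum overlap, G: genome size, N: number of reads.
    """
    c = N * L / G                      # coverage
    sigma = 1 - T / L                  # fraction of a read that must not overlap
    islands = N * math.exp(-c * sigma)
    island_size = L * ((math.exp(c * sigma) - 1) / c + 1 - sigma)
    return c, sigma, islands, island_size

# Hypothetical project: 16,000 reads of 500 bp from a 1 Mbp genome,
# requiring at least 50 bp of overlap to join two reads.
c, sigma, islands, island_size = lander_waterman(L=500, T=50, G=1_000_000, N=16_000)
print(f"coverage={c:.1f}  sigma={sigma:.2f}  "
      f"islands={islands:.1f}  island size={island_size:.0f} bp")
```

Raising N (and therefore c) shrinks the expected number of islands exponentially, which is the point of the raindrop picture: the more drops that fall, the fewer dry gaps remain.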
[Figure: graph with nodes labelled A through I; one element is labelled "k-mer".]
probes - all possible k-mers
Main entity: oligomer (overlap)
Relationship between oligomers: adjacency
ACCTGATGCCAATTGCACT...
CTGAT follows CCTGA (they share 4 nucleotides: CTGA)
Problem: given all the k-mers, find the original string
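One common way to attack this problem, not necessarily the one used in the lecture, is to treat each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix and walk an Eulerian path through the resulting graph. The sketch below assumes a non-empty, error-free k-mer list that covers a linear string; the function name `reconstruct_from_kmers` is hypothetical.

```python
from collections import defaultdict

def reconstruct_from_kmers(kmers):
    """Rebuild a string from its k-mers by following an Eulerian path."""
    graph = defaultdict(list)    # (k-1)-mer prefix -> list of (k-1)-mer suffixes
    balance = defaultdict(int)   # out-degree minus in-degree of each node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # The path starts at the node with one extra outgoing edge;
    # if no such node exists, start from an arbitrary node.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))

    # Hierholzer's algorithm, iterative version.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Spell the string: first node, then the last letter of each later node.
    return path[0] + "".join(node[-1] for node in path[1:])

text = "ACCTGATGCCAATTGCACT"
k = 5
kmers = [text[i:i + k] for i in range(len(text) - k + 1)]
print(reconstruct_from_kmers(sorted(kmers)) == text)
```

In this example every 4-mer occurs only once, so the path is unique. When the string contains repeats of length k-1 or more, several Eulerian paths can exist and the function may return a different string that has the same k-mers.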
In assembly: fake the SBH (sequencing by hybridization) experiment by breaking the reads into k-mers
ACCTAGATTGAGGTCG
ACCTAGATTGAGGTC
 CCTAGATTGAGGTCG
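Breaking a read into k-mers, and checking the adjacency relationship between two k-mers, can be sketched as follows. The helper names `read_kmers` and `follows` are illustrative and do not come from the original material.

```python
def read_kmers(read, k):
    """Return every k-mer of the read, in left-to-right order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def follows(a, b):
    """True if k-mer b follows k-mer a, i.e. they share k-1 nucleotides."""
    return len(a) == len(b) and a[1:] == b[:-1]

# The 16-base read from the example, broken into its two 15-mers.
print(read_kmers("ACCTAGATTGAGGTCG", 15))
# ['ACCTAGATTGAGGTC', 'CCTAGATTGAGGTCG']

# CTGAT follows CCTGA because they share CTGA.
print(follows("CCTGA", "CTGAT"))
# True
```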