






























Notes of Introduction to Bioinformatics
Typology: Lecture notes

Find the occurrences of a pattern P in a text S = S[1..n].
Given two strings S_1[1..n_1] and S_2[1..n_2], find their longest common substring.
• find i, j, k such that S_1[i .. i+k-1] = S_2[j .. j+k-1] and k is as large as possible.
How can these problems be solved (efficiently)?
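As a point of comparison for the two problems above, here is a minimal Python sketch that solves them without a suffix tree. The function names `occurrences` and `longest_common_substring` are illustrative and not from the slides; the second function uses ordinary dynamic programming in O(n_1 · n_2) time, which is the kind of cost a suffix tree is meant to avoid.

```python
def occurrences(pattern, text):
    """Return all 1-based start positions of pattern in text (sequential scan)."""
    m = len(pattern)
    return [i + 1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def longest_common_substring(s1, s2):
    """Return 1-based (i, j, k) with s1[i..i+k-1] == s2[j..j+k-1] and k maximal."""
    best = (0, 0, 0)
    # prev[b] = length of the longest common suffix of s1[:a-1] and s2[:b]
    prev = [0] * (len(s2) + 1)
    for a in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for b in range(1, len(s2) + 1):
            if s1[a - 1] == s2[b - 1]:
                cur[b] = prev[b - 1] + 1
                if cur[b] > best[2]:
                    k = cur[b]
                    best = (a - k + 1, b - k + 1, k)
        prev = cur
    return best
```

For example, `occurrences("xa", "xabxac")` scans every position of the text, which is what the suffix tree search avoids once the tree has been built.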
microarrays
A string of length m can be split into m suffixes.
– abcdefgh (length: 8)
– h, gh, fgh, efgh, defgh, cdefgh, bcdefgh, abcdefgh
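The suffix list above can be reproduced with a one-line sketch in Python. The helper name `suffixes` is an assumption made for illustration.

```python
def suffixes(s):
    """Return the suffixes of s, shortest first, in the order used on the slide."""
    return [s[i:] for i in range(len(s) - 1, -1, -1)]


print(suffixes("abcdefgh"))
```

A string of length m always yields exactly m non-empty suffixes, one starting at each position.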
For a string of length n, this tree can be generated in O(n) time.
A pattern of length m can be searched in this suffix tree in O(m) time,
whereas a regular sequential search would take O(n) time.
Let S be a string of length n over a fixed alphabet Σ. A suffix tree for S is a tree with n leaves (representing n suffixes) and the following properties:
• Every internal node other than the root has at least 2 children.
• Every edge is labeled with a nonempty substring of S.
• The edges leaving a given node have labels starting with different letters.
• The concatenation of the labels on the path from the root to leaf i spells out the i-th suffix S[i..n] of S. We denote S[i..n] by S_i.
Example: S = xabxac
Does a suffix tree always exist?
If one suffix of S is a prefix of another suffix, we cannot have a tree with the properties defined in the previous slides.
– e.g. xabxa
The fourth suffix xa or the fifth suffix a won't be represented by a leaf node.
Solution: add a character that does not occur in S to the end, such as $. Therefore xa$ will not be a prefix of the suffix xabxa$.
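The effect of the terminator can be checked directly. The sketch below lists the suffixes that are a proper prefix of some other suffix, which are the ones that cannot end at a leaf. The helper name `hidden_suffixes` is an assumption for illustration only.

```python
def hidden_suffixes(s):
    """Return the suffixes of s that are a proper prefix of another suffix of s."""
    sufs = [s[i:] for i in range(len(s))]
    return [x for x in sufs if any(y != x and y.startswith(x) for y in sufs)]


print(hidden_suffixes("xabxa"))   # the problematic suffixes without a terminator
print(hidden_suffixes("xabxa$"))  # none remain once $ is appended
```

Because $ occurs only at the very end, no suffix of S$ can be a prefix of a longer one.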
Start with a root and a leaf numbered 1, connected by an edge labeled S$.
Enter suffixes S[2..n]$; S[3..n]$; ... ; S[n]$ into the tree as follows:
To insert K_i = S[i..n]$, follow the path from the root matching characters of K_i until the first mismatch at character K_i[j] (which is bound to happen).
(a) If the matching cannot continue from a node, denote that node by w.
(b) Otherwise the mismatch occurs at the middle of an edge, which has to be split.
Let e be that edge, with label S[u..v] = a_1 ... a_l. If the matching stops after character a_k, then create a new node w, and replace e by two edges S[u .. u+k-1] and S[u+k .. v], labeled by a_1 ... a_k and a_{k+1} ... a_l.
In both cases, create a new leaf numbered i, and connect w to it by an edge labeled with K_i[j .. |K_i|].
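The insertion steps above can be sketched in Python as follows. This is a minimal illustration of the naive quadratic-time construction, not the linear-time algorithm; the names `Node`, `build_suffix_tree`, `leaf_paths` and `contains` are assumptions, and edge labels are stored as plain strings for readability.

```python
class Node:
    def __init__(self, leaf=None):
        self.children = {}  # first character of edge label -> (label, child node)
        self.leaf = leaf    # suffix number for leaves, None for internal nodes


def build_suffix_tree(s, terminator="$"):
    """Naive construction: insert K_i = S[i..n]$ for i = 2..n, one at a time."""
    assert s and terminator not in s
    text = s + terminator
    root = Node()
    root.children[text[0]] = (text, Node(leaf=1))    # root --S$--> leaf 1
    for i in range(2, len(s) + 1):
        k_i = text[i - 1:]
        node, j = root, 0                             # j = characters of K_i matched
        while True:
            entry = node.children.get(k_i[j])
            if entry is None:                         # case (a): stuck at a node
                w = node
                break
            label, child = entry
            p = 0                                     # matched length on this edge
            while p < len(label) and label[p] == k_i[j + p]:
                p += 1
            if p == len(label):                       # whole edge matched: descend
                node, j = child, j + p
                continue
            w = Node()                                # case (b): split the edge
            w.children[label[p]] = (label[p:], child)
            node.children[label[0]] = (label[:p], w)
            j += p
            break
        w.children[k_i[j]] = (k_i[j:], Node(leaf=i))  # new leaf numbered i
    return root


def leaf_paths(node, prefix=""):
    """Yield (leaf number, string spelled on the path from the root)."""
    if node.leaf is not None:
        yield node.leaf, prefix
    for label, child in node.children.values():
        yield from leaf_paths(child, prefix + label)


def contains(root, pattern):
    """Follow pattern from the root, comparing each pattern character once."""
    node, j = root, 0
    while j < len(pattern):
        entry = node.children.get(pattern[j])
        if entry is None:
            return False
        label, child = entry
        step = min(len(label), len(pattern) - j)
        if label[:step] != pattern[j:j + step]:
            return False
        node, j = child, j + step
    return True
```

Calling `build_suffix_tree("xabxac")` and then `contains(root, "bxa")` walks a single root-to-node path, so the search cost depends on the pattern length rather than on the text length.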
[Figures: the first edge to be split; the tree when ac$ is inserted; the completed tree.]
Do not store the substrings S[i..j] of S in the edges, but only their start and end indices (i, j).
The edge labels are still treated as substrings of S, and the tree then needs only O(n) space.
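A small sketch of index-based edge labels, assuming 1-based inclusive indices as on the slides. The class name `EdgeLabel` and its methods are hypothetical and only illustrate the idea: each label costs two integers regardless of how long the substring is.

```python
from typing import NamedTuple


class EdgeLabel(NamedTuple):
    i: int  # 1-based start index into S
    j: int  # 1-based end index into S (inclusive)

    def text(self, s):
        """Recover the substring S[i..j] that this label stands for."""
        return s[self.i - 1:self.j]

    def split(self, k):
        """Split after k characters into S[i..i+k-1] and S[i+k..j]."""
        return EdgeLabel(self.i, self.i + k - 1), EdgeLabel(self.i + k, self.j)


s = "xabxac$"
edge = EdgeLabel(1, 7)        # the initial edge labeled S$
upper, lower = edge.split(2)  # split after the first two characters
print(upper.text(s), lower.text(s))
```

Splitting an edge in this representation only creates two new index pairs; no characters of S are copied.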
info/Luminy-98/albert/JAVA+html/SuffixTreeGrow.html