Computational Biology: More on Sequence Operations, Slides of Computational Biology

Carnegie Mellon University (CMU), Computational Biology

This document, part of a larger set of notes for a computational biology course, focuses on sequence operations. It discusses the representation and matching of sequences, including bit-coding and matching one or more characters. It also covers automating probability calculations using nucleotide frequencies.

Typology: Slides

2010/2011

Uploaded on 11/02/2011

blueeyes_11 🇺🇸


Computational Biology, Part A

More on Sequence Operations

Robert F. Murphy

Representation and Matching of Sequences

Matching one character - with character variables

 Assume two character variables "C" and "Q"

 Test for exact match: If(Q=C) {...}

 Need complicated statements to handle wildcards: If(Q=C | (Q='A' & (C='A' | C='R' | C='W' | C='M' | C='D' | C='H' | C='V' | C='N') | Q='C' & ...)) {...}

 Can build into a function: If(TestBase(Q,C)) {...}

Efficient method to match one character

 Convert char to int 0-25

 Create 26x26 matrix showing which matches which

 Lookup two characters to be compared to find value
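The lookup-table idea might look like this in Python. It is a sketch under stated assumptions: the names `idx`, `MATCH` and `matches` are hypothetical, letters are mapped to 0..25 by their offset from 'A', and letters that are not IUPAC nucleotide codes simply never match.

```python
# Bases covered by each IUPAC nucleotide code (standard definitions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def idx(ch: str) -> int:
    """Convert a letter to an integer index (A -> 0, B -> 1, ...)."""
    return ord(ch.upper()) - ord("A")

# 26x26 table: MATCH[i][j] is True when letter i matches letter j.
MATCH = [[False] * 26 for _ in range(26)]
for a, bases_a in IUPAC.items():
    for b, bases_b in IUPAC.items():
        MATCH[idx(a)][idx(b)] = bool(set(bases_a) & set(bases_b))

def matches(q: str, c: str) -> bool:
    """Compare two characters with a single table lookup."""
    return MATCH[idx(q)][idx(c)]
```

The table is built once; every later comparison costs two index conversions and one lookup.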

Matching one character - with bit coding

 Assume two integer variables "I" and "J"

 Test for exact match: If(I=J) {...}

 Test for match with wildcards (no lookup!): If(I&J) {...}
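One way to realize the bit coding is to give each of the four bases its own bit and to build each ambiguity code by OR-ing the bits of the bases it allows. The specific assignment below (A=1, C=2, G=4, T=8) is an assumption for illustration; the slides do not state which bits are used, and the names `CODE`, `encode` and `bases_match` are hypothetical.

```python
# One bit per base (assumed assignment).
A, C, G, T = 1, 2, 4, 8

# Ambiguity codes are the bitwise OR of the bases they allow.
CODE = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T, "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}

def encode(seq: str) -> list[int]:
    """Convert a sequence string to a list of bit-coded integers."""
    return [CODE[ch] for ch in seq.upper()]

def bases_match(i: int, j: int) -> bool:
    """Wildcard match: true when the two codes share at least one set bit."""
    return (i & j) != 0

print(bases_match(CODE["A"], CODE["R"]))  # True: R allows A
print(bases_match(CODE["C"], CODE["R"]))  # False: R allows only A or G
```

With this coding, the wildcard test is a single AND instruction and needs no table.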

Matching more than one character - pattern matching

 Example: recognition site for a restriction enzyme

 Input sequence string into variable Seq

 Define Site as string of characters or masks

 EcoRI recognizes GAATTC

 AccI recognizes GTMKAC

 Create function to search a sequence for that site: Find(Site,LenSite,Seq,LenSeq)

 For each position in Seq, see if Site matches starting there
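A search function along these lines can be sketched in Python. The assumptions: the bit assignment (A=1, C=2, G=4, T=8) and the name `find` are illustrative, and the explicit length arguments of Find(Site,LenSite,Seq,LenSeq) are dropped because Python strings carry their own length. The test sequences are made up.

```python
# One bit per base (assumed assignment); ambiguity codes OR the bits together.
A, C, G, T = 1, 2, 4, 8
CODE = {
    "A": A, "C": C, "G": G, "T": T,
    "R": A | G, "Y": C | T, "S": C | G, "W": A | T, "K": G | T, "M": A | C,
    "B": C | G | T, "D": A | G | T, "H": A | C | T, "V": A | C | G,
    "N": A | C | G | T,
}

def find(site: str, seq: str) -> list[int]:
    """Return every 0-based position in seq where site matches."""
    site_bits = [CODE[ch] for ch in site.upper()]
    seq_bits = [CODE[ch] for ch in seq.upper()]
    hits = []
    # Try each start position that leaves room for the whole site.
    for start in range(len(seq_bits) - len(site_bits) + 1):
        window = seq_bits[start:start + len(site_bits)]
        if all(s & q for s, q in zip(site_bits, window)):
            hits.append(start)
    return hits

print(find("GAATTC", "TTGAATTCAA"))      # EcoRI site: [2]
print(find("GTMKAC", "AAGTCGACGTATAC"))  # AccI site with masks: [2, 8]
```

This is the naive scan, costing on the order of LenSite times LenSeq comparisons in the worst case.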

Automating the Calculation

 Goal: Calculate probability of occurrence of a sequence that may include ambiguous bases

 What we need is a way to consider all possible allowed nucleotides at each position in all allowed combinations

 When using dinucleotide probabilities, have to be careful about how the probabilities are combined

Illustration

 Question: What is the probability of observing sequence feature ART (A followed by a purine {either A or G}, followed by a T) using dinucleotide probabilities?

Expansions

 pART=pA(pAA+pAG)(pAT+pGT) [eq.1]

 pART=pApAApAT + pApAApGT + pApAGpAT + pApAGpGT

 pART=pA(pAApAT+pAGpGT) [eq.2]

 pART= pApAApAT + pApAGpGT

Proof

 pART=pAAT+pAGT

 pAAT=pApAApAT

 pAGT=pApAGpGT

 pART= pApAApAT + pApAGpGT

 This matches equation 2 on previous slide
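The difference between the two candidate expressions can be checked numerically. The probability values below are hypothetical and chosen only for illustration. The variable names follow the slides' notation, where pXY is used as the probability that X is followed by Y.

```python
from fractions import Fraction as F

# Hypothetical values, not taken from the slides.
pA = F(1, 4)    # probability of starting with A
pAA = F(1, 2)   # probability that A is followed by A
pAG = F(1, 4)   # probability that A is followed by G
pAT = F(1, 8)   # probability that A is followed by T
pGT = F(1, 4)   # probability that G is followed by T

# Sum over the two allowed sequences, AAT and AGT.
direct = pA * pAA * pAT + pA * pAG * pGT

# Equation 1: adds the alternatives at each step, then multiplies.
eq1 = pA * (pAA + pAG) * (pAT + pGT)

# Equation 2: multiplies along each path, then adds.
eq2 = pA * (pAA * pAT + pAG * pGT)

print(direct, eq1, eq2)  # 1/32 9/128 1/32
```

Equation 1 overcounts because its expansion includes the cross terms pApAApGT and pApAGpAT, in which the middle base is A in one factor and G in the other.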

More complicated probability illustration

 What is the probability of observing the sequence feature ARYT (A followed by a purine {either A or G}, followed by a pyrimidine {either C or T}, followed by a T)?

 Using equal mononucleotide frequencies

 pA = pC = pG = pT = 1/4

 pARYT = 1/4 * (1/4 + 1/4) * (1/4 + 1/4) * 1/4 = 1/64
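With mononucleotide frequencies the positions are treated as independent, so the calculation is a product over positions of the summed frequencies of the allowed bases. A minimal sketch follows; the function name `mono_probability` and the non-uniform frequencies in the second call are hypothetical.

```python
from fractions import Fraction
from math import prod

# Bases covered by each IUPAC nucleotide code (standard definitions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mono_probability(pattern: str, freqs: dict) -> Fraction:
    """Multiply, over positions, the total frequency of the allowed bases."""
    return prod(sum(freqs[b] for b in IUPAC[code]) for code in pattern.upper())

# Equal mononucleotide frequencies.
uniform = {b: Fraction(1, 4) for b in "ACGT"}
print(mono_probability("ARYT", uniform))  # 1/64

# Hypothetical unequal frequencies, for illustration only.
skewed = {"A": Fraction(3, 10), "C": Fraction(2, 10),
          "G": Fraction(2, 10), "T": Fraction(3, 10)}
print(mono_probability("ARYT", skewed))   # 9/400
```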

Illustration (continued)

 Using observed mononucleotide frequencies: pARYT = pA (pA + pG) (pC + pT) pT

 Using dinucleotide frequencies:

pARYT = pA (pAA (pACpCT + pATpTT) + pAG (pGCpCT + pGTpTT) )

Multiply then add

 We conclude that for such strings our rule should be “multiply dinucleotide probabilities along each allowed path and then add the results”
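The "multiply along each path, then add" rule can be written directly by enumerating every allowed path. This is a sketch under stated assumptions: the name `path_probability` is hypothetical, `di` is keyed by two-letter strings such as "AG" holding the probability that A is followed by G (the way the slides use pAG), and the numeric values are made up.

```python
from fractions import Fraction
from itertools import product

# Bases covered by each IUPAC nucleotide code (standard definitions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def path_probability(pattern: str, mono: dict, di: dict) -> Fraction:
    """Sum, over all allowed paths, the product of probabilities along the path."""
    total = Fraction(0)
    # Each path picks one allowed base per position, e.g. ('A', 'G', 'C', 'T').
    for path in product(*(IUPAC[code] for code in pattern.upper())):
        p = mono[path[0]]                  # probability of the first base
        for prev, nxt in zip(path, path[1:]):
            p *= di[prev + nxt]            # multiply along the path
        total += p                         # then add the paths together
    return total

# Uniform tables: every base and every transition has probability 1/4.
mono = {b: Fraction(1, 4) for b in "ACGT"}
di = {a + b: Fraction(1, 4) for a in "ACGT" for b in "ACGT"}
print(path_probability("ARYT", mono, di))  # 1/64
```

With uniform tables the result agrees with the equal-frequency mononucleotide answer. The number of paths grows multiplicatively with each ambiguous position, so full enumeration is only practical for short patterns.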

How do we program this?

 “for” loops?

 Nested “if” structure?

 Other?
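One possible answer under "Other" is a single loop that carries forward, for each allowed base at the current position, the total probability of all allowed paths ending in that base. This is not presented as the slides' solution; it is a sketch that assumes a non-empty pattern, a hypothetical function name `pattern_probability`, and a `di` table keyed by two-letter strings holding the probability that the first base is followed by the second.

```python
from fractions import Fraction

# Bases covered by each IUPAC nucleotide code (standard definitions).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def pattern_probability(pattern: str, mono: dict, di: dict) -> Fraction:
    """Probability of a (possibly ambiguous) non-empty pattern."""
    allowed_sets = [IUPAC[code] for code in pattern.upper()]
    # prob[b] = total probability of all allowed paths so far that end in b.
    prob = {b: mono[b] for b in allowed_sets[0]}
    for allowed in allowed_sets[1:]:
        # Extend every path by one base: multiply by the transition, add per end base.
        prob = {b: sum(p * di[a + b] for a, p in prob.items()) for b in allowed}
    return sum(prob.values())

# Uniform tables, for illustration only.
mono = {b: Fraction(1, 4) for b in "ACGT"}
di = {a + b: Fraction(1, 4) for a in "ACGT" for b in "ACGT"}
print(pattern_probability("ARYT", mono, di))  # 1/64
```

This gives the same sum as enumerating every path, but the work grows linearly with pattern length instead of with the number of paths, and it needs neither nested "if" statements nor one "for" loop per position.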
