Genome Analysis: Comparing Genome Sizes and Codon Usage in Prochlorococcus marinus, Assignments of Biology

Programming problems related to genome analysis, focusing on the comparison of genome sizes and codon usage in Prochlorococcus marinus ss120 and Synechocystis PCC 6803. Students will learn how to identify genes and intergenic sequences, analyze codon usage, and investigate the correlation between codon usage and amino acid frequency.

Typology: Assignments

Pre 2010

Uploaded on 02/12/2009 by koofers-user-kqn

Programming problems (biology) - 1
Problem Set – Introduction to Programming (Biology)
1. Why are genome sizes different?
a. What is the length of the genome of Prochlorococcus marinus ss120 (ss120)? How about
the genome of Synechocystis PCC 6803 (S6803)?
For the following, it might be helpful to think of a genome as consisting of coding regions
(the genes) and the sequences in between (the intergenic sequences).
b. Does S6803 have more genes than ss120?
c. Does S6803 have bigger genes than ss120?
d. Does S6803 have bigger intergenic sequences than ss120?
e. Summarize what you've found. Why are the genome sizes different?
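The bookkeeping behind 1b through 1d can be sketched in Python (not the environment this problem set uses). The sketch assumes each gene is given as a (start, end) pair of 1-based, inclusive coordinates; the function name genome_summary and the toy coordinates are hypothetical, not real ss120 or S6803 annotations. It ignores strand and the wrap-around gap of a circular chromosome.

```python
def genome_summary(genome_length, genes):
    """Summarize a genome as genes plus the intergenic gaps between them.

    `genes` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping neighbors count as a gap of 0; the wrap-around gap of a
    circular chromosome and the strand of each gene are ignored.
    """
    genes = sorted(genes)  # order by start coordinate
    gene_lengths = [end - start + 1 for start, end in genes]
    gaps = [max(0, next_start - end - 1)
            for (_, end), (next_start, _) in zip(genes, genes[1:])]
    return {
        "gene_count": len(genes),
        "mean_gene_length": sum(gene_lengths) / len(genes),
        "mean_intergenic_length": sum(gaps) / len(gaps) if gaps else 0.0,
        # Overlapping bases are counted twice, so this can overstate coding DNA.
        "coding_fraction": sum(gene_lengths) / genome_length,
    }

# Toy numbers for illustration only.
print(genome_summary(3000, [(1, 900), (951, 2150), (2301, 2600)]))
```

Running the same summary on both organisms answers "more genes?", "bigger genes?" and "bigger intergenic sequences?" side by side.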
2. Sequences of genomes, genes, and proteins
a. Display the first 2000 nucleotides of the ss120 chromosome. Not very illuminating!
b. Copy the display into a word processor and use highlighting to make evident where the genes
lie in this region of the chromosome and where the intergenic sequences are. You may find
the FROM and TO keywords of the GENES-OF function helpful, as well as the View command
on a result's Action menu.
c. Do the same with the ss120 chromosome from nucleotide 9000 to 11000. How might you
account for the differences between these two regions?
d. Display the same two regions, this time using READING-FRAMES-OF (obtainable
through either the All menu or the Genes-Proteins menu, Translation submenu). Note in
this display where your genes begin and end.
e. Display the amino acid sequences of the proteins encoded by the first gene you identified
in 2b and in 2c. DISPLAY-SEQUENCE-OF works for proteins as well as genes. To get the
protein, you can use the PROTEIN-OF function (Genes-Proteins menu) or simply enter the
gene's nickname (the part to the right of the period) preceded by "p-".
f. Compare the amino acid sequences of the two genes to
3. Codons and amino acids
Take a look at a table showing the codons and the amino acids they encode. You'll see that
some amino acids are encoded by several codons, others by only one. Why is that? Consider
the simple hypothesis that there is a correlation between the number of codons that encode an
amino acid and how common that amino acid is in proteins. Are serine and leucine, with 6
codons apiece, six times as common as tryptophan and methionine, with only 1 codon
apiece? Find out, using the coding genes of the organism Prochlorococcus marinus ss120
as a test case.
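To make the hypothesis concrete, here is a short Python sketch (an illustration, not the method the course tools use). It reads the number of codons per amino acid straight off the standard genetic code, written as the usual 64-letter string in TCAG order, and computes amino acid frequencies from a list of protein sequences. The protein strings and the names aa_frequencies and compare are hypothetical placeholders; for the real test the input would be the proteins of the ss120 coding genes. Under the hypothesis, leucine (6 codons) should be roughly six times as frequent as tryptophan (1 codon).

```python
from collections import Counter

# Standard genetic code: the amino acid for each of the 64 codons, with bases
# ordered T, C, A, G at each position (TTT, TTC, TTA, TTG, TCT, ...).
# '*' marks the three stop codons.
CODE_TCAG = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
             "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Each letter stands for one codon, so counting letters gives codons per amino acid.
CODONS_PER_AA = Counter(CODE_TCAG)

def aa_frequencies(proteins):
    """Fraction of all residues contributed by each amino acid."""
    counts = Counter("".join(proteins))
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}

def compare(proteins):
    """(amino acid, number of codons, observed frequency), most codons first."""
    freqs = aa_frequencies(proteins)
    rows = [(aa, n, freqs.get(aa, 0.0))
            for aa, n in CODONS_PER_AA.items() if aa != "*"]
    return sorted(rows, key=lambda row: -row[1])

toy_proteins = ["MSLLW", "MKLSA"]  # hypothetical placeholders, not ss120 proteins
for aa, n_codons, freq in compare(toy_proteins):
    print(f"{aa}  codons={n_codons}  frequency={freq:.2f}")
```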
a. What information do you need to know in order to answer this question?
b. What kinds of functions do you need in order to gather that information?

Here are some functions that might be of use to you:

c. Investigate SPLIT (STRINGS-SEQUENCES, STRING-PRODUCTION menu)

  • Try SPLITting "123456789"
  • Try SPLITting "123456789" using the EVERY 2 option
  • You might even try plowing through the Help screen for SPLIT. You can find this by mousing over the green action triangle at the upper left of the SPLIT box, clicking on Help, and clicking on Full Documentation.
  • Replace "123456789" with the sequence of a gene (e.g. pro0029). How can SPLIT help you extract the codons of a gene?
  • How would you describe what SPLIT does?
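If SPLIT with EVERY n behaves the way these experiments suggest, breaking a string into consecutive, non-overlapping pieces of n characters, a Python analog fits in one line. This is an assumed equivalent (split_every is a hypothetical name), and the course function may treat a leftover piece differently; here the last piece is simply shorter.

```python
def split_every(sequence, n):
    """Break `sequence` into consecutive, non-overlapping pieces of length n.

    If len(sequence) is not a multiple of n, the last piece is shorter.
    """
    return [sequence[i:i + n] for i in range(0, len(sequence), n)]

print(split_every("123456789", 2))  # ['12', '34', '56', '78', '9']
print(split_every("ATGGCTTAA", 3))  # ['ATG', 'GCT', 'TAA'], the codons of a toy gene
```

With n = 3 on a gene that starts at its first base, the pieces are exactly the codons in the gene's reading frame.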

d. Investigate COUNTS-OF (STRINGS-SEQUENCES, STRING-ANALYSIS menu)

  • Try getting the COUNTS-OF "A" in the sequence of a gene.
  • Replace "A" with nucleotides. Notice that the result is now a list of counts. Does any number in the list correspond to the first result you got (with "A")?
  • What about the rest of the numbers? It might help to know what nucleotides refers to. To find out, execute just the box with nucleotides in it. From the result, form a hypothesis as to what the four numbers mean. Test that hypothesis.
  • It might be easier on you if you could label each result with the name of the thing that was counted. You can! Try out the LABEL keyword.
  • How would you describe what COUNTS-OF does?
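A rough Python analog of COUNTS-OF, including a LABEL-like option, is sketched below. The name counts_of, the label flag, and the A, C, G, T order of NUCLEOTIDES are assumptions for illustration; check what nucleotides actually evaluates to. Python's count method counts non-overlapping occurrences of a substring in a string, and matching elements in a list.

```python
# Assumed contents and order; verify against what `nucleotides` returns.
NUCLEOTIDES = ["A", "C", "G", "T"]

def counts_of(items, sequence, label=False):
    """Count items in `sequence` (a string or a list).

    A single string item gives a single number; a list of items gives a
    list of counts, each paired with what was counted when label=True.
    """
    if isinstance(items, str):
        return sequence.count(items)
    counts = [sequence.count(item) for item in items]
    return list(zip(items, counts)) if label else counts

gene = "ATGGCTTAA"  # toy sequence, not a real ss120 gene
print(counts_of("A", gene))                      # 3
print(counts_of(NUCLEOTIDES, gene))              # [3, 1, 2, 3]
print(counts_of(NUCLEOTIDES, gene, label=True))  # [('A', 3), ('C', 1), ('G', 2), ('T', 3)]
```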

e. Investigate ALL-DNA-SEQUENCES (STRINGS-SEQUENCES, STRING-PRODUCTION menu)

  • The function evidently calls for a number governed by the keyword LENGTH-OF. Give it a number and execute the resulting function.
  • How would you describe what ALL-DNA-SEQUENCES does?
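Presumably ALL-DNA-SEQUENCES returns every DNA string of the requested length (4^n of them for length n). Under that assumption, itertools.product gives a Python equivalent; all_dna_sequences is a hypothetical name, and the course function may list the sequences in a different order.

```python
from itertools import product

def all_dna_sequences(length):
    """Every DNA string of the given length: 4**length of them."""
    return ["".join(p) for p in product("ACGT", repeat=length)]

codons = all_dna_sequences(3)
print(len(codons))  # 64
print(codons[:5])   # ['AAA', 'AAC', 'AAG', 'AAT', 'ACA']
```

Length 3 gives the complete list of possible codons, which is what you want to count against.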

f. Combine the elements above with the needs you formulated in Steps a and b to determine codon usage in a single gene. You may choose to do this in multiple steps or to combine the elements into one humongous function; that's a matter of style. At least at first, I would advise the multiple-step approach, so that you can see the result of each operation.
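Putting the pieces together for one gene might look like the Python sketch below (codon_usage is a hypothetical name, and the toy sequence is not a real gene). It splits the gene into codons in its own reading frame, dropping any trailing partial codon, and then counts each of the 64 possible codons. Splitting first matters: counting "ATG" directly in the raw sequence also picks up out-of-frame matches, as the example shows.

```python
from collections import Counter
from itertools import product

def codon_usage(gene):
    """Count each of the 64 codons in the gene's own reading frame."""
    gene = gene.upper()
    # SPLIT-like step: consecutive 3-letter pieces, trailing partial codon dropped.
    codons = [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]
    counts = Counter(codons)
    # ALL-DNA-SEQUENCES-like step: report every codon, even those never used.
    every_codon = ["".join(p) for p in product("ACGT", repeat=3)]
    return {codon: counts[codon] for codon in every_codon}

gene = "ATGCATGAATAA"  # toy sequence: ATG CAT GAA TAA
usage = codon_usage(gene)
print(usage["ATG"])       # 1 (in frame)
print(gene.count("ATG"))  # 2 (also matches the out-of-frame ATG spanning CAT and GAA)
```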

g. Replace the single gene with all coding genes of the organism ss120 and go through the same steps. Did it work? Of course it worked! The computer accomplished what you asked it to do. It always does (almost). But you may realize now that what you asked for is not exactly what you wanted.

h. Analyze the problem and formulate a plan. How might you intervene (in theory) in the sequence of events so that the result would be more to your liking?
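One possible plan, sketched in Python with hypothetical toy genes: repeating the single-gene steps for every gene yields one list of codons (or one table of counts) per gene, that is, a nested result. You can either flatten the nested lists into a single list of codons before counting, or count each gene separately and add the tables. Both routes give the same totals. Whether SIMPLIFY-LIST does the flattening is for step i to find out; nothing here assumes it.

```python
from collections import Counter
from itertools import product

EVERY_CODON = ["".join(p) for p in product("ACGT", repeat=3)]

def codons_of(gene):
    """Split a gene into in-frame codons, dropping a trailing partial codon."""
    gene = gene.upper()
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]

toy_genes = ["ATGAAATAA", "ATGAAAGGGTGA"]  # hypothetical stand-ins for the ss120 genes

# Doing the single-gene steps on every gene gives a nested result:
# one list of codons per gene.
per_gene = [codons_of(g) for g in toy_genes]

# Plan A: flatten the nested lists into one list of codons, then count once.
usage_a = Counter(codon for codons in per_gene for codon in codons)

# Plan B: count each gene separately, then add the per-gene tables.
usage_b = sum((Counter(codons) for codons in per_gene), Counter())

# Report all 64 codons, including ones that never occur.
totals = {codon: usage_a[codon] for codon in EVERY_CODON}
print({codon: n for codon, n in totals.items() if n})
```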

i. Investigate SIMPLIFY-LIST (LIST-TABLES, LIST-PRODUCTION menu)

  • Give this function the complicated result of Step g. How do you interpret the product of the function?
  • Test your hypothesis. You can do so readily by giving the function literal lists. A literal list consists of a single quote (' = "Interpret what follows literally, without trying to evaluate it") followed by a list within parentheses. For example '(1 2 a bc) is a