

Programming problems related to genome analysis, focusing on the comparison of genome sizes and codon usage in Prochlorococcus marinus ss120 and Synechocystis PCC 6803. Students will learn how to identify genes and intergenic sequences, analyze codon usage, and investigate the correlation between codon usage and amino acid frequency.
Typology: Assignments


Problem Set – Introduction to Programming (Biology)
a. What is the length of the genome of Prochlorococcus marinus ss120 (ss120)? How about the genome of Synechocystis PCC 6803 (S6803)?
For the following, it might be helpful to think of a genome as consisting of coding regions (the genes) and the sequences in between (the intergenic sequences).
b. Does S6803 have more genes than ss120?
c. Does S6803 have bigger genes than ss120?
d. Does S6803 have bigger intergenic sequences than ss120?
e. Summarize what you've found. Why are the genome sizes different?
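The decomposition suggested above (genome = genes + intergenic sequences) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name summarize_genome, the coordinate format, and the toy numbers are hypothetical, not part of the problem set or of the tools it names, and the sketch treats the chromosome as linear with non-overlapping genes.

```python
def summarize_genome(genome_length, genes):
    """Summarize a genome given gene coordinates.

    genes: list of (start, end) pairs, 1-based and inclusive.
    Assumes genes do not overlap and ignores circular wrap-around.
    """
    genes = sorted(genes)
    gene_lengths = [end - start + 1 for start, end in genes]
    # Gap between the end of one gene and the start of the next one.
    gaps = [next_start - end - 1
            for (_, end), (next_start, _) in zip(genes, genes[1:])]
    coding_total = sum(gene_lengths)
    return {
        "gene_count": len(genes),
        "mean_gene_length": coding_total / len(genes),
        "coding_total": coding_total,
        "intergenic_total": genome_length - coding_total,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }


# Toy data only: a made-up 1000-nucleotide "genome" with three genes.
toy = summarize_genome(1000, [(101, 400), (451, 700), (801, 950)])
print(toy)
```

Running the same summary on two organisms and comparing gene count, mean gene length, and mean gap side by side separates the three possible explanations asked about in parts b, c, and d.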
a. Display the first 2000 nucleotides of the ss120 chromosome. Not very illuminating!
b. Copy the display into a word processor and modify it by highlighting to make evident where the genes are in this region of the chromosome and where the intergenic sequences are. You may find helpful the FROM and TO keywords in the GENES-OF function and the View command on a result's Action menu.
c. Do the same with the ss120 chromosome from 9000 to 11000. How might you account for the differences between these two regions?
d. Display the same two regions, this time using READING-FRAMES-OF (obtainable through either the All menu or the Genes-Proteins menu, Translation submenu). Note in this display where your genes begin and end.
e. Display the amino acid sequence for the proteins encoded by the first gene identified in 6a and in 6c. DISPLAY-SEQUENCE-OF will work for proteins as well as genes, and to get the protein, you can use the PROTEIN-OF function (Genes-Proteins menu) or simply enter the gene's nickname (the part to the right of the decimal) preceded by "p-".
f. Compare the amino acid sequences of the two genes to
Take a look at a table showing the codons and the amino acids they encode. You'll see that some amino acids are encoded by several codons, others by only one. Why is that? Consider the simple hypothesis that there is a correlation between the number of codons that encode an amino acid and how common that amino acid is in proteins. Are serine and leucine, with 6 codons apiece, six times more common than tryptophan and methionine, with only 1 codon apiece? Find out, using as a test the coding genes of the organism Prochlorococcus marinus ss120.
a. What information do you need to know in order to answer this question?
b. What kinds of functions do you need in order to gather that information?
Here are some functions that might be of use to you:
c. Investigate SPLIT (STRINGS-SEQUENCE, STRING-PRODUCTION menu)
d. Investigate COUNTS-OF (STRINGS-SEQUENCE, STRING-ANALYSIS menu)
e. Investigate ALL-DNA-SEQUENCES (STRINGS-SEQUENCES, STRING-PRODUCTION menu)
f. Combine the elements above with the needs you formulated in Steps a and b to determine codon usage in a single gene. You may choose to do this in multiple steps or to combine the elements into one humongous function. That's a matter of style. I would advise that at least at first you use the multiple-step approach so that you can see the results of each operation.
g. Replace the single gene with all coding genes of the organism ss120 and go through the same steps. Did it work? Of course it worked! The computer accomplished what you asked it to do. It always does (almost). But you may realize now that what you asked for is not exactly what you wanted.
h. Analyze the problem and formulate a plan. How might you intervene (in theory) in the sequence of events so that the result would be more to your liking?
i. Investigate SIMPLIFY-LIST (LIST-TABLES, LIST-PRODUCTION menu)
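The shape of the problem in steps f through i can be illustrated outside the course environment with a minimal Python sketch. It is a stand-in, not the SPLIT, COUNTS-OF, or SIMPLIFY-LIST functions themselves, whose exact behavior is not described here. The helper split_into_codons and the two toy sequences are hypothetical and are not real ss120 genes.

```python
from collections import Counter
from itertools import chain


def split_into_codons(seq):
    """Split a coding sequence into consecutive 3-letter codons.

    Any trailing partial codon (1 or 2 leftover letters) is dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Toy sequences for illustration only.
gene_a = "ATGTCTCTGTGGTAA"
gene_b = "ATGCTGAGCTAA"

# Step f: codon usage in a single gene, one operation at a time.
codons_a = split_into_codons(gene_a)
usage_a = Counter(codons_a)

# Step g: the same operation over several genes gives a list of lists,
# one inner list per gene, rather than one pool of codons.
per_gene = [split_into_codons(g) for g in (gene_a, gene_b)]

# Steps h and i: flatten the nested list before counting.
all_codons = list(chain.from_iterable(per_gene))
usage_all = Counter(all_codons)

print(usage_a)
print(usage_all)
```

Keeping each intermediate result in its own variable mirrors the multiple-step approach recommended in step f: the nested structure of per_gene is visible before it is flattened, which makes it clear why counting it directly would not give organism-wide codon usage.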