CS440 Homework 1: Python Programming Assignment

A Python programming assignment for a university course. The assignment has two parts: the first calculates the sum, average, and standard deviation of each line of numerical data in a text file, and the second requires a Python module that counts the occurrences and percentages of di-nucleotides in DNA sequences read from a FASTA-formatted file. Students are expected to write the code in separate files named part1.py and part2.py, respectively.

Uploaded on 03/10/2009

CS440
Homework 1 (due Thurs 8/31)
The purpose of this assignment is for you to familiarize yourself with Python. Example
input files for the programs can be found on the homework section of the course website.
1. Some elementary calculations (40%)
A numerical data set (nums.txt) will be provided as a text file, and your task is to
read each line and calculate the sum, average (x̄), and standard deviation (σ) of the
numbers on that line. Write these numbers to a file in tab-delimited format, such that
each line in the output file contains the sum, average, and standard deviation of the
corresponding line in nums.txt. The names of the input file and output file will be given
as command-line arguments. The standard deviation can be calculated as:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
Write your code in a file named part1.py. Your program for this part should execute
from a Linux shell as $ python part1.py infile outfile. An example Python script is
provided.
Warning: Python distinguishes between integer and float division. Cast a number to
the appropriate type to ensure the correct operation.
Hint: You might find the Python function sum useful.
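One possible shape for Part 1 is sketched below. It is not the provided example script: the helper names line_stats and process are hypothetical, and the sketch assumes that the numbers on a line are separated by whitespace and that blank lines can be skipped. It divides by N, as in the formula above, and is written for Python 3, where / is always float division; under Python 2 an explicit float(...) cast would be needed, as the warning says.

```python
import math
import sys


def line_stats(values):
    """Return (sum, mean, population standard deviation) of a list of numbers."""
    n = len(values)
    total = sum(values)
    mean = total / n  # float division in Python 3
    variance = sum((x - mean) ** 2 for x in values) / n  # divide by N, not N - 1
    return total, mean, math.sqrt(variance)


def process(infile, outfile):
    """Write one tab-delimited line of statistics per non-empty input line."""
    with open(infile) as src, open(outfile, "w") as dst:
        for line in src:
            fields = line.split()  # assumption: whitespace-separated numbers
            if not fields:
                continue  # assumption: blank lines are skipped
            total, mean, std = line_stats([float(f) for f in fields])
            dst.write(f"{total}\t{mean}\t{std}\n")


# Run as: python part1.py infile outfile
if __name__ == "__main__" and len(sys.argv) == 3:
    process(sys.argv[1], sys.argv[2])
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, line_stats gives a sum of 40, a mean of 5, and a standard deviation of 2.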
2. A biological application (60%)
DNA is a complex biological molecule that carries the genetic material for all living
organisms. It is a long molecule composed of the four nucleotides adenine (A), cytosine
(C), guanine (G), and thymine (T). DNA can therefore be represented as a string of
characters over the alphabet {A, C, G, T}. Your task for this part is to write a Python
module that contains a function that counts the number of times each di-nucleotide
(a substring of exactly 2 nucleotides) occurs in a DNA sequence. The counts for the
sequence CGTGTGAC are found in Table 1.
Your function, named dinucleotide_count, will take the name of a FASTA-formatted file
as its only argument and calculate how many times each di-nucleotide occurs. The
function will return a Python dictionary whose keys are di-nucleotides and whose values
are tuples (count, percentage), where count is the number of times a di-nucleotide has
occurred in the file and percentage is the percentage of the di-nucleotide over all records
in the file, rounded to the nearest tenth of a percent. The data file b_anthracis.fasta
(the DNA sequences of the proteins in the bacterium Bacillus anthracis, a.k.a. anthrax)
has been provided to test your function.
Table 1: Example Counts

Di-nucleotide   Count
CG              1
GT              2
TG              2
GA              1
AC              1
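
The counts in Table 1 come from sliding a window of length 2 over the sequence one position at a time, so overlapping pairs are all counted. A minimal sketch of that step on a plain string follows; the helper name count_dinucleotides is hypothetical and is not part of the provided code.

```python
from collections import Counter


def count_dinucleotides(seq):
    """Count every overlapping length-2 substring of seq."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


counts = count_dinucleotides("CGTGTGAC")
print(dict(counts))  # {'CG': 1, 'GT': 2, 'TG': 2, 'GA': 1, 'AC': 1}
```

A sequence of length n contributes n - 1 di-nucleotides, which is why the eight-letter example yields seven counts in total.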

You have been provided a FASTA parser that you will import into your code and use to help read the FASTA file. Look for the file fasta.py on the homework section of the website, and use the class fasta_itr. Information about the FASTA format can be found at: http://en.wikipedia.org/wiki/Fasta_format. Visit or email the TA for help on using this parser. Put your function into a file called part2.py.
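
The overall shape of the Part 2 function might look like the sketch below. The interface of the provided parser in fasta.py is not shown in this handout, so the sketch uses a hypothetical stand-in reader, read_fasta_sequences, which would be replaced by the provided class. It also assumes that pairs are counted within each record and never across record boundaries, that sequences are treated as uppercase, and that the percentage is taken over the total number of di-nucleotides in all records.

```python
from collections import Counter


def read_fasta_sequences(path):
    """Yield the sequence of each record in a FASTA file.

    Hypothetical stand-in for the course-provided parser.
    """
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    yield "".join(chunks)
                chunks = []
                seen_header = True
            elif line:
                chunks.append(line)
    if seen_header:
        yield "".join(chunks)


def dinucleotide_count(path):
    """Return {di-nucleotide: (count, percentage)} over all records in the file."""
    counts = Counter()
    for seq in read_fasta_sequences(path):
        seq = seq.upper()
        # Assumption: pairs are counted within a record, never across records.
        counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    # Percentages are rounded to the nearest tenth of a percent.
    return {pair: (n, round(100.0 * n / total, 1)) for pair, n in counts.items()}
```

Because a FASTA sequence may be wrapped over several lines, the reader joins the lines of a record before counting, so that pairs spanning a line break are not lost.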

Submission: Please tar your source code files, part1.py and part2.py, into a file named assign1.tar and submit the tar file via WebCT.