

A homework assignment for a computational biology course, where students are required to implement hierarchical clustering and k-nearest neighbors classification methods to distinguish between ovarian and breast cancer tumor samples based on microarray expression vectors. Students must submit their program source code, cluster tree, and cluster outputs in hardcopy.
Typology: Exercises


Submit your answers in hardcopy. Also submit your program source code for problem 1 via provide, using the syntax:
% provide comp167 hw4 myfilename1.here myfilename2.here
Make sure to include a readme file with instructions on how to compile (if applicable) and run your code.
a) Compute the percentage of the cell-line-validation samples correctly classified based on the training data for k = 1, 3, 5, and 7, and report this in a table.
b) Which value(s) of k perform best? Choose a value of k that performs best and call it Kopt.
c) Now classify the unlabelled sample in the file cell-line-test.txt using Kopt-nearest-neighbors. Do you predict it to be a sample from breast or ovarian cancer cell lines?
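
Parts a) through c) can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes Euclidean distance between expression vectors (the distance measure is not stated in the visible part of the assignment), uses a small made-up in-memory data set because the input file format is not shown, and breaks ties between equally good values of k by taking the smallest. All function and variable names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Return the majority label among the k training samples nearest to query.

    train is a list of (vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Percentage of validation samples whose predicted label is correct."""
    correct = sum(knn_predict(train, vec, k) == label for vec, label in validation)
    return 100.0 * correct / len(validation)


# Made-up toy data standing in for the training and validation files.
train = [
    ([0.0, 0.1], "breast"),
    ([0.2, 0.0], "breast"),
    ([0.1, 0.3], "breast"),
    ([0.3, 0.2], "breast"),
    ([1.0, 1.1], "ovarian"),
    ([0.9, 1.2], "ovarian"),
    ([1.2, 0.9], "ovarian"),
    ([1.1, 1.3], "ovarian"),
]
validation = [([0.1, 0.1], "breast"), ([1.1, 1.0], "ovarian")]

# a) Accuracy for each k in the assignment.
results = {k: accuracy(train, validation, k) for k in (1, 3, 5, 7)}
for k, acc in results.items():
    print(f"k = {k}: {acc:.1f}% correct")

# b) Pick a best-performing k (smallest k among ties, an arbitrary choice).
best = max(results.values())
k_opt = min(k for k, acc in results.items() if acc == best)

# c) Classify a made-up unlabelled sample with Kopt nearest neighbors.
prediction = knn_predict(train, [1.0, 1.0], k_opt)
print(f"Kopt = {k_opt}, predicted class: {prediction}")
```

With two classes and odd k, the vote can never tie, which is one reason the assignment's choice of k = 1, 3, 5, 7 is convenient.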