Assignment 4 with Solution for Applied Bioinformatics | BIT 150, Assignments of Bioinformatics

Material Type: Assignment; Class: Applied Bioinformatics; Subject: Biotechnology; University: University of California - Davis; Term: Fall 2008;

Uploaded on 07/30/2009

BIT150 – Fall 2008 –
Homework 4 KEY
Due on Thursday, October 23rd, by email to the TA: [email protected], as
Hwk4_Lastname, BEFORE the Lab
1. 20 points Using Pregap4 and Gap4, assemble the 21 sequences that are part of
Triticum monococcum L. BAC clone 322N9.
In the ‘Configure Modules’ tab of Pregap4:
General Configuration: For Get entry names from trace files, select No.
Estimate Base Accuracies: Select Logarithmic (Phred) scale.
Trace Format Conversion: Leave default parameters.
Initialize Experiment Files: You cannot modify it.
Augment Experiment Files: Do not add any extra information.
Quality Clip: Leave default parameters.
Sequencing Vector Clip: In Select Vector-primer subset, select pBS/HindIII.
Screen for Unclipped Vector Clip: OK.
Cloning Vector Clip: Unselect this option.
Gap4 Shotgun Assembly: In Gap4 database name, enter a name for your output, and
make sure you change the version every time you perform a new assembly with
different parameters. Create new database, and RUN.
Answer the following questions:
ANSWERS
- According to the database information:
1.1. Were all the sequences provided used to perform the assembly?
Yes. The 21 sequences were used to perform the assembly.
1.2. How many contigs were created? How many sequences were included in
each contig? What is the length (bp) of each contig?
2 contigs were created. The larger contig included 20 sequences, and had a
length of 3,600 bp, and the smaller contig included 1 sequence, and had a length
of 54 bp.
- Look at the confidence values used for the base-calling:
1.3. Present a confidence value graph for all contigs.
1.4. Present a quality plot for the larger contig and indicate, approximately,

a. for how many bases the quality of the consensus sequence was OK on

both strands, and

For ~1,400 bp the quality of the consensus sequence was OK on both

strands.

b. for how many bases the quality of the consensus sequence was OK

only on the plus strand and only on the minus strand.

For ~750 bp the quality of the consensus sequence was OK only on the minus strand, and for ~1,400 bp it was OK only on the plus strand.

1.5. According to the list of confidence values for the consensus sequence,

a. How many bases in the consensus sequence had a confidence value of 20?

12 bases in the consensus sequence had a confidence value of 20.

b. How many errors do you expect to be contained in those bases?

I expect 0.12 errors to be contained in those bases.
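The 0.12 figure follows from the Phred definition, where a confidence value Q corresponds to an error probability of 10^(-Q/10). The sketch below reproduces that arithmetic; it assumes the Gap4 consensus confidence values are on the same Phred scale selected in Pregap4, and the function names are illustrative only.

```python
def phred_error_probability(q: float) -> float:
    """Probability that a base call with Phred-scaled confidence q is wrong."""
    return 10 ** (-q / 10)


def expected_errors(n_bases: int, q: float) -> float:
    """Expected number of wrong calls among n_bases bases that all have confidence q."""
    return n_bases * phred_error_probability(q)


if __name__ == "__main__":
    # 12 consensus bases with confidence 20, as reported in the answer above
    print(round(expected_errors(12, 20), 2))
```

A confidence of 20 means one expected error per 100 bases, so 12 such bases carry 12 × 0.01 expected errors.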

a. How many mismatches can you see? Where are they located

(beginning, middle, end of the assembly) in the consensus sequence?

I can see 4 mismatches. They are located at the beginning of the

consensus sequence.

b. Inspect the chromatograms to see their quality for each mismatch. Using Shift/PrintScreen, show, for one mismatch, the quality of the chromatograms with the confidence values, and indicate whether Gap4 performed well in calling this particular base.

According to the quality of the chromatograms and the confidence values

with which T was called, Gap4 performed well in the calling of this base.

2. 20 points Use the PlantGDB GeneSeqer (http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi) to predict the exon-intron structure of the Triticin gene present in this genomic region of T. monococcum.

Select the rice splice site model.

Use the correct protein sequence, provided below the genomic DNA sequence, as the input for Step 3 (in the right window). The run will take a few minutes.

Tm Triticin genomic DNA ATAATTTTTTGAATATATGGTATTGTTTTCGAAAATATGAGCAAAAGTAAATAAAATCGAATGAATAAACAAATAAAACTGAAGTAACC TGCAGTAGGATACTTGGGCCGGGCGAACCAGACGCTCGCTGTAGGCGAGGAGGGTTTCTATCTCGTGTCGGGCGAGGTATAGCCAATGT TTGGACATCCTTACTCTTAAATAGGGCACTCCATCAGTGTAGAAATGTTGGGTCAACAGCACATAAGATAATTAAGACTGGACCTATTA CTAATACGGTCTCATTTGCGTGGAGATGCCTCTTCTTCTCTCTTAAACGGGAACAAAACAAGTGTGAATCTTTTCCCTCAAAAAAAAAG TTGTGATTAGTTTGCTCCAACAAGTGCCTTTTCTTCTTATCTGAACTTTTCATATCTGCACTCCTTGTATCCAGATGGATAATGCATCA TCGTGGAAAGCTTCTTTTTTTCCTTATCCTGATACGTATTAAGTGTGTGCCTACGTGGATGTGTGTCATCCAATAAACACTACTTTGGC GAAGCAAAATCTCATCGTCACACAAGTGCAAAAGTATCCAAATGTATTATTACGCAGCAACTAAAGTGATGTATTCAGTTAGCGGTGCA GTAAACAAACAAGCATAAAAGCGAGCACTAGACAGAAGTTGGTACAGATATGTTTGAGAAGTTTGTAAGTTACGTGAAAAAGAAACACT TAACCAACGTGTACGAAATCCTTTCTAGAAATAATGATGTAGATGATCGTTAAAGACAACACAGAAAGGATTTTCTATGTTTGCGTGCA GTCATATATAACTTTAACTCATGAGAGTTGGTTAGAGCCACTCAACTTTCCAATTGTCGAAATTATCATATGTTCTACATATTTATCTT AACAAATCTTGTCTCTACATAAGTTCTAGGTAGATTTTGTTGGC TATAAAA GCCACACACATCTCCCAACGCTAACATCAAGAAAACTT CTCTCTCCTCTTCAAACAGCT ATG GCAGCCACTAGTTTCGCTTCGCTCTCTTTTTACTTTTGCATTTTGCTCTTGTGCCATAGCTCCAT GGCACAACTGTTTGGCATGAGCTTTAACCCATGGCAAAGCTCTCGCCAAGGGGGTTTCAGAGAGTGTACATTCAATAGGCTTCAAGCAT CTACACCACTTCGTCAAGTGAGGTCACAAGCAGGCCTGACCGAGTATTTTGATGAGGAAAATGAGCAATTTCGTTGTACTGGTGTATTT GTCATCCGTCGTGTAATCGAACCTCGTGGTTATTTGTTACCGCGATACCACAACACTCACGGATTAGTCTACATCATCCAAG GT TTGTG TAGTAATTTAATTAATATAGTTACCATTTCATATTACTAATAGTTTCTTGAGATAAAGGTTACTATGTTTAGTATTTTATTTATTAACA CGTTTCTAATACTAACATGCAGATATGTTGTCACCC AG GAAGTGGTTTCGCCGGATTGTCTTTTCCTGGATGCCCGGAGACATTCCAAA AACAGTTTCAAAAATATGGGCAAGCACAATCGGTACAGGGACAAAGCCAAAGCCAAAAGTTCAAAGATGAGCACCAAAAAGTTCACCGT TTCAGACAAGGAGATGTCATTGCACTACCGGCAGGCATTGTACATTGGTTCTACAATGACGGTGATGCGCCAATTGTGGCTATCTATGT TTTCGACGTAAACAACTATGCTAATCAACTTGAGCCTAGGCATAAG GT AACAAATGATCTTTGAGACAAATCTATGTGGGGTCAATAAG TCTATTCAACTAACCTGTTGTATTTAATGTAGTTTACAAAGTGACATGTTGTTTAATTTCTTTTCTTGATCAATCTTGT AG GAATTTTT 
GTTCGCTGGCAACTATAGGAGTTCGCAACTTCACTCTAGTCAAAACATATTCAGTGGTTTCGATGTTCGATTGCTTGCTGAGGCCTTGG GTACAAGCGGAAAAATAGCGCAAAGGCTTCAAAGTCAAAATGATGACATAATTCATGTGAATCATACCCTTAAATTTCTGAAGCCTGTT TTTACACAACAACGAGAGCCAGAATCCTACCCACACACTCAATATGAGGAAGGGCAATCTCAGGCAAAACCCTCTCAGGAAGAGCAACC TCAAATGGGGCAGTCACAGGGAGACCAACCTCAAATGGGGCAGTCTCAGGGAGAGCAACCTCAAACGGGGCAGTCTCAGGGAAAGCACA TTCAGGGAGAGCAACCTCAAATGGGGCAGTCTCAGGCAAAACACTATCAAGGAGACCAACCTGAAGAAGGGCAGGGAGGGCAATCTCAA GAAGAACAATCTCAGGCAGGGCCATATCCGGGATGTCAACCTCATGCAGGGCAATCTCATGCATCACAATCAACTTATGGTGGTTGGAA TGGTTTGGAGGAGAACTTTTGTGATCATAAGCTAAGTGTGAACATCGACGATCCCAGTCGTGCTGACATATACAACCCGCGTGCCGGTA CGATAACCCGTCTCAACACCCAAACGTTCCCCATCCTTAACATCGTGCAAATGAGTGCTACAAGAGTACATCTCTACCAG GT AATTGTG ATATTGTGTTTTTTCATACTCTTTTATATTCAAAGCTTCACAATGCAATTCTAACGTTATACCTTACATAATTTATGATCGC AG AATGC CATTATTTCACCATTATGGAACATTAATGCTCATAGTGTGATGTACATGATCCAAGGACATATCTGGGTTCAGGTTGTCAATGACCATG GTCGAAATGTGTTCAATGGCCTTCTTAGCCCGGGGCAACTATTAATCATACCACAGAACTATGTTGTTCTCAAGAAGGCACAACGTGAT GGAAGCAAGTACATTGAATTCAAGACTAACGCAAACTCCATGGTTAGTCACATCGCGGGAAAGAGCTCAATCCTCGGCGCCTTGCCCGT TGATGTCATCGCCAATGCATACGGCATCTCTAGGACAGAAGCTCGAAGCCTCAAATTTAGCAGGGAAGAGGAGCTCGGAGTATTCGCTC CTAAATTCAGTCAAAGTATCTTCCATAGTTCTCCTACCAGCGAAGAAGAGTCATCT TAA GAGCGCATGAGCTAATGTCAAAACTAGCTC ATGGACTAA AATAAA CACATCATTGAGTGTGTAGCACTTTGATGTTTCCATATATGGTCGTCTCAATAAGATATCAACAAAGGTCCATT GTGTTTCATATGTTTACCTTTCTGGAAATTTCATGAACTTTGTTTTGCAAGTTGCATTCGCGAATTCTTCATCTAGATAGTGTGCATAT GCTATCGTATTTGTACTACTATTCTATGTGGTAGTGGTTCTCGTTTCTCATTGTAGCGATACAAATTCTCACCATAGCAATAAACACCA ATGTGTCAAAGCCGGTCTGTATAGTTGGTCGCCGGTGCCGAACCCTGTCTATGTAACCATCAGCTACTCTATGTTTCTTCTTCATCAAT GAAAATCATCTCTAGCTGCTTTTTCGTCAAAAAAATAAAATAAACAGCAATGTGTTTTTCGTTTGTGTTTGCACTGACATAAACATCAC TCACTTGCTAACTAACCCTATTAAACACCAGTGTGAGGTGGCTACGGTCAGCTCAATACATTCTTCTATGTGCCATTGGCCCCATTTCA CTTGTGTTGTCTTCTACAAAAATCATATGGTCGTTGGGTGAGTAATTTCTCATTCGCGCTTAAGACGAATGAACGATGATGTAGATTTG TAGAGTGCCCTTGAGTTCTTCCTTTGCAATTGGGTTCAACTCTTTTTTTATGGGGAAATTAGGTCTGGCCACTATCCAACTTGATATTT 
GATGCACGCCACTTTTCTTTTGAAATTGTGGTTGTGCCCTAACGATTGCGAATAAATTGGGCAGAAGCCCTATGCGTTTGCCACCAAGT TTTCACCTCATTTGACCCATTTTTTTTTCTTTTGTGAGTCTCAATCGTGGCAATAACGGAGGGGAGACTCATATGAAACATCCATTCGA TGTGTTGGCCATCATTTGGCCATGTTGTCAACTATGAATAGGAGAGGTCCGTCCCAAGAGTGACGGGTTGGTTTTCCTCT Tm Triticin protein MAATSFASLSFYFCILLLCHSSMAQLFGMSFNPWQSSRQGGFRECTFNRLQASTPLRQVRSQAGLTEYFDEENEQFRCTGVFVIRRVIEP RGYLLPRYHNTHGLVYIIQGSGFAGLSFPGCPETFQKQFQKYGQAQSVQGQSQSQKFKDEHQKVHRFRQGDVIALPAGIVHWFYNDGDAP IVAIYVFDVNNYANQLEPRHKEFLFAGNYRSSQLHSSQNIFSGFDVRLLAEALGTSGKIAQRLQSQNDDIIHVNHTLKFLKPVFTQQREP ESYPHTQYEEGQSQAKPSQEEQPQMGQSQGDQPQMGQSQGEQPQTGQSQGKHIQGEQPQMGQSQAKHYQGDQPEEGQGGQSQEEQSQAGP YPGCQPHAGQSHASQSTYGGWNGLEENFCDHKLSVNIDDPSRADIYNPRAGTITRLNTQTFPILNIVQMSATRVHLYQNAIISPLWNINA HSVMYMIQGHIWVQVVNDHGRNVFNGLLSPGQLLIIPQNYVVLKKAQRDGSKYIEFKTNANSMVSHIAGKSSILGALPVDVIANAYGISR TEARSLKFSREEELGVFAPKFSQSIFHSSPTSEEESS*

ANSWERS

Predicted protein(s):

FGENESH: 1 4 exon (s) 1001 - 3085 577 aa, chain +
MAATSFASLSFYFCILLLCHSSMAQLFGMSFNPWQSSRQGGFRECTFNRLQASTPLRQVR
SQAGLTEYFDEENEQFRCTGVFVIRRVIEPRGYLLPRYHNTHGLVYIIQGSGFAGLSFPG
CPETFQKQFQKYGQAQSVQGQSQSQKFKDEHQKVHRFRQGDVIALPAGIVHWFYNDGDAP
IVAIYVFDVNNYANQLEPRHKEFLFAGNYRSSQLHSSQNIFSGFDVRLLAEALGTSGKIA
QRLQSQNDDIIHVNHTLKFLKPVFTQQREPESYPHTQYEEGQSQAKPSQEEQPQMGQSQG
DQPQMGQSQGEQPQTGQSQGKHIQGEQPQMGQSQAKHYQGDQPEEGQGGQSQEEQSQAGP
YPGCQPHAGQSHASQSTYGGWNGLEENFCDHKLSVNIDDPSRADIYNPRAGTITRLNTQT
FPILNIVQMSATRVHLYQNAIISPLWNINAHSVMYMIQGHIWVQVVNDHGRNVFNGLLSP
GQLLIIPQNYVVLKKAQRDGSKYIEFKTNANSMVSHIAGKSSILGALPVDVIANAYGISR
TEARSLKFSREEELGVFAPKFSQSIFHSSPTSEEESS*

This predicted peptide showed:

Score = 1179 bits (3050), Expect = 0. Identities = 577/577 (100%), Positives = 577/577 (100%), Gaps = 0/577 (0%)

when aligned with BLAST against the correct Triticin protein, whose sequence is provided in Q2.

3.2. GENSCAN (http://genes.mit.edu/GENSCAN.html)

GENSCAN 1.0   Date run: 22-Oct-106   Time: 19:59:

Sequence 19:59:38 : 4085 bp : 40.81% C+G : Isochore 1 ( 0 - 43 C+G%)

Parameter matrix: Arabidopsis.smat

Predicted genes/exons:

Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr..
 1.01 Intr +   1063   1328  266  1  2   77   76   198 0.357  18.
 1.02 Intr +   1463   1737  275  0  2   66   87   135 0.995  12.
 1.03 Intr +   1862   2572  711  1  0   89   91   354 0.997  31.
 1.04 Term +   2666   3085  420  1  0   50   48   396 0.950  31.
 1.05 PlyA +   3125   3130    6                                1.

Predicted peptide sequence(s):

19:59:38|GENSCAN_predicted_peptide_1|557_aa
XSMAQLFGMSFNPWQSSRQGGFRECTFNRLQASTPLRQVRSQAGLTEYFDEENEQFRCTG
VFVIRRVIEPRGYLLPRYHNTHGLVYIIQGSGFAGLSFPGCPETFQKQFQKYGQAQSVQG
QSQSQKFKDEHQKVHRFRQGDVIALPAGIVHWFYNDGDAPIVAIYVFDVNNYANQLEPRH
KEFLFAGNYRSSQLHSSQNIFSGFDVRLLAEALGTSGKIAQRLQSQNDDIIHVNHTLKFL
KPVFTQQREPESYPHTQYEEGQSQAKPSQEEQPQMGQSQGDQPQMGQSQGEQPQTGQSQG
KHIQGEQPQMGQSQAKHYQGDQPEEGQGGQSQEEQSQAGPYPGCQPHAGQSHASQSTYGG
WNGLEENFCDHKLSVNIDDPSRADIYNPRAGTITRLNTQTFPILNIVQMSATRVHLYQNA
IISPLWNINAHSVMYMIQGHIWVQVVNDHGRNVFNGLLSPGQLLIIPQNYVVLKKAQRDG
SKYIEFKTNANSMVSHIAGKSSILGALPVDVIANAYGISRTEARSLKFSREEELGVFAPK
FSQSIFHSSPTSEEESS*

This predicted peptide showed:

Score = 1137 bits (2941), Expect = 0. Identities = 556/556 (100%), Positives = 556/556 (100%), Gaps = 0/556 (0%)

when aligned with BLAST against the correct Triticin protein, whose sequence is provided in Q2.

Explanation

Gn.Ex : gene number, exon number (for reference)
Type  : Init = Initial exon (ATG to 5' splice site)
        Intr = Internal exon (3' splice site to 5' splice site)
        Term = Terminal exon (3' splice site to stop codon)
        Sngl = Single-exon gene (ATG to stop)
        Prom = Promoter (TATA box / initiation site)
        PlyA = poly-A signal (consensus: AATAAA)
S     : DNA strand (+ = input strand; - = opposite strand)
Begin : beginning of exon or signal (numbered on input strand)
End   : end point of exon or signal (numbered on input strand)
Len   : length of exon or signal (bp)
Fr    : reading frame (a forward strand codon ending at x has frame x mod 3)
Ph    : net phase of exon (exon length modulo 3)
I/Ac  : initiation signal or 3' splice site score (tenth bit units)
Do/T  : 5' splice site or termination signal score (tenth bit units)
CodRg : coding region score (tenth bit units)
P     : probability of exon (sum over all parses containing exon)
Tscr  : exon score (depends on length, I/Ac, Do/T and CodRg scores)
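The Len and Ph definitions in this legend can be checked directly against the coordinates GENSCAN reported. The sketch below is only a consistency check on the four coding exon rows transcribed from the GENSCAN output in this key; the tuple layout and function names are illustrative assumptions, not part of GENSCAN.

```python
# Coding exon rows transcribed from the GENSCAN output in this key:
# (Gn.Ex, Type, Begin, End, Len, Ph)
GENSCAN_EXONS = [
    ("1.01", "Intr", 1063, 1328, 266, 2),
    ("1.02", "Intr", 1463, 1737, 275, 2),
    ("1.03", "Intr", 1862, 2572, 711, 0),
    ("1.04", "Term", 2666, 3085, 420, 0),
]


def exon_length(begin: int, end: int) -> int:
    """Length in bp of a feature given 1-based inclusive coordinates."""
    return end - begin + 1


def net_phase(length: int) -> int:
    """Net phase as defined in the legend: exon length modulo 3."""
    return length % 3


def check_rows(rows):
    """Return the labels of rows whose Len or Ph disagrees with Begin/End."""
    bad = []
    for label, _type, begin, end, length, phase in rows:
        if exon_length(begin, end) != length or net_phase(length) != phase:
            bad.append(label)
    return bad


def total_coding_length(rows) -> int:
    """Sum of exon lengths, in bp."""
    return sum(exon_length(begin, end) for _l, _t, begin, end, _len, _ph in rows)


if __name__ == "__main__":
    print(check_rows(GENSCAN_EXONS))           # empty list when all rows agree
    print(total_coding_length(GENSCAN_EXONS))  # total predicted coding bp
```

The four exons sum to 1,672 bp, which is 3 × 557 + 1. The leftover base may be what GENSCAN reports as the leading X in its predicted peptide, though that reading is an inference rather than something the output states.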

to predict genes within the same sequence of T. monococcum containing the Triticin gene you annotated above (Q2). Compare the predictions made by each program with the correct annotation you made in Q2. For each program, describe:

a. How many of the exons were identified?

b. How many of those were predicted correctly, without errors, according to the correct annotation you made in Q2?

FGENESH is the program that makes the best prediction of the Triticin gene,

compared with the results from the correct annotation made in Q2. GENSCAN

performs a good prediction, differing only in part of one exon, Exon1.
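One way to make the "identified" versus "correctly predicted" distinction concrete is to count the predicted exons whose start and end both match the reference annotation. In the sketch below the GENSCAN coordinates come from the GENSCAN output in this key. The reference coordinates are an assumption: they are inferred from the FGENESH span (1001 - 3085, 577 aa, 100% identity to the correct protein) and are not copied from the GeneSeqer result, which is not shown here.

```python
def compare_exons(predicted, reference):
    """Return (number of predicted exons, number matching a reference exon exactly).

    Exons are (start, end) tuples, 1-based inclusive, all on the same strand.
    """
    reference_set = set(reference)
    exact = [exon for exon in predicted if exon in reference_set]
    return len(predicted), len(exact)


# From the GENSCAN table in this key (exons 1.01 to 1.04).
GENSCAN_PREDICTED = [(1063, 1328), (1463, 1737), (1862, 2572), (2666, 3085)]

# ASSUMED reference: exon 1 is taken to start at 1001, the start of the
# FGENESH prediction; the other exons are taken to equal the GENSCAN ones.
REFERENCE_ASSUMED = [(1001, 1328), (1463, 1737), (1862, 2572), (2666, 3085)]

if __name__ == "__main__":
    identified, correct = compare_exons(GENSCAN_PREDICTED, REFERENCE_ASSUMED)
    print(identified, correct)
```

Under these assumed reference coordinates, GENSCAN identifies four exons and three match exactly, with only the first exon differing at its 5' end.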

4. 20 points In the following sequence from T. monococcum, annotate the repeat elements present. The Triticeae Repeat Database TREP (http://wheat.pw.usda.gov/ggpages/Repeats/blastrepeats3.html) will help you to identify the elements. In TREP, select the blastn program and the "Cereal repeat sequences, complete set" database.
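A blastn search against a repeat database usually returns several overlapping hits per element, so the hits have to be collapsed into one region per element before annotating the sequence. The sketch below shows one possible way to do that. The hit tuples and the element names (repeat_A, repeat_B) are hypothetical placeholders, not TREP results for this sequence.

```python
from collections import defaultdict


def merge_hits(hits, max_gap=0):
    """Collapse overlapping or adjacent hits to the same element into regions.

    hits: iterable of (element_name, query_start, query_end). Start may be
    greater than end (for example when coordinates are listed in reverse).
    max_gap: hits separated by at most this many bases are also merged.
    Returns a sorted list of (start, end, element_name).
    """
    spans_by_name = defaultdict(list)
    for name, start, end in hits:
        spans_by_name[name].append((min(start, end), max(start, end)))

    regions = []
    for name, spans in spans_by_name.items():
        spans.sort()
        current_start, current_end = spans[0]
        for start, end in spans[1:]:
            if start <= current_end + max_gap + 1:
                current_end = max(current_end, end)
            else:
                regions.append((current_start, current_end, name))
                current_start, current_end = start, end
        regions.append((current_start, current_end, name))
    return sorted(regions)


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    example_hits = [
        ("repeat_A", 100, 400),
        ("repeat_A", 350, 900),
        ("repeat_B", 1500, 1200),
        ("repeat_A", 2000, 2300),
    ]
    for region in merge_hits(example_hits):
        print(region)
```

Keeping the element name in each region makes it straightforward to assign one highlight color per element when marking up the sequence.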

ANSWERS

Exon prediction summary (Q3):

                            FGENESH   GENSCAN
Identified exons               4         4
Correctly predicted exons      4         3

GATTTTCTTCCACATTGAAAGGCGGGTTCTTCTCCAGGGCTATCTTCCACGTGTCAAGCGAATGCGAGGAGCCAG

GGCATTTTTACATTTGCCACCCTAGCCATGTAAGTAAACTGTCTATTAAAGAGACGGGGATTCTTAGATCCAGAC

CACACCATCCTACCACGACGAGTATCCATCGGAGCGCGCTCGGAAAAATCCATTCCAACATGGCCGGCCATCGCA

GCTCCTCCTCCCGCTCCCGTAGCCCTAAGCCCGGTGATTGAGAGAAGTGTTCCATCCCACATAGCCAATTAGTAG

AGTTACAGACCCAGTGATTTCTCCCTCCAGTGTATACGGTCCCCGTTCGAGCCGGGCTCGCCACCTATAATGGCG

AGGAGCAAGCGGAGAGCTTCCCCAATCCCTCCAAGGGGGAGCGGGTATGCCTTGTCCCTTATTTACTAAGGGGAC

TTGGATTTCTAATCCATCCATTCCTCCGCGGGCTCCTGGAGTTTTATGGCCTCCAGTTGCATAATTTCACCCCCG

CTTCCATATTACACATCGCTGGTTATGTTGCCCTTTGTGAGCTGTTTCTGGGCTGTGAAGCTCATTTCGAGCTGT

GGAAGAGGCTATTATGCCTTGTGCCCCGTACACAGAAGGGGTCAACATATCAAGTGGGCGGAGCCGAAATATGGC

GCATCATTGGGACCGGATACCTGTCCGATACCCCGAAGAAGACGTCTGAACACTGGCCTTTGGAGTGGTTTTACA

TGGAGGACGTCCCTCTTCCAGATCCTATTCGGATGGGTCTTCCCGAGTTTGACAATGGACCTTTGAAGAAACACC

GAAGCTGGCCCCCACGGAGCCCTCAGGAGGAAGATAACGGAGAAGTCTTTTACCTGATGGGCCGGATAAAAGTGC

TGGCTCAATCAAGACTGACAATGATCGAGGTTATCTCAATATGTATAATGCGGGGGTGCAGCCACTTCAATATCG

AGGGCACCCCATGTGGCGCTTTAACGGGGAAGATGATGCCACCCGCTGCGGGCGTAAGGGACCGGACTCAGTTGC

TGCTCTAGCAAAAATTCTGCCCGATTTGTTCAAGGGAGAGGGAGAGGAGTTCATCCGTATTAAGCCACGGGATGG

ATTCTCCATGTACAACCCTCCAAGCTGGGTGAGCTGTCACTTTTTACTCCCAACCTTCCTGCATTCTTCGTCATA

AATACTCACCTTGCCGTTTCATGGCAGGAACTATGAAAAGCTGTAAGGGAGGTCAACATCACACCTCCACAACCG

GAGGACCCTGACCGGGCCCTTGATCCCGGACTCGAAGAGGATCCGGACATATTTGTGGAGCTGATCGACCGGACA

TTCTATCAATTGAGCTGCGATGATGCCATGGTGGCCATCATAGCCGACTATCCTGGGCTACTCCCTACATCGCAT

GTAAGAAAAACCGGAGTCCCAACTTTCGAGAAGGATCCCCTTTTATGCACTTCACCTACCGTACACATATGGTGT

TTTACAGGGAAGGCCCTCGGGACGCCATGCCGAACCACAGTGACTCGCCAACAAGGGACACCGAAGCCGGGTAGG

CTAAAAAGGAAGGCGGTCAGGACTGAGACGCTGGCGCAGAGGTATGATACAGAACATCGCTCTATGGTTGTCGTC

TTTCTTAAAATATAATGACGTCCATGTTTCCTAGTAGGAAAAACGCTCGCCGGACTATATCCGGAGAGGTTGCCG

ATCACGCCTCCACCAGTTGGGCTCCAAGGCCGGACATAGAGGTGGAAGCTAACACGGGGCGAACGCCGGATGCTC

CTCCTACAGAGGATGCAGATAGACTATCCGCTACAAATTCTGAAGTGGAGAGCGCCATGAATCACAGGCATCGCC

GGGCCATACTCCGTGACGCTTGTTTCTCCCAAGATGCGTTCGATGCCTTCAACTCAAGAGACGCGTACATCCGTG

CTGCTCAAAACGCTCTTGCCAGAGCTACAGACCAGTATGTAAAGGATATACGGGTAAGAAAATCTGACGATTATA

TATATCAGTAGCCCCTGAGACTTGAAACAATTGACACAACTGATTTAAGGATCATTATTTGTGCAGGTTCTTACA

GAGAAGAATACCCATTTGTCTCAAGAGCTGGAAGAGTGCCAAGCCCAGCTATGGGCCGCGGTTGCCGCATTAAAG

GAGTACGAGAAGGCTCCCTCTGGTAACAATTGTTTCCACTGCAAAGAATGACAGGTACCAGATAGTATGTGGCAT

GCATGGAAATCTAACAATAGATGTTGCAGATGACTCCAGAGTGAATCCGGAGGCCGTGCCGGAGAATGCGCAACA

TGCTAATCGACAGCTAACTGCTAGCGAGCGCGTGCTTACGCGGGTCAGGCAGGCGAAAAATAATCTCCAGGACGC

CAATATCCAGCTGGGCGTGGAACTGAAAGATGTCCGAGCCCAGCTTGCGGACTTCGTGAAGGAGAATAAGCGGCT

TTGACGGGACATTTTCCATAAAGGGTTTGAAGGAATTTTTGAAAGAGTTGGGGGAAGAAACCGGCTTAACAGAGT

TAATTCTGCAGGTATGCTGAGGGGTCATCCCCCCGAGGGAGATGCCCGGGTTTGGGGGGTAATCTTCTACCAGAG

CTCTCACAACTGCACGAACAAGTTCGGCAGGTGATGCAGGGTATTGCCCAACCCTTGTGGCCTTCCGCCTCCCCG

CCCAGGAGGCATGGGGGAGCTTATAGAGAAGCTCAAGGGAGCGCAGCGGCGCTTCCGATTATGAAAGATATCAGC

CTGTCGACAAGACGCAAGGGAAGCCTGGGCCATGGTGAAGACGCGATATACCAAAGCTGACCCAAACCACATGGT

CGAGGTCGGACCTATGGGGCCCGATGGGAAAGAATTCCCCGTCAGTTTGGTGTATGACCAAATAGAATTAGCTGC

AAAGTATTCCCAACAGGATTGTAGGCTAGACAGCCTGTTGGATGGTATAGAAGAAGAATTTAGTCAGTCTTAGTG

ACTGTGTACTTCAAGTGACATATATAATGTCCCTAGCCGGATTGTAATCCATTTTCATGGAGGACCTTTTTGCTT

CGACCTCT GGACCTGTCGGTGTACTAAAGTAGGGGTACCTTAGTATCCCGAACTTGTGCACGGGCAGTCGCAGCG

ACCCGCGGCAAGGCTTGCCGGGCGACTGCCAAGACCCTCCTTAGCTCCATTGGGGCCATTCAAGAACAAAGTATC

CAAGAGCAAAACGAGACCCCGGCAAGAGGAGCTTGCCGGGAAGGAGACAAGGCCCCGGCAAGAGGATCTTGCCAG

GAAGGAGACGAGACCCCGGCAAGAGGAGCTTGCCGGGAAGACCAACCAAAGCATCTCAAGAAGCTCGCTGCGGCG

CACCATGCGTCCCGACAAGACCCGGTGAGCGACAAGCTCCCGGACGCGGCAAGGCAGCATCCGCGGCAAGGCGCT

TGCCGCGGCAAGCTACTACTCCGCAGTCCGCGCTCCAGCTCACCCACCAACGTGTCGCTCTGGGACCGCTCCTGG

CGTACGTGGCGGGAGGCTGTGCAGCCGGGGGCATGCGGTGGCAAGCAACCGCTGACAAGATAGCCATCGTGGCAA

ACGGTGGCGTTCCTAGCGGTCCCTCTCTATACCGTCTTGGTGACACATACGGGCATTTAATGCCCTTGTCCCCTG

CCATCAGGGTTAGGTATGATACGACTGTAGCAGGTGGTGGCGCCAACTACCACCCCTTTTACATTTTTACCCTCA

CCTCTGTTGCCACATGTCGGTGACCCCTTAGACATATAAAAGGAGGCCCATGCGCAACAGAGAGGGGGGGACGGC

CATTCTCACGCTCGGTCTCGCCAGTTGCTTGAGTGTAATGTAGCACTCGCCGCTCCCGAGCAAGAACTCAATACA

ACCACCAAAGTAGGAGTAGGGTTTTACGCATCCGTGCGGCCCGAACATGGGTAAACCAATCATCTCGCTCGTGCG

TTCGCCTCGATCTGCTCTTTGTGCATCCACCGCCCCCTGCCGAACCGAAAGGGGCTCGGTCCGCCGATCCCCATA

GGTGTGCATGGGATCAGCAACCCCCGACA TCTTTGGCGCGCCAGGTAGGGGGTGCGTCGAGGTTGTGCGAACCTG

ATCCGGCGTTCGCACGAGCCGAATCTCCATCGTCTTCATCAACATGCCACCGAAGAAGAAGGCTTCGGAGGTAGC

CGCTCCGTCTGCGCCGAGCCAGCCATCGccGGAGCGAACGGCTGGTGGAGAAGGCACCGACGGAAGAGCGGGCGC CGACGGGGAGACCCATGGTGCTGCCAGGTCCAAGGACCAGGTTGCACCGCATGTCGGCGGCGCGGTCGCCTCCCA CAACGACCACGCGCTCTCCAAGACCTCGGCGTCCATACACGCACCGCGTCCATCCCAGGAGGCGCGGGACCAGCA GCGTCATGGTACCCACGGTACCATACGTCTGCTGGGCCAAGATCAAGCTGGCGGATCCCGAAGTGCTCATGACTA GCATGCACGCCAGCATGCTGGCGCGAGCGTGGCGCGGTCGCTTGGACGAGATGTTGCTCCTTCTAACGCGGAAGC GGAGGCTGTGCTGCAAGACTTGAAGAGGTACCTGTCCTCCACTCCGACACTTGTCGCGCCTAAACCACAAGAGAA GTTGCTGCTGTACATAGCGGCAACCAATCAAGTGGTTAGTGCTGCGTTAGTAGCGGAGAGGGAGGCAGATGACGA GCCAGCGACCGCGGCAAGCACATCCAGCGACAAGCAGGGGGGCTTTCCCGACAAGCTCTGGTCCCAACAAGCAAG GGTCTGCGCAGATGCAAGAGGAGATACAGAAGAAGATGGTGCAGCGCCCAGTTTACTTTGTCAGTTCCCTTTTGC AGGGGGCTAGGTCAAGGTACTCTGGTGTGCAGAAGCTGCTTTTCGACCTTCTCATGGCCTCGAGAAAGCTGCGCC ATTACTTCCAAGCACATGAGATCAAAGTTGTCACTCGCTTTCCGCTGAAGAGGATATTGCAAAATCCAGAAGCAA TAGGCAGGATTGTCGAGTGGGCACTGGAACTGTCAAGCTTTGGCCTCAAGTTTGAGAGTACATCAACAATCCAGA GCAGAGCATTGGCAGAATTCATAGCAGAGTGGACGCCAACTCCAGACGAAGAAATTCCGGAGACGAGCATCCCCG CCAAGGAAGCAAGCAAAGAGTGGCTCATGTACTTTGACGGTGCTTTCTCGCTGCAAGGCGCCGGTGCTGGTGTAC TGCTTGTCGCACCCACCGGAGAGCACCTCAAGTACATAGTCCAGATGCACTTCCCCAAGGAGCAAGCGACAAACA

ATACTGCAGAGTATGAAGGCTTGCTTGCCGGTCTCAGGATCGCGGCAGAACTTGGGATTAAGAAGCTCATTGTCA

GGGGTGACTCGCAGCTTGTCGTCCGCCAAGTGAACATGGTTAACCAGAGTCCGTTGATGGAAGCATACGTTGATG

AAGTGAGGAAGCTAGAAGAGCACTTTGACGGCCTGCAAATGGAACATGTTCCAAGAGCTGAGAACGACATTGCCG

ATGGCCTGTCAAAGTGCGCAGCACTTAAGTTACCTGTGGAACCAGGGATCTTTGTGCTCAAGCTGACTCAGCCAT

CCGTAACACCATCAACTGGACAGAGCAAGAAGAGGAAGTTGATTTCTGGTGACTATCTTCCGGCAGAGCTTCCTG

AAGCCGCCGCCAAGAAGGTCCCCAGGATCGACGCCAAGAGTGTTGAGGAGCAGTCTACTCCGGCAAACCCCAGGG

TTTGTTCCGTTGCAGCAGACACTCCCGGCAAGCTCTGCTCCGGCAAGCTTGTCGGGGAGCGTCAAGCTCCGGCAG

AGTCGCAGGCTCTCGCCGTAGAAGCAGACGTTCCCGCAGCAGCAGATGTGCCTTTAGTCCTTGTTGTTGAGCCAC

AGGCTCCAACATGGGCGCAGCAGATTGTCCGTTTCCTTCAGACAGGAGAGCTTCCTGAGGAACTAGAAGAAGCGG

AAAGAGTAGCCCGGCAGTCTAGTATGTACCAGTTTGTCGACAACACACTATGCAGAAGAAGACTCAACGGTGTGA

AATTGAAGTGTATTCCTCAGGAAGATGGACAGAAGCTGTTGGCAAAGATACATGGAGGCATATGTGGTCACCACA

TTGGCACGATAGCTCTTGTCGGAAAAGCATTCCGGCAAGGCTTCTTCTGGCCGACAGCCCTCCAGGATGCAACTG

CACAAGTAACCAAGTGTGAAGCATGCCAGTTCCATTCCAAGCAGATACACCAGCCAGCTCAAGCTCTCCAGACGA

TCCCTTTATCCTAGCCATTTTCGGTCTGGGGGCTCGACATCCTCGGCCCCTTCCCCCGAGCAGTCGGGGGCTTTG

AGTACTTGTACGTTGCAATCGACAAGTTCACAAAGTGGCCGGAAGTGGAACCAGTGAGGAAGGTGACGGCGTAGT

CCGCGGTCAAGTTCTTCAGGTCGATTGTTTGCCGTTTCGTAATCCCTAACAGGATAATCACCGACAACGGTACAC

AATTTACAAGCCGCACCTTCATGCAGTACGTCTAAGATCTTGGCGCCAAGGTCTGCTTCGCTTCTGTTGCTCACC

CGAGAAGCAACGGTCAAGCGGAGAGGGCAAATGCTGAAGTGCTGCACGGGCTCAAGACCAGAACTTTCGACAGGC

TGCACAAGTGCGGACGCAACTGGATCGAGGAGCTGCCGGTGGTTCTTTGGTCGATCAGGACGACGCCAAATCGAG

CCACCGGACAGACACCCTTTGCTTTAGTCTATGGAGCGGAGGCAGTTATCCCCACGGAACTCGTATACGGGTCAC

CTCAAGTGCTCGCTTATGATGAGCTTGAGCAAGAGCGGCTCCGGCAAGATGACGCGCTACTCCTTGAGGAAGATC

GTCTTCAGGCTGCTGTGCGAGCCGCTCGCTACCAGCAAGCTTTGCGCTGCTATCATAGCCACAAAGTTAACGCCA

GAAGTCTCGAGGAAGGCGACCTTGTTCTTCGGCGCGTTCAGTCCGCCAAGAATTCCAACAAGTTGACGCCGAAGT

GGGAAGGCCCTTACCGGGTGAAACGAGTCACCAGGCCTGGCGCTGTCCGCCTGGAGACCGAAGATGGCGTACCAG

TGAGCAACTCCTGGAATATCGAGCATCTTCGTAAGTTTTACCCGTAAGGCGCGGTTGTCGGACCCTGTCCGGCAA

CCACCCTTTTGTACAAGTTTTGCCGCTGTTGCATGTAATCCTTTGTACAAAGCCAGGCGCAGACCCTGTGCACAA

GTAAATGAAGCGCTGGCGCCCAGAGTTATAGCATGCATGCTAGGTCGAGTCGAATCATCGGTTAGGGTTGATAGT

GCTTAACAGCGGCTAACCCCTAACTTAGAGGTCGAGTGTGCGTCCCTTTGCCTTTCTTTCGTTTCCTTTGGTTCG

CAGGGCTCGCATATCCTTGCCGGACCACCTGAGGACGAAAAGGAGAAGAAGGACGTGCCGGCATCTGGACCCCGG

CAAGCCAGTGTTGCCGGGGGCTTCAAGTACCAAGGTTCGCCCATGGCAAGCCAGAGTTGTCGGGGGCTGCAATCT

CAGATAAGTCTTTCATCCCTCTTCATGCATTCTCTTATGGACTGGGGAATATGAAACCTCGTTTCTTCCCGAAAA

CTACTGTGCTCCCTTAGCTCTAGGCCCTGGGGCTCGTTCCTGGGGCCAAGTTTTGGTCGTAGTCGCGGCAGGAAG

GGGATGAAACGAGGGGCCGCACCCGCCTCGTCCCTACCTCCGTGAAGAGCGTGAGAAAGGTCGAGCGGCGTGGGT

AGACGGCAGGGCCTGTTGCCGGGGAAGGAGTGTCCATCTGTAAGATACCAAGGATTTGCTCCTTGACGTGGGTTC

TACCCGCGAGACAAATGACCCGGCAAACATCCCCTTTTCATTAAAACATTCCACACAGGTACAGTCCATAGAGGA

TGAAAAACATAAAAGAGAGGATTACAAGTTTTAAATGCAACTAAGGCCTGGAGGCTTACAAATTAAAAAGTTAAG

ATGCATGTAATCCTTTTCTTGTCCCTCTTTTCAGGAGGTAGGTCAAGGAGGCCAAAAGGGCAAAAGGAGTGCCTA

GGCGAGCTACGGGTGCCAGAAGACACCAACAGAACACCCCGGGGGCCGTCAAGGCTGGACGGTGATGGCGGCGCC

GGAAGCAGGGCTCGAAGATGAGGCGGTAGGACGTCAGCACGCTGTCGAGGGCACCACGCCCTCGTACTCCGGCTT

GACGGCCTTGCCGCCAGCCCTCTTCCTCTTCCGCAGCCTCCCTACACCAGTCGCCCAAGGAGACGACCCCGAGGA

GTTCAAGGAGCCGGCGACACGGATGATGGGGTCGCCGTTGCCGCACGGAGCGGCGCCCGACACCTAGTCGGACCC

GCCGTCTTGCCGCCCGCTCCCCTCGTTGTCGCTGTCGCTCCCCTCCTCGTCACTCTCGCTTGAGGAGGAGCCTTC

GCTATCGGAGGAGATTTCCACGCAGCACCGTGCCCGGACGCCGAAGCACCCGAAGACCTTGACGGGGAGGAGATC

GCCCTCCAGCAACCTAAAATGGAGGGTGAGCCCCTGCATCAAGCTGTGGATGCGGGCGAACGTCTTCCACCCTCG

ACCAAGATACATCACACCAGGGGCTGGGAATTCGGTGCGGACCCGCGTGCCGCTGATCCCACAGCCCCTCATGTG

TAGTCTCAGTGTCTCGGGCGGGTCCAGCCCCACCACGTCCGCAAACGGAGCTGGGAGTCGGAGGCGACGATGAGG

CGCCCGGCGCAGCCTGACGACGAACTCGCAAGGGTCGTCCCCGGCATGGACCGCCGCGGGAAGAAGAGCCACCGG

CGGAGGCGAGGCTGACGCGCCGCCTCGTCCACCTCTTCCCCGAGCGCCGCGGCCTCGTCCTCGACCGCGTCCACG

ACCTCACCCCTCCCTCTTCCCCTTACGGCTGGCTCCGCCACCACGGGTGCGGGAGAGGGAGGGATGCGCTGCCTG

GCGGCAACCCTTGCCGCCACTGACGAAGGACCAGGGTTCCCTTTCTCCCTGGCGAATGAGAGGAAGGGGAGCAGA

AGCTGAAGGAAGGAGAAGACAAGAGAATGCGGGTTTCTGTCCTCCCCTCCCGCTCTTTATAAAGGAGCGAGGTGG

GGGCTCGGCCGCCCCGCTCAGCGAATCAAGACGTGGAAAGGCAGGGAGCTGGGGCCGACCCGCTGTCTCGACCCA

CGACGGCCCCGTTTCCCGCCTCCGCCGTCCGCGACCCCAAGTGCATGGGAGCCGCGTGGTAGGTGTGCAGAACCG

AGGCGACGAATGGGGCGGCCCCTCCCCACCCCGTGATCGTGGGGAGTGGGCACCTCGAAAACCGCCCACGACCTC

TCCCATGATGCCGCGTTTGACGCGTGCCGTTTGGGGAGGGCGCGGTGGGTGCGGAGTGAGTTATGGCGTGACCCA

GGCCGCGCTTGCCCGTGCACTGTTTTGGGCCTGGCCCAACAGCGCTCGGCACCGTGTATGGCCCAGGCCCGGGGG

CTCC TGTCGGTGTACTAAAGTAGGGGTACCTTAGTATCCCGAACTTGTGCACGGGCAGTCGCAGCGACCCGCGGC

AAGGCTTGTCGGGCGACCGCCAAGACCCTCCATAGCTCCGTTGGGGCCATTCAAGAACAAAGTATCCAAGAGCAA

AACGAGACCCCGGCAAGAGGAGCTTGCCGGGAAGGAGACAAGGCCCCGGCAAGAGGAGCTTGCCGGGAAGGAGAC

GAGACCCCGGCAAGAGGAGCTTGCCGGGAAGGAGACGAGACCCCGGCAAGAGGAGCTTGCCGGGAAGACCAACCA

AAGCATCTCAAGAATCTCGCTGCGGCGCACCACGCGTCCCGACAAGACCCGGTGAGCGACAAGCTCCCGGACGCG

GCAAGGCAGCATCCGCGGCAAGGCACTTGCCGCGGCAAGCTACTACTCCACAGTCCGCGCTCCAGCTCACCCACC

AACGTGTCGCTCTGGGACCGCTCCTGGTGTACGTGGCGGGAGGCTGTGCAGCCGGGGGCATGCGGTGGCAAGCAA

GCGCTGACAAGATAGCCATCGTGGCAAACGGTGGCGTTCCTAGCGGTCCCTCTCTGTACTGTCTAGGTGACACAG

ACGGGCATTTAATGCCCTTGTCCCCTGCCATCAGGGTTAGGTATGATACGACTGTAGCAGGTGGTGGCGCCAACT

ACCACCCCTTTTACATTTTTACCCTCACCTCTGTTGCCACATGTCGGTGACCCCTTAGACATATAAAAGGAGGCC

CATGCGCAACAGAGAGGGGGGACGGCCATTCTCACGCTCGGTCTCGCCAGCTGCTTGAGTGTACTGTAGCACTCG

CCGCTCCCGAGCAAGAACTCAATACAACCACCAAAGCAGGAGTAGGGTTTTACGCATCCGTGCGGCCCGAACCTG

GGTAAACCGATCGTCTCACTCGTGCGTTCACCTCGATCTGCTCTTTGTGCATCCACCGCCCCCTGCCGAACCGAA

AGGGGCTCGGTCCGCCGATCCCCATAGGTGTGCGTGGGATCAGCAACCCCTGACAGGACCCACGACAGGACC TGA

TAGTCCGGAGTGTATCCGAATACCCACTCGGTTATGTAAAAACCGGGGTATGCGTGGACACCAGGCGTAGGGGTC

CTCCCTTTGGTCGGGTTTAAGCCCAATTCGATCGTAGTCTGTAGCGACTCCCAGAGTGGCGATGCGATCCAAGAG

CTCATTCAAAGAGGAGAGCTCCATTGGATCCATCTGCTCTATGAATTTCAAGCCGATGTGGAGGCTATTTTCAAT

GACCCGAGAAGTCATCGTCGGCGCAGTGGCCGAGCGGGCGGTCATGACGAAGCCGCCTAGTCGGAGAGTCTGGCC

TACAGCCAGGGCTCCCTCAGAGGTGATGTTGTTCTTGACAACGAGACGAGCCATCCCTCCTTATGATGACGACAT

AGTGGAACTCTCAATGAAAGCACCAA TGTTGGTGTCAAAACCGGCAGATCTCGGGTAGGGGGTCCCGAACTGTGC

GTCTAAGGTGGATGGTAACAGGAGGCGGGGGACACGATGTTTACCCAGGTTCGTGCCCTCTCGATGGAGGTAATA

CCCTACTTCCTGCTTGATTGATCTTGATGATATGAGTATTACAAGAGTTGTTCTACCACGAGATCGTAGAGGCTA

AACCCTAGAAGCTAGCCTATGGTATGATTGTTGTCGTCCTACGGACTAAACCCTCCGGTTTATATAGACACCGGA

GGGGGCTAGGGTTACAAAGAGTCGGTTACAAGGAAGGAGATCTACATATCCGTATAGCCAAGCTTGCCTTCCACA

CCAAGGAGATTCCCATCCGGACATGGGACGAAGTCTTCAATCTTGTATCTTCATAGTCCAACAGTCCGGCCAAAG

GATATAGTCCGGCTGTCCGGAGACCCCCTAATCCAGGACTCCCTCAATTCAATCCTCGACATCATGGTTCATCCG

ATGAGATCATGGAGGAGCATGTGGGAGCCAACATGGGTATCCAGATCCCGCTGTTGGTTATTGACCGGAGAGTCG

TCTCGGTCATGTCTACGTGTCTCCCGAACCCGTAGGGTCTGCACACTTAAGGTTCGGTGACGCTAGGGTTGTTGA

GATATTAGTATACGGTAACCCGAAAGTTGTTCGGAGTCCCGGGTGAGATCCCGAACGTCACGAGGAGTTCCGGAA

TGGTCCGGAGGTGAAGATTTATATATAGGAAGTCAACTTTCGGCCATCGGGAAAGTTTTGGGGGTAATCGGAATT

GTACCGGGACCACCGGAAGGGTCCCGGGGGTCCACTGGGTGGGGCCACCTATCCCGGAGGGCCTCATGGGCTGAA

GTGGGAGGGGAACCAGCCCCTAGTGGGCTGGTGCTCCCCCCTTGGGCCTCCCCCTGCGCCTAGGGTTGGAAACCC

TAGGGGTGGGGGCGCCCCACTTGGCTTGGGGGGCACTCCAGCCCCCTTGGCCGTCGCCCTCCTTGGAGATCCCGT

CTCCTGGGGCCGGCACCCCCCTAGGGGTCCTATATATAGTGGGGGGAGGGAGGGCAGCCACACCGTAGCCCCTGG

CACCTCCCTCTCCCTCCCGTGACACCTCTCCCTCTTGCTGAGCTTGGCGAAGCCCTGCCGAGATCACCGTTGCTT

CCACCACCACGCCGTCGTGCTGCTGGATCTCCATCAACCTCTCCTTCCCCCTTGCTGGATCAAGTTGGAGGAGAC

ATCTTCCCAATCGTATGTGTGTTGAACGCGGAGGTGTCGTCCGTTCGGCGCTAGGTCATCGGTGATTTGGATCAC

GACGAGTACGACTCCATCAACCCCGTTCTCTTGAATGCTTCCGCTCGCGATCTACAAGGGTATGTAGATGCACTC

CCCTCTCCCTCGTTGCTAGATGACTCCATAGATTAATCTTGGTGATGCGTAGAAAATTTTAAAATTCTGCTATGT

TCCCCAATAACAGG TTCGTTGGGCACCCACCAGAGCTCCTCGTTGGTGGCACACCGGAGCTTATCGGCCCACCTC

CACGCAACGACGAGGCGCTATGACCTTCGTCGCGAAGTACCCGTCTTCGTATACCTCCTCGGCATGACGACACGC

CCGCCCCCAAAGCTTCGCCGCCTCCTTTTCCGGATGACGTCACGGGCTT

5. 20 points This is a genomic DNA sequence from a wheat BAC clone. Use all the tools

you have learned to identify the genes and repetitive elements present.

5.1. How many genes are present in this sequence? Highlight the exons of each gene with a different color, and indicate which color corresponds to each exon and each gene.

There are 2 genes present in the sequence.

References:

The software used was FGENESH.

Highlighted are the exons of the genes predicted by FGENESH , with the splicing

sites bolded in red.


5.2. Are there any repetitive elements? Highlight them with different colors and

indicate which color corresponds to each repetitive element.

There are 2 repetitive elements in the sequence.

TTAACCAACTAATTGCCATATATTTACTGGTCTCTATTGATTCAC AG ATGGACAGGATCAATGGAAATGGCCAGAAGTCT

GGCATTCAA GT GAGTGATCACAACAGTTTTCACCTTGCTGTCTAGCGCATTCATCATACATTCCTCATTTGCAAACTGCA

GTATGGTTTTTTTATCTTTTTTAATTTGGTGTCTCCATACTTATTCACTGGATTTGCTCTCGTTGATTTTCCTATAGAGC

AG GCCTCGACATTCTCCGATGGAGCAGACGAATTCGACGACGACACGCTTATAGCCTCGGAAGGGCTGTCTCGCAGAACC

GGCGGCTTTGAGAGAAATGCAGGCCCTGGGAATTTGGGGATGTTTGGTAGCCTGAAGTTTGTGCTGTTCAAGTCAAAGCT

CAACCTGTTGATACCGTGTGGCTTTGTGGCGATCGTCGTCAAGGACATGACAGGAAATAAT GT ACGTGTGATAAATTATT

GCATCGAAGTGTTTCATCAATTTGTGTTAGTATCTGCTTTATGCTGCATTTTTTGGTGTTCTTGTCCTGATATTACACTG

TTACTCCTTTGTTTGTGCTGA AG GGCTGGGTATTTCCCCTGACTCTGTTGGGCATTATTCCTTTGGCCGAACGACTGGGT

TTCGCCACCGA GT AATTTGTTACAGTTTTCACGCTTGTTTGGTGATCCTTGTTGATCTTACTGAACAAACATCTGACACT

CTGCTTCTTACTGC AG GCAGTTAGCATTCTTCACTGGCCCAACAG GT CTCTCACTCTGTTGCTCTTCTTTGAAGCATTCC

GATCCAAGTAGCATTCGAATTGTCGTTGGCTTGACAGATTTCATGTGCATGC AG TTGGGGGCCTTCTGAATGCTACATTT

GGCAATGCAACCGAGTTGATTATATCAATCCATGCGCTAAGGAGCGGAAAGTTACGAGTTGTCCAGCAGTCTCTGCTAGG

GTCCATCCTGTCAAACTTGCTTCTGGTTCTCGGTTGCGCATTTTTCAGTGGAGGGGTCACTTGTGGCAAAACTGAGCAGA

ATTTCAGCAAG GT GACCATTCTTGTCTGTTCACAGATTTGCACAATGTAGATGAGACATGAGAGTGATCATTCTGTGTTC

TTGTTGC AG TCAGAGGCAGTAGTGAGCTCTGGGCTGCTTTTGATGGCCGTATTGGGGTTGCTGCCTCCTACTGTGCTGCA

TTACACTCATTCAGAAGTCCACTCTGGAAAGTCAGGACTAGCCTTGTCAAGGTTCAGCAGCTGCGTCATGCTTGTTGCTT

ACGCTTGCTACATCTACTTTGAATTAACAACCAGTCGCCGTCGTGAGGAATCAAATGAA GT AAATAATGGATTGCCCATA

CTTATATGCATAGTTTTTCAATGATGCTTTACTGGACTTTCTATTTCTCCGTACCTTCTTTTTCAAGTTTCTGAGTTACA

TATATGCATTTGATGGGACC AG GGAAGAGGTGAGAATGTAGGGGATGCCGACAACAATGAAGCTCCTGACATTTCAAAAT

CGGAAGCCATTGCCTGGCTTGCAATTTTGACAATTTGGATCTCAGTACTCTCTGACTACTTAGTTAATGCAATCGAC GT A

AGCTCTCTTACTGATCCACTAGATGATTACACACACCTTTTTTCAGTTATGTTGCTTATTACTTTTCTTAAAATGAAAAT

GATAATATTAACCTTGCAAATTTATC AG GGGGCTTCCCAGGCCTGGAATATACCAGTTGCGTTCATCAGTGTTATTTTGC

TTCCAATCGTGGGGAATGCTGCTGAACATGCTAGTGCTGTCATGTTTGCAATGAAAGACAAATTA GT AAGAGAAGCTTAA

AATTCTTAACAGTCCTTCCTTTTGCACAGATCACATAAAGTTCTTGTATTTCTCTTGT AG GATCTTTCCCTGGGAGTTGC

AATAGGGTCATCAACACAAATGTCCATGTTTGGG GT AGGTTGGTTGAACCGTTTGTGCCTTACTGGCATTGATCTGAATT

TATCAGATATTGTTTTGCTGACAGATTGCTCTGCATCTCATTTTGC AG ATACCATTTTGTGTAGTGATAGGGTGGATGAT

GGGCCAGCCGATGGACTTGAACTTCCATCTCTTCGAGACCGCAAGCCTCTTAATGACGGTACTAGTTGTCGCGTTTTTGT

TGCAG GT ACTAAAATCATGTACCCTTATTCGTCGACACCTGTCTTGCTTTTGCGTGAATTCACTCTTCTATTTTCCTCTG

AATTTCC AG GATGGGACTTCAAATTGCTTAAAAGGGTTGATGCTGTTTCTATGCTATCTGATAGTAGCTGCTAGTTTCTA

TGTACATTCCGACCAGGATCCTGATG GT AAGTTCTTCACTACCGAACTTCCTGTGTTTTAATCTTTCAGATGAATATTTG

TGCTGCATGAATTTTCAGTTGTTAGGCATTTAGCTACAAACATGGTTGCTCTATTGATACTATTTCCATGAATCTCAAAC

TACAAATCCAGTAAGAAAGTGAGAAAGATCAATATTGTGGTGACTCTGTACAATTCACTAATATTCCGAAATTTATACCT

GC AG GTAATAACCCTGCACAAAAC TAG GAGTACTTGGATGATCCCATGGATTCTACAGAAAGATTTCTACCATGTGACAA

CATTTGCAGTCAGGAGCCAGGACATGGACTATTTATGTTTTTGACTAATTTGGCAGTGTTTCCTCACATTTAGATGTCTT

TCAAATGCACCCATTCTTGAAACTCTTTTTGGAAAGCCACACATCCAGTTTTGTAGAAAGGCATAATCAAGGTAAAAAAA

TATATTAAGTTGTGTAGGCTGAAGTATTCTTGCTATTGTACCAGCAAAGAAAGAAAGAAGTTGAAACAATATTTTAGTTT

TGGTATTTTTTGTATACCAACTAATCGCAGCAAGTGCAGGTATCATCATTACGTAACATGTAAGATCAACCAAACATCCG

GTTCTCAACTTATTCCATTCCATCAAATTCTAAACCCATCGAAATTGCCTCGACTCCTTAGAGGCTTATATATATACTCC

CTCTGTTCACCCCCTTCATGTTCAAATACTGGCTCCGTCCCTGATTAAGGAGTTGTATTTTAGAGAGAAATCCACGTTAA

CCCCGTATGTAGGGTTCACATCGCAACCTCAAACTTTGATTTAATTCAGTTTATCTTGCATGCTTAAGACGCTCACGAGC

CTAAACTGGGAGCAATGCGACATCAATGTTTGAAATAAAACGCTACTTTAATTACATATGAGAATATCTTGTTTTTTCAA

Gene 2 Exon 1 Exon 2 Exon 3 Exon 4 Exon 5 Exon 6 Exon 7 Exon 8 Exon 9 Exon 10 Exon 11 Exon 12 Exon 13