Bioinformatics exam
February 2009
Duration: 2h (or 2h30) - No documents allowed
Part one (4 points)
1) Is the sequence below in FASTA format (justify your answer)? 1 pt
>tr|A0A098|A0A098_CHLRE
MASMAAELRPSDGGSSLHMLDSLLMMGLSSGGGVGGGGSSQSQILDSAGAAELAALLLPQ
HSNDPLHLMSTGDAALGLAGPMAAAEHHQHHPHHQHHSVPATAGFPSQTPPPPLFSNATA
GAAPATRVRAAGSCGSGGVAGGTTSHSSEDGVFHSADPHHHHQQHLQQPQPQQQQ
2) Which problem will be encountered with this sequence in a similarity search? 1.5 pt
3) Define the GO database. 1.5 pt
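
As an illustration of what the first two questions are about (not part of the exam itself), here is a minimal Python sketch. It checks the basic FASTA layout, a header line starting with `>` followed by residue lines, and flags low-complexity windows using Shannon entropy. The function names, the 12-residue window and the 2.2-bit threshold are arbitrary assumptions chosen for illustration; they are not the settings of any particular masking program.

```python
import math
from collections import Counter

# One-letter amino acid codes plus common ambiguity/stop symbols (assumed alphabet).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYBXZUO*")

def parse_fasta(text):
    """Return a list of (header, sequence) records; raise ValueError if malformed."""
    records, header, chunks = [], None, []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data before any '>' header line")
            if not set(line.upper()) <= AMINO_ACIDS:
                raise ValueError(f"unexpected character in: {line}")
            chunks.append(line.upper())
    if header is None:
        raise ValueError("no '>' header line found")
    records.append((header, "".join(chunks)))
    return records

def shannon_entropy(fragment):
    """Entropy in bits of the residue composition of a fragment."""
    counts = Counter(fragment)
    n = len(fragment)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def low_complexity_windows(seq, window=12, threshold=2.2):
    """Start positions of windows whose entropy falls below the threshold."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < threshold]

record = """>tr|A0A098|A0A098_CHLRE
MASMAAELRPSDGGSSLHMLDSLLMMGLSSGGGVGGGGSSQSQILDSAGAAELAALLLPQ
HSNDPLHLMSTGDAALGLAGPMAAAEHHQHHPHHQHHSVPATAGFPSQTPPPPLFSNATA
GAAPATRVRAAGSCGSGGVAGGTTSHSSEDGVFHSADPHHHHQQHLQQPQPQQQQ
"""
header, seq = parse_fasta(record)[0]
flagged = low_complexity_windows(seq)
print(header, len(seq), len(flagged))
```

Windows dominated by one or two residues (runs of G, H or Q here) score well below the entropy of a typical protein segment, which is why such regions tend to produce spurious hits in similarity searches unless they are masked.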
Part two (7.5 points)
A multiple alignment of eukaryotic sequences is shown on page 2.
1) Which groups can you distinguish in this alignment? Give 2 discriminating residues for each group. 2 pt
2) Give a probable sequence error in this alignment. 1.5 pt
3) What is the homology relationship between zn143_human and znf76_human? Between zn143_human and q8ci27_mouse? Between zn143_human and znf76_mouse? 1.5 pt
4) A tree has been constructed from this alignment using the neighbor-joining method. In this tree (page 3), 3 sequence identifiers have been replaced by x, y and z. 2.5 pt
a) Is this tree in agreement with your analysis of the alignment? Justify your answer.
b) In your view, which sequences correspond to x, y and z respectively?

ZN143_HUMAN   MEGVSLQAVTLADGSTAYIQHNSK----DAKLIDGQVIQLEDGSAAYVQHVPIPKSTGDSLRLEDGQAVQLEDGTTAFIHHTSKDSYDQSALQAVQLEDGTTAYIHHAVQ
A6QQW0_BOVIN  MEGVSLQAVTLADGSTAYIQHNS-----------------KDGSAAYVQHVPIPKTTGDSLRLEDGQAVQLED------------SYDQSALQAVQLEDGTTAYIHHAVQ
Q8CI27_MOUSE  MEGVSLQAVTLADGSTAYIQHNSK----DGRLIDGQVIQLEDGSAAYVQHVPIPKS----------------------------NSYDQSSLQAVQLEDGTTAYIHHAVQ
Q6VQB0_FUGRU  MDTVSLQAVTLADGSTAYIQHDSKASFSDGQIMDGQVIQLEDGSAAYVQHVSMPKAGGDSLQLEDGQTVQLEDGTTAYIHTP-KETYDQSGLQEVQLEDGSTAYIQHTVH
A0AUQ7_DANRE  MDTVSLQAVTLVDGSTAYIQHSPKVSLTENKIMEGQVIQLEDGSAAYVQHLPMSKTGGEGLRLEDGQAVQLEDGTTAYTHAP-KETYDQGGLQAVQLEDGTTAYIQH---
Q4S173_TETNG  MDTVSLQAVTLADGSTAYIQHDSKASFPDGQIMDGQVIQLEDGSAAYVQHVSMPKAGGESLQLEDGQTVQLEDGTTAYIHAP-KETYDQSGLQEVQLEDGSTAYIQHTVH
Q6GPP5_XENLA  MESMSLQAVTLADGSTAYIQHNTK----DGKLMEGQVIQLEDGSAAYVQHIP----KGDDLSLEDGQAVQLEDGTTAYIHHSSKESYDQSSVQAVQLEDGTTAYIHHAVQ
ZNF76_MOUSE   MESLGLQTVRLSDGTTAYVQQAVK----GEKLLEGQVIQLEDGTTAYIHQVTI---QKESFSFEDGQPVQLEDGSMAYIHHTPKEGCDPSALEAVQLEDGSTAYIHHPVP
ZNF76_HUMAN   MESLGLHTVTLSDGTTAYVQQAVK----GEKLLEGQVIQLEDGTTAYIHQVTV---QKEALSFEDGQPVQLEDGSMAYIHRTPREGYDPSTLEAVQLEDGSTAYIHHPVA

ZN143_HUMAN   VPQSDTILAIQADGTVAGLHT-GDATIDPDTISALEQYAAKVSIDGSESVAGTGMIGENEQEKKMQIVLQGHATRVTAKSQQSGEKAFRCEYDGCG--
A6QQW0_BOVIN  VPQSDTILAIQADGTVAGLHT-GDAAIDPDTISALEQYAAKVSIDGSEGVTGSGIIGENEQEKKMQIVLQGHATRVTAKSQQSGEKAFRCGYDGCG--
Q8CI27_MOUSE  VPQSDTILAIQADGTVAGLHT-GDATIDPDTISALEQYAAKVSIDGSDGVTSTGMIGENEQEKKMQIVLQGHATRVTPKSQQSGEKAFRCKYDGCG--
Q6VQB0_FUGRU  MPQSNTILAIQADGTIADLQA-DATGLNPETISVLEQYATKVESIENQLG--SYSRAEADNGVHMRIVLQDQDNRQS-RSTNVGEKSFRCEYEGCG--
A0AUQ7_DANRE  MPQSNTILAIQADGTVADLQT-EGT-IDAETISVLEQYSTKMEATECGTG--LIGRGDSD-GVHMQIVLQGQDCRSP-RIQHVGEKAFRCEHEGCG--
Q4S173_TETNG  MPQSNTILAIQADGTIADLQA-DAAGLNPETISVLEQYATKVPLVSGLRLRLLWAGGEYRKPVGLLQPAGGGERRPH-ADCFTRSRQQAVAEHQCGRE
Q6GPP5_XENLA  VPQSDTILAIQADGTVAGLHT-GEASIDPDTITALEQYAAKVSIEGGEGAGSNALITESESEKKMQIVLS-HGSRVPVKVPQTNEKAFRCDYEGCG--
ZNF76_MOUSE   VPSDSAILAVQTEAGLEDLAAEDEEGFGTDTVVALEQYASKVLHDS--------------------------PASHNGKGQQVGDRAFRCGYKGCG--
ZNF76_HUMAN   VPSESTILAVQTEVGLEDLAAEDDEGFSADAVVALEQYASKVLHDS--------------------------QIPRNGKGQQVGDRAFRCGYKGCG--

DE   Homo sapiens MHC class I antigen (HLA-A) gene, HLA-A01 variant allele,
DE   alternatively spliced.
...
XX
FH   Key             Location/Qualifiers
FT   source          1..
FT                   /db_xref="taxon:9606"
FT                   /organism="Homo sapiens"
FT   gene            <1..>
FT                   /gene="HLA-A"
FT                   /allele="HLA-A01 variant"
FT   mRNA            join(<1..373,504..773,1015..1266,1870..2145,2248..2364,
FT                   2807..2839,2982..3029,3199..>3374)
FT   exon            <1..
FT                   /number=
FT   5'UTR           <1..
FT                   /allele="HLA-A01 variant"
FT   CDS             join(301..373,504..773,1015..1266,1870..2145,2248..2364,
FT                   2807..2839,2982..3029,3199..3203)
FT                   /gene="HLA-A"
FT                   /product="MHC class I antigen"
FT                   /protein_id="AAW30165.1"
FT                   /translation="MAVMAPRTLLLLLSGALALTQTWAGSHSMRYFFTSVSRPGRGEPR
FT                   FIAVGYVDDTQFVRFDSDAASQKMEPRAPWIEQEGPEYWDQETRNMKAHSQTDRANLGT
FT                   LRGYYNQSEDGSHTIQIMYGCDVGPDGRFLRGYRQDAYDGKDYIALNEDLRSWTAADMA
FT                   AQITKRKWEAVHAAEQRRVYLEGRCVDGLRRYLENDPPKTHMTHHPISDHEATLRCWAL
FT                   GFYPAEITLTWQRDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEG
FT                   LPKPLTLRWELSSQPTIPIVGIIAGLVLLGAVITGAVVAAVMWRRKSSDRKGGSYTQAA
FT                   SSDSAQGSDVSLTACKV"
FT   exon            504..
FT                   /number=
FT   exon            1015..
FT                   /number=
FT   variation       1268
FT                   /note="alternatively spliced compared to HLA-A010101;
FT                   results in altered exon and protein length; no membrane
FT                   expression detected"
FT                   /replace="g"
FT                   /gene="HLA-A"
FT   exon            1870..
FT                   /number=
FT   exon            2248..
FT                   /number=
FT   exon            2807..
FT                   /number=
FT   exon            2982..
FT                   /number=
FT   exon            3199..>
FT                   /number=
FT   3'UTR           3204..>
...
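
The CDS location in the record above can be checked against the 357 aa protein length quoted for HLA-A. The sketch below (an illustration, not part of the exam) sums the spans of the `join(...)` expression. The helper name `cds_length` is hypothetical, and the regular expression only handles plain `start..end` spans: it ignores partial-end markers, `complement(...)` and single-base locations. It also assumes that the final codon of the CDS is the stop codon.

```python
import re

def cds_length(location):
    """Sum the lengths of the start..end spans in a feature location string."""
    spans = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in spans)

# CDS location copied from the feature table above.
cds = ("join(301..373,504..773,1015..1266,1870..2145,2248..2364,"
       "2807..2839,2982..3029,3199..3203)")

nt = cds_length(cds)
codons, remainder = divmod(nt, 3)
print(nt, codons, remainder)  # 1074 358 0
print(codons - 1)             # 357 residues once the stop codon is excluded
```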

1. A blastp search has been performed with the HLA-A protein sequence (357 aa). This protein is similar to immunoglobulins, as shown by the alignments with the protein MUCM_RABIT. 3 pt

a) Represent the 2 proteins schematically, indicating the conserved regions.

b) Represent the result of a dotplot comparison of the two proteins.

>sp|P04221|MUCM_RABIT Ig mu chain C region membrane-bound form
Length = 479

Score = 45.1 bits (105), Expect = 3e-
Identities = 29/94 (30%), Positives = 49/94 (52%), Gaps = 11/94 (11%)

Query: 214  EATLRCWALGFYPAEITLTWQRDGED-----QTQDTELVETRPAGDGTFQKWAAVVVPSG  268
            ++ L C A GF P +I+++W RDG+       T+  E  ET+ AG  TF   + + +
Sbjct: 132  KSRLICQATGFSPKQISVSWLRDGQKVESGVLTKPVE-AETKGAGPATFSISSMLTITES  190

Query: 269  E---EQRYTCHVQHEGL--PKPLTLRWELSSQPT  297
            +   +  YTC V H G+   K +++  E S+ P+
Sbjct: 191  DWLSQSLYTCRVDHRGIFFDKNVSMSSECSTTPS  224

Score = 40.4 bits (93), Expect = 7e-
Identities = 24/81 (29%), Positives = 37/81 (45%), Gaps = 6/81 (7%)

Query: 215  ATLRCWALGFYPAEITLTWQRDGEDQTQD---TELVETRPAGDGTFQKWAAVVVPS---G  268
            AT+ C   GF PA++ + WQ+ G+  + D   T      P   G +   + + V
Sbjct: 352  ATVTCLVKGFSPADVFVQWQQRGQPLSSDKYVTSAPAPEPQAPGLYFTHSTLTVTEEDWN  411

Query: 269  EEQRYTCHVQHEGLPKPLTLR  289
              + +TC V HE LP  +T R
Sbjct: 412  SGETFTCVVGHEALPHMVTER  432

Score = 31.2 bits (69), Expect = 4e-
Identities = 23/85 (27%), Positives = 37/85 (43%), Gaps = 10/85 (11%)

Query: 219  CWALGFYPAEITLTWQRDGEDQTQDTELVETRPA---GDGTFQKWAAVVVPS-----GEE  270
            C A  F P+ +T +W      +   +  V T P    GD  +   + V+VPS     G E
Sbjct: 28   CLARDFLPSSVTFSWSFKNNSEIS-SRTVRTFPVVKRGD-KYMATSQVLVPSKDVLQGTE  85

Query: 271  QRYTCHVQHEGLPKPLTLRWELSSQ  295
            +   C VQH    + L + + + S+
Sbjct: 86   EYLVCKVQHSNSNRDLRVSFPVDSE  110
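
The statistics line of the first MUCM_RABIT alignment above (29/94 identities, 11/94 gaps) can be recomputed from the aligned strings. This is a minimal sketch, not part of the exam; `alignment_stats` is a hypothetical helper, and the integer division used for the percentages is an assumption that happens to reproduce the 30% and 11% shown in the report.

```python
def alignment_stats(query, subject):
    """Return (identities, alignment length, gap columns) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identities = sum(q == s and q != "-" for q, s in zip(query, subject))
    gaps = sum(q == "-" or s == "-" for q, s in zip(query, subject))
    return identities, len(query), gaps

# The two alignment blocks of the first HSP, concatenated.
query = ("EATLRCWALGFYPAEITLTWQRDGED-----QTQDTELVETRPAGDGTFQKWAAVVVPSG"
         "E---EQRYTCHVQHEGL--PKPLTLRWELSSQPT")
subject = ("KSRLICQATGFSPKQISVSWLRDGQKVESGVLTKPVE-AETKGAGPATFSISSMLTITES"
           "DWLSQSLYTCRVDHRGIFFDKNVSMSSECSTTPS")

identities, length, gaps = alignment_stats(query, subject)
print(identities, length, gaps)                           # 29 94 11
print(100 * identities // length, 100 * gaps // length)   # 30 11
```

The "Positives" count is not reproduced here because it depends on the substitution matrix used by the search.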

Part two (6 points)

A genomic region (982 bases, accession GQ2293385) of an H1N1 virus strain has been compared to a nucleic acid sequence database using the fasta and blastn programs. One of the detected sequences is the synthetic sequence CS723756 (4700 bp). The GQ2293385 and CS723756 sequences have also been aligned using the optimal alignment program Water. The alignments between the two sequences obtained using the three methods are shown.
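
For context, Water (from the EMBOSS package) computes an optimal local alignment with the Smith-Waterman algorithm, whereas fasta and blastn are heuristics. The following is a minimal sketch of the Smith-Waterman scoring recurrence, not part of the exam. It makes simplifying assumptions: it returns only the best score, uses arbitrary match/mismatch values, and applies a linear gap penalty, while Water uses separate gap opening and extension penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman_score("TTTGATTACATTT", "CCGATTACACC"))  # 14
```

Because every cell of the dynamic programming matrix is evaluated, the result is guaranteed to be optimal for the chosen scoring scheme, at the cost of time proportional to the product of the two sequence lengths.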

1) What can you deduce about the similarity between these 2 sequences? What are the main differences between the three alignments? How do you explain these differences? 3 pts

2) Is Megablast suitable in the context of this search? Why? 1 pt

3) Sketch the result of a dotplot comparison of these two sequences. 2 pts
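
To make the dotplot (dot matrix) method concrete, here is a minimal sketch, not part of the exam. The function names, the toy sequences and the window settings are assumptions for illustration; real dotplot tools usually add scoring matrices and graphical output.

```python
def dotplot(a, b, window=1, min_matches=1):
    """Return the set of (i, j) where the windows starting at a[i] and b[j]
    share at least min_matches identical positions."""
    hits = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            if matches >= min_matches:
                hits.add((i, j))
    return hits

def render(a, b, hits):
    """Text rendering: rows follow sequence a, columns follow sequence b."""
    lines = ["  " + b]
    for i, ch in enumerate(a):
        row = "".join("*" if (i, j) in hits else "." for j in range(len(b)))
        lines.append(ch + " " + row)
    return "\n".join(lines)

a, b = "GATTACA", "CCGATTACACC"
hits = dotplot(a, b, window=3, min_matches=3)
print(render(a, b, hits))
```

A shared region appears as a diagonal run of dots, offset from the main diagonal by the difference between its positions in the two sequences. Raising `window` and `min_matches` filters out the isolated chance matches that dominate nucleotide comparisons.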

BlastN

>emb|CS723756.1| Sequence 14 from Patent WO
Length=

Score = 105 bits (116), Expect = 6e-
Identities = 121/163 (74%), Gaps = 0/163 (0%)
Strand=Plus/Plus

Query  818   GATCGTCtttttttCAAATGTATTTATCGTCGCTTTAAATACGGTTTGAAAAGAGGGCCT  877
             || || || || ||||| || || || ||  |  | || || ||  |||| ||||| |||
Sbjct  1503  GACCGGCTGTTCTTCAAGTGCATCTACCGGAGACTGAAGTATGGACTGAAGAGAGGACCT  1562

Query  878   TCTACGGAAGGAGTGCCTGAGTCCATGAGGGAAGAATATCAACAGGAACAGCAGAGTGCT  937
              | || |  ||||||||||| || ||| |||| || |||  ||||||||||||||| ||
Sbjct  1563  GCCACAGCCGGAGTGCCTGAATCTATGCGGGAGGAGTATAGACAGGAACAGCAGAGCGCC  1622

Query  938   GTGGATGTTGACGATGGTCATTTTGTCAACATAGAGCTAGAGT  980
             |||||||| || ||||| || || || || || ||||| ||||
Sbjct  1623  GTGGATGTGGATGATGGCCACTTCGTGAATATCGAGCTGGAGT  1665

Score = 48.2 bits (52), Expect = 0.
Identities = 32/36 (88%), Gaps = 0/36 (0%)
Strand=Plus/Plus

Query  716   CCTACCAGAAGCGAATGGGAGTGCAGATGCAGCGAT  751
             ||||||||||  || |||||||||||||| ||||||
Sbjct  1464  CCTACCAGAAATGAGTGGGAGTGCAGATGTAGCGAT  1499

Fasta

>>EM_PAT:CS723756; CS723756 Sequence 14 from Patent WO20 (4700 nt)

initn: 378 init1: 238 opt: 549 Z-score: 279.9 bits: 65.5 E(): 7.6e-

58.0% identity (69.2% similar) in 357 nt overlap (631-982:1319-1667)

Sequen GAGGCCAUGGAGGUUGCUAAUCAGACUAGGCAGAUGGUACAUGCAAUGAGAACUAUUGGG

EM_PAT GCUGACAGACUAACAGACUGUUCCUUUCCAUGGGUCUUUUCUGCAGUCACCGUCGUCGAC

Sequen ACUCAU--CCUAGCUCCAGUGCUGGUCU-GAAAGAUGACCUUCUUG-AAAAUUUGCAGGC

EM_PAT ACGUGUGAUCAGAUAUCGCGGCCGCUCUAGAGAUAUCGCCACCAUGCAGUACAUCAAGGC

Sequen CUACCAGAAGCGAAU-GGGAGUGCAGAUGCAGCGAUUCAAGUGAUCCUCUCGUCAUUGCA

EM_PAT CAACAGCAAGUUUAUCGGCAUCACAGAGCUGUCUCUGCUGACAGAAGUGGAGAC-CCCUA

Sequen GCAAAUAUCAUUGGGAUCUUGCACCUGAUAUUGUGGAUUACUGAUCGUCUUUUUUUCAAA

EM_PAT CCAGAAAUGAGUGGGA--GUGCA---GAUGUAG-CGAUAGC-GACCGGCUGUUCUUCAAG

Sequen UGUAUUUAUCGUCGCUUUAAAUACGGUUUGAAAAGAGGGCCUUCUACGGAAGGAGUGCCU

EM_PAT UGCAUCUACCGGAGACUGAAGUAUGGACUGAAGAGAGGACCUGCCACAGCCGGAGUGCCU

Sequen GAGUCCAUGAGGGAAGAAUAUCAACAGGAACAGCAGAGUGCUGUGGAUGUUGACGAUGGU

EM_PAT GAAUCUAUGCGGGAGGAGUAUAGACAGGAACAGCAGAGCGCCGUGGAUGUGGAUGAUGGC

Sequen CAUUUUGUCAACAUAGAGCUAGAGUAA

EM_PAT CACUUCGUGAAUAUCGAGCUGGAGUGAACACGUGGGAUCCAGAUCUGCUGUGCCUUCUAG

c) What can you say about the similarity between the two proteins? 1.5 pts

FIRST iteration

>sp|Q57979.2|SURE_METJA RecName: Full=5'-nucleotidase surE; AltName: Full=Nucleoside 5'-monophosphate phosphohydrolase
Length=

Score = 58.9 bits (141), Expect = 3e-06, Method: Compositional matrix adjust.
Identities = 55/219 (25%), Positives = 97/219 (44%), Gaps = 45/219 (20%)

Query  1    MRVLITNDDGPLSDQFSPYIRPFIQHIKRNYPEWKITVCVPHVQKSWVGKAHLAGKNLTA  60
            M +LI NDDG     +SP +      +K  + +  IT+  P  Q+S +G+A
Sbjct  1    MEILIVNDDG----IYSPSLIALYNALKEKFSDANITIVAPTNQQSGIGRAI--------  48

Query  61   QFIYSKVDAEDNTFWGPFIQPQIRSENSKLPYVLNAEIPKDTIEWILIDGTPASCANIGL  120
                        + + P    +++             + KD + +  + GTP  C  +G+
Sbjct  49   ------------SLFEPLRMTKVK-------------LAKDIVGY-AVSGTPTDCVILGI  82

Query  121  HLLSNEPFDLVLSGPNVGRNTSAAYITSSGTVGGAMESVITGNTKAIAISWAYFN---GL  177
            + +  +  DLV+SG N+G N     I +SGT+G A E+   G  K+IA S    +
Sbjct  83   YQILKKVPDLVISGINIGENLGTE-IMTSGTLGAAFEAAHHG-AKSIASSLQITSDHLKF  140

Query  178  KNVS-PLLMEKASKRSLDVIKHLVKNWDPKTDLYSINIP  215
            K +  P+  E  +K +  + +  + ++D   D+ +INIP
Sbjct  141  KELDIPINFEIPAKITAKIAEKYL-DYDMPCDVLNINIP  178

SECOND iteration

>sp|Q57979.2|SURE_METJA RecName: Full=5'-nucleotidase surE; AltName: Full=Nucleoside 5'-monophosphate phosphohydrolase
Length=

Score = 210 bits (534), Expect = 5e-52, Method: Composition-based stats.
Identities = 67/318 (21%), Positives = 119/318 (37%), Gaps = 71/318 (22%)

Query  1    MRVLITNDDGPLSDQFSPYIRPFIQHIKRNYPEWKITVCVPHVQKSWVGKAHLAGKNLTA  60
            M +LI NDDG     +SP +      +K  + +  IT+  P  Q+S +G+A    + L
Sbjct  1    MEILIVNDDGI----YSPSLIALYNALKEKFSDANITIVAPTNQQSGIGRAISLFEPLRM  56

Query  61   QFIYSKVDAEDNTFWGPFIQPQIRSENSKLPYVLNAEIPKDTIEWILIDGTPASCANIGL  120
              +    D                                  I    + GTP  C  +G+
Sbjct  57   TKVKLAKD----------------------------------IVGYAVSGTPTDCVILGI  82

Query  121  HLLSNEPFDLVLSGPNVGRNTSAAYITSSGTVGGAMESVITGN---TKAIAISWAYFNGL  177
            + +  +  DLV+SG N+G N     I +SGT+G A E+   G      ++ I+  +
Sbjct  83   YQILKKVPDLVISGINIGENLGTE-IMTSGTLGAAFEAAHHGAKSIASSLQITSDHLKFK  141

Query  178  KNVSPLLMEKASKRSLDVIKHLVKNWDPKTDLYSINIPLVESLSDDTKVYYAPIWENRWI  237
            +   P+  E  +K +  + +  +    P  D+ +INIP  E+ + +T +    +    +
Sbjct  142  ELDIPINFEIPAKITAKIAEKYLDYDMP-CDVLNINIP--ENATLETPIEITRLARKMYT  198

Query  238  PIFNGPHINLENSFAEIEDGNESSSISFNWAPKFGAHKDSIHYMDEYKDRTVLTDAEVI-  296
                            +E+  +    S+ W        D     +E +D    TD  V+
Sbjct  199  --------------THVEERIDPRGRSYYW-------IDGYPIFEEEED----TDVYVLR  233

Query  297  ESEMISVTPMKATFKGVN  314
            +   IS+TP+       N
Sbjct  234  KKRHISITPLTLDTTIKN  251