Lab3: Multiple Sequence Alignment and Phylogenetics using MEGA, Lab Reports of Bioinformatics

This lab describes the steps to perform multiple sequence alignments, calculate distance matrices, and construct phylogenetic trees using the MEGA4 software. It covers creating alignments using ClustalW, generating publishable alignments using BoxShade, exploring alignments, calculating distance matrices, and drawing phylogenetic trees. It also includes instructions on retrieving sequences from GenBank and viewing 3D structures of proteins.

Typology: Lab Reports

Pre 2010

Uploaded on 07/30/2009

koofers-user-6rf

BIT150 – Lab3
Multiple sequence alignment and Phylogenetics
Copy 08_Lab3 from Z: to C:, and open the file ‘FT proteins for MEGA.doc’.
Objective: Perform multiple sequence alignments, calculate distance matrices, and
construct phylogenetic trees, to understand and interpret relationships between species.
Activities:
A. Creating Multiple Sequence Alignments (MSA)
In this example, we will create a multiple alignment of protein sequences that will be
imported into the alignment editor using different methods. Multiple protein sequence
alignment is a central tool to infer protein function, predict protein secondary structure,
and identify residues important for protein specificity.
A1. Start MEGA4 by using Start\Programs\BioInformatics\MEGA4.
A2. In the MEGA4 window, go to Alignment|Alignment Explorer/CLUSTAL. Select
‘Create a new alignment’, and click on OK. When prompted for the data type,
click on [NO] to select a protein sequence alignment.
A3. Sequences can be entered either from FASTA files or by hand. We will enter the
sequences by hand, one by one. In the Alignment Explorer window, go to
Edit|Insert Blank Sequence (or click on the corresponding toolbar button), and
repeat it to generate 8 blank
sequences. Right-click on the blank sequence name and edit the sequence name
for each protein sequence, as it is in the Word document ‘FT Proteins for MEGA’.
Copy and paste each sequence.
A4. Go to Edit|Select All to select every site for all the protein sequences in the
alignment.
A5. Go to Alignment|Align by ClustalW (or click on the corresponding toolbar
button) to align the selected protein sequences using the ClustalW algorithm.
A6. Save the current alignment by selecting Data|Save Session. Save it as ‘FT.mas’.
This will allow the current alignment to be restored for future editing. Also,
export it (Data|Export Alignment) twice: once in FASTA format (‘FT.fas’) and
once in MEGA format (‘FT.meg’).
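To make the exported FASTA file less of a black box, here is a minimal Python sketch of how such a file can be read back into memory. It assumes only the standard FASTA layout (a `>` header line followed by one or more sequence lines). The function name `read_fasta` and the two short sequences are made up for illustration; they are not the FT proteins from the Word document.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {sequence name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header line: start a new record
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequence may span several lines
    return records


# Hypothetical aligned fragments (gaps written as '-'), for illustration only.
example = """>seqA
MSR-DPLVV
GRVIG
>seqB
MSRGDPLIV
GRV-G
"""

alignment = read_fasta(example)
for seq_name, seq in alignment.items():
    print(seq_name, len(seq), seq)
```

In an aligned FASTA file every sequence has the same length, because gaps are stored as characters. That is why the gaps must be copied along with the residues in the next section.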
B. Generating a publishable MSA using BoxShade

B1. Using Word, open the previously created FASTA file (‘FT.fas’). Copy the FASTA sequences (including gaps). Paste them in BOXShade: http://www.ch.embnet.org/software/BOX_form.html. In the ‘Output format’ select RTF_new and in the ‘Input sequence format’ select other. Click on Run BOXSHADE. Click on ‘here is your output number 1’. The alignment will open in a Word document.
C. Exploring the MSA and identifying patterns
C1. Back in MEGA4, exit the Alignment Explorer window by selecting Data|Exit AlnExplorer. A dialog box will appear asking you if you would like to open the data file in MEGA; click on ‘Yes’.
C2. Observe different coloring schemes by clicking on:
C: conserved residues (the same amino acid at a given site in all the aligned sequences),
V: variable residues (at least 2 different amino acids at a given site),
Pi: parsimony-informative sites (at least 2 different amino acids at a given site and at least 2 of them occurring with a minimum frequency of 2),
S: singletons (at least 2 different amino acids at a given site with at most 1 of them occurring multiple times).
(When you have a coding DNA sequence you can translate it into a protein sequence by clicking on UUC->Phe. Clicking again takes you back to the DNA sequence.)
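The site categories defined in C2 can be written as a small rule. The Python sketch below labels each alignment column as conserved (C), parsimony-informative (Pi), or singleton (S); every Pi or S site is also a variable (V) site. Ignoring gap characters is an assumption made for this sketch, and MEGA's own treatment of gaps may differ. The function names and the toy sequences are hypothetical.

```python
from collections import Counter


def classify_site(column, gap_chars="-?"):
    """Label one alignment column using the C2 definitions.

    Returns "C" (conserved), "Pi" (parsimony-informative), "S" (singleton),
    or "gap" if the column holds no residues. Pi and S sites are both variable.
    """
    counts = Counter(ch for ch in column if ch not in gap_chars)
    if not counts:
        return "gap"
    if len(counts) == 1:
        return "C"
    # Number of distinct residues that occur at least twice in the column.
    repeated = sum(1 for n in counts.values() if n >= 2)
    return "Pi" if repeated >= 2 else "S"


def classify_alignment(seqs):
    """Classify every column of a list of equal-length aligned sequences."""
    return [classify_site(col) for col in zip(*seqs)]


# Toy alignment, for illustration only (not the FT proteins).
toy = ["MKVLA",
       "MKVLG",
       "MRILA",
       "MRIFA"]
labels = classify_alignment(toy)
print(labels)  # ['C', 'Pi', 'Pi', 'S', 'S']
```

In the toy alignment, columns 2 and 3 split the sequences into the same two groups (rows 1 and 2 versus rows 3 and 4). This is the kind of signal that Pi sites carry and singleton sites do not.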

  • Can you discover some groups by looking at the Pi characters?
  • Move sequences to have OsFT2 close to TaFT2, and also TaFT, OsFTa, and OsFTb close to each other. Can you see patterns now?
C3. To see the format of a MEGA file, in the MEGA4 window, go to File|Export Data, and click on OK to take a look at it. Exit (File|Exit Editor) this window.
D. Calculating a Distance Matrix
D1. In the MEGA4 window, go to Distances|Compute Pairwise. In the ‘Analysis Preferences’ window, change ‘Model’ to Amino Acid|No. of differences (leave the default parameters in the other options). Click on Compute.
D2. See the Pairwise Distances matrix.
  • Which sequences are the closest ones?
  • Which sequences are the most distant ones?
D3. To see the matrix in a MEGA file and save it, go to File|Export/Print Distances, and change the ‘Output Format’ from ‘Publication’ to ‘MEGA’. Click on Print/Save Matrix.
D4. After you have inspected the matrix, go to File|Quit Viewer to close the Pairwise Distances matrix.
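The ‘No. of differences’ model chosen in D1 is simply a count of mismatching sites between two aligned sequences. The Python sketch below shows that idea. It skips any site where either sequence has a gap; this is a choice made for the sketch, and MEGA's own gap handling is set in the Analysis Preferences and may differ. The names `n_differences` and `distance_matrix` and the toy sequences are hypothetical.

```python
def n_differences(a, b, gap_chars="-?"):
    """Count sites where two aligned sequences carry different residues.

    Sites where either sequence has a gap are skipped (an assumption of
    this sketch, not necessarily MEGA's setting).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x not in gap_chars and y not in gap_chars and x != y
    )


def distance_matrix(alignment):
    """Return (names, matrix) for a dict {name: aligned sequence}."""
    names = list(alignment)
    matrix = [
        [n_differences(alignment[p], alignment[q]) for q in names]
        for p in names
    ]
    return names, matrix


# Toy alignment, for illustration only (not the FT proteins).
toy = {"s1": "MKVLA", "s2": "MKVLG", "s3": "MRIFA"}
names, matrix = distance_matrix(toy)
for name, row in zip(names, matrix):
    print(name, row)
```

The matrix is symmetric with zeros on the diagonal. The smallest off-diagonal entry marks the closest pair and the largest marks the most distant pair.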

G4. Align the protein sequences using ClustalW as before, save the alignment as ‘MADS.mas’, exit and open the file in MEGA.
G5. Perform a Neighbor-Joining (NJ) analysis. Copy and paste the phylogenetic tree into your Word document.
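For readers curious about what the Neighbor-Joining analysis in G5 does with a distance matrix, the Python sketch below implements the textbook form of the algorithm. It repeatedly joins the pair of nodes that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from node i. This is an illustration under stated assumptions and not MEGA's implementation: tie-breaking and output format are choices of this sketch. The function name and the five-taxon toy matrix are hypothetical.

```python
def neighbor_joining(names, dist):
    """Textbook Neighbor-Joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is
    (node_a, node_b, branch_length_a, branch_length_b); the joined pair is
    replaced by a new node named "(a,b)". final_edge connects the last two nodes.
    """
    names = list(names)
    d = [[float(x) for x in row] for row in dist]
    joins = []
    while len(names) > 2:
        n = len(names)
        totals = [sum(row) for row in d]
        # Find the pair (i, j) with the smallest Q value (first one wins ties).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node.
        len_i = d[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
        len_j = d[i][j] - len_i
        joins.append((names[i], names[j], len_i, len_j))
        # Distances from the new node to each remaining node.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, x in zip(d, new_row):
            row.append(x)
        d.append(new_row + [0.0])
        names = [names[k] for k in keep] + ["(%s,%s)" % (names[i], names[j])]
    final_edge = (names[0], names[1], d[0][1])
    return joins, final_edge


# Toy distances between five hypothetical taxa (not the MADS box proteins).
taxa = ["a", "b", "c", "d", "e"]
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(taxa, toy)
for join in joins:
    print(join)
print(final_edge)
```

In this toy matrix, a and b are joined first even though d and e have the smallest raw distance. NJ corrects each pairwise distance for how far both nodes are from everything else, so the closest pair in the matrix is not always the first pair joined.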

  • Which Arabidopsis protein is the closest one to the MADS box protein from barley?
H. Viewing the 3D structure of a protein
H1. Cn3D is an application that allows you to view 3-dimensional structures of proteins. Go to protein blast (blastp) (http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Proteins&PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on). Copy and paste the AtFT protein sequence and click on BLAST.
H2. Once your results are completely displayed, go to Show Conserved Domains.
  • What is the name of the conserved domain? Click on it to find more information about the conserved domain.
  • What biological functions have been attributed to this conserved domain?
H3. Click on Structure to go to the Entrez Structure database. In the Structure database, insert the name of the conserved domain you found and click on Go. Click on the link displayed in your results. In the Structure Summary window, click on Structure View in Cn3D. Open the file with Cn3D. Cn3D tutorial: http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3dtut.shtml.
H4. Go to View|Animation|Spin for a complete view of the 3D structure of the conserved domain. You can change the Style in which you want to see the 3D structure. The default display presented in the figure for single structures is a combination of Style/Rendering Shortcuts: Worms and Style/Coloring Shortcuts: Secondary Structure, which shows a worm backbone, no side chains, and solid objects (arrows and cylinders) to represent strands and helices. The colors are green for helices, orange for strands, and blue for coils. Arrows point in the N-to-C direction.
H5. In the Sequence/Alignment Viewer, you can see where in the 3D structure the selected amino acids are located by simply selecting them with your mouse. The 3D structure will be highlighted in the position where the selected amino acids are located.
I. From Multiple Sequence Alignment to Multiple Sequence Assembly
I.1. Using MEGA4, perform a new ClustalW alignment with the 8 exported sequences used in 08_Lab1 (simply select them all from the Word document called ‘08_Lab1 DNA for MEGA’, copy them (Ctrl C) and paste them (Ctrl V) in the MEGA4 Alignment Explorer window).

  • Could you get a good alignment of the sequences? Why?
  • How would you find the alignment between the overlapping regions that are present in these sequences?
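As one possible starting point for thinking about the last question, the Python sketch below looks for the longest suffix of one sequence that exactly matches a prefix of another, which is the basic operation behind sequence assembly. It is a simplification: it requires exact matches and does not consider reverse complements, whereas real fragments may contain sequencing errors. The function name, the `min_len` parameter, and the two DNA fragments are hypothetical.

```python
def suffix_prefix_overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` characters exists.
    """
    longest_possible = min(len(a), len(b))
    for k in range(longest_possible, max(min_len, 1) - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Hypothetical fragments, for illustration only (not the 08_Lab1 sequences).
left = "ATGGCGTACG"
right = "TACGTTAGC"

k = suffix_prefix_overlap(left, right)
merged = left + right[k:]  # join the fragments through their overlap
print(k, merged)
```

A multiple alignment program such as ClustalW tries to align sequences along their whole length, whereas an assembly step like this one only asks where the end of one fragment continues into the start of the next.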