Mitochondrial Protein Sequence-Computer Sciences Applications and Systems-Project Presentation, Slides of Applications of Computer Sciences

Alliance University Applications of Computer Sciences

This is a project presentation for the Application of Computer Science course, delivered in the presence of Prof. Ashish Behari at Alliance University. Its main points are: mitochondrial, protein, sequence, extraction, selection, strategies, classification, techniques, amino acids.

Typology: Slides

2011/2012

Uploaded on 07/16/2012 by samderiya

Outline of Presentation

• Introduction
• Basic Concepts
• Feature Extraction Strategies
• Feature Selection Strategies
• Classification Techniques
• Description of Author’s Results
• Comparison with Author’s Results
• Comparison with Other Methods
• Conclusions
• Future Work

Basic Concepts

• Before discussing our project, the following basic concepts need to be introduced:

Proteins
Amino acids
Mitochondrial protein sequence

Proteins

• Proteins are the major components of living organisms and constitute more than 25% of the weight of a cell.

• They perform functions such as catalysis, transport, digestion, movement, sensory capabilities (including taste and vision), and control of gene function.

• Proteins are made up of chains of amino acids, each usually represented by a single English letter.

Amino acids

Amino acid       3-letter   1-letter
Alanine          Ala        A
Cysteine         Cys        C
Aspartic acid    Asp        D
Glutamic acid    Glu        E
Phenylalanine    Phe        F
Glycine          Gly        G
Histidine        His        H
Isoleucine       Ile        I
Lysine           Lys        K
Leucine          Leu        L
Methionine       Met        M
Asparagine       Asn        N
Proline          Pro        P
Glutamine        Gln        Q
Arginine         Arg        R
Serine           Ser        S
Threonine        Thr        T
Valine           Val        V
Tryptophan       Trp        W
Tyrosine         Tyr        Y
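As a small illustration of the one-letter encoding, the table can be held in a dictionary and used to turn three-letter residue names into the compact string form in which sequences are written. This is a minimal sketch; the names `THREE_TO_ONE` and `to_one_letter` are hypothetical helpers, not part of the original project.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Cys": "C", "Asp": "D", "Glu": "E", "Phe": "F",
    "Gly": "G", "His": "H", "Ile": "I", "Lys": "K", "Leu": "L",
    "Met": "M", "Asn": "N", "Pro": "P", "Gln": "Q", "Arg": "R",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}


def to_one_letter(residues):
    """Convert a list of three-letter names (any case) to a one-letter string."""
    # str.capitalize() turns "MET" or "met" into "Met" to match the dictionary keys.
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues)


print(to_one_letter(["Met", "ALA", "ala", "Ser"]))  # MAAS
```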

Mitochondrial Protein Sequence

>sp|P31937|3HIDH_HUMAN 3-hydroxyisobutyrate dehydrogenase, mitochondrial precursor (EC 1.1.1.31) (HIBADH) - Homo sapiens (Human).
MAASLRLLGAASGLRYWSRRLRPAAGSFAAVCSRSVASKTP
VGFIGLGNMGNPMAKNLMKHGYPLIIYDVFPDACKEFQDAGE
QVVSSPADVAEKADRIITMLPTSINAIEAYSGANGILKKVKKGS
LLIDSSTIDPAVSKELAKEVEKMGAVFMDAPVSGGVGAARSG
NLTFMVGGVEDEFAAAQELLGCMGSNVVYCGAVGTGQAAKI
CNNMLLAISMIGTAEAMNLGIRLGLDPKLLAKILNMSSGRCWS
SDTYNPVPGVMDGVPSANNYQGGFGTTLMAKDLGLAQDSA
TSTKSPILLGSLAHQIYRMMCAKGYSKKDFSSVFQFLREEETF
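Sequences like this one are normally stored in FASTA format: a header line starting with `>` followed by sequence lines. A minimal parser, sketched here under the assumption that the dataset uses plain FASTA text, joins the sequence lines and pulls the accession out of a UniProt-style `sp|ACCESSION|ENTRY_NAME` header. Both function names are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between sequence lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uniprot_accession(header):
    """Extract the accession from a header such as 'sp|P31937|3HIDH_HUMAN ...'."""
    return header.split("|")[1]


example = """>sp|P31937|3HIDH_HUMAN 3-hydroxyisobutyrate dehydrogenase
MAASLRLLGAASGLRYWSRRLRPAAGSFAAVCSRSVASKTP

VGFIGLGNMGNPMAKNLMKHGYPLIIYDVFPDACKEFQDAGE
"""
header, seq = parse_fasta(example)[0]
print(uniprot_accession(header), len(seq))  # P31937 83
```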

Data set

• The dataset used in this project was generated by Jiang et al. (2006) and was obtained on request from [email protected].

• It comprises 499 mitochondrial (positive) and 681 non-mitochondrial (negative) sequences.

Feature Extraction Strategies

• We used four kinds of protein representation:

Amino acid composition (AAC).
Pseudo amino acid composition (PseAAC).
Dipeptide composition (Dp).
Split amino acid composition (SAAC).
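The simplest of these representations, AAC, is the normalized frequency of each of the 20 amino acids in a sequence, giving a fixed 20-dimensional vector. A minimal sketch follows; the alphabetical residue order and the choice to ignore non-standard letters (such as X) are assumptions, not details from the slides.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed order of the 20 standard residues


def amino_acid_composition(seq):
    """20-dimensional AAC vector: normalized frequency of each amino acid."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # letters outside the 20 standard residues are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


vec = amino_acid_composition("AACD")
print(vec[:3])  # [0.5, 0.25, 0.25]
```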

Pseudo amino acid composition

• Represents a protein by a set of more than 20 discrete factors: the first 20 are the components of its conventional amino acid composition, while the additional factors incorporate sequence-order information via various modes:
$P = [P_1, P_2, \ldots, P_{20}, P_{21}, \ldots, P_{20+\lambda}]$

• Here $P_1, P_2, \ldots, P_{20}$ are the normalized occurrence frequencies of the 20 amino acids, and

• $P_{21}, P_{22}, \ldots, P_{20+\lambda}$ are the 1st-tier to $\lambda$-tier correlation factors of the amino acid sequence along the protein chain, computed with hydrophobicity and hydrophilicity correlation functions.
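The sketch below follows the standard Chou-style formulation of PseAAC, assuming it matches what was used here: each property scale is standardized over the 20 residues, the correlation between two residues is the mean squared difference of their standardized properties, the $j$-tier factor $\theta_j$ averages that correlation over residue pairs $j$ positions apart, and the weight `w` balances composition against order information. The defaults `lam=5` and `w=0.05` are illustrative assumptions, and the two property tables in the usage are toy values for demonstration only, not real hydrophobicity or hydrophilicity data.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a property table over the 20 residues to mean 0 and std 1."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseudo_aac(seq, properties, lam=5, w=0.05):
    """Return the (20 + lam)-dimensional PseAAC vector P_1 .. P_{20+lam}.

    properties: list of dicts mapping each residue letter to a property value
    (e.g. hydrophobicity and hydrophilicity scales supplied by the caller).
    """
    seq = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    props = [standardize(p) for p in properties]

    def corr(a, b):
        # Correlation function: mean squared difference over all properties.
        return sum((p[b] - p[a]) ** 2 for p in props) / len(props)

    # theta_j: average correlation between residues j positions apart (j-th tier).
    thetas = [
        sum(corr(seq[i], seq[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Toy property tables for illustration only; NOT real hydrophobicity data.
toy_hydrophobicity = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
toy_hydrophilicity = {a: float((7 * i) % 20) for i, a in enumerate(AMINO_ACIDS)}
vec = pseudo_aac("MAASLRLLGAASGLRYWSRR", [toy_hydrophobicity, toy_hydrophilicity])
print(len(vec), round(sum(vec), 6))  # 25 1.0
```

Because the denominator is the sum of all numerators, the components always sum to 1, so `w` only shifts weight between the composition and the sequence-order factors.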

Dipeptide composition

• It transforms variable-length protein sequences into fixed-length feature vectors.

• The occurrence frequency of every consecutive pair of amino acids is calculated as $F(i) = P(i) / P$, where $P(i)$ is the number of occurrences of pair $i$ and $P$ is the total number of pairs in the protein sequence.

• This gives a 400-dimensional feature vector (one entry for each of the 20 × 20 possible pairs).
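The dipeptide composition $F(i) = P(i) / P$ can be sketched as follows. The fixed ordering of the 400 pairs is an assumption made here for illustration, and pairs containing non-standard letters are skipped, so $P$ counts only valid pairs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs "AA", "AC", ..., "YY" in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """400-dim vector with F(i) = P(i) / P over consecutive residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p in DIPEPTIDE_INDEX]  # skip non-standard letters
    vec = [0.0] * len(DIPEPTIDES)
    if not pairs:
        return vec
    for p in pairs:
        vec[DIPEPTIDE_INDEX[p]] += 1  # P(i): occurrences of pair i
    total = len(pairs)  # P: total number of (valid) pairs
    return [v / total for v in vec]


vec = dipeptide_composition("AAC")
print(len(vec), vec[DIPEPTIDE_INDEX["AA"]], vec[DIPEPTIDE_INDEX["AC"]])  # 400 0.5 0.5
```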

Split amino acid composition (Contd.)

• A small number of sequences in the dataset contain fewer than 50 amino acids.

• For these, we divided each sequence into the same three parts, but with:

10 amino acids at the N-terminus,
10 amino acids at the C-terminus, and
the region between these two termini.
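A minimal SAAC sketch, computing the amino acid composition separately for the N-terminal part, the middle region and the C-terminal part. The terminal length used for ordinary sequences is described on a slide that is not part of this text, so it is left as the caller-supplied parameter `terminal_len`; only the 10-residue rule for sequences shorter than 50 comes from the slide above. The concatenation order (N-terminus, middle, C-terminus) is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment):
    """Normalized composition of the 20 standard residues in one segment."""
    counts = [segment.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 20


def split_aac(seq, terminal_len):
    """60-dim SAAC vector: AAC of N-terminus, middle region, C-terminus.

    terminal_len (positive) is the split used for ordinary sequences;
    sequences shorter than 50 residues use 10 residues per terminus instead.
    """
    seq = seq.upper()
    t = 10 if len(seq) < 50 else terminal_len
    n_term, rest = seq[:t], seq[t:]
    c_term = rest[-t:]  # last t residues not already in the N-terminus
    middle = rest[:-t]  # empty when the sequence is too short for a middle
    return aac(n_term) + aac(middle) + aac(c_term)


vec = split_aac("A" * 10 + "C" * 5 + "D" * 10, terminal_len=25)
print(vec[0], vec[20 + 1], vec[40 + 2])  # 1.0 1.0 1.0
```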

Feature Selection Strategies

• We employed two feature selection strategies for the dipeptide composition:

Rank feature selection
Feature selection through a genetic algorithm (GA)
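The slides do not state which criterion was used to rank features, so the sketch below swaps in a Fisher-style score, $(\mu_{+} - \mu_{-})^2 / (\sigma_{+}^2 + \sigma_{-}^2)$, as an assumed ranking criterion and keeps the top `k` dipeptide features. Only the ranking variant is sketched; labels are assumed to be 1 for mitochondrial and 0 for non-mitochondrial sequences, and all function names are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Per-feature score (mean_pos - mean_neg)^2 / (var_pos + var_neg)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        p = [x[j] for x in pos]
        q = [x[j] for x in neg]
        spread = pvariance(p) + pvariance(q) + 1e-12  # epsilon avoids division by zero
        scores.append((fmean(p) - fmean(q)) ** 2 / spread)
    return scores


def rank_select(X, y, k):
    """Indices of the k highest-scoring features (ties keep original order)."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def keep_columns(X, indices):
    """Project each feature vector onto the selected feature indices."""
    return [[x[j] for j in indices] for x in X]


X = [[1, 0, 5], [1, 1, 5], [0, 0, 5], [0, 1, 5]]
y = [1, 1, 0, 0]
print(rank_select(X, y, 1))  # [0]
```

Feature 0 separates the two classes perfectly, so it ranks first; the other two carry no class information and score zero.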

Testing Methods

• Jackknife test

One protein sequence pattern is taken as the test sample, and the remaining N-1 sequence patterns are used as training patterns.
The label of the test sample is predicted using these N-1 training patterns.
This process is repeated for all N patterns.

• Independent test

• Self-consistency test
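The jackknife procedure is leave-one-out cross-validation and can be sketched independently of the classifier. The classifiers used in the project are described on slides not included here, so a 1-nearest-neighbour rule is used below purely as a hypothetical stand-in; any function with the same `(X_train, y_train, x)` signature could be passed instead.

```python
def jackknife(X, y, train_and_predict):
    """Leave-one-out: predict each sample from a model trained on the other N-1."""
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predictions.append(train_and_predict(X_train, y_train, X[i]))
    return predictions


def nearest_neighbour(X_train, y_train, x):
    """Hypothetical stand-in classifier: 1-NN with squared Euclidean distance."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(X_train[k], x))
    return y_train[min(range(len(X_train)), key=dist)]


X = [[0.0], [0.1], [1.0], [1.1]]
y = [0, 0, 1, 1]
print(jackknife(X, y, nearest_neighbour))  # [0, 0, 1, 1]
```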

Performance Measurements

• Sensitivity, specificity, accuracy (ACC), and Matthews correlation coefficient (MCC):

$\text{Sensitivity} = \frac{TP}{TP + FN}$
$\text{Specificity} = \frac{TN}{TN + FP}$
$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FN)(TP + FP)(TN + FN)(TN + FP)}}$

where TP = true positives, TN = true negatives, FP = false positives, FN = false negatives.
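The four measures can be computed directly from the confusion-matrix counts. This sketch assumes label 1 marks a mitochondrial (positive) sequence and 0 a non-mitochondrial one, and it reports MCC as 0 when its denominator is zero (a common convention, assumed here). The function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = mitochondrial (positive), 0 = negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


m = performance(8, 9, 1, 2)
print(m["sensitivity"], m["specificity"], m["accuracy"])  # 0.8 0.9 0.85
```

In practice the counts would come from `confusion_counts` applied to the jackknife predictions and the true labels.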