Notes on Biological Databases - Fall 2008 | CMSC 423, Study notes of Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Fall 2008;

Uploaded on 02/13/2009

CMSC423: Bioinformatic Algorithms,
Databases and Tools
Lecture 5
Biological databases

What data gets stored?

• DNA

  • string of letters
  • quality information, maybe chromatograms
  • location of genes (ranges along a chromosome)

• Proteins

  • string of letters
  • protein domains
  • 3D coordinates of each atom

• Pathways

  • graph of interactions between genes

For all of these, databases often store links to scientific articles related to the data
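The kinds of data listed above map onto simple data structures. The following is a minimal sketch only; the class and field names (`GeneLocation`, `DnaRecord`, `add_interaction`) are illustrative assumptions, not the layout of any particular database, and the 0-based, end-exclusive coordinates are a choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class GeneLocation:
    """A gene stored as a range along a chromosome (0-based, end-exclusive here)."""
    name: str
    start: int
    end: int


@dataclass
class DnaRecord:
    """Hypothetical record for one DNA sequence."""
    sequence: str                                            # string of letters
    qualities: List[int] = field(default_factory=list)       # per-base quality, if any
    genes: List[GeneLocation] = field(default_factory=list)  # ranges along the sequence
    article_links: List[str] = field(default_factory=list)   # related publications

    def gene_sequence(self, gene_name: str) -> str:
        """Return the letters covered by the named gene's range."""
        for gene in self.genes:
            if gene.name == gene_name:
                return self.sequence[gene.start:gene.end]
        raise KeyError(gene_name)


def add_interaction(graph: Dict[str, Set[str]], a: str, b: str) -> None:
    """Record an undirected interaction between two genes in a pathway graph."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


# Usage: one DNA record with a single annotated gene, and a two-gene pathway.
record = DnaRecord(sequence="ACGTACGTAC", genes=[GeneLocation("geneA", 2, 6)])
pathway: Dict[str, Set[str]] = {}
add_interaction(pathway, "geneA", "geneB")
```

Storing the pathway as adjacency sets keeps "which genes interact with X?" a single dictionary lookup.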

How the data get accessed

• Gene by gene/object by object – targeted at manual inspection of data

  • usually lots of clicking involved
  • simple search capability
  • similarity searches in addition to text queries

• Bulk – targeted at computational analyses

  • often programmatic access through web server
  • most frequently – just bulk download (ftp)

EMBL – European Molecular Biology Laboratory

• European counterpart of NCBI

• BioMart query builder

http://www.ebi.ac.uk/embl/

Expasy proteomics server

• Home of Swiss-Prot and other useful information on proteins

http://www.expasy.org

Genome browsers

• UCSC Genome Browser – http://genome.ucsc.edu

• Ensembl Genome Browser – http://www.ensembl.org

• GBrowse – http://www.gmod.org

Direct database access - SQL

• CHADO schema – www.gmod.org
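As a rough illustration of direct SQL access, the sketch below builds a heavily simplified two-table layout that borrows the `feature` and `featureloc` table names and the `fmin`/`fmax`/`strand`/`srcfeature_id` columns from Chado. It is not the full schema, and the use of an in-memory SQLite database, the sample rows, and the helper `features_overlapping` are all assumptions made so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real database server
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    name       TEXT,
    uniquename TEXT NOT NULL
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id    INTEGER NOT NULL REFERENCES feature(feature_id),
    srcfeature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    fmin   INTEGER,   -- start of the range on the source feature
    fmax   INTEGER,   -- end of the range on the source feature
    strand INTEGER
);
""")

# Made-up sample data: one chromosome and two genes located on it.
conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", [
    (1, "chr1", "chr1"),
    (2, "geneA", "gene:0001"),
    (3, "geneB", "gene:0002"),
])
conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 2, 1, 100, 500, 1),
    (2, 3, 1, 900, 1500, -1),
])


def features_overlapping(conn, src_name, start, end):
    """Return (name, fmin, fmax, strand) for features overlapping [start, end)."""
    sql = """
        SELECT f.name, loc.fmin, loc.fmax, loc.strand
        FROM featureloc AS loc
        JOIN feature AS f   ON f.feature_id   = loc.feature_id
        JOIN feature AS src ON src.feature_id = loc.srcfeature_id
        WHERE src.uniquename = ? AND loc.fmin < ? AND loc.fmax > ?
        ORDER BY loc.fmin
    """
    return conn.execute(sql, (src_name, end, start)).fetchall()
```

Two ranges overlap when each starts before the other ends, which is the condition the `WHERE` clause encodes.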

Programmatic database access

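use strict;
use warnings;
use DBI;

# SERV stands for the database server name
my $dbh = DBI->connect("dbi:Sybase:server=SERV;packetSize=8092",
                       "anonymous", "anonymous");
if (! defined $dbh) {
    die("Cannot connect to server\n");
}

my $mysqlqry = "...";  # SQL query text (not given in the notes)

$dbh->do("set textsize 65535");  # max bytes returned for text columns
my $qh = $dbh->prepare($mysqlqry) or die("Cannot prepare\n");
$qh->execute() or die("Cannot execute\n");

# fetch one row at a time; processrow() is user-supplied
while (my @row = $qh->fetchrow_array()) {
    processrow(@row);
}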
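The same connect / execute / fetch pattern exists in most languages. Below is a sketch of the equivalent loop using Python's DB-API, with an in-memory SQLite database standing in for the remote server. The table `reads`, its contents, and the helpers `stream_rows` and `process_row` are hypothetical and exist only to make the example runnable.

```python
import sqlite3


def stream_rows(conn, query, params=(), batch_size=1000):
    """Run a query and yield rows one at a time, fetching them in batches."""
    cur = conn.cursor()
    cur.execute(query, params)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:          # no more data
            break
        for row in rows:
            yield row


def process_row(row):
    """Placeholder for per-row work: here, report each sequence's length."""
    seq_id, seq = row
    return seq_id, len(seq)


# Stand-in database with a few made-up records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reads (id TEXT, seq TEXT)")
conn.executemany("INSERT INTO reads VALUES (?, ?)",
                 [("r1", "ACGT"), ("r2", "GGATCC"), ("r3", "TT")])

results = [process_row(row)
           for row in stream_rows(conn, "SELECT id, seq FROM reads ORDER BY id",
                                  batch_size=2)]
```

Fetching in batches keeps memory use bounded when a query returns many rows, which matters for bulk computational analyses.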

NCBI programmatic access

  • http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
    • must write your own HTTP client (LWP Perl module helps)
    • queries go directly to web server
    • data returned in XML
  • http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=doc&m=obtain&s=stips

  • stub script provided (query_tracedb)
  • queries still go through web server
  • data returned in a variety of user selected formats
  • For both, limits are set on the amount of data retrieved, e.g. less than 40,000 records at a time

  • Download procedure:
    • determine the number of records to be retrieved ("count" query)
    • read data in allowable chunks
    • combine the chunks
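The download procedure above can be sketched as follows for an E-utilities ESearch query. This assumes the ESearch endpoint and the `db`, `term`, `retstart` and `retmax` parameters as currently documented by NCBI, and a response containing `<Count>` and `<IdList>` elements. The function names, the chunk size, and the `fetch` callable (any function that takes a URL and returns the response text) are assumptions for illustration, and no network request is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed endpoint; check the current E-utilities documentation before use.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retstart=0, retmax=0):
    """Build an ESearch URL; retmax=0 asks for the count without any IDs."""
    query = urlencode({"db": db, "term": term,
                       "retstart": retstart, "retmax": retmax})
    return ESEARCH_URL + "?" + query


def parse_count(xml_text):
    """Extract the total number of matching records from an ESearch reply."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def parse_ids(xml_text):
    """Extract the record IDs from an ESearch reply."""
    return [elem.text for elem in ET.fromstring(xml_text).findall("./IdList/Id")]


def chunk_ranges(total, chunk_size):
    """Split `total` records into (start, size) chunks of at most chunk_size."""
    return [(start, min(chunk_size, total - start))
            for start in range(0, total, chunk_size)]


def download_all(db, term, fetch, chunk_size=500):
    """Count the records, read them in allowable chunks, combine the chunks."""
    total = parse_count(fetch(esearch_url(db, term, retmax=0)))
    ids = []
    for start, size in chunk_ranges(total, chunk_size):
        reply = fetch(esearch_url(db, term, retstart=start, retmax=size))
        ids.extend(parse_ids(reply))
    return ids
```

In real use `fetch` could wrap `urllib.request.urlopen`; passing it in as a parameter keeps the chunking logic testable without contacting the server.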