Writing Bioinformatics Software Biological Databases - Lecture Notes | CMSC 423, Study notes of Computer Science

Material Type: Notes; Class: BIOINFO ALGS, DB, TOOLS; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Uploaded on 02/13/2009

CMSC423: Bioinformatic Algorithms,
Databases and Tools
Lecture 4
Writing bioinformatics software
Biological databases

Writing bioinformatics software

Bio::Perl

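  • http://www.bioperl.org
      use Bio::Perl;
      my $seq = read_sequence("mytest.fa", "fasta");
      my $gbseq = read_sequence("mytest.gb", "genbank");
      # the leading ">" opens test.fasta for writing, as with Perl's open()
      write_sequence(">test.fasta", 'fasta', $gbseq);
  • ' vs "? Both work here: double-quoted Perl strings interpolate variables and escapes such as \n, single-quoted strings do not.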
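To see roughly what a FASTA writer such as write_sequence(..., 'fasta', ...) emits, here is a minimal stdlib-only Python sketch. The function name write_fasta and the 60-column line width are assumptions for illustration. This is not Bio::Perl's implementation.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], out: TextIO, width: int = 60) -> None:
    """Write (identifier, sequence) pairs to `out` in FASTA format."""
    for ident, seq in records:
        out.write(">" + ident + "\n")  # header line
        # Wrap the sequence into lines of at most `width` characters.
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")


buf = io.StringIO()
write_fasta([("test1", "ACGT" * 20)], buf)  # 80 bases -> lines of 60 and 20
print(buf.getvalue(), end="")
```

Writing to any text stream, whether an open file or an in-memory buffer, keeps the sketch easy to test.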

Bio::Perl

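  • Homework question #
      use Bio::Perl;
      # read_all_sequences() returns every record in the file;
      # read_sequence() is documented as reading the top sequence only
      foreach my $seq (read_all_sequences("test.fa", 'fasta')) {
          if ($seq->length() > 500) {
              print $seq->primary_id(), "\n";
          }
      }
  • Note: you still need to write your own version...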
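The homework asks for your own version, so here is one possible stdlib-only Python sketch of a FASTA reader and the length filter. The names read_fasta and long_ids are hypothetical. The sketch assumes the record ID is the first word of the header line, as Biopython's record.id is. The cutoff is strictly greater than 500, which matches the test on the slides.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA text stream.

    The identifier is the first whitespace-separated word of the header.
    Multi-line sequences are joined; blank lines are ignored, and any
    sequence text before the first header is discarded.
    """
    ident: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:  # finish the previous record
                yield ident, "".join(chunks)
            fields = line[1:].split()
            ident = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if ident is not None:  # the last record has no following header
        yield ident, "".join(chunks)


def long_ids(handle: TextIO, min_len: int = 500) -> List[str]:
    """Return the IDs of records whose sequence is longer than `min_len`."""
    return [ident for ident, seq in read_fasta(handle) if len(seq) > min_len]


demo = io.StringIO(">short some description\nACGT\n>long\n" + "A" * 300 + "\n" + "C" * 300 + "\n")
print(long_ids(demo))  # ['long']
```

On a real file, open it and pass the handle, for example `with open("test.fa") as fh: print("\n".join(long_ids(fh)))`. Because read_fasta is a generator, only one record is held in memory at a time.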

BioJava

  • http://www.biojava.org
      import java.io.*;
      import org.biojava.bio.seq.db.*;
      import org.biojava.bio.seq.io.*;
      import org.biojava.bio.symbol.*;

      String filename = args[0];
      BufferedInputStream is = new BufferedInputStream(new FileInputStream(filename));
      // get the appropriate Alphabet, named by the second argument
      Alphabet alpha = AlphabetManager.alphabetForName(args[1]);
      // get a SequenceDB of all sequences in the file
      SequenceDB db = SeqIOTools.readFasta(is, alpha);
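The Alphabet passed as args[1] roughly determines which symbols a parsed sequence may contain. The minimal Python sketch below checks a sequence against a named symbol set. The dictionary ALPHABETS and the function check_symbols are hypothetical. The symbol sets are deliberately simplified and are not BioJava's actual alphabet definitions.

```python
from typing import Dict, FrozenSet

# Hypothetical, deliberately simplified symbol sets (not BioJava's real
# alphabet definitions).
ALPHABETS: Dict[str, FrozenSet[str]] = {
    "DNA": frozenset("ACGTN"),
    "RNA": frozenset("ACGUN"),
    "PROTEIN": frozenset("ACDEFGHIKLMNPQRSTVWYX*"),
}


def check_symbols(seq: str, alphabet_name: str) -> str:
    """Return `seq` upper-cased, or raise ValueError at the first illegal symbol."""
    allowed = ALPHABETS[alphabet_name.upper()]
    seq = seq.upper()
    for pos, sym in enumerate(seq):
        if sym not in allowed:
            raise ValueError(f"illegal symbol {sym!r} at position {pos} for {alphabet_name}")
    return seq


print(check_symbols("acgtn", "DNA"))  # ACGTN
```

Failing on the first illegal symbol reports the problem early, rather than silently storing a sequence that later code cannot interpret.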

BioJava

  • Question 5
      BufferedReader br = new BufferedReader(new FileReader(args[0]));
      String format = args[1];
      String alphabet = args[2];
      SequenceIterator iter =
          (SequenceIterator) SeqIOTools.fileToBiojava(format, alphabet, br);
      while (iter.hasNext()) {
          Sequence seq = iter.nextSequence();
          if (seq.length() > 500) {
              System.out.println(seq.getName());
          }
      }

BioPython

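  • http://www.biopython.org
      from Bio import SeqIO
      # SeqIO.parse() returns an iterator of SeqRecord objects, not a single record;
      # "out.fasta" is an illustrative output file name
      with open("file.fasta") as handle, open("out.fasta", "w") as handle2:
          records = SeqIO.parse(handle, "fasta")
          SeqIO.write(records, handle2, "fasta")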

BioPython

  • Question 5
      from Bio import SeqIO
      with open("test.fasta") as handle:
          for seq_record in SeqIO.parse(handle, "fasta"):
              if len(seq_record) > 500:
                  print(seq_record.id)

BioRuby

  • http://www.bioruby.org
      require 'bio'
      input_seq = ARGF.read  # reads all files named in the arguments (or stdin)
      # intended for plain sequence text, not FASTA files with header lines
      my_naseq = Bio::Sequence::NA.new(input_seq)

BioRuby

  • Question 5
      #!/usr/bin/env ruby
      require 'bio'
      ff = Bio::FlatFile.new(Bio::FastaFormat, ARGF)
      ff.each_entry do |f|
        if f.length > 500
          puts f.entry_id
        end
      end

SeqAn

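  • http://www.seqan.de
      #include <fstream>
      #include <iostream>
      #include <seqan/sequence.h>
      #include <seqan/file.h>
      using namespace seqan;
      using namespace std;

      String<char> seq;   // SeqAn's String is a template: give it an element type
      String<char> name;
      fstream f;
      f.open("test.fasta", ios_base::in | ios_base::binary);
      readMeta(f, name, Fasta());  // the header line
      read(f, seq, Fasta());       // the sequence itself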

SeqAn

  • Question 5
      String<char> seq;
      String<char> name;
      fstream f;
      f.open("test.fasta", ios_base::in | ios_base::binary);
      while (!f.eof()) {
          readMeta(f, name, Fasta());
          read(f, seq, Fasta());
          if (length(seq) > 500) {
              cout << name << endl;
          }
      }

R/BioConductor

  • http://www.bioconductor.org
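  • Mainly for statistical applications, e.g. microarray analysis:
      library("affy")
      library("geneplotter")
      library("gplots")
      data <- ReadAffy()   # reads the CEL files in the working directory
      eset <- rma(data)    # RMA-normalized expression values
      e <- exprs(eset)     # expression matrix (probe sets x arrays)
      heatmap.2(e, margins=c(15,15), trace="none",
                col=redgreen(25), cexRow=0.5)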
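rma() (Robust Multi-array Average) combines three steps: background correction, quantile normalization across arrays, and median-polish summarization of probe sets. It reports the results on a log2 scale. The minimal pure-Python sketch below makes the middle step concrete. It is an illustration only. Bioconductor's code works on probe-level matrices and may handle tied values differently. The function name quantile_normalize is hypothetical.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize equal-length columns (one column per array).

    Every value is replaced by the mean, across all columns, of the values
    sharing its rank. Ties are ranked by position, a simplification.
    """
    n = len(columns[0])
    # For each column, the row indices sorted by increasing value.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Mean of the k-th smallest value over all columns.
    rank_means = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for k, row in enumerate(order):
            out[row] = rank_means[k]  # the k-th smallest value gets the k-th mean
        normalized.append(out)
    return normalized


print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, every array has the same distribution of values. Only the ordering of values within each array is kept from the raw data.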

R/BioConductor

  • Book has lots of examples
  • Worth learning more about it – easy to do various cool things
  • example... if time