
ECS 289F: Scientific Data Management Dr. Bertram Ludäscher

Dept. of Computer Science, UC Davis Winter 2005

Research Project Descriptions

REMARKS

Choosing a Topic. By 3 pm Wednesday, January 26th, please send me a list, ordered by preference, of at least 3 projects that you would like to work on (email [email protected] with subject=ECS289F). I will try to accommodate your preferences as much as possible. If you have questions regarding a topic, see me at the office hour on Wednesday (preferred) or send email.

Preparation. The project descriptions given below provide the basic information to help you decide on a topic. However, both theory and practice projects will require follow-up meetings with me (the earlier the better!), e.g., to discuss the focus of a given reading assignment, how to summarize the articles, or how to approach the practice projects (some might need access to restricted resources), etc. You can meet me either during my office hours (Wed+Fri, 11am-noon) or during additional scheduled meetings.

Teamwork. Several projects are closely related and can be done in teams of typically 2 students.

Presentation. All projects require a presentation in class (e.g., using PowerPoint slides, or for LaTeX friends using PROSPER). In addition, theory projects require a (brief) written report, and practice projects a system demonstration (or report). Details depend on the particular project.

1 Data Integration

1.1 (T) Foundations of Data Integration & Query Rewriting

The goal of this theory project is to provide an overview of the foundations of query rewriting for data integration, including Global-as-View (GAV) and Local-as-View (LAV) techniques. Good introductions and the primary sources for this project are [KOCH, 2001, Chapter 3] and [LENZERINI, 2002].

1.2 (T) Special Algorithms in Data Integration

The goal of these theory projects is to study one or more specific algorithms in data integration, for example:

1.2.1 Answering Queries using Views

Here the setting is often LAV, i.e., the content of sources is defined via views over a (virtual) global schema. The user issues a query against the global schema and the system has to find a rewriting that uses only the views [HALEVY, 2001], [DUSCHKA et al., 2000].

1.2.2 Answering Queries with Limited Access Patterns

Sometimes sources can only execute certain queries, e.g., given via a set of access (or binding) patterns. Thus a query plan has to be found that observes those patterns [NASH & LUDÄSCHER, 2004], [DEUTSCH et al., 2005].
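
To make "observing the patterns" concrete, here is a minimal Python sketch that greedily orders the literals of a conjunctive query so that every input position is bound before its source is called. The relation names, the 'b'/'f' pattern strings, and the function executable_order are assumptions made for illustration only; this is not the algorithm of the cited papers, and it ignores negation and unions.

```python
def executable_order(literals, patterns):
    """Greedily order literals so that each call respects some access pattern.

    literals: list of (relation, args); an argument is a variable if it starts
              with an uppercase letter, otherwise a constant (assumed convention).
    patterns: dict relation -> list of strings over {'b', 'f'}, one character
              per argument position ('b' = must be bound, 'f' = may be free).
    Returns an ordered list of literals, or None if no executable order exists.
    """
    def is_var(term):
        return term[:1].isupper()

    bound, plan, remaining = set(), [], list(literals)
    while remaining:
        for lit in remaining:
            rel, args = lit
            callable_now = any(
                len(p) == len(args) and all(
                    mode == 'f' or not is_var(a) or a in bound
                    for mode, a in zip(p, args))
                for p in patterns.get(rel, []))
            if callable_now:
                plan.append(lit)
                # After the call, all variables of the literal are bound.
                bound.update(a for a in args if is_var(a))
                remaining.remove(lit)
                break
        else:
            return None  # no remaining literal can be called
    return plan


# Hypothetical sources: both need their first argument as input.
patterns = {"catalog": ["bf"], "by_author": ["bf"]}
query = [("catalog", ("I", "T")), ("by_author", ("knuth", "I"))]
print(executable_order(query, patterns))
```

Since the set of bound variables only grows, the greedy strategy finds an executable order whenever one exists for such positive conjunctive queries; here by_author must be called first because it supplies the ISBN that catalog requires.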

1.2.3 Semantic Integration

Here the goal is to study and compare recent ontology-based mediation approaches, e.g., [TZITZIKAS et al., 2002], [PEIM et al., 2002], and [GOBLE et al., 2001]. In these, data integration involves “glue ontologies” in addition to the local and global database schemas. The final list of papers is TBD and might include the above or others (cf. the survey [WACHE et al., 2001]).

1.2.4 Schema Matching

In schema matching, correspondences between elements of two or more schemas need to be identified. References include those in the special SIGMOD Record section on semantic integration [SIG, 2004] and others.
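
As a toy illustration of what a matcher produces, the following Python sketch proposes correspondences purely from the string similarity of attribute names, using difflib from the standard library. The schemas, the threshold of 0.6, and the function match_schemas are assumptions made for this example; matchers discussed in the literature typically also exploit data types, instances, and structure.

```python
from difflib import SequenceMatcher


def match_schemas(schema_a, schema_b, threshold=0.6):
    """Propose (a, b, score) correspondences by name similarity only."""
    if not schema_b:
        return []

    def norm(name):
        # Ignore case and underscores when comparing names.
        return name.lower().replace("_", "")

    def sim(a, b):
        return SequenceMatcher(None, norm(a), norm(b)).ratio()

    matches = []
    for a in schema_a:
        best = max(schema_b, key=lambda b: sim(a, b))
        score = sim(a, best)
        if score >= threshold:
            matches.append((a, best, round(score, 2)))
    return matches


# Two hypothetical schemas describing books.
for a, b, score in match_schemas(["author_name", "title", "isbn"],
                                 ["ISBN", "AuthorName", "book_title", "price"]):
    print(a, "<->", b, score)
```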

1.3 (P) Practice of Data Integration

The goal of this project is to implement a simple mediator prototype for the Global-as-View (GAV) approach, i.e., one in which query literals are successively unfolded until an executable query plan is reached. This view unfolding is based on a standard unification algorithm. Advanced constructs, e.g., for handling recursive views, can be added. This practice project lends itself to collaboration with Project 1.1.
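
To give a feeling for the core of such a prototype, here is a minimal Python sketch of view unfolding. It assumes non-recursive views, flat terms without function symbols (so unification reduces to matching argument lists), and the convention that variables start with an uppercase letter. The relation names and all function names are hypothetical; an actual project would likely use full term unification.

```python
import itertools


def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def walk(term, subst):
    # Follow bindings until an unbound variable or a constant is reached.
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(head_args, lit_args, subst):
    """Unify two flat argument tuples; return an extended substitution or None."""
    if len(head_args) != len(lit_args):
        return None
    subst = dict(subst)
    for a, b in zip(head_args, lit_args):
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue
        if is_var(a):
            subst[a] = b
        elif is_var(b):
            subst[b] = a
        else:
            return None  # two distinct constants do not unify
    return subst


def rename(rule, n):
    # Rename the variables of a view rule apart by adding the suffix _n.
    def ren(t):
        return f"{t}_{n}" if is_var(t) else t
    (pred, hargs), body = rule
    return (pred, tuple(map(ren, hargs))), [(p, tuple(map(ren, a))) for p, a in body]


def unfold(query, views):
    """Unfold a conjunctive query into plans over source relations.

    views: dict predicate -> list of rules ((predicate, args), body).
    Several rules for one predicate yield several plans (a union).
    """
    counter = itertools.count()
    plans = []

    def go(todo, done, subst):
        if not todo:
            plans.append([(p, tuple(walk(a, subst) for a in args))
                          for p, args in done])
            return
        (pred, args), rest = todo[0], todo[1:]
        if pred not in views:  # source relation: keep the literal
            go(rest, done + [(pred, args)], subst)
            return
        for rule in views[pred]:
            (_, hargs), body = rename(rule, next(counter))
            s = unify(hargs, args, subst)
            if s is not None:
                go(body + rest, done, s)

    go(list(query), [], {})
    return plans


# Hypothetical GAV mappings: the global relation book is defined over sources.
views = {"book": [
    (("book", ("Title", "Author")),
     [("src1", ("Isbn", "Title")), ("src2", ("Isbn", "Author"))]),
    (("book", ("Title", "Author")),
     [("src3", ("Title", "Author", "Year"))]),
]}
for plan in unfold([("book", ("T", "knuth"))], views):
    print(plan)
```

The query literal book(T, knuth) unfolds into two plans, one joining src1 and src2 on a fresh ISBN variable and one over src3. With recursive views this simple procedure would not terminate, which is one reason recursive views count as an advanced construct.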

2 Knowledge Representation

2.1 (T) Biological Ontologies and Pathways

The goal of this theory project is to give an overview of KR techniques used to represent biological information, e.g., the Gene Ontology [CONSORTIUM, 2002] or biological pathway databases such as EcoCyc and BioCyc: [KARP, 1999], [KARP, 2000], [KARP, 2001], [BIO, 2003].

2.2 (T+P) Introduction to Formal Concept Analysis (FCA)

The goal of this project is to provide an introduction to FCA and some of its applications. For the practical part, examples should be prepared using the Toscana system [GANTER & WILLE, 1999], [BURMEISTER, 2003], [TOS, 2003].
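
For orientation, the central notion of FCA can be sketched in a few lines of Python: given a formal context (objects and their attributes), compute all formal concepts, i.e., pairs (extent, intent) in which the extent is exactly the set of objects sharing the intent and vice versa. The example context and the function formal_concepts are assumptions made for illustration; the sketch uses a naive closure under intersection and is independent of the Toscana system.

```python
def formal_concepts(context):
    """Compute all formal concepts of a small context.

    context: dict object -> set of attributes.
    Returns a list of (extent, intent) pairs of frozensets.
    """
    all_attrs = frozenset().union(*context.values())
    # Every intent is an intersection of object intents; the empty
    # intersection is the full attribute set.
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & i for i in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Sort from the smallest extent to the largest for readable output.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


# A hypothetical toy context.
context = {
    "frog": {"aquatic", "terrestrial"},
    "fish": {"aquatic"},
    "dog": {"terrestrial"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

Ordering these concepts by inclusion of their extents gives the concept lattice, which is what tools such as Toscana display as line diagrams.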

2.3 (T+P) Benchmarking Ontology Reasoners

The goal of this project is to experiment with and compare a number of reasoning engines for ontologies, e.g., Racer, FaCT, Jena, Pellet, or the Jess OWL reasoner. (Note that many of these can be used within Protégé.) References: [HORROCKS, 1999], [HAARSLEV & MÖLLER, ], [PROGRAMME, ], [KOPENA & REGLI, ], [MINDSWAP, ], [PRO, 2003].

3 Scientific Workflows

Most of these projects are conducted using (and possibly extending) the Kepler scientific workflow system [KEP, 2004]. Since Kepler extends the Ptolemy II system (PTII) [PTO, 2004], it is a good idea to install PTII first, run a number of example models (called workflows in Kepler), and then learn how to create your own models/workflows using the graphical user interface (Vergil) [BROOKS et al., 2004]. Then you can start to tackle Kepler. Note that there are Kepler mailing lists and IRC channels on which Kepler developers exchange valuable information.

[DOUGLAS THAIN & LIVNY, 2004] Douglas Thain, Todd Tannenbaum, & Miron Livny. Distributed Computing in Practice: The Condor Experience. Concurrency and Computation: Practice and Experience, 2004. http://www.cs.wisc.edu/condor/doc/condor-practice.pdf.

[DUSCHKA et al., 2000] Oliver M. Duschka, Michael R. Genesereth, & Alon Y. Levy. Recursive Query Plans for Data Integration. Journal of Logic Programming, 43(1):49–73, 2000.

[GANTER & WILLE, 1999] Bernhard Ganter & Rudolf Wille. Formal Concept Analysis – Mathematical Foundations. Springer, 1999. http://www.springer.de/cgi-bin/search_book.pl?isbn=3-540-62771-5.

[GOBLE et al., 2001] C. Goble, R. Stevens, G. Ng, S. Bechhofer, N. Paton, P. Baker, M. Peim, & A. Brass. Transparent Access to Multiple Bioinformatics Information Sources. IBM Systems Journal, 40(2):534–551, 2001. http://www.research.ibm.com/journal/sj/402/goble.pdf.

[GRI, 2004] GridFTP, 2004. http://www.globus.org/datagrid/gridftp.html.

[HAARSLEV & MÖLLER, ] Volker Haarslev & Ralf Möller. Racer: Renamed ABox and Concept Expression Reasoner. http://www.sts.tu-harburg.de/~r.f.moeller/racer/.

[HALEVY, 2001] Alon Halevy. Answering Queries Using Views: A Survey. VLDB Journal, 10(4):270–294, 2001. http://www.cs.washington.edu/homes/alon/site/files/view-survey.ps.

[HORROCKS, 1999] Ian Horrocks. The FaCT System, 1999. http://www.cs.man.ac.uk/~horrocks/FaCT/.

[KARP, 1999] Peter D. Karp. EcoCyc: The Resource and the Lessons Learned. In Bioinformatics Databases and Systems. Kluwer, 1999. http://www.ai.sri.com/pkarp/pubs/ecocyc-lessons.ps.

[KARP, 2000] Peter D. Karp. An Ontology for Biological Function Based on Molecular Interactions. Bioinformatics, 16(3):269–285, 2000. http://www.ai.sri.com/pubs/files/887.ps.

[KARP, 2001] Peter D. Karp. Pathway Databases: A Case Study in Computational Symbolic Theories. Science, 293:2040–2044, 2001. http://www.ai.sri.com/pubs/full.php?id=880.

[KEP, 2004] KEPLER: A System for Scientific Workflows, 2004. http://kepler-project.org.

[KOCH, 2001] Christoph Koch. Data Integration against Multiple Evolving Autonomous Schemata. PhD thesis, Technische Universität Wien, Austria, 2001. http://www.dbai.tuwien.ac.at/staff/koch/download/thesis_20010516_1500_final.pdf.

[KOPENA & REGLI, ] Joseph Kopena & William Regli. OWLJessKB: A Semantic Web Reasoning Tool. http://edge.cs.drexel.edu/assemblies/software/owljesskb/.

[LEE & PARKS, 1995] Edward A. Lee & Thomas Parks. Dataflow Process Networks. Proceedings of the IEEE, 83(5):773–799, May 1995. http://citeseer.nj.nec.com/455847.html.

[LENZERINI, 2002] M. Lenzerini. Data Integration: A Theoretical Perspective. Tutorial at the ACM Symposium on Principles of Database Systems (PODS), 2002. http://www.sigmod.org/pods/tut/le.pdf.

[MINDSWAP, ] Mindswap. Pellet OWL Reasoner. http://www.mindswap.org/2003/pellet/index.shtml.

[NASH & LUDÄSCHER, 2004] Alan Nash & Bertram Ludäscher. Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns. In Intl. Conference on Extending Database Technology (EDBT), Heraklion, Crete, Greece, 2004.

[PEIM et al., 2002] Martin Peim, Enrico Franconi, Norman W. Paton, & Carole A. Goble. Query Processing with Description Logic Ontologies Over Object-Wrapped Databases. In 14th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Edinburgh, Scotland, 2002. http://citeseer.nj.nec.com/peim01query.html.

[PRO, 2003] Protégé 2000 Project Page, 2003. http://protege.stanford.edu/.

[PROGRAMME, ] HP Labs Semantic Web Programme. Jena – A Semantic Web Framework for Java. http://jena.sourceforge.net/.

[PTO, 2004] PTOLEMY II project and system. Department of EECS, UC Berkeley, 2004. http://ptolemy.eecs.berkeley.edu/ptolemyII/.

[R, 2004] R – Statistical Data Analysis, 2004. http://www.r-project.org.

[SIG, 2004] SIGMOD Record, Special Section on Semantic Integration, December 2004. http://www.sigmod.org/sigmod/record/issues/0412/.

[SRB, 2004] SDSC Storage Resource Broker, 2004. http://www.npaci.edu/DICE/SRB/.

[TOS, 2003] ToscanaJ SourceForge Project, 2003. http://toscanaj.sourceforge.net/.

[TZITZIKAS et al., 2002] Yannis Tzitzikas, Nicolas Spyratos, & Panos Constantopoulos. Translation for Mediators over Ontology-based Information Sources. In Second Hellenic Conf. on Artificial Intelligence (SETN), 2002. citeseer.nj.nec.com/tzitzikas02query.html.

[WACHE et al., 2001] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, & S. Hübner. Ontology-Based Integration of Information – A Survey of Existing Approaches. In Proc. of the IJCAI-01 Workshop: Ontologies and Information Sharing, 2001. http://www.cs.vu.nl/~heiner/public/ois-2001.pdf.