Mattmann Journal of Big Data 2014, 1:6

http://www.journalofbigdata.com/content/1/1/6

SHORT REPORT Open Access

Cultivating a research agenda for data science

Chris A Mattmann1,2

Correspondence:

[email protected]

1Jet Propulsion Laboratory,

California Institute of Technology,

4800 Oak Grove Drive M/S 171-264,

91109 Pasadena, USA

2Computer Science Department,

University of Southern California,

941 W. 37th Place, 90089 Los

Angeles, USA

Abstract

I describe a research agenda for data science based on a decade of research and

operational work in data-intensive systems at NASA, the University of Southern

California, and in the context of open source work at the Apache Software Foundation.

My vision is predicated on understanding the architecture for grid computing; on

flexible and automated approaches for selecting data movement technologies and on

their use in data systems; on the recent emergence of cloud computing for processing

and storage; and on the unobtrusive and automated integration of scientific

algorithms into data systems. Advancements in each of these areas are a core need,

and they will fundamentally improve our understanding of data science and big data.

This paper identifies and highlights my own personal experience and opinion growing

into a data scientist.

Keywords: Data dissemination; Open source; Science algorithm integration; Data

science; Software architecture; Big data

Findings

Over the last decade I have been primarily engaged in research associated with NASA’s

Jet Propulsion Laboratory (JPL), the University of Southern California and the Apache

Software Foundation. The research has explored the fundamentally changing paradigm

of data-intensive systems and its emerging frontier of Big Data and Data Science, and

how software architecture and software reuse can assist in bridging the boundary in

science from a previously siloed and independent nature to one that is increasingly

collaborative and multi-disciplinary. This research has been applied in the development

and delivery of ground data systems software for a number of national scale projects

including the next generation of NASA’s Earth science missions (OCO/OCO-2, NPP

Sounder PEATE, SMAP, etc.); the National Cancer Institute’s Early Detection Research

Network (EDRN); NSF-funded activities in geosciences and radio astronomy; and, most

recently, DARPA's Big Data initiative called XDATA.

This paper identifies and highlights my own personal experience growing into a data

scientist. I begin by describing a nexus of training in grid computing and software

architecture inspired through work on the Apache Object Oriented Data Technology (OODT)

project. A need to compare OODT with similar grid technologies led to precise and

specific architectural analyses of grid software and an effort to more completely describe the

architecture of grid computing based on a study of nearly twenty topical grid technologies

over the last decade. This analysis identified gaps in the grid computing realm,

specifically in the areas of data management, and in cataloging and archiving. Grid computing

Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction

in any medium, provided the original work is properly credited.
