The Architecture of SciDB

The original paper appears in "Scientific and Statistical Database Management, 23rd International Conference, SSDBM 2011, Portland, OR, USA, July 20-22, 2011, Proceedings" (editors: Judith Bayard Cushing, James French, Shawn Bowers; ISBN: 978-3-642-22350-1).

SciDB is a strong push to create a tool that lets scientists and engineers store and analyze their data.  The paper outlines the architectural requirements nicely, along with functional and design considerations.  The lists are provided here:

Architectural Requirements:

  1. Volume of data - Multi-petabyte, which is important for the choice of data structure and format
  2. Dominance of array-based data - Primary examples are geospatial and temporal data
  3. Complex Analytics - Diverse tools used to analyze common data sets
  4. Open source tools - Too many stories of vendors caught in a pickle between clients who force them not to change, which can alienate other clients and/or jeopardize the very mission of the project
  5. No overwrite - Everyone wants to keep their data in its raw format for long-term storage and for reprocessing into new models
  6. Provenance - Ability to trace data back to its source.  If the source turns out to be incorrect (whether or not it was later corrected), one needs to find everywhere that data was used to drive decisions.
  7. Uncertainty - Data needs to be stored with the uncertainty of the measurement
  8. Version control - Like software versioning (e.g., GitHub), the ability to trace back to the raw data and models used, as well as the variations used within them, would be quite powerful
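
To make requirements 5 through 8 concrete, here is a minimal sketch in Python of an array whose updates append a new version instead of overwriting the old value, with each cell carrying its uncertainty and a provenance tag. This is not SciDB's API or storage format; the names `Cell`, `NoOverwriteArray`, `cells_from`, and the sensor labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One measured value plus its uncertainty and origin (hypothetical layout)."""
    value: float
    sigma: float   # uncertainty of the measurement (requirement 7)
    source: str    # provenance tag (requirement 6)


class NoOverwriteArray:
    """Toy 2-D array: a write appends a new version, nothing is replaced."""

    def __init__(self) -> None:
        # (i, j) -> list of (version number, cell), oldest first
        self._history: Dict[Tuple[int, int], List[Tuple[int, Cell]]] = {}
        self._version = 0

    def write(self, i: int, j: int, cell: Cell) -> int:
        """Append a new version of cell (i, j) and return its version number."""
        self._version += 1
        self._history.setdefault((i, j), []).append((self._version, cell))
        return self._version

    def read(self, i: int, j: int, as_of: Optional[int] = None) -> Cell:
        """Return the newest cell at (i, j) whose version is <= as_of."""
        for version, cell in reversed(self._history.get((i, j), [])):
            if as_of is None or version <= as_of:
                return cell
        raise KeyError((i, j))

    def cells_from(self, source: str) -> List[Tuple[int, int]]:
        """Provenance query: coordinates that ever held a value from `source`."""
        return sorted(
            coord
            for coord, versions in self._history.items()
            if any(cell.source == source for _, cell in versions)
        )


arr = NoOverwriteArray()
v1 = arr.write(0, 0, Cell(20.1, 0.5, "sensor-A"))
v2 = arr.write(0, 0, Cell(20.4, 0.2, "sensor-A-recalibrated"))
v3 = arr.write(1, 2, Cell(18.7, 0.5, "sensor-A"))

latest = arr.read(0, 0)             # the recalibrated value
original = arr.read(0, 0, as_of=v1)  # the raw value is still available
affected = arr.cells_from("sensor-A")
```

Reading with `as_of` gives the version-control behavior of requirement 8, and `cells_from` is the kind of lookup requirement 6 asks for when a source is later found to be wrong.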

Functionality

  1. Ability to run on cloud/cluster computers for parallelism and concurrency
  2. Data model: Extensible: HDF5
  3. Query Language
  4. Extensibility - Computational models run within the DBMS rather than moving the data to the computation
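
Items 1 and 4 together amount to sending the computation to where the data lives. The following is a rough sketch of that idea in Python, assuming an array split into row chunks that would each sit on a different node; here the "nodes" are simulated with ordinary function calls. The helper names (`split_into_chunks`, `local_sum_count`, `distributed_mean`) are hypothetical and are not part of SciDB.

```python
from typing import List, Tuple

Chunk = List[List[float]]


def split_into_chunks(grid: List[List[float]], rows_per_chunk: int) -> List[Chunk]:
    """Partition a 2-D array into row blocks, one block per (simulated) node."""
    return [grid[r:r + rows_per_chunk] for r in range(0, len(grid), rows_per_chunk)]


def local_sum_count(chunk: Chunk) -> Tuple[float, int]:
    """Work done next to the data: reduce one chunk to a small partial result."""
    total = sum(sum(row) for row in chunk)
    count = sum(len(row) for row in chunk)
    return total, count


def distributed_mean(chunks: List[Chunk]) -> float:
    """Coordinator step: merge the small partial results, not the raw cells."""
    partials = [local_sum_count(chunk) for chunk in chunks]
    total = sum(t for t, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


grid = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
chunks = split_into_chunks(grid, rows_per_chunk=2)
mean = distributed_mean(chunks)
```

Only the `(total, count)` pairs cross the boundary between a chunk and the coordinator, which is the point of running the model inside the DBMS: at multi-petabyte scale the partial results are far smaller than the cells they summarize.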