Technology Stack to Achieve

Motivation: Defining a Technology Strategy to Support Complex Product Development

 

Thesis Statement: Present the technology selection criteria at the data movement, aggregation, analysis and reporting layers

Executive Summary

Technology has developed rapidly over the past few years, which gives organizations willing to take a fresh look at their knowledge management strategies a great opportunity to accelerate.  Nothing is free, of course.  However, understanding the new rules of the game can help one take advantage of these technologies in a scalable, heavily regulated environment, and inform the business in an agile way that serves both information technologists and business subject matter experts (i.e., scientists and engineers).  The high level user requirements include [1]:

High Level User Requirements

  1. System must be simple, reliable and convenient
  2. System must be platformed in such a manner that feature, function and design changes can be made in an agile way at low cost (Plug and Play Components)
  3. System must hit a low price point

Some may say that these user requirements are too high level, but we must put them forth to ensure that the design around functional requirements is bounded by a long term strategy for cost, quality and agility.  Drilling a little deeper into these user requirements from a scientific and engineering perspective exposes some key architectural requirements, nicely defined by a broad spectrum of users under the leadership of "Extremely Large Database" and published in this article (ref); see also the SciDB work.

Architectural Requirements

Volume of data

Assume petabyte scale, since it matters for scalability and for flexibility in data structure and format

Dominance of graph and array based data

Primary examples are genealogy (graph) and temporal data (array/column)

Complex Analytics

Diverse tools used to analyze common data sets

Open source tools

Acknowledging an open collaborative effort up front that can help leverage concepts and ideas across disciplines

No overwrite

Everyone wants to keep their data in raw data format for long term storage and reprocessing into new models

Provenance

Ability to trace data back to its source.  If the source was incorrect (whether corrected or not), one needs to find all data in which this information was used to drive decisions.

Uncertainty

Data needs to be stored with the uncertainty of the measure

Version control

Like software versioning (e.g., GitHub), the ability to trace back to the raw data and models used, as well as the variations used within, would be quite powerful
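The no overwrite, provenance, uncertainty and version control requirements above can be illustrated together. What follows is only a minimal sketch, assuming an in-memory store; the class names (`Record`, `AppendOnlyStore`), the field names and the sample keys are all hypothetical and are not taken from SciDB or any other product. It shows a correction being stored as a new version, each value carrying its uncertainty, and a lookup of everything derived from a source that turned out to be wrong.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One immutable stored value (all field names are illustrative)."""
    key: str              # logical name, e.g. "sensor/pH"
    version: int          # 1, 2, 3, ... assigned by the store
    value: float
    uncertainty: float    # uncertainty of the measure, same units as value
    sources: tuple = ()   # (key, version) pairs this record was derived from


class AppendOnlyStore:
    """Keeps every version of every record; nothing is overwritten."""

    def __init__(self):
        self._versions = {}   # key -> list of Record, oldest first
        self._used_by = {}    # (key, version) -> list of (key, version)

    def put(self, key, value, uncertainty, sources=()):
        history = self._versions.setdefault(key, [])
        rec = Record(key, len(history) + 1, value, uncertainty, tuple(sources))
        history.append(rec)   # a correction is a new version, not an edit
        for src in rec.sources:
            self._used_by.setdefault(src, []).append((key, rec.version))
        return rec

    def get(self, key, version=None):
        """Latest version by default, or a specific earlier version."""
        history = self._versions[key]
        return history[-1] if version is None else history[version - 1]

    def downstream(self, key, version):
        """Every record derived, directly or indirectly, from (key, version)."""
        seen, queue = [], deque([(key, version)])
        while queue:
            for child in self._used_by.get(queue.popleft(), []):
                if child not in seen:
                    seen.append(child)
                    queue.append(child)
        return seen


store = AppendOnlyStore()
raw = store.put("sensor/pH", 7.10, 0.05)
avg = store.put("batch/mean_pH", 7.08, 0.03, sources=[(raw.key, raw.version)])
rpt = store.put("report/release", 1.0, 0.0, sources=[(avg.key, avg.version)])

# The sensor reading turns out to be wrong: store a corrected version.
fixed = store.put("sensor/pH", 6.90, 0.05)

# Which stored results were built on the incorrect first version?
affected = store.downstream("sensor/pH", 1)
```

Here `affected` lists the batch mean and the release report, both of which still point at version 1 of the sensor reading, while the original reading remains available for reprocessing. A real system would persist this history rather than hold it in memory; the sketch only shows the shape of the bookkeeping.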

Functional requirements for a system that helps in the design and ultimate definition of manufacturing processes generally require one to relate material and process parameters to final product attributes.  As such, we put forth the following priorities matrix:

Functional Requirements

Priority | Function | Strategy
1 | Genealogy of materials with associated array data | Graph database that inherently represents material genealogy and associative array based parameters (temporal)
2 | Material attributes measured throughout the process (scalar values) | Load data to the graph database, linked to materials
3 | Continuous data collected throughout the process | Load data into a highly flexible, multidimensional structure to standardize the array based formats (HDF5), or into column store databases
4 | Data block generation (short term); moving calculations to data (long term) | Leverage a functional programming language that can interrogate the graph and HDF5 structures.  The short term approach isolates data aggregation from analysis, enabling one to learn each independently and move towards the long term approach as those learnings mature
5 | Model building to link CQAs to upstream process trajectories and attributes | Multiple modeling techniques applied to the data blocks to provide insights into the data and the processes behind them
6 | Web based reporting of scientific data | Leverage Drupal as an interface to present modeling results and associated data blocks to scientists
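Priorities 1 through 4 can be sketched end to end in a few lines. This is an illustration under stated assumptions, not a proposed implementation: plain dictionaries stand in for the graph database, plain lists stand in for arrays held in HDF5 or a column store, and every lot name, attribute name and function name (`ancestors`, `data_block`) is hypothetical.

```python
from statistics import mean

# Priority 1: genealogy edges, parent material -> child materials.
children = {
    "seed-01": ["bulk-01"],
    "buffer-07": ["bulk-01"],
    "bulk-01": ["final-01"],
}

# Priority 2: scalar attributes measured on each material.
attributes = {
    "seed-01": {"viability_pct": 96.0},
    "bulk-01": {"titer_g_per_l": 3.2},
    "final-01": {"purity_pct": 99.1},
}

# Priority 3: continuous data per material (lists stand in for arrays).
traces = {
    "bulk-01": {"temperature_c": [36.8, 37.0, 37.1, 36.9]},
}


def ancestors(material):
    """All upstream materials of `material`, in discovery order."""
    parents = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    found, stack = [], [material]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def data_block(material):
    """Priority 4: flatten one material and its genealogy into a single row."""
    row = {"material": material}
    for node in [material] + ancestors(material):
        for name, value in attributes.get(node, {}).items():
            row[f"{node}.{name}"] = value
        for name, series in traces.get(node, {}).items():
            # Summarize each trajectory so it fits in a flat row.
            row[f"{node}.{name}.mean"] = mean(series)
            row[f"{node}.{name}.max"] = max(series)
    return row


block = data_block("final-01")
```

The resulting `block` is one flat row for the final lot that carries its own attribute plus the attributes and trajectory summaries of every upstream material. Rows of this kind, one per final lot, are the sort of input the modeling step in priority 5 could consume, which is what keeps aggregation separate from analysis in the short term.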

 

 

Points to Deliver
Data Aggregation | Graph and column based databases
Data Analysis | Open source platforms: common data blocks to answer key problems driven by the science/engineering requirements
Data Reporting 

 

Conclusions

Restate Thesis 

 

1: The Innovator's Dilemma, pages 245-246.