The software environment in the analytical community today is a patchwork of incompatible software and proprietary, non-standardized file formats, further complicated by incomplete, inconsistent and potentially inaccurate metadata. One cause of this situation is the inherent complexity of process descriptions.
Approach and IT-Solution
To overcome these issues, Allotrope Foundation is developing a comprehensive framework consisting of metadata dictionaries, domain taxonomies, data standards, and class libraries for managing analytical data throughout its life cycle. The Allotrope Data Format (ADF) brings together laboratory data and semantic metadata descriptions, and eases the management of the vast amount of data that underpins almost every aspect of drug discovery and development.
A key component of the framework developed by Allotrope Foundation is the semantic model, which consists of four domain taxonomies for equipment, materials, processes and results; these are aligned by a top-level ontology. The taxonomies reuse terms from a range of existing standards, including those of the International Union of Pure and Applied Chemistry (IUPAC), the Proteomics Standards Initiative (PSI) and the Chemical Methods Ontology (CHMO). A fine-grained semantic model for complex analytical processes is being developed that can express the complete analytical process chain: request, analysis planning, sample preparation, device settings, measurement and data acquisition, through to transformation of results, reporting and storage of the data. These detailed descriptions are necessary to track the accuracy of results and their conformance to process templates. The model currently covers more than 12 analytical techniques, such as mass spectrometry, high performance liquid chromatography and capillary electrophoresis.
Another important part of the framework is the API stack, which enables software developers to work with the data without in-depth knowledge of semantic technologies. Because the APIs encapsulate the semantic layer, developers do not have to concern themselves with RDF, SPARQL, semantics or complex graph patterns.
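The idea of an API that hides graph patterns can be illustrated with a minimal sketch. It is not the Allotrope API: the `TripleStore` and `HplcRun` classes, the `http://example.org/lab#` namespace and all property names are hypothetical, and a toy in-memory set of triples stands in for a real RDF store.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not an actual Allotrope IRI.
EX = "http://example.org/lab#"


@dataclass
class TripleStore:
    """Toy in-memory set of (subject, predicate, object) triples."""
    triples: set = field(default_factory=set)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def objects(self, s, p):
        # Return all objects matching the given subject and predicate.
        return [o for (s2, p2, o) in self.triples if s2 == s and p2 == p]


class HplcRun:
    """Hypothetical facade: callers use plain methods, not graph patterns."""

    def __init__(self, store, run_id):
        self.store = store
        self.iri = EX + run_id
        store.add(self.iri, EX + "type", EX + "HplcRun")

    def set_flow_rate(self, value, unit):
        # A quantity is modelled as its own node with a value and a unit.
        node = self.iri + "/flowRate"
        self.store.add(self.iri, EX + "hasFlowRate", node)
        self.store.add(node, EX + "numericValue", value)
        self.store.add(node, EX + "unit", unit)

    def flow_rate(self):
        # Walk the two-step graph pattern on the caller's behalf.
        node = self.store.objects(self.iri, EX + "hasFlowRate")[0]
        value = self.store.objects(node, EX + "numericValue")[0]
        unit = self.store.objects(node, EX + "unit")[0]
        return value, unit


store = TripleStore()
run = HplcRun(store, "run-001")
run.set_flow_rate(1.5, "mL/min")
print(run.flow_rate())
```

The calling code at the bottom never mentions subjects, predicates or queries; the facade translates each method call into the underlying triples, which is the kind of separation the API stack is meant to provide.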
Process descriptions are complex and require a very flexible representation that allows data to be expressed at different levels of granularity. Semantic technologies provide a good basis for creating a flexible data schema and taxonomies built upon existing standards such as RDF, OWL, SKOS, Dublin Core, QUDT, the Data Cube Vocabulary and others. By building upon this semantic technology stack, the model and the corresponding instance data can be more easily reused in other contexts, such as Linked Data or the Internet of Things.
The talk will describe the project from the perspective of the framework architect. The semantic model will be described in detail, along with a use case implementation for the High Performance Liquid Chromatography technique.
There have already been four releases of the Allotrope Data Format and eight releases of the Allotrope Foundation Taxonomies. The taxonomies are reviewed by subject matter experts from different member companies to ensure broad applicability of the framework. The framework and semantic model are currently being evaluated against different use cases. In September, Allotrope Data Format (ADF) version 1.0 will be provided to Allotrope Foundation Members and Allotrope Partner Network Members; the first public version will be available at the end of 2016.
Allotrope Foundation is an international, not-for-profit consortium that is developing an innovative, open Framework (including metadata dictionaries, data standards and class libraries) for managing analytical data throughout its life cycle. This effort is fully funded by the members of Allotrope Foundation and is progressing rapidly toward our common goals: reducing wasted effort, improving data integrity and realizing the full value of our analytical data.