DBpedia’s Databus: A strategic initiative to facilitate 1 Billion derived Knowledge Graphs

September 10, 2019 by Sebastian Hellmann

Sebastian Hellmann is a member of the Board of Trustees of the DBpedia Association. In this article, he gives insights into the DBpedia Databus platform.

The DBpedia Databus is a platform that captures the effort invested by data consumers who needed better data quality (fitness for use) in order to use the data, and that gives their improvements back to the data source and to other consumers. The Databus enables anybody to build automated DBpedia-style extraction, mapping and testing for any data they need. It incorporates features from DNS, Git, RSS, online forums and Maven to harness the full workpower of data consumers.

Vision

Professional consumers of data worldwide have already built stable cleaning and refinement chains for all available datasets, but their efforts are invisible and not reusable. Deep, cleaned data silos exist, trapped locally in pipelines, beyond the reach of publishers and other consumers. Data is not oil that flows out of inflexible pipelines. Databus breaks existing pipelines into individual components that together form a decentralized but centrally coordinated data network in which data can flow back to previous components or the original sources, or end up being consumed by external components.

The Databus provides a platform for re-publishing these cleaned files with very little effort (leaving file traffic as the only cost factor) while offering the full benefits of built-in system features such as automated publication, structured querying, automatic ingestion, pluggable automated analysis, data testing via continuous integration, and automated application deployment (software with data). The impact is highly synergistic: just a few thousand professional consumers and research projects can expose millions of cleaned datasets, which are on par with what has long existed in deep silos and pipelines.

One Billion interconnected, quality-controlled Knowledge Graphs by 2025

As we are inverting the paradigm from a publisher-centric view to a data consumer network, we will open the download valve to enable discovery of and access to massive amounts of data that is cleaner than what the original source published. The main DBpedia Knowledge Graph (cleaned data from Wikipedia in all languages and Wikidata) alone has 600k file downloads per year, complemented by downloads at over 20 chapters, e.g. http://es.dbpedia.org, as well as over 8 million daily hits on the main Virtuoso endpoint. Community extensions from the alpha phase, such as DBkWik and LinkedHypernyms, are being loaded onto the bus and consolidated, and we expect this number to reach over 100 by the end of the year. Companies and organisations who have previously uploaded their backlinks here will be able to migrate to the Databus. Other datasets are cleaned and posted. In two of our research projects, LOD-GEOSS and PLASS, we will re-publish open datasets, clean them and create collections, which will result in DBpedia-style knowledge graphs for energy systems and supply-chain management.

The full document about Databus is available at: https://databus.dbpedia.org/dbpedia/publication/strategy/2019.09.09/strategy_databus_initiative.pdf
