Knowledge graphs have now been “officially” declared to be on the rise by Gartner’s 2018 Hype Cycle for Artificial Intelligence. As we take pride in SEMANTiCS being one of the platforms helping to “pull knowledge graphs to the surface” (as Gartner puts it), we take this opportunity to share a contribution recently published via LinkedIn Pulse by SEMANTiCS Conference co-founder Andreas Blumauer. The article provides a general overview of the context knowledge graphs are embedded in, an introduction to starting a knowledge graph yourself, as well as key learnings that can be extracted from Google’s Knowledge Vault.
by Andreas Blumauer
Those who stay on their islands will fall back. This statement holds on nearly any level of our increasingly complex society, ranging from whole cultural areas down to single individuals. The drawbacks of isolated departmental thinking become even more obvious when looking at the competencies and skill sets that are becoming more and more relevant in our data- and knowledge-driven working environment: they should be interdisciplinary and multilingual, and should be based on a systemic / integrative attitude and an agile mindset.
The same holds for our data itself: it is kept in silos, and therefore it is a highly time-consuming task for every knowledge worker to identify the right dots and information pieces, to connect them, to make sense of them, and finally to communicate and interpret them the right way. A shift towards data-centric execution instead of document-based communication can be seen in many industries.
Knowledge Graphs (KG) have become increasingly important to support decision and process augmentation based on linked data. In this article we explore how enterprises can develop their own KGs along the whole Linked Data Life Cycle.
Some vendors of machine learning and AI technologies have raised great hopes that the problem of disconnected and low-quality data can be fixed automatically. But it is simply not true that machines can learn from any kind of data, especially from unstructured information, at a level that could substitute for subject matter experts. The truth is: algorithms like deep learning only work well when a lot of data of the same kind is available (more than even large corporations usually have), and even then, only rather simple cognitive processes like ‘classification’ can be automated.
AI technologies are currently focused on solutions that automate processes. As a result, other types of AI applications are often overlooked: decision and process augmentation, i.e. systems that support knowledge workers by connecting some, but not all, of the dots automatically. Applications like these are increasingly based on graph technologies, as they can map and support complex knowledge domains and their heterogeneous data structures in a more agile manner. In mid-2018, Gartner identified Knowledge Graphs as a new key technology in its Hype Cycle for Artificial Intelligence and in its Hype Cycle for Emerging Technologies.
It’s all about things, not strings: a Knowledge Graph represents a knowledge domain. It connects things of different types in a systematic way and encodes knowledge as a network of nodes and links rather than tables of rows and columns. As a result, both people and machines can benefit from a dynamically growing semantic network of facts about things and can use it for data integration, knowledge discovery, and in-depth analyses.
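To make the ‘things, not strings’ idea more tangible, here is a minimal sketch, assuming the open-source Python library rdflib; the ex: namespace and the individual facts are invented purely for illustration:

```python
# Minimal knowledge graph sketch using rdflib (assumption: rdflib is installed).
# Facts are stored as a network of nodes and typed links (triples),
# not as rows and columns.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")  # hypothetical namespace for this example

g = Graph()
g.bind("ex", EX)

# Things (nodes) of different types ...
g.add((EX.AndreasBlumauer, RDF.type, EX.Person))
g.add((EX.SemanticWebCompany, RDF.type, EX.Organization))
g.add((EX.PoolParty, RDF.type, EX.SoftwareProduct))
g.add((EX.AndreasBlumauer, RDFS.label, Literal("Andreas Blumauer")))

# ... connected in a systematic way by typed links
g.add((EX.AndreasBlumauer, EX.worksFor, EX.SemanticWebCompany))
g.add((EX.SemanticWebCompany, EX.develops, EX.PoolParty))

# Machines can traverse the same structure, e.g. with a SPARQL query:
query = "SELECT ?product WHERE { ?person ex:worksFor ?org . ?org ex:develops ?product }"
for row in g.query(query, initNs={"ex": EX}):
    print(row.product)  # -> http://example.org/PoolParty
```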
Graphs are all around: Facebook, Microsoft, and Google all operate their own graphs as part of their infrastructure. In May 2012, Google introduced its own version and interpretation of a Knowledge Graph, and since then the notion of a ‘knowledge graph’ has become more and more popular, albeit strongly linked to the Silicon Valley company. On the surface, information from the Knowledge Graph is used to augment search results and to enhance Google’s AI when answering direct spoken questions in Google Assistant and Google Home voice queries. Behind the scenes and in return, Google uses its KG to improve its machine learning.
But Google’s KG (GKG) has a big disadvantage: users and software agents can interact with it only in limited ways, and its API returns only individual matching entities rather than graphs of interconnected entities. Even Google itself recommends using data dumps from Wikidata instead if you need the latter. But Wikidata is only one of currently more than 1,200 sources that are highly structured, interconnected, and in most cases available for download and reuse as standards-based knowledge bases. This ‘graph of graphs’ is also known as the ‘Semantic Web’. One might argue that this data still cannot easily be integrated into enterprise information systems due to a lack of quality control, missing license information, etc.
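As a small illustration of what ‘graphs of interconnected entities’ means in practice, here is a minimal sketch that queries the public Wikidata Query Service (https://query.wikidata.org/sparql) with Python’s requests library; the Wikidata identifiers used (P31 ‘instance of’, Q5 ‘human’, P108 ‘employer’, Q95 ‘Google’) are real, but the query itself is just an example, not an official recipe:

```python
# Sketch: retrieve interconnected entities from Wikidata via SPARQL
# (assumption: the 'requests' package is installed and the public
# endpoint https://query.wikidata.org/sparql is reachable).
import requests

query = """
SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;     # instance of: human
          wdt:P108 wd:Q95 .   # employer: Google (wd:Q95)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "kg-demo/0.1 (example script)"},
)
for binding in response.json()["results"]["bindings"]:
    print(binding["person"]["value"], "-", binding["personLabel"]["value"])
```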
Nevertheless, there are several reasons why organizations should explore this data management approach, to find out whether a ‘corporate semantic web’ reflecting their own specific knowledge domains would make sense or not.
Would you like to dig deeper? Register NOW for SEMANTiCS!
As we all know, many roads lead to Rome. Some of them are more exhaustive but more solid and sustainable, some are less explored but in the end more efficient, and in many cases the best way can only be found when already on the move.
In any case, the most frequently used approaches to developing knowledge graphs are the following: they can be curated by experts like Cyc, edited by the crowd like Wikidata, extracted from large-scale, semi-structured knowledge bases such as Wikipedia (as with DBpedia and YAGO), or created by information extraction methods applied to unstructured or semi-structured sources, which leads to knowledge graphs like Google’s Knowledge Vault.
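To give a flavour of the extraction-based approaches mentioned above, the following hypothetical sketch maps semi-structured key/value data (think of a Wikipedia infobox, the raw material DBpedia works with) onto graph triples; the field names and the mapping table are illustrative assumptions, not DBpedia’s actual mapping rules:

```python
# Hypothetical sketch: turning semi-structured, infobox-style data into triples.
infobox = {
    "name": "Vienna",
    "country": "Austria",
    "population": "1897000",
}

# Hand-curated mapping from infobox keys to graph properties (illustrative only)
PROPERTY_MAPPING = {
    "country": "http://example.org/locatedIn",
    "population": "http://example.org/population",
}

def infobox_to_triples(subject_uri, infobox, mapping):
    """Map key/value pairs onto (subject, predicate, object) triples."""
    return [
        (subject_uri, mapping[key], value)
        for key, value in infobox.items()
        if key in mapping
    ]

for triple in infobox_to_triples("http://example.org/Vienna", infobox, PROPERTY_MAPPING):
    print(triple)
```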
The last of these approaches, fully automatic extraction, sounds most promising, since it scales best, not only during the initial creation phase but also for continuous extension and improvement. The most fundamental problem with automatic knowledge extraction, however, is that it cannot distinguish an unreliable source from an unreliable extractor. When learning from Google’s Knowledge Vault, we can assume the following:
A systematic view on how Knowledge Graphs can be created, maintained, extended, and used along the whole Linked Data Life Cycle has recently been provided by the Semantic Web Company in the course of the latest release of its PoolParty Semantic Suite.
Gartner states in its Hype Cycle for Artificial Intelligence, 2018: “The rising role of content and context for delivering insights with AI technologies, as well as recent knowledge graph offerings for AI applications have pulled knowledge graphs to the surface.”
Join Andreas’ Workshop “The Fast Track to Knowledge Engineering” at SEMANTiCS Conference: Register now! Limited number of available seats.
About SEMANTiCS
The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits, and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers, and researchers from organisations ranging from NPOs, universities, and public administrations to the largest companies in the world. http://www.semantics.cc
About the author
Andreas Blumauer holds a Master's degree in business studies and information technologies from the University of Vienna. For the last 14 years, he has been CEO of the Semantic Web Company (SWC), where he is responsible for business development and strategic product management of the PoolParty Semantic Suite.
Andreas has been a pioneer in the area of the Semantic Web since 2001. He is the editor of the first comprehensive book in the area of the Semantic Web and an experienced consultant in the areas of information management, metadata management, linked data, search, analytics, machine learning, and semantic technologies.
He has been working on the specification and optimisation of the PoolParty technology platform, a semantic middleware for developing applications on top of linked data, machine learning, and semantic web standards.
He is a regular speaker at international conferences with topics ranging from artificial intelligence to knowledge management.