Using Weak Supervision to Identify Long-Tail Entities for Knowledge Base Completion

Data from relational web tables can be used to augment cross-domain knowledge bases like DBpedia, Wikidata, or the Google Knowledge Graph with descriptions of formerly unknown long-tail entities. In previous work, we presented an approach that successfully assembles descriptions of long-tail entities from relational HTML tables using supervised matching methods and manually labeled, class-specific training data in the form of positive and negative entity matches. Manually labeling training data is laborious, given that knowledge bases cover many different classes. In this work, we investigate reducing the labeling effort for long-tail entity extraction by using weak supervision. We present a bootstrapping approach that requires as supervision only a small set of simple class-specific matching rules, thereby reducing the human supervision effort considerably. We evaluate this weakly supervised approach and find that it performs only slightly worse than methods that rely on strong supervision.