Poster – Paper 632
Abstract
In text-based semantic analysis, the task of named entity linking (NEL) establishes the fundamental link between unstructured data elements and knowledge
base entities. The increasing number of applications complementing web data
via knowledge base entities has led to a rich toolset of NEL frameworks [4,7]. To
resolve linguistic ambiguities, NEL relates available context information via statistical analysis, e.g. term co-occurrences in large text corpora, or graph analysis, e.g. connected component analysis on the contextually induced knowledge
subgraph. The semantic document annotation achieved via NEL algorithms can
furthermore be complemented, upgraded, or even substituted via manual annotation, e.g. as in [5]. For this manual annotation task, a popular approach suggests
a set of potential entity candidates that fit the text fragment selected by the
user, who then decides on the correct entity for the annotation. The high degree
of natural language ambiguity produces huge sets of entity candidates that must be scanned and evaluated. To speed up this process and to enhance
its usability, we propose a pre-ordering of the entity candidate set for a predefined context. The complex process of NEL context analysis is often too
time-consuming to be applied in an online environment. Thus, we propose to speed
up the context computation via approximation based on the offline generation
of context weight vectors. For each entity, a context vector is computed beforehand
and is applied like a hash for quickly computing the most likely entity
candidates with respect to a given context. In this paper, the process of entity
hashing via context weight vectors is introduced. The approach is evaluated
on the test case of SciHi, a web blog on the history of
science providing blog posts semantically annotated with DBpedia entities.
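
The idea of precomputing context weight vectors offline and using them at annotation time to pre-order candidates can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration only: the context categories, the annotation counts, the normalization into weights, and the function names `build_context_vector` and `rank_candidates` are hypothetical and are not taken from the paper, which does not specify how the vectors are built.

```python
from typing import Dict, List, Tuple

# Hypothetical context categories (assumption; the abstract does not list any).
CONTEXTS: List[str] = ["astronomy", "chemistry", "music"]


def build_context_vector(counts: Dict[str, int]) -> List[float]:
    """Offline step: turn per-context occurrence counts into normalized weights."""
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CONTEXTS)
    return [counts.get(c, 0) / total for c in CONTEXTS]


# Made-up counts of how often each entity was observed in each context.
RAW_COUNTS: Dict[str, Dict[str, int]] = {
    "dbr:Mercury_(planet)": {"astronomy": 9, "chemistry": 1},
    "dbr:Mercury_(element)": {"astronomy": 1, "chemistry": 8, "music": 1},
    "dbr:Freddie_Mercury": {"music": 5},
}

# Computed once, before any user interaction, and then only looked up.
CONTEXT_VECTORS: Dict[str, List[float]] = {
    entity: build_context_vector(counts) for entity, counts in RAW_COUNTS.items()
}


def rank_candidates(candidates: List[str], context: str) -> List[Tuple[str, float]]:
    """Online step: pre-order candidates by their stored weight for one context."""
    idx = CONTEXTS.index(context)
    zero = [0.0] * len(CONTEXTS)  # fallback for entities without a stored vector
    scored = [(e, CONTEXT_VECTORS.get(e, zero)[idx]) for e in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for entity, weight in rank_candidates(list(RAW_COUNTS), "chemistry"):
        print(f"{weight:.2f}  {entity}")
```

In this sketch the online step is a dictionary lookup plus a sort over the candidate set, so no corpus or graph analysis happens while the user waits. This mirrors the hash-like use of the vectors described above, under the stated assumptions.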