Research – Paper 237
Abstract
In many applications, there is an increasing need for new types of RDF data analysis that are not covered by standard reasoning tasks such as SPARQL query answering. One such important analysis task is entity comparison, i.e., determining the similarities and differences between two given entities in an RDF graph. For instance, in an RDF graph about drugs, we may want to compare Metamizole and Ibuprofen and automatically find out that they are similar in that they are both analgesics but, in contrast to Metamizole, Ibuprofen also has a considerable anti-inflammatory effect. Entity comparison is a widely used functionality available in many information systems, such as university or product comparison websites. However, comparison is typically domain-specific and depends on a fixed set of aspects to compare. In this paper, we propose a formal framework for domain-independent entity comparison over RDF graphs. We model similarities and differences between entities as SPARQL queries satisfying certain additional properties, and propose algorithms for computing them.
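To make the idea of similarities and differences as queries concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a toy graph of plain triples, ignores blank nodes, and restricts queries to depth-one patterns of the form `?x :p :o`. The graph contents, the property name `hasEffect`, the `http://example.org/` prefix, and all function names are hypothetical and chosen only for illustration.

```python
# Toy RDF graph as a set of (subject, predicate, object) triples.
# The data and all names are illustrative assumptions, not from the paper.
GRAPH = {
    ("Metamizole", "hasEffect", "analgesic"),
    ("Ibuprofen", "hasEffect", "analgesic"),
    ("Ibuprofen", "hasEffect", "antiInflammatory"),
    ("Aspirin", "hasEffect", "analgesic"),
}


def description(graph, entity):
    """Return the (predicate, object) pairs asserted for an entity."""
    return {(p, o) for (s, p, o) in graph if s == entity}


def similarity(graph, a, b):
    """Depth-one patterns satisfied by both a and b."""
    return description(graph, a) & description(graph, b)


def difference(graph, a, b):
    """Depth-one patterns satisfied by a but not by b."""
    return description(graph, a) - description(graph, b)


def answers(graph, patterns):
    """Subjects that satisfy every (predicate, object) pattern."""
    subjects = {s for (s, _, _) in graph}
    return {s for s in subjects if patterns <= description(graph, s)}


def to_sparql(patterns):
    """Render a pattern set as a SPARQL SELECT query over ?x."""
    body = " . ".join(f"?x :{p} :{o}" for p, o in sorted(patterns))
    return f"PREFIX : <http://example.org/> SELECT ?x WHERE {{ {body} }}"


if __name__ == "__main__":
    sim = similarity(GRAPH, "Metamizole", "Ibuprofen")
    diff = difference(GRAPH, "Ibuprofen", "Metamizole")
    print(to_sparql(sim))
    print(to_sparql(diff))
```

In this simplified setting, the similarity query returns both compared entities (and possibly others, such as Aspirin here), while the difference query returns Ibuprofen but not Metamizole.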
Though I find the approach quite well defined and potentially useful, I worry about its scalability. How well would it work to find interesting commonalities/differences in a pool of millions of entities described using a model containing tens of thousands of properties?
Hi Héctor, thanks much for the comment! Indeed, we are currently working on scalable algorithms for both (most specific) similarities and (most general) differences. 1) Although the complexity of finding a difference query is quite high, it stems from the presence of blank nodes, so in a real-world scenario we would never hit the worst case. 2) In addition, in a reasonable scenario the size/depth of the query is bounded by some small value (for the sake of readability), in which case similarity and difference computation becomes scalable.
Very interesting to compare to this approach: https://link.springer.com/chapter/10.1007/978-3-319-60438-1_61
Thanks much for the reference, Artem!
I see a potential application in (traditional) instance matching, where one of the tasks is to find equivalent entities.
Hi Ernesto, thank you for the suggestion! Indeed, the framework could be used for equivalent and near-equivalent instance matching and discovery.