A Metrics-Driven Approach for Quality Assessment of Linked Open Data
The main objective of the Web of Data paradigm is to crystallize knowledge by interlinking existing but dispersed data. The usefulness of the resulting knowledge depends strongly on the quality of the published data, and researchers have observed many quality deficiencies in Linked Open Data. The first step towards improving the quality of data released as part of the Linked Open Data Cloud is to develop tools for measuring that quality. To this end, this paper proposes and validates a set of metrics for evaluating the inherent quality characteristics of a dataset before it is released to the Linked Open Data Cloud. These inherent characteristics are semantic accuracy, syntactic accuracy, uniqueness, completeness and consistency. We follow the Goal-Question-Metric approach to propose several metrics for each of these five quality characteristics, and we provide both a theoretical validation and empirical observations of the behavior of the proposed metrics. The proposed set of metrics establishes a starting point for a systematic analysis of the inherent quality of open datasets.