Dimensions of Data Quality{1}

The author opens by introducing the idea of “dimensions” of data quality, such as accuracy, consistency, and timeliness, and asks whether these “dimensions” actually exist as intelligible concepts. He believes a strong case can be made that we are not thinking as clearly as we could in this area, and that there is room for improvement. He then asks where the term “dimension” comes from in the context of data quality. There, “dimension” is used as an analogy: the term gives the impression that data quality is as concrete as a solid object and that its dimensions can be measured. In data quality, the term could be used interchangeably with “criterion”, a standard of judgment. Since data is immaterial, stating that its dimensions can be measured is an astonishing claim. The author then asks whether the dimensions are credible. When “duplication” appears in a list alongside “completeness” and “consistency”, the more duplication there is, the lower data quality is likely to be, whereas the more completeness there is, the higher data quality is. Including “duplication” in a list of dimensions therefore immediately makes the list itself inconsistent. A much more serious problem is that there seems to be no common agreement on what the dimensions of data quality actually are. Lastly, the author asks whether the dimensions are over-abstractions. The worry is that each dimension is not a single concept but either a collection of disparate concepts or a generalization.

This article relates closely to last week’s topic, Data Quality and Integration. We know that quality data are the foundation of information processing and are essential for well-run organizations. I liked that the author admits there is no common agreement on what the dimensions of data quality actually are. That makes working in the database field a definite challenge, but it also means there is much room for improvement.

