Reference Data

by Alexander V

The article discusses advances in database design and modeling and describes data modeling as a necessary skill. It then questions whether different data models have different rates of accuracy and whether a data model can hold all of the design information for a database. The author states that there are limits on what models can do, and that failing to understand those limits can lead to data management problems. He says data modeling in general is focused on the logical level, which is a good thing. The problem he brings to light arises in the divide between logical and physical database design, when the data entered into the database acts to “specify a layer of design.” This type of data is called referential data, or reference data, and is commonly referred to as “code tables, lookup tables or domain values.” A reference data table typically contains a “code,” which is the primary key, and a description. Reference data has many important properties that other types of data do not have. One is that a “code” value usually has a definition. The author defines reference data as “any kind of data that is used solely to categorize other data found in a database or solely for relating data in a database to information beyond the boundaries of the enterprise.” One way that reference data specifies database design is by “effectively replacing attributes in entities.” He says one of the biggest problems with reference data design is the failure to assign definitions to data values. This contributes to the divide between the logical and physical models. Data models and databases are full of reference data tables, and business users are usually left to deal with data values that do not appear in the data model yet are needed to understand the database. The author concludes that better tools and techniques are needed to deal with this problem.
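
To make the idea of a code table concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not taken from the article. The table names (customer_type, customer), the column names, and the sample values are assumptions made up for illustration. It shows a reference table whose “code” is the primary key, with a description and a definition stored beside it, and an entity that carries only the code.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with one reference table and one entity.

    All table names, column names, and values are hypothetical examples.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")

    # Reference (lookup) table: the code is the primary key, and each code
    # carries a description plus a definition of what the value means.
    conn.execute(
        """
        CREATE TABLE customer_type (
            code        TEXT PRIMARY KEY,
            description TEXT NOT NULL,
            definition  TEXT NOT NULL
        )
        """
    )

    # Entity table: a single code column stands in for attributes that
    # might otherwise be modeled separately (for example, one flag per type).
    conn.execute(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            type_code   TEXT NOT NULL REFERENCES customer_type(code)
        )
        """
    )

    conn.executemany(
        "INSERT INTO customer_type VALUES (?, ?, ?)",
        [
            ("R", "Retail", "Buys goods for personal use."),
            ("W", "Wholesale", "Buys goods in bulk for resale."),
        ],
    )
    conn.execute(
        "INSERT INTO customer (name, type_code) VALUES (?, ?)",
        ("Acme Supply", "W"),
    )
    return conn


def describe_customers(conn):
    """Join each customer to the description and definition of its code."""
    return conn.execute(
        """
        SELECT c.name, t.description, t.definition
        FROM customer AS c
        JOIN customer_type AS t ON t.code = c.type_code
        ORDER BY c.customer_id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in describe_customers(build_demo()):
        print(row)
```

In this sketch the meaning of “W” lives only in the rows of customer_type, not in the table structure. If the definition column were left out or left empty, a business user looking at the schema alone could not tell what the codes mean, which is the gap the author describes.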


This article relates to this week’s topic since we are transitioning from logical to physical database design. I don’t know much about referential data, but the first thing that came to mind was referential integrity. I’m not sure whether referential integrity and referential data are related to each other. Besides that, I think the author addressed an ongoing problem that has not been solved. The divide between physical and logical database design has its pros, but after reading the article I see one big problem, which is referential data. Since this article was written in 2007, I wonder whether this is still a problem that arises between physical and logical design. I assume that some type of tool or technology has been created since then to deal with it.


Chisholm, M. (2007). Data models are not database design. Information Management, 17(10), 45-45. Retrieved from