data quality

Compromised Data Quality by Malware {Comments Off on Compromised Data Quality by Malware}

by Eric C
In an article from PCWorld entitled “Symantec warns of malware targeting SQL databases,” malware infecting SQL databases has been spreading around the world. Although not a serious threat, it can destroy data quality within a database. Originally targeting Iran, the malware, called W32.Narilam, looks for Microsoft SQL databases on the infected server. If Microsoft SQL is found, the malware searches the data for specific keywords, such as “account” and “financial bond,” and replaces them with random characters. Database administrators who do not make frequent backups of the database will be left with corrupted data and a loss of data integrity, which could prove disastrous for customers’ data, especially in a banking database. read more...
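The corruption pattern described above, keyword values silently overwritten with random characters, is one reason frequent backups matter: with a backup, corrupted rows can at least be detected by diffing. A minimal sketch in Python using SQLite and a hypothetical `accounts` table (the real malware targets Microsoft SQL Server, not SQLite):

```python
import sqlite3

def snapshot(conn, table):
    """Return {id: row} for every row in the table."""
    cur = conn.execute(f"SELECT * FROM {table}")
    return {row[0]: row for row in cur.fetchall()}

def corrupted_rows(backup, live, table):
    """IDs whose values differ between the backup and the live database."""
    before = snapshot(backup, table)
    after = snapshot(live, table)
    return [rid for rid, row in after.items() if before.get(rid) != row]

# Demo: a backup copy and a live copy of the same hypothetical table.
backup = sqlite3.connect(":memory:")
live = sqlite3.connect(":memory:")
for db in (backup, live):
    db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, holder TEXT)")
    db.execute("INSERT INTO accounts VALUES (1, 'account'), (2, 'bond')")

# Simulate Narilam-style corruption: a keyword overwritten with junk.
live.execute("UPDATE accounts SET holder = 'xk7#q' WHERE id = 1")

print(corrupted_rows(backup, live, "accounts"))  # -> [1]
```

Detection is only half the story, of course; without the backup there is nothing to restore the corrupted values from.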

Dimensions of Data Quality {1}

by Kathy S
The author of this article starts off by introducing the idea of “dimensions” of data quality, such as accuracy, consistency, and timeliness, and asks whether these “dimensions” actually exist as intelligible concepts. The author believes a strong case can be made that we are not thinking as clearly as we could be in this area, and that there is room for improvement. He then asks where the term “dimension” comes from when talking about data quality. In this context, “dimension” is used as an analogy: the term gives the impression that data quality is as concrete as a solid object and that its dimensions can be measured. In data quality, the term could be used interchangeably with “criterion,” a standard of judgment. Since data is immaterial, the claim that its dimensions can be measured is an astonishing one. The author then asks whether the dimensions are credible. When “duplication” appears in a list alongside “completeness” and “consistency,” the relationships run in opposite directions: the more duplication there is, the lower the data quality, whereas the more completeness there is, the higher the data quality. Including “duplication” in a list of dimensions of data quality therefore immediately creates an inconsistency in the list. A much more serious problem is that there seems to be no common agreement on what the dimensions of data quality actually are. Lastly, the author asks whether the dimensions are over-abstractions; the worry is that each dimension is not a single concept, but either a collection of disparate concepts or a generalization. read more...
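Whatever one thinks of calling them “dimensions,” duplication and completeness can in fact be treated as measurable criteria. A minimal sketch over a hypothetical list of customer records (the field names and records are illustrative, not from the article):

```python
# Hypothetical records: one exact duplicate, one missing value.
records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "ann@example.com"},  # duplicate
    {"name": "Bo Chan", "email": None},               # incomplete
]

def duplication_rate(rows):
    """Fraction of rows that exactly duplicate an earlier row (lower is better)."""
    seen, dupes = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes / len(rows)

def completeness_rate(rows):
    """Fraction of cells that are populated (higher is better)."""
    cells = [v for r in rows for v in r.values()]
    return sum(v is not None for v in cells) / len(cells)

print(round(duplication_rate(records), 3))   # -> 0.333
print(round(completeness_rate(records), 3))  # -> 0.833
```

Note the opposite polarities the author points out: a good dataset scores low on the first metric and high on the second, which is exactly the inconsistency he sees in mixing them into one list of “dimensions.”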

Data Quality Information {1}

by Ming X
The article I read is called “The Impact of Experience and Time on the Use of Data Quality Information in Decision Making.” “Data Quality Information (DQI) is metadata that can be included with data to provide the user with information regarding the quality of that data.” The article focuses on how the decision maker’s experience and the available processing time influence the use of DQI in decision making. Chengalur-Smith et al. (1999) define DQI as metadata that addresses the data’s quality. Chengalur-Smith, Ballou and Pazer (1998) explored the consequences of informing decision-makers about the quality of their data. Their project studied two formats of DQI, two decision strategies, and both simple and complex levels of decision complexity, and found that the amount of influence varied across research designs. Organizations wishing to begin a program of using DQI should be aware that there was a lack of consensus when experts were presented with DQI. An organization can predict that adding information about data quality to a database is likely to change the decisions made, but it cannot predict what those new decisions will be. read more...
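The idea of DQI as “metadata included with data” can be pictured concretely. A minimal sketch pairing a record with its quality metadata; the field names and the decision rule are hypothetical illustrations, not something the cited studies specify:

```python
from dataclasses import dataclass

@dataclass
class DQI:
    """Hypothetical quality metadata carried alongside a record."""
    accuracy: float      # estimated fraction of correct values, 0..1
    last_verified: str   # date of the last quality audit

@dataclass
class Record:
    customer_id: int
    balance: float
    dqi: DQI

r = Record(42, 1050.0, DQI(accuracy=0.92, last_verified="2012-11-01"))

# A decision maker can then weigh the data by its reported quality,
# e.g. only act on records whose estimated accuracy clears a threshold.
usable = r.dqi.accuracy >= 0.9
print(usable)  # -> True
```

The studies’ point is that exposing metadata like this changes decisions, but in ways that vary with the decision maker’s experience and available time.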

Quality Data and Managing It {1}

by Katheryn T
The article I chose to write about is called “Managing Data Source Quality for Data Warehouse in Manufacturing Services.” It describes the standards and quality management system that must be in place in order to comply with the International Organization for Standardization (ISO). First, there has to be a quality management system for the data source: a model with “several steps to ensure optimal data quality and is enriched with data quality management steps in order to fulfill customer requirements” (Idris & Ahmad, 2011). Human resources, technical infrastructure, and a suitable work environment are needed to carry out data quality management correctly. The attributes of high-quality data are described as accurate, reliable, important, consistent, precise, understandable, and useful. One source of data quality problems is a lack of understanding of the origin of the data: the people who manage the data need to identify it correctly, understand what kind of data it is, and know how to work with it. Data source quality can be improved by working with the data owner, determining the cause of the quality problem, and correcting the data source. read more...

The Importance of Data Quality and the Measures Taken {Comments Off on The Importance of Data Quality and the Measures Taken}

by Andrew S
In the article that I read this week, the author discussed the importance of data quality and data source management in a data warehouse project.  Many data warehouse projects fail due to the poor quality of their data, and this article explains that quality characteristics form the backbone of quality management.  The article goes in depth into the five activities used to ensure proper quality management: quality policy, quality planning, quality control, quality assurance, and quality improvement, identifying and explaining each one.  There is also a proper procedure for managing data sources, which provides a framework and implementation tools for reaching a company’s goals and objectives. read more...

Data Quality {6}

by David H
This article talks about how data quality, though a mundane topic, seriously affects companies. As we know, working with data is very important.  Lou Gerstner remarked, “Inside IBM, we talk about 10 times more connected people, 100 times more network speed, 1,000 times more devices and a million times more data.” This shows that as technology grows, we deal with ever more data. Poor data costs companies a great deal; the author estimates it costs companies 10% to 20% of revenue. The author also describes some unfolding disasters, most of which stem from human mistakes such as entering wrong data. For example: “incorrect prices on Amazon.com, where a 1GB memory module normally listed at $999.99 was on sale at Amazon.com for $19.99; hotel rooms at W Hotels sold for $59 instead of $259; and United Airline tickets selling for $5.” Poor-quality data like this damages a company’s image and upsets customers, and it also makes decision making and implementing new technologies difficult. Another problem poor data quality creates is linking a customer’s data across different divisions so the company can run analyses and offer promotions or deals to customers. Often the data is simply unfit for doing so: the various divisions employ different data formats, they model customers differently, and the data contains errors, making linkage impossible. read more...
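The cross-division linkage problem can be made concrete with a small sketch: two divisions store the same customer under different id formats, so a naive join finds nothing until the ids are normalized to a shared canonical form. The division names and id formats below are hypothetical illustrations:

```python
# Hypothetical divisions modeling the same customer differently.
sales = {"CUST-0042": "J. Smith"}
support = {"42": "John Smith"}

def normalize(cust_id: str) -> str:
    """Reduce a division-specific id to a shared canonical form
    by stripping non-digit prefixes and leading zeros."""
    digits = "".join(ch for ch in cust_id if ch.isdigit())
    return str(int(digits))

# Naive linkage fails: no key from sales appears in support as-is.
naive_hits = [k for k in sales if k in support]

# Normalized linkage succeeds.
linked = {normalize(k): (v, support.get(normalize(k)))
          for k, v in sales.items()}

print(naive_hits)  # -> []
print(linked)      # -> {'42': ('J. Smith', 'John Smith')}
```

Real linkage is much harder than this, since beyond format differences the records themselves contain errors, which is exactly why the article calls the data “unfit” for analysis.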

Zoomix in your SQL software. {1}

by Kevin Q
My article was about Microsoft acquiring Zoomix, a small company based in Jerusalem known for its Accelerator data-quality technology. Microsoft plans to use the technology in future releases of its SQL Server database software. According to Zoomix’s website, “Accelerator software combines semantic and linguistic analysis with machine learning to classify, match and standardize complex corporate data” (Montalbano). The Zoomix team will join Microsoft’s Israeli headquarters to help integrate their technology. This technology will help ensure that, while data mining across multiple systems, you are returned the most accurate data. Vendors have recently started to integrate data-quality capability into their database software, and Microsoft recognizes its importance as well. Microsoft plans to add this feature in order to keep up with competing software products, as well as to bring a useful feature to an already widely used product. read more...
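The “match” step that such tools perform can be illustrated with simple string similarity: map a messy incoming entry onto the closest item in a clean catalogue. This is only a toy sketch using Python’s standard-library `difflib`; products like Zoomix Accelerator use far richer semantic and linguistic analysis, and the catalogue below is hypothetical:

```python
import difflib

catalog = ["1GB Memory Module", "Wireless Mouse"]

def best_match(entry: str, candidates: list[str]) -> str:
    """Pick the catalogue item most similar to a messy incoming entry,
    using case-insensitive sequence similarity."""
    return max(
        candidates,
        key=lambda c: difflib.SequenceMatcher(None, entry.lower(), c.lower()).ratio(),
    )

print(best_match("1 GB memory modul", catalog))  # -> '1GB Memory Module'
```

Standardizing entries against a canonical list like this is one small piece of what a data-quality layer inside a database product would automate.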