Data Warehouse

Quality Data and Managing It

The article I chose to write about is called “Managing Data Source Quality for Data Warehouse in Manufacturing Services”. It discusses the standards and quality management system that need to be in place in order to comply with International Organization for Standardization (ISO) requirements. First, there has to be a quality management system for the data source. This is a process in which a model is put in place that has “several steps to ensure optimal data quality and is enriched with data quality management steps in order to fulfill customer requirements” (Idris & Ahmad, 2011). Human resources, technical infrastructure, and a suitable work environment are needed to carry out data quality management (DQM) correctly. The attributes of high-quality data are described as accurate, reliable, important, consistent, precise, understandable, and useful. One source of data quality problems is a lack of understanding of where the data originates: the people who manage the data need to be able to identify it correctly, understand what kind of data it is, and know how to work with it. Data source quality can be improved by working with the data owner, determining the cause of the quality problem, and correcting the data source.
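
As a rough illustration (not from the article), the sketch below shows how a few of those attributes, such as completeness, accuracy, and consistency, might be checked in code before source data is loaded into the warehouse; the record fields are hypothetical manufacturing data.

```python
# Minimal sketch (not from the article): auditing a batch of source records
# against a few of the quality attributes mentioned above before loading.
# Field names ("part_id", "quantity", "measured_on") are invented examples.
from datetime import date

def audit_records(records):
    """Return a list of (record, issue) pairs for records that fail basic checks."""
    issues = []
    seen_ids = set()
    for rec in records:
        # Completeness: every required field must be present and non-empty.
        for field in ("part_id", "quantity", "measured_on"):
            if not rec.get(field):
                issues.append((rec, f"missing field: {field}"))
        # Accuracy/precision: quantities must be non-negative numbers.
        if isinstance(rec.get("quantity"), (int, float)) and rec["quantity"] < 0:
            issues.append((rec, "negative quantity"))
        # Consistency: the same part_id must not appear twice in one batch.
        if rec.get("part_id") in seen_ids:
            issues.append((rec, "duplicate part_id"))
        seen_ids.add(rec.get("part_id"))
    return issues

batch = [
    {"part_id": "A-100", "quantity": 12, "measured_on": date(2012, 5, 1)},
    {"part_id": "A-100", "quantity": -3, "measured_on": None},
]
for rec, issue in audit_records(batch):
    print(issue, "->", rec)
```

In practice these rules would be agreed upon with the data owner, since that is where the article says quality improvement starts.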


Getting Top Quality Data for your Database

The article I chose for this week is titled “Managing Data Source Quality for Data Warehouse in Manufacturing Services”. Its main topic is data quality. According to the article, one of the primary success factors of a data warehouse is the quality of its data. Low-quality data has several downsides. Often, someone has to go back into the database and fix the mistakes, which takes time and effort that could be better spent on other projects. Another consequence of bad data is that any analysis built on it will be wrong, so database analysts end up wasting even more time reviewing the data again after it has been fixed. The article mentions a few ways to reduce the amount of low-quality data entering a database; two of these are Total Data Quality Management and the Quality Management System requirements. From what I understand, these are guidelines for collecting and inputting data that help limit the amount of low-quality data.
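
As a hedged example of what such a guideline might look like in practice, the sketch below rejects bad rows at input time so they never reach the warehouse, avoiding the costly fix-it-later cycle described above; the field names and rules are invented for illustration.

```python
# Illustrative only: gatekeeping rows at input time instead of repairing them
# after they have already polluted the warehouse. Field names and rules are
# hypothetical, not taken from the article.
def validate_order(row):
    errors = []
    if not row.get("order_id"):
        errors.append("order_id is required")
    if row.get("unit_price", 0) <= 0:
        errors.append("unit_price must be positive")
    if row.get("country") not in {"US", "CA", "MX"}:
        errors.append("unknown country code")
    return errors

incoming = [
    {"order_id": "1001", "unit_price": 19.99, "country": "US"},
    {"order_id": "", "unit_price": -5, "country": "ZZ"},
]
accepted = [r for r in incoming if not validate_order(r)]
rejected = [(r, validate_order(r)) for r in incoming if validate_order(r)]
print(len(accepted), "accepted;", len(rejected), "rejected")
```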


Amazon’s Redshift

Cloud-based hosted data warehousing services are gaining in popularity. The primary drivers of this movement are that older enterprise data warehouse systems are expensive and difficult to maintain. Amazon looks to fill that void with its new hosted data warehouse service, Redshift. What makes it unique is that it costs about a tenth of what regular data warehouses cost and it automates deployment and maintenance. It is also compatible with many popular business intelligence tools, so people will not have to spend resources learning new ones. Since Redshift runs on Amazon’s AWS infrastructure, it gets the added benefit of massive failover and redundancy clusters. Customers will not have to worry about data management, as it is already taken care of by Amazon.
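
Part of what makes that tool compatibility possible is that Redshift exposes a SQL interface over the standard PostgreSQL protocol, so existing drivers can connect to it. Below is a minimal sketch using the psycopg2 driver; the cluster endpoint, credentials, and the sales table are placeholders, and it obviously only runs against a real cluster.

```python
# Minimal sketch: querying a Redshift cluster with the standard PostgreSQL
# driver. Host, database, credentials, and the "sales" table are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="examplecluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,                # Redshift's default port
    dbname="dev",
    user="admin",
    password="secret",
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM sales WHERE sale_date >= '2012-01-01'")
    print(cur.fetchone()[0])
conn.close()
```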


Oracle’s new Finance Data Warehouse

Oracle has begun offering a data warehouse for the financial services industry. Oracle claims that this data warehouse is geared toward the needs of the financial environment. The warehouse is specialized for financial organizations, making it easier to store financial data, generate reports, manage metadata, and carry out other financial data needs. Oracle developed this data warehouse over 15 years using a financial services data model, so it can be used for analysis, testing, reporting, and assessing possible risks.


BAM: A Real-Time BPM System

At the 2008 3rd ICCIT International Conference, Jin Gu Kang and Kwan Hee Han proposed a business activity monitoring (BAM) system in their article “A Business Activity Monitoring System Supporting Real-Time Business Performance Management.” The authors’ proposed BAM system design and prototype were implemented at a global automotive company, and this real-world case explicitly shows their BAM framework applied as a real-time business performance management system. Han and Kang explain that once the structure of the organization’s enterprise information system (EIS) had been thoroughly examined, the BAM system was categorized as an OLAP/analytical processing system. The authors then describe a four-step procedure for designing the BAM system. The first step is to select and define the monitoring objects from which performance is measured in real time; in this case, that includes the key performance indicator (KPI) of current sales inventory, which helps determine the company’s operational efficiency. The authors also monitor the business process of equipment management in order to have real-time information on the status of equipment failures. In step two, the conceptual design of the dashboard is created. Business and/or technical events are defined in step three in order to capture the trend and status of the KPIs selected in step one. Finally, step four of the design procedure defines how data is extracted for event processing and how it will be displayed on the BAM system’s dashboard. The prototype was then implemented with the following commercial solutions: Oracle BAM (BAM tool), Oracle Database 11g (database), webMethods (EAI tool), and Java (programming language for the UI). Han and Kang include the results of their BAM system with two dashboard screenshots that cover the KPI status and the business process status for equipment problem management.
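
To make step three more concrete, here is a minimal, illustrative sketch (not the authors’ implementation, which used Oracle BAM, Oracle Database 11g, webMethods, and Java) of how incoming business events could update a KPI such as current sales inventory and drive a dashboard status; the event fields and threshold values are invented for the example.

```python
# Illustrative sketch only: business events (step 3) adjust a KPI defined in
# step 1, and a traffic-light status is derived for the dashboard (step 4).
# Event fields and thresholds are hypothetical.
class InventoryKPI:
    def __init__(self, warning_level, critical_level):
        self.on_hand = 0
        self.warning_level = warning_level
        self.critical_level = critical_level

    def apply_event(self, event):
        """A business event adjusts the KPI in real time."""
        if event["type"] == "goods_received":
            self.on_hand += event["quantity"]
        elif event["type"] == "goods_shipped":
            self.on_hand -= event["quantity"]

    def status(self):
        """Status shown on the dashboard."""
        if self.on_hand <= self.critical_level:
            return "RED"
        if self.on_hand <= self.warning_level:
            return "YELLOW"
        return "GREEN"

kpi = InventoryKPI(warning_level=100, critical_level=20)
for ev in [{"type": "goods_received", "quantity": 150},
           {"type": "goods_shipped", "quantity": 60}]:
    kpi.apply_event(ev)
print(kpi.on_hand, kpi.status())  # 90 YELLOW
```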


Application of Web Data Mining and Data Warehouse in E-Commerce

For this week’s blog assignment, I chose an article titled “Application of Web Data Mining and Data Warehouse in E-Commerce”. The authors provide an overview of data warehouses and how they are used in e-commerce environments. According to the authors, W. H. Inmon, who is considered the founder of data warehousing, defines a data warehouse as a “data collection which is subject-oriented, integrated, and non-volatile and time variant and it is used to support for management decision”. In a data warehouse, the data is organized around specific subjects. Once the original dispersed data are collected and cleaned, the refined data are stored in the data warehouse in a consistent manner. The data in a data warehouse are rarely deleted or modified, even though they are updated in real time. Because the warehouse stores historical information, qualitative analysis of it allows users to forecast future tendencies. The authors give examples using customer management modules, which are commonly used in e-commerce, to demonstrate how a data warehouse is used. In the example, the modules are divided by their specific purposes; however, they are set up in a way that provides a comprehensive understanding of the customers.
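
As an illustration of the subject-oriented, integrated, and time-variant properties described above, here is a small hedged sketch (not from the article) that cleans customer records from two dispersed sources and loads them into a single customer subject table; the table, column, and source names are hypothetical.

```python
# Hedged sketch: loading cleaned customer records from two dispersed sources
# into one subject-oriented, time-variant customer table. Names are invented.
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dw_customer (
        customer_id TEXT,
        full_name   TEXT,
        country     TEXT,
        loaded_at   TEXT   -- time-variant: each load is kept, nothing overwritten
    )
""")

crm_rows  = [{"id": "C1", "name": " Alice Wong ", "country": "us"}]
shop_rows = [{"cust": "C2", "fullname": "BOB LI", "country": "CA"}]

def clean(name, country):
    # Integration step: trim and normalize case so both sources look consistent.
    return name.strip().title(), country.upper()

now = datetime.now().isoformat()
for r in crm_rows:
    name, country = clean(r["name"], r["country"])
    conn.execute("INSERT INTO dw_customer VALUES (?, ?, ?, ?)",
                 (r["id"], name, country, now))
for r in shop_rows:
    name, country = clean(r["fullname"], r["country"])
    conn.execute("INSERT INTO dw_customer VALUES (?, ?, ?, ?)",
                 (r["cust"], name, country, now))

print(conn.execute("SELECT customer_id, full_name, country FROM dw_customer").fetchall())
```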


A Cloud-Based Data Warehouse Service from Treasure Data

The author of this article focuses on the cloud-based data warehouse company Treasure Data. The company received $1.5 million in funding, including an investment from Yukihiro “Matz” Matsumoto, the creator of the Ruby programming language. Treasure Data has developed a service that brings high-end analysis to businesses that do not have the resources to afford a solution from major companies like IBM, Oracle, or Teradata. According to the CEO of Treasure Data, Hiro Yoshikawa, the total cost of ownership for a data warehouse suite from one of the enterprise players can run as high as $5 million. Treasure Data is a subscription service that, at the low end, costs $1,500 per month, or $1,200 per month with a 12-month commitment. Yoshikawa says that on average the cost over time is more than 10 times less than what an enterprise data warehouse offering would cost. Treasure Data has more than 10 customers, including “Fortune 500” companies, has more than 100 billion records stored, and is processing 10,000 messages per second. Also, Treasure Data borrows from Hadoop, but with a twist: unlike Hadoop, it does not require an infrastructure investment.
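
As a rough back-of-the-envelope check of the cost claim, using only the figures quoted above and an assumed five-year comparison window:

```python
# Back-of-the-envelope check of the cost claim, using only the figures quoted
# above. The 5-year comparison window is an assumption for illustration.
subscription_per_month = 1_200      # with a 12-month commitment
years = 5
subscription_total = subscription_per_month * 12 * years
enterprise_tco = 5_000_000          # quoted high end for an enterprise suite

print(subscription_total)                    # 72000
print(enterprise_tco / subscription_total)   # ~69x, well over the "10 times" claim
```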


Size of Facebook’s Data

The article that I chose to talk about this week is called “How Big Is Facebook’s Data? 2.5 Billion Pieces of Content and 500+ Terabytes Ingested Every Day”, by Josh Constine. The title says it all: Facebook revealed to reporters that its systems process over 2.5 billion pieces of content, worth 500+ terabytes of data, per day. The author notes that the company’s systems also handle approximately 2.7 billion ‘Like’ actions and 300 million photos per day. The Vice President of Engineering, Jay Parikh, revealed that over 100 petabytes of data are stored in Facebook’s data warehouse. To support these data-intensive, distributed applications, Facebook uses the Apache Hadoop software framework. Hadoop provides very high aggregate bandwidth across the cluster and enables applications to process petabytes of data on thousands of independent computers. Parikh told reporters that Facebook operates the single largest Hadoop system in the world, one that is even larger than Yahoo’s.
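
To put those figures in perspective, here is a quick, approximate calculation based only on the numbers quoted above (treating “500+ terabytes” as exactly 500 TB):

```python
# Rough arithmetic on the figures above; illustrative only.
tb_per_day = 500
items_per_day = 2_500_000_000

gb_per_second = tb_per_day * 1024 / 86_400          # ~5.9 GB ingested every second
kb_per_item = tb_per_day * 1024**3 / items_per_day  # ~215 KB of data per piece of content

print(round(gb_per_second, 1), "GB/s")
print(round(kb_per_item), "KB per item on average")
```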


Data Warehouses for Educational Analysis

In a peer-reviewed journal article entitled “Building a Data Warehouse to Analyze Entrance Exams”, Dr. Kornelije Rabuzin explains how to build a data warehouse dedicated to analyzing student data. Using business intelligence together with databases and data warehouses, it is possible to analyze student data for educational purposes. Such analysis can determine, based on an entrance exam administered to high school students, whether students are ready for college and which material they do and do not know. Having such data and information gives administrators a set of reports to determine the best steps for students going to college after high school. Based on the types of information mentioned in the article, it did not seem that much could be done with data consisting only of entrance exam scores, high school grades, and regular exam scores. Even so, extracting that information from the various source databases was difficult; it required a great deal of programming and took about two months. The data was then loaded and organized into a star schema structure, and once everything was in the new data warehouse, it proved very useful for informing the student selection process and for targeting high schools whose students have certain skills.
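
For illustration, a star schema for this kind of data might look roughly like the sketch below: one fact table of exam results surrounded by student, high school, and subject dimensions. The table and column names are guesses for the example, not taken from the paper.

```python
# Hedged sketch of a star schema like the one described: a fact table of exam
# results with dimension tables around it. Names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, gender TEXT, hs_gpa REAL);
    CREATE TABLE dim_high_school (school_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE dim_subject (subject_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one row per student per subject on the entrance exam.
    CREATE TABLE fact_entrance_exam (
        student_id INTEGER REFERENCES dim_student(student_id),
        school_id  INTEGER REFERENCES dim_high_school(school_id),
        subject_id INTEGER REFERENCES dim_subject(subject_id),
        score      REAL
    );
""")

# A typical analytical question: average score per high school and subject.
query = """
    SELECT hs.name, s.name, AVG(f.score)
    FROM fact_entrance_exam f
    JOIN dim_high_school hs ON hs.school_id = f.school_id
    JOIN dim_subject s      ON s.subject_id = f.subject_id
    GROUP BY hs.name, s.name
"""
print(conn.execute(query).fetchall())   # empty until data is loaded
```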


Data Warehousing: Inmon Style and Kimball Style

The article I chose for this week is titled “Best Practices in Data Warehousing to Support Business Initiatives and Needs”. It describes two different styles used in data warehousing and explains when you would use one versus the other. The first is the “Bill Inmon style”, which uses a top-down approach; in contrast, the “Ralph Kimball style” uses a bottom-up approach. The article uses a major U.S. retail company as an example. This company uses the Inmon style simply because it is what fits well for that particular business. The Inmon style of data warehousing demands that your data be kept in a third-normal-form relational format, while the Kimball style requires a multidimensional arrangement. In the case of this retail company, the application-neutral aspect of the Inmon style made it an easy pick.
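
To illustrate the structural difference the article describes, here is a small hedged sketch (table names invented) of the same retail sales data modeled Inmon-style in third normal form and Kimball-style as a star schema:

```python
# Illustrative contrast only: normalized (Inmon-style) versus dimensional
# (Kimball-style) modeling of the same retail sales data. Names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Inmon style: normalized, application-neutral relational tables.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                           product_id INTEGER, sale_date TEXT, amount REAL);

    -- Kimball style: a denormalized star schema built for analysis.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
    CREATE TABLE fact_sales   (customer_key INTEGER, product_key INTEGER,
                               date_key INTEGER, amount REAL);
""")
print([r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")])
```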
