Data Warehousing

IBM’s Data Integration Solution {2}

by Andrew S
The article describes how IBM bought Ascential Software for $1.1 billion to expand its offerings in the corporate software market. Ascential makes software that helps companies gather and combine information from different computer sources into one system. The purchase would make IBM one of the leading companies in the emerging business of data integration. Data integration is one of the fastest-growing trends in technology, and acquiring Ascential would fill a hole in IBM’s enterprise software offerings. It would allow IBM to offer its corporate customers a unified view of their data, regardless of where that data resides. The need for data integration is increasing rapidly across every industry, and this acquisition is one way IBM is adapting. read more...

A Cloud-Based Data Warehouse Service from Treasure Data {2}

by Kathy S
The author of this article focused on the cloud-based data warehouse company Treasure Data. The company received $1.5 million in funding, including an investment from Yukihiro “Matz” Matsumoto, the creator of the Ruby programming language. Treasure Data has developed a service that brings high-end analysis to businesses that don’t have the resources to afford a solution from major companies like IBM, Oracle or Teradata. According to the CEO of Treasure Data, Hiro Yoshikawa, the total cost of ownership for a data warehouse suite from one of the enterprise players can be as much as $5 million. Treasure Data is a subscription service that, at the low end, costs $1,500 per month, or $1,200 per month with a 12-month commitment. Yoshikawa says that on average the cost over time is less than one-tenth of what an enterprise data warehouse offering would cost. Treasure Data has more than 10 customers, including Fortune 500 companies, has more than 100 billion records stored, and is processing 10,000 messages per second. Treasure Data also borrows from Hadoop, but with a twist: unlike Hadoop, it does not require an infrastructure investment. read more...
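
To put the subscription prices above in perspective, here is a small arithmetic sketch. It uses only the two monthly prices quoted in the article; the function and constant names are my own and purely illustrative.

```python
# Monthly prices quoted in the article (US dollars, low-end tier).
MONTHLY_NO_COMMITMENT = 1500
MONTHLY_WITH_COMMITMENT = 1200


def annual_cost(monthly_price):
    """Cost of twelve months at a given monthly price."""
    return monthly_price * 12


def discount_percent(full_price, discounted_price):
    """How much cheaper the discounted price is, as a percentage."""
    return (full_price - discounted_price) * 100 / full_price


yearly_saving = annual_cost(MONTHLY_NO_COMMITMENT) - annual_cost(MONTHLY_WITH_COMMITMENT)

print(annual_cost(MONTHLY_NO_COMMITMENT))    # yearly cost without a commitment
print(annual_cost(MONTHLY_WITH_COMMITMENT))  # yearly cost with the 12-month commitment
print(discount_percent(MONTHLY_NO_COMMITMENT, MONTHLY_WITH_COMMITMENT))
print(yearly_saving)
```

The $5 million figure is a total cost of ownership at the high end, so it is not a like-for-like comparison with a low-end yearly subscription; the sketch only works out the subscription side.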

Another Tool to Help Students Understand Data Warehousing

by Katheryn T
The article I chose to write about describes a new method for helping students learn about data warehousing. It is a new tool from the University of California, Sacramento. Since every business is taking advantage of data mining, there need to be well-educated people to take care of that data. The article explains how the courseware will help students and beginners understand the beginning phases of data warehousing and the importance of doing it right. Dimensional models help students see what they are doing visually. These models can be changed as the chosen company progresses. The tool helps students learn about their designs and how those designs need to change with the data. read more...

Hadoop or EDW {1}

by Brian B
The article that I picked this week is called “Big Data Debate: End Near for Data Warehousing?” by Doug Henschen. The article starts off by giving some background on the EDW (Enterprise Data Warehouse). It says that while the technology behind EDW is time-tested and thoroughly developed, it remains rigid and inflexible when you have to go back and make changes to your data model. It also says that modeling and developing the system is a very time-consuming process that often costs a lot of money to plan and implement, if you ever finish it at all. It then talks about Hadoop, “which lets you store data on a massive scale at low cost (compared with similarly scaled commercial databases)” (Henschen, 2012). The author says that this is an improvement over a normal EDW because it allows more flexibility when it comes to making changes down the road. The problem is that Hadoop is not as mature as EDW, so it can be difficult to find people who have an intimate knowledge of the software. The article then opens up into a debate between Ben Werther (pro Hadoop) and Scott Gnau (pro EDW). Werther essentially says that EDW is a dated technology: by the time you push out the model and get everything implemented, you have what amounts to a view of the world from a year or more ago, which may or may not be applicable to your business needs today, wasting your company’s time and resources. Gnau’s argument boils down to the point that while Hadoop may be more flexible, it does not give you very good control over the data you have collected. He says that leaving all of that data un-modeled will make it hard for analysts to view and sort it, which is why EDW will stick around to make their jobs more manageable. read more...
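
The trade-off in the debate can be sketched in a few lines of Python. This is only an illustration under my own assumptions: an in-memory SQLite table stands in for a modeled EDW, a list of raw JSON strings stands in for un-modeled data kept in Hadoop, and the table, field, and function names are all hypothetical.

```python
import json
import sqlite3

# Modeled side: the table layout is fixed before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO sales (sale_id, amount) VALUES (1, 19.99)")


def insert_with_channel(connection):
    """Try to load a field ('channel') that the data model does not know about."""
    try:
        connection.execute(
            "INSERT INTO sales (sale_id, amount, channel) VALUES (2, 5.00, 'web')"
        )
        return True
    except sqlite3.OperationalError:
        # The model has to be changed first (alter the table, reload, retest).
        return False


# Un-modeled side: raw records are stored exactly as they arrive.
raw_records = [
    '{"sale_id": 1, "amount": 19.99}',
    '{"sale_id": 2, "amount": 5.00, "channel": "web"}',
]


def read_field(lines, field, default=None):
    """Structure is imposed only at read time, by whoever runs the query."""
    return [json.loads(line).get(field, default) for line in lines]


print(insert_with_channel(conn))           # the fixed model rejects the new field
print(read_field(raw_records, "channel"))  # raw data accepts it, but gaps show up
```

The second half shows both sides of the argument at once: the new field is accepted without any remodeling, but every reader now has to decide how to handle records where it is missing.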

SQL Packages and Data Warehousing {1}

by Ming X
The article I read is called “The Development of Ordered SQL Packages to Support Data Warehousing,” by Wilfred Ng and Mark Levene. “Data warehousing is a corporate strategy that needs to integrate information from several sources of separately developed Database Management Systems.” The authors mention that future database management systems should provide adequate facilities to manage the wide range of information arising from such integration. Since the order of data is usually involved in business queries, users can extend the relational model to incorporate partial orderings into data domains, which gives the ordered relational model. Users can then use OSQL, which allows querying over ordered relational databases. Ordered SQL (OSQL) is an extension of the Data Definition Language (DDL) and Data Manipulation Language (DML) of SQL for the ordered relational model. OSQL can be applied to solve various problems that arise in relational DBMSs involving temporal information, incomplete information and fuzzy information, under the unifying framework of the ordered relational model. There are three OSQL packages: OSQL_TIME, OSQL_INCOMP and OSQL_FUZZY. read more...
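
To make the idea of an ordering on a data domain more concrete, here is a small sketch in plain Python. It is not OSQL syntax and does not use any of the packages named above; the job titles, their ordering, and the function names are hypothetical, and the ordering shown is a simple linear one, which is only a special case of the partial orderings the authors describe.

```python
# Hypothetical domain ordering: job titles from lowest to highest seniority.
# Alphabetical order would be meaningless for this domain.
SENIORITY = ["Intern", "Analyst", "Manager", "Director"]
RANK = {title: position for position, title in enumerate(SENIORITY)}

# Hypothetical relation: (name, title) rows.
employees = [
    ("Ana", "Manager"),
    ("Bo", "Intern"),
    ("Cy", "Director"),
    ("Di", "Analyst"),
]


def above(rows, title):
    """Names of employees whose title ranks strictly above the given one."""
    return [name for name, row_title in rows if RANK[row_title] > RANK[title]]


def ordered(rows):
    """Rows sorted by the domain ordering instead of by string comparison."""
    return sorted(rows, key=lambda row: RANK[row[1]])


print(above(employees, "Analyst"))
print([name for name, _ in ordered(employees)])
```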

Processing Data for Data Warehouses {5}

by Andrew M
The article I read was entitled “Research on Data Processing of Bank Credit System” by Guorong Xiao. This article was about the data processing that populates data warehouses. The author specifically talks about how this process works in the banking system. The steps involved are data extraction, data transformation, data loading and the design of a data processing model. The author identifies the data processing model as the most important part of this process. It can be broken down into data source analysis, estimating the amount of data, data extraction design, data transformation, data cleaning, data loading and finally data validation. All of these steps are needed in order to have a fully functional data warehouse. The author goes on to discuss how banks are using data warehouses extensively, specifically for credit. With a data warehouse, banks can now do more detailed credit analysis of a potential customer. read more...
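
The steps listed above can be sketched as a tiny pipeline. This is a minimal illustration under my own assumptions, not the model from the article: the source rows, the `credit_fact` table, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows standing in for an operational banking system.
SOURCE_ROWS = [
    {"customer_id": "101", "credit_limit": " 5000 ", "status": "ACTIVE"},
    {"customer_id": "102", "credit_limit": "n/a", "status": "active"},
    {"customer_id": "103", "credit_limit": "12000", "status": "Closed"},
]


def extract(rows):
    """Extraction: copy records out of the source system."""
    return [dict(row) for row in rows]


def transform(rows):
    """Transformation: convert text to numbers and normalize status codes."""
    result = []
    for row in rows:
        limit_text = row["credit_limit"].strip()
        result.append({
            "customer_id": int(row["customer_id"]),
            "credit_limit": int(limit_text) if limit_text.isdigit() else None,
            "status": row["status"].lower(),
        })
    return result


def clean(rows):
    """Cleaning: drop rows whose credit limit could not be parsed."""
    return [row for row in rows if row["credit_limit"] is not None]


def load(connection, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS credit_fact ("
        "customer_id INTEGER PRIMARY KEY, credit_limit INTEGER, status TEXT)"
    )
    connection.executemany(
        "INSERT INTO credit_fact VALUES (:customer_id, :credit_limit, :status)",
        rows,
    )
    connection.commit()


def validate(connection, expected_count):
    """Validation: confirm the warehouse holds as many rows as were loaded."""
    (count,) = connection.execute("SELECT COUNT(*) FROM credit_fact").fetchone()
    return count == expected_count


conn = sqlite3.connect(":memory:")
cleaned = clean(transform(extract(SOURCE_ROWS)))
load(conn, cleaned)
print(validate(conn, len(cleaned)))
```

Keeping each step as its own function mirrors the way the article separates them, and makes it easy to see which step removed or changed a given record.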

Data Warehousing and the Best Practices for it {2}

by Andrew S
Data Warehousing Practices to Support Business Initiatives and Needs

The article I read was about data warehousing architecture and the practices that businesses use. Two methods are mentioned in the article: the Bill Inmon Style and the Ralph Kimball Style. The article explains in detail the practices of a major U.S. retail company and how it came to choose the Inmon Style. The Inmon Style calls for an atomic-level, third-normal-form relational format in which to store extracted and transformed data. The company decided that this method was the most useful and applicable to its needs. The author also explores best practices for data warehousing, covering data modeling, loading, attributes, and other important factors. The article concludes with the results of these practices: many departments now benefit from queries and requests against the data warehouse, and it has become a valuable source of data for the entire company. read more...

Best Practices for Data Warehousing {2}

by Robert T
In the peer-reviewed article “Best Practices in Data Warehousing to Support Business Initiatives and Needs,” Jeff Lawyer and Shamsul Chowdury discuss the decisions businesses have to make in order to run efficiently and why those decisions matter. The authors illustrate how many companies in the 1990s had a difficult time adjusting to the success of the many computer applications as well as the blooming of internet use. What was most difficult, according to Chowdury and Lawyer, was choosing which architecture to implement. The two general types of architecture were the Bill Inmon Style and the Ralph Kimball Style. The Inmon Style is considered to be application-neutral and could be called an enterprise data warehouse. The Kimball Style, however, has data prearranged. The authors also mention that with the “stove-pipes” of data, the cross use of data between businesses was unknown. “Under the Kimball approach, data are arranged in an application- or data-view-specific manner [8]. Under the Inmon approach, data are arranged according to the rules of normalization and remain application- and data-view-independent [13].”
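
The difference between the two styles can be sketched with a pair of toy schemas. This is my own illustration rather than anything from the article: the table and column names are hypothetical, the data is made up, and an in-memory SQLite database is used only so the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Inmon-style sketch: normalized tables that favor no particular report.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE purchase (purchase_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       amount REAL);
INSERT INTO region   VALUES (1, 'West'), (2, 'East');
INSERT INTO customer VALUES (10, 'Ana', 1), (11, 'Bo', 2), (12, 'Cy', 1);
INSERT INTO purchase VALUES (100, 10, 20.0), (101, 11, 5.0), (102, 12, 30.0);

-- Kimball-style sketch: data prearranged for one view (sales by region),
-- with the region name folded into the customer dimension.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT,
                           region_name TEXT);
CREATE TABLE fact_sales   (customer_key INTEGER REFERENCES dim_customer(customer_key),
                           amount REAL);
INSERT INTO dim_customer
    SELECT c.customer_id, c.name, r.region_name
    FROM customer c JOIN region r ON r.region_id = c.region_id;
INSERT INTO fact_sales SELECT customer_id, amount FROM purchase;
""")

# Same question, asked of each layout: total sales per region.
normalized_totals = conn.execute("""
    SELECT r.region_name, SUM(p.amount)
    FROM purchase p
    JOIN customer c ON c.customer_id = p.customer_id
    JOIN region r   ON r.region_id = c.region_id
    GROUP BY r.region_name ORDER BY r.region_name
""").fetchall()

dimensional_totals = conn.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region_name ORDER BY d.region_name
""").fetchall()

print(normalized_totals)
print(dimensional_totals)
```

Both layouts give the same answer. The normalized one needs an extra join but stores each fact once, while the prearranged one is simpler to query for this particular view and repeats the region name on every customer row.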
As for data warehousing growth, most data warehousing initiatives have found that there is a continuous need for incremental updates to the data warehouse. The authors suggest treating the warehouse as an ongoing application. “Keeping your data warehouse team intact after the initial build is very important in order to sustain the capability to react to this need. To paraphrase a popular saying, ‘Data warehousing is not a destination – it is a journey.’” The authors studied the data warehouse journey of one U.S. retailing company in 1995. The company used its warehouse to store only 80 gigabytes of information. The 80-gigabyte Inmon-style data warehouse was used to select customers for a targeted credit-stimulation marketing program. The database has since grown to hold nearly 7 terabytes, with 200 tables and 2,700 columns. read more...

{Comments Off on }

by Willen L
This web page article summarizes each company and what it did to win the TDWI (The Data Warehousing Institute) Awards for 2011. If you don’t know these awards, they are among the most prestigious awards highlighting best practices in data warehousing. Contestants can apply in the categories of their choosing. Some categories are Advanced Analytics, BI on a Limited Budget, Emerging Technologies and Methods, Enterprise BI, Enterprise Data Management Strategies, Enterprise Data Warehousing, Government and Non-profit, Organizational Structures, Performance Management, and Right-Time BI. Here is a list of the 2011 winners, grouped by the category they competed in. If you are interested in learning more about these companies and what they did to earn these awards, here is the link. read more...

HP’s Dabble in the Data Warehousing Business {2}

by Kevin Q

When Mark Hurd became Hewlett-Packard’s President in 2005, he noticed that, as a technology company, HP was failing in one area internally. It had no central system that collected all of the company’s data together into what Mr. Hurd liked to call “a single version of the truth” (Vance 2008). Mark Hurd used to head the Teradata division at NCR, where he helped start data warehousing, and he now noticed that HP needed a similar system for itself. After explaining some benefits of data warehousing, such as spotting trends at certain times of the year and other analysis that becomes possible once data is pulled together, the article describes how HP created NeoView internally. NeoView is a data warehouse and business intelligence server built to meet HP’s and Mark Hurd’s needs. It then became available for purchase to the public, making HP a competitor in a data warehousing market dominated by much larger players like Teradata, IBM, Oracle and Microsoft. HP’s sales weren’t impressive, which, according to experts in the field, may have been the result of building its systems on expensive older technology rather than cheaper and newer technology. NeoView can cost more than $10 million for the whole setup, which is pricey, especially when competitors are using cheaper setups and passing the savings on to customers. HP’s entrance into data warehousing came at a time when companies were beginning to see the importance and advantage of data warehouses, but its approach seems to have been a little off. read more...