Understanding Physical Database Infrastructure


Data warehouse experts are responsible for building a bridge between the business’s requirements for information and the technological resources at hand, so that big data can be processed effectively. To make this happen, database architects and business analysts work together to translate the company’s abstract data into logical models that colleagues can easily understand. In the article “Building a Best-Fit Data Warehouse: Why Understanding Physical Database Structure Matters,” John O’Brien identifies key issues that strain the balance of CPU power, disk space, and network connectivity in a typical corporate environment when it processes big data. The article also introduces four architecture models that data architects use to structure business intelligence.

As most data warehouse architects know, the challenge is to provide enough disk space, processing power, and network bandwidth. Depending on the volume and mix of business intelligence (BI) workloads the company handles (operational, management, and historical analytics), data architects need to understand which of the four models will help the data warehouse operate most efficiently. A data warehouse operates most efficiently, in a “best-fit” manner, when its BI workloads and logical data models are supported by the appropriate physical database architecture. The four models are:

1) Symmetric Multi-Processing (SMP) Model: A single system with multiple CPUs that share memory, providing scalable processing capability, with external storage that is usually connected over the network.

2) Cluster: Multiple servers attached to storage, with all components operating together as a single virtual server.

3) Massively Parallel Processing (MPP) Model: A system of many nodes, each with its own CPUs directly attached to its own storage. Each node works independently on its share of the data, while the system as a whole operates in a coordinated way.

4) Grid: A collection of different computers whose resources are called upon to work in parallel on complex problems, such as processing massively distributed data sets.
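To make the MPP description more concrete, here is a minimal Python sketch of the “coordinated but independent” idea. It rests on simplifying assumptions: each node is simulated by a thread rather than a separate server, and the sales rows, region names, and node count are hypothetical. Rows are hash-distributed so each node owns a disjoint slice of the data. Each node aggregates only its own slice, and a coordinator merges the partial results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

# Hypothetical sales rows: (region, amount). Illustrative data only.
ROWS = [("east", 120.0), ("west", 75.5), ("east", 30.0), ("north", 42.0), ("west", 10.0)]

NUM_NODES = 3  # hypothetical number of MPP nodes


def partition(rows, num_nodes):
    """Hash-distribute rows so each simulated node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(num_nodes)]
    for region, amount in rows:
        # crc32 gives a stable hash, so the same region always lands on the same node
        nodes[zlib.crc32(region.encode()) % num_nodes].append((region, amount))
    return nodes


def local_aggregate(node_rows):
    """Each node scans only its own partition and computes partial sums per region."""
    totals = defaultdict(float)
    for region, amount in node_rows:
        totals[region] += amount
    return dict(totals)


def coordinate(rows, num_nodes=NUM_NODES):
    """Coordinator: distribute the data, run the nodes in parallel, then merge partial results."""
    nodes = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_aggregate, nodes))
    result = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


print(coordinate(ROWS))
```

Because each node touches only its own data, adding nodes spreads the scanning work further, and the coordinator handles only the much smaller partial results. In an SMP system, by contrast, all CPUs would work against one shared memory space.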

Data warehouse architects need to determine which physical architecture best supports these three types of BI:

1) Operational BI: Information and insights delivered to a broad range of users within minutes to hours to help manage operational or time-sensitive business processes efficiently.

2) Management BI: Information services required by strategic decision makers, ranging from dashboards to dynamic reports with “drill-down” capabilities.

3) Historical Analytics BI: The practice of analyzing data that has been collected over a period of time.

Data volume, number of users, query types, timing of access, and report latency vary depending on the type of BI involved. Data architects need to know which infrastructure best accommodates each of these workloads.

I chose to discuss this article because it expanded on the importance of what we learned in the past week. Physical database design is the tier of design that comes just before the actual infrastructure is implemented. I was intrigued to learn that a physical database design may translate into different physical models. I never knew that database architects had the flexibility to design and implement whichever infrastructure model they see fit. I had always assumed they managed a single existing infrastructure, making sure that the company’s data would “fit” into the systems already in place. In either case, database design and infrastructure require an understanding of both the internal algorithms and the external architecture.

Citation:

Dash, D. (2011). Automated physical design: A combinatorial optimization approach (Carnegie Mellon University). ProQuest Dissertations and Theses, 177. Retrieved from http://search.proquest.com/docview/859949966?accountid=10357 (859949966)