Performance

SQL Performance Tips

by Andrew H
I read an article by Craig Mullins called SQL Performance Basics. I chose it because I was searching for articles on SQL statements, it appeared near the top of the results, and its focus on performance caught my eye. The article states that as much as 75% of poor relational performance is caused by poorly coded SQL, and it walks through several ways to increase performance and to avoid coding SQL poorly. As the author says, with proper training and experience, coding SQL efficiently is not hard, but it remains an elusive task for many coders. Some of his simple tips for an efficient database are: think relationally; code joins instead of implementing cursors for every table that is going to be read; always request only the exact columns you need to retrieve; and filter data using a WHERE clause. On requesting only the columns you need, the author warns against SELECT *, which is bad practice in application programs because that statement retrieves all columns from the tables being accessed. The WHERE clause is important because you should filter out the data you need before bringing it into your program, as this will greatly increase the speed of your application. read more...
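The "don't use SELECT *" tip can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module with a hypothetical employees table (the table and column names are made up for the example):

```python
import sqlite3

# In-memory database with a hypothetical `employees` table that has a
# wide `bio` column the application never actually needs.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employees (
    id INTEGER PRIMARY KEY, name TEXT, dept TEXT,
    salary REAL, bio TEXT)""")
conn.executemany(
    "INSERT INTO employees (name, dept, salary, bio) VALUES (?, ?, ?, ?)",
    [("Ada", "ENG", 120000.0, "a very long biography..."),
     ("Ben", "HR", 80000.0, "another very long biography...")])

# Bad practice: SELECT * drags every column across the wire,
# including the wide `bio` column.
wide_rows = conn.execute("SELECT * FROM employees").fetchall()

# Better: name only the columns the program needs.
narrow_rows = conn.execute("SELECT name, dept FROM employees").fetchall()

print(len(wide_rows[0]), "columns fetched vs", len(narrow_rows[0]))
```

The saving looks small here, but in a real table with many wide columns and millions of rows, the difference in I/O is substantial.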

SQL Performance Tips You Can Use Now

by Hieu H
We have all learned throughout the course that the database can affect our applications’ overall performance. As we progress in our knowledge of databases and SQL, we will undoubtedly be exposed to more advanced queries and techniques. However, there are some “basic” techniques we can use to improve relational database performance now. Mullins points out that “up to 75% of poor relational performance” is caused by inefficient SQL queries and application code. By following some simple rules, we can eliminate some of these issues. The first step is to use JOIN queries instead of cursors for every table; JOINs will return only what we need from the database. The next step is to stop using SELECT * in our queries. Instead of selecting every column in a table, we should select only the required columns. Another trick to improve performance is to use the WHERE clause, which filters out and returns only the desired information. read more...
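The "JOIN instead of cursors" rule can be seen side by side in a small sketch. This toy example (hypothetical departments/employees tables, via Python's sqlite3) contrasts the cursor-per-table style with a single JOIN that returns the same result:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'ENG'), (2, 'HR');
INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Ben', 2), (3, 'Cy', 1);
""")

# Cursor-per-table style: read each table like a flat file and match
# rows inside the application. The DBMS optimizer never gets a chance.
matches = []
for emp in conn.execute("SELECT name, dept_id FROM employees"):
    for dept in conn.execute("SELECT id, name FROM departments"):
        if emp[1] == dept[0]:
            matches.append((emp[0], dept[1]))

# JOIN style: one statement; the DBMS picks the access path and
# returns only the combined rows we need.
joined = conn.execute("""
    SELECT e.name, d.name
    FROM employees e JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id""").fetchall()
```

Both produce the same rows, but the JOIN lets the optimizer choose indexes and join order instead of forcing a nested loop in application code.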

Design and Performance of Databases

by Eric C
When it comes to building databases, performance is perhaps the top priority for database engineers. Speed and efficiency are key when gathering information with queries for end users. There are various ways to improve the efficiency of databases, including vertical and horizontal partitioning. In a peer-reviewed article by Agrawal, Narasayya, and Yang, entitled “Integrating Vertical and Horizontal Partitioning into Automated Physical Database Design,” the authors discuss how vertical and horizontal partitioning can improve both the performance and the manageability of databases. Horizontal partitioning groups rows with common values together into one or more tables. Vertical partitioning is the same concept applied to columns instead of rows. Furthermore, the article mentions two main partitioning methods: hash and range. Hash partitioning distributes rows evenly across partitions using a hash of a key. Range partitioning assigns each partition a range of key values between a minimum and a maximum. To make a partitioned database easier to manage with these methods, the indexes and tables must be aligned: an index is aligned when it uses the same partitioning as its table. However, achieving this is complex, so the physical design is very important. There are many ways to horizontally and vertically partition a database, and many possible alignments, so choosing the correct design is critical. If such designs are not implemented correctly, the database could run queries more slowly or even lock up. Once correctly implemented, the database can see an improvement of up to 20% in running queries. read more...
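The two partitioning methods the article names can be sketched in plain Python (a toy model, not a DBMS; the row data and the range boundaries are invented for the example):

```python
# Toy rows to be partitioned; `amount` is the partitioning key for
# the range scheme, `id` for the hash scheme.
rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]

NUM_PARTITIONS = 3

# Hash partitioning: a hash of the key spreads rows evenly across
# partitions, with no need to know the data's value distribution.
hash_parts = {p: [] for p in range(NUM_PARTITIONS)}
for row in rows:
    hash_parts[hash(row["id"]) % NUM_PARTITIONS].append(row)

# Range partitioning: each partition owns a [min, max) slice of the
# key space (hypothetical boundaries).
ranges = [(0, 30), (30, 60), (60, 10**9)]
range_parts = {r: [] for r in ranges}
for row in rows:
    for lo, hi in ranges:
        if lo <= row["amount"] < hi:
            range_parts[(lo, hi)].append(row)
            break
```

Under either scheme every row lands in exactly one partition; range partitioning additionally lets the engine skip whole partitions when a query's WHERE clause constrains the key.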

Using GPUs for Database Processing

by Eric C
When designing a physical database, there are many variables to take into consideration that will impact the overall success of a database. Perhaps one of the most important is optimal performance. In the peer-reviewed journal article by Chang, Sheu, Yuan, and Hsu, the authors discuss how GPUs can be used to increase the performance of a database. Since modern GPUs typically have more raw computing power than regular CPUs, it is beneficial to harness that power. Using GPUs in conjunction with regular CPUs requires a different programming architecture, called CUDA (Compute Unified Device Architecture). Developed by NVIDIA, CUDA is a parallel computing platform that enables faster rendering and computation of complex, intensive tasks. Its benefits include faster scattered reads from memory, higher bandwidth through fast shared memory, and bitwise operations. However, to combine the parallel computing of GPUs with databases, the database has to be specifically designed for it. Using an in-memory database (IMDB), which runs the database in the server’s main memory instead of on a disk drive, makes GPU computing possible: the database is run in the GPU’s memory with the help of CUDA. The database itself must use 2D column-major arrays instead of the typical tree structure (B-Tree) to manage the index. The performance gain, as noted by the authors, is about a twofold speedup when running SQL operations on the GPU. read more...
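The column-major layout the article mentions is the key enabler: each column is stored as one contiguous array, which is the access pattern GPUs parallelize well. A tiny CPU-side sketch of that layout (toy data; this only illustrates the storage idea, not actual GPU execution):

```python
from array import array

# Row-major view of a toy table: (id, price) tuples.
row_major = [(1, 9.5), (2, 3.0), (3, 7.25)]

# Column-major layout: one packed, contiguous array per column.
# A GPU kernel scanning `prices` would read one dense buffer instead
# of hopping between B-Tree nodes.
ids = array("i", [r[0] for r in row_major])
prices = array("d", [r[1] for r in row_major])

# "SELECT SUM(price)" touches only the price column's buffer.
total = sum(prices)
```

On a real GPU the same column buffer would be copied to device memory and reduced in parallel; the layout is what makes that transfer and reduction efficient.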

mySQL, NoSQL, and now NewSQL?

by Eric C
In today’s fast-paced world, with data growing at an exponential rate, a database must be scalable and perform well under today’s demands for storing data. SQL databases have been around for decades, and their basic architecture was not created with scalability in mind. There have been new advances in database technology, including NoSQL and NewSQL. Michael Stonebraker, a seasoned database creator who is now chief technology officer of VoltDB, explained the benefits of using “NewSQL” to meet today’s demands for database performance. Stonebraker stated that traditional SQL systems have many limitations, including performance. SQL databases also do not scale easily onto more than one server; making an SQL database span more servers for better performance is very complicated to manage. NoSQL was indeed created to improve scalability and is increasing in popularity, but it has its own limitations as well: the main problem is that it cannot perform complicated mathematical queries. NewSQL, however, improves on the issues of both SQL and NoSQL, making it a more efficient database system that can process requests faster and scale to more than one server. According to Joab Jackson, the author of the PCWorld article entitled “’NewSQL’ Could Combine the Best of SQL and NoSQL,” a NewSQL system “can execute transactions 45 times faster than a typical relational database system” and “can scale across 39 servers, and handle up to 1.6 million transactions per second across 300 CPU cores” (Jackson). read more...

Oracle Knows NoSQL

by Ermie C
Joab Jackson’s article is about Oracle’s launch of a NoSQL database.  The software is now available for download on Oracle’s website; Oracle released it in response to the many database tools created over the years.  Now, with development complete, they are providing it to the community.  Within the community of Oracle users, the implementation is expected to help users’ databases: it can take non-critical tasks and keep that type of information in a separate database, which helps decrease crashes, one of the problems it was built to address.  Oracle NoSQL is based on Berkeley DB Java Edition and uses a very simple key-value data model that simplifies the input of data.  With the many records that can be inputted into the database, NoSQL has become a lot more flexible and can keep load balancing at a consistent speed.  Oracle is providing it for free as a Community Edition, but there will be paid editions for access to more features. read more...

SQL Performance Basics

by Toan T
Creating a database using any application is an easy task, but how efficient the database is is a whole different story. Assuring optimal performance of database applications relies on properly coded SQL statements. Poorly written application code is the cause of most performance problems; in many cases, as much as 75% of poor performance is caused by bad SQL and application code. It is impossible to address all the performance issues in a database environment, and writing SQL properly is also a challenge for most programmers who are new to relational databases. This article explains and emphasizes some of the SQL basics that one can utilize when working with any database. The first step is to think relationally: code joins instead of using cursors for every table and reading each of them as if they were individual files. The DBMS is optimized for joins and will return only the data that is needed. The next step is to always provide the exact columns needed in the SELECT-list of each SELECT statement; in other words, do not use SELECT *, which retrieves all columns from the tables being accessed. That shortcut is usable for quick ad hoc queries but is bad practice for inclusion in actual application programs. The next rule is to use the WHERE clause to filter data in the SQL instead of bringing it all into your program for filtering, a mistake most often associated with new programmers. It is much better to let the DBMS filter the data before returning it to your program, since I/O and CPU resources are required to obtain each row of data: the fewer rows passed back, the more efficient your SQL will be. read more...
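The WHERE-clause rule above can be shown concretely. A small sketch (hypothetical orders table, via Python's sqlite3) contrasts filtering in the application with filtering in the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders (total) VALUES (?)",
                 [(t,) for t in (5.0, 150.0, 20.0, 300.0, 75.0)])

# Anti-pattern: fetch every row, then filter in the program. Every
# row fetched costs I/O and CPU, even the ones thrown away.
all_rows = conn.execute("SELECT id, total FROM orders").fetchall()
big_in_app = [r for r in all_rows if r[1] > 100]

# Better: push the predicate into WHERE so the DBMS filters first
# and only the matching rows cross the API boundary.
big_in_db = conn.execute(
    "SELECT id, total FROM orders WHERE total > 100").fetchall()
```

Same answer either way, but the WHERE version returned 2 rows instead of 5; at production scale that ratio is what makes or breaks response time.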

Indexing SQL Server Relational Databases For Performance

by Toan T
In a relational database, indexing is used to retrieve data quickly from the system. With indexing, data is sorted in a way that makes it relatively easy for users to search, very much like the index of a book. In this article, we will talk about some issues regarding indexes that are often overlooked. Generally, more indexes mean faster data retrieval. However, a large number of indexes can also increase the time it takes to insert new data. Microsoft SQL Server supports two kinds of indexes: clustered and nonclustered. A clustered index means the data is physically ordered by the index key, while a nonclustered index contains pointers to the data rows, typically a reference to the row’s location in the clustered index if one exists. Indexes are often created automatically by the database to ensure that no duplicate data can be inserted. In MS SQL Server, up to 256 nonclustered indexes can be created per table. That might seem like a lot, but it is not advisable, because indexes require additional storage space both in memory and on the disk drive. They also decrease the performance of insert statements, because the data needs to be placed according to the indexes instead of in the first available space, which takes more time to complete. read more...
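The effect of a secondary (nonclustered-style) index is easy to see in a sketch. This example uses SQLite rather than SQL Server (so the plan text differs, but the principle is the same), with a hypothetical customers table; EXPLAIN QUERY PLAN shows the lookup switching from a full scan to an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO customers (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3.
    return " ".join(row[3] for row in
                    conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT id FROM customers WHERE email = 'user500@example.com'"

# Without an index, a lookup by email scans the whole table.
before = plan(query)

# A secondary index on email turns the scan into an index search.
conn.execute("CREATE INDEX idx_email ON customers(email)")
after = plan(query)

print("before:", before)
print("after: ", after)
```

The same trade-off the article describes applies here too: each extra index makes this lookup cheap but adds work to every INSERT, which must now maintain the index as well as the table.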

Denormalization: Intermediate Step

by Jasmine C
The article I read is very informative.  A quick synopsis: it discusses the techniques of denormalization and the pros and cons of normalization versus denormalization.  Today, normalization is the standard way of designing a relational database.  With normalization, data is organized so that updating is minimal and data is easily accessible.  However, the biggest disadvantage of normalization is that system performance can be very poor.  At the moment, denormalization techniques do not have concrete guidelines to guide the process, yet denormalization shows a positive effect on a database’s performance.  It has been proposed that denormalization be used, in addition to normalization, as a middle step to help with system performance.  The article describes three approaches used to review denormalization strategies, and each approach shows how denormalization positively affects databases. read more...
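The "middle step" idea can be sketched briefly. In this toy example (hypothetical dept/emp tables, via Python's sqlite3), a denormalized reporting table copies the department name next to each employee so the hot read path skips the join, at the cost of redundant data to keep in sync:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized design: the department name is stored once, so reading
# an employee's department requires a join.
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO dept VALUES (1, 'ENG');
INSERT INTO emp VALUES (1, 'Ada', 1);
""")
normalized = conn.execute("""
    SELECT e.name, d.name FROM emp e JOIN dept d ON e.dept_id = d.id
""").fetchall()

# Denormalized intermediate step: materialize the joined result into
# a redundant table so reports read one table with no join. Updates
# to dept.name must now be propagated here as well.
conn.executescript("""
CREATE TABLE emp_report (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT);
INSERT INTO emp_report
    SELECT e.id, e.name, d.name
    FROM emp e JOIN dept d ON e.dept_id = d.id;
""")
denormalized = conn.execute(
    "SELECT name, dept_name FROM emp_report").fetchall()
```

Both queries return the same rows; the denormalized one trades update cost and redundancy for cheaper reads, which is exactly the trade-off the article weighs.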

Why You Want To Be Good At Data Modeling

by Toan T
A good database is something every organization must have when managing large sets of data. However, building a good database is no longer considered a main priority, because time and cost during the development life cycle have become the major factors for many companies; database design today no longer focuses on performance but rather on having something that just “works.” This article addresses some of the flaws that exist and are often overlooked, and it focuses on the core problem behind poorly performing databases. Building a good database is not something that can simply be coded into existence; it takes a huge amount of time, effort, and knowledge to lay the proper foundation for the design. A common myth among developers is that a database’s performance derives from the code it is written with. This is only partly true, because other factors affect performance as well. Good code does have an impact on performance, but the most important core issue is the physical design of the database. No matter how well the code is written or how fast the hardware it runs on, performance constraints will always exist when the overall physical design is poorly implemented. Developers should thoroughly monitor their database’s performance and focus more on the logical aspects of design, because something must be wrong when it takes 40 minutes to generate a report. read more...