Analyzing Geographical Data with Relational Databases

by Joe C

This article discusses the use of relational databases to analyze geographical data across multiple customer profiles. According to the article, many databases currently do not make good use of this type of information because they lack the software needed to do so. With the right software, a database can calculate the customer density of an area, the distance between customers, and other spatial information. To start off, geocoding data is data that identifies a location on Earth. When customers provide this information, the database knows where each customer is located. By cross-referencing this data across multiple customers, users can derive a great deal of information through both simple and complex algorithms. Examples of geocoding data that customers usually provide include street address, city, state, and ZIP code. Latitude/longitude coordinates may be given directly by the customers or looked up from the address data listed above. For a relational database, the coordinates need to be stored in one unified format, either degree-minute-second or decimal degrees. In addition, performance becomes a major issue once the amount of data grows large, since even the simplest calculation becomes a heavy workload when it is repeated millions of times.
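To make the format and distance points concrete, here is a minimal Python sketch. It converts a degree-minute-second value to decimal degrees and then estimates the distance between two customers. The function names are my own, and the haversine great-circle formula with a mean Earth radius of 6371 km is an assumption on my part; the summary above does not say which distance formula the article uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, an approximation


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert a degree-minute-second value to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km between two points
    given in decimal degrees (haversine formula, spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Storing every coordinate as decimal degrees first means the distance function only has to handle one format, which is the point of choosing a unified format up front.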



This reading relates to our first chapter of reading on the topic of relational databases. A relational database stores its entries in tables whose records share common fields; in this case, the fields of interest are the geographical ones. Records can be linked through these fields to show how many data points match exactly or how many unique values there are. I believe that using relational databases to calculate spatial information can be extremely useful. It can provide relevant and valuable information that you would otherwise have no way to retrieve (customers wouldn’t be able to tell you how they compare to other customers; only you can calculate that yourself). With the geographical data of each customer, you can write algorithms to calculate many additional fields, such as how many customers you have per area, which areas have the most customers, and which region needs a more central warehouse to reduce shipping expenses. None of these are things an individual customer could tell you, but by putting the small pieces of data each customer provides into the relational database, you are able to work out this new information.
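As a rough illustration of the "customers per area" idea, here is a small Python sketch using the built-in sqlite3 module. The table name, column names, and sample rows are all made up for this example and do not come from the article. It counts customers per ZIP code and averages their coordinates, which could serve as a very crude starting point for where a central warehouse might go.

```python
import sqlite3


def customers_per_zip(rows):
    """Return (zip, customer_count, avg_lat, avg_lon) tuples,
    ordered so the ZIP with the most customers comes first.

    rows is an iterable of (zip, lat, lon) tuples in decimal degrees.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, zip TEXT, lat REAL, lon REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (zip, lat, lon) VALUES (?, ?, ?)", rows
    )
    result = conn.execute(
        "SELECT zip, COUNT(*) AS n, AVG(lat), AVG(lon) "
        "FROM customers GROUP BY zip ORDER BY n DESC, zip"
    ).fetchall()
    conn.close()
    return result


# Invented placeholder data: the ZIP codes and coordinates are not real customers.
sample = [
    ("11111", 40.0, -74.0),
    ("11111", 41.0, -75.0),
    ("22222", 42.0, -71.0),
]

for row in customers_per_zip(sample):
    print(row)
```

No single customer could report these counts or averages; they only exist once the individual rows are grouped inside the database.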

On the other hand, I agree with the article that this only seems simple in the beginning. You feel like you can make all sorts of formulas and figure out all sorts of new information. While this is true, it comes at a price. Calculating a simple formula once may take a split second. However, when you have millions of records on your customers in your database and you have to run these calculations many times, it takes a whole lot of processing power. I’ve had this experience in Excel when copying a formula down a column where it had to look something up 10,000 times, and Excel would just freeze for minutes while performing all those calculations. Therefore, you either need the money to buy all this processing power, or you need to use the simplest and most efficient formulas that still give you the information you need. I really like this concept of using geographical information because it falls into the idea of creating information. It’s what engineers/accountants do. You take 1+1 and you make it not into 2, but into whatever you need it to be. Here, you take geographical information as input and analyze it so that it provides much more than just a location.



Wang, J. (2009). Geocoding data analysis and processing in relational databases. Communications of the IIMA, 9(3), 81-89. Retrieved from