Mongo DB (No SQL database for web)

by John J
Today’s highly social and interactive web has created a market for a database management system that offers fast, real-time access over the Internet while managing massive data sets that grow by the minute in volume and complexity. MongoDB fills this need. As I will explain later in this blog, MongoDB is not the perfect solution for every project, but for certain tasks within its niche, it is the best solution.

MongoDB is a NoSQL or non-Relational Database Management System that uses a document-oriented storage format. Other storage formats used in the non-relational class of databases include graph, key-value store, multi value, object, RDF, tabular, and tuple store. Since we’ve all (presumably) taken a databases course already, I would like to explain how MongoDB works in the context of relational databases.

MongoDB stores data in documents and collections. A document can be thought of as a row in an SQL table, and a collection as the whole table itself. Everything between the curly braces {} is called the document. A single collection can contain millions of documents or more. A document resembles a JSON object and looks like the picture below:

(Image obtained from docs.mongodb.org/manual/reference/sql-comparison)
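
As a rough sketch, the same idea can be written in Python, where a dict stands in for a document and a list stands in for a collection. The field names and values below are made up for illustration.

```python
import json

# A hypothetical document written as a Python dict. The field names and
# values are illustrative assumptions, not taken from the MongoDB manual.
user_document = {
    "name": "sue",
    "age": 26,
    "status": "A",
    "groups": ["news", "sports"],
}

# A collection can be pictured as a list of such documents.
users_collection = [user_document]

# Serializing shows the JSON-like shape, curly braces included.
print(json.dumps(user_document, indent=2))
```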

Since it is not a relational database, MongoDB does not enforce a schema. In a relational database, a schema would be the various column headings of a table. Every record in that table would have to have the same fields and thus the same schema. In MongoDB, each document in the collection can have a different number of fields with different data types and you can also have documents nested in documents.

(Image obtained from docs.mongodb.org/manual/core/document)
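
To make the missing schema concrete, here is a small sketch in plain Python (no MongoDB server involved). The three documents are hypothetical; the point is only that they sit in the same collection with different fields, and that one of them nests another document.

```python
# Three hypothetical documents in one collection. No two share the same
# set of fields, and the last one embeds a nested document under "address".
people = [
    {"name": "Ann", "age": 31},
    {"name": "Bob", "email": "bob@example.com", "tags": ["admin", "dev"]},
    {"name": "Cy", "address": {"city": "Austin", "zip": "78701"}},
]


def field_names(document):
    """Return the set of top-level field names of one document."""
    return set(document.keys())


# Each document effectively carries its own "schema".
schemas = [field_names(person) for person in people]

# Nested documents are reached by walking down the keys.
nested_city = people[2]["address"]["city"]
```

In a relational table, all three rows would need the same columns, with NULLs filling the gaps.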

Now let’s take a look at how MongoDB statements compare to SQL statements. Here is the basic command to make a new data entry:

(Image obtained from docs.mongodb.org/manual/reference/sql-comparison)
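
A hedged side-by-side sketch of the two styles of insert follows. The SQL half runs against an in-memory SQLite database; the document half uses a plain Python list as a stand-in for a collection, which is an assumption made so the example runs without a MongoDB server. The table and field names are illustrative.

```python
import sqlite3

# Relational side: the table's columns must be declared before any insert.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, age INTEGER, status TEXT)")
conn.execute(
    "INSERT INTO users (user_id, age, status) VALUES (?, ?, ?)",
    ("bcd001", 45, "A"),
)
sql_rows = conn.execute("SELECT user_id, age, status FROM users").fetchall()

# Document side: a plain list stands in for the collection (an assumption,
# so that no server is needed). In the mongo shell this would be roughly:
#   db.users.insert({ user_id: "bcd001", age: 45, status: "A" })
users = []
users.append({"user_id": "bcd001", "age": 45, "status": "A"})
```

Note that the document side needs no CREATE TABLE step: the collection simply accepts whatever fields the document carries.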

Now let’s take a look at some basic queries:

(Image obtained from docs.mongodb.org/manual/reference/sql-comparison)
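
MongoDB queries are themselves documents: you describe what a matching document looks like. The sketch below imitates that idea in plain Python. The `find` and `matches` helpers are hypothetical toys written for this example, not MongoDB's query engine, and they only understand equality plus the `$gt` and `$lt` operators.

```python
def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 25}}
            for op, operand in condition.items():
                if op == "$gt":
                    ok = value is not None and value > operand
                elif op == "$lt":
                    ok = value is not None and value < operand
                else:
                    raise ValueError(f"unsupported operator: {op}")
                if not ok:
                    return False
        elif value != condition:
            # Plain equality form, e.g. {"status": "A"}
            return False
    return True


def find(collection, query):
    """Hypothetical stand-in for a collection's find(): filter by query."""
    return [doc for doc in collection if matches(doc, query)]


users = [
    {"user_id": "abc123", "age": 55, "status": "A"},
    {"user_id": "bcd001", "age": 45, "status": "A"},
    {"user_id": "cde002", "age": 20, "status": "B"},
]

# SELECT * FROM users WHERE status = 'A'
active = find(users, {"status": "A"})

# SELECT * FROM users WHERE age > 25 AND age < 50
mid_age = find(users, {"age": {"$gt": 25, "$lt": 50}})
```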

As you can see, it’s not too steep a learning curve for MongoDB statements, especially if you’ve got some experience with SQL. It’s the same logic, just slightly different syntax.

MongoDB is best for Big Data environments. “Big Data refers to the massive growth in the volume, variety and velocity of data being produced and the set of applications that generate, store, process and monetize this data.” (10gen, 2013) Sites like Craigslist, Intuit, Disney and foursquare have Big Data environments. Craigslist moved over two billion documents to MongoDB. They used to use MySQL, where an ALTER TABLE statement “took months” to finish running on their archive and degraded the performance of their live database while it executed. (10gen, 2013)

Content Management is another area where MongoDB shines, and MTV uses it for exactly that purpose. MongoDB is exceptionally well suited for a site like MTV because of its multimedia-saturated environment. The GridFS technology in MongoDB allows MTV to “store and serve rich media such as video, images and audio in the database itself.” (10gen, 2013) GridFS allows the storage and retrieval of files that exceed the 16 MB document size limit. It does this by breaking the large file down into many small chunks, and when it is queried “the driver or client will reassemble the chunks as needed.” (mongoDB.org, 2013)
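
The chunk-and-reassemble idea can be sketched in a few lines of plain Python. This is not the GridFS implementation, only an illustration of the technique; the helper names are hypothetical and the chunk size is deliberately tiny so the result is easy to inspect.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    """Split data into numbered chunks of at most chunk_size bytes."""
    return [
        {"n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by joining chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


# 50 bytes standing in for a large media file. chunk_size=16 is an
# illustrative value, not GridFS's real default.
payload = b"0123456789" * 5
chunks = split_into_chunks(payload, chunk_size=16)

# The sequence number "n" lets the reader restore order even if the
# chunks arrive shuffled.
restored = reassemble(list(reversed(chunks)))
```

Because each chunk is its own small document, no single document ever has to exceed the size limit.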

The location-based social networking site foursquare moved over to MongoDB for two reasons: its built-in auto-sharding capability and its geospatial indexing support. Sharding becomes necessary once you have more data than a single server can hold. It describes splitting the data across multiple servers, so that a collection’s documents are spread over several machines (each individual document stays whole on one server). In relational databases, “which were designed to run on just one machine” (Finley, 2013), this usually required writing custom code, but MongoDB does it automatically.
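
To give a feel for what that routing involves, here is a deliberately simplified sketch that assigns each document to a server based on a hash of one of its fields. The shard names and the modulo scheme are assumptions made for illustration; MongoDB's real mechanism partitions shard-key ranges into chunks and rebalances them, which this sketch does not attempt.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names


def shard_for(shard_key_value: str) -> str:
    """Pick a shard from a stable hash of the document's shard key."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def route(documents, key_field):
    """Group whole documents by the shard their key maps to."""
    placement = {name: [] for name in SHARDS}
    for doc in documents:
        placement[shard_for(str(doc[key_field]))].append(doc)
    return placement


# Thirty made-up check-in documents, keyed by user_id.
checkins = [{"user_id": f"user{i}", "venue": "cafe"} for i in range(30)]
placement = route(checkins, "user_id")
```

This is the kind of bookkeeping that otherwise ends up as custom application code.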

Since foursquare’s entire business model relies on location-based data, MongoDB’s support for geospatial indexing made it the perfect fit. MongoDB does this by recursively dividing a map into quadrants, where each quadrant is labeled with ones and zeros like this:

(Image obtained from docs.mongodb.org/manual/core/geospatial-indexes/)

It then concatenates these ones and zeros to make the hash identifier which will specify the exact location. The more bits in the hash identifier, the greater the accuracy of the location.
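
The recursive subdivision can be sketched as follows. The labelling is an assumption: each level appends two bits, the first set to 1 for the right half of the box and the second set to 1 for the top half, so the four quadrants read 00 (bottom-left), 01 (top-left), 10 (bottom-right) and 11 (top-right). The function name is hypothetical.

```python
def geohash_bits(lon, lat, levels):
    """Return a bit string locating (lon, lat) after `levels` subdivisions."""
    min_lon, max_lon = -180.0, 180.0
    min_lat, max_lat = -90.0, 90.0
    bits = []
    for _ in range(levels):
        mid_lon = (min_lon + max_lon) / 2
        mid_lat = (min_lat + max_lat) / 2
        # First bit: left (0) or right (1) half of the current box.
        if lon >= mid_lon:
            bits.append("1")
            min_lon = mid_lon
        else:
            bits.append("0")
            max_lon = mid_lon
        # Second bit: bottom (0) or top (1) half of the current box.
        if lat >= mid_lat:
            bits.append("1")
            min_lat = mid_lat
        else:
            bits.append("0")
            max_lat = mid_lat
    return "".join(bits)


coarse = geohash_bits(10.0, 10.0, 1)  # one subdivision: top-right quadrant
finer = geohash_bits(10.0, 10.0, 2)   # two subdivisions: a smaller box
```

A longer bit string always starts with the shorter one for the same point, which is why adding bits narrows the location rather than changing it.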

Disney Interactive Media Group uses MongoDB for user data management. Disney was having performance and scalability problems with their existing MySQL database. As a result, their game developers spent more time trying to develop their own database management system than they did actually developing games. Disney has many games with user bases of varying size on any given day. When a game suddenly became really popular, they wanted to be able to change how they modeled their data as that game grew and continued to be developed. The flexible schema of MongoDB offers exactly that.

All this speed and flexibility comes at a price. “Security was not a primary concern of MongoDB’s designers” (Okman, 2011, p. 546), and to achieve the speed and scalability they desired they had to make some trade-offs. They decided to “trade consistency and security for performance and scalability.” (Okman, 2011, p. 541) In a paper titled Security Issues in NoSQL Databases, the authors point out seven different security vulnerabilities in MongoDB. I will mention two of them. The first is that “Mongo data-files are unencrypted and Mongo doesn’t provide a method to automatically encrypt these files.” (Okman, 2011, p. 546) The second concerns the potential for injection attacks: “Mongo heavily utilizes JavaScript as an internal scripting language… because JavaScript is an interpreted language, there is a potential for injection attacks.” (Okman, 2011, p. 546)
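
The injection risk is easiest to see with a small sketch. MongoDB's `$where` operator accepts a JavaScript expression, so an application that builds that expression by pasting in user input can have its logic rewritten by the user. The helper functions below are hypothetical and only build strings and dicts; nothing is sent to a database.

```python
def unsafe_where(username: str) -> str:
    """Build a JavaScript predicate by string concatenation (vulnerable)."""
    return "this.name == '" + username + "'"


def safer_query(username: str) -> dict:
    """Pass the value as data in a query document instead of as code."""
    return {"name": username}


# Ordinary input produces the predicate the developer intended.
normal = unsafe_where("alice")

# Crafted input closes the quote and appends a condition that is always
# true, so the predicate would match every document.
malicious_input = "' || '1' == '1"
attack = unsafe_where(malicious_input)

# In the query-document form the same input is just an odd-looking name.
harmless = safer_query(malicious_input)
```

The design point is the usual one for injection: keep user input as data and never splice it into code.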

The sacrifices MongoDB makes in security and consistency have certainly paid off. The authors of MongoDB vs Oracle – database comparison ran some tests and produced some very interesting data. The following insert function was used for this test:

As you can see, MongoDB drastically outperformed an Oracle database, especially when working with an extremely large number of records. The results of a similar test using update were equally impressive:

In conclusion, if speed of retrieval and flexibility in data representation aren’t key to a given project you might work on in the future, and things like security and accuracy of data are more of what you’re looking for, then MongoDB is not the product for you. However, if your project is an online game, a website with personalized content and/or real-time updated data, or anything where performance and/or evolving data representation is the primary concern, then MongoDB is what you’re looking for. If the data has to be instantly and completely accurate and/or security is a primary concern, stick with the tried and true SQL relational databases.

References

Boicea, A., Radulescu, F., & Agapin, L. I. (2012, September). MongoDB vs Oracle–Database Comparison. In Emerging Intelligent Data and Web Technologies (EIDWT), 2012 Third International Conference on (pp. 330-335). IEEE.

Finley, K. (2013, March 19). NoSQL Database MongoDB Reaches Beyond Software Coders | Wired Enterprise | Wired.com. Retrieved April 24, 2013, from http://www.wired.com/wiredenterprise/2013/03/mongodb-enterprise/

MongoDB. (2013, April 24). Retrieved from http://www.mongodb.org/

Identity management systems and MongoDB | 10gen. (2013, April 24). Retrieved from http://www.10gen.com/solutions/user-data-management

Okman, L., Gal-Oz, N., Gonen, Y., Gudes, E., & Abramov, J. (2011, November). Security issues in nosql databases. In Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on (pp. 541-547). IEEE.

6 thoughts on “Mongo DB (No SQL database for web)”

  1. Interesting blog about using a database considering speed vs security. Would having two databases of both MySQL (security and main database) and MongoDB (for immediate storing and constant data retrievals) in a project be possible? Rather than having one slow, high security and one fast, low security, how about a middle option?

    1. Hi Jenifer, I’m glad you enjoyed the blog! Conceptually, I see no reason why a single project could not be running a SQL database on some servers for the more sensitive data and MongoDB on other servers for the less sensitive data. The second question is more difficult because there are a ton of database products out there today and quite possibly one of them achieves the nice balance of security and speed that you mention. That being said, 10gen is continually working on MongoDB with security patches and updates in newer versions.

  2. I found your blog extremely fascinating. As you pointed out, most, if not all of us have been exposed to relational databases and not non-relational databases. I found it interesting that non-relational databases do not require a schema. I suppose this would allow the database to be more flexible if any changes were to be made to the database’s structure. This has definitely opened my eyes to exploring the capabilities of other non-relational databases and their use of key/value storage.

  3. As Emily stated, it’s a very interesting concept to consider the idea of non-relational databases. I’m actually having difficulty wrapping my mind around what makes it non-relational? My meager understanding based on the 15 or so minutes of your presentation and reading over your blog post has failed to yield an understanding beyond that in a way, it is a relational database, but with one dimension.

    Anyway, do you have any simplified insight on what distinguishes a non-relational database?

    1. Non-relational databases store all their records in a single structure which, in the case of MongoDB, is a collection. Relational databases normalize data to avoid redundancy, and they do it by relating a field in one table to a field in another (recall Primary Key, Foreign Key). A non-relational database just stores data redundantly. A NoSQL database like MongoDB can mimic the relational data model with inter-document references or even inter-collection references called DBRefs, but the key difference is that it is not restricted to that model.

  4. Very good article regarding NoSQL. I’ve been hearing a lot about it for the past year or so. Many people have been claiming that NoSQL is the future and that relational databases are going away (to be replaced by a combination of other things). Good to know how quick everything can be, but I can see the drawbacks you listed. Very well written
