Database Optimization: Genetics

As the need for databases grows, optimization is the natural next step in their evolution. Everyone wants better, faster, and more efficient technology, and databases are no exception. Givon Zirkind is the author of an academic journal article on the optimization of databases that involve genetics. Zirkind writes that, with data storage growing in capacity at a lower cost than before, optimization should be easy. Unfortunately, many programs are not coded as efficiently as they could be, and this, along with other minor alterations and factors, leads to software bloat. According to Zirkind, “Software bloat is when a computer program has so many features, that a user cannot possibly know them all and use them all.” Zirkind describes a project he undertook to decrease bloat and excess data by articulating a specific software design and set of specifications. Among the ideas Zirkind and his group used were indexing method selection criteria and programming language selection. The indexing method selection involved complex mathematics and led to a B-tree, which offers superior access speed over a linked list. A B-tree is an organizational structure that stores and retrieves information in the form of a tree. For the programming language, Zirkind chose C because of its performance and portability.

After the software design and specification phase, the next step was to optimize, which was done through key compression and index size reduction. Important as key compression and index size reduction are, what Zirkind calls “good engineering” is also a huge factor in optimization. Zirkind clarifies that good engineering is simple engineering. Engineering in databases, and in technology generally, needs to keep code and other information simple, because the more information that code or other sources use, the more memory they take up. In databases, this means that load times are longer than needed. The practices that Zirkind and the group used increased the access speed of their genetic database to 7 to 9 times the original on the databases they used for testing. Also according to the article, the database normally needed 7 disk accesses to record all the data within the database; with the new optimization, they reduced this to a maximum of 2. The reduction in disk accesses was achieved by keeping data loaded in memory and by record blocking.
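To make key compression more concrete, here is a minimal sketch in Python. It is not Zirkind's actual method, which this essay does not describe in detail. It assumes a common technique called front coding, in which each key in a sorted index stores only the part that differs from the key before it. The sample keys and the function names (front_encode, front_decode, index_levels) are made up for illustration. The last function is a rough model, also an assumption, of why a smaller index can mean fewer disk accesses: the more keys that fit in one block, the fewer levels of the tree must be read.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(sorted_keys):
    """Front coding (an assumed technique, not taken from the article):
    store each key as (shared prefix length, remaining suffix)."""
    encoded = []
    prev = ""
    for key in sorted_keys:
        shared = common_prefix_len(prev, key)
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def front_decode(encoded):
    """Rebuild the original keys from the front-coded form."""
    keys = []
    prev = ""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


def index_levels(num_keys: int, keys_per_block: int) -> int:
    """Rough model: number of tree levels (about one disk read each)
    needed to reach any of num_keys keys."""
    levels = 1
    capacity = keys_per_block
    while capacity < num_keys:
        capacity *= keys_per_block
        levels += 1
    return levels


# Hypothetical sequence-like keys, sorted as they would be in an index.
keys = sorted(["GATTACA", "GATTAGA", "GATTC", "GCAT"])
encoded = front_encode(keys)

original_chars = sum(len(k) for k in keys)
compressed_chars = sum(len(suffix) for _, suffix in encoded)
print(encoded)
print(original_chars, "characters stored before,", compressed_chars, "after")

# Hypothetical numbers: more keys per block means fewer levels to read.
print(index_levels(1_000_000, 10), "levels with 10 keys per block")
print(index_levels(1_000_000, 1000), "levels with 1000 keys per block")
```

Genetic data suits this kind of compression because sequence keys often share long prefixes. The block sizes in the example are invented and are not the figures from the article.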

Although optimization is an extremely important factor for the future, what is more intriguing is a project that India is currently running for the benefit of the country. It is a prime example of a biometrics-based database. According to TechCrunch, a website that shares technology-related information, India has a project known as Aadhar, a nationwide effort also known as the Unique Identification project. TechCrunch reports that this project has captured demographic and biometric data on over 500 million residents, which makes it the largest biometric project on earth. The project is a huge benefit to the country because the people whose information was gathered could not otherwise access government funds intended to support them, since they lacked identification. According to the research firm CLSA, more than 40% of the Indian government’s $250 billion worth of subsidies meant for the poor is lost to corruption. Aadhar is intended to reduce this corruption by enabling direct cash transfers of the subsidies to those who need them. While the project is very good for the country, there are multiple risk factors, including privacy issues, information security, and information sharing. Data redundancy is a very common problem; Aadhar is aware of this and has a process called de-duplication, which in essence checks newly registered data against the other 500 million residents and, if any matches are found, removes the redundancies. Other concerns involving Aadhar are political, mainly caused by upcoming elections. The website also notes concern about U.S. government agencies, specifically the NSA, spying or obtaining information they do not need.
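The de-duplication idea can be sketched in a few lines of Python. This is only a toy illustration, not how Aadhar works: real biometric matching compares fingerprints or iris scans approximately, while this sketch assumes an exact match on a few normalized fields. The field names, the Registry class, and the sample records are all hypothetical.

```python
import hashlib


def record_key(record: dict) -> str:
    """Build a comparison key from normalized fields (hypothetical fields).
    Exact matching is an assumption; real biometric matching is approximate."""
    normalized = "|".join(
        str(record[field]).strip().lower()
        for field in ("name", "date_of_birth", "biometric_template")
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class Registry:
    """Toy registry that refuses to store the same person twice."""

    def __init__(self):
        self._by_key = {}

    def register(self, record: dict) -> bool:
        """Return True if the record was new, False if it was a duplicate."""
        key = record_key(record)
        if key in self._by_key:
            return False  # duplicate found: do not store it again
        self._by_key[key] = record
        return True

    def __len__(self) -> int:
        return len(self._by_key)


registry = Registry()
first = {"name": "Asha Rao", "date_of_birth": "1980-01-15",
         "biometric_template": "A1B2C3"}
# Same person entered again with different spacing and capitalization.
repeat = {"name": "  asha rao ", "date_of_birth": "1980-01-15",
          "biometric_template": "a1b2c3"}
other = {"name": "Ravi Kumar", "date_of_birth": "1975-06-30",
         "biometric_template": "D4E5F6"}

results = [registry.register(r) for r in (first, repeat, other)]
print(results, len(registry))
```

Looking up a key in a dictionary avoids comparing each new record against every existing one, which matters when the registry holds hundreds of millions of entries.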

Database optimization with a focus on genetics relates to CIS 305 because it deals directly with databases, drawing on ideas from the past and presenting methods that can be applied to future coursework or even career endeavors. It also relates to the course by presenting information on databases that is not covered in class. That information is not limited to databases; it also covers common problems in the real world. The Aadhar project is not only a phenomenon but also a step closer to justice, as well as a form of optimization of society against corruption.


Capron, M., Mauron, A., Elger, B., Boggio, A., Ganguli-Mitra, A., & Biller-Andorno, N. (2009, June). Ethical Norms and the International Governance of Genetic Databases and Biobanks: Findings from an International Study. ACM. Retrieved January 22, 2014, from

Mishra, P. (2013, December 6). Inside India’s Aadhar, The World’s Biggest Biometric Database. TechCrunch. Retrieved from

Zirkind, G. (2006, April 13). Genetic Database Optimization: How Data Inspection and Consideration, Provides for Index Compression and Record Access Optimization of Genetic Databases. ABI/Inform. Retrieved January 22, 2014, from