Data Mining in Software Engineering

By Jesus S.

In today’s day and age, everyone thrives on data. Data has become key to an organization’s success, whether in marketing, health care, agriculture, or many other fields. Data can help you strike gold: if you can find out what people want, what they dislike, and what is trending, that knowledge can help you become successful. Data mining is the technique we use to collect large amounts of raw data and turn it into information that is useful to us. In this article I describe how data mining is used in software engineering.

Have you ever noticed that when you install a new program on your computer, it may ask whether you will allow it to collect information to help enhance the user experience? It is asking for your consent to gather usage data so the developers can look for patterns that help them develop and upgrade the software. Consent matters here: Apple has banned many apps because they were secretly collecting users’ personal information. Apple states, “Data collected from apps may not be used or shared with third parties for purposes unrelated to improving the user experience or software/hardware performance connected to the app’s functionality, or to serve advertising in compliance with the Apple Developer Program License Agreement.” (Apple, 2017)

Data mining in software engineering helps with the development process, with project management, and with the research that goes into building a piece of software. When developing software, developers want to know whether similar software already exists and what people dislike about it, so that they can address those complaints in their own product. It is all about being better than your competitors; if your software is not better, you will either lose to them or end up in the same boat. If your program is more productive and works more efficiently, you will have an edge. A prime example is Sonos, a company that sells Wi-Fi speakers. Sonos was one of the first companies to sell Wi-Fi speakers, but as always, competition followed. Sonos now competes with Play-Fi, Bose SoundTouch, and a few other companies. Sonos still has an edge and remains the more popular option because its app is much more user friendly. Play-Fi speakers sound better than Sonos speakers, but because the Play-Fi app is buggier and a little more complicated to use, most customers prefer Sonos.

Data mining is also applied to the bug reports a project receives. As Quinn and Giraud-Carrier write, “Classification and assignment can sometimes be automated, but are often done by humans, especially when a bug is incorrectly filed by the reporter or the bug database. Anvik et al. (2006, 2005) and Anvik (2006) have researched automatic classification of defects by severity (‘triage’), and Cubranic and Murphy (2004) have studied methods for determining who should fix a bug. Both approaches use data mining and learning algorithms to determine which bugs are similar and how a specific bug should be classified.” (Quinn & Giraud-Carrier, 2010) In this example we see how data mining helps classify the bug reports in a database and route each one to the right developer. The information recorded about each bug can then help prevent similar bugs in later software.
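To make the idea of finding similar bugs concrete, here is a minimal sketch in Python. It is not the method used in the cited studies. It assumes a tiny, made-up history of resolved bug reports and suggests the developer who fixed the most similar past report, using word-count cosine similarity. The names RESOLVED_BUGS, tokenize, cosine_similarity, and suggest_developer are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written history of resolved bug reports:
# (summary, developer who fixed it). Real data would come from a bug database.
RESOLVED_BUGS = [
    ("app crashes when saving playlist to disk", "alice"),
    ("crash on startup after saving settings", "alice"),
    ("speaker not found on wifi network", "bob"),
    ("wifi connection drops during playback", "bob"),
    ("volume slider label is misaligned", "carol"),
]


def tokenize(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_developer(new_summary, history=RESOLVED_BUGS):
    """Return (developer, score) for the most similar past bug report."""
    new_vec = tokenize(new_summary)
    best = max(history, key=lambda item: cosine_similarity(new_vec, tokenize(item[0])))
    return best[1], cosine_similarity(new_vec, tokenize(best[0]))


developer, score = suggest_developer("playback stops when wifi signal drops")
print(developer, round(score, 2))
```

With this toy history, the new report is matched to the earlier Wi-Fi report and assigned to "bob". A real triage system would train on thousands of reports and use a proper learning algorithm, but the principle of comparing a new report against past ones is the same.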

Tools such as the bug databases mentioned above are what make this kind of data mining possible. Another widely used tool is the Concurrent Versions System (CVS), which stores a record of every revision made to a project’s source files. This allows developers to go back to an earlier version, revise it, or start a new line of development from it. Bugzilla is another tool; it lets developers record and track problems with their software. More generally, organizations set up issue tracking systems so they can store and manage the issues found in their software.
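As a rough illustration of what can be mined from revision records, the Python sketch below counts how often each file was changed. The REVISIONS list is made-up data shaped like an export from a version control log; it is not real CVS output, and the function names are invented for this example.

```python
from collections import Counter

# Hypothetical revision records: (revision id, author, files touched).
REVISIONS = [
    ("1.1", "alice", ["player.c", "ui.c"]),
    ("1.2", "bob", ["network.c"]),
    ("1.3", "alice", ["player.c"]),
    ("1.4", "carol", ["ui.c", "player.c"]),
    ("1.5", "bob", ["network.c", "player.c"]),
    ("1.6", "bob", ["network.c"]),
]


def change_counts(revisions):
    """Count how many revisions touched each file."""
    counts = Counter()
    for _rev_id, _author, files in revisions:
        counts.update(set(files))  # count a file at most once per revision
    return counts


def hotspots(revisions, top_n=2):
    """Return the top_n most frequently changed files with their counts."""
    return change_counts(revisions).most_common(top_n)


for path, count in hotspots(REVISIONS):
    print(f"{path}: changed in {count} revisions")
```

Files that change very often are worth a closer look, since frequent changes can point to code that is hard to get right.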

As in any field, data mining in software engineering has its limitations and challenges. Finding relevant information is always a big one: sometimes you can collect a lot of information, but if it is not relevant, it is useless. Another issue is the sheer amount of data collected and stored. An organization can collect too much information, much of which it may never have a chance to look at. A further limitation is the data type itself. As Halkidi writes, “Software engineers usually use individual data types to perform a software engineering task. However, software engineering domain can be complex and thus software engineering tasks increasingly demands the mining of multiple correlated data types to achieve the most effective results. Moreover, there are cases that significant information is not only associated with individual data items but with the linkage among them. Thus the requirement of techniques that analyze complex data types taking also into account the links among SE data is stronger than ever in software engineering.” (Halkidi, 2011)
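One simple form of the linkage described in the quote is connecting commits to the bug reports they mention. The Python sketch below shows the idea under stated assumptions: the BUGS and COMMITS data are invented, and it assumes commit messages refer to reports as "bug 101" or "bug #101". Real projects follow different conventions, so the pattern would need to be adapted.

```python
import re
from collections import defaultdict

# Hypothetical bug tracker export: bug id -> summary.
BUGS = {
    101: "app crashes when saving playlist",
    102: "wifi connection drops during playback",
}

# Hypothetical commit log: (commit id, message, files touched).
COMMITS = [
    ("c1", "Fix bug #101: guard against empty playlist", ["player.c"]),
    ("c2", "Refactor menu layout", ["ui.c"]),
    ("c3", "Bug 102 - retry on wifi timeout", ["network.c"]),
    ("c4", "Follow-up for bug #101 in save dialog", ["ui.c", "player.c"]),
]

# Assumed convention: messages mention the report as "bug 101" or "bug #101".
BUG_REF = re.compile(r"\bbug\s*#?(\d+)", re.IGNORECASE)


def link_bugs_to_files(commits, bugs):
    """Map each known bug id to the set of files changed by commits that mention it."""
    links = defaultdict(set)
    for _commit_id, message, files in commits:
        for match in BUG_REF.finditer(message):
            bug_id = int(match.group(1))
            if bug_id in bugs:  # ignore references to unknown reports
                links[bug_id].update(files)
    return dict(links)


for bug_id, files in sorted(link_bugs_to_files(COMMITS, BUGS).items()):
    print(bug_id, BUGS[bug_id], "->", sorted(files))
```

Neither data source answers the question "which files were involved in fixing this bug?" on its own. The answer only appears once the two are linked.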

Data mining in software engineering has clear benefits for both developers and users. It supports debugging by helping classify and assign bug reports, and it draws on version histories that let developers go back and retrieve earlier versions of their code. It also has challenges and limitations, such as irrelevant data, excessive data volume, and the difficulty of linking different data types. These are part of the nature of data mining.

Apple. (2017, February 16). App Store Review Guidelines. Retrieved from Apple Developer.
Halkidi, M. (2011). Data mining in software engineering. Intelligent Data Analysis.
Quinn, T., & Giraud-Carrier, C. (2010). Applications of data mining in software engineering. International Journal of Data Analysis Techniques and Strategies, 2(3).