Data Mining Within E-Commerce

By Gary C.

In e-commerce, data mining is essential for keeping up with rapidly growing competition among retailers. E-commerce is the exchange of data online in order to carry out business transactions. Retailers analyze patterns and trends in shopper behavior to develop strategies for a wide range of situations, from predicting what customers may like based on a previously purchased product to understanding why customers tend to avoid a certain product. The amount of raw data available for mining is astounding, and working out the most likely scenarios from it requires a tremendous amount of analysis.

Through data mining, a repository of information on consumer shopping habits becomes a resource for expanding business potential. Consumers’ interests, habits, preferences, and demands can be determined from the gathered data. The same data also allows retailers to predict future behavior and market specific content to these consumer groups. This saves companies time and cost, because they build on an existing library of data rather than starting from scratch.

Some common techniques in data mining include association rules, clustering, and prediction. Association rule mining extracts correlations and associations among sets of items in a database. Clustering groups similar records together to form a particular class. Prediction estimates unavailable data based on data that has already been gathered (Ismail, Ibrahim, Sanusi, & Nat, 2015).
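To make association rules more concrete, here is a minimal Python sketch restricted to item pairs. It is a simplification rather than a full algorithm such as Apriori, and the basket contents, the function name `pair_rules`, and the thresholds are hypothetical. Support is the share of baskets containing both items; confidence for a rule A => B is the share of baskets containing A that also contain B.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence), each meaning 'lhs => rhs'.

    Hypothetical helper: only single-item => single-item rules are mined.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # dedupe and order so each pair is counted once
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be worth a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

baskets = [  # hypothetical shopping baskets
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["case"],
]
for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} => {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With these four baskets the sketch reports `charger => phone` with confidence 1.00, because every basket containing a charger also contains a phone, while the pair (case, charger) is dropped for falling below the 0.5 support threshold.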

A variety of tools are used for data mining in e-commerce, and the right choice depends on what best suits the business. Besides commonly used tools such as Weka, software such as Oracle Data Miner, IBM DB2 Intelligent Miner, and SAS Enterprise Miner is highly regarded for deep data mining and used by many businesses (Dai, 2014). For site analytics, Google Analytics and RJMetrics can provide data on the consumers visiting a site, ranging from peak shopping hours to where most visitors are located and their age and gender. Some companies use Facebook Page Insights and Twitter Analytics to gather data on consumers who share their purchases with friends and family on social media. Metrics ranging from the total number of Likes and where they came from to the size of the audience reached help businesses determine the interests of different customer demographics.

Businesses using web data mining can build a variety of successful marketing models for targeting consumers. For example, a business can obtain visitors’ individual preferences from web log records to better understand customers’ needs. This allows businesses not only to retain existing customers but also to find potential new ones and build appropriate models for them. A business can choose to build a separate model from this data for each region and narrow it further by customer type. Having many such models allows a sharper focus on specific demographics through personalized service, and this targeting can in turn generate a higher return on investment.

Customer recommendations are another way businesses use data mining to target consumers efficiently. Amazon was one of the first companies to use this method (“Lessons from How Amazon Uses Big Data,” 2014). Using big data, Amazon suggests a variety of products to each consumer based on his or her browsing history. It gathers information on which consumers purchased which products and recommends additional items based on what that consumer and others have purchased and searched for in the past. Businesses use this method to combat cart abandonment, in which consumers leave items in their shopping carts and exit the site without purchasing them.
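A simple way to produce “customers who bought this also bought” suggestions is to count how often items appear together in the same order. The Python sketch below does that; it is an illustrative assumption, not Amazon’s actual system, and the order data and the function names `build_co_purchase_index` and `recommend` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_co_purchase_index(orders):
    """Map each item to a Counter of the items bought with it in the same order."""
    index = defaultdict(Counter)
    for order in orders:
        # set() ignores duplicate lines of the same item within one order
        for a, b in permutations(set(order), 2):
            index[a][b] += 1
    return index

def recommend(index, history, k=3):
    """Rank items co-purchased with anything in history, excluding items already owned."""
    owned = set(history)
    scores = Counter()
    for item in owned:
        scores.update(index.get(item, Counter()))
    for item in owned:
        scores.pop(item, None)
    # Highest co-purchase count first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]

orders = [  # hypothetical order history
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "stove"],
    ["lantern", "batteries"],
]
index = build_co_purchase_index(orders)
print(recommend(index, ["tent"]))  # ['sleeping bag', 'lantern', 'stove']
```

Counting co-occurrence within orders keeps the sketch short; a real recommender would typically also weight by recency, normalize for very popular items, and fold in browsing and search history as the text describes.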

In online business, merchandise planning helps determine how much stock a company should hold in its warehouses. If a business is turning enough profit, it can decide whether to open additional warehouses or stores. Through merchandise planning, businesses can also determine what price to set for the products and services they offer. Data mining can also inform how much stock a business should purchase throughout the year and during buying seasons (Ismail, Ibrahim, Sanusi, & Nat, 2015).
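As one illustration of seasonal stock planning, the Python sketch below suggests each month’s stock level as the average of that month’s sales in previous years plus a safety margin. The averaging method, the 10 percent margin, the function names, and the sales figures are all assumptions for illustration; it is a baseline, not a recommended forecasting model.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division for positive denominators, avoiding float rounding."""
    return -(-numerator // denominator)

def plan_monthly_stock(sales_by_year, safety_pct=10):
    """Suggest units to stock per month from prior years' monthly sales.

    sales_by_year: list of years, each a list of 12 monthly unit counts (Jan..Dec).
    safety_pct: hypothetical extra stock, in percent, above the historical average.
    """
    if not sales_by_year or any(len(year) != 12 for year in sales_by_year):
        raise ValueError("expected one or more years of 12 monthly sales figures")
    years = len(sales_by_year)
    plan = []
    for month in range(12):
        total = sum(year[month] for year in sales_by_year)
        # average * (1 + safety_pct / 100), rounded up to whole units
        plan.append(ceil_div(total * (100 + safety_pct), 100 * years))
    return plan

history = [
    [100] * 10 + [300, 500],  # hypothetical year 1 with a Nov/Dec holiday peak
    [120] * 10 + [340, 560],  # hypothetical year 2
]
print(plan_monthly_stock(history))
```

Integer arithmetic is used instead of multiplying by 1.1 because floating-point products such as `100 * 1.1` land slightly above 110 and would be rounded up to 111. With the history above, the plan is 121 units for January through October, 352 for November, and 583 for December.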

E-commerce data mining faces a number of challenges and limitations, many of which stem from incorrect data and from treating data collection as an afterthought. Analysis of data from revealed an oddly large number of female customers. Even customers with male names were recorded as female. The cause was the registration form, which set the gender field to “female” by default. Oversights like this produce incorrect data, because customers will rarely bother to change a value that has already been filled in. Additionally, appropriate form validation, using drop-down lists rather than free-text fields, saves much of the time otherwise needed for data cleansing and later analysis (Kohavi, Mason, Parekh, & Zheng, 2004).
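The registration-form problem can be guarded against in two ways, sketched below in Python with hypothetical field names and options: validate the gender field against a fixed list of choices, as a drop-down would offer, without pre-filling a value; and check collected data for a single value that dominates suspiciously, which can signal that a form default leaked into the records. The 90 percent threshold is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical drop-down options; note there is no pre-selected default.
ALLOWED_GENDERS = ("female", "male", "prefer not to say")

def validate_registration(form):
    """Return a list of error messages; an empty list means the form is valid."""
    errors = []
    gender = form.get("gender")  # a missing field stays None rather than defaulting
    if gender is None:
        errors.append("gender: please choose an option")
    elif gender not in ALLOWED_GENDERS:
        errors.append(f"gender: {gender!r} is not one of the listed options")
    return errors

def dominant_value(values, threshold=0.9):
    """Return the value whose share of all records exceeds threshold, else None.

    A value this dominant can signal a form default leaking into the data.
    The 0.9 threshold is an arbitrary assumption.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return None
    value, count = counts.most_common(1)[0]
    return value if count / total > threshold else None

print(validate_registration({}))                  # ['gender: please choose an option']
print(validate_registration({"gender": "male"}))  # []
print(dominant_value(["female"] * 19 + ["male"])) # female
```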

The choice of session timeout is extremely important to how data is collected, because the timeout duration can greatly affect clickstream collection. For e-commerce sites, the recommended session timeout is at least 60 minutes. When a session times out, consumers may leave the site without checking out or start over, which makes the captured data inaccurate. Another data collection challenge is that visitors’ demographics change over time. People get married and have children, and their habits and salaries change. When this happens, visitors’ needs change in their own lives, but the data already collected does not necessarily reflect it. These “slowly changing dimensions” are quite difficult to track, and businesses must find ways to keep their data on these consumers up to date (Kohavi, Mason, Parekh, & Zheng, 2004).
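The effect of the timeout on clickstream data can be illustrated with sessionization, the step that groups one visitor’s page views into sessions by starting a new session whenever the gap between clicks exceeds the timeout. The Python sketch below uses the 60-minute figure from the text as its default; applying that figure to sessionization, as well as the URLs and the function name `sessionize`, are assumptions.

```python
from datetime import datetime, timedelta

# Default taken from the 60-minute figure in the text; applying it to
# sessionization (rather than to server-side sessions) is an assumption.
SESSION_TIMEOUT = timedelta(minutes=60)

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Split one visitor's (timestamp, url) clicks into sessions.

    A new session starts whenever the gap since the previous click
    is longer than the timeout.
    """
    sessions = []
    last_time = None
    for ts, url in sorted(clicks):
        if last_time is None or ts - last_time > timeout:
            sessions.append([])  # gap too long: start a new session
        sessions[-1].append(url)
        last_time = ts
    return sessions

start = datetime(2024, 1, 1, 9, 0)  # hypothetical visit
clicks = [
    (start, "/home"),
    (start + timedelta(minutes=20), "/product/42"),
    (start + timedelta(minutes=70), "/cart"),
    (start + timedelta(minutes=95), "/checkout"),
]
print(sessionize(clicks))                                 # one session
print(sessionize(clicks, timeout=timedelta(minutes=30)))  # two sessions
```

With a 30-minute timeout the same visit splits into two sessions, separating the product view from the checkout, which is the kind of distortion a too-short timeout can introduce into clickstream analysis.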

The scalability of data mining algorithms can become an issue as more data is gathered. Yahoo, for example, has over 1.2 billion page views a day, and mining data at that scale can make the algorithms quite complicated, because their cost does not grow linearly with the amount of data (Ismail, Ibrahim, Sanusi, & Nat, 2015). Another difficulty e-commerce companies face is making data mining models understandable to business users, since the models can be hard to explain in a simple presentation. E-commerce companies must also stay up to date with how search engines work and with any changes to their ranking algorithms, as such changes can cause a site’s ranking to plummet so that it no longer appears when consumers search for certain keywords.

Data mining is key to success in e-commerce, and its challenges continue to grow with each passing year. New consumers and new competitors will push businesses to create new models and algorithms to accommodate these changes. Businesses must learn to collect the right data from the very beginning and keep track of consumers as their lives, and with them their demographic data, change over time. Consumer data is available to e-commerce businesses, provided they use their resources properly and efficiently.


References

Dai, T. (2014). International trade e-commerce based on data mining. Retrieved from

Ismail, M., Ibrahim, M., Sanusi, Z., & Nat, M. (2015). Data mining in electronic commerce: Benefits and challenges. Retrieved from

Kohavi, R., Mason, L., Parekh, R., & Zheng, Z. (2004). Lessons and challenges from mining retail e-commerce data. Retrieved from

Lessons from how Amazon uses big data. (2014, August 20). Retrieved from