Why Is Anomaly Detection Important?
Detecting anomalies in your data is important because it enables you, the business owner, founder, or head of BI, to react quickly to changes in the business. After all, why would you not want to know that your sales for product X doubled in country Y yesterday? Quite often, such anomalies are buried in the overall volume of data and go unnoticed.
Here are some of the benefits of anomaly detection:
- Avoiding adverse effects of IT releases on data quality (a story on that below)
- Detecting jumps or declines in sales (another story on that below)
- Detecting data duplication issues in the data warehouse
- Showing the impact of marketing campaigns or TV advertisements
During my early years as Head of Business Intelligence at various companies, I always dreamed of knowing the most important trends and occurrences before management got up and saw the morning reports (and trust me, they got up pretty early). My motivation for spotting anomalies early was rooted in plain pragmatism. From about 8 AM until noon, or even 5 PM, between 10 and 30 pairs of eyes (depending on the size of the company) would look at various daily reports. It was often the case that some over-ambitious data intern, or a very data-driven founder, would go through the reports with a fine-tooth comb and discover an abnormal change in the data, zeroing in on it like a data hound. I was never fond of receiving emails with subject lines like “PLEASE CHECK DATA DISCREPANCY”, always in capital letters and always with half of the company in cc. That got me thinking:
How can my data team and I be the first to know about any and all major data changes every morning, before the rest of the company sees them?
The practical implications of detecting anomalies were twofold:
- Knowing about major data changes made the data team look good in front of the rest of the company, because we could distribute the morning report with the major observations already baked into our comments, such as “Please note the 50% increase in sales in our Asian markets yesterday, driven mostly by a successful CPC campaign we launched the day before.” It seemed like we had our hand on the pulse of the company… which we did!
- Knowing about major data changes helped the data team focus on investigating possible data-related issues and correcting them before the rest of the company noticed. It seemed like we had our hand on the pulse of the data… which we did!
Stories from the Front Lines
The search for Internet Explorer
One of my many stints as Head of BI was at a very data-driven travel startup. We had over 25 analysts supporting multiple departments within their areas of focus. Google Analytics was the de facto reference standard for data accuracy and consistency, while Snowplow Analytics was used as a product analytics tool. One beautiful day, we noticed that our internal session count was about 10% (a made-up number for the sake of the story) below what we saw in GA. Our initial thought was that our internal session configuration was different from GA's. After multiple checks, we confirmed that the configuration was, in fact, the same. The data team and I continued digging through the data, completely lost and bewildered as to what could have caused this drop in sessions. We checked all the usual suspects:
- Country breakdowns
- Mobile versus Desktop
- Product breakdowns
- Pages (including a gazillion Landing Pages)
- Collector pipeline (including bad events – an awesome feature of Snowplow)
After about a week of digging (we also had other things to do, so this investigation was done in between priorities), we found the real culprit: one of the releases had broken the website on one specific version of IE, and that version was responsible for the 10% of sessions we had lost. When we looked specifically at IE traffic, the drop was evident. In the overall volume of traffic across all browsers, however, it was diluted. Having anomaly detection running across multiple combinations of data dimensions would have saved my team and me a lot of time figuring out what was wrong.
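To make the idea concrete, here is a minimal sketch of that kind of per-dimension check in Python with pandas. It is not the tooling we had at the time; the column names (`browser`, `date`, `sessions`) and the z-score threshold are illustrative assumptions. The point is that aggregating per slice first, then scoring each day against that slice's own history, surfaces a drop that is invisible in the site-wide total.

```python
import pandas as pd

def detect_slice_anomalies(df, dims, metric, date_col="date", z_thresh=3.0):
    """Flag days where a metric within one dimension slice deviates
    strongly from that slice's own history (simple z-score test).

    df     : long-format DataFrame with one row per event/aggregate
    dims   : list of dimension columns to slice by, e.g. ["browser"]
    metric : numeric column to monitor, e.g. "sessions"
    """
    # Daily totals per slice (e.g. sessions per browser per day).
    daily = df.groupby(dims + [date_col], as_index=False)[metric].sum()

    # Each slice's own historical mean and standard deviation.
    stats = daily.groupby(dims)[metric].agg(["mean", "std"]).reset_index()

    # Score every slice-day against its slice's history.
    merged = daily.merge(stats, on=dims)
    merged["z"] = (merged[metric] - merged["mean"]) / merged["std"]

    # Constant slices produce std == 0 -> z is NaN -> never flagged.
    return merged[merged["z"].abs() > z_thresh]
```

Run against a toy dataset where IE sessions collapse on the last day while Chrome stays flat, only the IE slice-day is flagged, even though the overall daily total barely moves.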
Increase in Sales due to COVID-19 Quarantine
This story comes from SageData, with Anomaly Detection already running for one of our clients. The client was an e-commerce retailer selling, among other things, computer accessories. During the first COVID-19 quarantine, which started in March 2020, our team at SageData noticed a peculiar notification coming through from the Anomaly Detection for this client (our analysts were working alongside the client to create reports and monitor data accuracy). The notification showed roughly a tenfold increase in sales from one region: Italy. It turned out that during the first quarantine, which in March and April was fairly strict in Italy, many consumers rushed online to buy communication accessories such as headphones, microphones, and cameras. Within the overall volume of sales worldwide, and across the plethora of products the company was selling, the spike in Italy was negligible. Because Anomaly Detection spotted the spike in sales within 24 hours, the company ordered more products from the manufacturer, avoiding a scenario where it would run out of the popular products in its EU warehouses and lose out on additional sales.
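A spike like the Italian one can be caught with an even simpler rule than the z-score approach: compare each region's most recent day against its own trailing average. The sketch below is a hedged illustration of that idea in Python with pandas, not the detection logic running on the SageData platform; the column names, the 14-day window, and the 3x threshold are all assumptions.

```python
import pandas as pd

def flag_sales_spikes(daily_sales, window=14, ratio_thresh=3.0):
    """Flag regions whose latest daily sales exceed ratio_thresh times
    their own trailing average over the previous `window` days.

    daily_sales: DataFrame with columns region, date, units
                 (one row per region per day).
    """
    alerts = []
    for region, grp in daily_sales.sort_values("date").groupby("region"):
        history = grp["units"].iloc[:-1].tail(window)  # exclude latest day
        latest = grp.iloc[-1]
        baseline = history.mean()
        if baseline > 0 and latest["units"] / baseline >= ratio_thresh:
            alerts.append({
                "region": region,
                "date": latest["date"],
                "units": latest["units"],
                "ratio": latest["units"] / baseline,
            })
    return pd.DataFrame(alerts)
```

With steady sales everywhere except one region that jumps tenfold on the latest day, only that region triggers an alert, which is exactly the kind of notification described above.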
Detecting anomalies can be a powerful tool in the arsenal of any data-driven company. Spotting emerging trends or possible data issues early enough helps avoid costly mistakes and enables the company to capitalise on new opportunities, leading to better profitability. The latest capabilities in data science and machine learning, available on the SageData platform for all our clients, enable even the smallest of startups to utilise the power of anomaly detection to scale faster and scale smarter.