Data quality is a key step in any data analytics project.
As such, attribution depends heavily on these processes to make sure the algorithms are fed clean data and deliver clean insights.
But data quality should never be considered a one-and-done project. It is a year-round effort: adapting to the evolutions of the media mix, and working outside the media mix to educate, preempt and prepare for any change.
What is data quality?
Data quality is the alignment of marketing datasets with the standards of validity, coherence, and operational usefulness defined as the gold standard for the company.
In marketing, data quality mostly focuses on aligning visits, costs, imported costs and imported impressions down to the campaign level, in order to produce the most precise insights possible.
The goal is to have all metrics aligned, or at least to minimise the share of visits and conversions that fall outside the company's naming convention and coherence standard.
It is generally considered acceptable to have less than 3 to 5% of your traffic and figures “not in the right place”, as this will not drastically alter the learnings of large models.
Thus, any metric controlling variations and imprecisions in the data should be checked regularly to avoid issues.
But why do we do it then?
Attribution models, and learnings in general, rely on having datasets properly aligned between what you are explaining and how you explain it.
To give an analogy: you are trying to forecast the weather over the city of Paris, and you have an amazing amount of data to work with. But in your precipitation data you forgot to deduplicate “Paris” by country, as you had no way to determine which country each “Paris” belongs to.
Now you enjoy your weather forecast with data from Sweden and Lima at the same time. Needless to say, this gets interesting when you try to decide whether to bring a coat or a shirt.
This problem is easy to grasp, but it runs much deeper in marketing once you touch on nuances such as the device serving the advertisement or A/B test cases.
Hence, before any study, a rigorous data quality pass must be done to ensure clean learning.
Part 1: Preventive work
“The best step to solve a problem is not to have it in the first place”
- Creating a UTM link generator: these tools write the landing URL for the user and already take the tagging rules into account.
Making clever use of conditional formulas and some Excel magic, it is possible to build reliable tools for this.
Their main issue is that they need maintenance and sometimes require enforcing their use.
- Establishing a proper naming convention: while it seems self-explanatory, this step cannot be skipped, as you need to design a naming convention with the following features:
- Easy to understand
- Easy to implement
- Able to adapt to the addition of new countries or campaign types
Example : [CountryNaming]_[CampaignName]_[ProductTargeting]_[Datationsystem]_[extra]
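As an illustration, the generator idea can be sketched in a few lines of Python. The field names and values below are assumptions following the example convention above, not a prescribed standard:

```python
from urllib.parse import urlencode

def build_utm_url(base_url, country, campaign, product, date, extra,
                  source, medium):
    """Assemble a landing URL whose utm_campaign follows the naming convention."""
    # Join the convention fields in the agreed order with underscores.
    campaign_name = "_".join([country, campaign, product, date, extra])
    params = {
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign_name,
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical campaign for a French summer sale:
url = build_utm_url("https://www.example.com/landing",
                    "FR", "SummerSale", "Shoes", "2024Q3", "test",
                    source="newsletter", medium="email")
```

Because every link goes through the same function, the convention is applied mechanically rather than typed by hand, which is where most tagging errors come from.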
- Testing redirection links: making sure they work is simple, but it is often the most reliable way to ensure nothing goes awry.
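A minimal Python check, for instance, could follow each redirect chain and verify that the UTM parameters survive it (the URLs and parameter names here are placeholders):

```python
import urllib.request
import urllib.error
from urllib.parse import urlparse, parse_qs

def final_landing(url, timeout=10):
    """Follow redirects; return (status_code, final_url), or (None, url) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except urllib.error.URLError:
        return None, url

def utm_campaign_preserved(final_url, expected):
    """True if the redirect chain did not strip or rewrite utm_campaign."""
    params = parse_qs(urlparse(final_url).query)
    return params.get("utm_campaign", [None])[0] == expected
```

A batch run of such a check over all active links before launch catches dead redirects and parameter-stripping redirects alike.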
- Making good use of macros made available by partners: multiple trade partners offer their own automated macros. Their main advantage is that they unify UTM links reliably and without any effort.
But they should not be used blindly: use the right macro and read the documentation covering both your UTMs and your API key.
Part 2: Reactive work
“Anything that can go wrong, will go wrong” - Murphy’s law
Despite everyone’s best efforts, issues will eventually worm their way into the media mix. It is then up to the reactive side of the controls to take over: quickly detect the anomaly, identify its root cause and solve the data misalignment.
- Control reports and dashboards: if you have a general naming convention, in theory every campaign should follow it.
As such, any automated report or dashboard built for filtering can quickly surface campaigns that do not belong in the group.
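This filtering step can be automated with a simple regular expression built from the convention. The pattern below assumes the example convention given earlier; the exact field formats are illustrative:

```python
import re

# Assumed convention: [Country]_[Campaign]_[Product]_[Date]_[extra]
CONVENTION = re.compile(
    r"^[A-Z]{2}_"          # country code, e.g. FR
    r"[A-Za-z0-9]+_"       # campaign name
    r"[A-Za-z0-9]+_"       # product targeting
    r"\d{4}Q[1-4]_"        # date system, e.g. 2024Q3
    r"[A-Za-z0-9]+$"       # extra field
)

def flag_nonconforming(campaign_names):
    """Return the campaigns a control report should surface for review."""
    return [name for name in campaign_names if not CONVENTION.match(name)]

flagged = flag_nonconforming(["FR_SummerSale_Shoes_2024Q3_test",
                              "summer sale FR"])
# only the second campaign is flagged
```

Plugged into a scheduled report, this turns the naming convention from a written guideline into an enforceable check.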
- Automated alerts: sometimes the anomaly is not in the naming convention but in the value taken by a metric.
In those cases, automated calculations and correlations of variations become handy tools to spot errors as they emerge and react quickly.
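As a minimal sketch of such an alert, here is a plain z-score rule on daily visits; production systems would use richer models that account for seasonality and trend:

```python
from statistics import mean, stdev

def detect_anomalies(daily_visits, z_threshold=3.0):
    """Flag day indices whose visit count deviates more than z_threshold
    standard deviations from the mean (simple z-score rule)."""
    mu = mean(daily_visits)
    sigma = stdev(daily_visits)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(daily_visits)
            if abs(v - mu) / sigma > z_threshold]

# Hypothetical week of traffic; day index 5 simulates a tracking drop.
visits = [1000, 1020, 980, 1010, 990, 120, 1005]
detect_anomalies(visits, z_threshold=2.0)  # → [5]
```

Run daily per channel or per campaign, even a rule this simple surfaces tracking drops and connector failures hours before a human would notice them in a dashboard.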
In the absence of robust preventive safeguards such as those described above, the reactive dimension of data quality management becomes all the more critical.
In these situations, good communication, clarity of the anomaly report and the ability to find the right person become the essential skills to master to avoid trouble in the long run.
Part 3: Wizaly’s approach to the problem
As an attribution solution, Wizaly is inherently attentive to all these issues and offers dedicated support from its data analytics team.
Each month a full report is generated and checked by the teams in charge of your account. They then write a report stating any anomaly they might have found, along with a general overview of the behaviour of your key KPIs.
They are also assisted by a fully in-house automated alerting system, toggleable in one click. Featuring varied algorithms and an approach tailored to marketing data, the system is able to detect and classify a multitude of anomalies such as:
- Tracking drop
- Connector failure
- Strategic changes in investment
Conclusion: Data quality is a key element in the life of an account and of any project.