Article posted in the GRBN newsletter on July 23rd
Before the first self-service grocery store, Piggly Wiggly, opened in 1916 in Memphis, Tennessee, customers had to give their shopping lists to clerks, who would then pick out the goods. It was a personal interaction in which the clerk developed a deep knowledge of the customers' preferences. With self-service, the act of shopping became to some extent anonymised. The introduction of the scanner on 26 June 1974 at a Marsh supermarket in Troy, Ohio, made detailed shopping information available at a large scale. Later on, without losing anonymity, shopping lists became more personal again with the introduction of loyalty cards in the 1990s (Kroger, Safeway, Tesco). With loyalty cards, retailers were able to establish a personal relationship with customers at a massive scale. E-commerce and predictive models, which use all sorts of individual-specific details, then brought a new turn of the screw.
Until very recently, market researchers interacted with respondents almost exclusively in a pre-Piggly Wiggly model: a one-on-one relationship between the researcher and the survey participant. Following the ICC-ESOMAR ethics code, personal information is consistently removed. In this model, it is very easy to separate personal data from declared data and obtain an anonymised sample for analysis.
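As an illustration of how simple this separation is for survey data, here is a minimal sketch. The field names (`name`, `email`, `phone`, `age_group`, `q1_satisfaction`) and the sample records are hypothetical and do not come from any real survey export; the sketch only assumes that personal fields sit in their own columns.

```python
# Hypothetical set of personal fields; a real survey export will differ.
PII_FIELDS = {"name", "email", "phone"}


def anonymise(responses):
    """Return copies of the survey records with the personal fields removed."""
    return [
        {key: value for key, value in record.items() if key not in PII_FIELDS}
        for record in responses
    ]


# Invented sample records, for illustration only.
responses = [
    {"name": "Ann Smith", "email": "ann@example.com", "phone": "555-0100",
     "age_group": "25-34", "q1_satisfaction": 4},
    {"name": "Bob Jones", "email": "bob@example.com", "phone": "555-0101",
     "age_group": "35-44", "q1_satisfaction": 2},
]

anonymised = anonymise(responses)
print(anonymised[0])  # {'age_group': '25-34', 'q1_satisfaction': 4}
```

Because the personal fields are separate columns, dropping them leaves the declared answers untouched.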
However, technology has disrupted this peaceful model. Researchers live in the digital world, as do their subjects of analysis. The more digital our lives become, the more digital data collection must be. If researchers want to understand their customers holistically, they need to know what they do on their desktops or mobile devices in terms of online searches, browsing or app usage. Furthermore, data related to their environment, such as geolocation or audio detection and matching, is also gathered to complement the analysis. Thus, we can say that in the last 10 years researchers have faced the challenges retailers experienced in the 1990s.
Survey and behavioral data complement each other, providing a 360º view of the consumer: opinion and behavior. But they are radically different regarding anonymity. While personally identifiable information (PII) can be easily removed from survey data, it is deeply embedded in clickstream behavioral data (a collection of visited URLs), geolocation data or audio-detection data. A simple inspection of such data makes it possible to identify 85% of the users (“How to protect privacy in Big Data”, ESOMAR 2016). In fact, browsing data from two users sharing a device can be easily separated by simply inspecting clickstream data (“Who is who with behavioral data”, ESOMAR 2017).
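To make concrete how personal details end up inside visited URLs, the following minimal sketch redacts them with simple hand-written rules. It is an illustration under stated assumptions, not the method of the cited ESOMAR papers: the parameter names in `SENSITIVE_PARAMS`, the e-mail pattern and the example URLs are all hypothetical, and real clickstream data contains many identifiers that rules like these would miss.

```python
import re
from urllib.parse import (parse_qsl, quote, unquote, urlencode,
                          urlsplit, urlunsplit)

# Assumed, non-exhaustive list of query parameters that tend to carry PII.
SENSITIVE_PARAMS = {"email", "user", "username", "name", "phone"}

# Simplified e-mail pattern; it will not catch every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_url(url):
    """Redact sensitive query values and e-mail addresses found in a URL."""
    parts = urlsplit(url)
    query = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SENSITIVE_PARAMS:
            value = "REDACTED"
        else:
            # Values arrive percent-decoded, so the pattern can match here.
            value = EMAIL_RE.sub("REDACTED", value)
        query.append((key, value))
    # Decode the path before matching, then re-encode it.
    path = quote(EMAIL_RE.sub("REDACTED", unquote(parts.path)))
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(query), parts.fragment)
    )


print(mask_url("https://shop.example.com/account?email=ann%40example.com&page=2"))
# https://shop.example.com/account?email=REDACTED&page=2
print(mask_url("https://social.example.com/profile/ann.smith@example.com/photos"))
# https://social.example.com/profile/REDACTED/photos
```

Even in this toy case the personal detail can sit in a query parameter or in the path itself, which is why removing it from clickstream data is much harder than dropping a column from a survey file.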
The variety of datasets available in the market encourages collaboration among different companies. The challenge in sharing clickstream data with third parties is to avoid violating individuals' privacy rights, as defined in the GDPR. This problem must be tackled from two different angles. On the one hand, a refined machine learning model must be capable of masking all PII. On the other hand, this model can only be successful as long as companies are trustworthy enough for people to share their personal information without hesitation.