GDPR highlights just how much JavaScript webpages devote to tracking you (yes, you)

By: Kate W. Zimmerman and Patrick W. Zimmerman

New regulations like the EU’s General Data Protection Regulation (GDPR) are interesting points of inflection, a shift in the rules of capitalism, a chance to see how companies react to changes in their ecosystem. You notice this change when 103,482 emails hit your inbox about updates to privacy policies for services you didn’t know you had.

So we’re taking this opportunity to look at one of the more opaque aspects of the internet economy: how much space on a webpage is actually devoted to tracking your every move in order to present you with “better ads”? For this pilot study, we take the 50 most trafficked websites and compare the page source code served to users coming from the US with the code served to users coming from an EU country (France, in this case, chosen purely for convenience: there was a fast and secure SOCKS5 proxy available there).

It seems as if the invisible hand of the self-correcting market doesn’t really work so well when there is asymmetrical information distribution. What we mean by that is that trackers are intentionally hidden. Users have to look at the source code to access this information. For the large number of you out there who may not know how to do this, that means looking at what your browser is actually receiving from the web server, rather than the effects of that code. In Firefox, Chrome, or Edge on Windows, hit Ctrl+U. On a Mac, it’s Cmd+U in Firefox and Cmd+Option+U in Chrome. In Safari, you have to enable the Develop menu first (Preferences > Advanced > check “Show Develop menu in menu bar”), then it’s Cmd+Option+U.

What? You don’t do that on most of the pages you visit?

So, there’s a small percentage of people who a) actually know how to do this, b) bother to do so on a given site, and c) also know how to interpret the difference between:

  • Scripts/jq_plugins/jquery.quick.pagination.min.js,
  • stats.g.doubleclick.net/dc.js, &
  • connect.facebook.net/en_US/fbevents.js?

Hint: the first is an innocuous way to have multiple pages with tabs on the same site, the second is the Google ad server, the third is part of the Facebook tracking pixel.
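To make that distinction concrete, here is a minimal Python sketch of how the three script references above could be sorted into "same-site" and "tracker". The blocklist is a hypothetical, deliberately tiny one containing only the two tracker hosts named above, and the rule for spotting a hostname in a scheme-less reference is a rough heuristic, not a method we used in the study.

```python
from urllib.parse import urlparse

# Hypothetical two-entry blocklist built from the hosts named in the text;
# a real analysis would rely on a maintained tracker list instead.
TRACKER_DOMAINS = {"doubleclick.net", "facebook.net"}


def script_host(src: str) -> str:
    """Return the hostname of a script reference, or '' for a same-site path."""
    if "://" not in src and not src.startswith("//"):
        # Heuristic: a scheme-less reference is treated as host/path only if
        # its first segment contains a dot (e.g. 'connect.facebook.net/...').
        first_segment, slash, _ = src.partition("/")
        if not slash or "." not in first_segment:
            return ""  # e.g. 'Scripts/jq_plugins/...' is a local path
        src = "//" + src
    return (urlparse(src).hostname or "").lower()


def is_tracker(src: str) -> bool:
    """True if the script is served from a blocklisted domain or a subdomain of one."""
    host = script_host(src)
    return any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)


if __name__ == "__main__":
    for src in (
        "Scripts/jq_plugins/jquery.quick.pagination.min.js",
        "stats.g.doubleclick.net/dc.js",
        "connect.facebook.net/en_US/fbevents.js",
    ):
        print(src, "->", "tracker" if is_tracker(src) else "not flagged")
```

Matching on the domain suffix rather than the full hostname means subdomains such as stats.g.doubleclick.net are caught by the single doubleclick.net entry.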

Trackers used to be less prevalent, but the spread of high-speed internet, even to mobile devices, has basically removed the cost of putting a gazillion scripts in the header of a webpage. The load time difference is not remotely as noticeable as it was even 5 years ago, much less 10. Trackers that are both hidden and nearly free to deploy are what we mean by asymmetrical information, and they create a marketplace where the vast majority of people using a service cannot reasonably be expected to be able to give informed consent. Thus, the idea that the online world can be self-regulated because people will refrain from using sites that engage in objectionable practices is hilariously naïve.

GDPR also provides an opportunity for companies to realize that Bigger Data isn’t always Better Data. The reflex action of always looking to collect more and more information about your users often exacerbates the signal:noise problem. Now that there’s an increased cost to collecting data, this is an important time to strategize about how and why you collect and analyze information about your customers. Weigh what information you actually need. What questions does it help you answer? If the questions asked by your research team aren’t linked to your company’s long-term strategy, then just collecting more information isn’t going to solve that with magic sample-size dust.

So, the market failed. That’s where the EU stepped in.  How have companies reacted?

Let’s find out.


The question

The GDPR has forced websites to react by making their users explicitly opt into their tracking system or ditch those systems entirely for traffic coming from within EU countries.  What does this tell us about how much bandwidth was previously devoted to those hidden activities?


The short-short version

We ran a pilot study of the top 50 domains ranked by traffic according to Alexa and Moz, and whoa, yeah, there’s a big difference in the index pages served to EU and US visitors.  There’s clearly something there that merits a longer-term project.


The results

Since GDPR, do EU visitors have to deal with less JavaScript?

The answer: Yep.

For the top 50 domains, the median number of JS elements loaded when visiting from the US IP was 21, roughly 60% more than the median for visits from the French IP (13). We used a Wilcoxon signed-rank test (because we didn’t want to assume a normal distribution) to test for significance: T = 72.5, p = .0002.
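For readers who haven’t met the Wilcoxon signed-rank test, here is a minimal pure-Python sketch of what it computes for paired counts (one US and one French count per site). The numbers in the example are made up for illustration and are not our data. The p-value uses a plain normal approximation without tie or continuity correction, so it is only a rough guide for small samples; a library routine such as SciPy’s scipy.stats.wilcoxon would normally be the better choice.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (T, p) for paired samples x and y.

    T is the smaller of the positive and negative rank sums. p is a two-sided
    normal approximation with no tie or continuity correction.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(w_plus, w_minus)

    # Under the null hypothesis T has this mean and standard deviation.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return t, p


if __name__ == "__main__":
    # Hypothetical JS element counts for six imaginary sites (not study data).
    us_counts = [21, 30, 8, 55, 12, 40]
    fr_counts = [13, 22, 8, 60, 10, 25]
    t, p = wilcoxon_signed_rank(us_counts, fr_counts)
    print(f"T = {t}, approximate p = {p:.3f}")
```

The test only looks at the sign and rank of each site’s US-minus-France difference, which is why a single extreme site does not dominate the result the way it would in a t-test.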

Figure: JavaScript Element Counts by IP Country

We did find an odd outlier: Dropbox loaded 127 elements when the index page was visited by the French IP, vs. 55 elements when visited by the US IP. (Yahoo was another outlier, though its overall trend – with more elements loaded for US than France – was consistent with other domains.) The chart below shows elements loaded by domain.

JavaScript Elements by domain

We also checked for cookie consent notices: you may have seen a few around, saying something like, “We use cookies to ensure that we give you the best experience on our site and blah blah blah blah blah require you to give up your first born child.” Notification of cookies & consent is required by GDPR when a website uses cookies to track you. Of the 50 sites we visited, 30 served cookie notices; and of those 30, only 6 bothered to show the cookie notices when visited by the US IP.

The chart below shows load counts for domains visited by the US IP plotted against the load counts for domains visited by the French IP, and whether a cookie notice was served. Some sites loaded very few JS elements, and it’s possible those sites don’t have tracking that would require cookie notices.

Figure: Cookie Notices Served by Domains for US vs EU IPs

One important note: we didn’t differentiate between JavaScript loaded for tracking vs. content. In a follow-up, we’d like to code and measure tracking-specific JS; we expect that counts will be much higher for US visitors than for EU visitors.


The methodology

We identified the most trafficked sites using a combination of rankings pulled from Alexa (top sites by country) and Moz. With Alexa’s API, we were able to extract top sites for the five countries that account for the most traffic in the EU: Germany, UK (the GDPR will continue to form part of UK law after Brexit), France, Italy, and Spain. However, Alexa isn’t a perfect source of truth for internet traffic: it relies on stats collected by people who have installed extensions with Alexa data or sites that have installed Alexa Certify. That’s why we supplemented our Alexa data with data from Moz and built an improper linear model averaging the ranks from Moz and Alexa.
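An improper linear model simply gives every input the same weight instead of fitting weights to data. A minimal sketch of that idea applied to two rank lists follows. The domain names and ranks are placeholders, and restricting the result to domains that appear in both lists is an assumption made for this illustration, not a description of how we handled missing entries.

```python
def combined_ranking(alexa_ranks, moz_ranks, top_n=50):
    """Order domains by the unweighted mean of their two ranks (lower is better).

    Assumption for this sketch: only domains present in both sources are scored.
    """
    shared = alexa_ranks.keys() & moz_ranks.keys()
    scores = {d: (alexa_ranks[d] + moz_ranks[d]) / 2 for d in shared}
    # Break ties alphabetically so the output is deterministic.
    return sorted(scores, key=lambda d: (scores[d], d))[:top_n]


if __name__ == "__main__":
    # Placeholder ranks for imaginary domains (not study data).
    alexa = {"a.example": 1, "b.example": 2, "c.example": 3, "d.example": 4}
    moz = {"a.example": 3, "b.example": 1, "c.example": 2, "e.example": 4}
    print(combined_ranking(alexa, moz))
```

Equal weighting avoids having to decide which source is more trustworthy, at the cost of treating a noisy source and a reliable one the same.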

Once we had our list of the 50 most trafficked sites, we visited the index page of each site in Firefox, first without a proxy (resulting in a US IP address) and then through a French proxy IP. We tracked JS loads using Firefox’s Developer Tools.

Dev tools
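We counted loads by hand in the browser, but a rough scripted stand-in is to count the script tags in a page’s saved HTML source. The sketch below does that with Python’s standard library. Note that it is a different technique from what we did: a static count misses scripts that are injected at runtime by other scripts, so its totals would not match what the Developer Tools report. The sample HTML is invented for the example.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Collect external script URLs and count inline script blocks."""

    def __init__(self):
        super().__init__()
        self.external = []  # values of src attributes, in document order
        self.inline = 0     # script tags without a src attribute

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            self.external.append(src)
        else:
            self.inline += 1


def count_scripts(html):
    """Return (list of external script URLs, number of inline scripts)."""
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    return parser.external, parser.inline


if __name__ == "__main__":
    # Invented page source, standing in for a saved index page.
    sample = """
    <html><head>
      <script src="https://stats.g.doubleclick.net/dc.js"></script>
      <script>var x = 1;</script>
    </head><body>
      <script src="/static/app.js"></script>
    </body></html>
    """
    external, inline = count_scripts(sample)
    print(len(external), "external,", inline, "inline")
```

Running this on the source saved from each vantage point (one file per site per IP) would give paired counts that could be compared in the same way as our manual tallies.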

All our data crunching was done using Jupyter Notebook with Python 3.


Takeaways

Complying with GDPR carries a real cost for companies: extra development costs (maintaining different experiences for visitors from different IPs; auditing, updating, and eliminating tracking), less insight into user behaviors, and potentially reduced traffic if users find privacy & cookie notices to be off-putting.

…but perhaps this will lead to better outcomes in the long run. This is an opportunity for companies to step back and focus on the key strategic questions they need to answer, rather than suffering from data overload.


What next?

Well, the first thing is to dive into questions that were raised in this preliminary analysis: why do some sites load more JavaScript for EU visitors than US visitors? How much of the JS loaded was used for tracking? Do some sites completely forgo tracking?

And we want to expand our dataset, ensuring we gather a more representative sample of the internet, rather than just the 50 highest-traffic sites.

Most of all, we want to evaluate changes over time. Will companies continue to serve different experiences to EU vs US visitors, or will they move to a more consistent approach? Might we see regulations like GDPR implemented in the US? Will we finally stop being chased around the web by those bizarre cowboy boots? We only looked at them that one time!!

About The Author

Kate is a pragmatic dreamer who loves overanalyzing the world around her. She works in Silicon Valley, where she somehow stumbled into the sexiest profession of the 21st century, and lives in San Francisco with her darling husband (*cough* Principally Uncertain’s founder *cough*), two apartment-sized dogs, and their adorable toddler (who will someday take over the world).
