The EU General Data Protection Regulation and Me (and You)

Andrew Stephens, Saturday the 23rd of June, 2018 in Computing, Rant

The GDPR has been implemented and this may affect your website, even if it is completely static.

A photo I took of some data

The GDPR came into force in the EU on the 25th of May, 2018 to much wailing and gnashing of teeth from IT people who should know better. The GDPR specifies in great detail what data about individuals can be lawfully obtained and stored without those individuals' explicit and informed consent.

Although I am not in the EU, I fully support the intent of the legislation. For far too long companies on the web have been amassing huge databases of information on people using their websites and it has already caused all sorts of trouble.

Websites naturally accrue vast amounts of data and are subject to GDPR rules. I am not just talking about sites like Facebook, which obviously have a lot of people's life stories (I previously wrote The Seven Realities of Social Networking on that topic), but even static sites like this one passively accrue data in logs that contain potentially identifiable information.

Sheep.horse only gets a dozen hits a day and as the admin I get to see the IP addresses of every visitor in the logs. IP addresses are not tied to a single person or location, but in practice I can make a pretty good guess which of my friends is reading my blog at a given time by using a location service, particularly if I know they favor a particular OS.

Snippet of the access.log file from this website. The highlighted lines are from me browsing to the top level page. My IP address is the first column on the left.
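To make that concrete, here is a minimal Python sketch of how little work it takes to pull visitors out of a log like this. It assumes nginx's default "combined" log format, which may not match what this site actually uses, and the sample lines and addresses are invented (they come from IP ranges reserved for documentation).

```python
import re
from collections import Counter

# Pattern for nginx's default "combined" log format. This is an assumption:
# the post does not say which log format the site really uses.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one access-log line as a dict, or None."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def hits_per_visitor(lines):
    """Count requests per (IP address, user agent) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields:
            counts[(fields["ip"], fields["agent"])] += 1
    return counts


# Invented sample lines; the IP addresses are documentation-only ranges.
SAMPLE = [
    '203.0.113.7 - - [23/Jun/2018:10:15:32 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [23/Jun/2018:10:15:33 +0000] "GET /style.css HTTP/1.1" 200 812 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.24 - - [23/Jun/2018:11:02:10 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh)"',
]

if __name__ == "__main__":
    for (ip, agent), hits in hits_per_visitor(SAMPLE).items():
        print(ip, hits, agent)
```

Pairing the address with the user agent is the same trick described above: an IP address alone is a weak identifier, but combined with a known operating system or browser it narrows things down quickly on a low-traffic site.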

The GDPR contains explicit language that allows data like this to be stored for technical reasons, but it makes website owners liable for any misuse of the data, even if it is leaked as a result of a hack.

In the case of Sheep.horse, this logging is sometimes useful to diagnose immediate problems with the site but I can't think of a reason to store the logs long term, which is the default. But defaults exist to be changed and I have modified the /etc/logrotate.d/nginx file to shred logs after 24 hours. Data can't leak if it has been deleted.
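The logrotate change itself is not shown here. As a rough illustration of the same retention idea in Python (a sketch, not what this site actually runs), the following deletes files in a log directory once they are more than 24 hours old. The function names and the demo file names are hypothetical, and a plain delete is weaker than shredding because it does not overwrite the data first.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24-hour retention period


def expired_logs(log_dir, now=None, max_age=MAX_AGE_SECONDS):
    """Return files in log_dir last modified more than max_age seconds ago."""
    now = time.time() if now is None else now
    return sorted(
        path for path in Path(log_dir).iterdir()
        if path.is_file() and now - path.stat().st_mtime > max_age
    )


def purge(log_dir, now=None, max_age=MAX_AGE_SECONDS):
    """Delete expired log files and return the names that were removed."""
    removed = []
    for path in expired_logs(log_dir, now, max_age):
        path.unlink()  # plain delete; unlike shred, nothing is overwritten
        removed.append(path.name)
    return removed


if __name__ == "__main__":
    # Demo in a throwaway directory rather than a real log directory.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp, "access.log.1")
        new = Path(tmp, "access.log")
        old.write_text("old entries\n")
        new.write_text("fresh entries\n")
        two_days_ago = time.time() - 2 * MAX_AGE_SECONDS
        os.utime(old, (two_days_ago, two_days_ago))  # backdate the old file
        print(purge(tmp))
```

In practice logrotate is the better tool for this job, since it also handles reopening the log file for the running web server.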

This seems pretty silly on Sheep.horse but the GDPR is really aimed at companies that are in the business of obtaining data. And while this obviously includes Facebook and its ilk, personally I am more concerned with the advertising companies that gather data across a huge swath of sites.

Internet advertising is a huge topic in itself but briefly, almost any ad you see on a website is not put there directly by the publisher of the website. Instead, that publisher (say localnewspaper.example) has rented out that space to a third party (advertisingcompany.example). When you visit localnewspaper.example to read an article, your browser also contacts advertisingcompany.example to download the ad (it is this step that adblockers prevent). AdvertisingCompany now knows that you have viewed that article.

(I am glossing over a lot of details here. In particular, on many sites there are multiple ads from different sources, sometimes dozens. And it is not all bad news: your browser is supposed to provide some level of security, but that has led to a decade-long arms race between the advertisers and the browser makers. Keep your browser up-to-date.)

Next you visit another site (magazine.example). This also has ads from advertisingcompany.example, and AdvertisingCompany now knows two things about you. As you browse, AdvertisingCompany is building up a pretty good picture of the type of person you are, including all sorts of identifiable information. In fact AdvertisingCompany might have detailed files on what you have read going back decades. This data is extremely valuable to AdvertisingCompany since it is used to control which ads you see.
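The mechanism is simple enough to model in a few lines of Python. This is a toy sketch, not how any real ad network is implemented: the class names, the cookie scheme, and the article paths are all made up for illustration, and real trackers use many more signals than a single cookie.

```python
from collections import defaultdict


class AdvertisingCompany:
    """Toy model of a third-party ad server (hypothetical)."""

    domain = "advertisingcompany.example"

    def __init__(self):
        self.profiles = defaultdict(list)  # cookie id -> pages that showed an ad
        self._next_id = 1

    def serve_ad(self, cookie, referer):
        """Handle one ad request and return the cookie the browser should keep."""
        if cookie is None:
            # First contact: hand out a fresh identifier.
            cookie = f"visitor-{self._next_id}"
            self._next_id += 1
        # The referer tells the ad server which page the ad was shown on.
        self.profiles[cookie].append(referer)
        return cookie


class Browser:
    """Toy browser that stores one cookie per third-party domain."""

    def __init__(self):
        self.cookies = {}  # domain -> cookie value

    def visit(self, page_url, ad_server):
        """Load a page, which also triggers a request to the ad server."""
        cookie = self.cookies.get(ad_server.domain)
        self.cookies[ad_server.domain] = ad_server.serve_ad(cookie, referer=page_url)


if __name__ == "__main__":
    ads = AdvertisingCompany()
    you = Browser()
    you.visit("localnewspaper.example/some-article", ads)
    you.visit("magazine.example/some-feature", ads)
    for visitor, pages in ads.profiles.items():
        print(visitor, pages)
```

Neither publisher shares anything with the other, yet the ad server ends up with both visits filed under the same identifier, simply because the browser sends the same cookie back each time.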

You may have clicked on a banner at localnewspaper.example to consent to that site storing cookies on your computer, but did you give your informed consent for AdvertisingCompany to collect this data? My guess is not.

I wish more countries would implement something like the GDPR to limit this pernicious form of business.

In the meantime, website owners, limit the amount of data you collect and store. Everyone else, use an ad blocker.