You Should Be Stripping EXIF Data out of Your Image Files Before Posting

In Computing, Rant

Hey Andrew,

I just came to your site via HN, and a browser extension I use alerted on GPS data being present. This is on the Ponyhenge page, on two images. I thought I'd mention it, in case this is not what you want.

When I received this email from a very nice reader, I knew exactly what they were talking about, and it was very much not what I want.

Cell phone cameras helpfully tag each photo with various bits of information about how the photo was taken, stored in a hidden part of the image file called the EXIF (EXchangeable Image File Format, and yes, I know that is two Fs) segment.

Normally this is quite useful, but if you are posting images online you should be aware that this also includes the GPS coordinates of where the image was taken. If you post an image from around your house, then everyone knows where you live.
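If you want to check your own files, here is a rough Python sketch of the kind of test a GPS-detecting tool might perform. It is not the reader's browser extension, and `has_gps` is a hypothetical name. It assumes the usual layout: the EXIF segment holds a small TIFF structure whose first directory points to a separate GPS directory via tag 0x8825. Finding the segment with a plain byte search is a shortcut rather than proper parsing, and truncated files are not handled.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"
GPS_IFD_TAG = 0x8825  # entry in the first directory that points at the GPS directory


def has_gps(jpeg: bytes) -> bool:
    """Return True if the first EXIF directory contains a GPS pointer."""
    idx = jpeg.find(EXIF_HEADER)  # crude: a byte search, not a segment walk
    if idx == -1:
        return False
    tiff = jpeg[idx + len(EXIF_HEADER):]

    # The TIFF header declares its own byte order: "II" little, "MM" big.
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        return False
    magic, ifd0 = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        return False

    # A directory is a 2-byte entry count followed by 12-byte entries.
    (count,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
    for i in range(count):
        start = ifd0 + 2 + 12 * i
        tag, _type, _n, _value = struct.unpack(order + "HHII", tiff[start:start + 12])
        if tag == GPS_IFD_TAG:
            return True
    return False
```

Calling `has_gps(open("photo.jpg", "rb").read())` on a file (the filename is a placeholder) gives a quick yes or no before you publish it.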

A photo of a statue in an undisclosed location. Whoops, the included GPS location kind of gives it away.

If you upload to a social media site, they will (probably) filter this information out so that members of the public can't see where the photo was taken, but you can bet that your GPS coordinates go into the huge bag of data that they already collect about you.

The iOS export panel with the Location slider turned off (as it should be)

I don't know about Android, but Apple software has a handy "Include Location" option when exporting images that you should almost always set to off.

In the case of my blog, the static site generator I wrote used to just copy the raw image to the site when I published a post. I would manually remove the GPS EXIF data from any images that were taken around my house, but this process relied on me remembering to do it. Obviously I failed at least once.

Yesterday I did what should have been done years ago and implemented a GPS EXIF stripper for any images that get uploaded to sheep.horse.
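For illustration only, here is a minimal Python sketch of the bluntest possible stripper. It is not the code behind sheep.horse (the language of the site generator is not stated, so Python is an assumption), and `strip_exif` is a hypothetical name. Unlike a GPS-only stripper, it throws away the entire EXIF segment, which also discards harmless fields such as orientation. It assumes a well-formed file with no standalone markers before the image data.

```python
import struct

SOI = b"\xff\xd8"   # start-of-image marker
SOS = 0xDA          # start-of-scan: the compressed image data follows
APP1 = 0xE1         # application segment that usually carries EXIF
EXIF_HEADER = b"Exif\x00\x00"


def strip_exif(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` with every APP1/EXIF segment removed."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:
            pos += 1  # skip a fill byte that pads the next marker
            continue
        if marker == SOS:
            # Everything from here on is image data; copy it untouched.
            out += jpeg[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(jpeg):
            raise ValueError(f"bad segment length at offset {pos}")
        payload = jpeg[pos + 4:end]
        if not (marker == APP1 and payload.startswith(EXIF_HEADER)):
            out += jpeg[pos:end]  # keep every non-EXIF segment as-is
        pos = end
    raise ValueError("truncated JPEG: no SOS marker found")
```

Because the header segments are copied byte for byte and the image data is never decoded, the picture itself is not recompressed.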

This led me down what I call the 7 stages of JPEG parsing.

  1. Wow, JPEG is a really nice binary file format. Everything is laid out in sensible chunks with tags and lengths.
  2. OK, so not every chunk has a proper length. That's fine, just a couple of special cases.
  3. What? The image data itself is in a weird non-chunk that you don't know the length of and have to scan for certain byte combinations. What a terrible oversight but workable.
  4. Nice, got that done. Now I just need to interpret the Application Specific chunk to read the EXIF data. It can't be that hard.
  5. WTF?
  6. No seriously
  7. Sigh, I guess I'll be importing a third-party exif parser after all.
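Stages 1 to 3 can be sketched in a few lines of Python. This is an illustrative reconstruction, not the parser written for this site, and `iter_segments` is a hypothetical name. It shows the three cases in order: ordinary chunks with a length, standalone markers with no length, and the image data after the start-of-scan marker, whose end can only be found by scanning for a 0xFF byte that is not followed by 0x00 or a restart marker.

```python
import struct
from typing import Iterator, Tuple

SOS = 0xDA  # start-of-scan: header with a length, then unsized image data
# Markers with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def iter_segments(jpeg: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (marker, offset, size) per segment; size includes the marker."""
    pos = 0
    while pos + 2 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        if marker in STANDALONE:  # stage 2: no length, just two bytes
            yield marker, pos, 2
            pos += 2
            continue
        # Stage 1: a big-endian length that counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        end = pos + 2 + length
        if marker == SOS:
            # Stage 3: scan the image data for the next genuine marker.
            while end + 1 < len(jpeg):
                nxt = jpeg[end + 1]
                if jpeg[end] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                end += 1
            else:
                end = len(jpeg)  # ran off the end without finding one
        yield marker, pos, end - pos
        pos = end
```

Stages 4 onward start once you open the 0xE1 (APP1) segment this yields and find an entire TIFF file, with its own byte order and offsets, sitting inside it.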