Parsing XML EXIF from .avif files (plus a rant)

Andrew Stephens, Wednesday the 3rd of June, 2026 in Computing, Rant

This quite large image compresses down to just 120k, alpha channel included, in a nice demonstration of the strengths of avif.

Avif files are the new hotness, and to be honest they are pretty good. Basically a replacement for both jpeg and png with better compression and (finally!) support for alpha channels in lossy images. It's a similar idea to webp but with a more modern codec. Sounds great!

I decided to start compressing some of my photos for this blog in avif as an experiment. And things were looking great until I realized that the code that strips exif data out of my images did not handle avif at all. Looking around, it seems that not many Python packages do, since avif is relatively recent.

Long story short, I wrote my own parser for the HEIF container format that avif uses. You can find it here in the gensite repository - I am not publishing it as a separate module because it is very low-end and probably does bad things if the image is malformed.

But all the complexity is wrapped up in a simple API.

# open the image file and remove the exif location data
with open(imagePath, "r+b") as f:
    t = avif_image.AvifImage(f)
    t.scan()
    t.overwriteSensitiveXmlData()

And now the rant.

The HEIF container format is ridiculous. (Just to compound the drama, HEIF is essentially the same thing as Apple's HEIC format, renamed because of reasons.) Look, I get that these things are put together over years of committee meetings with input from different companies with different priorities so some amount of cruft is expected. But this is really pushing it - for a modern format they had the opportunity to produce something future-proof and sane. They failed - I am not sure if HEIF is the silliest binary format I have encountered but it ranks.

Firstly, the blocks are somewhat self-describing, with a 4 byte size and 4 byte name header. Sensible, except they then immediately break this by having a magic number to label blocks that are too big for 4 bytes so some block headers have an extra 8 bytes. The self-describing headers cannot even describe themselves, breaking in a way that cannot even be backwards compatible with older software.
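For the curious, the header dance looks roughly like this in Python. This is a minimal sketch and not the code from gensite: the function names are made up for illustration, it assumes big-endian fields with a size of 1 as the magic value that signals the extra 8-byte size, and it does not handle a size of 0 (which, as far as I know, means the box runs to the end of the file).

```python
import io
import struct

def read_box_header(f):
    """Read one box header from f.

    Returns (box_type, payload_size) or None at a clean end of file.
    """
    raw = f.read(8)
    if len(raw) < 8:
        return None
    # 4 byte big-endian size followed by a 4 byte name
    size, box_type = struct.unpack(">I4s", raw)
    header_size = 8
    if size == 1:
        # Magic value: the real size is in an extra 8 byte field
        raw = f.read(8)
        if len(raw) < 8:
            raise ValueError("truncated 64-bit box size")
        (size,) = struct.unpack(">Q", raw)
        header_size = 16
    if size < header_size:
        # Also rejects size == 0, which this sketch does not support
        raise ValueError(f"unsupported box size {size}")
    return box_type.decode("latin-1"), size - header_size

def iter_top_level_boxes(f):
    """Yield (box_type, payload_offset, payload_size) for each top-level box."""
    while True:
        header = read_box_header(f)
        if header is None:
            return
        box_type, payload_size = header
        payload_offset = f.tell()
        yield box_type, payload_offset, payload_size
        # Seek absolutely so the caller may read from f between iterations
        f.seek(payload_offset + payload_size)
```

Note that the reader has to know about the magic value up front: a parser that only understands the 8 byte header would read a size of 1 and get hopelessly lost.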

The format is riddled with these types of decisions. Weird little boxes with variable length structures that depend on the values of version numbers that you just have to know. Lots of extra complexity to shave a couple of bytes from a container format designed to store multimegabyte image files.

Oh, and the locations of different types of data in the container are stored in different boxes than the corresponding types, so you have to iterate through two different types of boxes if you want to build a complete index of the contents. The boxes are all variable length anyway - why not just store everything together?
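To make the two-pass indexing concrete, here is a small sketch of the join. It assumes you have already parsed one kind of box into a mapping of item id to type and the other into a mapping of item id to a list of (offset, length) extents; as far as I understand these are the iinf and iloc boxes, but the function name and data shapes below are purely hypothetical.

```python
def build_item_index(item_types, item_locations):
    """Combine per-item types and per-item locations into one index.

    item_types:     {item_id: type_name}, parsed from one kind of box
    item_locations: {item_id: [(offset, length), ...]}, parsed from another
    """
    index = {}
    for item_id, item_type in item_types.items():
        index[item_id] = {
            "type": item_type,
            # An item with no recorded location gets an empty extent list
            "extents": item_locations.get(item_id, []),
        }
    return index
```

Only after this join can you ask a simple question like "where in the file is the metadata item?".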

My favorite little detail is that strings are encoded as null-terminated in variable length structures. Empty strings are just encoded as a single null character. But if the string happens to be the last member of a structure it is legal to not encode it at all, just denote an empty string by ending the structure early by declaring its length to be one byte shorter.
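A reader for these strings therefore needs a special case for running off the end of the structure. The following is a minimal sketch under the rules described above; `read_cstring` is a made-up helper name, and decoding as UTF-8 is an assumption.

```python
def read_cstring(payload, offset):
    """Read a null-terminated string from payload starting at offset.

    Returns (text, next_offset).
    """
    if offset >= len(payload):
        # The structure ended early: an omitted trailing string is empty
        return "", offset
    end = payload.find(b"\x00", offset)
    if end == -1:
        # No terminator before the end of the structure; take what is left
        return payload[offset:].decode("utf-8", errors="replace"), len(payload)
    return payload[offset:end].decode("utf-8", errors="replace"), end + 1
```

So both `b"\x00"` and `b""` decode to the same empty string, and the only thing you saved was one byte.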

It makes so little sense, and the official standard that documents this mess costs 250 Euros. I reverse engineered it with the help of some C++ code (I used the very clear and helpful code in libheif) for the lower price of just my sanity.

Finally, after you have parsed all the headers and boxes and variable arrays, you are now in a position to read the exif data. Surely that is in a sensible format:

<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 6.0.0">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
            xmlns:exif="http://ns.adobe.com/exif/1.0/"
            xmlns:xmp="http://ns.adobe.com/xap/1.0/"
            xmlns:tiff="http://ns.adobe.com/tiff/1.0/"
            xmlns:exifEX="http://cipa.jp/exif/1.0/"
            xmlns:mwg-rs="http://www.metadataworkinggroup.com/schemas/regions/"
            xmlns:stArea="http://ns.adobe.com/xmp/sType/Area#"
            xmlns:apple-fi="http://ns.apple.com/faceinfo/1.0/"
            xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#"
            xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/">
         <exif:CompositeImage>2</exif:CompositeImage>
         <exif:WhiteBalance>0</exif:WhiteBalance>
... another 7k of this

What was the point of all that binary soup when you are just going to embed a mass of pretty-printed XML anyway?
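The one upside of plain XML is that scrubbing it in place is not hard. Here is a minimal sketch of one way to do it, not necessarily what the gensite code does: blank out the text of selected elements with spaces so the byte length stays the same and none of the box sizes need to change. The tag list and the function name are assumptions for illustration, and the sketch only handles values written as element text, not as attributes.

```python
import re

# Hypothetical list of elements to scrub; adjust to taste
SENSITIVE_TAGS = (b"exif:GPSLatitude", b"exif:GPSLongitude", b"exif:GPSAltitude")

def blank_sensitive_values(xmp):
    """Return xmp (bytes) with the text of sensitive elements replaced by spaces.

    The result has exactly the same length as the input.
    """
    for tag in SENSITIVE_TAGS:
        name = re.escape(tag)
        pattern = re.compile(b"(<" + name + b">)([^<]*)(</" + name + b">)")
        xmp = pattern.sub(
            # Keep the tags, overwrite only the value, byte for byte
            lambda m: m.group(1) + b" " * len(m.group(2)) + m.group(3),
            xmp,
        )
    return xmp
```

Because the length is preserved, the scrubbed bytes can be written straight back to the same offset in a file opened with "r+b".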

Sorry, just needed to get that off my chest.