Blocking HTTP/1.1 - Some Results

In Computing, Rant

A couple of weeks ago I wrote that I was experimenting with blocking HTTP/1.1 requests. (TL;DR: v1.1 traffic to my site is almost 100% bots.) Here are some observations, in case anyone is thinking of following in my dubious footsteps.

I should mention that I went into this experiment with the impression that almost all maintained software would be making requests with at least the newish HTTP/2 standard - it is officially over 10 years old as of 2026 and is widely implemented in libraries and frameworks.

It turns out this assumption is very wrong. The main problems are listed below but, cutting to the chase, the relevant section of my Caddyfile now looks like this:

# Return an error for clients using http1.1 or below - these are assumed to be bots
@http-too-old {
    not protocol http/2+
    not path /rss.xml /atom.xml # allow rss
    # allow some acceptable bots
    not header_regexp User-Agent (?i)(Google-Site-Verification|googlebot|bingbot|duckduckbot|mastodon|^Lynx)
}
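A matcher on its own does nothing; a directive has to act on it. For anyone copying this, a minimal sketch of one way to pair the matcher with a response - the status code and message here are my illustrative choices, not something Caddy mandates:

```
# Reject matched requests outright (403 and the body text are arbitrary choices)
handle @http-too-old {
    respond "HTTP/1.1 requests are blocked on this site" 403 {
        close
    }
}
```

Returning an explicit error at least gives a curious human something to read, whereas simply dropping the connection would look like an outage.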

RSS Feedreaders

Almost all of the popular RSS feed readers use HTTP/1.1. I mitigate this slightly by explicitly excluding /rss.xml and /atom.xml from the block. Unfortunately, some feed readers also request pages from the site to generate previews and the like, and those requests are now blocked.

This is exacerbated by my blogging software generating a truncated feed containing only the first sentence of each post. This is a little shameful and something I have been meaning to fix for ages. For now, readers have to click through to read my posts - I apologize to anyone for whom this is a burden.

Maybe I will add the more popular feedreaders' User-Agents to the list above.
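If I do, it would just mean extending the existing regexp. Something like this - the User-Agent substrings below are my unverified guesses at what popular readers send, so check your own logs before trusting them:

```
# Hypothetical extension - the last four substrings are guesses, verify against logs
not header_regexp User-Agent (?i)(Google-Site-Verification|googlebot|bingbot|duckduckbot|mastodon|^Lynx|Feedly|Inoreader|NewsBlur|Miniflux)
```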

Google

I was most annoyed that Google, of all organizations, is still using v1.1 to spider sites. I had to add both Google-Site-Verification and googlebot before the search console stopped complaining.

Honestly, I thought twice about adding an exception for Google. The days where I was desperate to have Google spider my site are long gone - in 2026 I know they are just spidering the web to boost their own AI results and I receive so little traffic from Google that it hardly seems to matter.

I foresee a day when I will no longer make the effort to keep them unblocked, but they get a reprieve this time.

Lynx

I received some interesting feedback from Mastodon.

@sheephorse congratulations, you excluded all lynx users (still the #1 text mode browser, especially among reduced-vision people). That’s not just throwing the child out with the bathwater, it’s throwing the neighbours’ kids out as well for good measure.

@mirabilos@toot.mirbsd.org on Mastodon

I don't usually do this but sometimes a meme just fits

I must admit I hadn't thought about Lynx for years. I just assumed that if it was still around it would support HTTP/2 by now, but no. I do not get a lot of Lynx traffic, but I did not mean to exclude those users.

During testing I downloaded Lynx and played around for a while. I am happy to find that sheep.horse works reasonably well in a text-based browser - score one for plain old HTML websites.

Conclusion

There is still a surprising amount of legitimate HTTP/1.1 traffic out there. Allowing certain User-Agents through seems tacky but works OK in lieu of more sophisticated bot detection.

As it stands, excluded bots could easily pretend to be a blessed User-Agent (I know some non-Google crawlers pretend to be some version of googlebot, for instance), but it seems a lot of bots don't bother. I'll probably continue to tweak the list of allowed agents as problems occur.

Do I recommend this approach? Probably not for a general-purpose site, unless you are absolutely getting hammered by bot traffic. There are better solutions out there, but this seems to meet my needs for now.