Well this sucks. You are not actually accessing sheep.horse directly; I have had to introduce an intermediary due to abuse.
This goes against the ethos with which I have run this site up until now and I bitterly regret needing to do it. But the only things worse than big corporations owning vast swathes of the internet are botnets running wild.
A few weeks ago I noticed a large upswing in the number of people accessing the front page of this website. Normally this would be welcome but a quick look at the logs revealed that the traffic was all coming from obvious bots repeatedly hitting a single page.
I am not sure what the motive for such activity is. Looking at a sampling of the IPs shows that most of them either belong to hosting companies or corporate ISPs. My guess is that they are mostly corporate desktop machines that have been 0wned by malware. The rates of data are too low to be a deliberate denial-of-service attack - my best guess is that they are trying to generate a bunch of legitimate-looking traffic in an attempt to drown out their nefarious deeds in any firewall logs.
I probably wouldn't have even noticed if they just hit the index html file but they appear to be using a real browser to make the requests. This downloads the rather large splash image on the front page as well as hitting my basic analytics script, which is what made me notice in the first place.
Every server connected to the internet is continually being bombarded with bot traffic so even then I wasn't particularly bothered but after a few days the traffic started to increase. It doubled and then doubled again.
First idea - do nothing
The extra traffic wasn't affecting the operation of the site in any way apart from cluttering up my access logs. I decided to let it slide for a while and probably would still be ignoring it today if the number of hits hadn't continued to climb.
This still had no effect on the operation of sheep.horse, but the bots were using up bandwidth and I didn't want to run afoul of Digital Ocean's limits on $5 servers. I started thinking of ways to block the traffic.
Second idea - slim down my homepage
I didn't really need that huge image on my front page so I removed it and went on with my day. This dropped the bandwidth but the bots were still hitting my analytics script. This irked me.
It irked me even more when the bot traffic increased again, outweighing the reduction in page size by sheer numbers.
Third idea - firewall rules
Time to get defensive and start adding firewall rules for the worst offenders. I went through my logs looking for likely suspects that hit the page many times (addresses have not been changed to protect the guilty, because fuck them):
grep "GET / HTTP/2.0" /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -nr
117 205.185.223.43
75 205.185.223.42
68 205.185.223.54
65 205.185.223.104
58 205.185.223.55
58 205.185.223.160
52 205.185.223.61
52 205.185.223.159
49 205.185.223.166
48 205.185.223.165
47 205.185.223.103
46 205.185.223.4
45 205.185.223.97
41 205.185.223.91
41 205.185.223.9
39 205.185.223.85
38 205.185.223.90
37 205.185.223.84
35 205.185.223.3
29 205.185.223.98
27 205.185.223.60
27 205.185.223.172
26 205.185.223.25
26 205.185.223.135
25 205.185.223.171
22 205.185.223.49
22 205.185.223.48
22 205.185.223.136
22 202.73.14.58
22 116.110.45.234
21 205.185.223.154
19 222.253.24.16
19 209.127.24.22
18 205.185.223.153
17 216.151.191.7
17 205.185.223.10
17 173.245.203.176
...
There were several low-hanging fruit subnets that could be blocked with firewall rules.
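Spotting those subnets by eye works for a short list, but the same pipeline can be bent to group hits by /24 directly. What follows is only a rough sketch, not what was actually run on the server: the function name count_by_subnet is made up for illustration, and it assumes IPv4 clients and the default nginx log format where the client address is the first field.

```shell
#!/bin/sh
# Hypothetical helper: count front-page requests per /24 subnet.
# Assumes IPv4 addresses in field 1 of each log line (nginx default format).
count_by_subnet() {
    grep 'GET / HTTP/2.0' "$1" \
        | awk '{ split($1, o, "."); print o[1] "." o[2] "." o[3] ".0/24" }' \
        | sort | uniq -c | sort -nr
}

# Example invocation (path taken from the command above):
#   count_by_subnet /var/log/nginx/access.log
```

Any subnet near the top of that list is a candidate for a single deny rule, with the usual caveat that a /24 block may also catch innocent neighbours.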
sudo ufw insert 1 deny from 205.185.223.0/24 comment 'Denied'
I knew this would only be partially effective - a lot of requests were coming from unique IPs but I figured that this was a numbers game now and blocked the top offenders. This dropped the traffic down to a reasonable level - for a few days. Whatever botnet was doing this was ramping up and I started to get hit from so many IPs that I quickly realized I could never block them all.
Fourth idea - Cloudflare
By this stage I was pretty sick of this fight. Sheep.horse gets about 30 legitimate hits on a good day so setting up automatic responses seemed like overkill. (I investigated fail2ban but I thought the number of unique IPs was growing too fast even for that solution.)
Reluctantly I turned to Cloudflare, which offers a free service for just such situations. I say reluctantly because the way Cloudflare works is extremely heavy-handed. All traffic to sheep.horse now goes through Cloudflare's servers and is proxied to sheep.horse. (Indeed, I had to point the sheep.horse domain name at Cloudflare's nameservers.) In fact, it is impossible to access sheep.horse directly because I have reset the firewall to block all non-proxied traffic.
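The exact firewall rules are not given here, so the following is just one plausible shape for "block all non-proxied traffic", offered as a sketch. It assumes the site is served over HTTPS on port 443 and that the allowed ranges come from the list Cloudflare publishes; the function name cloudflare_ufw_rules is invented for this example. It only prints ufw commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) ufw commands that allow HTTPS
# only from the CIDR ranges supplied on stdin, one per line.
cloudflare_ufw_rules() {
    # The "|| [ -n ... ]" handles a final line with no trailing newline.
    while IFS= read -r range || [ -n "$range" ]; do
        [ -n "$range" ] || continue   # skip blank lines
        printf 'ufw allow proto tcp from %s to any port 443\n' "$range"
    done
    # Everything else hitting port 443 is refused.
    printf 'ufw deny 443/tcp\n'
}

# Example (assumes Cloudflare's published IPv4 list as the input):
#   curl -s https://www.cloudflare.com/ips-v4 | cloudflare_ufw_rules
```

ufw matches rules in order, so the allow lines need to land ahead of the final deny, and any existing rule for SSH should be left alone or you will lock yourself out.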
I was hoping that Cloudflare would recognize the offending IPs as suspicious immediately but that was not the case. I ended up having to create a simple Cloudflare rule to serve up a captcha based on certain patterns I recognized from my logs. (I am being coy here because they would be simple to bypass.) That did the trick - the bots cannot solve the captcha.
Result
I am not happy; this cannot be described as a victory. Cloudflare is working well enough (currently blocking over 1500 hits an hour from the looks of it) but I bitterly resent having to introduce a third party into what was a beautifully simple system a few weeks ago.
Some legitimate readers of this site may actually be asked to solve a captcha before Cloudflare lets them see my pages - a massive imposition that I apologize for. And, of course, Cloudflare gets to see all my traffic.
Maybe I will be able to remove the Cloudflare integration in a couple of months but I am not holding out much hope.
A pox on whoever set up that botnet and a begrudging thank you to Cloudflare. This is why we cannot have nice things.