Besides “don’t use Cloudflare/AWS/etc”, how can we make our selfhosted setups resilient to outages like the ones we’ve seen recently?
Maybe you could describe what you mean by self-hosted and resilient. If you mean stuff running on a box in your house connected through a home ISP, then the home internet connection is an obvious point of failure that makes your box’s internet connection way less reliable than AWS despite the occasional AWS problems. On the other hand, if you are only trying to use the box from inside your house over a LAN, then it’s ok if the internet goes out.
You do need backup power. You can possibly have backup internet through a mobile phone or the like.
Next thing after that is redundant servers with failover and all that. I think once you’re there and not doing an academic-style exercise, you want to host your stuff in actual data centers, preferably geo-separated ones with anycast. And for that you start needing enough infrastructure, like routable IP blocks, that you’re not really self-hosting any more.
A less hardcore approach would be to use something like haproxy, maybe several instances on round-robin DNS, to shuffle traffic between servers when individual ones go down. This again gets out of self-hosting territory though, I would say.
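To make the "shuffle traffic between servers" idea concrete, here is a rough Python sketch of the selection logic a proxy applies: rotate through the backends and skip any that fail a health check. This is not how haproxy is implemented or configured; `FailoverPool`, `tcp_alive`, and the host names in the usage note are made up for illustration.

```python
import socket
from typing import Callable, List, Optional, Tuple

Backend = Tuple[str, int]


def tcp_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Health check (assumed): True if a TCP connection succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverPool:
    """Round-robin over backends, skipping ones that fail the check."""

    def __init__(self, backends: List[Backend],
                 check: Callable[[str, int], bool] = tcp_alive) -> None:
        self.backends = list(backends)
        self.check = check
        self._next = 0  # index where the next search starts

    def pick(self) -> Optional[Backend]:
        """Return the next healthy backend, or None if all are down."""
        n = len(self.backends)
        for offset in range(n):
            idx = (self._next + offset) % n
            host, port = self.backends[idx]
            if self.check(host, port):
                # Start after this one next time so load is spread out.
                self._next = (idx + 1) % n
                return host, port
        return None
```

Something like `FailoverPool([("box1.lan", 8080), ("box2.lan", 8080)]).pick()` would hand back whichever box currently answers. Note that the machine running this logic is itself a single point of failure, which is why you end up wanting more than one proxy.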
Finally, at the end of the day, you need humans (that probably means yourself) available 24/7 to handle when something inevitably breaks. There have been various products like Heroku that try to encapsulate service applications so they can reliably restart automatically, but stuff still goes wrong.
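For a sense of what "restart automatically" boils down to, here is a minimal Python sketch of a supervisor loop with exponential backoff. It is not how Heroku or any particular product works; `supervise` and its parameters are hypothetical. The point is the last branch: once the retries run out, the error surfaces and a human has to look at it.

```python
import time
from typing import Callable


def supervise(run: Callable[[], None],
              max_restarts: int = 5,
              base_delay: float = 1.0,
              max_delay: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call run() until it returns cleanly, restarting on exceptions.

    Waits base_delay, then 2x, 4x, ... (capped at max_delay) between
    attempts. Returns the number of restarts that were needed, and
    re-raises the last exception once max_restarts is exhausted.
    """
    restarts = 0
    while True:
        try:
            run()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                # Automation gives up here; this is where someone gets paged.
                raise
            sleep(min(max_delay, base_delay * 2 ** restarts))
            restarts += 1
```

The `sleep` argument is only injectable so the loop can be exercised without actually waiting.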
Every small but growing web site has to face these issues and it’s not that easy for one person. I think the type of people who consider running self-hosted services that way have already done it at work and gotten woken up by PagerDuty in the middle of the night, so they know what it’s about, and are gluttons for punishment.
I don’t attempt anything like this with my own stuff. If it goes down, I sometimes get around to fixing it whenever, but not always. I do try to keep the software stable though. Avoid the latest shiny.
I’ve been thinking about this one. I have everything on one Proxmox machine, and I could potentially have a second machine offsite for backups. If I did that I could go whole hog and just mirror my whole machine offsite for failover. Some kind of Proxmox cluster but with geographic separation.