Just a service, please

It starts

A friend recommends it, or you see a post or a video on the internet. Or worse: you suddenly need something and start researching. At the end of the research the path is clear: you’ll self-host that service. You don’t know it yet, but that’s the first in a long series of mistakes.

It is really simple

[Figure: simple-service]

There you have it:

  • The service;
  • The Internet;
  • You, wherever you may be.

This is really your last chance to step back and rethink all this: from here onward, the sunk cost fallacy will eat you alive with every minute of downtime.

Not just a service, after all

Now, unless you want your service to be reachable only from your (physical; let’s skip VPNs and such for now) LAN, you need to think about how something deployed on your humble home server should be accessed from outside. Things to keep in mind:

  • Browsers really don’t like it when you use plain HTTP¹;
  • While you can get a certificate for an IP address, having a domain is usually more practical;
  • Deciding in advance how much exposure you are willing to tolerate² for your service helps in the long run;
  • While many services come with an integrated web server, putting a reverse proxy in front of them can be a good idea;
  • CGNAT will make you cry;
  • Don’t expose anything that can be used without authentication, unless you want to end up in Shodan.
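The post does not propose this check. It is a small sketch under my own assumptions of how you might tell whether CGNAT is your problem. Look at the WAN address your router reports (for example on its admin page). If that address falls in the shared address space 100.64.0.0/10, which RFC 6598 reserves for carrier-grade NAT, or in a private RFC 1918 range, inbound connections will not reach you without a tunnel or VPN. The helper name `classify_wan_address` and its return labels are hypothetical.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

# Private ranges from RFC 1918: seeing one of these on the WAN side usually
# means another NAT device sits between you and the Internet.
RFC1918_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_wan_address(address: str) -> str:
    """Classify the (IPv4) WAN address reported by your router.

    Hypothetical helper: returns "cgnat", "double-nat" or "public".
    Other special-purpose ranges are not checked and fall into "public".
    """
    ip = ipaddress.IPv4Address(address)  # raises on malformed or IPv6 input
    if ip in CGNAT_RANGE:
        return "cgnat"
    if any(ip in net for net in RFC1918_RANGES):
        return "double-nat"
    return "public"
```

For example, `classify_wan_address("100.72.13.5")` gives `"cgnat"` and `classify_wan_address("192.168.1.1")` gives `"double-nat"`. A "public" answer is only half the story. It is also worth comparing that address with what an external "what is my IP" service reports: if the two differ, something upstream is still NATing you.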

Things have already gotten a bit more complicated. TLS certificates, domains, firewalls, fail2ban, a reverse proxy: just a few of the many pieces you need to be aware of if you want your service to run properly. Don’t get me wrong: even if you stop at exposing just the service on the right interface (plus maybe some tunnelling/VPN shenanigans if you are behind a CGNAT), it will more or less work. You’ll hit several hiccups along the way; whether you are willing to deal with them is your choice.

The real challenge, at least for someone like me who struggles with executive function, is that it gets overwhelming really soon once you take into consideration every bit and piece that should be in place. It also gets annoying when the gnome that lives in the back of your head keeps saying “it isn’t perfect, what are you even doing”, but that’s another story. Off the top of my head, what I consider a good setup includes:

  1. A reverse proxy in front of the service(s);
  2. A domain to have human-readable endpoints for your services;
  3. A way to have deployments that are reproducible (Docker, Nix, etc.);
  4. A minimal number of ports exposed externally, preferably only 80 and 443 for the reverse proxy;
  5. A way to collect metrics from the machines you use (e.g. Node exporter to Prometheus/VictoriaMetrics);
  6. A way to collect logs from the services you deploy (e.g. Rsyslog to Grafana Loki/VictoriaLogs);
  7. A way to make dashboards from the metrics and logs mentioned above (e.g. Grafana);
  8. A way to back up anything you will need when something breaks.
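For point 4, a cheap sanity check is to try TCP connections to a handful of ports and flag anything open beyond 80 and 443. Below is a minimal, IPv4-only probe using the standard library. The allow-list, the timeout and the function names are my assumptions, not part of the setup described here. To see what is exposed externally, it has to run from outside your network, for example from a VPS or a phone hotspot. From inside the LAN it only tells you what the LAN sees. Only probe hosts you own.

```python
import socket
from collections.abc import Iterable

# What we are happy to expose: only the reverse proxy (assumption, per point 4).
ALLOWED_PORTS = frozenset({80, 443})


def open_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> set[int]:
    """Return the subset of `ports` on which `host` accepts a TCP connection."""
    found = set()
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code on refusal or
            # timeout, instead of raising (name resolution errors still raise).
            if sock.connect_ex((host, port)) == 0:
                found.add(port)
    return found


def unexpected_ports(
    host: str, ports: Iterable[int], allowed: Iterable[int] = ALLOWED_PORTS
) -> set[int]:
    """Open ports on `host` that are not in the allow-list."""
    return open_ports(host, ports) - set(allowed)
```

Usage would be something like `unexpected_ports("home.example.org", [22, 80, 443, 3000, 8080, 9090])`, where the hostname is a placeholder for your own domain. A non-empty result means something besides the reverse proxy is reachable. A dedicated scanner such as nmap does this job far more thoroughly. The point of the sketch is only to show how little is needed for a quick check.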

Again: if you are like me, every point has several subpoints that balloon the mental energy required to address it. For example: how do I ship metrics/logs from Machine A to Machine B if they are not in a LAN³? Plain HTTP is a horrible idea, as anyone along the path could snoop on your logs and learn about your deployments: with default configurations, for example, you can sometimes find tokens in the logs of reverse proxies. That’s a very big no-no and a major oopsie in my rulebook.
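To make the "not over plain HTTP" rule concrete, here is a minimal sketch of pushing a single log line to a Loki-compatible push endpoint over HTTPS, using only the Python standard library. The list above suggests a shipper like Rsyslog for this job, so this is not how the post's setup does it. It only shows the moving parts. The code refuses `http://` URLs outright and relies on `ssl.create_default_context()`, which verifies the server certificate and hostname. The endpoint URL, the labels and the helper names are assumptions.

```python
import json
import ssl
import time
import urllib.request
from urllib.parse import urlparse


def build_push_request(url: str, labels: dict[str, str], line: str) -> urllib.request.Request:
    """Build a Loki-style push request for one log line, refusing non-HTTPS URLs."""
    if urlparse(url).scheme != "https":
        raise ValueError("only https:// URLs are allowed for shipping logs")
    payload = {
        "streams": [
            {
                "stream": labels,
                # Each entry is [Unix timestamp in nanoseconds as a string, log line].
                "values": [[str(time.time_ns()), line]],
            }
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request; the default SSL context verifies certificate and hostname."""
    context = ssl.create_default_context()
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return response.status
```

A call would look like `push(build_push_request("https://logs.example.org/loki/api/v1/push", {"host": "rock64"}, "hello"))`, with a hypothetical hostname. `urlopen` raises `HTTPError` on 4xx/5xx responses. TLS alone only protects the data in transit, so the endpoint itself should still sit behind authentication, as the list at the top warns.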

The Behemoth is born

Now the diagram above looks more like this (minus minor logical errors here and there):

[Figure: the-behemoth]

The fun part? You can run an entire Behemoth on an SBC like the Rock64⁴ with 2 or 4 GB of RAM, and it will be smoother than you would expect. There will be compromises, as there always are: you won’t get sub-100 ms latency on most requests, storage is limited to the eMMC (or a µSD card, if you are brave enough) plus whatever you connect to the USB 3.0 port, you won’t get the full 1 Gbps out of the NIC, etc. But it is satisfying!

The next posts will have two goals. First, I’ll illustrate how I would set this up from scratch on an SBC (the Rock64 sitting on my desk that I mentioned above, and maybe something else I have around). Second, I’ll show how I do it now that I have moved away from a single-machine deployment. I say “moved away” rather than “evolved” or “upgraded” because it’s mostly a question of preference and having fun; of course there is a limit to how much you can deploy on a single machine, but it’s higher than you would initially predict. The core of the setup will be based on ARM64 and Void Linux⁵, so changes, minor or major, may be needed on other platforms. I’ll try to point them out whenever I remember or notice them.

You may find that some things are done in a weird or unorthodox way, and that’s perfectly understandable: find the way that works for you and is comfortable to maintain. Computers are binary, we are not: what makes sense to me may make sense to you only 80% of the time, even if it works flawlessly, and finding that missing 20% can be a really fun adventure. Fun as in “spending 2 hours troubleshooting what should have worked out of the box” kind of fun. Did you forget to chmod +x that script? You did, didn’t you.

Build your own Behemoth

These links do not exist yet. I don’t have an exact schedule for the posts, since I write them in the evenings, whenever I’m not too mentally exhausted from my day job. Nevertheless, I hope they won’t take too long to come out; they are already fully formed in my head anyway.

  1. Setting up the SBC
  2. A Service to Rule Them All
  3. Getting Exposed
  4. I Want Metrics, I Want Logs
  5. Show Me Those Bar Graphs
  6. Help, Bots are Everywhere
  7. Backups and Other Mysterious Creatures

  1. Even if your site is simple HTML, read-only, without login forms or ways to transmit sensitive data.

  2. Bots will scrape your site. Polite bots will read your robots.txt and respect it; others will just pipe it to /dev/null. You’ll see both people and bots probing random paths on your IP/domain looking for an exploit, an entrance. You may decide to use third-party services to avoid exposing an IP that can be traced back to you (the third party will still be able to trace it, though). It’s a game of compromises.

  3. Some may argue that plain HTTP in a LAN is bad practice as well; I argue that if someone is already inside your LAN, how you ship your logs is the last thing you should worry about.

  4. Not an endorsement or a sponsorship: it’s simply what I used back in 2017 to deploy my first long-lived services, and it still sits on my desk.

  5. The main difference is that it doesn’t have systemd. That’s why I started using it, but I ended up sticking with it for many reasons.