Last week I have a little problem.
I have a dedicated server running on Hetzner. They called the service Hetzner Robot. I’m running k8s cluster in their other service called Hetzner Cloud. My boss asked me to migrate old legacy server in Hetzner Robot into Hetzner Cloud on top of our k8s cluster. The old server itself is pretty powerful, but no one manages it. It was down. I checked it and see that the server is running out of space.
After a closer look (ssh’ing into the machine), it seems our docker volume backup is filling up spaces. So what I did to resolve the issue was to delete old backups to clear out space. After that, I reboot the server.
But oops, the server won’t restart…
I have no idea what is wrong. Or even how to access the machine because I can’t ssh. If I were using Hetzner Cloud, they have a virtual console for me to see. In Hetzner Robot, I have 3 options: Force power off, Automatic Restart, or Ask the technician to restart the server for you. The last option seems bothersome. I don’t want to bother people if I don’t have to. So I tried the first two options. But still, the server was not up.
Ok, before we are going deep down into the details. Let’s start with basic introduction
Hetzner is a company based in Germany. They provide bussiness like web hosting, VPS hosting, or cloud instance. Our company, Kartoza, uses Hetzner because they run 100% on green energy and very cheap at the same time.
In the past, we don’t have that many client. We hosted our own services in a dedicated server. We store our backups and data in Hetzner’s storage boxes. Nowadays we have several clients and recommended Hetzner as the hosting platform. We typically uses Hetzner Cloud because it’s easier to manage. However, our old services run in Hetzner Robot, somewhat their legacy products.
You can judge yourself. But I think it’s pretty good and fast response. Depending on your trouble, by the way.
I didn’t contact Hetzner support, but they have some kind of indicator that tells you if the server is running or not. So, when I click the button: Order a manual hardware reset, I imagine some technician walk into my server to push a physical button to restart my server. Then at the same time, they will know that my server can’t reboot.
So I wait and wait… after maybe less than an hour (how stupid of me to not track it). The indicator goes green, finally. However, I can’t ssh. So, I imagine something is not right, still.
Since I am working from UTC+7, and it’s already late in my timezone. I send a message to my boss to forward all Hetzner’s emails sent to his email address. There is no option in Hetzner Robot to share notification or alert to a different email address other than the owner of the account. I’m using my boss’s account to access the legacy server. So, notification will go to his account.
I know you are going to tell me how messed up for me to use my boss account. But hey, we are a small company. My boss usually manage these server by himself but he is trying to retire to focus more on the bussiness side. So he let me take it over with the infra. In Hetzner Cloud, I never have to do this, because I can create Hetzner project then assign my own work account to that project. After that, I’m working with the infra using my work account, not my boss’s account.
In the next morning, I read the messages forwarded to me, by my boss, from Hetzner Support.
The technician told me that my server goes into a kernel panic and won’t reboot. So, they attached a rescue system instead and send me the credentials (the root password).
I used that credentials and logged in.
Hetzner also provides documentation on working in a “rescue mode”. I’m happy that it is simple. It might depends on which server you purchased from Hetzner. But basically, I want to mount my server’s disk to check things.
First thing that I do is check
chroot-prepare /mnt chroot /mnt
device is busy
I ignore the warning and continue with
I assume you understand what
I checked my filesystem. Everything seems fine. Size is good. I leave some empty spaces. So, I check
Anyway, I still check
systemctl disable docker
I proceed to delete the systemd symlink in
Then I exit chroot, reboot, and wait for the server to be up.
It still doesn’t boot. AAAAAARGH
My server won’t reboot and it will be useless to Order hardware restart again and ask my boss to forward the email. I am in UTC+7 zone and usually worked from home since 7:00 AM. My boss is in UTC+2, so 2:00 AM. He will be asleep around this time. I don’t want to wait more hours just to reboot the server and get my credentials.
Fortunately, since the previous email mentions about Rescue System, I tried to activate it manually. So it seems I was able do that and received my credentials right from the website without contacting support.
I get back into the server, rescue mode, checking back the logs on mounted
There’s no other way, except asking the technician for more info about the logs in the screen of the server when it reboots.
I can create a support ticket request, but that would mean the reply will be sent to my boss’s email address. So I scrambled around the Account Settings. I found that I can add additional email address in the Hetzner notification. So I added mine (I know, it’s not supposed to be like this). This will make sure I will get the reply. Now how to send the notice?
I should be able to send the support ticket now, but I decided to do a little experiment. I got the previous forwarded mail from my boss. In the subject of the email, there was a ticket number and the server number. What if, I replied to that email, using the same subject, with a personal email address. Do they replied back?
So I send the question asking about how to retrieve the kernel logs. To my surprise, they replied back in under 8 minutes!
This is great and scary at the same time. Human response in 8 minutes in an email is considered fast. But I wonder if they did some validations beforehand. Mind you that I was trying to get access to a server using my personal mail account! My email was not registered elsewhere. If they replied, then the only token credentials is the ticket number. Keep your ticket number safe!
Anyway, let’s skip the conversations back and forth explaining the problem to the technician. They offered a KVM console, attached for up to 3 hours. I used that. The domain and HTTPS certificate were mismatched, strangely. But I used the website anyway. Fill in my credentials and get into a HTML5 version of the KVM console.
The error message in the screen shows that it indeed a boot problem. So the server doesn’t even reach the kernel. It says Kernel Panic: Unable to mount rootfs on an unknown block. Kind of like this one.
I’m really surprised that this is the problem. This machine has been running since Ubuntu 16.04, and now we have Ubuntu 20.04. It’s more than 4 years since, so why a simple reboot now made a kernel panic? Are there any hardware changes while the machine itself is running? I’m not familiar with how people running server. Is it even possible to hotswap the entire component while the process is running? I’m not sure.
The solution is straightforward. Go into rescue mode again. chroot to my
It also offerred me to delete old kernels, yes please delete them. After that,
It still doesn’t boot!!! Ugh Why!
Looking at KVM console again, the same kernel panic. Also, it doesn’t recognize my new kernel version, because it shows the wrong linux version. What???
After thinking it out a bit, I‘m guessing that the server uses a separate
apt install --reinstall linux-image-<bla bla the version>-generic
Oops, I got disk space insufficient warning.
So, probably. Someone does a dist-upgrade in the past, but the boot partition is not cleaned up, so the initramfs is faulty. Or probably there were some auto mainentance going on, I don’t know.
Anyway, I copy the whole
apt install —-reinstall linux-image-<version>-generic update-initramfs -u update-grub update-grub2
If you are wondering how I got the linux version, check from
dpkg --list | grep linux-image
After unmounting and rebooting, my problem is solved.
My own personal review. For the Hetzner Robot product, unfortunately for me, the management interface is not that good. There should be some mechanism for my boss to safely allow grunt workers like me to access the management interface. I know there’s an admin panel console, but I was talking about how to easily contact Support Ticket with my own work email address, not my bossess. Or maybe forwarding the support ticket is the canonical way, perhaps?
That was all about the Hetzner Robot interface.
For the technical support, I consider it very fast response. However I didn’t manage to try contact the support in my morning, which is their midnight. If they even respond very fast around that hour, that is insane support, I must say. For these kind of services, I expect 24/7 availability from their technician.
They are also quick to provide solution. For example, instead of directing me to a KVM menu in the Hetzner Robot interface, they just directly setup KVM access and pass on the credentials to me so I can quickly try it. I’m just a little bit complaining because the time limit is around 3 hours, and the credentials are given around 16:00 PM in my timezone, around the time I want to be off from work. Hahaha.
Well, fortunately I was able to try the console quickly and solve the issues around that time. So all is well. To be fair, this is excellent responses from them. They probably didn’t expect the user to be in Indonesia, not Germany.
Overall, excellent support from Hetzner. I give 4/5 rating for their support.