Updating kernel from Gentoo 3.7.8 to latest version

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

Hi All,

As part of a migration project, I am migrating a Gentoo 3.7.8-pbx+ server to AWS; however, AWS does not like the kernel. To complete the migration, I need to find a way to update the current kernel, which I believe is a custom kernel.

One of the biggest problems I have is I've only been involved with Linux properly for the past 2-3 years and I still have gaps in my knowledge.

My understanding is that to update the current kernel, I will need to connect to the repository specific to the kernel to pull the data I need. However, this version is no longer supported by the platform either, so that is not an option.

How can I update the kernel from the current version to a newer version while keeping this server intact?

All help is much appreciated.

Jaglover · Posted: Wed Nov 11, 2020 11:40 pm Post subject:

Welcome to forums!

Assuming the rest of the system is also outdated, a fresh installation is the fastest and best solution in such a case.

If you really want to upgrade only the kernel ... there is a good chance emerge will not work after you sync the portage tree. But you can get the kernel sources from kernel.org and use the old .config with the new sources. However, 'make oldconfig' may fail to produce a working configuration since the version jump is too great. You could use the oldest kernel version AWS agrees with; with some trial and error it should be possible to build a working kernel.
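
A minimal sketch of reusing the old .config with newer sources, as described above. The paths and the function name reuse_old_config are placeholders rather than anything from this thread, and the DRY_RUN switch exists only so the commands can be inspected before anything is run. Whether the result boots is the trial-and-error part.

```shell
#!/bin/sh
# Sketch only: carry an old kernel .config into a newer source tree.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: saved copy of the old .config (placeholder path)
# $2: unpacked kernel.org source tree (placeholder path)
reuse_old_config() {
    run cp "$1" "$2/.config"
    # oldconfig keeps the known answers and prompts for options added since
    run make -C "$2" oldconfig
    # build the x86 kernel image and the modules
    run make -C "$2" bzImage modules
}
```

For example, reuse_old_config /root/old.config /usr/src/linux-new prints the three commands; setting DRY_RUN=0 would run them.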
_________________
My Gentoo installation notes.
Please learn how to denote units correctly!

NeddySeagoon · Posted: Thu Nov 12, 2020 11:33 am Post subject:

chuckm,

What Jaglover said but you can cut corners with the reinstall.

Follow the Gentoo handbook to install on AWS, as far as rebooting into the new install.
At this point you have a Gentoo that can do little more than build more software.

Copy over /var/lib/portage/world to the AWS install. That's all the things that were explicitly installed on the old system.
Migrate your USE flags to the new install. Some will have gone, others will have been added; it's not going to be copy/paste.

Once you are happy, run

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

I have skipped the native migration tool and am using dd to copy the image of the server to AWS, write it to a secondary disk, take a snapshot and use the snapshot to start an instance. If this works, I believe my problem will then be hard-drive drivers, as this Gentoo version is old.

If this fails, I like the hard-way method, but before I try that I am curious to test a theory. I came across a forum post that talked about deleting unused kernels. My assumption is that you can have multiple kernels installed and choose which one to activate. If this is correct, could I not just install a more recent kernel and migrate the server that way?

pietinger · Posted: Thu Nov 12, 2020 2:02 pm Post subject:

NeddySeagoon · Posted: Thu Nov 12, 2020 2:39 pm Post subject:

chuckm,

That old install is full of security problems. It should never be on the net.

Your old gcc probably won't build a modern kernel and if it did, the modern kernel and your old glibc probably won't work together.

If you really want to try sliding a new kernel under your old install, build the kernel elsewhere.
Copy its bzImage to the /boot on your old AWS install. Rename it if you want.
Edit /boot/grub/grub.conf to add it as a boot option.

On the kernel build host, install the modules to some random location.
This will get you a directory /some/random/location/'uname -r'/, where 'uname -r' stands for the new kernel's version string.
Copy 'uname -r' to /lib/modules/ on the AWS instance. You must not change its name.

Now you have all the bits in place, reboot and choose the new kernel.
I expect to hear the crash in Scotland. :)
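
A rough sketch of the staging step on the build host, assuming an x86 kernel tree. The function name stage_new_kernel, the bzImage-new file name and the paths are hypothetical. Note that with INSTALL_MOD_PATH the modules directory lands under lib/modules/ beneath the chosen prefix, so that is where the version-named directory will be found.

```shell
#!/bin/sh
# Sketch only: stage a freshly built kernel and its modules for copying.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: built kernel source tree (placeholder path)
# $2: scratch directory to stage into (placeholder path)
stage_new_kernel() {
    src="$1"
    stage="$2"
    # modules are installed to $stage/lib/modules/<kernel release>/
    run make -C "$src" INSTALL_MOD_PATH="$stage" modules_install
    # the x86 kernel image produced by the build
    run cp "$src/arch/x86/boot/bzImage" "$stage/bzImage-new"
}
```

The staged bzImage and the version-named modules directory are then what get copied to /boot and /lib/modules/ on the target.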

-- edit --

dd is unlikely to just work. The bits of grub that are installed outside of any filesystem will need to be reinstalled. Not rebuilt, just reinstalled for their new home.
/etc/fstab probably points to the wrong things.
The kernel command line(s) in /boot/grub/grub.conf root entries will likely be incorrect.
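
One way to spot stale /etc/fstab entries after a dd copy is to list the /dev/... paths it names that do not exist on the new machine. This is only a sketch: the function name missing_fstab_devices is made up here, and entries given as LABEL= or UUID= are deliberately skipped.

```shell
#!/bin/sh
# Sketch only: print fstab device paths that are not block devices here.
# $1: fstab file to check (defaults to /etc/fstab)
missing_fstab_devices() {
    fstab="${1:-/etc/fstab}"
    # keep only entries whose first field is a /dev/... path;
    # comment lines and LABEL=/UUID= entries do not match
    awk '$1 ~ /^\/dev\// { print $1 }' "$fstab" |
    while read -r dev; do
        [ -b "$dev" ] || echo "$dev"
    done
}
```

Running it against the copied disk's fstab from a rescue instance shows which entries need editing. The root= values in grub.conf still need checking by hand.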
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

To clarify, the current server is hosted on an ESXi host and I have a clone of the server in question to run my tests.

I have no installation media, no upgrade path and no support from the provider, and the server has been running for at least 6 years without any updates. And that's the 6 years I've been at the company.

What I am trying to do at the moment is get this server working on AWS without much intervention, which is why I am attempting the dd method. Otherwise, I agree, the ideal solution is to have a new host with an updated version of Gentoo and migrate the application to that. I just wouldn't know where to begin. I assume I need to search for all installed packages and migrate anything relevant. The next problem might be compatibility between pbx and more recent versions of the packages.

My brain hurts...

Hu · Administrator Joined: 06 Mar 2007 Posts: 23100

If you are in a hurry to get this working, you could try booting an AWS VM with one of their standard Linux systems, and running your application in a chroot on that machine. That would use their kernel, which they presumably keep current and correct for their virtual hardware. You do run the risk that Neddy mentioned about glibc versus kernel, but I think that will probably not be an issue. The Linux kernel is very good about backward compatibility, and a new kernel with old userspace is supposed to work. Once you have the system running that way, you could duplicate it to have an AWS-hosted test platform, and update the packages in the chroot at your leisure. This may be more convenient than updating your ESXi guest and repeatedly synchronizing successful results to the AWS VM.

However, I feel obliged to reiterate a point from above: if your system has been unmaintained for this long, it probably has at least some known security exploits due to old unpatched packages, so you should keep strict control over who can access it until you have it updated.
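
The chroot idea in this post can be sketched roughly as follows, assuming the old system's root filesystem has already been copied to or mounted at some directory on the AWS instance. The mount point and the function name enter_legacy_chroot are hypothetical, and the commands need root when actually run.

```shell
#!/bin/sh
# Sketch only: enter an old root filesystem while running the host's kernel.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: directory holding the old system's root filesystem (placeholder path)
enter_legacy_chroot() {
    root="$1"
    # give the old userspace access to the host kernel's interfaces
    run mount -t proc proc "$root/proc"
    run mount --rbind /sys "$root/sys"
    run mount --rbind /dev "$root/dev"
    # from here on, commands see the old system as /
    run chroot "$root" /bin/bash
}
```

Inside the chroot the application runs against its old libraries, while the kernel and its drivers are the ones the AWS image shipped with.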

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

OK. Progress. I have imaged the server, copied to AWS S3, added it to a Gentoo instance, dd'ed it onto another disk, took a snapshot of that disk and launched an instance from that. It boots BUT when I check console, I get;

Could not find the root block device in ...
Please specify another value or: press Enter for the same, type "shell" for a shell, or "q" to skip...

I assume the bootloader is messed up? Is there a way for me to attach this disk to another instance and fix it?

NeddySeagoon · Posted: Fri Nov 13, 2020 9:02 pm Post subject:

chuckm,

The boot loader did its thing and loaded the kernel and initrd.
It passed root= ... to the kernel and the kernel started.

Could not find the root block device in ... should give you a list of devices that the kernel can see.
The empty list reads

Jaglover

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

It was a driver issue but instead of editing GRUB, I changed the instance type to a different CPU architecture and it booted.

In summary: dd the disk to an image file, move it to S3, copy it to an instance, dd it onto another disk and snapshot that disk. Using the correct instance type, in my case T2.Small (it didn't work on T3), I was able to boot. I have some other problems, but they are not relevant here and I can work them out now that it boots.

Thanks for the help. Much appreciated.
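
The dd round trip summarized in this post might look roughly like the sketch below. Every device name, bucket URL and volume ID is a placeholder, the function names are made up, and the commands are printed rather than run unless DRY_RUN=0 is set, since dd will overwrite whatever target it is given.

```shell
#!/bin/sh
# Sketch only: image a disk, move it through S3, write it back, snapshot it.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# On the source side. $1: source disk, $2: image file, $3: S3 URL
image_and_upload() {
    run dd if="$1" of="$2" bs=4M
    run aws s3 cp "$2" "$3"
}

# On the helper instance. $1: S3 URL, $2: image file,
# $3: target disk, $4: EBS volume ID of the target disk
restore_and_snapshot() {
    run aws s3 cp "$1" "$2"
    run dd if="$2" of="$3" bs=4M
    # flush buffers before the volume is snapshotted
    run sync
    run aws ec2 create-snapshot --volume-id "$4" --description "dd copy of legacy server"
}
```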

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

It turns out I can't.

The instance boots but now I am getting:

"ERROR: interface eth0 does not exist
Ensure that you have loaded the correct kernel module for your hardware
ERROR: net.eth0 failed to start
ERROR: cannot start netmount as net.eth0 would not start"

Is there a way to download the NIC drivers that AWS supports onto my cloned VM and then image it, or do I have to recompile the kernel to include supported drivers? If it's the latter, can I build a kernel on AWS so I know it's compatible, import that onto my cloned VM and image the VM that way?

I've also read udev could be the reason why eth0 is not found and this might help with that, "touch /etc/udev/rules.d/80-net-name-slot.rules", but I had no luck.

The below link is to compile a kernel for AWS. Instead of going through that, I want to believe there is a way to add the AWS NIC drivers (or module?) to the current kernel?
https://www.artembutusov.com/gentoo-on-aws/

EDIT: Or, maybe more appropriate would be if I can use the current kernel as a baseline to create a new one as this is a custom kernel I am working with. Is this possible?

Jaglover · Posted: Wed Nov 18, 2020 1:09 pm Post subject:

Your device may be renamed, 'ifconfig -a' will tell.

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

Jaglover · Posted: Wed Nov 18, 2020 1:29 pm Post subject:

Well, it does exist, but it has a private address, which is generally given when there is no working connection. At this point you should be able to activate it by hand. It is weird, though, how an interface not in use has transferred several megabytes.


chuckm · n00b Joined: 11 Nov 2020 Posts: 35

This is on my cloned VM of the working production server at our datacentre. I've given it a static IP, .224, and I am testing a few things on this cloned VM, which explains the transfers. I have no console access at AWS, so if the NIC drivers for AWS are not working when I image this VM, I have to come back to this clone, make changes and reimage to AWS.

Although this is a custom kernel, Gentoo 3.7.8-pbx+, a kernel is just a layer between application and hardware, so as long as I have a new working kernel with AWS-supported hardware, it shouldn't interfere with my application, right?

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

For instance, on AWS I have a working Gentoo instance whose kernel list shows "linux-4.19.86-gentoo *". Can I download the files for this kernel, add them to my clone and select this as the active kernel?

Jaglover · Posted: Wed Nov 18, 2020 1:51 pm Post subject:

I've no experience with AWS, all I can tell you is if there is eth0 then the kernel driver is loaded and your problem lies elsewhere. You can see the driver loaded with 'lspci -k'. What is happening to your NIC is all in dmesg, maybe the net.eth0 runs too early, can't tell from here.

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

I don't think you understood me. My bad for the bad explanation.

- Current "Cloned VM" is on an ESXi host. Everything works fine.
- I use DD to image this VM to a "disk.img" file and using this file, I create an AWS instance.
- When the instance starts, the drivers that used to work on the ESXi host are not found for the network interface.

So, although eth0 is fine on the ESXi host, once imaged and moved to AWS, the same drivers are not what AWS supports, which is why I get the error ("eth0 does not exist. Ensure that you have loaded the correct kernel module for your hardware").

What I want to do now is to get a working kernel from an instance from AWS and move that to my VM on the ESXi host, reimage and move to AWS. I assume that would solve the NIC driver issue. The problem is, I am not sure how to go about doing that so I might try and build a kernel on a separate test VM just to understand the process and that might explain how to migrate a working kernel from one server to another.

Jaglover · Posted: Wed Nov 18, 2020 3:05 pm Post subject:

Drivers must be compiled for the running kernel, preferably with the same toolchain that was used to compile the kernel. To see which driver is needed, take the PCI ID from the 'lspci -nn' output and look up the matching driver at cateee.net, which is a kernel driver database. Then reconfigure, rebuild and reinstall the kernel. If the driver is configured as a module you may get away more easily; even a reboot won't be needed. But you have to work with the sources for the kernel you are using. I repeat: if the device node eth0 appears, then the proper driver for it has been loaded. No driver, no eth0. If your NIC is virtual, you need to build a driver for the emulated hardware, not for real hardware. That's all I can tell you. You are correct, I do not understand what you are doing there.
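
As a small aid for the lookup described above, the vendor:device pair can be pulled out of 'lspci -nn' output with a filter like the one below. The function name pci_ids is made up for this sketch. It relies on the fact that lspci -nn prints IDs as [vendor:device], while class codes such as [0200] contain no colon.

```shell
#!/bin/sh
# Sketch only: extract vendor:device IDs from lspci -nn output on stdin.
pci_ids() {
    # match bracketed pairs of four hex digits, then strip the brackets
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]'
}
```

Typical use would be something like: lspci -nn | grep -i ethernet | pci_ids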

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

Jaglover · Posted: Wed Nov 18, 2020 3:32 pm Post subject:

Normally you would just emerge it with 'emerge ena-driver'; emerge will build it against your kernel sources and install it properly. All you have to do is load it by hand to start using it right away, then add it to your system configuration to make it auto-load at boot. Since your system is so old, I do not know what exactly you need to do for auto-load.
The main problem here for you is - does emerge work? If emerge does not work then you need to get the driver sources from Amazon and build it by hand, I'm sure they have instructions for it.

Hu · Administrator Joined: 06 Mar 2007 Posts: 23100

As I read drivers/net/ethernet/amazon/Kconfig, the upstream Linux kernel has support for Amazon's Elastic Network Adapter, at least in the 5.9.x version that I checked. There is no need to download sources separately. Just enable ENA_ETHERNET in menuconfig, rebuild, and go.
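
A quick way to confirm the option took effect is to look for it in the resulting .config, where the Kconfig symbol ENA_ETHERNET appears as CONFIG_ENA_ETHERNET. The function name ena_enabled and the path in the usage line are hypothetical; this is only a sketch.

```shell
#!/bin/sh
# Sketch only: succeed if the ENA driver is enabled in a kernel .config.
# $1: path to the .config file
ena_enabled() {
    # =y means built into the kernel, =m means built as a module
    grep -Eq '^CONFIG_ENA_ETHERNET=(y|m)$' "$1"
}
```

For example: ena_enabled /usr/src/linux/.config && echo "ENA driver enabled"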

Even so, I think you ought to do like I suggested above: start with an AWS instance and upload to it a chroot containing your legacy Gentoo system. This will get you a fully functioning AWS instance that knows how to use all the features that Amazon provides. Even if you are determined to use your custom kernel, I think you could iterate much faster if you used an AWS instance like we would normally use a LiveCD:

1. Create a helper AWS instance, call it aws-01.
2. Name the to-be-production AWS instance aws-p.
3. Name your local ESXi clone vmw-01.
4. Grant vmw-01 the ability to ssh to aws-01 as root.
5. Attach to aws-01 the root drive for aws-p.
6. Make a change in vmw-01.
7. Rsync your changes to aws-01, thus updating the root drive of aws-p. Since aws-01 is built off Amazon Linux, you can get it to boot and use the network easily.
8. Sync the changes to the drive, and unmount the filesystem.
9. Detach the drive from aws-01.
10. Attach the drive to aws-p.
11. Try to boot. If it fails, halt the instance, move the drive back to aws-01, and go back to step 6 ("Make a change in vmw-01").

This avoids the need to dd the entire drive twice every time you need to make a change.
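
The rsync step of the loop above could be sketched as follows. The mount point /mnt/aws-p on aws-01 and the function name push_root are assumptions, and the command is only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: push a root filesystem to the helper instance over ssh.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: local directory to copy, e.g. / on vmw-01
# $2: rsync destination, e.g. root@aws-01:/mnt/aws-p/ (assumed mount point)
push_root() {
    # -aHAX keeps permissions, hard links, ACLs and extended attributes;
    # --numeric-ids avoids remapping owners by name on the helper;
    # --one-file-system skips /proc, /sys, /dev and other mounts
    run rsync -aHAX --numeric-ids --one-file-system "$1" "$2"
}
```

Because --one-file-system also skips a separately mounted /boot, that filesystem would need its own push_root call.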

chuckm · n00b Joined: 11 Nov 2020 Posts: 35

Hu, I don't need to use this old kernel as long as my application works. As I mentioned above, my downfall is my Linux experience, so I am having to learn much of it as I go along, and chroot is a foreign term for me. However, I did some research and I think I understand how it works. With that said, I have the following set up on my current VM: