POSIX_ME_HARDER n00b


Joined: 30 Aug 2015 Posts: 27
Posted: Mon Jun 15, 2020 5:40 am Post subject: [SOLVED] Prefix on SailfishOS: library linking failing.
Edit: If you're trying to do the same thing, I've written a guide with all my workarounds here.
I strongly recommend ignoring the workaround found in this topic. The root of the issue is that the wrong profile was chosen during the install process. I was using "prefix/linux/arm/" instead of "default/linux/arm/17.0/armv7a/prefix/kernel-3.2+/". If you've made the same mistake as me, start over with the right profile; it's worth it to avoid fighting the linker at every corner.
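A quick way to see which profile an existing Prefix points at, before deciding whether to start over, could look like the sketch below. The function name prefix_profile is made up for illustration, and it assumes make.profile is a plain symlink under etc/portage in the Prefix (as in the "ln -s" workaround listed under Stage 1 further down).

```shell
# Hypothetical helper: print the profile that make.profile points at.
# Assumes make.profile is a symlink, as created with "ln -s".
prefix_profile() {
    # $1: the Prefix root (EPREFIX)
    readlink "$1/etc/portage/make.profile"
}

# Example (not run here):
#   prefix_profile "${EPREFIX}"
# A target ending in "profiles/prefix/linux/arm/" is the profile named above
# as the wrong one; the recommended one ends in
# "profiles/default/linux/arm/17.0/armv7a/prefix/kernel-3.2+/".
```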
_______________________________________________________
Hi,
I've been trying to install Gentoo Prefix on a phone running SailfishOS.
There have been quite a few hurdles, but I've been able to bypass them (usually through dirty hacks).
The last one I face confuses me though. Indeed, from what I understand, bootstrap-prefix.sh creates some kind of minimal system in ${EPREFIX}/tmp, which is then used to merge the "real" system in ${EPREFIX}. The minimal system in ${EPREFIX}/tmp appears to be working just fine, but the "real" one in ${EPREFIX} has issues with linked shared libraries which end up preventing its programs from working.
I can reach up to stage 3, but at some point Bash is merged, and the version in ${EPREFIX}/bin is unable to find libtinfo.so.6, despite that file being in ${EPREFIX}/lib. The Bash in ${EPREFIX}/tmp/bin still works fine, so I can hide that particular symptom and keep the script running, but issues then arise in other programs, where they are similarly unable to find libraries that are there.
I thought about just setting the LD_LIBRARY_PATH environment variable, but that does not appear to solve the issue:
Code: | $ LD_LIBRARY_PATH="${EPREFIX}/lib" bash
Inconsistency detected by ld.so: dl-lookup.c: 799: _dl_lookup_symbol_x: Assertion `version == NULL || (flags & ~(DL_LOOKUP_ADD_DEPENDENCY | DL_LOOKUP_GSCOPE_LOCK)) == 0' failed! |
My current install is stuck near the end of stage 3, where it tries to merge sys-apps/util-linux among some dependencies for gettext and portage, but fails to find libreadline during the configure process and crashes.
To try and fix this, I've sync'd the portage tree in the ${EPREFIX} system, then created a link in the ${EPREFIX}/tmp system, allowing me to use portage (from the ${EPREFIX}/tmp system) the usual way outside of the bootstrap script (it otherwise refuses to acknowledge the existence of any package, even the ones it sees as installed). I thought simply merging ncurses and Bash again would fix the issue, but it does not. I do get additional information, though:
after merging either package, Portage warns of "Unresolved soname dependencies", with ld-linux-armhf.so.3 and libc.so.6 showing up as missing for both packages. I assume this means there is an issue with glibc in the ${EPREFIX} system, but I do not understand what would cause it to not work in the ${EPREFIX} system when it appears to work fine in the ${EPREFIX}/tmp system. I tried just going for "emerge -1 --nodeps glibc", but it complains about no compatible version of Python being available. I'd appreciate any help I can get.
Sailfish uses an aarch64 kernel, but an arm userland. I couldn't get an aarch64 toolchain to work properly, so I just went with a 32bit arm Prefix install. CHOST is set to armv7l-hardfloat-linux-gnueabi, CFLAGS are set to "-march=armv8-a -mtune=cortex-a73.cortex-a53" (it's a Qualcomm Snapdragon 835 MSM8998).
I mentioned a few hurdles, here are my workarounds:
Stage 1:
- wget needs libpsl, so I added it to the script as something to install. Problem is, wget can't find it easily, so I made a 'wget' executable that sets LD_LIBRARY_PATH to the right location before calling the real wget.
- tar fails to merge because of a _GL_WARN_ON_USE for gets. I just go in there after it crashes, comment out the line in ${EPREFIX}/var/tmp/tar-1.26/tar-1.26/ and make it compile and install.
- Bash won't compile with the readline option disabled, so I remove the option that disables it in the bootstrap script.
- It can't find a profile, so I ln -s ${EPREFIX}/var/db/repos/gentoo/profiles/prefix/linux/arm/ ${EPREFIX}/etc/portage/make.profile
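The wget wrapper from the first item of the Stage 1 list could be generated roughly as follows. This is a sketch only: the function name make_wget_wrapper and all three paths passed to it are assumptions, not what the bootstrap script or the original wrapper actually used.

```shell
# Hypothetical generator: writes a small 'wget' script that sets
# LD_LIBRARY_PATH before handing over to the real wget binary.
make_wget_wrapper() {
    wrapper=$1    # where to write the wrapper (a directory early in PATH)
    real_wget=$2  # path to the real wget binary
    libdir=$3     # directory that contains libpsl
    cat > "$wrapper" <<EOF
#!/bin/sh
# Prepend the libpsl directory, keeping any existing LD_LIBRARY_PATH.
LD_LIBRARY_PATH="$libdir\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "$real_wget" "\$@"
EOF
    chmod +x "$wrapper"
}

# Example (not run here; all paths are placeholders):
#   make_wget_wrapper "$HOME/bin/wget" "${EPREFIX}/tmp/usr/bin/wget" "${EPREFIX}/tmp/usr/lib"
```

Using exec keeps the wrapper out of the process tree, and the variable is only changed for wget itself rather than for the whole bootstrap environment.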
Stage 2:
- ${EPREFIX}/tmp/usr/local/bin/{gcc,g++} are a mess, with duplicate lines and they export PATH as being "". I remove the duplicates and set the PATH to be the same as I was using during stage 1.
- Just in case, I set the CHOST in both ${EPREFIX}/etc/portage/make.conf and ${EPREFIX}/tmp/etc/portage/make.conf
- All versions of sys-apps/baselayout-prefix are masked, so I unmask the most recent one.
Stage 3:
- All linux headers are masked, despite them being attempted for a merge. I unmask sys-kernel/linux-headers-4.4, which is the same version number as the running kernel. I'm pretty unsure of that move, but it seems to let things go forward.
- Perl... is a mess. Perl thinks that any Linux system in which /system/lib/libandroid.so exists is automatically Android. It also tries to compile with incompatible options: userelocatableinc and useshrplib. The latter is part of the ebuild, so I let it stay; the former gets removed using a package environment that sets EXTRA_ECONF to -Dhintfile='linux' -Dosname='linux' (so that it can't say I'm using Android) and -Duserelocatableinc='false'. I've not actually reached the point where it tries to merge Perl in this current attempt; this was from another attempt that exported LD_LIBRARY_PATH to include both ${EPREFIX}/lib and ${EPREFIX}/tmp/lib, which is probably not a good idea and ends up having the same issue anyway, since the next package was unable to find perl5.so.30.
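The package environment mentioned in the Perl item can be set up through Portage's package.env mechanism. A rough sketch follows; the helper name write_perl_env and the file name perl-linux.conf are arbitrary choices for illustration, and the EXTRA_ECONF value is copied from the item above.

```shell
# Hypothetical helper: write a Portage per-package environment for Perl.
write_perl_env() {
    root=$1   # the Prefix root (EPREFIX)
    mkdir -p "$root/etc/portage/env"
    # Options taken from the Perl workaround; the file name is arbitrary.
    cat > "$root/etc/portage/env/perl-linux.conf" <<'EOF'
EXTRA_ECONF="-Dhintfile='linux' -Dosname='linux' -Duserelocatableinc='false'"
EOF
    # Tell Portage to apply that file when building dev-lang/perl.
    echo "dev-lang/perl perl-linux.conf" >> "$root/etc/portage/package.env"
}

# Example (not run here):
#   write_perl_env "${EPREFIX}"
```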
######## Progress Report 0
This lets me use the ${EPREFIX}'s system's Bash:
Code: | $ ${EPREFIX}/lib/ld-linux-armhf.so.3 ${EPREFIX}/bin/bash |
I can't do much with that, but that's technically progress.
Also, "$ ${EPREFIX}/tmp/usr/bin/ldd ${EPREFIX}/bin/bash" indicates that all libraries are found (and from the right lib folder).
I don't know why it uses tmp/usr/bin's ldd and not usr/bin's, since the latter is first in the $PATH list, but both agree on the result.
Sailfish's /usr/bin/ldd disagrees and tries to link everything to /lib/ and /usr/lib. Since it's the only ldd to indicate that libtinfo.so.6 is missing, I believe the issue I am facing is that /usr/bin/ldd is being used instead of Gentoo Prefix's during some operations.
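Which dynamic loader a binary asks for is recorded in the binary itself (the ELF program interpreter), and "readelf -l" prints it. The sketch below pulls that path out of readelf's output. It assumes binutils' readelf is available on the device, and the function name interp_from_readelf is made up for illustration.

```shell
# Extract the interpreter path from "readelf -l" output read on stdin.
# readelf prints it as: [Requesting program interpreter: /path/to/ld.so]
interp_from_readelf() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)].*/\1/p'
}

# Example (not run here; needs readelf):
#   readelf -l "${EPREFIX}/bin/bash" | interp_from_readelf
```

If this prints /lib/ld-linux-armhf.so.3 for a binary in ${EPREFIX}/bin, the host's loader is the one that starts it, which would fit with it searching the host's library folders.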
######## Progress Report 1
I'm starting over, once again. This time, I've set the CHOST to "armv7hl-hardfloat-linux-gnueabi" (armv7l is now armv7hl, the former was from the known to work values indicated on the Gentoo wiki, armv7hl is the one actually being used by SailfishOS). With the exception of PATH, I've set all the other env variables listed as likely to cause issues (CFLAGS, CFFLAGS, LD_LIBRARY_PATH, ...) to "". And, last change: it turns out the script isn't attempting to employ the PATH from stage1 in g++, gcc (see the first item in the Stage 2 workarounds): it wants the PATH usually used by the host system. This might not fix anything, since the issue seems to be with the ${EPREFIX} system, the g{cc,++} scripts are replaced during stage 2 (IIRC), and the ${EPREFIX} system is built during stage 3.
I'm also wondering if that first fix on the first stage (wget can't find libpsl) isn't related to the issue. It is an issue of shared library linking, so it might be. That wouldn't be reassuring since that'd mean the issue is likely to be coming from Sailfish, and that's not so easy to fix.
######## Progress Report 2
gmp won't compile; it complains about some instruction not being available for this processor. Setting CFLAGS back to "-march=armv8-a -mtune=cortex-a73.cortex-a53" in ${EPREFIX}/etc/portage/make.conf lets it compile again, but that means I can't really test going all the way to my previous issue without CFLAGS being set. It also means some packages have already been merged without these flags, and adding them now might cause other issues, I assume.
######## Progress Report 3
That didn't work. Bash is still unable to find libtinfo.so.6. This blocks stage 3 as soon as Bash is merged.
I'm out of ideas for now...
Learning more about how shared libraries work though, so there's that.
Looking at the man page for ld.so, I found the LD_DEBUG env variable, that's going to help.
Code: | [nemo@Pro1 gentoo]$ export LD_DEBUG="all"
[nemo@Pro1 gentoo]$ /gentoo/bin/bash
13687:
13687: WARNING: Unsupported flag value(s) of 0x8000000 in DT_FLAGS_1.
13687:
13687: file=libreadline.so.8 [0]; needed by /gentoo/bin/bash [0]
13687: find library=libreadline.so.8 [0]; searching
13687: search cache=/etc/ld.so.cache
13687: trying file=/usr/lib/libreadline.so.8
13687:
13687: file=libreadline.so.8 [0]; generating link map
13687: dynamic: 0xf5305ef8 base: 0xf52c9000 size: 0x000417b8
13687: entry: 0xf52d7d38 phdr: 0xf52c9034 phnum: 7
13687:
13687:
13687: file=libhistory.so.8 [0]; needed by /gentoo/bin/bash [0]
13687: find library=libhistory.so.8 [0]; searching
13687: search cache=/etc/ld.so.cache
13687: trying file=/usr/lib/libhistory.so.8
13687:
13687: file=libhistory.so.8 [0]; generating link map
13687: dynamic: 0xf52c7f00 base: 0xf52b2000 size: 0x00016218
13687: entry: 0xf52b3f44 phdr: 0xf52b2034 phnum: 7
13687:
13687:
13687: file=libtinfo.so.6 [0]; needed by /gentoo/bin/bash [0]
13687: find library=libtinfo.so.6 [0]; searching
13687: search cache=/etc/ld.so.cache
13687: search path=/lib/tls/v8l/neon/vfp:/lib/tls/v8l/neon:/lib/tls/v8l/vfp:/lib/tls/v8l:/lib/tls/neon/vfp:/lib/tls/neon:/lib/tls/vfp:/lib/tls:/lib/v8l/neon/vfp:/lib/v8l/neon:/lib/v8l/vfp:/lib/v8l:/lib/neon/vfp:/lib/neon:/lib/vfp:/lib:/usr/lib/tls/v8l/neon/vfp:/usr/lib/tls/v8l/neon:/usr/lib/tls/v8l/vfp:/usr/lib/tls/v8l:/usr/lib/tls/neon/vfp:/usr/lib/tls/neon:/usr/lib/tls/vfp:/usr/lib/tls:/usr/lib/v8l/neon/vfp:/usr/lib/v8l/neon:/usr/lib/v8l/vfp:/usr/lib/v8l:/usr/lib/neon/vfp:/usr/lib/neon:/usr/lib/vfp:/usr/lib (system search path)
13687: trying file=/lib/tls/v8l/neon/vfp/libtinfo.so.6
13687: trying file=/lib/tls/v8l/neon/libtinfo.so.6
13687: trying file=/lib/tls/v8l/vfp/libtinfo.so.6
13687: trying file=/lib/tls/v8l/libtinfo.so.6
13687: trying file=/lib/tls/neon/vfp/libtinfo.so.6
13687: trying file=/lib/tls/neon/libtinfo.so.6
13687: trying file=/lib/tls/vfp/libtinfo.so.6
13687: trying file=/lib/tls/libtinfo.so.6
13687: trying file=/lib/v8l/neon/vfp/libtinfo.so.6
13687: trying file=/lib/v8l/neon/libtinfo.so.6
13687: trying file=/lib/v8l/vfp/libtinfo.so.6
13687: trying file=/lib/v8l/libtinfo.so.6
13687: trying file=/lib/neon/vfp/libtinfo.so.6
13687: trying file=/lib/neon/libtinfo.so.6
13687: trying file=/lib/vfp/libtinfo.so.6
13687: trying file=/lib/libtinfo.so.6
13687: trying file=/usr/lib/tls/v8l/neon/vfp/libtinfo.so.6
13687: trying file=/usr/lib/tls/v8l/neon/libtinfo.so.6
13687: trying file=/usr/lib/tls/v8l/vfp/libtinfo.so.6
13687: trying file=/usr/lib/tls/v8l/libtinfo.so.6
13687: trying file=/usr/lib/tls/neon/vfp/libtinfo.so.6
13687: trying file=/usr/lib/tls/neon/libtinfo.so.6
13687: trying file=/usr/lib/tls/vfp/libtinfo.so.6
13687: trying file=/usr/lib/tls/libtinfo.so.6
13687: trying file=/usr/lib/v8l/neon/vfp/libtinfo.so.6
13687: trying file=/usr/lib/v8l/neon/libtinfo.so.6
13687: trying file=/usr/lib/v8l/vfp/libtinfo.so.6
13687: trying file=/usr/lib/v8l/libtinfo.so.6
13687: trying file=/usr/lib/neon/vfp/libtinfo.so.6
13687: trying file=/usr/lib/neon/libtinfo.so.6
13687: trying file=/usr/lib/vfp/libtinfo.so.6
13687: trying file=/usr/lib/libtinfo.so.6
13687:
/gentoo/bin/bash: error while loading shared libraries: libtinfo.so.6: cannot open shared object file: No such file or directory
|
So yeah, it's not even trying for the Prefix's library folders.
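LD_DEBUG="all" is very noisy; the ld.so man page also lists "libs", which only shows the library search. To see just the paths tried for one library, the output can be filtered along these lines. This is a sketch, and the function name ld_debug_tried is made up for illustration.

```shell
# Print every path the loader tried for one soname, reading LD_DEBUG
# output on stdin. Uses fixed-string matching so dots stay literal.
ld_debug_tried() {
    # $1: soname, e.g. libtinfo.so.6
    awk -v lib="/$1" '
        index($0, "trying file=") {
            # "trying file=" is 12 characters long; keep what follows it
            path = substr($0, index($0, "trying file=") + 12)
            # keep only paths that end in "/<soname>"
            if (length(path) >= length(lib) &&
                substr(path, length(path) - length(lib) + 1) == lib)
                print path
        }'
}

# Example (not run here; LD_DEBUG output goes to stderr):
#   LD_DEBUG=libs /gentoo/bin/bash 2>&1 | ld_debug_tried libtinfo.so.6
```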
If I set the LD_LIBRARY_PATH variable to the folders indicated in ${EPREFIX}/etc/ld.so.conf and ${EPREFIX}/etc/ld.so.conf.d/*, it does find the libraries, and bash simply segfaults.
I notice in the logs that it still uses /lib/ld-linux-armhf.so.3 and not the ${EPREFIX} version:
Code: |
[...]
13863: binding file /lib/ld-linux-armhf.so.3 [0] to /gentoo/lib/libc.so.6 [0]: normal symbol `free' [GLIBC_2.4]
[...]
|
I'd really like to be able to specify ${EPREFIX}/lib/ld-linux-armhf.so.3 as the one to use by default when using the prefix's programs.
Looks like this is a compilation option for GCC. Once I've got some free time to give it another go, I'll try adding "--dynamic-linker=/gentoo/lib/ld-linux-armhf.so.3" to the CFLAGS just before stage 3, and "ln -s /lib/ld-linux-armhf.so.3 /gentoo/lib/ld-linux-armhf.so.3" so that it still works before portage merges whichever package actually provides ld.so (glibc, I assume).
######## Progress Report 4
Just tried merging bash with "-Wl,--dynamic-linker=/gentoo/lib/ld-linux-armhf.so.3" added to the LDFLAGS, it works. I'll leave the rest of stage3 do its thing as long as it can, we'll see if that fixed the issue (I doubt it).
... and it failed with some dependency unable to find libreadline during configure. I'll "emerge -1 --nodeps @system", just in case, but I don't have much hope at this point.
... nope, it fails too. Again, something about being unable to find a shared library.
Well, I've run out of ideas for now...
######## Progress Report 5
Adding "-Wl,-rpath=/gentoo/lib" to LDFLAGS seems to make that one problem go away. From what I understand, this hard-codes into the binary that it has to look in that folder for shared libraries. I'll see how far this lets me go, and whether it has unexpected side effects. I imagine Prefix does (or at least is meant to) set up something like this anyway, right?
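Put together, the two linker flags from Progress Reports 4 and 5 could be generated for any Prefix location like this. It is only a sketch: the helper name prefix_ldflags is made up, and the loader file name is the armhf one seen in this thread, so it would differ on other architectures.

```shell
# Hypothetical helper: print the two linker flags for a given Prefix root.
prefix_ldflags() {
    root=$1   # the Prefix root, e.g. /gentoo
    printf '%s %s\n' \
        "-Wl,--dynamic-linker=$root/lib/ld-linux-armhf.so.3" \
        "-Wl,-rpath=$root/lib"
}

# Example (not run here): append a literal LDFLAGS line to make.conf.
#   echo "LDFLAGS=\"\${LDFLAGS} $(prefix_ldflags /gentoo)\"" >> /gentoo/etc/portage/make.conf
```

The value is expanded once by the shell and written out as plain text, since make.conf is not a place for command substitution.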
######## Progress Report 6
Got farther than ever. The script tried to merge virtual/libc, which was blocked by glibc.
"$ emerge -1 --nodeps -n virtual/libc glibc" resolved that.
baselayout refused to merge due to file collisions. I just deleted those files and let it install. It's not like I'd call this install clean by any means at this point.
Absolutely no chance that piling up hacky fixes everywhere is going to end up with that install blowing up in my face.
######## Progress Report 7
Yup, no surprises there. Got to the end of stage 3 after way too many other hacky interventions, and ended up with a system in which every single package appears to be masked (missing keyword), so I assume portage somehow pulled the tree for the wrong architecture or something. No clue how to fix this.
On the plus side, I moved ${EPREFIX}/tmp to another dir and things didn't explode, so the install looks like it's extremely unstable, can't actually use Portage (and I'm pretty sure I just merged the most up-to-date unstable version of Portage from another arch), but works, sorta, I guess.
######## Progress Report 8
I'm starting over; this time I won't use portage outside of the script. I believe the script configures it in a way I haven't bothered to before using it to modify the ${EPREFIX} system. That and my rather barbaric approach at getting (${EPREFIX}/tmp's) Portage to recognize the existence of any package so that I could use it are likely the causes of my new troubles.
This time I keep the CFLAGS on all the way, and I've added the LDFLAGS to ${EPREFIX}/etc/portage/make.conf before stage 2. I'll create a symlink for the ld.so if it tries using it before installing glibc. I'm just hoping the script doesn't install stuff on the ${EPREFIX} system using the ${EPREFIX}/tmp/etc/portage/make.conf file, that'd make things much more complicated.
Still waiting for a guru to tell me I could have run one or two commands before starting the script and made all of these issues go away.
######## Progress Report 9
Applying the aforementioned workarounds lets me rather smoothly go up to the "emerge -u system" step of the script, which fails because glibc blocks virtual/libc.
I'd just fix that by running
Code: | $ emerge -1 --nodeps -n --ask virtual/libc glibc |
since I think it would simply merge virtual/libc and leave glibc alone, but Portage can't find virtual/libc. That's likely because the script uses some variables to make Portage behave correctly, and I've not picked it apart to find what those are, so using it myself tends not to give good results.
I'll try to fix that once I've got some free time.
Oh, I gave a shot at just inserting the command inside the script. That seems to work. Now I've got the file collisions preventing baselayout from installing. Just deleting said files (I don't exactly do subtle workarounds, if you haven't noticed) fixes that particular issue. It's gone back to installing stuff for the 'emerge -u system' step. I expect the next issues will arise during the 'emerge --depclean' step.
######## Progress Report 10
Ok, well, emerge --depclean fails, but the script says it's alright, and tells me I've successfully finished stage 3.
Now I'm back to a system in which nearly every package is masked for some reason. I don't know how to set it so that it doesn't mask stable packages for my arch (looks like it currently masks those as well). It also looks like it doesn't use the right repository, since it can't, for example, find the ebuild for mednafen despite it being in var/db/repos/gentoo.
I'll try to figure it out from the script, since it clearly was able to merge everything until that point. But if someone knows and can tell me, I'd greatly appreciate it.
######## Progress Report 11
export PORTDIR="${EPREFIX}/var/db/repos/gentoo" lets portage find the packages. I tried adding it to make.conf, but that doesn't work, so I'll have to use the env var.
I should probably have looked at mednafen's package first, it's not available for arm. I was trying to figure out why it kept telling me it had a missing keyword.
So it works, I think. We'll see...