Gentoo Forums
Partition can't be mounted at boot but can be mounted by....
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1217
Location: Richmond Hill, Canada

Posted: Fri Oct 11, 2024 11:29 pm

Bluey_the_dog,

Now the question becomes why dn_numag is 128.

I suspect it is the partition size.

numag stands for the number of allocation groups.

This may be a limitation of the jfsutils code, but unfortunately jfsutils is no longer being maintained. So unless you want to dig into the source code and patch it yourself, you may be out of luck.

Maybe you can try resizing the partition to, say, 500GB (or a bit less) and run jfs_debugfs to see whether it still reports dn_numag = 128.
Bluey_the_dog
n00b


Joined: 13 Oct 2002
Posts: 69
Location: Perth, Australia

Posted: Sat Oct 12, 2024 12:07 am

Pingtoo,
To clarify, that statement came from the first two "tests", the kernels that worked. The partition would mount to /home upon boot so to test if the partition would also mount manually, "mount /dev/.......", I would just try and mount it to some other point, actually /mnt/ddd. I just did the same thing for all of the tests, for the second two, the ones that failed, I just did the same thing, try and mount to /mnt/ddd. Nothing special, no tasks taking a hold of things etc.

The thing that confuses me is that I am currently sitting at a machine that is running a standard gentoo-sources, 6.2.1 and as time has passed, plenty of previous kernels, and jfs has JUST WORKED. Why did someone decide that they wanted to come into the code and play merry hell with a big stick. They obviously didn't test the code as a partition of 531.5G is nothing out of the ordinary these days. Probably some deadhead trying to justify his/her existence and in doing so has caused, at least to me, a lot of lost time. Did the person who did this change work for RedHat by some chance?

Andrew
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22584

Posted: Sat Oct 12, 2024 12:39 am

pingtoo wrote:
However, it exists only in the master branch or v6.12-<rc*>.

The 6.10 and 6.11 tags do not have this check.
GitHub Linux kernel v6.11 tag:
   bmp->db_numag = le32_to_cpu(dbmp_le->dn_numag);
   if (!bmp->db_numag) {
      err = -EINVAL;
      goto err_release_metapage;
   }
So maybe the binary kernel was built with a cherry-picked patch.
As I hinted at above, the offending check was backported into stable kernels. So v6.10 lacks it, but v6.10.13 has it. I only found it so readily because OP reported that v6.10.12 is good, v6.10.13 is bad, and the author happened to put jfs in the commit message.
Bluey_the_dog wrote:
The thing that confuses me is that I am currently sitting at a machine that is running a standard gentoo-sources, 6.2.1 and as time has passed, plenty of previous kernels, and jfs has JUST WORKED.
Yes, because v6.2.1 was released long long before the validating check was added. The commit that I suspect (but have not tested) to be the cause of your problem was authored 2024-08-19. v6.2.1 was released 2023-02-25. The entire v6.2.x line was abandoned before the suspect commit was authored.
Bluey_the_dog wrote:
Why did someone decide that they wanted to come into the code and play merry hell with a big stick.
Did you read the commit message to which I linked? It clearly explains that this is a sanity check to reject broken inputs. The wording strongly implies to me that in the absence of this check, other code will misbehave.
Bluey_the_dog wrote:
They obviously didn't test the code as a partition of 531.5G is nothing out of the ordinary these days.
This is an excessive generalization. While you could infer that the stricter check was not done on a filesystem of the characteristics you have, we do not have evidence that it fails on a tiny JFS filesystem. Yes, 531G is easily attainable today, but is JFS routinely created on large modern filesystems? Since it is a legacy filesystem, there may be few people - and none of them kernel developers - who have recently created JFS filesystems. If most people only use JFS to access legacy filesystems that have not been replaced, and those legacy filesystems are necessarily smaller because they were created when storage was more limited, then they might not be impacted.
Bluey_the_dog wrote:
Probably some deadhead trying to justify his/her existence
As above, this is an unwarranted assumption. If you had read the commit I linked, you would see this was done apparently in good faith. Your attack is inappropriate. While it seems plausible that the check was done incorrectly, we have no evidence it was introduced just to cause problems.
Bluey_the_dog wrote:
and in doing so has caused, at least to me, a lot of lost time. Did the person who did this change work for RedHat by some chance?
If you read the commit I linked, you will see the author uses a gmail address. While a Red Hat employee might do that, it seems more likely that this person is not associated with Red Hat.
Bluey_the_dog
n00b


Joined: 13 Oct 2002
Posts: 69
Location: Perth, Australia

Posted: Sat Oct 12, 2024 1:24 am

Hu,
Are you Hu or Pingtoo? It is 9am in Perth and I haven't been to bed yet and I'm getting confused as to who is who.

I don't want to get into a flame war, but I am a Civil Engineer and have written code that is used in the analysis & design of multi-story buildings, bridges, wharves etc., stuff that we test, test and test again before we release; we don't write the code and then release it "in good faith". To do so could cause the deaths of many people, so hopefully you can understand my irritation at reading things such as "...and none of them kernel developers - who have recently created JFS filesystems."

By the way, no, I had not read the commit you linked to. The very few times I have looked at kernel code, I come away bewildered as I have no idea as to the context & conventions used in the code.

So what do I do now? Dump JFS?

Regards,
Andrew
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22584

Posted: Sat Oct 12, 2024 1:51 am

Hu is Hu, and Hu is on first. Pingtoo is Pingtoo, and is on second.

If you are tired enough to mix up who is answering Hu, then I suggest you log off and get some sleep. At that level of exhaustion, it is dangerously easy to make system-killing mistakes while engaging in routine administration.

I cannot comment on whether you should dump JFS, though the references in this thread to its deprecation status would make me concerned about using it. As I see it, your options are:
  • Dump JFS.
  • Nicely report to the kernel maintainers that recent kernels fail to mount your JFS system. Be prepared to share enough information to convince them. A succinct reproducible test case would be very helpful here.
  • Locally revert the bad commit in your kernels going forward.
  • Stop upgrading your kernel.
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1217
Location: Richmond Hill, Canada

Posted: Sat Oct 12, 2024 3:43 pm

Bluey_the_dog wrote:
Pingtoo,
To clarify, that statement came from the first two "tests", the kernels that worked. The partition would mount to /home upon boot so to test if the partition would also mount manually, "mount /dev/.......", I would just try and mount it to some other point, actually /mnt/ddd. I just did the same thing for all of the tests, for the second two, the ones that failed, I just did the same thing, try and mount to /mnt/ddd. Nothing special, no tasks taking a hold of things etc.

The thing that confuses me is that I am currently sitting at a machine that is running a standard gentoo-sources, 6.2.1 and as time has passed, plenty of previous kernels, and jfs has JUST WORKED. Why did someone decide that they wanted to come into the code and play merry hell with a big stick. They obviously didn't test the code as a partition of 531.5G is nothing out of the ordinary these days. Probably some deadhead trying to justify his/her existence and in doing so has caused, at least to me, a lot of lost time. Did the person who did this change work for RedHat by some chance?

Andrew


It turns out that partition size is not what causes the problem.

I tried to recreate your condition on my current RPI5 kernel (Linux rpi5 6.6.31+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux)

with a storage size of 600GB and with the same size as yours, 531.5GB. In neither test was I able to get dn_numag to be 128.

I also tested different sector sizes (512 bytes and 4096 bytes) with both of the above settings, and still could not reproduce your condition.

So now I am thinking that maybe the kernel version makes a difference when the file system is created.

Have you tried recreating the file system with a different kernel version?

I don't have an NVMe device handy, so I am not able to match your case exactly.

I must confess that I don't have the skill set to fix this problem.

I started out hoping this was a matter of finding the correct configuration options, but now I understand it goes deeper than I thought, so I am not sure what else I can do to help.

The "MAXAG" in the source code was created a very long time ago, and it is used in a few places to create statically sized arrays. So the patch is correctly trying to address a possible out-of-bounds array access. Your past experience of everything working does not mean it is correct. You could consider yourself lucky that you did not hit this overflow. I would rather have the system alert me that there is a problem before proceeding than hit the overflow later and deal with a data recovery headache.
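
To make the array-bounds concern concrete, here is a minimal sketch of a table sized by MAXAG at compile time and a guard that refuses an index outside it. This is not the kernel's code: `struct ag_table` and `ag_free_at` are hypothetical names invented for illustration, and only the value MAXAG = 128 is taken from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAXAG 128 /* value quoted in this thread */

/* Hypothetical per-allocation-group table; its size is fixed at compile time. */
struct ag_table {
    int64_t agfree[MAXAG];
};

/*
 * Read the free count for allocation group agno (hypothetical helper).
 * Valid indices are 0 .. MAXAG-1. Any other index would read past the
 * end of the array, so it is rejected with -EINVAL instead.
 */
static int ag_free_at(const struct ag_table *t, uint32_t agno, int64_t *out)
{
    assert(t != NULL && out != NULL);

    if (agno >= MAXAG)
        return -EINVAL;

    *out = t->agfree[agno];
    return 0;
}
```

With this guard, index 127 is readable and index 128 is refused. Note that the guard is on the index, so a table of this size can hold a full 128 groups.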
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22584

Posted: Fri Nov 01, 2024 6:17 pm

A subsequent commit relaxed this check. See "jfs: Fix sanity check in dbMount" for mainline. It was also backported into active stable series.
    commit ce781be80bd246d52d74bd9e667385f1c7104cfd v6.11.6
    commit f475d8a0cca76da393c361932f5fd26cda75236b v6.6.59
    commit d52ac941fe7de3e96acaf3cc6b6d6f41cf9356d4 v6.1.115
    commit cdf3ab1cf811ce166dfcd83354a411caa8977c70 v5.15.170
I see no sign it is in v6.10.x, and since that is now end-of-life upstream, I do not anticipate a backport to that series.
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1217
Location: Richmond Hill, Canada

Posted: Fri Nov 01, 2024 7:23 pm

Hu wrote:
A subsequent commit relaxed this check. See "jfs: Fix sanity check in dbMount" for mainline. It was also backported into active stable series.
    commit ce781be80bd246d52d74bd9e667385f1c7104cfd v6.11.6
    commit f475d8a0cca76da393c361932f5fd26cda75236b v6.6.59
    commit d52ac941fe7de3e96acaf3cc6b6d6f41cf9356d4 v6.1.115
    commit cdf3ab1cf811ce166dfcd83354a411caa8977c70 v5.15.170
I see no sign it is in v6.10.x, and since that is now end-of-life upstream, I do not anticipate a backport to that series.


Thanks for the update. Nice to know it is addressed.

Too bad there is no explanation of why the magic number is 128 (MAXAG=128) or under what circumstances the fs utility will generate that number (I was not able to reproduce it by partition size). The fix implies that at initial file system creation the value cannot be >MAXAG (maximum allocation groups). At the moment I really don't want to dig into the source code for the logic.
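
One way to picture the relaxation is as an off-by-one correction. The sketch below is an assumption, not a copy of either commit: it supposes the stricter check used `>=` where the relaxed one uses `>`, which is consistent with a dn_numag of exactly 128 failing to mount before the fix. The function names and `NUMAG_REPORTED` are hypothetical; MAXAG = 128 and the reported value 128 come from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAXAG 128          /* maximum allocation groups, as quoted in this thread */
#define NUMAG_REPORTED 128 /* dn_numag value reported earlier in this thread */

/* A table of MAXAG slots (indices 0 .. MAXAG-1) can hold the reported count. */
static_assert(NUMAG_REPORTED <= MAXAG, "reported dn_numag fits a MAXAG-sized table");

/* Assumed form of the stricter check: a count equal to MAXAG is rejected. */
static int check_numag_strict(uint32_t numag)
{
    if (numag == 0 || numag >= MAXAG)
        return -EINVAL;
    return 0;
}

/* Assumed form of the relaxed check: only counts above MAXAG are rejected. */
static int check_numag_relaxed(uint32_t numag)
{
    if (numag == 0 || numag > MAXAG)
        return -EINVAL;
    return 0;
}
```

Under these assumptions, `check_numag_strict(NUMAG_REPORTED)` returns -EINVAL while `check_numag_relaxed(NUMAG_REPORTED)` returns 0, and both still reject 0 and anything above 128.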
All times are GMT