Gentoo Forums
Gentoo Forums
Gentoo Forums
Quick Search: in
Segmentation fault with Eigen library and GMPrationals [Solv
View unanswered posts
View posts from last 24 hours

Goto page 1, 2  Next  
Reply to topic    Gentoo Forums Forum Index Other Things Gentoo
View previous topic :: View next topic  
Author Message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Mon Nov 18, 2024 6:25 am    Post subject: Segmentation fault with Eigen library and GMPrationals [Solv Reply with quote

Hello,

I am doing some work using the Eigen C++ template library, with large sparse matrices, say 200,000 x 200,000, over the mpq_class provided by the GNU Multiple Precision (GMP) library.

The first problem is that code which compiles without any warnings on the godbolt compiler, a small example of which is given at https://godbolt.org/z/j66jdK7n6, produces the following warnings when compiled on my gentoo system

Code:

#g++ -std=c++20 -O2 -march=native -lgmp -lgmpxx -Iboost -Wall -Wpedantic -Wextra example.cpp  -o example.o
In file included from /usr/include/eigen3/Eigen/SparseCore:40,
                 from example.cpp:6:
/usr/include/eigen3/Eigen/src/SparseCore/AmbiVector.h: In instantiation of ‘void Eigen::internal::AmbiVector<_Scalar, _StorageIndex>::reallocateSparse() [with _Scalar = __gmp_expr<__mpq_struct [1], __mpq_struct [1]>; _StorageIndex = int]’:
/usr/include/eigen3/Eigen/src/SparseCore/AmbiVector.h:238:11:   required from ‘_Scalar& Eigen::internal::AmbiVector<_Scalar, _StorageIndex>::coeffRef(Eigen::Index) [with _Scalar = __gmp_expr<__mpq_struct [1], __mpq_struct [1]>; _StorageIndex = int; Eigen::Index = long int]’
/usr/include/eigen3/Eigen/src/SparseCore/TriangularSolver.h:233:28:   required from ‘static void Eigen::internal::sparse_solve_triangular_sparse_selector<Lhs, Rhs, Mode, UpLo, 0>::run(const Lhs&, Rhs&) [with Lhs = const Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >; Rhs = Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >; int Mode = 6; int UpLo = 2]’
/usr/include/eigen3/Eigen/src/SparseCore/TriangularSolver.h:306:93:   required from ‘void Eigen::TriangularViewImpl<MatrixType, Mode, Eigen::Sparse>::solveInPlace(Eigen::SparseMatrixBase<OtherDerived>&) const [with OtherDerived = Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >; MatrixType = const Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]> >; unsigned int Mode = 6]’
example.cpp:80:61:   required from here
/usr/include/eigen3/Eigen/src/SparseCore/AmbiVector.h:97:18: warning: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘Eigen::internal::AmbiVector<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>::Scalar’ {aka ‘class __gmp_expr<__mpq_struct [1], __mpq_struct [1]>’} with no trivial copy-assignment; use copy-assignment or copy-initialization instead [-Wclass-memaccess]
   97 |       std::memcpy(newBuffer,  m_buffer,  copyElements * sizeof(ListEl));
      |       ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from example.cpp:4:
/usr/include/gmpxx.h:1763:7: note: ‘Eigen::internal::AmbiVector<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>::Scalar’ {aka ‘class __gmp_expr<__mpq_struct [1], __mpq_struct [1]>’} declared here
 1763 | class __gmp_expr<mpq_t, mpq_t>
      |       ^~~~~~~~~~~~~~~~~~~~~~~~


Eigen developers informed me that they could not reproduce these warnings on their machines. Nor are the warnings reproducible on the compiler explorer https://godbolt.org/z/j66jdK7n6.

Second and more pressing issue is that the process of constructing large sparse matrices (175,175 x 175,175) over mpq_class in Eigen causes the following segmentation fault:
Code:
Thread 1 "main.o" received signal SIGSEGV, Segmentation fault.
0x00007ffff79d525e in free () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff79d525e in free () from /lib64/libc.so.6
#1  0x00007ffff7edee80 in __gmpq_clear () from /usr/lib64/libgmp.so.10
#2  0x000055555555ff4d in __gmp_expr<__mpq_struct [1], __mpq_struct [1]>::~__gmp_expr (this=0x7fffffffd510, __in_chrg=<optimized out>)
    at /usr/include/gmpxx.h:1855
#3  Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 1, int>::insertBackUncompressed (col=<optimized out>, row=<optimized out>, this=0x7fffffffd4c0) at /usr/include/eigen3/Eigen/src/SparseCore/SparseMatrix.h:904
#4  Eigen::internal::set_from_triplets<__gnu_cxx::__normal_iterator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> const*, std::vector<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>, std::allocator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> > > >, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int>, Eigen::internal::scalar_sum_op<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> > > (begin=..., end=..., mat=..., dup_func=...) at /usr/include/eigen3/Eigen/src/SparseCore/SparseMatrix.h:1056
#5  0x000055555556cf22 in Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int>::setFromTriplets<__gnu_cxx::__normal_iterator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> const*, std::vector<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>, std::allocator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> > > > > (end=..., begin=..., this=0x7fffffffdb78) at /usr/include/eigen3/Eigen/src/Core/functors/BinaryFunctors.h:36
#6  p_basis<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int> >::basis_builder (this=this@entry=0x7fffffffdb38, src=..., seminormal_form=..., p=p@entry=3) at /home/user_name/path_to_file/p_basis.h:165
#7  0x0000555555570c69 in p_basis<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int > >::p_basis (this=0x7fffffffdb38, src=..., seminormal_form=..., p=3) at /home/user_name/path_to_file/p_basis.h:57
#8  0x00005555555718c5 in basis_data<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int> >::basis_data (p=3, src=..., this=0x7fffffffdaf8) at /home/user_name/path_to_file/basis_data.h:61
#9  mform<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int> >::mform (this=0x7fffffffdaf0, src=..., path="/home/user_name/output.txt", p=3) at /home/user_name/path_to_file/mform.h:36
#10 0x0000555555559995 in main () at /usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/basic_string.tcc:242
(gdb)


The code that produced the segmentation fault above, was
Code:
typedef Eigen::Triplet<mpq_class> qtriplet;
std::vector<qtriplet> triplets{};

and line 165 of the file p_basis.h referred to in the gdb backtrace aboveis
Code:
m_basis.setFromTriplets(std::cbegin(triplets), std::cend(triplets));

Would anyone at the Gentoo Science Project (Eigen) or the Gentoo Toolchain Project (GMP) be willing or able to help look into these issues?

Any other suggestions that may help get to the bottom of this issue would be gratefully received.

[edited for typos]


Last edited by amandini on Sun Nov 24, 2024 4:50 am; edited 2 times in total
Back to top
View user's profile Send private message
sam_
Developer
Developer


Joined: 14 Aug 2020
Posts: 2032

PostPosted: Mon Nov 18, 2024 6:37 am    Post subject: Reply with quote

I see the warning locally as well but not on godbolt. Not sure why yet.

What does -march=native expand to on your system? You can check by running resolve-march-native from app-misc/resolve-march-native. I do see it without but it may be useful to know nonetheless.

Could you give me a testcase which segfaults? Can look more then.
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Mon Nov 18, 2024 7:05 am    Post subject: Reply with quote

sam_ wrote:
What does -march=native expand to on your system? You can check by running resolve-march-native from app-misc/resolve-march-native.

I see the warning locally as well but not on godbolt. Not sure why yet.

Could you give me a testcase which segfaults?


Thank you for your reply. I run these calculations on two workstations, on one -march=native resolves as
Code:
-march=tigerlake -mabm -mno-kl -mno-sgx -mno-widekl -mshstk --param=l1-cache-line-size=64 --param=l1-cache-size=48 --param=l2-cache-size=24576

and the other it resolves as
Code:
march=znver2 --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=512

The compiler warnings occur identically on both the znver2 and the tigerlake machine.

The example on godbolt was only contrived to illustrate the compiler warnings. As it is, it took many lines of code and over 500GB of ram on the AMD machine to produce the segmentation fault, so it may not be practical to share this specific example.

I have not tried to reproduce the segmentation fault on the intel machine which has less memory.

However, it may be possible to come up with a contrived and simpler example that reproduces the segmentation fault. Unfortunately may take a bit of time to get such a segmentation fault example working - my apologies in advance.

Should I go ahead and try to make such an example?


Last edited by amandini on Tue Nov 19, 2024 12:33 am; edited 1 time in total
Back to top
View user's profile Send private message
bstaletic
Guru
Guru


Joined: 05 Apr 2014
Posts: 407

PostPosted: Mon Nov 18, 2024 8:02 am    Post subject: Reply with quote

I'm not familiar with GMP, but that operator< looks very wrong - sometimes it tells you which of the two inputs is smaller and sometimes it does the opposite.

Code:

x   y | result 
---------------
0   0 | false
0   y | true 
x   0 | false
x  2x | false
2x  x | true 


The last two return true if x is greater than y.
The first three return true if y is greater than x.
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Mon Nov 18, 2024 8:05 pm    Post subject: Reply with quote

I don't know why the Compiler Explorer doesn't output the warning, but it makes sense. The code in Eigen::internal::AmbiVector<>'s reallocateSparse() member function uses std::memcpy() to copy the value of an object whose type is the type denoted by the class template's _Scalar parameter. I believe that you can have undefined behaviour when using objects of nontrivially copyable types made like that... and GMP's mpq_class is definitely not trivially copyable, it's a class that has nontrivial default and copy constructors, a nontrivial copy assignment operator, and a nontrivial destructor.

I don't know in general if Eigen documents its expectations about the types that are passed as template arguments like this, examples seem to show only arithmetic types like double or at most trivially copyable classes like std::complex<float>. The segfault shown in the backtrace happens in the destructor of some mpq_class object, hard to say if it was properly constructed.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Mon Nov 18, 2024 10:41 pm    Post subject: Reply with quote

GDH-gentoo wrote:
I don't know why the Compiler Explorer doesn't output the warning, but it makes sense. The code in Eigen::internal::AmbiVector<>'s reallocateSparse() member function uses std::memcpy() to copy the value of an object whose type is the type denoted by the class template's _Scalar parameter. I believe that you can have undefined behaviour when using objects of nontrivially copyable types made like that... and GMP's mpq_class is definitely not trivially copyable, it's a class that has nontrivial default and copy constructors, a nontrivial copy assignment operator, and a nontrivial destructor.

I don't know in general if Eigen documents its expectations about the types that are passed as template arguments like this, examples seem to show only arithmetic types like double or at most trivially copyable classes like std::complex<float>. The segfault shown in the backtrace happens in the destructor of some mpq_class object, hard to say if it was properly constructed.


Thank you for your reply.

Apropos of the compiler warnings, an Eigen developer just posted the Discord channel that 'eigen should use the copy/move ctor instead of memcpy if the type isn't trivially copyable' and provided the following link:
https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/SparseCore/AmbiVector.h?ref_type=heads#L91
It therefore seems that warnings may be an Eigen issue.

The segmentation fault remains a mystery - I had previously computed examples of up to dimensions 125,125 x 125,125 without encountering the problem. It has only given the segmentation fault when I moved to the next example which happens to have dimension 175,175 x 175,175 (irrelevant numerological observation: the dimensions in these two examples happen to be divisible by 1001 = 7 * 11 * 13).

Stripping out the extraneous parts of the code, I'll construct the matrix using the same mp_class elements stored in a CSV file the and see if the segmentation fault can be replicated that way.
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Mon Nov 18, 2024 11:07 pm    Post subject: Reply with quote

bstaletic wrote:
I'm not familiar with GMP, but that operator< looks very wrong - sometimes it tells you which of the two inputs is smaller and sometimes it does the opposite.

Code:

x   y | result 
---------------
0   0 | false
0   y | true 
x   0 | false
x  2x | false
2x  x | true 


The last two return true if x is greater than y.
The first three return true if y is greater than x.


Thank you for your reply.

The code containing the operator came from Eigen https://eigen.tuxfamily.org/dox/TopicCustomizing_CustomScalar.html
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Tue Nov 19, 2024 9:56 pm    Post subject: Reply with quote

amandini wrote:
The code that produced the segmentation fault above, was
Code:
typedef Eigen::Triplet<mpq_class> qtriplet;
std::vector<qtriplet> triplets{};

and line 165 of the file p_basis.h referred to in the gdb backtrace aboveis
Code:
m_basis.setFromTriplets(std::cbegin(triplets), std::cend(triplets));
amandini wrote:
It has only given the segmentation fault when I moved to the next example which happens to have dimension 175,175 x 175,175 [...].

So, m_basis here has type Eigen::SparseMatrix<mpq_class, 0, int>, right? Do you mean that when you get the segfault, m_basis.rows() and m_basis.cols() are both 175175? How many elements does triplets actually have when m_basis.setFromTriplets() is called and segfaults?
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Wed Nov 20, 2024 7:25 am    Post subject: Reply with quote

GDH-gentoo wrote:
amandini wrote:
The code that produced the segmentation fault above, was
Code:
typedef Eigen::Triplet<mpq_class> qtriplet;
std::vector<qtriplet> triplets{};

and line 165 of the file p_basis.h referred to in the gdb backtrace aboveis
Code:
m_basis.setFromTriplets(std::cbegin(triplets), std::cend(triplets));
amandini wrote:
It has only given the segmentation fault when I moved to the next example which happens to have dimension 175,175 x 175,175 [...].

So, m_basis here has type Eigen::SparseMatrix<mpq_class, 0, int>, right? Do you mean that when you get the segfault, m_basis.rows() and m_basis.cols() are both 175175? How many elements does triplets actually have when m_basis.setFromTriplets() is called and segfaults?


Thank you for your reply.

Yes, the dimensions and type you have quoted are all correct.

There are 3,010,744,671 elements in triplets when I initialise the matrix at the line
Code:
m_basis.setFromTriplets(std::cbegin(triplets), std::cend(triplets));
Back to top
View user's profile Send private message
Hu
Administrator
Administrator


Joined: 06 Mar 2007
Posts: 22857

PostPosted: Wed Nov 20, 2024 12:55 pm    Post subject: Reply with quote

The warning about memcpy is because classes with non-trivial copy constructors often (but not always) have invariants that the copy constructor will maintain, but that a straight memcpy will not. A simple example of that would be std::vector: the copy constructor is nontrivial because the copy gets its own distinct storage, with its own copy of the contained objects. Using memcpy to copy a vector would result in two vector instances that point to the same storage, and both will try to free that storage when destroyed. That free is undefined behavior and may (or may not) cause a crash due to heap corruption.

I would argue that until you can patch Eigen to build cleanly with -Werror=class-memaccess, it is not worth your time investigating the crash further. Eliminating the undefined behavior of copying the non-trivial class is worthwhile, and may (or may not) fix your crash as a side effect.
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Wed Nov 20, 2024 12:55 pm    Post subject: Reply with quote

amandini wrote:
There are 3,010,744,671 elements in triplets when I initialise the matrix at the line
Code:
m_basis.setFromTriplets(std::cbegin(triplets), std::cend(triplets));

Since what happens looks like a memory management problem, I'm not sure that the values stored in mpq_class objects matters, so the first thing I would do to try to reproduce the segmentation fault is write a small program that creates a 175,175 x 175,175 SparseMatrix<mpq_class, 0, int> object, a std::vector<Triplet<mpq_class>> object with 3,010,744,671 very simple to compute elements —but without repeating (row, column) pairs, since that executes an extra code path—, and calls setFromTriplets(). I won't have access to my Gentoo computer for several hours, but I might post one if you don't beat me to it in the meantime.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Wed Nov 20, 2024 11:53 pm    Post subject: Reply with quote

GDH-gentoo wrote:
[...] the first thing I would do to try to reproduce the segmentation fault is write a small program that creates a 175,175 x 175,175 SparseMatrix<mpq_class, 0, int> object, a std::vector<Triplet<mpq_class>> object with 3,010,744,671 very simple to compute elements —but without repeating (row, column) pairs, since that executes an extra code path—, and calls setFromTriplets()

Here it is, based on the example in section "Filling a sparse matrix" (semitested because of memory constraints and not wanting to install Eigen) :

Code:
#include <Eigen/SparseCore>
#include <gmpxx.h>
#include <iostream>
#include <vector>

namespace Eigen {

template<> struct NumTraits<mpq_class>: GenericNumTraits<mpq_class> {
        typedef mpq_class Real;
        typedef mpq_class NonInteger;
        typedef mpq_class Nested;

        static inline Real epsilon() { return 0; }
        static inline Real dummy_precision() { return 0; }
        static inline int digits10() { return 0; }

        enum {
                IsInteger = 0,
                IsSigned = 1,
                IsComplex = 0,
                RequireInitialization = 1,
                ReadCost = 6,
                AddCost = 150,
                MulCost = 100
        };
};
}

int main() {
        using namespace Eigen;
        using qtriplet = Triplet<mpq_class>;

        std::vector<qtriplet> t;
        t.reserve(3010744671UL);

        const Index nrows = 175175UL;
        unsigned long k = 0;
        for (Index i = 0; i < nrows; i++)
                for (Index j = 0; j < 17187UL; j++) t.push_back(qtriplet(i, j, mpq_class(k++)));
        for (Index i = 0; i < 11946UL; i++) t.push_back(qtriplet(i, 17188UL, mpq_class(k++)));
        std::cout << "Created vector of " << k << " qtriplets\nTrying setFromTriplets()\n";

        SparseMatrix<mpq_class> m(nrows, nrows);
        m.setFromTriplets(t.cbegin(), t.cend());
        std::cout << "I didn't explode\n";

        return 0;
}

Does this segfault?
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Thu Nov 21, 2024 9:56 am    Post subject: Reply with quote

Hu wrote:
The warning about memcpy is because classes with non-trivial copy constructors often (but not always) have invariants that the copy constructor will maintain, but that a straight memcpy will not. A simple example of that would be std::vector: the copy constructor is nontrivial because the copy gets its own distinct storage, with its own copy of the contained objects. Using memcpy to copy a vector would result in two vector instances that point to the same storage, and both will try to free that storage when destroyed. That free is undefined behavior and may (or may not) cause a crash due to heap corruption.

I would argue that until you can patch Eigen to build cleanly with -Werror=class-memaccess, it is not worth your time investigating the crash further. Eliminating the undefined behavior of copying the non-trivial class is worthwhile, and may (or may not) fix your crash as a side effect.


Thank you for the replies.

I agree that the non-trivial copy constructor issue in Eigen needs to be solved as it is possibly the root cause of the fault. The issue has now been flagged by the developers: https://gitlab.com/libeigen/eigen/-/issues/2873.

In the meantime, thinking about the question that GDH-gentoo asked about the number of non-zero elements in the matrix, I realised that because there is a equivalence relation on the entries that is used to set many of them to zero, it would be more efficient to delete these entries before rather than after the matrix is initialised.

With this change to the program, the matrix for the same example contains 1,478,894 elements and the segmentation fault no longer occurs for this particular example.

I should have done the calculation this way in the first place.
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Thu Nov 21, 2024 10:00 am    Post subject: Reply with quote

GDH-gentoo wrote:

Here it is, based on the example in section "Filling a sparse matrix" (semitested because of memory constraints and not wanting to install Eigen) :

Does this segfault?


Thank you for writing this code.

Being a bit swamped with other things right now, I won't be able to do any more work on this problem until Saturday.

Will be sure to let you know the result then.
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Sat Nov 23, 2024 9:03 am    Post subject: Reply with quote

GDH-gentoo wrote:

Does this segfault?


Interestingly it does segfault:
Code:
user@localhost ~ $ g++ -std=c++20 -march=znver2 -O2 -ltbb -lgmp -lgmpxx -Wall -Wpedantic -Wextra -I boost segfault.c -o segfault.o
segfault.c: In function ‘int main()’:
segfault.c:39:37: warning: comparison of integer expressions of different signedness: ‘Eigen::Index’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compa
re]
   39 |                 for (Index j = 0; j < 17187UL; j++) t.push_back(qtriplet(i, j, mpq_class(k++)));
      |                                   ~~^~~~~~~~~
segfault.c:40:29: warning: comparison of integer expressions of different signedness: ‘Eigen::Index’ {aka ‘long int’} and ‘long unsigned int’ [-Wsign-compa
re]
   40 |         for (Index i = 0; i < 11946UL; i++) t.push_back(qtriplet(i, 17188UL, mpq_class(k++)));
      |                           ~~^~~~~~~~~
user@localhost ~ $ ./segfault.o
Created vector of 3010744671 qtriplets
Trying setFromTriplets()
Segmentation fault

Thank you again @GDH-gentoo for providing the code to test this.

I'll bring it to the attention of the Eigen developers.
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Sat Nov 23, 2024 2:24 pm    Post subject: Reply with quote

Interesting, does the GDB backtrace look the same?

Eigen::SparseMatrix<mpq_class, Eigen::RowMajor, int>::insertBackUncompressed() --> mpq_class destructor --> mpq_clear() --> std::free() --> segmentation fault
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Sat Nov 23, 2024 9:13 pm    Post subject: Reply with quote

GDH-gentoo wrote:
Interesting, does the GDB backtrace look the same?

Eigen::SparseMatrix<mpq_class, Eigen::RowMajor, int>::insertBackUncompressed() --> mpq_class destructor --> mpq_clear() --> std::free() --> segmentation fault

There is no reference to std::free() in the backtrace:
Code:
(gdb) bt
#0  0x000055555555598d in Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 1, int>::insertBackUncompressed (col=7613,
    row=<optimized out>, this=0x7fffffffdbc0) at /usr/include/eigen3/Eigen/src/SparseCore/SparseMatrix.h:903
#1  Eigen::internal::set_from_triplets<__gnu_cxx::__normal_iterator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> const*, std::vector
<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>, std::allocator<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> >
> >, Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int>, Eigen::internal::scalar_sum_op<__gmp_expr<__mpq_struct [1], __mpq_struct
[1]>, __gmp_expr<__mpq_struct [1], __mpq_struct [1]> > > (dup_func=..., mat=..., end=..., begin=...)
    at /usr/include/eigen3/Eigen/src/SparseCore/SparseMatrix.h:1056
#2  Eigen::SparseMatrix<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, 0, int>::setFromTriplets<__gnu_cxx::__normal_iterator<Eigen::Triplet<__gmp_expr<__m
pq_struct [1], __mpq_struct [1]>, int> const*, std::vector<Eigen::Triplet<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int>, std::allocator<Eigen::Tripl
et<__gmp_expr<__mpq_struct [1], __mpq_struct [1]>, int> > > > > (end=..., begin=..., this=0x7fffffffdb70)
    at /usr/include/eigen3/Eigen/src/SparseCore/SparseMatrix.h:1110
#3  main () at segfault.c:44
(gdb)
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Sat Nov 23, 2024 9:47 pm    Post subject: Reply with quote

I see. A little different from your original program, but the segmentation fault still happens when executing Eigen::SparseMatrix<mpq_class, Eigen::RowMajor, int>::insertBackUncompressed()'s code.

amandini wrote:
I'll bring it to the attention of the Eigen developers.

Eigen users would probably be grateful if you do.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Sat Nov 23, 2024 10:39 pm    Post subject: Reply with quote

GDH-gentoo wrote:
I see. A little different from your original program, but the segmentation fault still happens when executing Eigen::SparseMatrix<mpq_class, Eigen::RowMajor, int>::insertBackUncompressed()'s code.

Eigen users would probably be grateful if you do.

Thank you for all your help with this. I have just raised the issue on the Eigen Discord channel. Will let you know the outcome.
Back to top
View user's profile Send private message
amandini
n00b
n00b


Joined: 01 Jul 2020
Posts: 33

PostPosted: Sun Nov 24, 2024 4:47 am    Post subject: Reply with quote

Update
The Eigen developers suggested that for the matrices I use
Code:
SparseMatrix<mpq_class, 0, int64_t>
and similarly for the triplets
Code:
using qtriplet = Triplet<mpq_class, std::int64_t>;
Then your program reads as
Code:

#include <gmp.h>
#include <vector>
#include <iostream>
#include <gmp.h>
#include <gmpxx.h>
#include </usr/include/eigen3/Eigen/Core>
#include </usr/include/eigen3/Eigen/SparseCore>
#include </usr/include/boost/operators.hpp>
#include <cstdint>

namespace Eigen{

template<> struct NumTraits<mpq_class>: GenericNumTraits<mpq_class> {
        typedef mpq_class Real;
        typedef mpq_class NonInteger;
        typedef mpq_class Nested;

        static inline Real epsilon() { return 0; }
        static inline Real dummy_precision() { return 0; }
        static inline int digits10() { return 0; }

        enum {
                IsInteger = 0,
                IsSigned = 1,
                IsComplex = 0,
                RequireInitialization = 1,
                ReadCost = 6,
                AddCost = 150,
                MulCost = 100
        };

   };
}

int main() {
        using namespace Eigen;
   typedef Eigen::SparseMatrix<mpq_class, 0, int64_t> qmatrix;
        using qtriplet = Triplet<mpq_class, std::int64_t>;

        std::vector<qtriplet> t;
        t.reserve(3010744671UL);

        const std::int64_t nrows = 175175;
        std::uint64_t k = 0;
        for (std::int64_t i = 0; i < nrows; i++)
                for (std::int64_t j = 0; j < 17187UL; j++) t.push_back(qtriplet(i, j, mpq_class(k++)));
        for (std::int64_t i = 0; i < 11946UL; i++) t.push_back(qtriplet(i, 17188, mpq_class(k++)));
        std::cout << "Created vector of " << k << " qtriplets\nTrying setFromTriplets()\n";

        auto m{qmatrix(nrows, nrows)};
        m.setFromTriplets(t.cbegin(), t.cend());
        std::cout << "I didn't explode\n";

        return 0;
}

It now runs with no fault and the output
Code:
user@localhost ~ $ ./segfault.o
Created vector of 3010744671 qtriplets
Trying setFromTriplets()
I didn't explode

Thank you all for the help in getting to the bottom of this.
Back to top
View user's profile Send private message
Hu
Administrator
Administrator


Joined: 06 Mar 2007
Posts: 22857

PostPosted: Sun Nov 24, 2024 2:45 pm    Post subject: Reply with quote

That program may work, but I still think Eigen needs an improvement here. It should either not use std::memcpy to copy the type, or it should include a compile-time check that the type being copied is safe to use with std::memcpy. In my opinion, it should not be possible for a user to build a program with likely dangerous behavior (byte-copying a type with a non-trivial copy constructor) without specifically bypassing a safety measure that warns the user about the expected problem.
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Sun Nov 24, 2024 3:12 pm    Post subject: Reply with quote

Resolving issue #2873 should take care of that. They where careful in other places of the code, they have for example a smart_copy<>() function template —which they reference in the issue— that uses fast std::memcpy() when safe (determined for type _Scalar by evaluating Eigen::NumTraits<_Scalar>::RequireInitialization), and slower but general std::copy() otherwise.

Bit I'm not sure this particular code path was taken in the examples that segfaulted, though. What Eigen developers suggested to amandini was explicitly passing a 64-bit integer type as the _StorageIndex template parameter, instead of using the 32-bit one (on amd64) that it defaults to. Looks to me like a different underlying problem.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)


Last edited by GDH-gentoo on Sun Nov 24, 2024 10:47 pm; edited 1 time in total
Back to top
View user's profile Send private message
Hu
Administrator
Administrator


Joined: 06 Mar 2007
Posts: 22857

PostPosted: Sun Nov 24, 2024 3:25 pm    Post subject: Reply with quote

That is better, but if a user sets RequireInitialization=0 on a type that needs a non-trivial copy constructor, will that (a) compile and run properly, (b) compile with undefined behavior, or (c) fail to compile? My concern is that users not be able to trigger path (b) without being very clear what they are doing.
Back to top
View user's profile Send private message
GDH-gentoo
Veteran
Veteran


Joined: 20 Jul 2019
Posts: 1777
Location: South America

PostPosted: Sun Nov 24, 2024 4:31 pm    Post subject: Reply with quote

Either a) or b), I suppose, depending on what gets instantiated, but this falls under the general C++ question: if a template has requirements on its type parameters, whose responsibility is making sure that arguments meet them? If they don't, can this be detected at compile time?

GDH-gentoo wrote:
What Eigen developers suggested to amandini was explicitly passing a 64-bit integer type as the _StorageIndex template parameter, instead of using the 32-bit one (on amd64) that it defaults to. Looks to me like a different underlying problem.

Now that I have written this, come to think of it, 3,010,744,671 can be represented by a 32-bit unsigned type, but not a signed one, maybe it has something to do with the segfault and the proposed solution.
_________________
NeddySeagoon wrote:
I'm not a witch, I'm a retired electronics engineer :)
Ionen wrote:
As a packager I just don't want things to get messier with weird build systems and multiple toolchains requirements though :)


Last edited by GDH-gentoo on Sun Nov 24, 2024 10:44 pm; edited 4 times in total
Back to top
View user's profile Send private message
Hu
Administrator
Administrator


Joined: 06 Mar 2007
Posts: 22857

PostPosted: Sun Nov 24, 2024 4:40 pm    Post subject: Reply with quote

For the sake of minimizing user confusion and subsequent support questions, my preference is that the template do the best validation that it easily can, through a combination of type_traits checks and static_assert statements on those traits. For this case, I would suggest adding static_assert(std::is_trivially_copyable<_Scalar>::value, "_Scalar values must be trivially copyable to use memcpy"); adjacent to that std::memcpy. A block comment could elaborate on how the user can avoid that path for values that are known not to be trivially copyable, such as by setting RequireInitialization=1 in the Eigen::NumTraits specialization (assuming that suffices to redirect the logic to the std::copy path; I am inferring that from your prior post).

If the library requires C++20 or later, it could use requires checks instead of static_assert. Either way, mistakes give the user a clear compile-time error, independent of compiler warning options.
Back to top
View user's profile Send private message
Display posts from previous:   
Reply to topic    Gentoo Forums Forum Index Other Things Gentoo All times are GMT
Goto page 1, 2  Next
Page 1 of 2

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum