Hey, if you get a chance to write up a blog post about how you walked through this process to get to this point, it would be amazing. I really need to learn how to get there myself.
I understand the terminology and what the tools are doing, but have no clue how to do it.
Are you positive it’s that line, and is it always that line?
The only pointer that’s being dereferenced on that line is proc, which is also dereferenced shortly before, so I don’t see how that line could normally cause a segfault.
The & operator has lower precedence than the -> and . operators, so it applies to the whole expression proc->sig_inq.first. In other words, that == is checking whether proc->sig_inq.nmsigs.next points to the memory location of proc->sig_inq.first. (The former is an ErtsMessage ** and the latter an ErtsMessage *, so taking the address of the latter gives exactly the type of the former, which makes sense.)
My C isn’t all that great either, but I think the most likely explanation is concurrent access to the data: proc being a valid pointer while one line executes and suddenly not anymore when the next line executes, because a thread running in parallel did something. That’s assuming the segfault really happened on that line.
A kind denizen of #c on Libera.Chat says that my analysis is sound. I had to ask to make sure I'm not spouting BS. 😛
Another possible cause they named is stack corruption. To quote: "thread bug or stack corruption or some other external effect, the code as written seems fine."
Alex Gleason, replying to feld:
https://github.com/erlang/otp/commit/4bc282d812cc2c49aa3e2d073e96c720f16aa270
Alex Gleason
I don’t know much C, but I guess that pointer &proc could be the problem.
feld
Alex, if you're using Docker, what's the base image here? Perhaps this could be a glibc vs. musl issue?