Switch from SJLJ exceptions to DWARF-2 #23
Comments
To be more precise, problematic here are only relocations at odd addresses, not relocations to odd addresses. Relocations to odd addresses will happen also eg with But something like
won't work. |
Thanks for creating / noticing this issue. It would be very unfortunate if we ended up with yet another poorly supported, non-standard and soon-to-be-killed feature. |
@th-otto Very true. Thanks for the clarification. And we are lucky, because we got an answer from Andreas Schwab on the GCC mailing list: He proposes to configure GCC by with However, even configured that way, exceptions don't work out of the box. In my quick test, it failed as soon as the first exception was thrown. Full DWARF-2 exception support has some prerequisites. As I understand, it needs support for In conclusion, adding DWARF-2 exception support is quite complex, and impacts several components. It is not the kind of thing you can easily turn on with a single option. We will have to do that to modernize the mint* targets. But as SJLJ exceptions are not yet completely obsolete, work well, and are much simpler than DWARF-2 exceptions, I advise keeping SJLJ for now, so we can focus on stabilizing binutils/gcc/gdb for the m68k-atari-mintelf target. Then we could start the DWARF-2 migration project at a later stage. Also, the switch from SJLJ to DWARF-2 exceptions will be an ABI change, so it will require a full rebuild of all the C++ libraries. |
A good introduction to the difference between SJLJ and DWARF-2 exceptions can be found here. There are pointers to more detailed information. |
Hm, are you sure? I've read that message from Andreas already, and from what i understand, DW_EH_PE_aligned only has an influence on what values are written for relocations, and how they are interpreted when later read, but not on where they are written. We should maybe try to create a testcase first where the default setting of DW_EH_PE_absptr really produces relocations at odd addresses. About crtbegin/crtend: i have to look, but i don't think that we need them. It's possible that unwind info has to be initialized somehow (in glibc that seems to be register_frame_info), but imho that can also be done at the same place where do_global_ctors_aux is called in libgcc.a. If more is needed for this, that would indeed be a bigger task. |
Well, I can just say that I encountered the odd-alignment issue during my early tests with PRG/ELF. Then yesterday, I looked at the m68k-elf relocations with my test program, and there were indeed relocations at odd addresses. Then I used DW_EH_PE_aligned with m68k-atari-mintelf, and there weren't any odd relocations anymore. Maybe this was just by chance, no idea. Indeed, this should be checked with a proper testcase |
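For reference, a testcase of the kind discussed here could look like the following minimal sketch. It is not the test program used in this thread; the names `Guard`, `thrower` and `run_exception_test` are made up for illustration. The idea is simply to have one frame with a destructor cleanup and one catch handler, so that the compiler has to emit .eh_frame data and a call-site table. After building it for the target, the relocations of the resulting object can be listed with the target's `objdump -r` to see whether any of them land at odd addresses.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper type: its destructor must run while the
// exception propagates, which forces a cleanup landing pad.
struct Guard {
    int *counter;
    explicit Guard(int *c) : counter(c) {}
    ~Guard() { ++*counter; }
};

// Throws through a frame that owns a Guard object.
static void thrower(int *cleanups) {
    Guard g(cleanups);
    throw std::runtime_error("boom");
}

// Returns 0 on success; a non-zero value tells which check failed.
int run_exception_test() {
    int cleanups = 0;
    try {
        thrower(&cleanups);
    } catch (const std::runtime_error &e) {
        if (std::string(e.what()) != "boom") return 2;  // wrong payload
        return cleanups == 1 ? 0 : 3;                   // cleanup must have run once
    } catch (...) {
        return 4;                                       // wrong handler selected
    }
    return 1;                                           // nothing was thrown
}
```

Calling `run_exception_test()` from `main()` and returning its value as exit code gives a program whose exit status directly shows whether throwing, unwinding and catching worked.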
I've just added usage of crtbegin.o / crtend.o. That's a step forward. |
The patch to crtstuff.c looks wrong to me. CTOR_LIST/DTOR_LIST are already defined in libgcc2.c. A function to run the global constructors is also already defined there. Also all the other stuff like That's why i meant that we currently don't need crtbegin/crtend. They are only for shared libraries. Changes to binutils are therefore also wrong. They must be usable by any compiler, and not rely on crtbegin/crtend being present. |
No. It works well, as expected.
No. They are used by libgcc2.c (compiled as __main.o), but defined elsewhere. In my initial mintelf patch (as well as in the a.out linker) they were defined in the linker script. Now they are defined in crtbegin.o / crtend.o, like any other modern target. I must admit that, due to the many defines to handle all the special cases, GCC's crtstuff is a big mess.
No. See above.
No. They are good. You can even link an empty assembler file with -nostartfiles, and see that the link succeeds even without crtbegin.o / crtend.o. This is possible because the reference to those files is done with a wildcard like *crtbegin.o to make the dependency optional. I haven't invented that, that's just a copy/paste from m68kelf. Be sure that I carefully test my patches before pushing. Did you test them, or my binaries, before declaring this was wrong? BTW, if your real intent (that I must guess, as you don't report real-world issues) is to use the mintelf linker script with the a.out linker, that might indeed not work. But for a completely different reason. The original m68kelf linker uses a CONSTRUCTORS statement in the linker script. As it's an a.out-only thing (ignored by ELF), I didn't use it in the mintelf script. But if you really want to use that linker script with the a.out linker, just add that CONSTRUCTORS line to the external linker script m68kmintelf.x, then it should work well. |
They are also defined there:
Your patches using
No, i did not test them yet. But having the same definition in different files can't be right. In any case, the function do_global_ctors_aux from crtstuff.c is not run at all. That would only be the case when we have support for an .init section.
No, that would not work indeed. That toolchain uses different linker scripts for a.out and elf. |
Now i did, and it does not work. (for testing i use
It doesn't make much sense anyway. If we activate DWARF2 unwind info, then |
This is a hack from the GCC team:
Proof is that I successfully compiled and run both a standard C++ program, and also an empty .S file with -nostartfiles.
No. I added that I needed those workarounds because it's very uncommon to use crtbegin.o / crtend.o without support for the .init section. But that's a temporary situation. I will remove those workarounds as soon as the .init section is usable in mintelf. Then we won't use any uncommon configuration at all. Just progress step by step.
Yes. That will be a further step.
This is unexpected. Did you also update GCC? I have changed the specs to provide crtbegin.o and crtend.o on the command line. Particularly: Both gcc and binutils must be updated at the same time. I've explicitly added a constraint in the Ubuntu packages. |
Yes, but i failed to realize that one of the patches did not apply cleanly, and the line that changed the STARTFILE_SPEC was rejected, so crtbegin.o was not used in the link. Apologies. The #error only triggers because we have neither an .init nor an .init_array, but only need the definitions for the constructors. Apparently the source was not prepared for that, although exactly such a scenario is mentioned in libgcc2.c. After i fixed that, constructors do work now. So in general, using crtbegin/crtend seems to be ok, but i still did a few more changes (in my own branch only for now), that make sure that only one of the two I've also protected all the differences by ifdefs now. Maybe not strictly necessary for the elf toolchain, but it makes it easier for me to sync the sources. Now i have to do the same checks again, using the a.out toolchain. Sigh. |
I've just pushed a patch that makes it possible to enable dwarf2 exceptions at configure time (default is still sjlj). It also changes the
With the old setting of
instead. The link still succeeds with error status 0. If i run the program, the exception is not caught, and the program exits with an exit value of 1536 instead. There are also other strange differences. With the old setting, the generated assembler source contains .cfi_startproc, .cfi_endproc and .cfi_offset statements, even without -g. It also does not have any .eh_frame section at all. |
I found the cause for the link error. In my fork i don't include m68kelf.h, and for The exit-code 1536 is from an abort() call in unwind-pe.h in read_encoded_value_with_base(), so there still seems to be something wrong in the generated tables. |
I don't remember having seen any warning. And certainly not an error, as the link succeeded. But at runtime, exceptions didn't work as expected. I don't remember actual message, something like invalid system call, and maybe a bus error at address 0 in the ARAnyM's log. Not completely surprising, as the startup code was incomplete.
Ah. Fine. FYI, I'm preparing support for .init_array / .fini_array in binutils / gcc / mintlib. It's really straightforward, and that's the modern way to go. We will see if that's enough to get DWARF-2 exceptions working, or if there is yet another issue behind. |
What for? Those arrays just contain the same pointer list as |
.init_array / .fini_array is the modern way to go. That's just another way to add hooks at startup. Concretely, crtinit.o uses it to initialize the .eh_frame section. |
??? That section is readonly, what do you want to initialize there? .init_array has nothing to do with exception handling. And yes, it is modern, because needed for shared libs. Other than that, it is exactly the same as all the .ctors sections, just without the leading and trailing elements. |
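To illustrate the point about the two layouts, here is a small host-side sketch. It does not use real linker sections; the arrays `demo_ctor_list` and `demo_init_array` are stand-ins invented for this example. The first mimics a .ctors-style list (a leading slot holding a count or -1, then the pointers, then a NULL terminator), the second mimics .init_array (just the pointers, delimited by start/end addresses). Note that this demo walks both lists forwards for simplicity; a real runtime may traverse a .ctors list in a different order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef void (*func_ptr)();

static std::vector<int> call_log;  // records which "constructors" ran
static void ctor_a() { call_log.push_back(1); }
static void ctor_b() { call_log.push_back(2); }

// .ctors style: leading slot (-1 here), pointers, NULL terminator.
static func_ptr demo_ctor_list[] = {
    reinterpret_cast<func_ptr>(static_cast<std::intptr_t>(-1)),
    ctor_a, ctor_b, nullptr
};

// .init_array style: only the pointers; the bounds come from outside.
static func_ptr demo_init_array[] = { ctor_a, ctor_b };

// Skip the leading slot, then call entries until the NULL terminator.
static void run_ctor_list(func_ptr *list) {
    for (func_ptr *p = list + 1; *p != nullptr; ++p)
        (*p)();
}

// Call every entry in the half-open range [start, end).
static void run_init_array(func_ptr *start, func_ptr *end) {
    for (func_ptr *p = start; p != end; ++p)
        (*p)();
}

std::vector<int> demo_ctors() {
    call_log.clear();
    run_ctor_list(demo_ctor_list);
    return call_log;
}

std::vector<int> demo_init_array_calls() {
    call_log.clear();
    run_init_array(demo_init_array, demo_init_array + 2);
    return call_log;
}
```

Both walkers end up calling the same functions; the only difference in this sketch is how the end of the list is found.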
Done, I've pushed my work for the above (NOT the DWARF-2 experiments). Ubuntu binaries are currently building. As expected, now constructors use the .init_array section (instead of .ctors) and destructors use the .fini_array section (instead of .dtors). Also, the main() function in user programs doesn't implicitly call __main() anymore at the beginning. Any program expecting .init_array/.fini_array sections, starting with GCC itself, will work out of the box. The key point is that modern ELF systems use .init_array/.fini_array for initialization. For dynamic libraries, yes, but also for static libraries and executables. Combined with crtbegin.o/crtend.o support, the mintelf target is now closer to the Linux behaviour. So regarding GCC's mess of #ifdef's, we now use a much more standard ELF configuration, keeping away from obscure and poorly tested code paths. Finally, that will require fewer patches. Some of them could already be removed. So that's a step forward. Anyway, my initial motivation for this .init_array stuff was the ability to easily enable DWARF-2 exceptions and their .eh_frame stuff. Combined with @th-otto's Alas, it still fails like previously. As soon as the program throws an exception, it displays |
I still don't see why this should put us forward. Especially it is a step backward regarding exception handling, since the initial call to m68k-atari-mint-gcc/libgcc/libgcc2.c Lines 2393 to 2398 in 44322d4
It is also now impossible to omit that additional code when you know that you don't need it, by providing a dummy __main() function. This is done by several small tools, and often also in projects that use libcmini. I also don't see that this saves any patches. Quite the contrary, it needs additional patches in other projects. MiNT is not linux, and we should not pretend to be like it. And btw, init_array is not used at all by linux, which uses the .init section instead. Apart from that, your patch in gcc is misplaced, where it is only done for cross-compilers. If at all, it must go into config.gcc. But really, i would ask to revert that. It is not going to help us in any way. |
Well, my above comments were not completely right, there is an entry made into the .init_array that calls frame_dummy which in turn calls And it will also pull in all the exception handling stuff, even for C-compiled code that does not need it. |
I've pushed fixes for this to gcc and mintlib now. But i'm still not happy with this. Amongst others: in your previous commit we got rid of the The binutils should be independent of the compiler. We might be using different compilers like gcc-4.6.4, gcc-7, gcc-13 etc, but all use the same binutils. One solution to this would be to |
This was my initial motivation.
Indeed, I noticed that TM clone stuff in the sources. I saw on the web this was related to threading/atomic stuff, so most likely useless for our platform. It seems that most embedded platforms use
Ah, this is very bad. BTW, I wonder why this is different from the __main() solution. I would have said that such a difference could come when switching from SJLJ to DWARF-2. Anyway, I don't have strong opinions. If, after in-depth tests, it appears that the old method is clearly better, then go for it. We have seen that it's easy to configure gcc for one or the other method. However, I'm keen to keep the new method for some time, to give us the time to familiarize ourselves with it, and see precisely what is better or worse. When the time comes to make a final choice, we will know why, based on the pros and cons. |
In libgcc2.c that function is also called, but only when the target defines
Yes, i'll keep that for now. Switching the methods is a pain, because you have to rebuild & install all of binutils, mintlib and gcc. But whatever method is chosen, we'll have to find a way to move the call to |
Yes, it's the benefit of using crtbegin.o.
True. And if you want my 2 cents, the GNU people should also have used crtbegin.o/crtend.o for those
Using the same ld for different gcc versions, yes. But I wouldn't have imagined using the same ld binary for both the mint and mintelf gcc. Or you mean using the same ld for ELF-underscore and ELF-not-underscore variants? As a big fan of minimalistic patches, I wouldn't have imagined that.
Yes, that's a solution. Or provide aliases in C files, like EmuTOS. If you want other ideas: the linker script is generated by a shell script called
No problem. The |
That actually might work, if you tell m68k-atari-mintelf to use the m68kmint (old a.out-mintprg) emulation. But that would be strange.
Yes, this is what i had in mind. Not that i'm a fan of supporting both (we still have to finally decide which one we use), we certainly don't want to confuse people. But it might take some time until we have backported the change to old compilers. And that should certainly be done for at least 4.6.4 (for EmuTOS) and 7.5.0 (currently used for freemint).
It's only a few lines more in the linker script, which is completely new anyway. No other patches to existing code.
In Emutos that was only needed for functions from libgcc. But it would be strange to require such "hacks" in every project. In fact, in most cases only a few symbols are affected, the newly introduced __init_array etc that are now referenced by mintlib, and etext() that was already referenced before.
That would require different, non-standard compiler switches depending on which compiler is used. Not a real option imho, and difficult to understand by casual users why this is needed. |
I've tracked it down to _Unwind_Find_FDE returning a NULL pointer, which should not happen. I also wonder whether code like m68k-atari-mint-gcc/libgcc/unwind-dw2.c Lines 184 to 185 in 5557aef
Also dubious: m68k-atari-mint-gcc/libgcc/unwind-dw2.c Lines 193 to 194 in 5557aef
|
Excellent job, @th-otto 😃 This immediately fixes my general testcase. I'm in the mood of definitely enabling |
I have been using it for a week now without something suspicious but my tests have been very limited so far. |
Done. I've rebuilt all my mintelf binaries with current sources, and used BTW, there is still a bug with DWARF-2 exceptions. Basic testcase.
We know that Why is there a difference with or without MonST2? Is that |
New tests:
How possible? |
Yes, i plan to use that setting too. If some problems come up, we have to fix it, but that way more users will test it.
TOS at least sets all vectors to an rte instruction upon boot, before initializing specific vectors like Maybe MonST2 overwrites that vector to catch errors? It certainly does so for buserror etc. As a first step, i would configure gcc with |
I finally compiled gcc with What's the difference between MONST2 and without it? Alignment? Relocation? TPA zero-initialization? Vectors initialization? Or just a MONST2 bug? Sigh. |
At least, I can use the MonST2 debugger. I see that |
@th-otto I've reverted the 7562d09 patch, and now everything works perfectly well. With or without MonST2. Also with Steem and its 68000 emulation, where unaligned access should cause address errors. I suggest you reconsider your patch. I would still be interested to see a testcase justifying its need. |
BTW, generally speaking, the evil
Consequence of the second point: all accesses to members, including As illustration, here is a testcase:
So if the goal of 7562d09 was to avoid unaligned access, it's useless because |
Uh oh. That's bad. I'll have a look.
For 68020+ it's not useless, because it avoids the slowness of byte moves.
Yes, but in this case we cannot avoid using packed structs. They are defined that way in the dwarf-2 spec, and access to them must match how they are generated by the compiler. |
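As a rough illustration of the packed-struct point, here is a sketch that is not taken from libgcc. The type `unaligned_u32` and the helper names are invented for this example, and `__attribute__((packed, may_alias))` is a GCC/Clang extension, assumed available since this thread is about GCC. A packed type has alignment 1, so the compiler may not assume the pointer is aligned, and on strict-alignment CPUs it has to fall back to narrower accesses. A `memcpy`-based read is shown next to it for comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Packed wrapper: alignment drops to 1, so reads through it are valid
// at any address. may_alias lets it overlay a plain byte buffer.
struct __attribute__((packed, may_alias)) unaligned_u32 {
    std::uint32_t value;
};

// Read a 32-bit value through the packed type.
std::uint32_t read_packed(const unsigned char *p) {
    return reinterpret_cast<const unaligned_u32 *>(p)->value;
}

// Portable alternative: copy the bytes into an aligned local.
std::uint32_t read_memcpy(const unsigned char *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Both readers must agree, even at an odd address inside the buffer.
bool demo_unaligned_reads() {
    alignas(4) unsigned char buf[8] = {0, 0x12, 0x34, 0x56, 0x78, 0, 0, 0};
    return read_packed(buf + 1) == read_memcpy(buf + 1);
}
```

Which instructions are actually generated for `read_packed` depends on the target and the compiler version, which is exactly the difference being discussed here.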
There are a few things that irritate me about that issue
So at least, the patch currently does not help for 68020, only for coldfire. |
Oh but you didn't push it. I should have checked before starting recompilation. :) I'm going to revert & push it then until we have a 100% working version. |
This reverts commit 7562d09. There are some issues to be investigated: #23 (comment)
@th-otto Indeed, we are far from having explained anything. Yesterday, I tried to revert the patch, as a desperate attempt, and it worked immediately. Before that (with patch enabled) I experienced some randomness, as it worked fine from desktop but failed with MonST2. I'm not aware of anything that could cause simple user programs to behave differently from desktop or inside the debugger. One possible cause could be uninitialized variables, causing randomness. In that case, the fact that reverting the patch solved the issue could be pure chance. The real cause could very well be something else. I will have a look again. |
I'm back and kicking. @mikrosk I see that you've reverted and pushed @th-otto's "unaligned access" patch on 03/10/2023. That's fine, as that patch doesn't seem to fix any real-world issue. The fewer patches we have, the lower the probability that we introduce new issues. So let's keep away from that, at least for now. I've rebuilt everything again (pushed on my Ubuntu PPA ELF). But even without the above patch, the MonST2 issue is back! I still haven't added proper I made 3 tests:
So there's something new: Omikron Basic behaves just like MonST2. It fails. While it works well from the desktop. Definitely, there is some randomness. I can only imagine:
Anyway, I'm determined to find the root cause of that random issue. Whatever it is, I will find it. Using debuggers, traces, or whatever possible. |
The problem is here: m68k-atari-mint-gcc/libgcc/unwind-pe.h Lines 201 to 207 in 2df5021
The I added a quick and dirty fix to deal with that wrong alignment at runtime. It seems to work on first hit, but crashes a bit further. I will continue investigation. But now we know why we have more trouble than other m68k targets:
It's incredible to see that, whenever our runtime environment is different from UNIX or embedded standards, we get into major trouble. |
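The alignment mismatch can be sketched with plain arithmetic. This is an illustrative model, not the libgcc code; the function names are made up, and it assumes the writer pads relative to an image that starts on a 4-byte boundary while the reader rounds up the absolute runtime address. If the program is loaded at an address that is only 2-byte aligned, the two disagree about where the value sits.

```cpp
#include <cassert>
#include <cstdint>

// Round addr up to the next multiple of align (align is a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// Offset at which the value was placed at build time, assuming the
// image itself starts on a 4-byte boundary.
constexpr std::uintptr_t link_time_offset(std::uintptr_t offset) {
    return align_up(offset, 4);
}

// Offset at which a reader looks when it rounds up the absolute address.
constexpr std::uintptr_t runtime_offset(std::uintptr_t load_base,
                                        std::uintptr_t offset) {
    return align_up(load_base + offset, 4) - load_base;
}

// True when the reader lands on the slot the writer actually used.
constexpr bool reader_finds_value(std::uintptr_t load_base,
                                  std::uintptr_t offset) {
    return runtime_offset(load_base, offset) == link_time_offset(offset);
}
```

With a long-aligned load address such as 0x10000 the two offsets agree; with a load address such as 0x10002 the reader ends up 2 bytes short of the value, which would match the observation that the same binary works or fails depending on where it happens to be loaded.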
Great find. I saw that code already, too bad i didn't realize that it could cause trouble. But instead of fixing anything at runtime, maybe you can try the attached patch. It fixed the code above, and the counterpart where that alignment is generated. |
Ah, I've been severely fooled 😩 Fact is that
After having added my patch, I only rebuilt libgcc.a, but not libstdc++.a. So my test program was only half-baked. After rebuilding libstdc++.a, it works perfectly, even when the program is loaded at a not long-aligned address. For memory, here is my simple runtime patch. Could be refined, but perfectly working. We also have @th-otto's solution to force GCC to produce 2-byte aligned So we have 2 potential solutions:
So what to do? Basically:
Note that the difference is an ABI change. So if that changes, everything has to be rebuilt again. I don't mind, but this has to be taken into consideration for the future. Another option would probably be to introduce a new nonstandard 2-byte alignment define such as |
Would be nice if you could try it. Your patch has the problem that it generates references to
Huh? What's so complicated about this?
Yes, that is indeed the case. But only for an ABI that did not exist until now ;) And it is an ABI that is solely used by gcc itself, and not directly accessible to applications.
That's also possible, but it does not make the patch simpler. It would in fact make it more complicated, because you then have to change some other locations where DW_EH_PE_aligned is checked. |
First of all, I have no strong opinion between the 2 solutions. But it would be better to do the best choice right now, if there is one, by examining the pros and cons.
The patch doesn't apply to the GCC 13 branch. One reason is that Anyway, I slightly adjusted the patch to make it compile. Unfortunately, it doesn't work. The test program fails with a Bus Error due to an invalid pointer (address 0x0e IIRC), even when run from desktop. So that's independent of the program's alignment. However, I still have good news. The patch works partially. I traced So there is something else to fix.
Look at the number of files and lines modified by the patch, and you will have a clue. Concretely, I'm worried about patches which change several places in the code. It's easy to forget some places (could even be the current issue - or not). If the GCC people add new code about that in newer GCC versions, we might again miss it, causing again hidden issues hard to debug, etc. This is really what worries me. Regarding the mint* situation, I would prefer simple patches unlikely to cause trouble in the future. If we are confident enough that a patch won't cause trouble in the future, then go for it.
I wonder if that
Yes, I know that. That was a quick and dirty patch, as proof of concept. I used Anyway @th-otto, I suggest you test your own patch. You should see a Bus Error when running a testcase, whether the program is long-aligned or not. |
On second thought, my test may not have been accurate. This is because such a compile-time patch is an ABI change. So to test it accurately, everything must be recompiled, including all the libs. To ease testing gcc patches, I only recompile gcc. Then I use a shell script that transparently redirects gcc/g++ to the newly compiled xgcc/xg++ with proper -B options. This is perfectly valid as long as gcc/libgcc are the only affected components. But as said earlier, As a rule of thumb, we must be very careful when testing gcc patches, especially those involving ABI changes. It's easy to get fooled if everything hasn't been recompiled with the new compiler. |
Apologies. Looks like i did the changes in the gcc-7 branch.
The patch changes 3 files. One of them is mint.h, which will certainly not be changed by GNU people because it isn't even in the distribution. Another is unwind-pe.h, patching the same location as your patch. And (except for the changed filename) the patch can still be used for gcc-13, it applies cleanly, only at some different line numbers.
... unless gcc is too smart and optimizes away (addr & 3) when it "knows" that the variable is aligned on 4 bytes ;) And indeed, it does optimize it, so that variable can't be in the same source file at least:
results in
It's dangerous to do such things. I once did similar things, but you will sometimes shoot yourself in the foot. The build process is quite complex, and it is sometimes not even safe to re-run "make" in the build tree after changing source-files, depending on what components have already been compiled. I have given up on that, and do the "make install" instead into 2 different directories, one empty to generate the tarball, and the other which is in on my
I have no idea why that header file is used in libstdc++, but that library has to be recompiled anyway, because of the different layout of the generated .eh_frame structures.
I don't get a bus-error, but a call to abort() instead. Have to check where that comes from. |
Problem with my patch seems to be the linker, which also needs to know how many bytes to skip for DW_EH_PE_aligned: So maybe really stick to your solution, but please clean it up a bit and put it inside ifdef Edit: maybe something like attached patch will do. PS: how did you attach your patch above? When i try to use such a filename, i get |
Oh! Good catch @th-otto, I wouldn't have imagined that. Sadly, this is what I feared. Even if the compile-time patch is theoretically better, it potentially requires patches at several places. And having to patch both gcc and the linker is a severe drawback.
OK, let's go for it.
That looks fine, and minimal. So please push it, with a reference to this issue in the commit message, and that will be perfect.
Hmm, nothing special. This might be related to the MIME type. This could be different among systems/browsers. I don't remember if I used Firefox or Chrome, on Windows or Linux. |
gcc & binutils both assume that the pointer for such addresses is aligned at a 4-byte boundary, but this cannot be guaranteed by TOS at runtime See also #23 (comment)
BTW, there must be something that we have missed when updating from gcc 4.6.4 to gcc-7 and above. With the following code:
gcc-4.6.4 -m68000 produces
So it does not care about the alignment.
So it uses single byte moves, shifts etc. That's why the access to unaligned pointer works. As already mentioned above, i think the distinct behaviour is because of m68k-atari-mint-gcc/gcc/config/m68k/m68k.cc Lines 549 to 550 in 0e0cacb
m68k-atari-mint-gcc/gcc/config/m68k/m68k.h Lines 307 to 308 in 0e0cacb
|
Many thanks @th-otto for your commit 0e0cacb. I've rebuilt my Ubuntu Binaries, and that works fine. Well, that DWARF-2 exception stuff has been incredibly hard to set up. We spent many weeks on it. But we finally understood all the issues, and fixed them. So I propose to close this issue. We could always open new Issues for future specific cases.
Well, that alignment question is a different topic. So I propose to open another Issue for that. I don't remember well how ColdFire behaves. IIRC it can vary among different cores and hardware. |
Nobody objects, closing. ;) |
SJLJ exceptions are slowly becoming obsolete, in favor of DWARF2 exceptions. m68k-elf and m68k-linux already use DWARF2 exceptions, so we should do the same in the new m68k-atari-mintelf toolchain.
It is stated there that the DWARF2 .eh_frame section contains relocations to odd addresses:
m68k-atari-mint-gcc/gcc/config/m68k/mint.h Lines 77 to 79 in f2ad8b0
Is this really true? Why? Is there a way to avoid that?
I've just asked to the GCC mailing list. Let's see.
https://gcc.gnu.org/pipermail/gcc/2023-September/242414.html