Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of OCaml 5.0 (cleaned version) #124

Closed
wants to merge 42 commits into from
Closed

Support of OCaml 5.0 (cleaned version) #124

wants to merge 42 commits into from

Conversation

dinosaure
Copy link
Member

@dinosaure dinosaure commented Jan 11, 2023

This is the cleaned version of #122. From what I saw, I have some questions:

  • It seems that we export stub usleep but we don't implement it. We should probably implement it when Solo5 has a function to sleep (see solo5_yield)
  • Some functions are just exported (like qsort) but no implementation are needed - at least, at this level of our usual compilation path. I wonder what happens to these functions. If they are exported, is it because they are required? By whom? For what?

Finally, this PR only offers something that compiles. The project has not yet tested with a unikernel. It should be reconfirmed that qubes-firewall works and if we can get

  • dns-resolver (small unikernel)
  • unipi (HTTP & Git)
  • opam-mirror (block device)

This would be very good to confirm that this PR can be merged and that the MirageOS world can move to OCaml 5.

dinosaure and others added 20 commits January 11, 2023 14:23
It's probably a better way to add libraries when we link objects from
OCaml. With these options, we allow the linker to resolve all symbols
regardless the order of these libraries. Indeed, the linker will
repeatly lookup on these libraries until all symbols are resolved.

However, it seems that these options can have an impact about
performances of the linker. If we need to improve this situation, we
must recalculate a topological order of libraries. But for our purpose,
that's probably fine.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
`sed -i` is not POSIX, it's better to generate a `*.sed` file and move
it to the real destination.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
With OCaml 5, the pattern was updated and has $$PTHREAD_LIB. We update
accordingly.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
…efile

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Currently, OCaml 5.0 wants to create 128 domains which requires a big
allocation. We decided to allow the usage of only 1 domain.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
…file)

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Initially, we used NULL and it seems that these values are used by GMP.
We probably break something here if NULL <> 0 but conventionally,
NULL = 0.

Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
@haesbaert
Copy link
Member

IMHO this is far from ready, we didn't really review much, we just got it to work.

@dinosaure
Copy link
Member Author

A precision about words used in my message:

  • stub is when we actually provide an implementation which raises an error
  • export is when we just describe the function (no implementation is done here, only strcpy is a special case when it is provided by Solo5)
  • define is about values

dinosaure and others added 2 commits January 11, 2023 15:08
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
nolibc/puts.c Show resolved Hide resolved
nolibc/sysdeps_solo5.c Outdated Show resolved Hide resolved
nolibc/sysconf.c Outdated Show resolved Hide resolved
nolibc/stubs.c Show resolved Hide resolved
nolibc/include/unistd.h Show resolved Hide resolved
nolibc/include/stdio.h Outdated Show resolved Hide resolved
nolibc/mmap.c Outdated Show resolved Hide resolved
nolibc/sysconf.c Outdated Show resolved Hide resolved
nolibc/assert.c Outdated Show resolved Hide resolved
nolibc/include/sys/mman.h Outdated Show resolved Hide resolved
@hannesm
Copy link
Member

hannesm commented Jan 12, 2023

From the cirrus CI, there is one warning and one error:

Entering directory '/tmp/cirrus-ci-build/test'
(cd _build/solo5 && /.opam/5.0.0/bin/x86_64-solo5-none-static-cc -I/.opam/5.0.0/solo5-sysroot/include/nolibc/ -include _solo5/overrides.h -O2 -fno-strict-aliasing -fwrapv -pthread -D_FILE_OFFSET_BITS=64 -I/.opam/5.0.0/.opam-switch/build/ocaml-solo5.0.8.1/nolibc/include -include _solo5/overrides.h -I/.opam/5.0.0/.opam-switch/build/ocaml-solo5.0.8.1/openlibm/include -I/.opam/5.0.0/.opam-switch/build/ocaml-solo5.0.8.1/openlibm/src -nostdlib -I/.opam/5.0.0/solo5-sysroot/include/nolibc/ -include _solo5/overrides.h -O2 -fno-strict-aliasing -fwrapv -pthread -g -I /.opam/5.0.0/solo5-sysroot/lib/ocaml -o startup.o -c startup.c)
startup.c:16:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
1 warning generated.
File "dune", line 2, characters 7-11:
2 |  (name main)
           ^^^^
(cd _build/solo5 && /.opam/5.0.0/solo5-sysroot/bin/ocamlopt -w @[email protected]@30..39@[email protected]@[email protected] -strict-sequence -strict-formats -short-paths -keep-locs -g -o main.exe manifest.o startup.o .main.eobjs/native/dune__exe__Main.cmx -g -w +A-4-41-42-44 -bin-annot -strict-sequence -principal -safe-string -color always -cclib '-z solo5-abi=hvt')
ld.lld: error: /.opam/5.0.0/solo5-sysroot/lib/ocaml/libasmrun.a(domain.n.o) has an STT_TLS symbol but doesn't have an SHF_TLS section
cc: error: linker command failed with exit code 1 (use -v to see invocation)
File "caml_startup", line 1:
Error: Error during linking (exit code 1)

Would be great if someone more into TLS could take a look at the TLS issue (I suspect this is due to usage of clang/llvm both as compiler and as linker).

@palainp
Copy link
Member

palainp commented Jan 12, 2023

startup.c:16:1: warning: control reaches end of non-void function [-Wreturn-type]

Indeed the solo5_app_main (in test/startup.c and also example/startup.c) does not return anything (and probably never reach that point), mirage/mirage-xen returns 0, and the tests suites in Solo5 return SOLO5_EXIT_SUCCESS or SOLO5_EXIT_FAILURE. I guess return 0; should be fine here.

ld.lld: error: /.opam/5.0.0/solo5-sysroot/lib/ocaml/libasmrun.a(domain.n.o) has an STT_TLS symbol but doesn't have an SHF_TLS section

It's interesting that the error was already spotted in Solo5 in https://github.com/Solo5/solo5/blob/7ba74eb9f645e07570cd35a833eddec4918eaf64/tests/test_tls/test_tls.c. And the test is deactivated when compiled with lld. The check in lld seems to be https://github.com/llvm-mirror/lld/blob/64b024a57c56c3528d6be3d14be5e3da42614a6f/ELF/Symbols.cpp#L112.

dinosaure and others added 5 commits January 13, 2023 13:11
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
Co-authored-by: Pierre Alain <[email protected]>
Co-authored-by: Kate <[email protected]>
Co-authored-by: Christiano Haesbaert <[email protected]>
@hannesm
Copy link
Member

hannesm commented Jan 17, 2023

Thanks for all the work @palainp @kit-ty-kate @dinosaure @haesbaert

About the thread-local storage, I see some issues:

I started to read https://maskray.me/blog/2021-02-14-all-about-thread-local-storage to get a better understanding of thread local storage, but won't have much time this month to work on a more thorough review or patches.

nolibc/include/stdatomic.h Outdated Show resolved Hide resolved
nolibc/include/stdatomic.h Outdated Show resolved Hide resolved
nolibc/include/stdatomic.h Outdated Show resolved Hide resolved
@palainp
Copy link
Member

palainp commented Feb 1, 2023

EDIT: The following was due to a faulting linker script. This is now fixed with the latest release.

Along with the solo5 PR (Solo5/solo5#542), I tried to correctly use the TLS variant II for my processor (https://github.com/Solo5/solo5/blob/8c3a744f998b9977cfc6cd29e0cc40ae2efba167/tests/test_tls/test_tls.c#L52 TLS point to an address after the area) on OpenBSD. And it works (at least a few steps further)!

However I still have a page fault error:

  __|  _ \  |  _ \ __ \
\__ \ (   | | (   |  ) |
____/\___/ _|\___/____/
Solo5: Bindings version v0.7.1-66-gf1daea2
Solo5: Memory map: 512 MB addressable:
Solo5:   reserved @ (0x0 - 0xfffff)
Solo5:       text @ (0x100000 - 0x21bfff)
Solo5:     rodata @ (0x21c000 - 0x22cfff)
Solo5:       data @ (0x22d000 - 0x2e3fff)
Solo5:       heap >= 0x2e4000 < stack < 0x20000000
Solo5: trap: type=#PF ec=0x0 rip=0x1f8f70 rsp=0x1ffffe98 rflags=0x10002
Solo5: trap: cr2=0x0
Solo5: ABORT: cpu_x86_64.c:181: Fatal trap

The 0x1f8f70 address is something really strange here:

00000000001f8f70 <caml_alloc_final_info>:
  1f8f70:       4c 8b 1d 89 70 e0 ff    mov    -2068343(%rip),%r11        # 0 <domain_field_caml_young_limit>
  1f8f77:       4c 33 1c 24             xor    (%rsp),%r11
  1f8f7b:       55                      push   %rbp
  1f8f7c:       48 89 e5                mov    %rsp,%rbp
  1f8f7f:       41 53                   push   %r11
  1f8f81:       41 56                   push   %r14
  1f8f83:       bf 70 00 00 00          mov    $0x70,%edi
  1f8f88:       e8 b3 22 01 00          callq  20b240 <caml_stat_alloc_noexc>
  1f8f8d:       49 89 c6                mov    %rax,%r14
  1f8f90:       48 85 c0                test   %rax,%rax
  1f8f93:       74 0f                   je     1f8fa4 <caml_alloc_final_info+0x34>
  1f8f95:       ba 70 00 00 00          mov    $0x70,%edx
  1f8f9a:       4c 89 f7                mov    %r14,%rdi
  1f8f9d:       31 f6                   xor    %esi,%esi
  1f8f9f:       e8 7c 3b f8 ff          callq  17cb20 <memset>
...

The top instruction is related to stack protection and for all other function I inspected it uses a retguard symbol (as example the previous function in the same C file):

00000000001f8f20 <caml_final_release>:
  1f8f20:       4c 8b 1d c1 2f 0e 00    mov    929729(%rip),%r11        # 2dbee8 <__retguard_3085>
  1f8f27:       4c 33 1c 24             xor    (%rsp),%r11
  1f8f2b:       55                      push   %rbp
  1f8f2c:       48 89 e5                mov    %rsp,%rbp
...

I don't understand why for that specific function the behavior has changed :(

@hannesm
Copy link
Member

hannesm commented May 19, 2023

I'm curious, now that solo5 0.8 hit the release road, what is the status of this PR? Does it work on some (all?) platforms? Should we proceed (with review, more testing) in order to get MirageOS working with OCaml 5?

A brief message would be welcome, maybe @palainp knows the details (since he was involved in the most recent commits).

Thanks a lot.

@palainp
Copy link
Member

palainp commented May 19, 2023

Thanks @hannesm for the reminder.
From my POV it would be great to restart the CI test (the last commit was pushed when solo5 was not released in its last version) and eventually catch any error.
With this PR pinned I currently use Ocaml 5 with Xen and spt targets really fine :)
Also testing on other systems should be great!

@hannesm hannesm mentioned this pull request May 19, 2023
@TheLortex
Copy link
Member

Hi, thank you again for this work !
I managed to build eio-solo5 on top of that and assemble the network stack up to dream.(https://github.com/TheLortex/eio-solo5)

Using this proof of concept I've been able to test solo5-hvt with ocaml 5 / eio and didn't encounter any issue.

@hannesm
Copy link
Member

hannesm commented May 19, 2023

Hi, thank you again for this work ! I managed to build eio-solo5 on top of that and assemble the network stack up to dream.(https://github.com/TheLortex/eio-solo5)

Using this proof of concept I've been able to test solo5-hvt with ocaml 5 / eio and didn't encounter any issue.

Great to hear. From my point of view, we should first review this PR, test it thoroughly and release it (without switching the effective layer in MirageOS), also evaluate performance etc.

I understand that developing the cool new stuff is exciting, but if nobody cares about upstreaming and pushing the PR to something that we're happy to maintain, there'll be all these vendoring / pin to random branches (that are a nightmare to update).

So, TL;DR: my approach is still "upstream first". Since I'm a bit short on time (and my OCaml 5 knowledge is barely existant), I asked whether there are others (including you :) that would go through the code and thoroughly review it. :)

@@ -22,7 +22,7 @@ depends: [
"conf-which" {build}
"ocamlfind" {build} # needed by dune context (for tests)
"ocaml-src" {build}
"ocaml" {>= "4.12.1" & < "4.15.0"}
"ocaml" {>= "5.0" & < "5.1"}
"solo5" {>= "0.7.0"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "solo5" {>= "0.8.0"}.

@@ -21,7 +21,7 @@ depends: [
"conf-which" {build}
"ocamlfind" {build} # needed by dune context (for tests)
"ocaml-src" {build}
"ocaml" {>= "4.12.1" & < "4.15.0"}
"ocaml" {>= "5.0" & < "5.1"}
"solo5" {>= "0.7.0"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "solo5" {>= "0.8.0"}.

Copy link
Contributor

@fabbing fabbing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shym and I have reviewed this PR and it looks good to us, with a few minor suggestions. Although we only tested it with toy examples.

We also have a couple of questions:

  • Why not use only one sed command per file in the Makefile to improve readability and to avoid multiple command executions and temporary files?
  • Shouldn't a comment be added to nolibc/stubs.c to explain the choice between preferring one of STUB_ABORT, STUB_IGNORE, STUB_WARN_ONCE?

We think it would be interesting to target OCaml 5.1.1 directly, as it brings bug and performance fixes. The build system has also changed a lot since then, which would make it difficult to support both 5.0.0 and 5.1.1.
We are considering working on this, but would be eager to hear your opinion before starting.


test-headers: $(TEST_H_OBJS)
test-headers: test-include/sys/ $(TEST_H_OBJS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test-headers: test-include/sys/ $(TEST_H_OBJS)
test-headers: $(TEST_H_OBJS) | test-include/sys/

The dependency on test-include/sys/ is order only.

Comment on lines 13 to +16
* The following definitions are not required by the OCaml runtime, but are
* needed to build the freestanding version of GMP used by Mirage.
* For OCaml 5.0.0, it's not totally true. SIG_{BLOCK,SETMASK,IGN,DFL) are
* needed by the OCaml runtime.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* The following definitions are not required by the OCaml runtime, but are
* needed to build the freestanding version of GMP used by Mirage.
* For OCaml 5.0.0, it's not totally true. SIG_{BLOCK,SETMASK,IGN,DFL) are
* needed by the OCaml runtime.
* The following definitions are required to build the freestanding version of GMP used by Mirage.
OCaml 5.0.0 also required the SIG_{BLOCK,SETMASK,IGN,DFL) definition.

We rephrased part of the comment that was outdated by OCaml 5.0.0 support.

@@ -11,7 +11,10 @@ struct stat {
#define S_IFMT 0
#define S_IFREG 0
#define S_ISREG(x) (0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define S_ISREG(x) (0)
#define S_ISREG(x) 0

Nitpick, the parentheses were not useful here.

@palainp
Copy link
Member

palainp commented Jan 30, 2024

@fabbing @shym, thank you for the review!

I'm only speaking for myself, but I totally approve if the building could be reworked (even if it doesn't work with 5.0.0, as it doesn't seem possible to deal with both 4.14.1 and 5+, it doesn't matter if 5.0 is skipped in the process).

One thing that still needs work is the current absence of munmap (currently STUB_IGNORE) which leads to memory leaks (probably only visible with bigger tests than the toy examples). So far, I've written code for that which needs to be reviewed and tested in the pending #129 PR to this PR 😅 or in #131 for the 4.14 backport.

@dinosaure
Copy link
Member Author

Why not use only one sed command per file in the Makefile to improve readability and to avoid multiple command executions and temporary files?

The advantage of having several seds is that you can contextualise them in relation to their commits and the PRs that have integrated them. These modifications can be difficult to understand in their current form, and historical information helps us to better understand the ins and outs of these arbitrary choices.

Shouldn't a comment be added to nolibc/stubs‧c to explain the choice between preferring one of STUB_ABORT, STUB_IGNORE, STUB_WARN_ONCE?

More comments is better 👍.

We think it would be interesting to target OCaml 5.1.1 directly, as it brings bug and performance fixes. The build system has also changed a lot since then, which would make it difficult to support both 5.0.0 and 5.1.1.

This PR is always set aside because these versions (5.0 and 5.1) lack compaction. The unikernels we are developing are mainly services. We therefore need a runtime capable of using memory that can be defragmented (and that the GC compacts) in order to provide these services for as long as possible.

As @palainp pointed out, we may need to test allocation logic other than dlmalloc. Here again, we need to do some testing (in particular with unikernels such as dns-resolver or tlstunnel) with such an allocator to see whether it corresponds to our use or not.

We will be delighted to have an extension of this PR with OCaml 5.2. And I'd be delighted to see more details of a possibility to experiment with OCaml 5 while keeping OCaml 4.14 support (I'll be more available for February).

@hannesm
Copy link
Member

hannesm commented Feb 1, 2024

Dear @fabbing @shym,

your review is highly welcome. I do agree with @dinosaure that likely we should target OCaml 5.2 (which allows compaction), and skip 5.0, 5.1 series.

What I'm curious about is that we have some review comments in #122 (or are they all addressed?) -- would be great if you could take a look at them (and comment those which are resolved, maybe re-raise them here if they are not yet resolved).

And then there is #129 -- If I understand correctly, that's mostly additional mmap stuff. In the end, it would be nice to agree on a single PR (this one?), and put the mmap changes here as well.

In terms of ocaml-solo5 and OCaml 4 / OCaml 5 support: I don't think it is worth to have ocaml-solo5 that supports both versions of OCaml at the same time, so we should move the OCaml 4 support to a dedicated branch (in order to keep it up to date when OCaml upstream changes), and support only OCaml 5 in the main branch.

@shym shym mentioned this pull request Apr 18, 2024
6 tasks
@hannesm
Copy link
Member

hannesm commented May 6, 2024

This has been superseeded by #134, closing. Please let me know if this is not the case. The review comments have been addressed in #134 as far as I know. But obviously, another round of reviews or triaging the comments here and in #122 whether they're still applicable to #134 won't hurt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants