Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade futures-lite #5

Merged
merged 2 commits into from
Aug 23, 2024
Merged

Upgrade futures-lite #5

merged 2 commits into from
Aug 23, 2024

Conversation

fko-kuptec
Copy link
Contributor

@fko-kuptec fko-kuptec commented Aug 22, 2024

All tests are still passing. The upgrade does not seem to have had an impact on the code.

Open Questions

  • Update crate's version to 0.2.0, since this is a breaking change in the public API?

@ivmarkov
Copy link
Owner

Open Questions

  • Update crate's version to 0.2.0, since this is a breaking change in the public API?

It is not published on crates-io yet anyway, so not sure it matters, but why not? Please do update to 0.2

@fko-kuptec
Copy link
Contributor Author

Done :)

@ivmarkov
Copy link
Owner

Done :)

Give me a couple of hours to return, to tag the current git repo surface and to merge your changes.

Does it work for you btw? I mean async-io-mini? If yes, I'll publish the version with your changes at crates.io

@fko-kuptec
Copy link
Contributor Author

async-io-mini seems to work, I think.

I personally had just some struggle getting the embassy timers to work. In my point of view, it would be great if that were explained somewhere easy to find. What I had to do for the ESP was:

  • Import esp-idf-svc with feature embassy-time-driver
  • Import embassy-time with just the feature generic-queue, because otherwise no timer queue driver was available. Also selecting the feature std would result in two conflicting time drivers being present
  • Import critical-section with feature std, because otherwise no critical-section implementation was selected

It would probably be good, if edge-net would be updated as well, if you want to publish async-io-mini. I already started a PR there, because it also depends on futures-lite v1.

Finally, I had some issues when trying to use edge-http in combination with edge-nal-std and edge-executor. When I included my HTTP server handling code, I could no longer connect to the ESP Wifi network. When I commented that part out, I could connect again. Trying to use the edge-nal-std TCP stack directly caused at some point weird panics, that went away, when I used Async<Tcp*> directly instead. I did not dig much into it, but something weird is going on there. Maybe a compiler bug again? I don't know. I am not sure, if these issues have something to do with async-io-mini or not.

@fko-kuptec
Copy link
Contributor Author

Just as a note: Since this is my work account, I will not be available until next Monday. So any additions to the PRs that might still need to be done by me will probably have to wait until then.

@ivmarkov
Copy link
Owner

ivmarkov commented Aug 23, 2024

I personally had just some struggle getting the embassy timers to work. In my point of view, it would be great if that were explained somewhere easy to find.

Ah yes. Will think about that. Maybe the README of async-io-mini would suffice. The thing is, making embassy-time timers work is actually a concern you'll have even if you don't use async-io-mini, but just plain ESP IDF... Probably this belongs to the Rust esp-rs book, in the STD section somewhere (I mean, support for the "embassy" echosystem in general, because besides embassy-time and critical-section we need to mention embassy-sync too; and that embassy-executor also works as long as you enable its std feature).

What I had to do for the ESP was:

  • Import esp-idf-svc with feature embassy-time-driver

Yes, this is necessary. In theory, you can also just enable feature std on embassy-time as that would enable the built-in STD timer driver impl, but that would give you timers with only 10ms resolution on the ESP IDF, so you want the native ESP IDF embassy-time-driver impl indeed.

  • Import embassy-time with just the feature generic-queue, because otherwise no timer queue driver was available.

esp-idf-svc used to have its own timer queue, but it was essentially a copy-paste of the generic one, so I removed it.
In fact, the only other queue besides generic-queue is the one which is embedded inside embassy-executor. So if you don't use embassy-executor, your only other option is generic-queue of course.

Also selecting the feature std would result in two conflicting time drivers being present

Yes, as per above, because std on embassy-time brings-in the STD time driver impl. And only that by the way, so for all practical purposes, the std feature on embassy-time is useless on MCUs.

  • Import critical-section with feature std, because otherwise no critical-section implementation was selected

This, or you can enable the critical-section feature of esp-idf-svc. Unlike the situation with std of embasy-time, there is not much of difference between critical-section/std and esp-idf-svc/critical-section.

It would probably be good, if edge-net would be updated as well, if you want to publish async-io-mini. I already started a PR there, because it also depends on futures-lite v1.

Will do, BUT: The major difference between async-io-mini/futures-lite and edge-net/futures-lite however is that:

  • In the async-io-mini case futures-lite is an API (because it is also an API in async-io upstream), and as such raising the futures-lite version from "1" to "2" (i.e. backwards incompatible) does mean raising the async-io-mini version from "0.1" to "0.2" instead of to "0.1.1" (should be backwards incompatible again).
  • For edge-net, the futures-lite crate is not even a dependency, but a dev-dependency. I.e. it is used only in the examples and nothing else. So raising it to v2 does not affect anything.

Finally, I had some issues when trying to use edge-http in combination with edge-nal-std and edge-executor. When I included my HTTP server handling code, I could no longer connect to the ESP Wifi network. When I commented that part out, I could connect again.

This sounds weird to say the least. What the HTTP server does should not and does not have anything to do with Wifi. Can you share an example?

Trying to use the edge-nal-std TCP stack directly caused at some point weird panics, that went away, when I used Async<Tcp*> directly instead. I did not dig much into it, but something weird is going on there. Maybe a compiler bug again? I don't know. I am not sure, if these issues have something to do with async-io-mini or not.

Could the reason for the panics be that in some cases you are missing to call this function of esp-idf-svc? The thing is, both async-io-mini and edge-nal-std - while created primarily with ESP-IDF in mind - are generic STD code (just like async-io). They "don't know" that the ESP IDF eventfd subsystem needs to be explicitly initialized, so you have to make sure you do that early in your program.

@ivmarkov ivmarkov merged commit 30d7fd7 into ivmarkov:master Aug 23, 2024
1 check passed
@fko-kuptec
Copy link
Contributor Author

This, or you can enable the critical-section feature of esp-idf-svc. Unlike the situation with std of embasy-time, there is not much of difference between critical-section/std and esp-idf-svc/critical-section.

Thanks for that tip. I wasn't aware of that feature flag.

Could the reason for the panics be that in some cases you are missing to call this function of esp-idf-svc? The thing is, both async-io-mini and edge-nal-std - while created primarily with ESP-IDF in mind - are generic STD code (just like async-io). They "don't know" that the ESP IDF eventfd subsystem needs to be explicitly initialized, so you have to make sure you do that early in your program.

I am enabling this vfs before running the server code. Therefore, that should not be the root cause.

@fko-kuptec
Copy link
Contributor Author

This sounds weird to say the least. What the HTTP server does should not and does not have anything to do with Wifi. Can you share an example?

I tried to reproduce the issue, but now, I am facing some other panics, which change if I include some logging or not. To me, that sounds like a race condition. I uploaded the code here, if you want to look into it. The README contains some more information. However, this is not critical for me, at least not now. My actual project seems to work, and since this is my job, I currently don't want to spend more time in debugging.

@ivmarkov
Copy link
Owner

You are running this project with the default stack size of the main esp-idf task which is very small (3k). Increase it. The http server needs more, as it internally allocates two large buffers.

@fko-kuptec
Copy link
Contributor Author

But the server code itself is run in a separate thread with a 32k stack, at least that's what I intended?

@ivmarkov
Copy link
Owner

ivmarkov commented Aug 27, 2024

Yes I noticed in the meantime. Unlikely, but you might still be running out of stack, because you are allocating the DefaultServer inside the future. You can try as an experiment to allocate it outside the future but still in your own thread, and then pass to the future only a &mut ref to the server. Also and as an experiment, you don't need the executor as you have a single future (the server). You can just use esp_idf_hal::task::block_on.

@fko-kuptec
Copy link
Contributor Author

It still seems to panic, even without executor and with the server passed at mutable reference to the future. The root cause for the panic now seems to be at least consistent, though. Interestingly still the exact same assertion, that fails, but with a different callstack:

assert failed: xQueueSemaphoreTake queue.c:1713 (pxQueue->uxItemSize == 0)


Backtrace: 0x40376a92:0x3fcb10c0 0x4037b5c5:0x3fcb10e0 0x40382a89:0x3fcb1100 0x4037bd52:0x3fcb1220 0x42076126:0x3fcb1260 0x42076465:0x3fcb1280 0x42069b78:0x3fcb12a0 0x42069bb3:0x3fcb12c0 0x42077609:0x3fcb12e0 0x420766a5:0x3fcb1300 0x420692dd:0x3fcb1350 0x420332e4:0x3fcb1370 0x420332d1:0x3fcb13a0 0x4203367c:0x3fcb13c0 0x42009933:0x3fcb1420 0x420098c4:0x3fcb1450 0x42005b93:0x3fcb14e0 0x420073d6:0x3fcb1510 0x420089c1:0x3fcb1560 0x42009838:0x3fcb7920 0x42007a4a:0x3fcbc0f0 0x4202b127:0x3fcbc120 0x4202b144:0x3fcbc150 0x4202b7ac:0x3fcbc180 0x4205cd30:0x3fcbc1a0
0x40376a92 - panic_abort
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/esp_system/panic.c:466
0x4037b5c5 - esp_system_abort
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/esp_system/port/esp_system_chip.c:93
0x40382a89 - __assert_func
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/newlib/assert.c:81
0x4037bd52 - xQueueSemaphoreTake
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/freertos/FreeRTOS-Kernel/queue.c:1713
0x42076126 - sys_mutex_lock
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/port/freertos/sys_arch.c:63
0x42076465 - sys_arch_protect
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/port/freertos/sys_arch.c:470
0x42069b78 - do_memp_malloc_pool
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/lwip/src/core/memp.c:255
0x42069bb3 - memp_malloc
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/lwip/src/core/memp.c:350
0x42077609 - netconn_alloc
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/lwip/src/api/api_msg.c:733
0x420766a5 - netconn_new_with_proto_and_callback
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/lwip/src/api/api_lib.c:155
0x420692dd - lwip_socket
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/lwip/lwip/src/api/sockets.c:1720
0x420332e4 - std::sys::pal::unix::net::Socket::new_raw
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys/pal/unix/net.rs:98
0x420332d1 - std::sys::pal::unix::net::Socket::new
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys/pal/unix/net.rs:68
0x4203367c - std::sys_common::net::TcpListener::bind
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys_common/net.rs:405
0x42009933 - core::ops::function::FnMut::call_mut
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/core/src/ops/function.rs:166
0x420098c4 - std::net::each_addr
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/net/mod.rs:81
0x42005b93 - std::net::tcp::TcpListener::bind
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/net/tcp.rs:772
0x420073d6 - async_io_mini::io::Async<std::net::tcp::TcpListener>::bind
    at /home/florian/.cargo/git/checkouts/async-io-mini-e233321a2e3791f3/30d7fd7/src/io.rs:820
0x420089c1 - <edge_nal_std::Stack as edge_nal::stack::tcp::TcpBind>::bind::{{closure}}
    at /home/florian/.cargo/git/checkouts/edge-net-465b5694b2f162db/947929d/edge-nal-std/src/lib.rs:55
0x42009838 - esp_tokio_bug::main::{{closure}}
    at /home/florian/projects/esp-tokio-bug/src/main.rs:61
0x42007a4a - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/thread/mod.rs:542
0x4202b127 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2063
0x4202b144 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/alloc/src/boxed.rs:2063
0x4202b7ac - std::sys::pal::unix::thread::Thread::new::thread_start
    at /home/florian/.rustup/toolchains/esp/lib/rustlib/src/rust/library/std/src/sys/pal/unix/thread.rs:108
0x4205cd30 - pthread_task_func
    at /home/florian/projects/esp-tokio-bug/.embuild/espressif/esp-idf/v5.2.2/components/pthread/pthread.c:196

@ivmarkov
Copy link
Owner

It is still in malloc which is suspicious. I'll flash later your sample on the esp32c3. We don't have any miscompilatios there for sure...

@ivmarkov
Copy link
Owner

@fko-kuptec Ah, yes! I think I might know what it is. You are missing a Wifi.connect call. The problem with that is that as a result, you don't have even a single netif which is active and working, and the LwIP impl crashes as a result, instead of - you know - returning some sort of other meaningful error.

I haven't spend time so far figuring out how to setup the esp idf netif config in such a state, that it does not simply crash when you don't yet have a working wifi or eth netif first. I suspect though, that if we just create a loop back netif, like the one you have on regular os-es, it would stop crashing even if you don't have others available...

@fko-kuptec
Copy link
Contributor Author

@ivmarkov I don't think that this is the issue. As far as I could test, calling connect() on a Wifi configured for AccessPoint only returns an error.

I tried a little more and came up with this updated code. I've described the issue in the README and the code comments. In short: The code works, if I simply use Async<TcpListener> directly, and it breaks under weird conditions, when I use edge-nal-std::Stack. The latter needs more stack in some unspecified pthread, for some reason, and panics, when I log something right before accepting a connection...

@ivmarkov
Copy link
Owner

ivmarkov commented Aug 28, 2024

@ivmarkov I don't think that this is the issue. As far as I could test, calling connect() on a Wifi configured for AccessPoint only returns an error.

You are right. I did not notice that you are running AP rather than a STA. Truth is, I'm guessing as I don't have the physical time to do a proper test ATM. :(

I tried a little more and came up with this updated code. I've described the issue in the README and the code comments. In short: The code works, if I simply use Async<TcpListener> directly, and it breaks under weird conditions, when I use edge-nal-std::Stack. The latter needs more stack in some unspecified pthread, for some reason, and panics, when I log something right before accepting a connection...

Might be though I can't recall any other unnamed pthread besides the one of async-io-mini. But I should really just stop speculating and test it for real.

@fko-kuptec
Copy link
Contributor Author

fko-kuptec commented Aug 28, 2024

I'm guessing as I don't have the physical time to do a proper test ATM. :(

No problem for me, at least not at the moment. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants