Bootloop #196
Comments
It looks like the flashing went wrong.
Thank you Dave.
That is odd. What Nodo Shop board version do you have? (I wouldn't expect to see this error if it were a problem with the board or power supply.) Did you try a full flash erase before flashing? I normally do this when I run into odd issues with an ESP8266.
The latest one; I got it on Wednesday, soldered and ready to go. I tried your suggestion of a full flash erase and even lowered the baud rate, but the issue is the same.
Weird... I did this recently without problems.
Hi, I ended up here after having the same issue. I updated from 0.9.5 to 0.10 and now get: 2023-02-02 23:49:12 - reboot cause: Exception (2) - Access to invalid address (28), etc. After I took the OTGW off the OpenTherm bus, the system seems stable.
So it sounds like we are following an invalid pointer.
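One way to sanity-check the invalid-pointer reading against the register dump in the reboot log: on the Xtensa core, excvaddr holds the data address of the faulting load and epc1 the address of the faulting instruction. The sketch below classifies both values against a rough ESP8266 memory map. The region boundaries (DRAM, IRAM, and the 1 MB flash-mapped window) are approximate assumptions, not values taken from this thread or from the OTGW firmware, and classify_address is a hypothetical helper.

```python
# Rough ESP8266 address map (APPROXIMATE, assumed from the commonly documented
# memory map, not from the OTGW firmware). Half-open ranges [start, end).
REGIONS = [
    (0x3FFE8000, 0x40000000, "DRAM (data RAM)"),
    (0x40100000, 0x40108000, "IRAM (instruction RAM)"),
    (0x40200000, 0x40300000, "flash-mapped code (irom)"),
]


def classify_address(addr: int) -> str:
    """Return the name of the region containing addr, or 'unmapped'."""
    for start, end, name in REGIONS:
        if start <= addr < end:
            return name
    return "unmapped"


if __name__ == "__main__":
    # Values copied from the reboot log in this issue.
    for label, addr in (("epc1", 0x40241688), ("excvaddr", 0x144C0000)):
        print(f"{label}=0x{addr:08x}: {classify_address(addr)}")
```

With the logged values, epc1 lands in flash-mapped code while excvaddr falls outside every region listed, which is consistent with a load through a garbage pointer.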
Since the resets stop when I disconnect the OpenTherm connection, I'd wager it is something device-specific, i.e. the thermostat or boiler sends data the firmware doesn't like. I'm heading to bed for today, but if you're interested, I'll try connecting the boiler and thermostat separately tomorrow and see whether either of them triggers the issue.
That would be useful; please also report your setup (boiler/thermostat).
Can you change the setting "GPIOSENSORSenabled": true?
Hi Dave, yes, of course. The following is my setup:
I just did some tests.
I reset the power on the boiler and OTGW before every test, so if my reasoning is correct, message intervals from any module shouldn't affect the results. I should also provide my settings; they are as follows.
The system was stable with 0.9.5, but since I run PIC firmware 6.4, I figured I'd update to 0.10 because the changelog mentions improved compatibility. If there's anything else I can provide or try out, please let me know.
Thanks. This one has no GPIO attached, great, so we can rule that out; I have seen a problem where OneWire detected a strange device and caused this. But with "GPIOSENSORSenabled": false that code is not executed. I have also tested with the Honeywell Chronotherm Touch Modulation, but with a different boiler, and had no problem there. I think we need at least a telnet trace, or better, a trace of the OpenTherm traffic with OTmonitor. Suggestion: can we give a version compiled with 2.7.4 a try?
As this seems to be caused by some specific data the ESP receives from the PIC, it may be interesting to see what the PIC is sending. With hardware v2.3 or later, it is possible to power the board from a USB port on the PC and receive the serial data there, even while a Wemos is installed on the OTGW. Running a terminal emulator (or even OTmonitor) on the USB port may provide some valuable insights.
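If the raw serial stream is captured this way, individual OpenTherm message lines can be decoded by hand. A minimal sketch, assuming the usual OTGW line format of a one-letter source prefix (T, B, R, A) followed by the 32-bit OpenTherm frame as 8 hex digits, and the frame layout from the OpenTherm specification (parity bit, 3-bit message type, data ID, 16-bit value). The helper name parse_otgw_line and the sample lines are illustrative, not taken from this issue.

```python
import string
from typing import NamedTuple, Optional

# 3-bit MSG-TYPE field of an OpenTherm frame (bits 30..28).
MSG_TYPES = {
    0: "Read-Data", 1: "Write-Data", 2: "Invalid-Data", 3: "Reserved",
    4: "Read-Ack", 5: "Write-Ack", 6: "Data-Invalid", 7: "Unknown-DataId",
}

# One-letter source prefixes assumed for the OTGW serial output.
SOURCES = {
    "T": "thermostat",
    "B": "boiler",
    "R": "gateway request to boiler",
    "A": "gateway answer to thermostat",
}


class OTFrame(NamedTuple):
    source: str
    msg_type: str
    data_id: int
    value: int


def parse_otgw_line(line: str) -> Optional[OTFrame]:
    """Decode a line such as 'T80000200'; return None for anything else."""
    line = line.strip()
    hex_part = line[1:]
    if (len(line) != 9 or line[0] not in SOURCES
            or not all(c in string.hexdigits for c in hex_part)):
        return None
    frame = int(hex_part, 16)
    return OTFrame(
        source=SOURCES[line[0]],
        msg_type=MSG_TYPES[(frame >> 28) & 0x7],  # bit 31 is the parity bit
        data_id=(frame >> 16) & 0xFF,             # bits 23..16
        value=frame & 0xFFFF,                     # bits 15..0
    )


if __name__ == "__main__":
    for sample in ("T80000200", "B40000200", "Error 03"):
        print(sample, "->", parse_otgw_line(sample))
```

Lines that are not message frames (error reports, command responses) are simply skipped, which keeps the decoder usable on an unfiltered capture.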
@hvxl Can you confirm that this will work? Based on the manual for HW rev. 2.3, it does not seem to be meant for this use case: "Do not connect a Micro USB cable to the WeMos D1 Mini while it is connected to the gateway!", so I am a bit wary of destroying my new toy 😉 @Roos-AID Does 2.7.4 refer to the version of a library, the core, or something like that? Either way, I'd be happy to try.
I think @hvxl is talking about the USB port on the main board, not the one on the Wemos.
You can display a debug log with telnet <IP address>. Alternatively, use OTmonitor and connect to port 25238. If you use telnet, open the session before you connect power; otherwise you might miss the first messages.
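The same stream can be captured from a script instead of OTmonitor. A minimal sketch, assuming port 25238 carries the raw line-oriented serial text described above; the host address and the helper name read_lines are hypothetical. As hvxl points out in the next comment, with the ESP rebooting every 10 seconds a TCP session may not last long enough to be useful, so this mainly helps once the board stays up.

```python
import socket


def read_lines(host: str, port: int = 25238, max_lines: int = 20,
               timeout: float = 10.0) -> list[str]:
    """Connect to host:port and return up to max_lines text lines.

    Stops early if the peer closes the connection cleanly; a reset or a
    timeout (for example when the ESP reboots) raises an OSError.
    """
    lines: list[str] = []
    buf = b""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while len(lines) < max_lines:
            chunk = sock.recv(1024)
            if not chunk:  # connection closed by the other side
                break
            buf += chunk
            # Split complete lines off the buffer; keep any partial tail.
            while b"\n" in buf and len(lines) < max_lines:
                raw, buf = buf.split(b"\n", 1)
                lines.append(raw.rstrip(b"\r").decode("ascii", errors="replace"))
    return lines


if __name__ == "__main__":
    # Hypothetical address: replace with the OTGW's IP.
    for line in read_lines("192.168.2.50"):
        print(line)
```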
Sorry, I should have been clearer. Yes, what I meant was to power the OTGW board from a USB port on the PC. @Roos-AID It's not possible to connect telnet before connecting the power. With the ESP rebooting every 10 seconds, there is hardly any chance to connect via TCP at all. That's why I suggested monitoring via USB.
In my case it's connected to a Honeywell Chronotherm Touch Modulation and an Atlantic Loria heat pump. According to the OTGW documentation, error 03 suggests a voltage issue. That could have explained my issue, since my board is new, but not JvHummel's. I'll do my best to dump a log.
@LacsapOV My board is also new; I soldered it just two nights ago :) But it shipped with v0.9.5. My theory is that Nodo Shop pre-programs them in batches ahead of time. Anyhow, you are right that it doesn't explain why 0.9.5 was stable for me.
When I replay that, I also get exception 28:
Ah, nice. We should be able to translate that back into a readable backtrace if we have the original ELF file (I think there is a plugin for this: https://github.com/me-no-dev/EspExceptionDecoder).
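Assuming the ELF file that matches the flashed binary is available, a code address from the register dump can also be decoded by calling the toolchain's addr2line directly, which, as far as we know, is what the decoder plugin does under the hood. The sketch below only builds the command line; the ELF file name is hypothetical, and the exact name of the addr2line binary depends on the installed ESP8266 toolchain.

```python
import re
import shlex

# Matches register values in an 'ESP register contents:' line of the reboot log.
REG_RE = re.compile(r"(epc1|epc2|epc3|excvaddr|depc)=0x([0-9a-fA-F]+)")


def parse_registers(line: str) -> dict[str, int]:
    """Extract register name -> value from a register-dump line."""
    return {name: int(value, 16) for name, value in REG_RE.findall(line)}


def addr2line_command(elf_path: str, addresses: list[int]) -> list[str]:
    """Build an addr2line invocation for the non-zero code addresses given."""
    # -p pretty-print, -f function names, -i inlined callers,
    # -a print addresses, -C demangle C++ names, -e ELF file to read.
    return ["xtensa-lx106-elf-addr2line", "-pfiaC", "-e", elf_path] + [
        f"0x{a:08x}" for a in addresses if a != 0
    ]


if __name__ == "__main__":
    log = ("ESP register contents: epc1=0x40241688, epc2=0x00000000, "
           "epc3=0x00000000, excvaddr=0x144c0000, depc=0x00000000")
    regs = parse_registers(log)
    # epc1 is the faulting instruction; excvaddr is a data address, so it is
    # not passed to addr2line.
    cmd = addr2line_command("OTGW-firmware.ino.elf", [regs["epc1"]])
    print(shlex.join(cmd))
```

Decoding only works against the exact ELF produced alongside the flashed binary; a locally rebuilt ELF with different libraries will map the same address to unrelated code.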
Strangely, I can't reproduce the issue on v0.10.0rc5 or when I compile v0.10.0 myself.
Which core are you compiling with, 3.0.2 or 2.7.4?
I cannot reproduce it with 3.0.2 or 2.7.4.
I tried both. However, comparing the reported debug information, I seem to end up with a different binary than you:
The firmware version is probably different because I didn't use autoinc-semver. Heap usage changes dynamically, but I expected the sketch size to be the same, so there's probably a difference in the libraries we use. I have the impression your "How to compile the OTGW firmware" wiki page is out of date. Did you manage to run the stack trace through the exception decoder?
I think we need the original ELF to decode the stack trace.
I can reproduce the crash with the release build, by the way:
This backtrace is not correct as far as I can tell, so I really need the original ELF:
I've got mine up and running, and it is stable. I compiled it myself using the steps in the documentation. It seems the binary in the installation documentation is faulty. I did have an issue with the AceTime version: the documentation states version 1.9.0, but that version is missing the function unixSeconds64, so I updated to the latest 1.x branch.
Glad to hear that a new build works. It's the same conclusion we are reaching in the firmware chat on Discord. What was the issue you ran into, so I can correct it? Also, AceTime needs the latest version; Brian is very actively improving his library.
I can confirm that building it myself and flashing that build remedies the bootloops.
@JvHummel Thanks for confirming that. I just built a new release, 0.10.1. Would you be so kind as to test it? It's in the beta channel on Discord.
Good evening Robert, I was just messing with a DS18B20, so I had my OTGW out anyway. Good timing. I flashed 0.10.1-beta+7b22d7d and connected it to the boiler and thermostat. No bootloops!
The Wemos D1 reboots about every 10 seconds.
Reboot log
2023-02-02 19:48:17 - reboot cause: Exception (2) - Access to invalid address (28)
ESP register contents: epc1=0x40241688, epc2=0x00000000, epc3=0x00000000, excvaddr=0x144c0000, depc=0x00000000
2023-02-02 19:48:07 - reboot cause: Exception (2) - Access to invalid address (28)
ESP register contents: epc1=0x40241688, epc2=0x00000000, epc3=0x00000000, excvaddr=0x144c0000, depc=0x00000000
2023-02-02 19:47:57 - reboot cause: Exception (2) - Access to invalid address (28)
ESP register contents: epc1=0x40241688, epc2=0x00000000, epc3=0x00000000, excvaddr=0x144c0000, depc=0x00000000
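The two numbers in each "reboot cause" line can be read with a small lookup table. Our reading of the format (an assumption, not documented in this issue) is that the first number is the ESP8266 reset reason (2 = exception reset) and the second is the Xtensa EXCCAUSE code. The table below is a subset, with names following the Xtensa exception-cause list reproduced in the ESP8266 Arduino core documentation; parse_reboot_cause and describe_exccause are hypothetical helpers.

```python
import re
from typing import Optional, Tuple

# Subset of Xtensa EXCCAUSE values commonly seen on the ESP8266.
EXCCAUSE = {
    0: "IllegalInstruction",
    2: "InstructionFetchError",
    3: "LoadStoreError",
    6: "IntegerDivideByZero",
    9: "LoadStoreAlignment",
    20: "InstFetchProhibited",
    28: "LoadProhibited",
    29: "StoreProhibited",
}

# Assumed format: '... reboot cause: Exception (<reset reason>) - <text> (<exccause>)'
CAUSE_RE = re.compile(r"reboot cause: Exception \((\d+)\).*\((\d+)\)\s*$")


def parse_reboot_cause(line: str) -> Optional[Tuple[int, int]]:
    """Return (reset_reason, exccause) from a reboot-log line, or None."""
    m = CAUSE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def describe_exccause(code: int) -> str:
    """Map an EXCCAUSE code to its name, if it is in the table above."""
    return EXCCAUSE.get(code, f"unknown cause {code}")


if __name__ == "__main__":
    line = ("2023-02-02 19:48:17 - reboot cause: Exception (2) - "
            "Access to invalid address (28)")
    parsed = parse_reboot_cause(line)
    if parsed:
        reason, cause = parsed
        print(f"reset reason {reason}, EXCCAUSE {cause}: {describe_exccause(cause)}")
```

Cause 28 (LoadProhibited) means a load from an address that is not mapped for reads, which matches the excvaddr=0x144c0000 value repeated in every entry above.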
Firmware Version
0.10.0+eeeb22c
PIC Firmware Version
6.4
Settings
{
"hostname": "OTGW",
"MQTTenable": true,
"MQTTbroker": "192.168.2.13",
"MQTTbrokerPort": 1883,
"MQTTuser": "",
"MQTTpasswd": "",
"MQTTtoptopic": "otgw",
"MQTThaprefix": "homeassistant",
"MQTTuniqueid": "otgw",
"MQTTOTmessage": true,
"MQTTharebootdetection": true,
"NTPenable": true,
"NTPtimezone": "Europe/Amsterdam",
"NTPhostname": "pool.ntp.org",
"LEDblink": true,
"GPIOSENSORSenabled": true,
"GPIOSENSORSpin": 13,
"GPIOSENSORSinterval": 20,
"S0COUNTERenabled": false,
"S0COUNTERpin": 12,
"S0COUNTERdebouncetime": 80,
"S0COUNTERpulsekw": 1000,
"S0COUNTERinterval": 60,
"OTGWcommandenable": false,
"OTGWcommands": "GW=1",
"GPIOOUTPUTSenabled": false,
"GPIOOUTPUTSpin": 16,
"GPIOOUTPUTStriggerBit": 0
}