
Memory Allocation Failure in LVGL library in BSP_Dev branch #98

Open
pmumby opened this issue Oct 6, 2022 · 2 comments


pmumby commented Oct 6, 2022

Summary:

We are experiencing a memory allocation failure in the LVGL library when it tries to allocate DMA-enabled heap for SPI use. Under normal circumstances this error is non-fatal (unless the build config is customized so that allocation failures throw a fatal error). The end result is that the UI task hangs, and our UI and touchscreen become completely locked up (all other functions of the device continue without interruption).
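For context, the failing allocation is a DMA-capable heap allocation. A minimal sketch of that kind of allocation using ESP-IDF's heap_caps API (illustrative only, not this component's actual code):

    #include <stdint.h>
    #include <stddef.h>
    #include "esp_heap_caps.h"

    // Illustrative sketch: allocate a buffer the SPI peripheral can DMA from.
    // MALLOC_CAP_DMA restricts the allocation to internal, DMA-capable RAM.
    static uint8_t *alloc_dma_buffer(size_t len)
    {
        uint8_t *buf = heap_caps_malloc(len, MALLOC_CAP_DMA);
        if (buf == NULL) {
            // Default behavior: heap_caps_malloc returns NULL and the caller
            // must cope. With a debug option such as
            // CONFIG_HEAP_ABORT_WHEN_ALLOCATION_FAILS enabled, the failure
            // aborts instead, producing a backtrace like the one below.
        }
        return buf;
    }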

Background:

We are using this component as a hardware abstraction library for the M5Stack Core2 in an IoT product. To that end, we've forked this repo to: https://github.com/Flow-Coffee-Limited/Core2-for-AWS-IoT-EduKit

In order to integrate the component into our ESP-IDF 4.4 project, we've created a branch (which we've called flow_fgv3_active) pinned to the same commit hash as the template project at: https://github.com/aws-iot-edukit/Project_Template-Core2_for_AWS

We are building with ESP-IDF 4.4 as mentioned above. We have a dockerized build environment that pre-installs IDF and its prerequisites into a Docker container, and we use scripts to run the build within the container.

We were previously on ESP-IDF 4.2, running an older AWS MQTT library and an older Core2 for AWS library. Unfortunately, we had an unrelated problem with the MQTT stack that required updating the MQTT library, which in turn required upgrading to IDF 4.4, which subsequently required upgrading to the BSP_Dev branch of the Core2 for AWS library.

It should be noted that we did not encounter this bug on the previous version of the Core2 for AWS library. However, that was an older version rather than the refactored BSP_Dev branch, and much else changed with the move to IDF 4.4 and the new MQTT library, so this is hardly a clean comparison; unfortunately, these components can't easily be swapped in isolation.

Docker Build Container:

The Dockerfile builds ESP-IDF branch release/v4.4 and installs all other prerequisites on a base of ubuntu:22.04 (Dockerfile attached):
esp_idf_dockerfile.zip

And here is an example of how to use it:

    docker run -it -v $(pwd):"/project" esp-idf-build-container $(id -u) $(id -g) /bin/bash

This provides a bash console inside the container, mounting the current working directory into the /project volume (assuming your current working directory is the root of an IDF project). Alternatively, you could substitute /bin/bash with an idf command such as idf.py build, since the IDF environment is automatically sourced inside the build container.

Stack Trace of the error we're seeing:

(note paths have been sanitized, but everything else is intact)

Memory allocation failed


Backtrace: 0x400823b5:0x3ffddba0 0x400972f9:0x3ffddbc0 0x400d6267:0x3ffddbe0 0x4008270f:0x3ffddc00 0x40082d6a:0x3ffddc20 0x40082e25:0x3ffddc60 0x4008da7a:0x3ffddc80 0x40137b2f:0x3ffddca0 0x400ec5ed:0x3ffddce0 0x400ec376:0x3ffddd50 0x400ebfe1:0x3ffddd90 0x400dd586:0x3ffdddb0 0x400dd673:0x3ffdde20 0x400ddc25:0x3ffddeb0 0x400e6bc5:0x3ffddf00 0x400e6cc8:0x3ffddf20 0x400d8dcd:0x3ffddf50
0x400823b5: panic_abort at /esp/esp-idf-4.4-release/components/esp_system/panic.c:402

0x400972f9: esp_system_abort at /esp/esp-idf-4.4-release/components/esp_system/esp_system.c:128

0x400d6267: heap_caps_match at /esp/esp-idf-4.4-release/components/heap/heap_caps.c:90

0x4008270f: heap_caps_malloc at /esp/esp-idf-4.4-release/components/heap/heap_caps.c:177

0x40082d6a: trace_malloc at /esp/esp-idf-4.4-release/components/heap/include/heap_trace.inc:93

0x40082e25: __wrap_heap_caps_malloc at /esp/esp-idf-4.4-release/components/heap/include/heap_trace.inc:182

0x4008da7a: setup_priv_desc at /esp/esp-idf-4.4-release/components/driver/spi_master.c:771 (discriminator 15)

0x40137b2f: spi_device_queue_trans at /esp/esp-idf-4.4-release/components/driver/spi_master.c:828
 (inlined by) spi_device_queue_trans at /esp/esp-idf-4.4-release/components/driver/spi_master.c:786

0x400ec5ed: disp_spi_transaction at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lv_port/disp_spi.c:268

0x400ec376: disp_spi_add_device_config at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lv_port/disp_spi.c:106

0x400ebfe1: ili9341_send_data at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl_esp32_drivers/lvgl_tft/ili9341.c:182

0x400dd586: lv_refr_vdb_flush at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:962

0x400dd673: lv_refr_area_part at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:507 (discriminator 1)

0x400ddc25: lv_refr_area at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:469
 (inlined by) lv_refr_areas at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:400
 (inlined by) _lv_disp_refr_task at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_core/lv_refr.c:199

0x400e6bc5: lv_task_exec at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:394
 (inlined by) lv_task_exec at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:380

0x400e6cc8: lv_task_handler at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/lib/lvgl/src/lv_misc/lv_task.c:135 (discriminator 1)

0x400d8dcd: _gui_task at /project/components/Core2-for-AWS-IoT-EduKit/lib/display/core2foraws_display.c:83
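
For anyone reading the trace: the failing frame, setup_priv_desc in spi_master.c, is where the IDF SPI master driver checks whether the caller's transaction buffer is already DMA-capable and, if it is not, allocates a temporary DMA-capable copy for each transaction. A rough sketch of that pattern (a simplified paraphrase, not the verbatim IDF source):

    #include <string.h>
    #include "esp_heap_caps.h"
    #include "soc/soc_memory_layout.h"   // esp_ptr_dma_capable() in IDF 4.4

    // Simplified sketch of the per-transaction copy the SPI master driver
    // makes when the supplied TX buffer is not in DMA-capable memory.
    static void *make_dma_capable_copy(const void *tx_buf, size_t len)
    {
        if (esp_ptr_dma_capable(tx_buf)) {
            return (void *)tx_buf;  // usable as-is, no allocation needed
        }
        void *tmp = heap_caps_malloc(len, MALLOC_CAP_DMA);  // the call that fails
        if (tmp != NULL) {
            memcpy(tmp, tx_buf, len);
        }
        return tmp;  // NULL here is what triggers the abort in the trace
    }

If that reading is right, one avenue worth checking is whether the buffers LVGL hands to disp_spi_transaction are themselves allocated with MALLOC_CAP_DMA, since DMA-capable source buffers would make the driver's temporary copy unnecessary.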

What have we tried so far?

We have tried the following to troubleshoot/resolve the issue:

  • Enabled all heap debugging so that we could get the above stack trace.
  • Confirmed the problem appears to be isolated to this component: the failure occurs within the task this component spawns for LVGL, the rest of the stack trace is purely within LVGL, and the crashes are common/frequent with identical stack traces every time.
  • Turned on heap tracing and monitored heap usage, fragmentation, etc. in our telemetry to see whether a leak or other memory pressure was causing the problem. No leaks were found, no apparent pressure, and no fragmentation problem (fragmentation is flat, not growing).
  • Attempted disabling DMA on the SPI master as a workaround (at the cost of performance), but that caused too many other problems, so we had to abandon that path.
  • Attempted varying the amount of heap allocated to the UI task. Unfortunately, our QR code generation library causes memory problems if we don't give the LVGL task more heap.
  • Implemented a watchdog that monitors the LVGL task and reboots the device once a hang/halt exceeds a tolerance threshold (see the sketch after this list); this is also tracked in our telemetry data.
  • Monitored telemetry from 10 devices experiencing the issue for 72 hours and collated the data for analysis. On average, the crash occurs at random intervals every 30-60 minutes (some devices only every 3-4 hours, others every 20 minutes, but the majority fall in the 30-60 minute window). Even with the watchdog as a failsafe (which works), the resulting reboot frequency significantly impacts the utility of the devices, so we need a better solution.
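
For reference, the watchdog mentioned in the list above is conceptually like the following sketch (hypothetical names; the real implementation also reports to our telemetry). The GUI task periodically bumps a heartbeat, and a separate task calls esp_restart() once the heartbeat goes stale beyond the tolerance threshold:

    #include "freertos/FreeRTOS.h"
    #include "freertos/task.h"
    #include "esp_system.h"
    #include "esp_timer.h"

    static volatile int64_t s_ui_heartbeat_us;  // hypothetical heartbeat marker

    // Called periodically from the LVGL/GUI task while it is still alive.
    void ui_heartbeat(void)
    {
        s_ui_heartbeat_us = esp_timer_get_time();
    }

    // Watchdog task: if the GUI task stops heartbeating for longer than the
    // tolerance threshold, reboot the device as a failsafe.
    static void ui_watchdog_task(void *arg)
    {
        const int64_t tolerance_us = 30LL * 1000 * 1000;  // 30 s, tunable
        for (;;) {
            vTaskDelay(pdMS_TO_TICKS(5000));
            if (esp_timer_get_time() - s_ui_heartbeat_us > tolerance_us) {
                esp_restart();
            }
        }
    }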

Happy to provide any further info as needed.

Any assistance would be greatly appreciated.

@rashedtalukder
Collaborator

Have you set an increased LV_MEM_SIZE?


pmumby commented Oct 13, 2022

> Have you set an increased LV_MEM_SIZE?

Hi Rashed, I have not changed LV_MEM_SIZE from the default set in this library (32 KB).
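
For reference, I believe the stock lv_conf.h default looks like this (assuming the template config shipped with the library):

    /* lv_conf.h: size of the memory pool used by LVGL's internal
     * allocator (lv_mem_alloc and friends) */
    #define LV_MEM_SIZE (32U * 1024U)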

Is that size being used for the DMA buffers for SPI though?

If you look in the stack trace of the error, you can see the allocation failure is specifically happening in the code for allocating SPI buffers for the SPI connected display, which require DMA capability.

Obviously DMA-capable memory is the most constrained, but the code runs fine for anywhere from 30 minutes to several hours, with the UI working perfectly, and then fails at somewhat random intervals. During this time, monitoring heap usage and fragmentation shows that we have between 20 KB and 35 KB of free DMA-capable memory, depending on circumstances. So if the issue were allocating a 32 KB block of DMA memory, one would think it would throw the error immediately, or at least far more often.

For example, here is a dump of heap telemetry sent from the device. This was taken from a running device that had been online for nearly 1 hour and whose UI had not yet locked up (it had not yet thrown the allocation exception):

    "heap": {
        "internal": {
          "allocatedBlocks": 428,
          "totalFreeBytes": 40027,
          "minimumFreeBytes": 16199,
          "largestFreeBlock": 19456,
          "freeBlocks": 17,
          "totalBlocks": 445
          "freeBlocks": 16,
          "totalBlocks": 444
        },
        "dram": {
          "allocatedBlocks": 553,
          "totalFreeBytes": 3614854,
          "minimumFreeBytes": 3566798,
          "largestFreeBlock": 3538944,
          "freeBlocks": 25,
          "totalBlocks": 578
          "freeBlocks": 24,
          "totalBlocks": 577
        },
        "iram": {
          "allocatedBlocks": 0,
          "totalFreeBytes": 0,
          "minimumFreeBytes": 0,
          "largestFreeBlock": 0,
          "freeBlocks": 0,
          "totalBlocks": 0
        },
        "dma": {
          "allocatedBlocks": 428,
          "totalFreeBytes": 38435,
          "minimumFreeBytes": 14623,
          "largestFreeBlock": 19456,
          "freeBlocks": 16,
          "totalBlocks": 444
          "freeBlocks": 15,
          "totalBlocks": 443
        },
        "spiram": {
          "allocatedBlocks": 125,
          "totalFreeBytes": 3576419,
          "minimumFreeBytes": 3552175,
          "largestFreeBlock": 3538944,
          "freeBlocks": 9,
          "totalBlocks": 134
        }
      },
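
For anyone wanting to reproduce these figures: they map onto the fields of ESP-IDF's multi_heap_info_t, gathered per capability class roughly like this (a minimal sketch; our telemetry wrapper and JSON encoding are omitted):

    #include <stdio.h>
    #include "esp_heap_caps.h"

    // Print the per-capability heap statistics reported in the telemetry above.
    static void dump_heap_stats(const char *name, uint32_t caps)
    {
        multi_heap_info_t info;
        heap_caps_get_info(&info, caps);
        printf("%s: allocatedBlocks=%u totalFreeBytes=%u minimumFreeBytes=%u "
               "largestFreeBlock=%u freeBlocks=%u totalBlocks=%u\n",
               name,
               (unsigned)info.allocated_blocks, (unsigned)info.total_free_bytes,
               (unsigned)info.minimum_free_bytes, (unsigned)info.largest_free_block,
               (unsigned)info.free_blocks, (unsigned)info.total_blocks);
    }

    // Usage, matching the capability classes above:
    //   dump_heap_stats("internal", MALLOC_CAP_INTERNAL);
    //   dump_heap_stats("dma",      MALLOC_CAP_DMA);
    //   dump_heap_stats("spiram",   MALLOC_CAP_SPIRAM);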
