Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[email protected] fails to catch an exception #850

Open
rinigus opened this issue Nov 2, 2024 · 31 comments
Open

[email protected] fails to catch an exception #850

rinigus opened this issue Nov 2, 2024 · 31 comments
Labels

Comments

@rinigus
Copy link

rinigus commented Nov 2, 2024

Platform: nagara
Device: pdx223
Kernel version: 5.15
Android version: android-14_r67
Software binaries version: Patched SW_binaries_for_Xperia_Android_13_5.15_v4a_nagara.zip

Previously working on

Same issue was for me on AOSP13 as well, recent builds.

Description

On boot, device is stuck on ANDROID logo. In logcat, there are failures with [email protected]:

01-01 00:00:06.314   478   478 I hwservicemanager: Since [email protected]::IHealth/default is not registered, trying to start it as a lazy HAL.
01-01 00:00:06.311   720   720 I android.hardwar: type=1400 audit(0.0:534): avc: denied { read } for name="qcom,qseecom" dev="tmpfs" ino=548 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.311   720   720 I android.hardwar: type=1400 audit(0.0:535): avc: denied { open } for path="/dev/dma_heap/qcom,qseecom" dev="tmpfs" ino=548 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.315   720   720 I android.hardwar: type=1400 audit(0.0:536): avc: denied { ioctl } for path="/dev/dma_heap/qcom,qseecom" dev="tmpfs" ino=548 ioctlcmd=0x4800 scontext=u:r:hal_gatekeeper_default:s0 tcontext=u:object_r:dmabuf_heap_device:s0 tclass=chr_file permissive=1
01-01 00:00:06.315   720   720 D QSEECOMAPI: QSEECom_get_handle sb_length = 0xa000
01-01 00:00:06.315   720   720 D QSEECOMAPI: App is already loaded QSEE and app id = 1
01-01 00:00:06.316   720   720 I DMABUFHEAPS: Using DMA-BUF heap named: qcom,qseecom
01-01 00:00:06.312     0     0 I         : wlan_hdd_state wlan major(474) initialized
01-01 00:00:06.317     0     0 I init    : processing action (boot) from (/vendor/etc/init/init.usb.rc:41)
01-01 00:00:06.317     0     0 I init    : Command 'mount configfs none /config' action=boot (/vendor/etc/init/init.usb.rc:42) took 0ms and failed: mount() failed: Device or resource busy
01-01 00:00:06.317   478   478 I hwservicemanager: Since [email protected]::IHealth/default is not registered, trying to start it as a lazy HAL.
01-01 00:00:06.317   721   788 I HidlServiceManagement: getService: Trying again for [email protected]::IHealth/default...
01-01 00:00:06.318   721   721 I HidlServiceManagement: Registered [email protected]::IGnss/default
01-01 00:00:06.318   478   790 W libc    : Unable to set property "ctl.interface_start" to "[email protected]::IHealth/default": error code: 0x20
01-01 00:00:06.318   721   721 I HidlServiceManagement: Removing namespace from process name [email protected] to [email protected].
01-01 00:00:06.318   478   790 I hwservicemanager: Tried to start [email protected]::IHealth/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
01-01 00:00:06.318   721   721 I LegacySupport: Registration complete for [email protected]::IGnss/default.
01-01 00:00:06.319   721   721 I [email protected]: LOC_HIDL_VERSION not defined.
01-01 00:00:06.319     0     0 E init    : Control message: Could not find '[email protected]::IHealth/default' for ctl.interface_start from pid: 478 (/system/bin/hwservicemanager)
01-01 00:00:06.320   720   720 E KeymasterUtils: TA API Major Version: 4
01-01 00:00:06.320   720   720 E KeymasterUtils: TA API Minor Version: 1
01-01 00:00:06.320   720   720 E KeymasterUtils: TA Major Version: 4
01-01 00:00:06.320   720   720 E KeymasterUtils: TA Minor Version: 1012
01-01 00:00:06.321   709   709 I AndroidRuntime: Using default boot image
01-01 00:00:06.321   709   709 I AndroidRuntime: Leaving lock profiling enabled
01-01 00:00:06.322     0     0 E init    : Control message: Could not find '[email protected]::IHealth/default' for ctl.interface_start from pid: 478 (/system/bin/hwservicemanager)
01-01 00:00:06.323     0     0 I logd    : logdr: UID=1000 GID=1000 PID=776 n tail=0 logMask=8 pid=722 start=0ns deadline=0ns
01-01 00:00:06.322   478   793 W libc    : Unable to set property "ctl.interface_start" to "[email protected]::IHealth/default": error code: 0x20
01-01 00:00:06.324     0     0 I logd    : logdr: UID=1000 GID=1000 PID=776 n tail=0 logMask=1 pid=722 start=0ns deadline=0ns
01-01 00:00:06.324   478   793 I hwservicemanager: Tried to start [email protected]::IHealth/default as a lazy service, but was unable to. Usually this happens when a service is not installed, but if the service is intended to be used as a lazy service, then it may be configured incorrectly.
01-01 00:00:06.325   776   776 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
01-01 00:00:06.325   776   776 F DEBUG   : Build fingerprint: 'Sony/aosp_xqct54/pdx223:13/TQ3A.230901.001/eng.root.20241031.073518:userdebug/test-keys'
01-01 00:00:06.325   776   776 F DEBUG   : Revision: '0'
01-01 00:00:06.325   776   776 F DEBUG   : ABI: 'arm64'
01-01 00:00:06.325   776   776 F DEBUG   : Timestamp: 1970-01-01 00:00:06.309160049+0000
01-01 00:00:06.325   776   776 F DEBUG   : Process uptime: 0s
01-01 00:00:06.325   776   776 F DEBUG   : Cmdline: /vendor/bin/hw/[email protected]
01-01 00:00:06.325   776   776 F DEBUG   : pid: 722, tid: 722, name: android.hardwar  >>> /vendor/bin/hw/[email protected] <<<
01-01 00:00:06.325   776   776 F DEBUG   : uid: 1000
01-01 00:00:06.325   776   776 F DEBUG   : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
01-01 00:00:06.325   776   776 F DEBUG   : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
01-01 00:00:06.325   776   776 F DEBUG   : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
01-01 00:00:06.325   776   776 F DEBUG   : Abort message: 'terminating with uncaught exception of type St16invalid_argument: stoi: no conversion'
01-01 00:00:06.325   776   776 F DEBUG   :     x0  0000000000000000  x1  00000000000002d2  x2  0000000000000006  x3  0000007fddfe9e70
01-01 00:00:06.325   776   776 F DEBUG   :     x4  736f646277641f73  x5  736f646277641f73  x6  736f646277641f73  x7  7f7f7f7f7f7f7f7f
01-01 00:00:06.325   776   776 F DEBUG   :     x8  00000000000000f0  x9  000000700b9d1a00  x10 0000000000000001  x11 000000700ba0fde4
01-01 00:00:06.325   776   776 F DEBUG   :     x12 0000000000000006  x13 000000007fffffff  x14 0000000000000baa  x15 0000000013b75690
01-01 00:00:06.325   776   776 F DEBUG   :     x16 000000700ba74d58  x17 000000700ba51c70  x18 000000700d8e8000  x19 00000000000002d2
01-01 00:00:06.325   776   776 F DEBUG   :     x20 00000000000002d2  x21 00000000ffffffff  x22 0000007fddfe9fa0  x23 ffffff80ffffffc8
01-01 00:00:06.325   776   776 F DEBUG   :     x24 000000700ba76f70  x25 0000000000000001  x26 b400006e1b5ec9a0  x27 b400006e4b5f98c8
01-01 00:00:06.325   776   776 F DEBUG   :     x28 b400006e1b5ec9a0  x29 0000007fddfe9ef0
01-01 00:00:06.325   776   776 F DEBUG   :     lr  000000700ba01968  sp  0000007fddfe9e50  pc  000000700ba01994  pst 0000000000001000
01-01 00:00:06.325   776   776 F DEBUG   : backtrace:
01-01 00:00:06.325   776   776 F DEBUG   :       #00 pc 0000000000051994  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
01-01 00:00:06.326   776   776 F DEBUG   :       #01 pc 000000000004910c  /apex/com.android.vndk.v33/lib64/libc++.so (abort_message+248) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #02 pc 00000000000492d4  /apex/com.android.vndk.v33/lib64/libc++.so (demangling_terminate_handler()+208) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #03 pc 0000000000049e84  /apex/com.android.vndk.v33/lib64/libc++.so (std::__terminate(void (*)())+12) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #04 pc 00000000000494ec  /apex/com.android.vndk.v33/lib64/libc++.so (__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)+28) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #05 pc 0000000000049450  /apex/com.android.vndk.v33/lib64/libc++.so (__cxa_throw+116) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #06 pc 000000000009ae94  /apex/com.android.vndk.v33/lib64/libc++.so (std::__1::stoi(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long*, int)+384) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
01-01 00:00:06.326   776   776 F DEBUG   :       #07 pc 000000000000ac38  /vendor/lib64/hw/[email protected] (device::sony::health::CycleCountBackupRestore::Read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&)+148) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #08 pc 000000000000ab4c  /vendor/lib64/hw/[email protected] (device::sony::health::CycleCountBackupRestore::Restore()+36) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #09 pc 000000000000a1c0  /vendor/lib64/hw/[email protected] (device::sony::health::SonyHealth::SonyHealth(std::__1::unique_ptr<healthd_config, std::__1::default_delete<healthd_config> >&&)+116) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #10 pc 000000000000a0e0  /vendor/lib64/hw/[email protected] (HIDL_FETCH_IHealth+204) (BuildId: 47284e398f31910ce682e251b7e79c63)
01-01 00:00:06.326   776   776 F DEBUG   :       #11 pc 0000000000052554  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::get(android::hardware::hidl_string const&, android::hardware::hidl_string const&)::'lambda'(void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)::operator()(void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) const+92) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #12 pc 000000000004d70c  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::openLibs(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<bool (void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)> const&)+1372) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #13 pc 0000000000050970  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::PassthroughServiceManager::get(android::hardware::hidl_string const&, android::hardware::hidl_string const&)+92) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #14 pc 000000000004e770  /apex/com.android.vndk.v33/lib64/libhidlbase.so (android::hardware::details::getRawServiceInternal(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+1608) (BuildId: 009bd5d0670c92352e8223126010f84e)
01-01 00:00:06.326   776   776 F DEBUG   :       #15 pc 0000000000018b9c  /vendor/lib64/[email protected] (android::sp<android::hardware::health::V2_0::IHealth> android::hardware::details::getServiceInternal<android::hardware::health::V2_0::BpHwHealth, android::hardware::health::V2_0::IHealth, void, void>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool)+208) (BuildId: 0bcba204447d37ac45f9cf9e48429db2)
01-01 00:00:06.326   776   776 F DEBUG   :       #16 pc 000000000000631c  /vendor/bin/hw/[email protected] (android::hardware::health::V2_0::IHealth::getService(char const*, bool)+188) (BuildId: 8b1c18b1dc0760c9d0c123f01bd46e1d)
01-01 00:00:06.326   776   776 F DEBUG   :       #17 pc 0000000000006080  /vendor/bin/hw/[email protected] (main+44) (BuildId: 8b1c18b1dc0760c9d0c123f01bd46e1d)
01-01 00:00:06.326   776   776 F DEBUG   :       #18 pc 000000000004a1f4  /apex/com.android.runtime/lib64/bionic/libc.so (__libc_init+96) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
01-01 00:00:06.334     0     0 I         : gsid (779) used greatest stack depth: 9840 bytes left
01-01 00:00:06.334     0     0 I init    : Service 'exec 18 (/system/bin/gsid run-startup-tasks)' (pid 779) exited with status 0 oneshot service took 0.028000 seconds in background
01-01 00:00:06.334     0     0 I init    : Sending signal 9 to service 'exec 18 (/system/bin/gsid run-startup-tasks)' (pid 779) process group...
01-01 00:00:06.335     0     0 I libprocessgroup: Successfully killed process cgroup uid 0 pid 779 in 0ms
01-01 00:00:06.335     0     0 I init    : Service 'health-hal-2-1' (pid 722) received signal 6
01-01 00:00:06.335     0     0 I init    : Sending signal 9 to service 'health-hal-2-1' (pid 722) process group...

Symptoms

Stuck on Android logo while booting

How to reproduce

New build

Additional context

Maybe important: last stock was XQ-CT54_Customized EU-UK_64.2.A.2.185

While maybe it cannot reach battery data, the code seems to have corresponding exception handling in https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L92

I could make it work by adding

diff --git a/hardware/health/CycleCountBackupRestore.cpp b/hardware/health/CycleCountBackupRestore.cpp
index 2207f60..cd305c6 100644
--- a/hardware/health/CycleCountBackupRestore.cpp
+++ b/hardware/health/CycleCountBackupRestore.cpp
@@ -92,6 +92,9 @@ void CycleCountBackupRestore::Read(const std::string &path, int &cycles) {
     } catch (std::invalid_argument &e) {
         LOG(WARNING) << "Data format is wrong in persist storage file: " << path;
         return;
+    } catch (...) {
+        LOG(WARNING) << "Data format is wrong in persist storage file and exception sneaked through: " << path;
+        return;
     }
     LOG(VERBOSE) << "Read " << cycles << " battery cycles from " << path;
 }

But I cannot figure out why the original exception does not catch it.

cc: @MarijnS95

@MarijnS95
Copy link
Contributor

MarijnS95 commented Nov 2, 2024

@rinigus that should have exactly nothing to do with what you call a "Related PR" unless the old code in question was broken such that .Restore() never got called before.

Scrolling the error message top-to-bottom and left-to-right with all unrelated lines is incredibly tedious and frustrating, so I'm cutting it down to the bare minimum that is important:

Abort message: 'terminating with uncaught exception of type St16invalid_argument: stoi: no conversion'
...
backtrace:
      #00 pc 0000000000051994  /apex/com.android.runtime/lib64/bionic/libc.so (abort+164) (BuildId: 2c82bf49529e9aa25cb9e5eca777a56f)
      #01 pc 000000000004910c  /apex/com.android.vndk.v33/lib64/libc++.so (abort_message+248) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #02 pc 00000000000492d4  /apex/com.android.vndk.v33/lib64/libc++.so (demangling_terminate_handler()+208) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #03 pc 0000000000049e84  /apex/com.android.vndk.v33/lib64/libc++.so (std::__terminate(void (*)())+12) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #04 pc 00000000000494ec  /apex/com.android.vndk.v33/lib64/libc++.so (__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*)+28) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #05 pc 0000000000049450  /apex/com.android.vndk.v33/lib64/libc++.so (__cxa_throw+116) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #06 pc 000000000009ae94  /apex/com.android.vndk.v33/lib64/libc++.so (std::__1::stoi(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long*, int)+384) (BuildId: ac7b79d1f77424245e84c9e5e12121c7)
      #07 pc 000000000000ac38  /vendor/lib64/hw/[email protected] (device::sony::health::CycleCountBackupRestore::Read(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int&)+148) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #08 pc 000000000000ab4c  /vendor/lib64/hw/[email protected] (device::sony::health::CycleCountBackupRestore::Restore()+36) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #09 pc 000000000000a1c0  /vendor/lib64/hw/[email protected] (device::sony::health::SonyHealth::SonyHealth(std::__1::unique_ptr<healthd_config, std::__1::default_delete<healthd_config> >&&)+116) (BuildId: 47284e398f31910ce682e251b7e79c63)
      #10 pc 000000000000a0e0  /vendor/lib64/hw/[email protected] (HIDL_FETCH_IHealth+204) (BuildId: 47284e398f31910ce682e251b7e79c63)

What, strangely, seems to happen is that std::invalid_argument that the health HAL is compiled with might be different than the std::invalid_argument that libc++.so is raising here (IIRC as defined at https://cs.android.com/android-llvm/toolchain/llvm-project/+/main:libcxx/src/string.cpp;l=62-64;drc=9783f28cbb155e4a8d49c12e1c60ce14dcfaf0c7).

Alternatively AOSP could be compiling our HAL without RTTI (https://stackoverflow.com/questions/74441543/why-stdinvalid-argument-is-not-caught-with-no-rtti-in-macos-m1-environment)? Again nothing that the "Related PR" changed?

@bartcubbins
Copy link
Contributor

I see it for the first time.
On Nagara Health may not receive a value until ADSP is initialized, but then it restarts and everything goes smoothly. This is Nagara A14r67. I'll try one more time on a fresh build later

01-01 00:00:04.449   783   783 W [email protected]: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.449   783   783 W [email protected]: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:04.449   783   783 W [email protected]: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:04.449   783   783 W [email protected]: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:04.450   783   783 I HidlServiceManagement: Registered [email protected]::IHealth/default
01-01 00:00:04.450   783   783 I HidlServiceManagement: Removing namespace from process name [email protected] to [email protected].
01-01 00:00:04.450   783   783 I [email protected]: default: Hal init done
---------
01-01 00:00:09.437  1346  1346 W [email protected]: Could not read battery capacity persist file from /mnt/vendor/persist/battery/battery_charge_full: No such file or directory
01-01 00:00:09.437  1346  1346 I [email protected]: Saved learned max battery capacity of 5056 mAh to persist storage
01-01 00:00:09.437  1346  1346 I [email protected]: default instance initializing with healthd_config...
01-01 00:00:09.438  1346  1346 I HidlServiceManagement: Registered [email protected]::IHealth/default
01-01 00:00:09.438  1346  1346 I HidlServiceManagement: Removing namespace from process name [email protected] to [email protected].
01-01 00:00:09.438  1346  1346 I [email protected]: default: Hal init done

@bartcubbins
Copy link
Contributor

@rinigus can you provide me your /vendor/bin/hw/[email protected] binary?

@rinigus
Copy link
Author

rinigus commented Nov 2, 2024

@MarijnS95 : I will check PR closer and will trim logs more next time :) . Dropped "Related PR" from original message.

@bartcubbins: let me bake a new one and check that it fails the same way

@MarijnS95
Copy link
Contributor

MarijnS95 commented Nov 2, 2024

@bartcubbins despite the broken log messages in CycleCountBackupRestore::Read(), this function is used to read from both persist and the actual sysfs file.

@rinigus said he came from a fresh stock, so it's quite possible that they have the file there in a different format; unfortunately the catch(...) workaround probably caused the HAL to already overwrite it. Or the file was simply corrupt?

Also note that LearnedCapcityBackupRestore strangely uses std::stoi() for ReadFromPersistSTorage(), and sscanf(..., "%d", ...) for ReadFromSRAM() (which is /sys/class/power_supply/bms/charge_full).

@rinigus
Copy link
Author

rinigus commented Nov 2, 2024

Very strange, I don't have that issue anymore. Looks like after successful boot by AOSP and shutdown, it doesn't appear. It fails on first boot after you come from stock as in:

01-01 00:00:06.708   792   792 W [email protected]: Data format is wrong in persist storage file and exception sneaked through: /mnt/vendor/persist/battery/battery_cycle_count

That's the first successful boot after coming from stock 64.2.A.2.185 and after applying the patch from my original message.

Before that was the boot where I was stuck on on "android" and exception issue as in the original message.

After the first successful boot, I get logs similar to @bartcubbins. Although, have lots of errors related to selinux, but that's another general issue.

@MarijnS95 - cross-posting, was looking for binary and I agree with you.

So, it is hard to reproduce. We need to probably go back to stock and then return. However, issue with exception catching is still there. @MarijnS95 - I wonder how to check your idea regarding different compilation?

@bartcubbins: File attached

[email protected]

@MarijnS95
Copy link
Contributor

@rinigus without fixing this, I'd love to learn more about the downstream format. The entire goal of this HAL is to "persist" these states, but that's all thrown out of the window if we restart the counter from stock, and stock restarts it when coming back from SODP.

Don't think I understand what you mean exactly with cross-posting 😅

@rinigus
Copy link
Author

rinigus commented Nov 2, 2024

Re stock / aosp restart. Agree, if we can figure out the format. But I am not sure it is important in practice for many. I would expect majority of users sticking to one of the camps. For SFOS users, they get a device, flash it, and use on SFOS without going to stock.

In my eyes, transition issues between stock and AOSP are minor / low priority. Right now, on nagara, basically nothing works and there are way bigger issues to fix :)

Re cross-posting: wrote my message part regarding coming from fresh stock, started looking for file, and then saw your message.

@bartcubbins
Copy link
Contributor

despite the broken log messages in CycleCountBackupRestore::Read(), this function is used to read from both persist and the actual sysfs file.

Yes, I am familiar with the code

@bartcubbins
Copy link
Contributor

I'd love to learn more about the downstream format

XQ-CQ54:/ # getprop ro.build.id                                                                                                                                
64.2.A.2.215B
XQ-CQ54:/ # cat /mnt/vendor/persist/battery/battery_charge_full                                                                                                
5056000XQ-CQ54:/ # 

and battery_cycle_count just don't exist

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

What is weird that it manages to pass android::base::ReadFileToString(path, &buffer) above that in this case. But OK, should we just replace it with sscanf and avoid C++ exceptions?

@MarijnS95
Copy link
Contributor

Did/does the file exist for you, @rinigus (if you come directly from stock)? If the file didn't exist that function shouldn't have passed (unless it returned an empty string in this case?).

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

In short, I cannot trigger it when testing today.

I flashed back stock, booted into it for setup and once more to confirm it was OK. As its not rooted, couldn't see any vendor/persist. Then flashed new AOSP14 images with the following CycleCountBackupRestore::Read:

void CycleCountBackupRestore::Read(const std::string &path, int &cycles) {
    std::string buffer;

    LOG(WARNING) << "Test Access: "  << access(path.c_str(), F_OK);

    if (!android::base::ReadFileToString(path, &buffer)) {
        LOG(WARNING) << "Failed to read battery cycles from " << path;
        return;
    }

    LOG(WARNING) << "Test Buffer: '" << buffer << "'";

    buffer = ::android::base::Trim(buffer);
    try {
        cycles = std::stoi(buffer);
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Battery cycle count in persist storage file is out of bounds: " << path;
        return;
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Data format is wrong in persist storage file: " << path;
        return;
    } catch (...) {
        LOG(WARNING) << "Data format is wrong in persist storage file and exception sneaked through: " << path;
        LOG(WARNING) << "Failed Buffer: '" << buffer << "'";
        LOG(WARNING) << "Failed Access: "  << access(path.c_str(), F_OK);
        return;
    }
    LOG(VERBOSE) << "Read " << cycles << " battery cycles from " << path;
}

Corresponding logcat grep:

01-01 00:00:12.508   786   786 W [email protected]: Test Access: 0
--
01-01 00:00:12.518   786   786 W [email protected]: Test Buffer: '4'
01-01 00:00:12.518   786   786 W [email protected]: Test Access: -1
01-01 00:00:12.519   786   786 W [email protected]: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count
01-01 00:00:12.519   786   786 W [email protected]: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: No such file or directory
01-01 00:00:12.527   786   786 W [email protected]: Read max battery capacity from sysfs error: No such file or directory
01-01 00:00:12.528   786   786 I [email protected]: default instance initializing with healthd_config...

So, here it is nicely caught before proceeding to stoi. Which doesn't help me in terms of reproducing it.

But coming back to the question: should we replace it with sscanf as C++ exceptions were failing? I can check whether they still fail, though...

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

I have added the following snippet into the CycleCountBackupRestore::Read:

    int testi;
    try {
        testi = std::stoi("hello"); 
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Test std::out_of_range";
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Test std::invalid_argument";
    } catch (...) {
        LOG(WARNING) << "Test all catch";
    }

and have in the logs:

01-01 00:00:04.126   796   796 W [email protected]: Test all catch

i.e. the original issue is still there - we don't catch the corresponding exception.

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

The bigger issue is that the cycles file is never made as we have permission denied errors:

01-01 00:00:09.259     0     0 W healthd : batteryCapacityLevelPath not found
--
01-01 00:00:09.259     0     0 W healthd : chargingPolicyPath not found
01-01 00:00:09.259  1406  1406 W [email protected]: Error writing battery cycles to /sys/class/power_supply/battery/cycle_count: Permission denied
01-01 00:00:09.260  1406  1406 W [email protected]: Write max battery capacity to sysfs error: Permission denied
01-01 00:00:09.261  1406  1406 I [email protected]: default instance initializing with healthd_config...
01-01 00:00:09.261  1406  1406 I HidlServiceManagement: Registered [email protected]::IHealth/default
01-01 00:00:09.261  1406  1406 I HidlServiceManagement: Removing namespace from process name [email protected] to [email protected].
01-01 00:00:09.262  1406  1406 I [email protected]: default: Hal init done

And similar permission denied errors later in the log for "Write max battery capacity"

@MarijnS95
Copy link
Contributor

You shouldn't be able to create a file in this location in sysfs, it's an attribute of the battery driver/device.

@bartcubbins
Copy link
Contributor

But OK, should we just replace it with sscanf and avoid C++ exceptions?

Please use android::base::ParseInt instead

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

Sure. Will prepare PR with it.

@rinigus
Copy link
Author

rinigus commented Nov 3, 2024

Weird, for some reason this method (https://github.com/sonyxperiadev/device-sony-common/blob/master/hardware/health/CycleCountBackupRestore.cpp#L78) is not called anymore. Is it normal? Don't want to start flashing stock and getting back to AOSP from it just to test it.

I have added few log calls to test, but never got anything from it in logcat.

@bartcubbins
Copy link
Contributor

The thing is... DSP charger firmware never increments cycle_count on Nagara, Yodo... and most likely on other devices with QTI Glink battery charger

@rinigus
Copy link
Author

rinigus commented Nov 4, 2024

Does it mean that this method is not even called on a regular boot? Since that what it looks like for me right now and I cannot figure out why.

@bartcubbins
Copy link
Contributor

I can clearly see that the method is called and prints a warning which is acceptable.
I checked this literally just now on the fresh build

01-01 00:00:04.481   794   794 W [email protected]: Failed to read battery cycles from /mnt/vendor/persist/battery/battery_cycle_count
01-01 00:00:04.481   794   794 W [email protected]: Failed to read battery cycles from /sys/class/power_supply/battery/cycle_count

Please provide me with the full logcat. I'm interested to understand what's going on there

@rinigus
Copy link
Author

rinigus commented Nov 4, 2024

Thank you very much! Will be able to provide it tonight.

@rinigus
Copy link
Author

rinigus commented Nov 4, 2024

Please find attached full logcat. That's using current CycleCountBackupRestore::Read as it is in common repo (no patches).

Few changes in my tree:

  • commented out starting fingerprint in vendor/oss/fingerprint/[email protected]
  • added to ODM blobs:
    • lib64/libcld80211.so (allows to wifi / bt to initialize)
    • /odm/bin/modemManager (not working yet, I think)
    • /odm/bin/mstatdaemon (not working yet, I think)

logs-14-nov4-2.gz

@rinigus
Copy link
Author

rinigus commented Nov 5, 2024

OK, on the last boot I can see the debug messages again from CycleCountBackupRestore::Read. Will work on PR for this.

@bartcubbins
Copy link
Contributor

bartcubbins commented Nov 10, 2024

I have added the following snippet into the CycleCountBackupRestore::Read:

    int testi;
    try {
        testi = std::stoi("hello"); 
    } catch (std::out_of_range &e) {
        LOG(WARNING) << "Test std::out_of_range";
    } catch (std::invalid_argument &e) {
        LOG(WARNING) << "Test std::invalid_argument";
    } catch (...) {
        LOG(WARNING) << "Test all catch";
    }

and have in the logs:

01-01 00:00:04.126   796   796 W [email protected]: Test all catch

i.e. the original issue is still there - we don't catch the corresponding exception.

I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception
03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument

@rinigus
Copy link
Author

rinigus commented Nov 10, 2024

Excellent! I think it is way better than my submitted PR, thank you very much. Is this conversion to AIDL already merged?

@bartcubbins
Copy link
Contributor

Not yet, it's in my private branch. I have a lot of different changes here besides this one that need to be sorted out

@MarijnS95
Copy link
Contributor

I converted our Health HAL to AIDL and now I catch the proper invalid_argument exception
03-22 16:48:50.591 803 803 W android.hardware.health-service.sony: Test std::invalid_argument

It is hard to believe that converting from a HIDL to AIDL HAL has an effect on how exceptions are handled in C++ which should remain mostly unchanged. Unless compilation flags for our own .cpp files are changed, a different STL is linked, or...?

@bartcubbins just as a sanity-check, have you made sure that the exception isn't caught for you in the previous HIDL variant, and it isn't some configuration/environment "issue" / mismatch on Rinigus' side?

@bartcubbins
Copy link
Contributor

I'm 1000% sure. Just checked again. I have the same behavior as @rinigus
03-23 18:27:34.075 797 797 W [email protected]: Test all catch

@MarijnS95
Copy link
Contributor

Thanks for confirming, it is still very confusing and unexpected 😅. Looking forward to seeing your changes to i.e. Android.bp that might have an effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants