Skip to content

[REPORT]Multi-Threaded Race Condition bug found in src/modules/navigator/geofence.cpp cause MissionFeasibilityChecker useless

High
bkueng published GHSA-p74g-gvg5-6pgc Mar 14, 2024

Package

PX4-Autopilot (Pixhawk )

Affected versions

1.14.0

Patched versions

None

Description

Summary

Due to the lack of synchronization mechanism for loading geofence data,we identified a Race Condition vulnerability in the geofence.cpp and mission_feasibility_checker.cpp.This will result in the drone uploading overlapping geofences and mission routes.But, as expected, MissionFeasibilityChecker should prevent the uploading.

Details

  1. When the ground control station upload mission or geofence to the drone.In geofence.cpp, there is a state machine designed to read geofence data from ourb and load it onto the drone, as illustrated in the following picture.

    This functionality is called within a while(1) loop in navigator_main.cpp.


    The read and load operations within it utilize asynchronous reading, ensuring that they do not block the function loop.
    success = _dataman_client.readAsync(DM_KEY_FENCE_POINTS, 0, reinterpret_cast<uint8_t *>(&_stats),
    sizeof(mission_stats_entry_s));

    case DatamanState::Load:
    _dataman_cache.update();
    if (!_dataman_cache.isLoading()) {
    _dataman_state = DatamanState::UpdateRequestWait;
    _updateFence();
    _fence_updated = true;
    }

  2. When the ground control station upload mission or geofence to the drone.In mission_feasibility_checker.cpp, synchronous reading is employed to retrieve updates for the mission, and to check for potential conflicts between the mission and geofence.

    bool success = _dataman_client.readSync((dm_item_t)mission.dataman_id, i, reinterpret_cast<uint8_t *>(&missionitem),
    sizeof(mission_item_s));

    This functionality is called within a while(1) loop in navigator_main.cpp too.
    for (unsigned int i = 0; i < NAVIGATOR_MODE_ARRAY_SIZE; i++) {
    if (_navigation_mode_array[i]) {
    _navigation_mode_array[i]->run(_navigation_mode == _navigation_mode_array[i]);
    }
    }

  3. This is the cause of the issue. When on the ground control station, both the geofence and mission are updated simultaneously. The geofence undergoes multiple state transitions through the state machine (with each state transition requiring the execution of a while(1) loop), and it is only after asynchronous reading that the geofence data can be loaded for the mission_feasibility_checker to examine.
    Therefore, when the mission_feasibility_checker utilizes synchronous reading to obtain mission data, the geofence data involved in the check has not yet been updated. Consequently, the check is performed with outdated geofence data, resulting in the failure of the checker.

Verification

I added some debug output to watch the drone's geofence data update.

We have observed that, after clicking "UPLOAD" in QGC, the updating of the geofence lags behind the mission updates and feasibility_check by several rounds in the loop.

When the drone is executing a mission and the user clicks UPLOAD,the console will output the following content:

...
INFO  [navigator] loop
INFO  [navigator] loop
INFO  [navigator] update mission    //load the new mission data
INFO  [navigator] loop
INFO  [navigator] check feasibility   //triger check function
INFO  [navigator] loop   
INFO  [navigator] loop             
INFO  [navigator] loop   
INFO  [navigator] loop   
INFO  [navigator] update geofence  //Here load geofence data,but the feasibility checking has finshed
INFO  [navigator] loop
INFO  [navigator] loop
...

Temporary Patch

Now, the check function is only present in the mission module. This is unreasonable.

When GCS update the geofence, we need the mission feasibility checks to be run immediately.

We can accomplish the above by simply adding the same check to the geofence.Like this:Add mission point check when update the geofence #22531

In the main branch now, someone noticed the potential issue with this checker's effectiveness. However, their solution is to perform the check again just before the drone takes off. This can address some problems since, after some time, the geofence data is updated to the latest. However, they did not recognize that the fundamental cause of the issue is data race due to multithreading, and this approach introduces new problems. We and the contributor discussed this in #1261.

Currently, the same upload button doesn't trigger a check when updating only the fence, only updating the mission triggers a check, which I think is patently unreasonable.
For example, a scenario like this: a user plans a task in advance and passes the check, is ready to execute it some time later, but doesn't get a notification that the task is not compliant until just before execution.

Impact

  • Due to multiple threads accessing the geofence data, checher is dysfunctional.
  • Due to this vulnerability, users may unintentionally or intentionally execute missions within the no-fly zone after performing the following sequence of actions: triggering the vulnerability to bypass mission feasibility check, uploading waypoints within the no-fly zone, setting the home point within the no-fly zone, turn to return mode to bring the drone into the no-fly zone, and then switching to mission mode, causing the drone to execute the mission within the no-fly zone starting from the nearest waypoint.
20240109151442.-.Compressed.with.FlexClip.1.mp4

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
High
User interaction
Required
Scope
Changed
Confidentiality
None
Integrity
High
Availability
High

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:H/UI:R/S:C/C:N/I:H/A:H

CVE ID

No known CVE

Weaknesses

Credits