Describe the bug
When importing a set of Plaso files at the same time (concurrent HTTP imports), the Timesketch database can end up in
the following undesired state: the timeline_status table contains (at least) 2 lines with the same parent_id:
An "old" one (according to the created_at column), not deleted, with its status set to processing,
The latest one (according to the created_at column), with its status set to ready.
The ready line shows that the import is already done, but the processing line is still there: this is the bug.
On the frontend, the related timeline permanently shows a spinning wheel, waiting for the end of the (already
terminated) process.
In fact, I can identify several cases of lines sharing the same parent_id in the timeline_status table, which is
already problematic.
However, the bug only becomes visible in the frontend when the status of one of the less recent lines is set to
processing.
I suspect this is linked to concurrent access to the database from workers when importing Plaso data: I think no
transaction protects access to the timeline_status table.
I think the bug is precisely here (see the TODO comment) (timesketch/models/annotations.py):
    def set_status(self, status):
        """
        Set status on object. Although this is a many-to-many relationship
        this makes sure that the parent object only has one status set.

        Args:
            status: Name of the status
        """
        # TODO Fix refresh self.status now.
        for _status in self.status:
            self.status.remove(_status)
        self.status.append(self.Status(user=None, status=status))
        db_session.add(self)
        db_session.commit()
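Independently of concurrency, the loop in set_status may be unable to clean up once two lines exist. The sketch below assumes that self.status behaves like a plain Python list (SQLAlchemy relationship collections are list-like by default, but this has not been checked against Timesketch itself); the helper names are hypothetical. It shows that removing items from a list while iterating over it skips elements, whereas iterating over a copy does not:

```python
def remove_while_iterating(statuses):
    """Mimic the loop in set_status on a plain list (assumed list-like behaviour)."""
    statuses = list(statuses)
    for _status in statuses:
        # Mutating the list being iterated shifts the items, so the next one is skipped.
        statuses.remove(_status)
    return statuses


def remove_over_snapshot(statuses):
    """Same intent, but iterate over a copy so that every item is visited."""
    statuses = list(statuses)
    for _status in list(statuses):
        statuses.remove(_status)
    return statuses


leftover = remove_while_iterating(["processing", "ready"])
cleared = remove_over_snapshot(["processing", "ready"])
print(leftover, cleared)
```

If this assumption holds, a timeline that already has two status lines would keep one of them on every later call. Iterating over a copy would not by itself make concurrent updates safe; a transaction or row lock would still be needed.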
To Reproduce
Steps to reproduce the behavior:
Create a campaign,
Create a timeline,
At the same time (concurrent HTTP requests), import a set of Plaso files to this timeline.
The problem cannot be reproduced systematically; it can occur even with small Plaso files.
Expected behavior
The processing line should have been removed from the timeline_status table when the import is done, and the ready
line should be the only one remaining.
In such a case, the spinning wheel should disappear from the GUI, making data available for this timeline.
More generally in the database, a timeline should never have more than a single timeline_status line.
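One possible way to enforce the "single timeline_status line per timeline" invariant at the database level is a unique constraint combined with an upsert. The sketch below is only an illustration: it uses an in-memory SQLite database (3.24 or later, for ON CONFLICT) as a stand-in, a simplified hypothetical table, and a hypothetical set_status helper. It is not the Timesketch schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: parent_id is unique, so a timeline
# can never own more than one status line.
conn.execute(
    "create table timeline_status ("
    "id integer primary key, parent_id integer not null unique, status text not null)"
)


def set_status(conn, parent_id, status):
    """Insert or overwrite the single status line of a timeline atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "insert into timeline_status (parent_id, status) values (?, ?) "
            "on conflict(parent_id) do update set status = excluded.status",
            (parent_id, status),
        )


set_status(conn, 10, "processing")
set_status(conn, 10, "ready")
rows = conn.execute("select parent_id, status from timeline_status").fetchall()
print(rows)
```

With such a constraint, two workers writing a status for the same timeline would overwrite each other's line instead of leaving a stale processing line behind.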
Screenshots
Desktop (please complete the following information):
OS: Windows
Browser: Firefox
Version: 115.15.0esr (64-bit)
Note: The desktop set-up is not related to the problem as it is a backend bug.
Additional context
Below is a set of SQL queries to easily identify the bug:
Getting timelines with the bug, that is, timelines:
Having more than one timeline_status,
Having one timeline_status with the status set to processing,
Where this timeline_status is not the most recent one.
select sketch.id,
       sketch.updated_at,
       sketch.name,
       timeline.id,
       timeline.updated_at,
       timeline.name,
       timeline.searchindex_id,
       timeline_status.id,
       timeline_status.updated_at,
       timeline_status.status,
       timeline_status.rank
from (
    select * from (
        select *, row_number() over (partition by parent_id order by created_at desc) rank
        from timeline_status
        where parent_id is not null
    ) duplicates
    where duplicates.rank > 1 and duplicates.status = 'processing'
    order by created_at
) timeline_status
inner join timeline on timeline_status.parent_id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id;
Fixing the bug, that is, deleting the timeline_status lines with the status set to processing when they are not
the most recent one:
delete from timeline_status where id in (
    select id
    from (
        select *, row_number() over (partition by parent_id order by created_at desc) rank
        from timeline_status
        where parent_id is not null
    ) duplicates
    where duplicates.rank > 1 and duplicates.status = 'processing'
);
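To check the detection and cleanup logic before running it against a real database, the same window-function approach can be tried on toy data. The sketch below uses an in-memory SQLite database (3.25 or later, for window functions) as a stand-in; the table is reduced to the columns the query needs, the rows and timestamps are invented, and the alias rn replaces rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table timeline_status ("
    "id integer primary key, parent_id integer, status text, created_at text)"
)
# Invented toy rows: timeline 10 has a stale processing line plus a ready line.
conn.executemany(
    "insert into timeline_status (id, parent_id, status, created_at) values (?, ?, ?, ?)",
    [
        (1, 10, "processing", "2024-01-01 10:00:00"),  # stale line
        (2, 10, "ready", "2024-01-01 10:05:00"),       # latest line for timeline 10
        (3, 11, "processing", "2024-01-01 11:00:00"),  # only line: import still running
        (4, 12, "ready", "2024-01-01 12:00:00"),
    ],
)

# Lines that are not the most recent one for their timeline and are still processing.
STALE_IDS = """
    select id
    from (
        select id, status,
               row_number() over (partition by parent_id order by created_at desc) as rn
        from timeline_status
        where parent_id is not null
    ) duplicates
    where rn > 1 and status = 'processing'
"""

stale_before = [row[0] for row in conn.execute(STALE_IDS)]
conn.execute(f"delete from timeline_status where id in ({STALE_IDS})")
conn.commit()
remaining = [row[0] for row in conn.execute("select id from timeline_status order by id")]
print(stale_before, remaining)
```

Only the stale line of timeline 10 is selected and deleted; the processing line of timeline 11 is kept because it is the most recent line for that timeline.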
All timeline_status lines for timelines that have more than one line and are affected by the processing bug:
select sketch.id,
       sketch.updated_at,
       sketch.name,
       timeline.id,
       timeline.updated_at,
       timeline.name,
       timeline.searchindex_id,
       timeline_status.id,
       timeline_status.updated_at,
       timeline_status.status
from (
    select
        parent_id as id
    from (
        select
            parent_id,
            status,
            row_number() over (partition by parent_id order by created_at desc) as rank
        from timeline_status
        where parent_id is not null
    ) grouped_timeline_status
    where grouped_timeline_status.rank > 1 and grouped_timeline_status.status = 'processing'
    group by parent_id
    order by parent_id
) problematic_timeline
inner join timeline_status on problematic_timeline.id = timeline_status.parent_id
inner join timeline on problematic_timeline.id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id
order by sketch.id,
         timeline.id,
         timeline_status.created_at;
All timeline_status lines for timelines that have more than one line (a more general problem):
select sketch.id,
       sketch.updated_at,
       sketch.name,
       timeline.id,
       timeline.updated_at,
       timeline.name,
       timeline.searchindex_id,
       timeline_status.id,
       timeline_status.updated_at,
       timeline_status.status
from (
    select
        parent_id as id
    from (
        select
            parent_id,
            row_number() over (partition by parent_id order by created_at desc) as rank
        from timeline_status
        where parent_id is not null
    ) grouped_timeline_status
    where grouped_timeline_status.rank > 1
    group by parent_id
    order by parent_id
) problematic_timeline
inner join timeline_status on problematic_timeline.id = timeline_status.parent_id
inner join timeline on problematic_timeline.id = timeline.id
inner join sketch on timeline.sketch_id = sketch.id
order by sketch.id,
         timeline.id,
         timeline_status.created_at;
Distribution of timelines with more than a single timeline_status:
select sketch.name,
       timeline.name,
       timeline_status_stat.count
from sketch
inner join timeline on sketch.id = timeline.sketch_id
inner join (
    select parent_id,
           count(1) as count
    from timeline_status
    where parent_id is not null
    group by parent_id
    having count(1) > 1
) timeline_status_stat on timeline.id = timeline_status_stat.parent_id
order by timeline_status_stat.count desc,
         sketch.name,
         timeline.name;