Fix: Ensure consistent datetime handling during CSV import #3244

Open. Wants to merge 1 commit into base: master

@jkppr (Collaborator) commented Dec 9, 2024

This PR addresses a bug where the CSV importer would fail with a TypeError when encountering certain datetime formats with timezone offsets, even if those formats were individually parseable by dateutil. The error stemmed from Pandas' inconsistent timezone inference across chunks of the DataFrame.

The fix adds utc=True to the pandas.to_datetime call within the read_and_validate_csv function in timesketch/lib/utils.py. This forces all parsed datetimes to be explicitly represented in UTC, preventing timezone-related parsing errors and ensuring consistent datetime handling.
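
As a rough illustration of the behavior this relies on (not the actual code in timesketch/lib/utils.py), the sketch below parses two timestamps with different offsets and normalizes both to UTC. The helper name `normalize_datetimes_to_utc`, the `datetime` column name, and the sample values are assumptions made up for this example.

```python
import pandas as pd


def normalize_datetimes_to_utc(values: pd.Series) -> pd.Series:
    """Parse datetime strings and convert every value to UTC.

    Hypothetical helper for illustration only; it is not part of Timesketch.
    """
    # utc=True converts timezone-aware inputs to UTC and localizes naive
    # inputs as UTC, so the result always has a single UTC timezone.
    return pd.to_datetime(values, utc=True)


# Assumed sample data: same wall-clock time, two different UTC offsets.
chunk = pd.DataFrame(
    {"datetime": ["2024-12-09T10:00:00+01:00", "2024-12-09T10:00:00-05:00"]}
)
chunk["datetime"] = normalize_datetimes_to_utc(chunk["datetime"])
```

Because every chunk ends up with the same UTC timezone, later steps that combine or compare the parsed values should not run into mixed-offset columns.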

Key Benefits:

  • Improves data consistency by storing all datetimes as UTC.
  • Prevents TypeError during CSV import for a broader range of datetime formats.

This change is backward compatible and should not affect existing timelines or functionality.

Ensure consistent datetime handling during CSV import
@jkppr added the Backend and Data import labels Dec 9, 2024
@jkppr requested a review from berggren December 9, 2024 17:58
@jkppr self-assigned this Dec 9, 2024
@jkppr requested review from jaegeral and removed request for berggren December 10, 2024 08:24