Importer doesn't accept dates before 1677 or after 2262 #1617

Open
dadokkio opened this issue Feb 16, 2021 · 6 comments

Comments

@dadokkio
Contributor

Describe the bug
Not sure if this is a real bug but I want to report this in any case.

If the data you want to import contains a date before 1677 (in our case, some Windows logs default to 1601-01-01T00:00:00Z) or after 2262, the pandas-based importer fails with:

ERROR:timesketch_importer.importer:Unable to change datetime, is it badly formed?
Traceback (most recent call last):
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2085, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/envs/mte/lib/python3.9/site-packages/timesketch_import_client/importer.py", line 186, in _fix_data_frame
    date = pandas.to_datetime(data_frame['datetime'], utc=True)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 801, in to_datetime
    cache_array = _maybe_cache(arg, format, cache, convert_listlike)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 178, in _maybe_cache
    cache_dates = convert_listlike(unique_dates, format)
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/tools/datetimes.py", line 465, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2090, in objects_to_datetime64ns
    raise e
  File "/envs/mte/lib/python3.9/site-packages/pandas/core/arrays/datetimes.py", line 2075, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 364, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 586, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 582, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 558, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslibs/np_datetime.pyx", line 113, in pandas._libs.tslibs.np_datetime.check_dts_bounds
pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1601-01-01 00:00:05

It seems that the valid range is:

>>> pd.Timestamp.min
Timestamp('1677-09-22 00:12:43.145225')
>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
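These limits follow from pandas storing timestamps as signed 64-bit integers that count nanoseconds since the Unix epoch (1970-01-01 UTC), so only about ±292 years fit around 1970. The stdlib-only sketch below reproduces the bounds to microsecond precision (Python's `datetime` cannot hold nanoseconds) and checks whether a value fits; `in_pandas_range` is a hypothetical helper, not part of pandas or Timesketch.

```python
from datetime import datetime, timedelta, timezone

# pandas datetime64[ns] values are int64 nanoseconds since the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MAX = 2**63 - 1  # the most negative int64 is reserved for NaT

# datetime only has microsecond resolution, so truncate toward zero to stay
# inside the nanosecond range on both sides of the epoch.
_SPAN = timedelta(microseconds=INT64_MAX // 1000)
PANDAS_MIN = EPOCH - _SPAN
PANDAS_MAX = EPOCH + _SPAN


def in_pandas_range(dt: datetime) -> bool:
    """Return True if a timezone-aware datetime fits in datetime64[ns]."""
    return PANDAS_MIN <= dt <= PANDAS_MAX
```

With this, the Windows default `datetime(1601, 1, 1, tzinfo=timezone.utc)` is reported as out of range, which matches the `OutOfBoundsDatetime` error above.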

Expected behavior
Either default the line to the oldest supported value, or skip the line with a warning.
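The first option (defaulting an out-of-range value to the closest supported one) could look roughly like the sketch below. It assumes the `datetime` column holds ISO 8601 strings and works on plain Python values rather than a DataFrame; `parse_utc` and `clamp_to_pandas_range` are hypothetical names, not importer functions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Microsecond-truncated bounds of pandas' int64 nanosecond timestamps.
_SPAN = timedelta(microseconds=(2**63 - 1) // 1000)
PANDAS_MIN, PANDAS_MAX = EPOCH - _SPAN, EPOCH + _SPAN


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 string; a trailing 'Z' is treated as UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    # Assumption: naive timestamps are already in UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def clamp_to_pandas_range(value: str) -> datetime:
    """Replace an out-of-range timestamp with the nearest supported bound."""
    dt = parse_utc(value)
    return min(max(dt, PANDAS_MIN), PANDAS_MAX)
```

Clamping keeps every row, but it silently rewrites the timestamp, so the value no longer reflects what the source log recorded.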

Libraries used for import:
timesketch-api-client==20210205
timesketch-import-client==20210215

@dadokkio dadokkio added the Bug label Feb 16, 2021
@kiddinn kiddinn self-assigned this Feb 16, 2021
@kiddinn
Contributor

kiddinn commented Feb 16, 2021

I'd opt for skipping with a warning.
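A minimal sketch of skipping with a warning, assuming events arrive as dicts with an ISO 8601 `datetime` field; `drop_out_of_range` and the logger name are hypothetical and not the importer's actual API.

```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator

logger = logging.getLogger("importer_sketch")  # hypothetical logger name

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_SPAN = timedelta(microseconds=(2**63 - 1) // 1000)
PANDAS_MIN, PANDAS_MAX = EPOCH - _SPAN, EPOCH + _SPAN


def drop_out_of_range(events: Iterable[dict]) -> Iterator[dict]:
    """Yield events whose 'datetime' pandas can represent; warn about the rest."""
    for index, event in enumerate(events):
        raw = event.get("datetime")
        if not isinstance(raw, str):
            logger.warning("Skipping event %d: missing datetime", index)
            continue
        try:
            dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        except ValueError:
            logger.warning("Skipping event %d: unparseable datetime %r", index, raw)
            continue
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assume naive values are UTC
        if not PANDAS_MIN <= dt <= PANDAS_MAX:
            logger.warning("Skipping event %d: datetime %s is out of range", index, raw)
            continue
        yield event
```

Because it is a generator, rows can be filtered before they are collected into a DataFrame, so `pandas.to_datetime` only ever sees representable values.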

@kiddinn
Contributor

kiddinn commented Feb 18, 2021

Also take #1534 into consideration, since that will also change how data is ingested.

Since that change uses the pandas library for ingestion, it may cause the same failure when the web UI is used to import data (pandas is already used in the importer, which is the reason for this issue).

@kiddinn
Contributor

kiddinn commented Feb 26, 2021

To be fair, it's not a bug per se, since these are clearly invalid dates. We just need Timesketch ingestion to handle them more gracefully so that ingestion can still complete, perhaps by filtering out these bad dates or by setting their time to zero.

Now that #1534 has been merged, this becomes even more important to fix.
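The "set the time to zero" alternative could be sketched as follows. Interpreting zero as the Unix epoch (1970-01-01T00:00:00Z) is an assumption, and so is the hypothetical `original_datetime` field that keeps the bad value visible; `zero_out_of_range` is likewise an illustrative name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_SPAN = timedelta(microseconds=(2**63 - 1) // 1000)
PANDAS_MIN, PANDAS_MAX = EPOCH - _SPAN, EPOCH + _SPAN


def zero_out_of_range(event: dict) -> dict:
    """Return a copy of event with an unsupported 'datetime' reset to the epoch."""
    raw = event["datetime"]
    dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assume naive values are UTC
    fixed = dict(event)
    if PANDAS_MIN <= dt <= PANDAS_MAX:
        return fixed
    fixed["datetime"] = EPOCH.isoformat()
    # Hypothetical field: preserve the original value for analysts.
    fixed["original_datetime"] = raw
    return fixed
```

Unlike skipping, this keeps every event in the timeline, at the cost of a cluster of events at 1970-01-01 that analysts need to recognize.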

@jaegeral
Collaborator

jaegeral commented Jul 1, 2021

@kiddinn, are you planning to work on this bug? Otherwise I'd suggest freeing it up so someone else can look into it.

@jaegeral jaegeral added this to the 2021-12 milestone Jul 1, 2021
@kiddinn kiddinn removed their assignment Jul 1, 2021
@jaegeral jaegeral modified the milestones: 2021-12, Future Aug 22, 2022
@berggren berggren added Frontend and removed UI/UX labels Jan 2, 2023
@berggren berggren removed this from the Future milestone Sep 25, 2023
@jaegeral
Collaborator

I will write some unit tests and e2e tests around this.

@jaegeral jaegeral self-assigned this Sep 12, 2024
@jaegeral
Collaborator

#3179


4 participants