fix parsing dates with months abbreviated in some locales
The current code first attempts to detect English abbreviations for %b
and, only if the string does not start with one of those months, tries
the locale-specific names.

This is incorrect for some locales.
For example, in fr_FR, %b for November is `nov.`. Parsing `nov. 29` as
`%b %d` fails because we wrongly assume that the month is just `nov`
and not `nov.`, and then attempt to parse `. 29` as ` %d`.

The correct solution would be to try English first and backtrack if the
rest of the string later fails to match, but this does not fit the
existing flow of the code.
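
For illustration only, the backtracking approach described above could look roughly like the sketch below. It is not lnav code: `parse_b_d`, `match_month`, `parse_space_day`, and both month tables are hypothetical names, and the one-entry `LOCALE_ABBR` table stands in for a real locale lookup.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

struct month_name { const char* name; int month; };  // month is 0-based

// Illustrative tables only; a real implementation would query the locale.
static const month_name EN_ABBR[] = {
    {"jan", 0}, {"feb", 1}, {"mar", 2}, {"apr", 3}, {"may", 4},  {"jun", 5},
    {"jul", 6}, {"aug", 7}, {"sep", 8}, {"oct", 9}, {"nov", 10}, {"dec", 11},
};
static const month_name LOCALE_ABBR[] = {{"nov.", 10}};  // fr_FR example from above

// Case-insensitive (ASCII) check that `word` occurs at s[off].
static bool starts_with_icase(const std::string& s, std::size_t off, const char* word)
{
    std::size_t n = std::strlen(word);
    if (off + n > s.size()) { return false; }
    for (std::size_t i = 0; i < n; i++) {
        if (std::tolower((unsigned char) s[off + i])
            != std::tolower((unsigned char) word[i])) { return false; }
    }
    return true;
}

// Match any entry of `table` at s[off]; on success advance `off` past it.
static bool match_month(const month_name* table, std::size_t count,
                        const std::string& s, std::size_t& off, int& month)
{
    for (std::size_t i = 0; i < count; i++) {
        if (starts_with_icase(s, off, table[i].name)) {
            off += std::strlen(table[i].name);
            month = table[i].month;
            return true;
        }
    }
    return false;
}

// Parse " %d": one space followed by one or two digits.
static bool parse_space_day(const std::string& s, std::size_t& off, int& day)
{
    if (off >= s.size() || s[off] != ' ') { return false; }
    std::size_t p = off + 1;
    int value = 0, digits = 0;
    while (p < s.size() && digits < 2 && std::isdigit((unsigned char) s[p])) {
        value = value * 10 + (s[p] - '0');
        p++;
        digits++;
    }
    if (digits == 0) { return false; }
    off = p;
    day = value;
    return true;
}

// Parse "%b %d": try English first, backtrack to the locale names on failure.
bool parse_b_d(const std::string& s, int& month, int& day)
{
    std::size_t off = 0;
    if (match_month(EN_ABBR, 12, s, off, month) && parse_space_day(s, off, day)) {
        return true;
    }
    off = 0;  // backtrack: the English guess did not fit the rest of the format
    return match_month(LOCALE_ABBR, 1, s, off, month) && parse_space_day(s, off, day);
}
```

With `nov. 29`, the English attempt consumes `nov`, fails on `. 29`, and the second attempt matches `nov.` from the start. The cost is that the month parser has to know about the rest of the format, which is why it does not fit a per-specifier parsing flow.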

Instead, this restricts the fast path to matching full words in the
English locale, not only prefixes.
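
As a rough standalone sketch of the difference (not the actual `ptime_b` code, which packs the letters into an integer with `ABR_TO_INT`), the hypothetical helper `match_en_month` below compares a prefix-only match with one that also requires whitespace after the three letters:

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical table for this sketch; lowercase English abbreviations.
static const char* const EN_MONTHS[] = {
    "jan", "feb", "mar", "apr", "may", "jun",
    "jul", "aug", "sep", "oct", "nov", "dec",
};

// Returns the number of bytes consumed by an English month abbreviation at
// str[off], or 0 if the fast path should not be taken.
// With require_boundary set, the three letters must be followed by whitespace.
inline std::size_t
match_en_month(const char* str, std::size_t off, std::size_t len, bool require_boundary)
{
    if (off + 3 >= len) {
        return 0;  // mirrors the `off_inout + 3 < len` guard
    }
    for (const char* m : EN_MONTHS) {
        bool same = true;
        for (int i = 0; i < 3; i++) {
            if (std::tolower((unsigned char) str[off + i]) != m[i]) {
                same = false;
                break;
            }
        }
        if (!same) {
            continue;
        }
        if (require_boundary && !std::isspace((unsigned char) str[off + 3])) {
            return 0;  // e.g. `nov.`: leave it to the locale-aware slow path
        }
        return 3;
    }
    return 0;
}
```

Called on `nov. 29`, the prefix-only variant consumes three bytes and leaves `. 29` behind, while the boundary variant declines the match so the locale-specific names can be tried.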

Fixes: tstack#1086
symphorien committed Nov 22, 2024
1 parent cce176f commit 3fa72a3
Showing 2 changed files with 15 additions and 1 deletion.
12 changes: 11 additions & 1 deletion src/ptimec.hh
@@ -95,7 +95,17 @@ bool ptime_b_slow(struct exttm* dst,
 inline bool
 ptime_b(struct exttm* dst, const char* str, off_t& off_inout, ssize_t len)
 {
-    if (off_inout + 3 < len) {
+    // fast path to detect english abbreviated months
+    //
+    // only detect english abbreviated months if they end at a word boundary.
+    // if the abbreviated month in the current locale is longer than 3 letters,
+    // and starts with the same letters as an english locale month abbreviation,
+    // then the computation of off_inout is incorrect.
+    //
+    // Ex: in fr_FR november is `nov.`. Parsing `nov. 29` as `%b %d` fails if
+    // this fast path is taken as later we will attempt to parse `. 29` as
+    // ` %d`.
+    if (off_inout + 3 < len && isspace(str[off_inout+3])) {
         auto month_start = (unsigned char*) &str[off_inout];
         uint32_t month_int = ABR_TO_INT(month_start[0] & ~0x20UL,
                                         month_start[1] & ~0x20UL,
4 changes: 4 additions & 0 deletions test/test_date_time_scanner.cc
@@ -203,6 +203,7 @@ TEST_CASE("date_time_scanner")
     {
         const char* en_date = "Jan 1 12:00:00";
         const char* fr_date = "août 19 11:08:37";
+        const char* fr_date2 = "nov. 29 20:23:37";
         struct timeval en_tv, fr_tv;
         struct exttm en_tm, fr_tm;
         date_time_scanner dts;
@@ -213,6 +214,9 @@ TEST_CASE("date_time_scanner")
         dts.clear();
         assert(dts.scan(fr_date, strlen(fr_date), nullptr, &fr_tm, fr_tv)
                != nullptr);
+        dts.clear();
+        assert(dts.scan(fr_date2, strlen(fr_date2), nullptr, &fr_tm, fr_tv)
+               != nullptr);
     }
 }
