Commit

fuzzy all working
kensnyder committed Oct 6, 2024
1 parent 4cef706 commit cf25770
Showing 12 changed files with 664 additions and 490 deletions.
69 changes: 34 additions & 35 deletions README.md
@@ -4,15 +4,14 @@
[![Language](https://badgen.net/static/language/TS?v=2.0.0)](https://github.com/search?q=repo:kensnyder/any-date-parser++language:TypeScript&type=code)
[![Build Status](https://github.com/kensnyder/any-date-parser/actions/workflows/workflow.yml/badge.svg?v=2.0.0)](https://github.com/kensnyder/any-date-parser/actions)
[![Code Coverage](https://codecov.io/gh/kensnyder/any-date-parser/branch/main/graph/badge.svg?v=2.0.0)](https://codecov.io/gh/kensnyder/any-date-parser)
![2000+ Tests](https://badgen.net/static/tests/2000+/green)
[![Gzipped Size](https://badgen.net/bundlephobia/minzip/any-date-parser?label=minzipped&v=2.0.0)](https://bundlephobia.com/package/any-date-parser)
[![Dependency details](https://badgen.net/bundlephobia/dependency-count/any-date-parser?v=2.0.0)](https://www.npmjs.com/package/any-date-parser?activeTab=dependencies)
[![Tree shakeable](https://badgen.net/bundlephobia/tree-shaking/any-date-parser?v=2.0.0)](https://www.npmjs.com/package/any-date-parser)
[![ISC License](https://badgen.net/github/license/kensnyder/any-date-parser?v=2.0.0)](https://opensource.org/licenses/ISC)

Parse a wide range of date formats including human-input dates.

Supports Node and browsers. Uses `Intl` to provide parsing support for all
installed locales.
The most comprehensive and accurate date parser for Node and browsers. It uses
`Intl` to provide parsing support for all installed locales.

## Installation

@@ -47,10 +46,8 @@ OR
`MaybeValidDate` has an `invalid` property if invalid, and an `isValid()`
function whether valid or not. If in v1 you simply checked for an `invalid`
property, v2 will behave the same.
- If an input string does not match any known format, it will use the current
locale and `Intl.DateTimeFormat` to attempt a fuzzy match. This allows
matching on every locale, i.e. for every date format known to the JavaScript
engine.
- If an input string does not match any known format, it will attempt a fuzzy
match, looking for date parts individually.
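
One way to picture "looking for date parts individually" is the sketch below. It is illustrative only and is not the library's fuzzy matcher: it assumes English month names, ignores times and time zones, and `findDateParts` and `FuzzyParts` are hypothetical names.

```ts
// Illustrative only: English month names, no time or timezone handling.
const MONTH_PREFIXES = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];

interface FuzzyParts {
  year?: number;
  month?: number; // 1-based
  day?: number;
}

// Hypothetical helper: pull out each date part on its own, wherever it appears
function findDateParts(input: string): FuzzyParts {
  const parts: FuzzyParts = {};
  let rest = input.toLowerCase();

  // A standalone four-digit number is taken to be the year
  const yearMatch = rest.match(/\b\d{4}\b/);
  if (yearMatch) {
    parts.year = Number(yearMatch[0]);
    rest = rest.replace(yearMatch[0], ' ');
  }

  // A word starting with a three-letter month prefix is taken to be the month
  const monthMatch = rest.match(/\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*/);
  if (monthMatch) {
    parts.month = MONTH_PREFIXES.indexOf(monthMatch[0].slice(0, 3)) + 1;
    rest = rest.replace(monthMatch[0], ' ');
  }

  // A remaining one- or two-digit number (optionally "1st", "2nd", ...) is the day
  const dayMatch = rest.match(/\b(\d{1,2})(?:st|nd|rd|th)?\b/);
  if (dayMatch) {
    parts.day = Number(dayMatch[1]);
  }
  return parts;
}

console.log(findDateParts('Thursday, the 15th of October 2020'));
// { year: 2020, month: 10, day: 15 }
```

Removing each matched part from the remaining text keeps the year's digits from being picked up again as the day.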

## Motivation

@@ -64,22 +61,7 @@ OR

There are four ways to use any-date-parser:

1.) Use a new function directly on `Date`:

- `Date.fromString(string, locale)` - Parses a string and returns a `Date`
object
- `Date.fromAny(any, locale)` - Return a `Date` object given a `Date`, `Number`
or string to parse

Example:

```ts
import 'any-date-parser';
Date.fromString('2020-10-15');
// same as new Date(2020, 9, 15, 0, 0, 0, 0)
```

2.) Use the parser object:
1.) Use the parser object: (Recommended)

- `parser.fromString(string, locale)` - Parses a string and returns a `Date`
object. It is the same function as `Date.fromString` in option 3.
@@ -94,7 +76,7 @@ parser.fromString('2020-10-15');
// same as new Date(2020, 9, 15, 0, 0, 0, 0)
```

3.) `parser` also has a function `parser.attempt(string, locale)` that
2.) `parser` also has a function `parser.attempt(string, locale)` that
returns an object with one or more integer values for the following keys: year,
month, day, hour, minute, second, millisecond, offset. _Note_ month is returned
as a normal 1-based integer, not the 0-based integer the `Date()` constructor
@@ -133,6 +115,21 @@ parser.attempt('');
*/
```
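
Because `month` is 1-based, a result from `parser.attempt()` needs adjusting before it is handed to the `Date` constructor. The following is a minimal sketch, not part of the library: `AttemptResult` and `toUTCDate` are hypothetical names, and it assumes `offset` is expressed in minutes east of UTC.

```ts
// Hypothetical shape mirroring the keys described above; every key is optional.
interface AttemptResult {
  year?: number;
  month?: number; // 1-based: January is 1
  day?: number;
  hour?: number;
  minute?: number;
  second?: number;
  millisecond?: number;
  offset?: number; // assumed: minutes east of UTC
}

// Hypothetical helper, not part of any-date-parser
function toUTCDate(parts: AttemptResult): Date {
  const wallClockMillis = Date.UTC(
    parts.year ?? 1970,
    (parts.month ?? 1) - 1, // Date expects a 0-based month
    parts.day ?? 1,
    parts.hour ?? 0,
    parts.minute ?? 0,
    parts.second ?? 0,
    parts.millisecond ?? 0
  );
  // Wall-clock time minus its offset gives the UTC instant
  return new Date(wallClockMillis - (parts.offset ?? 0) * 60_000);
}

const date = toUTCDate({ year: 2020, month: 10, day: 15, hour: 8, offset: -420 });
console.log(date.toISOString()); // 2020-10-15T15:00:00.000Z
```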

3.) Use a new function directly on `Date`:

- `Date.fromString(string, locale)` - Parses a string and returns a `Date`
object
- `Date.fromAny(any, locale)` - Return a `Date` object given a `Date`, `Number`
or string to parse

Example:

```ts
import 'any-date-parser';
Date.fromString('2020-10-15');
// same as new Date(2020, 9, 15, 0, 0, 0, 0)
```

4.) There are npm packages that integrate any-date-parser directly into popular
date libraries:

@@ -163,7 +160,7 @@ Summary:

## Locale Support

any-date-parser supports any locale that your runtime's
`any-date-parser` supports any locale that your runtime's
[Intl.DateTimeFormat](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/DateTimeFormat)
supports. In browsers that usually means the operating system language. In Node,
that means the compiled language or the icu modules included. For unit tests,
@@ -202,7 +199,7 @@ Check out the

## Adding custom formats

any-date-parser has an `addFormat()` function to add a custom parser.
`any-date-parser` has an `addFormat()` function to add a custom parser.

First, parsers must have `matcher` or `template`.

@@ -241,9 +238,9 @@ import parser, { Format } from 'any-date-parser';

parser.addFormat(
new Format({
matcher: /^Q([1-4]) (\d{4})$/, // String such as "Q4 2004"
matcher: /^(Q[1-4]) (\d{4})$/, // String such as "Q4 2004"
handler: function ([, quarter, year]) {
const monthByQuarter = { 1: 1, 2: 4, 3: 7, 4: 10 };
const monthByQuarter = { Q1: 1, Q2: 4, Q3: 7, Q4: 10 };
const month = monthByQuarter[quarter];
return { year, month };
},
@@ -271,9 +268,9 @@ import parser, { Format } from 'any-date-parser';

parser.addFormat(
new Format({
template: '^Q([1-4]) (_YEAR_)$', // String such as "Q4 2004"
template: '^(Q[1-4]) (_YEAR_)$', // String such as "Q4 2004"
handler: function ([, quarter, year]) {
const monthByQuarter = { 1: 1, 2: 4, 3: 7, 4: 10 };
const monthByQuarter = { Q1: 1, Q2: 4, Q3: 7, Q4: 10 };
const month = monthByQuarter[quarter];
return { year, month };
},
@@ -293,7 +290,7 @@ parser.removeFormat(dayMonth);
parser.removeFormat(fuzzy);
```

All formats names:
All exported formats:

- `time24Hours`
- `time12Hours`
@@ -330,6 +327,8 @@ const myParser = new Parser();
myParser.addFormats([time24Hours, yearMonthDay, ago]);
```

Note that formats will be attempted in the order they were added.
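
Order matters when two formats can match the same string. The sketch below illustrates first-match-wins behavior with a simplified stand-in for the real classes: `SimpleFormat`, `attemptInOrder`, and the two sample formats are hypothetical and only mimic the `matcher`/`handler` shape shown earlier.

```ts
// Hypothetical minimal types; the real Format and Parser classes are richer.
type Parts = { year?: number; month?: number; day?: number };

interface SimpleFormat {
  name: string;
  matcher: RegExp;
  handler: (matches: RegExpMatchArray) => Parts;
}

// Try each format in insertion order and stop at the first one that matches
function attemptInOrder(
  formats: SimpleFormat[],
  input: string
): (Parts & { format: string }) | null {
  for (const format of formats) {
    const matches = input.trim().match(format.matcher);
    if (matches) {
      return { ...format.handler(matches), format: format.name };
    }
  }
  return null;
}

// Two formats that accept the same text but read it differently
const monthDay: SimpleFormat = {
  name: 'monthDay',
  matcher: /^(\d{1,2})\/(\d{1,2})$/,
  handler: ([, month, day]) => ({ month: Number(month), day: Number(day) }),
};
const dayMonth: SimpleFormat = {
  name: 'dayMonth',
  matcher: /^(\d{1,2})\/(\d{1,2})$/,
  handler: ([, day, month]) => ({ month: Number(month), day: Number(day) }),
};

console.log(attemptInOrder([monthDay, dayMonth], '3/4')); // { month: 3, day: 4, format: 'monthDay' }
console.log(attemptInOrder([dayMonth, monthDay], '3/4')); // { month: 4, day: 3, format: 'dayMonth' }
```

When building a custom parser, adding the most specific formats first keeps a broader format from claiming the input.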

You can convert your custom parser to a function. For example:

```ts
@@ -344,12 +343,12 @@ Date.fromAny = myParser.exportAsFunctionAny();

## Unit tests

`any-date-parser` has 100% code coverage.
You can git checkout `any-date-parser` and run its tests.

- To run tests, run `npm test`
- To check coverage, run `npm run coverage`
- _Note_ - `npm test` will attempt to install full-icu and luxon globally if not
present
- _Note_ - `npm test` will attempt to install `full-icu` and `luxon` globally if
not present

## Contributing

2 changes: 1 addition & 1 deletion package.json
@@ -32,7 +32,7 @@
"demo": "npm run build && npx serve -p 5050 .",
"test": "./scripts/test.sh run",
"test-watch": "./scripts/test.sh",
"test-fuzzy": "bun ./test-fixtures/are-we-fuzzy-yet.ts",
"are-we-fuzzy-yet": "TZ=UTC bun ./test-fixtures/are-we-fuzzy-yet.ts",
"build": "npm run build:clean && npm run build:dts && npm run build:esm && npm run build:cjs",
"build:clean": "rimraf dist/ && mkdir dist",
"build:dts": "yes | npx dts-bundle-generator -o dist/index.d.ts src/main.ts",
19 changes: 11 additions & 8 deletions src/Format/Format.ts
@@ -86,20 +86,20 @@ export default class Format {

/**
* Build the RegExp from the template for a given locale
* @param {String} locale The language locale such as en-US, pt-BR, zh, es, etc.
* @returns {RegExp} A RegExp that matches when this format is recognized
* @param locale The language locale such as en-US, pt-BR, zh, es, etc.
* @returns A RegExp that matches when this format is recognized
*/
getRegExp(locale = defaultLocale) {
getRegExp(locale = defaultLocale): RegExp {
if (this.template) {
if (!this.regexByLocale[locale]) {
this.regexByLocale[locale] = LocaleHelper.factory(locale).compile(
this.template
);
//console.log([locale, this.regexByLocale[locale]]);
}
// if (locale.slice(0, 2) === 'zh') {
// console.log(this.template, this.regexByLocale[locale]);
// }
if (locale.slice(0, 2) === 'zh') {
console.log(this.template, this.regexByLocale[locale]);
}
return this.regexByLocale[locale];
}
return this.matcher;
@@ -140,8 +140,11 @@ export default class Format {
* @returns {Object|null} Null if format can't handle this string, Object for result or error
*/
attempt(strDate: string, locale = defaultLocale): HandlerResult {
strDate = runPreprocessors(String(strDate), locale).trim();
const matches = this.getMatches(strDate, locale);
const processedDate = runPreprocessors(String(strDate), locale).trim();
if (strDate === '2021年10月15日 下午6:34:56 [UTC]') {
console.log('processedDate------------', processedDate);
}
const matches = this.getMatches(processedDate, locale);
if (matches) {
const dt = this.toDateTime(matches, locale);
return dt || null;
21 changes: 19 additions & 2 deletions src/LocaleHelper/LocaleHelper.ts
@@ -81,6 +81,21 @@ export default class LocaleHelper {
return parseInt(latnDigitString, 10);
}

monthNameToInt(monthName: string) {
const lower = monthName.toLocaleLowerCase(this.locale).replace(/\.$/, '');
return this.lookups.month[lower] || 12;
}
h12ToInt(digitString: string | number, ampm: string) {
const meridiemOffset = this.lookups.meridiem[ampm?.toLowerCase()] || 0;
let hourInt = this.toInt(digitString);
if (hourInt < 12 && meridiemOffset === 12) {
hourInt += 12;
}
return hourInt;
}
zoneToOffset(zoneName: string) {
return this.lookups.zone[zoneName];
}
/**
* Build lookups for digits, month names, day names, and meridiems based on the locale
*/
@@ -91,9 +106,11 @@ export default class LocaleHelper {
if (!/^en/i.test(this.locale)) {
this.buildMonthNames();
this.buildDaynames();
this.buildMeridiems();
if (!/zh/i.test(this.locale)) {
this.buildMeridiems();
}
}
// if (this.locale === 'ar-SA') {
// if (this.locale === 'zh-TW') {
// console.log('lookups=====>', this);
// }
}
19 changes: 11 additions & 8 deletions src/data/preprocessors.ts
@@ -6,28 +6,31 @@ const periodsInsteadOfColons = [

const preprocessors = {
ar: [[/\u00A0/g, ' ']], // Some built-in formats contain non-breaking space
bn: [[/,/g, '']],
zh: [
// in Chinese, am/pm comes before the digits
[/早上\s*([\d:]+)/, '$1am'],
[/凌晨\s*([\d:]+)/, '$1am'],
[/上午\s*([\d:]+)/, '$1am'],
[/下午\s*([\d:]+)/, '$1pm'],
[/晚上\s*([\d:]+)/, '$1pm'],
// Chinese "time"
// [/\[.+?時間]/, ''],
],
// he: [[/ב/gi, '']],
he: [[/ב/gi, '']],
// "of" in various languages
// de: [[/ um /g, '']],
// pt: [[/de /gi, '']],
// es: [[/de /gi, '']],
// da: [[/den /gi, '']],
de: [[/ um /g, '']],
pt: [[/de /gi, '']],
es: [[/de /gi, '']],
da: [[/den /gi, ''], ...periodsInsteadOfColons],
// Russian symbol after years
// ru: [[/ г\./g, '']],
ru: [[/ г\./g, '']],
th: [
// Thai "at/on"
// [/ที่/gi, ''],
[/\s*นาฬิกา\s*/i, ':'], // "hour"
[/\s*นาที\s*/i, ':'], // "minute"
[/\s*วินาที\s*/i, ''], // "second"
[/\s*วินาที\s*/i, ' '], // "second"
],
ko: [
[/\s*시\s*/, ':'], // "hour"
@@ -38,7 +41,7 @@ const preprocessors = {
],
fi: periodsInsteadOfColons,
id: periodsInsteadOfColons,
da: periodsInsteadOfColons,
// da: periodsInsteadOfColons,
};

export default preprocessors;
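
Each preprocessor entry above is a `[pattern, replacement]` pair. The sketch below shows how such pairs could be applied in sequence, using two of the `zh` rules. `applyRules` is a hypothetical helper; how `runPreprocessors` selects the rules for a locale is not shown in this diff and is not reproduced here.

```ts
// A rule is a [pattern, replacement] pair, as in the preprocessors table
type Rule = [RegExp, string];

// Two of the zh rules: move the meridiem from before the digits to after them
const zhRules: Rule[] = [
  [/上午\s*([\d:]+)/, '$1am'],
  [/下午\s*([\d:]+)/, '$1pm'],
];

// Hypothetical helper: apply every rule in order to the input string
function applyRules(input: string, rules: Rule[]): string {
  return rules.reduce(
    (text, [pattern, replacement]) => text.replace(pattern, replacement),
    input
  );
}

console.log(applyRules('2021年10月15日 下午6:34:56', zhRules));
// 2021年10月15日 6:34:56pm
```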
1 change: 0 additions & 1 deletion src/data/timezoneNames.ts
@@ -187,7 +187,6 @@ const timezoneNames = {
WST: 480, // Western Standard Time
YAKT: 540, // Yakutsk Time
YEKT: 300, // Yekaterinburg Time
Z: 0, // Zulu Time (Coordinated Universal Time)
};

export default timezoneNames;