Allow unicode (Fixes #279) #539

Closed
wants to merge 45 commits into from
45 commits
5ea1532
Initial unicode regex
iansw246 Jul 13, 2023
58d6ab1
Add temporary test-issue.js
iansw246 Jul 13, 2023
e9b7d90
Fix regex for single char matches
iansw246 Jul 13, 2023
9647c9a
add testingfor repeat same unicode name
michellejtan Jul 14, 2023
53f9a75
Added emoji test in testUtils.ts, regenerated test snapshots to corre…
TravisChong Jul 14, 2023
412be3c
Cited reference in testUtils.ts, added another emoji/emoticon testcase
TravisChong Jul 14, 2023
fc39761
Skip first character when checking XID_Continue
iansw246 Jul 14, 2023
5292e9e
Add regexpu package
iansw246 Jul 14, 2023
da6a6bc
Fix regex to not use 'u' flag
iansw246 Jul 14, 2023
79d86b5
Fix one Unicode test
iansw246 Jul 14, 2023
7063422
More Unicode tests
iansw246 Jul 14, 2023
4f1c37f
updated tests with more unicode characters and refreshed snapshot
TravisChong Jul 14, 2023
78515be
Merge branch 'allow-unicode' into allow-unicode-regexpu
TravisChong Jul 14, 2023
7453006
Merge pull request #1 from TravisChong/allow-unicode-regexpu
TravisChong Jul 14, 2023
8e5d59c
Fix getting first characters of unicode string
iansw246 Jul 14, 2023
12245af
Revert to regexpu regexes
iansw246 Jul 14, 2023
45c2178
Add surrogate pairs and multi-codepoint symbol tests
iansw246 Jul 17, 2023
40d31d8
Improve comments for Unicode handling
iansw246 Jul 17, 2023
e5950b6
Added tests in e2e/unicode.ts for some of the most common langages wi…
TravisChong Jul 17, 2023
e97cc4f
Merge branch 'allow-unicode' of https://github.com/TravisChong/json-s…
TravisChong Jul 17, 2023
40e3ef7
Add tests that index should still increment for different unicode cha…
michellejtan Jul 17, 2023
45369b5
Add e2e test with surrogate pairs
iansw246 Jul 17, 2023
d47aa21
refreshed snapshots to include gothic test case
TravisChong Jul 17, 2023
22c488d
Add e2e tests with languages (Tamil, telugu, Urdu, French)
michellejtan Jul 18, 2023
87289f1
Use regex-generating script
iansw246 Jul 18, 2023
759e3a1
Two files changed
michellejtan Jul 18, 2023
1d296fa
removed temporary tests
TravisChong Jul 18, 2023
c8c8306
removed remaining temp files and corrected package.json
TravisChong Jul 18, 2023
7d8ec39
Revert yarn lock
iansw246 Jul 18, 2023
4f94694
Remove ES6-regex code
iansw246 Jul 19, 2023
d1b57c0
Rename newString to better name
iansw246 Jul 20, 2023
2effb36
Use simplified Unicode-aware regex
iansw246 Aug 30, 2023
24a2b0a
Merge remote-tracking branch 'upstream/master' into allow-unicode
iansw246 Oct 24, 2023
9bb992e
Use identifierfy
iansw246 Oct 24, 2023
16543f0
Update snapshots
iansw246 Oct 24, 2023
2044584
Remove regenerate
iansw246 Oct 24, 2023
82b0d20
Remove es5IdentifierRegex.ts
iansw246 Oct 24, 2023
89c1f7f
Remove redundent Unicode tests
iansw246 Oct 24, 2023
468286d
Cleanup comments, redundent code, test
iansw246 Oct 25, 2023
792abb4
Restore leading underscore behavior
iansw246 Oct 25, 2023
f55933e
Clean up emoji tests
iansw246 Oct 25, 2023
3616bd8
Improve comment
iansw246 Oct 25, 2023
c1f8bdd
Add leading number and spaces tests
iansw246 Oct 25, 2023
b8053fa
Add tests for entirely invalid identifier
iansw246 Oct 25, 2023
d587a3d
Test empty string
iansw246 Oct 25, 2023
1 change: 1 addition & 0 deletions package.json
@@ -56,6 +56,7 @@
     "get-stdin": "^8.0.0",
     "glob": "^7.1.6",
     "glob-promise": "^4.2.2",
+    "identifierfy": "^2.0.0",
     "is-glob": "^4.0.3",
     "lodash": "^4.17.21",
     "minimist": "^1.2.6",
17 changes: 7 additions & 10 deletions src/generator.ts
@@ -14,7 +14,7 @@ import {
   TUnion,
   T_UNKNOWN,
 } from './types/AST'
-import {log, toSafeString} from './utils'
+import {log} from './utils'

 export function generate(ast: AST, options = DEFAULT_OPTIONS): string {
   return (
@@ -165,7 +165,7 @@ function generateRawType(ast: AST, options: Options): string {
   log('magenta', 'generator', ast)

   if (hasStandaloneName(ast)) {
-    return toSafeString(ast.standaloneName)
+    return ast.standaloneName
   }

   switch (ast.type) {
@@ -333,9 +333,9 @@ function generateStandaloneEnum(ast: TEnum, options: Options): string {
     (hasComment(ast) ? generateComment(ast.comment, ast.deprecated) + '\n' : '') +
     'export ' +
     (options.enableConstEnums ? 'const ' : '') +
-    `enum ${toSafeString(ast.standaloneName)} {` +
+    `enum ${ast.standaloneName} {` +
     '\n' +
-    ast.params.map(({ast, keyName}) => keyName + ' = ' + generateType(ast, options)).join(',\n') +
+    ast.params.map(({ast, keyName}) => escapeKeyName(keyName) + ' = ' + generateType(ast, options)).join(',\n') +
     '\n' +
     '}'
   )
@@ -344,9 +344,9 @@ function generateStandaloneEnum(ast: TEnum, options: Options): string {
 function generateStandaloneInterface(ast: TNamedInterface, options: Options): string {
   return (
     (hasComment(ast) ? generateComment(ast.comment, ast.deprecated) + '\n' : '') +
-    `export interface ${toSafeString(ast.standaloneName)} ` +
+    `export interface ${ast.standaloneName} ` +
     (ast.superTypes.length > 0
-      ? `extends ${ast.superTypes.map(superType => toSafeString(superType.standaloneName)).join(', ')} `
+      ? `extends ${ast.superTypes.map(superType => superType.standaloneName).join(', ')} `
       : '') +
     generateInterface(ast, options)
   )
@@ -355,10 +355,7 @@ function generateStandaloneInterface(ast: TNamedInterface, options: Options): st
 function generateStandaloneType(ast: ASTWithStandaloneName, options: Options): string {
   return (
     (hasComment(ast) ? generateComment(ast.comment) + '\n' : '') +
-    `export type ${toSafeString(ast.standaloneName)} = ${generateType(
-      omit<AST>(ast, 'standaloneName') as AST /* TODO */,
-      options,
-    )}`
+    `export type ${ast.standaloneName} = ${generateType(omit<AST>(ast, 'standaloneName') as AST /* TODO */, options)}`
   )
 }

5 changes: 5 additions & 0 deletions src/types/identifierfy.d.ts
@@ -0,0 +1,5 @@
+// Stub
+declare module 'identifierfy' {
+  const identifierfy: (name: string, options?: any) => string | null;
+  export default identifierfy;
+}
21 changes: 6 additions & 15 deletions src/utils.ts
@@ -1,6 +1,7 @@
-import {deburr, isPlainObject, trim, upperFirst} from 'lodash'
+import {isPlainObject, upperFirst} from 'lodash'
 import {basename, dirname, extname, normalize, sep, posix} from 'path'
 import {JSONSchema, LinkedJSONSchema, Parent} from './types/JSONSchema'
+import identifierfy from 'identifierfy'

 // TODO: pull out into a separate package
 export function Try<T>(fn: () => T, err: (e: Error) => any): T {
@@ -159,25 +160,15 @@ export function stripExtension(filename: string): string {
  * can safely be used as a TypeScript interface or enum name.
  */
 export function toSafeString(string: string) {
-  // identifiers in javaScript/ts:
-  // First character: a-zA-Z | _ | $
-  // Rest: a-zA-Z | _ | $ | 0-9
-
   return upperFirst(
-    // remove accents, umlauts, ... by their basic latin letters
-    deburr(string)
-      // replace chars which are not valid for typescript identifiers with whitespace
-      .replace(/(^\s*[^a-zA-Z_$])|([^a-zA-Z_$\d])/g, ' ')
+    // Convert to valid identifier
+    identifierfy(string)
       // uppercase leading underscores followed by lowercase
-      .replace(/^_[a-z]/g, match => match.toUpperCase())
+      ?.replace(/^_[a-z]/g, match => match.toUpperCase())
       // remove non-leading underscores followed by lowercase (convert snake_case)
       .replace(/_[a-z]/g, match => match.substr(1, match.length).toUpperCase())
       // uppercase letters after digits, dollars
-      .replace(/([\d$]+[a-zA-Z])/g, match => match.toUpperCase())
-      // uppercase first letter after whitespace
-      .replace(/\s+([a-zA-Z])/g, match => trim(match.toUpperCase()))
-      // remove remaining whitespace
-      .replace(/\s/g, ''),
+      .replace(/([\d$]+[a-zA-Z])/g, match => match.toUpperCase()),
   )
 }

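Review note: the stub declaration types `identifierfy` as returning `string | null`, so the `?.` in the new `toSafeString` short-circuits the whole `.replace` chain when no identifier can be produced. The sketch below isolates that post-processing chain. It is not the PR's code: `toIdentifier` is a hypothetical parameter standing in for the conversion step, the local `upperFirst` is a simplified stand-in for lodash's, and `slice(1)` replaces the deprecated `substr` call.

```typescript
// Simplified stand-in for lodash's upperFirst (assumption: null/undefined become '').
function upperFirst(value: string | null | undefined): string {
  if (value == null || value.length === 0) return ''
  return value.charAt(0).toUpperCase() + value.slice(1)
}

// Hypothetical placeholder for the identifier-conversion step.
// Like the stub in src/types/identifierfy.d.ts, it may return null.
type ToIdentifier = (name: string) => string | null

function toSafeStringSketch(name: string, toIdentifier: ToIdentifier): string {
  return upperFirst(
    toIdentifier(name)
      // uppercase a leading underscore followed by a lowercase letter
      ?.replace(/^_[a-z]/g, match => match.toUpperCase())
      // drop non-leading underscores and uppercase the letter after them
      .replace(/_[a-z]/g, match => match.slice(1).toUpperCase())
      // uppercase the first letter after a run of digits or dollar signs
      .replace(/([\d$]+[a-zA-Z])/g, match => match.toUpperCase()),
  )
}

// Identity conversion, used only to exercise the chain.
const identity: ToIdentifier = name => name

const snakeCase = toSafeStringSketch('foo_bar', identity)
const leadingDigit = toSafeStringSketch('_5tartsWithDigit', identity)
const noIdentifier = toSafeStringSketch('anything', () => null)
```

When the conversion step returns `null`, the sketch yields an empty string; the `generateName('', usedNames)` test in this PR expects such names to fall back to `NoName`.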
81 changes: 76 additions & 5 deletions test/__snapshots__/test/test.ts.md
@@ -448503,15 +448503,15 @@ Generated by [AVA](https://avajs.dev).
      * This interface was referenced by \`SafeTypeNames\`'s JSON-Schema␊
      * via the \`definition\` "5tartsWithDigit".␊
      */␊
-    export interface TartsWithDigit {␊
+    export interface _5TartsWithDigit {␊
       a?: string;␊
       [k: string]: unknown;␊
     }␊
     /**␊
      * This interface was referenced by \`SafeTypeNames\`'s JSON-Schema␊
      * via the \`definition\` " 5tartsWithBlankAndDigit".␊
      */␊
-    export interface TartsWithBlankAndDigit {␊
+    export interface _5TartsWithBlankAndDigit {␊
       a?: string;␊
       [k: string]: unknown;␊
     }␊
@@ -448821,12 +448821,83 @@ Generated by [AVA](https://avajs.dev).
* and run json-schema-to-typescript to regenerate this file.␊
*/␊
export type NoName1 = string;␊
export type میںنےگوگلٹرانسلیٹاستعمالکیا = number;␊
export type 哈哈 = string;␊
export type 𐌼𐌰𐌲𐌲𐌻𐌴𐍃𐌹̈𐍄𐌰𐌽𐌽𐌹𐌼𐌹𐍃𐍅𐌿𐌽𐌳𐌰𐌽𐌱𐍂𐌹𐌲𐌲𐌹𐌸 = string;␊
export type UtilicéElTraductorDe = boolean;␊
export type ကျွန်တော်Translateသုံးပါတယ် = string;␊
export type 나는구글번역을사용했다 = string;␊
export type 私は翻訳を使用しました = string;␊
export type השתמשתיבגוגלתרגום = string;␊
export type ΧρησιμοποίησαΤοTranslate = string;␊
export type ԵսՕգտագործելԵմTranslateԸ = string;␊
export type لقداستخدمتمترجمجوجل = string;␊
export type ผมใช้แปล = string;␊
export type ЯИспользовалTranslateCom = string;␊
export type আমিTranslateComব্যবহারকরেছি = string;␊
export type ഞാൻഗൂഗിൾവിവർത്തനംഉപയോഗിച്ചു = string;␊
export type Ე = string;␊
export type ངསསྔོནཆད = string;␊
export type ຂ້ອຍໃຊ້ແປພາສາ = string;␊
export type ខ្ញុំបានប្រើបកប្រែ = string;␊
export type БиTranslateАшигласан = string;␊
export type मैंनेगूगलअनुवादकाइस्तेमालकिया = string;␊
export type நான்கூகல்மொழிப்பெயர்ப்பைபயன்படுத்தினேன் = string;␊
export type నేనుఅనువాదంఉపయోగించాను = string;␊
export type میںنےگوگلٹرانسلیٹاستعمالکیا1 = string;␊
export type NoName = number;␊
export type DoesnTHaveOrOr = string;␊
export interface NoName {␊
someKey?: NoName1;␊
export interface 呵呵 {␊
"Unicode property ÄÖÉÜß 𐌼𐌰𐌲"?: میںنےگوگلٹرانسلیٹاستعمالکیا;␊
chinese?: 哈哈;␊
"this is 'I can eat glass in Gothic' apparently"?: 𐌼𐌰𐌲𐌲𐌻𐌴𐍃𐌹̈𐍄𐌰𐌽𐌽𐌹𐌼𐌹𐍃𐍅𐌿𐌽𐌳𐌰𐌽𐌱𐍂𐌹𐌲𐌲𐌹𐌸;␊
spanish?: UtilicéElTraductorDe;␊
myanmar?: ကျွန်တော်Translateသုံးပါတယ်;␊
korean?: 나는구글번역을사용했다;␊
japanese?: 私は翻訳を使用しました;␊
hebrew?: השתמשתיבגוגלתרגום;␊
greek?: ΧρησιμοποίησαΤοTranslate;␊
armenian?: ԵսՕգտագործելԵմTranslateԸ;␊
arabic?: لقداستخدمتمترجمجوجل;␊
thai?: ผมใช้แปล;␊
"Russian/Cyrillic"?: ЯИспользовалTranslateCom;␊
bengali?: আমিTranslateComব্যবহারকরেছি;␊
malayalam?: ഞാൻഗൂഗിൾവിവർത്തനംഉപയോഗിച്ചു;␊
georgian?: Ე;␊
tibetian?: ངསསྔོནཆད;␊
lao?: ຂ້ອຍໃຊ້ແປພາສາ;␊
kmer?: ខ្ញុំបានប្រើបកប្រែ;␊
mongolian?: БиTranslateАшигласан;␊
hindi?: मैंनेगूगलअनुवादकाइस्तेमालकिया;␊
tamil?: நான்கூகல்மொழிப்பெயர்ப்பைபயன்படுத்தினேன்;␊
telugu?: నేనుఅనువాదంఉపయోగించాను;␊
urdu?: میںنےگوگلٹرانسلیٹاستعمالکیا1;␊
"✔✔✔"?: NoName;␊
refAndExtends?: Ე1;␊
unicodeEnums?: Ე2;␊
"🇦🇶 starts with and contains emoji"?: DoesnTHaveOrOr;␊
[k: string]: unknown;␊
}␊
export interface Ე1 {␊
/**␊
* @maxItems 5␊
*/␊
გამოვიყენე?:␊
| []␊
| [number]␊
| [number, number]␊
| [number, number, number]␊
| [number, number, number, number]␊
| [number, number, number, number, number];␊
[k: string]: unknown;␊
}␊
export const enum Ე2 {␊
"გუგლის" = "کیا",␊
"ყე" = "ე გამოვიყენე",␊
"🇦🇶antartica" = "🇦🇶 doesn't"␊
}␊
`

## union.2.js
Binary file modified test/__snapshots__/test/test.ts.snap
Binary file not shown.
115 changes: 112 additions & 3 deletions test/e2e/unicode.ts
@@ -2,9 +2,118 @@ export const input = {
type: 'object',
title: '呵呵',
properties: {
someKey: {
'Unicode property ÄÖÉÜß 𐌼𐌰𐌲': {
type: 'number',
title: 'میں نے گوگل ٹرانسلیٹ استعمال کیا۔',
},
chinese: {
type: 'string',
title: '哈哈'
},
'this is \'I can eat glass in Gothic\' apparently': {
type: 'string',
title: '𐌼𐌰𐌲 𐌲𐌻𐌴𐍃 𐌹̈𐍄𐌰𐌽, 𐌽𐌹 𐌼𐌹𐍃 𐍅𐌿 𐌽𐌳𐌰𐌽 𐌱𐍂𐌹𐌲𐌲𐌹𐌸.'
},
spanish: {
type: 'boolean',
title: 'Utilicé el traductor de'
},
myanmar: {
type: 'string',
title: 'ကျွန်တော် translate သုံးပါတယ်။'
},
korean: {
type: 'string',
title: '나는 구글 번역을 사용했다'
},
japanese: {
type: 'string',
title: '私は翻訳を使用しました'
},
hebrew: {
type: 'string',
title: 'השתמשתי בגוגל תרגום'
},
greek: {
type: 'string',
title: 'Χρησιμοποίησα το Translate'
},
armenian: {
type: 'string',
title: 'Ես օգտագործել եմ translate-ը'
},
arabic: {
type: 'string',
title: 'لقد استخدمت مترجم جوجل'
},
thai: {
type: 'string',
title: 'ผมใช้ แปล'
},
'Russian/Cyrillic': {
type: 'string',
title: 'Я использовал Translate.com'
},
bengali: {
type: 'string',
title: 'আমি Translate.com ব্যবহার করেছি'
},
malayalam: {
type: 'string',
title: 'ഞാൻ ഗൂഗിൾ വിവർത്തനം ഉപയോഗിച്ചു'
},
georgian: {
type: 'string',
// This string is especially tricky because identifierfy removes the entire string if passed to toSafeString twice
title: 'Მე გამოვიყენე გუგლის თარგმანი'
},
tibetian: {
type: 'string',
title: 'ངས་སྔོན་ཆད་'
},
lao: {
type: 'string',
title: 'ຂ້ອຍໃຊ້ ແປພາສາ'
},
kmer: {
type: 'string',
title: 'ខ្ញុំបានប្រើ បកប្រែ'
},
mongolian: {
type: 'string',
title: 'Би Translate ашигласан'
},
hindi: {
type: 'string',
title: 'मैंने गूगल अनुवाद का इस्तेमाल किया'
},
tamil: {
type: 'string',
title: 'நான் கூகல் மொழிப்பெயர்ப்பை பயன்படுத்தினேன்'
},
telugu: {
type: 'string',
title: 'నేను అనువాదం ఉపయోగించాను'
},
urdu: {
type: 'string',
title: 'میں نے گوگل ٹرانسلیٹ استعمال کیا۔',
},
'✔✔✔': {
type: 'integer',
title: '𝄇𝄇𝄇'
},
refAndExtends: {
$ref: 'test/resources/UnicodeSchemaΔЙק๗あ叶葉.json',
},
unicodeEnums: {
enum: ['کیا', 'ე გამოვიყენე', '🇦🇶 doesn\'t'],
tsEnumNames: ['გუგლის', 'ყე', '🇦🇶antartica'],
title: 'ე გამოვიყენე',
},
'🇦🇶 starts with and contains emoji': {
type: 'string',
title: '哈哈',
title: '🇦🇶 doesn\'t have 𝄰 𝄱 𝄲 𝄳 𝄴 𝄵 or 🀀 🀁 🀂 🀃 🀄 or 🏴󠁧󠁢󠁷󠁬󠁳󠁿'
},
},
}
}
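Review note: several titles in test/e2e/unicode.ts (the Gothic sentence, the musical symbols, the emoji flags) lie outside the Basic Multilingual Plane, so each character occupies two UTF-16 code units. That is the situation the commits "Fix getting first characters of unicode string" and "Add surrogate pairs and multi-codepoint symbol tests" deal with. The sketch below is not taken from the PR; `firstCodePoint` is a hypothetical helper that only illustrates the difference between indexing by code unit and by code point.

```typescript
// Return the first full Unicode code point of a string, as a string.
function firstCodePoint(text: string): string {
  const codePoint = text.codePointAt(0)
  return codePoint === undefined ? '' : String.fromCodePoint(codePoint)
}

// Three Gothic letters (U+1033C, U+10330, U+10332), written as escapes.
const gothic = '\u{1033C}\u{10330}\u{10332}'

// Indexing by UTF-16 code unit returns a lone high surrogate, not a letter.
const firstUnit = gothic[0]

// Code-point-aware access returns the whole first letter (two code units).
const firstLetter = firstCodePoint(gothic)

// The string iterator walks code points, so this has one entry per letter.
const gothicCodePoints = Array.from(gothic)

// An emoji flag is two regional indicator code points (here U+1F1E6 and U+1F1F6),
// so even code-point iteration sees it as two symbols.
const flag = '\u{1F1E6}\u{1F1F6}'
const flagCodePoints = Array.from(flag)
```

A check that inspects only `name[0]` would therefore see half of a surrogate pair for these inputs, which is why code-point-aware access matters when testing the first character of an identifier.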
12 changes: 12 additions & 0 deletions test/resources/UnicodeSchemaΔЙק๗あ叶葉.json
@@ -0,0 +1,12 @@
+{
+  "title": "Მე გამოვიყენე გუგლის თარგმანი",
+  "type": "object",
+  "properties": {
+    "გამოვიყენე": {
+      "items": {
+        "type": "number"
+      },
+      "maxItems": 5
+    }
+  }
+}
20 changes: 20 additions & 0 deletions test/testUtils.ts
@@ -16,11 +16,31 @@ export function run() {
t.is(generateName('ABcd', usedNames), 'ABcd')
t.is(generateName('$Abc_123', usedNames), '$Abc_123')
t.is(generateName('Abc-de-f', usedNames), 'AbcDeF')
t.is(generateName(' 412Abc-de-f', usedNames), '_412AbcDeF')

// Unicode tests. See https://mathiasbynens.be/notes/javascript-identifiers-es6 to confirm results
t.is(generateName('呵呵', usedNames), '呵呵')
t.is(generateName('abc 𝄇 de-fg', usedNames), 'AbcDeFg')
t.is(generateName('Abcಠ_ಠde-fgh๏_๏', usedNames), 'Abcಠ_ಠdeFgh_')
t.is(generateName('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ', usedNames), 'ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ')
t.is(generateName('ÄÖÉÜß', usedNames), 'ÄÖÉÜß')
// Surrogate pairs at start
t.is(generateName('𝄀 𝄁 𝄂 𝄃 𝄄 𝄅 𝄆 𝄇 𝄈 𝄉 𝄊 music', usedNames), 'Music')
// Multiple Unicode codepoints
// Emoji flags use two regional indicator symbols
t.is(generateName('🇳🇵 Emoji flags 🇦🇩', usedNames), 'EmojiFlags')
// Regional flags like England use emoji tag sequences
t.is(generateName(' 🏴󠁧󠁢󠁥󠁮󠁧󠁿 england 🏴󠁧󠁢󠁳󠁣󠁴󠁿', usedNames), 'England')

// Index should increment:
t.is(generateName('a', usedNames), 'A1')
t.is(generateName('a', usedNames), 'A2')
t.is(generateName('a', usedNames), 'A3')
t.is(generateName('🇳🇵 Emoji flags 🇦🇩', usedNames), 'EmojiFlags1')
t.is(generateName('🇳🇵 Emoji flags 🇦🇩', usedNames), 'EmojiFlags2')

t.is(generateName('', usedNames), 'NoName')
t.is(generateName('𝄇𝄇𝄇', usedNames), 'NoName1')
})
test('isSchemaLike', t => {
const schema = link({
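Review note: the test/testUtils.ts assertions expect repeated names to receive an incrementing suffix (`A1`, `A2`, `EmojiFlags1`) and names with no usable characters to become `NoName`, then `NoName1`. The sketch below shows one way that behavior can be produced. It is an assumption about the shape of the logic, not the project's `generateName`; `sanitize` is a hypothetical parameter standing in for `toSafeString`.

```typescript
// Hypothetical sketch of name de-duplication; `sanitize` stands in for toSafeString.
function generateNameSketch(
  from: string,
  usedNames: Set<string>,
  sanitize: (name: string) => string,
): string {
  let name = sanitize(from)

  // Fall back to a placeholder when nothing usable is left.
  if (!name) {
    name = 'NoName'
  }

  // On collision, append the first free counter value.
  if (usedNames.has(name)) {
    let counter = 1
    while (usedNames.has(`${name}${counter}`)) {
      counter++
    }
    name = `${name}${counter}`
  }

  usedNames.add(name)
  return name
}

// Toy sanitizer for the demonstration: uppercase the first character only.
const capitalize = (name: string): string => name.charAt(0).toUpperCase() + name.slice(1)

const used = new Set<string>()
const generated = [
  generateNameSketch('a', used, capitalize),
  generateNameSketch('a', used, capitalize),
  generateNameSketch('a', used, capitalize),
  generateNameSketch('', used, capitalize),
  generateNameSketch('', used, capitalize),
]
```

Keeping the counter search inside the collision branch means the first occurrence of a name stays unsuffixed, which matches the pattern in the assertions above.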