-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Go Case Conversion #98
base: master
Are you sure you want to change the base?
Conversation
sergiosalvatore
commented
Oct 4, 2024
- provide the ability to augment or replace the set of initialisms that are used
- provide an EncodeCasingFunc for the Go style
- introduce the concept of "atoms" so that Go will not attempt to break up names that we want to be kept together.
102af63
to
fdd3c0f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, just nits
|
||
// SetInitialisms replaces the set of initialisms used by the GoCaseConverter | ||
// with the argument. | ||
func (g *GoCaseConverter) SetInitialisms(initialisms []string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to set or recommend any limits to max number of initialism or atoms? (stealing your "we should have limits on everything" philosophy 😉)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good thought -- I just put in some protections against short initialisms or atoms that would certainly cause problems. I think, given that Dials is a library, for it to impose limits on the number of initialisms or atoms it's probably a step too far.
// AddInitialisms adds the passed initialisms to the set of initialisms. | ||
func (g *GoCaseConverter) AddInitialism(initialism ...string) { | ||
g.initialisms = append(g.initialisms, initialism...) | ||
sort.Strings(g.initialisms) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we don't set any limits on how long the lists of initialisms are, maybe we should insert at the right sort index, but probably this is a micro-optimization. also do we care about duplicates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My impression is that users should not be calling AddInitialism
in a loop and instead would just call it once in main
at startup to configure the extra things they want to add.
fdd3c0f
to
925f319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple comments
continue | ||
} | ||
chunkLen := 2 | ||
for chunkLen <= len(word) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to iterate over sizes like this.
The doc for SearchStrings
says:
The return value is the index to insert x if x is not present
... so the longest prefix present will be a nearby predecessor to the index that's actually returned.
I think it would be better to iterate backwards from the returned index until either the offset is either no longer a prefix or you hit the beginning of the list. (assuming that it isn't an exact match)
(If we're going to do exact matches after case-conversion, we might as well use a set, and get O(1) checks for the inner loop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point -- I think the longest prefix (or an exact match) will necessarily be the entry in the position at one less than the return value of SearchStrings
or 0 if there are no prefixes. I've updated the code to use this logic and it appears to work as expected with the fewest loop iterations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point I convinced myself that the longest prefix would be in the previous position, but while writing that comment, I had trouble convincing myself of it again (which is why I was thinking about iterating backwards)
e.g. if we're looking for abcd
and there are entries, a
and aa
, then aa
will be in the preceding position, but we need to grab a
, which would be another position earlier.
If we enforce that entries aren't prefixes of one another, then that simplification works, though. (I think I originally thought about this scheme in the context of Fresnel's topic prefix ACLs, where it makes no sense for them to be prefixes of eachother (but, in hindsight we probably aren't validating that 🫤 ))
} else if initialismOffset < len(g.initialisms) && g.initialisms[initialismOffset] == maybeInitialism { | ||
if i == 0 { | ||
b.WriteString(w) | ||
} else { | ||
b.WriteString(maybeInitialism) | ||
} | ||
} else { | ||
if i == 0 { | ||
b.WriteString(w) | ||
} else { | ||
b.WriteString(cases.Title(language.English, cases.NoLower).String(w)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like you can pull the i == 0
check here up into the outer if
/else
cascade and simplify both branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While there is some duplication in statements I feel like the original way I wrote it is easier to understand -- but I've updated it nonetheless to simplify the branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it might warrant a comment.
It is a little odd to see that i == 0
check at the beginning now.
b16213c
to
cca74ee
Compare
- provide the ability to augment or replace the set of initialisms that are used - provide an EncodeCasingFunc for the Go style - introduce the concept of "atoms" so that Go will not attempt to break up names that we want to be kept together.
cca74ee
to
c275b0a
Compare