handle apostrophes #14

Open
lichtr opened this issue Jan 20, 2014 · 7 comments

Comments

@lichtr
Member

lichtr commented Jan 20, 2014

The Lewis and Short dictionary gives only the following 7 cases where ' is definitely an apostrophe:
po' (post)
potin' (potisne)
satin' (satisne)
scin' (scisne)
tun' (tune)
vin' (visne)
min' (mihine)
I think those could be handled manually.
I don't think there's a good reason to mark the n' as an enclitic. What's your opinion on that?
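
Handling the seven forms "manually" could be as small as a lookup table. The following is only a sketch: the constant `APOSTROPHE_FORMS` and the method `expand_apostrophe_form` are hypothetical names, not part of the existing code, and only the ASCII apostrophe is covered.

```ruby
# Hypothetical lookup table: elided form => full form, copied from the list above.
APOSTROPHE_FORMS = {
  "po'"    => "post",
  "potin'" => "potisne",
  "satin'" => "satisne",
  "scin'"  => "scisne",
  "tun'"   => "tune",
  "vin'"   => "visne",
  "min'"   => "mihine"
}.freeze

# Returns the full form if str is one of the known elided forms
# (compared case-insensitively), otherwise nil.
def expand_apostrophe_form(str)
  APOSTROPHE_FORMS[str.downcase]
end

p expand_apostrophe_form("Scin'")
p expand_apostrophe_form("veni'")
```

Any word that ends in ' but is not in the table would then be treated as a word followed by a quotation mark.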

@LFDM
Member

LFDM commented Jan 20, 2014

Not sure right now what the best solution for the parsing process is.

The issue of apostrophes was raised in #6, where @balmas proposed treating apostrophes as words. That is right when they are in fact word-like, but it has to be determined separately, because ' is certainly used as a direct speech delimiter as well.
The apostrophe is easily detected when it is inside a word (an example can be found in #8), but it is much trickier when a word ends with an apostrophe: in that case it is not safe to assume either that ' ends direct speech (or another form of quotation) or that it is indeed an apostrophe.

@lichtr
Member Author

lichtr commented Jan 20, 2014

As long as we don't face more cases like those above, I'd propose the same procedure that we apply to abbreviated names and the dot that follows them.
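
A rough sketch of what such a join step might look like for apostrophes. It does not reproduce the existing abbreviation code; `KNOWN_STEMS` and `join_known_apostrophes` are hypothetical names, and the sketch assumes the tokenizer has already split off ' as a token of its own.

```ruby
# Hypothetical list: the seven known forms without their trailing apostrophe.
KNOWN_STEMS = %w[po potin satin scin tun vin min].freeze

# Walks over the tokens and glues a lone ' back onto the preceding token
# when that token is one of the known stems. All other tokens pass through.
def join_known_apostrophes(tokens)
  result = []
  tokens.each do |tok|
    if tok == "'" && !result.empty? && KNOWN_STEMS.include?(result.last.downcase)
      result[-1] = result.last + "'"
    else
      result << tok
    end
  end
  result
end

p join_known_apostrophes(["scin", "'", "tu", "?"])
p join_known_apostrophes(["'", "veni", "'"])
```

One limitation of this approach: a quotation that happens to close directly after one of the seven strings would be misread as an apostrophe.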

@LFDM
Member

LFDM commented Jan 20, 2014

What was that procedure? :)

@lichtr
Member Author

lichtr commented Jan 20, 2014

tokenizer.rb, line 126: find_abbreviations_and_join_strings

@LFDM
Member

LFDM commented Jan 20, 2014

Ah, that's what you mean - 👍 for that.

Not sure how apostrophes are used in Latin (or better: by Latin editors). Should we expect to encounter elided syllables marked with a ' in verse?

@lichtr
Member Author

lichtr commented Jan 20, 2014

I was looking for them and only found fabulatu's. I don't think apostrophes are used heavily, so this approach could work, but I'll keep my eyes open...

@LFDM
Member

LFDM commented Jan 20, 2014

Ok, please implement the magnificent seven if you have the time.
How the parser handles fabulatu's is postponed; it is an unimportant edge case.
The tokenizer should treat it as one token for now.
That means it must not be split by spaces when the tokenizer starts working (currently it would create fabulatu ' s from fabulatu's).
That should be easy, though: ' is not punctuation if a letter a-z comes right after it (other punctuation might follow instead, e.g. a dot ending a sentence).
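
A minimal sketch of that spacing rule, not the existing tokenizer code. The method name `space_and_split` and the punctuation set are hypothetical. The sketch also adds one assumption that the comment above does not state: the ' must have a letter before it as well, so that an opening quotation mark in front of a word is still split off.

```ruby
# Treats ' as punctuation unless it has a letter a-z on both sides,
# pads every punctuation mark with spaces and splits on whitespace.
# Assumption: only ASCII letters and a small punctuation set are considered.
def space_and_split(text)
  punctuation = /(?<![a-z])'|'(?![a-z])|[.,;:!?]/i
  text.gsub(punctuation) { |mark| " #{mark} " }.split
end

p space_and_split("fabulatu's est.")
p space_and_split("'veni' inquit")
```

With this rule fabulatu's stays one token, while a word-final ' as in scin' is still split off and has to be rejoined by a later step.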


2 participants