handle apostrophes #14

lichtr · 2014-01-20T16:00:05Z

The Lewis and Short dictionary lists only the following 7 cases where ' is definitely an apostrophe:
po' (post)
potin' (potisne)
satin' (satisne)
scin' (scisne)
tun' (tune)
vin' (visne)
min' (mihine)
I think those could be handled manually.
I don't think there's a good reason to mark the n' as enclitic. Your opinion on that?
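
Handling the seven forms manually could be as simple as a lookup table. This is only a minimal sketch: the constant `APOSTROPHE_FORMS` and the method `expand_apostrophe_form` are hypothetical names, not part of the existing tokenizer.

```ruby
# Hypothetical lookup table built from the seven Lewis and Short entries
# listed above. Names are illustrative, not taken from tokenizer.rb.
APOSTROPHE_FORMS = {
  "po'"    => "post",
  "potin'" => "potisne",
  "satin'" => "satisne",
  "scin'"  => "scisne",
  "tun'"   => "tune",
  "vin'"   => "visne",
  "min'"   => "mihine"
}.freeze

# Returns the full form for a known apostrophized word, or nil otherwise.
# The lookup ignores case so that sentence-initial forms are found too.
def expand_apostrophe_form(word)
  APOSTROPHE_FORMS[word.downcase]
end
```

Anything not in the table returns nil, so a trailing ' on any other word would still be left for the quotation handling.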

LFDM · 2014-01-20T19:09:49Z

Not sure right now what's the best solution for the parsing process.

The issue of apostrophes was raised in #6, where @balmas proposed treating apostrophes as words. That is right when they are in fact word-like, but it needs to be determined separately, because ' is certainly used as a direct speech delimiter as well.
The apostrophe is easily detected inside a word (an example can be found in #8), but it's much trickier when a word ends with an apostrophe. In that case it's not safe to assume either way: the ' may be closing direct speech (or another form of quotation), or it may indeed be an apostrophe.
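
To make the distinction concrete, here is a rough sketch of how a single whitespace-delimited word could be classified. The method name `apostrophe_kind` and the three symbols it returns are assumptions for illustration only.

```ruby
# Classifies the apostrophe usage in one whitespace-delimited word.
# Hypothetical helper, not part of the existing tokenizer.
#
#   :internal  - letters on both sides of ', safe to treat as part of the word
#   :ambiguous - word ends in ', could be an apostrophe or a closing quote
#   :other     - no apostrophe, or one in any other position (e.g. an opening quote)
def apostrophe_kind(word)
  case word
  when /\p{L}'\p{L}/ then :internal
  when /\p{L}'\z/    then :ambiguous
  else                    :other
  end
end
```

Only the `:ambiguous` case needs extra knowledge, such as a list of known forms, to be resolved.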

lichtr · 2014-01-20T19:45:53Z

as long as we don't face more cases than those above, i'd propose the same procedure we apply to abbreviated names and their following dot.

LFDM · 2014-01-20T19:48:01Z

What was that procedure? :)

lichtr · 2014-01-20T19:50:46Z

tokenizer.rb line 126 find_abbreviations_and_join_strings
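
This is not the actual code at that line. Carried over to apostrophes, though, the idea might look roughly like the sketch below: once the tokenizer has split off the ', a known stem gets joined with it again. `KNOWN_APOSTROPHE_STEMS` and `join_known_apostrophes` are made-up names.

```ruby
# Stems of the seven apostrophized forms from Lewis and Short.
# Hypothetical constant, not taken from tokenizer.rb.
KNOWN_APOSTROPHE_STEMS = %w[po potin satin scin tun vin min].freeze

# Walks over an array of string tokens and re-attaches a standalone "'"
# to the preceding token when that token is one of the known stems.
# All other tokens, including other apostrophes, are passed through.
def join_known_apostrophes(tokens)
  tokens.each_with_object([]) do |token, result|
    if token == "'" && !result.empty? &&
       KNOWN_APOSTROPHE_STEMS.include?(result.last.downcase)
      result[-1] = result.last + "'"
    else
      result << token
    end
  end
end
```

A ' after any other word stays a separate token, so it can still be treated as a quotation mark.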

LFDM · 2014-01-20T20:06:32Z

Ah, that's what you mean - 👍 for that.

Not sure how apostrophes are used in Latin (or better: by Latin editors). Is it to be expected that we encounter elided syllables with a ' in verses?

lichtr · 2014-01-20T20:17:45Z

was looking for them, only found fabulatu's; don't think that there's a heavy use of apostrophes, so this approach could work; but i'll keep my eyes open...

LFDM · 2014-01-20T21:45:57Z

Ok, please implement the magnificent seven if you have the time.
How the parser handles fabulatu's is postponed; it's an unimportant edge case.
The tokenizer should treat it as one token for now.
That means it must not be padded with spaces when the tokenizer starts working (currently it would turn fabulatu's into fabulatu ' s).
That should be easy, though: ' is not punctuation if a-z follows it directly (something else might follow instead, e.g. a dot ending a sentence).
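
A possible sketch of that spacing rule, assuming the tokenizer pads punctuation with spaces before splitting. The method name `space_apostrophes` is hypothetical. The check for a preceding letter is an extra assumption on top of the rule above, added so that an opening quote in front of a word still gets separated.

```ruby
# Pads an apostrophe with spaces unless it has a letter directly on
# both sides. Hypothetical helper, not the tokenizer's actual code.
#
# The "letter before" condition is an added assumption: without it,
# an opening quote as in 'salve' would stay glued to the word.
# Note that squeeze(" ") also collapses any other runs of spaces.
def space_apostrophes(text)
  text.gsub(/(?<![a-z])'|'(?![a-z])/i, " ' ").squeeze(" ").strip
end

puts space_apostrophes("fabulatu's")      # word-internal, left alone
puts space_apostrophes("'salve' inquit")  # both quotes get separated
```

A trailing ' as in satin' is still spaced by this step, so the seven known forms would have to be joined again afterwards.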

LFDM mentioned this issue Jan 20, 2014

how to handle abbreviations like "fabulatu's" #8

Closed
