toHiragana and toKatakana methods are skipping kanji #12

Open
white-miku opened this issue May 14, 2020 · 3 comments

Comments

@white-miku

white-miku commented May 14, 2020

Hello,
I've created a simple page that reproduces the example from "Getting started", and I've noticed that in my implementation the toHiragana() and toKatakana() methods skip kanji characters.
Results:
Words: 庭 で ライム を 育てています。
Readings: ニワデライムヲソダテテイマス。
Pronunciations: ニワデライムヲソダテテイマス。
Lemmas: 庭でライムを育てる。
Parts of speech: noun, postposition, noun, postposition, verb, symbol
Hiragana: 庭でらいむを育てています。
Katakana: 庭デライムヲ育テテイマス。
Romaji: niwa de raimu o sodateteimasu.
Furigana: 庭ニワデライムヲ育ソダテテイマス。

If you want, you may try it yourself: http://jpn.white-miku.me/index.php
Meanwhile, readings, pronunciations, romaji and furigana all work perfectly.
Could this be a bug, or is it maybe a MeCab misconfiguration?
Thank you.

@zachleigh
Collaborator

Hello @white-miku
Parsing that works fine for me, so I don't think it's an issue with the package (I could be wrong, though). You can test your MeCab setup by running MeCab on the command line.

$ mecab
庭 で ライム を 育てています。
庭	名詞,一般,*,*,*,*,庭,ニワ,ニワ
で	助詞,格助詞,一般,*,*,*,で,デ,デ
ライム	名詞,一般,*,*,*,*,ライム,ライム,ライム
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
育て	動詞,自立,*,*,一段,連用形,育てる,ソダテ,ソダテ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS

If that works, then I'm guessing it's your implementation. If you paste your code in, I might be able to help out.
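
For reference when comparing output: each MeCab line above is the surface form, a tab, and then comma-separated features. In the dictionary layout shown in this thread, the last three fields are the lemma, the reading and the pronunciation. Below is a minimal PHP sketch for pulling those fields out of one line. The helper name `parseMecabLine` is hypothetical and is not part of Limelight or MeCab, and the field positions are an assumption based on the output above; other dictionaries may order them differently.

```php
<?php
// Hypothetical helper, not part of Limelight or MeCab.
// Splits one line of MeCab output (layout as shown in this thread)
// into the surface form and a few named feature fields.
function parseMecabLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");
    if ($line === '' || $line === 'EOS') {
        return null; // blank line or end-of-sentence marker
    }

    // Surface form and feature string are separated by a single tab.
    $parts = explode("\t", $line, 2);
    if (count($parts) !== 2) {
        return null; // not a token line
    }
    [$surface, $featureString] = $parts;
    $features = explode(',', $featureString);

    return [
        'surface'       => $surface,
        'partOfSpeech'  => $features[0] ?? null,
        'lemma'         => $features[6] ?? null,
        'reading'       => $features[7] ?? null, // katakana, e.g. ニワ
        'pronunciation' => $features[8] ?? null,
    ];
}

$token = parseMecabLine("庭\t名詞,一般,*,*,*,*,庭,ニワ,ニワ");
echo $token['surface'], ' => ', $token['reading'], PHP_EOL;
```

If the reading field is populated for kanji tokens here, MeCab itself is supplying the readings and the problem is more likely further up the stack.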

@white-miku
Author

Hello, @zachleigh
Thank you for the reply.
I've run the example you provided, and the output matches yours.

root@White-Miku:~# mecab
庭 で ライム を 育てています。
庭	名詞,一般,*,*,*,*,庭,ニワ,ニワ
で	助詞,格助詞,一般,*,*,*,で,デ,デ
ライム	名詞,一般,*,*,*,*,ライム,ライム,ライム
を	助詞,格助詞,一般,*,*,*,を,ヲ,ヲ
育て	動詞,自立,*,*,一段,連用形,育てる,ソダテ,ソダテ
て	助詞,接続助詞,*,*,*,*,て,テ,テ
い	動詞,非自立,*,*,一段,連用形,いる,イ,イ
ます	助動詞,*,*,*,特殊・マス,基本形,ます,マス,マス
。	記号,句点,*,*,*,*,。,。,。
EOS

My implementation is very close to the example provided in the Limelight documentation. It's just adapted for the Yii2 framework:

	public function process()
	{
		$this->processed = true;
		$limelight = new Limelight();
		$results = $limelight->parse($this->text);

		$this->words = $results->string('word', ' ');
		$this->readings = $results->string('reading');
		$this->pronunciation = $results->string('pronunciation');
		$this->lemma = $results->string('lemma');
		$this->partOfSpeech = $results->string('partOfSpeech', ', ');
		$this->hiragana = $results->toHiragana()->string('word');
		$this->katakana = $results->toKatakana()->string('word');
		$this->romaji = $results->string('romaji', ' ');
		$this->furigana = $results->string('furigana');
	}

@onrsama

onrsama commented Aug 27, 2020

Same here. The toHiragana and toKatakana functions skip the kanji; my result is the same as @white-miku's. string('furigana') is affected too: the furigana for kanji comes out in katakana instead of hiragana.
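
A possible workaround while the cause is unknown (this is an assumption, not a confirmed fix and not Limelight's own API): the katakana reading string is reported as correct in this thread, so it can be converted to hiragana with PHP's `mb_convert_kana`. The sketch below assumes the mbstring extension is installed; the helper names `readingToHiragana` and `textToKatakana` are hypothetical.

```php
<?php
// Workaround sketch, not part of Limelight: convert kana with mbstring
// instead of relying on toHiragana()/toKatakana().

// Mode 'c' converts full-width katakana to full-width hiragana.
function readingToHiragana(string $katakanaReading): string
{
    return mb_convert_kana($katakanaReading, 'c', 'UTF-8');
}

// Mode 'C' converts full-width hiragana to full-width katakana.
// Kanji are left untouched, so pass a kana-only reading, not raw words.
function textToKatakana(string $kanaText): string
{
    return mb_convert_kana($kanaText, 'C', 'UTF-8');
}

// In the code from this thread the input would be $results->string('reading').
$reading = 'ニワデライムヲソダテテイマス。';
echo readingToHiragana($reading), PHP_EOL;
```

This only changes the script of a kana string; it does not produce readings for kanji, so it has to start from the reading rather than from the original words.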
