Wrong pinyin tone #1
Comments
Thanks for this, but I'm not sure this is WRONG. According to https://resources.allsetlearning.com/chinese/pronunciation/Tone_change_rules#Why_Tone_Changes_Are_Not_Written, regular tone changes in Chinese are not written. Instead, I think it's better to distinguish the two: the original tones vs. the rule-applied tones.
cedict.txt
I think some simple rules can help. I'm working on them and will be back in a few hours.
@begeekmyfriend I've added a pronunciation with the tone change rules applied. Please upgrade the library, check it, and let me know if it is okay. Thanks for pointing this out.
For example:

More examples:
@Jackiexiao Can you clarify what you mean? It's confusing. The current results for the strings above look like this: 有一次 第一次。 十一二岁来到戏校 同年十一月 一九八二年英文版 欧洲统一步伐 吉林省一号工程 一是选拔优秀干部 Which parts are incorrect?
Well it is really confusing when you first learn Chinese on
According to https://en.wikipedia.org/wiki/Standard_Chinese_phonology#Tone_sandhi
So are rules 1 and 2 applied word-internally only? In other words, when 一 is followed by a fourth-tone character that belongs to a separate word, is 一 read as first tone, not second tone?
What you have learned is right.
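The 一 and 不 rules discussed above can be sketched in a few lines. This is only a minimal sketch, not the library's implementation: the function names `tone_of` and `apply_yi_bu_sandhi` are hypothetical, the input is assumed to be a single, already segmented word in numbered pinyin, and the rules are the commonly described ones (一 becomes second tone before a fourth tone and fourth tone before a first, second, or third tone; 不 becomes second tone before a fourth tone).

```python
def tone_of(syllable):
    """Return the trailing tone digit of a numbered-pinyin syllable ('ci4' -> 4).

    Syllables without a digit are treated as neutral tone (5).
    """
    return int(syllable[-1]) if syllable[-1:].isdigit() else 5


def apply_yi_bu_sandhi(chars, syllables):
    """Apply the commonly described 一/不 rules inside ONE word (hypothetical helper).

    chars:     list of Chinese characters in the word
    syllables: citation-form numbered pinyin, same length as chars
    """
    result = list(syllables)
    for i, ch in enumerate(chars):
        if i + 1 >= len(chars):
            continue  # word-final syllable keeps its citation tone
        next_tone = tone_of(syllables[i + 1])
        if ch == "一" and syllables[i] == "yi1":
            if next_tone == 4:
                result[i] = "yi2"
            elif next_tone in (1, 2, 3):
                result[i] = "yi4"
        elif ch == "不" and syllables[i] == "bu4" and next_tone == 4:
            result[i] = "bu2"
    return result


print(apply_yi_bu_sandhi(list("一次"), ["yi1", "ci4"]))
print(apply_yi_bu_sandhi(list("一起"), ["yi1", "qi3"]))
print(apply_yi_bu_sandhi(list("不是"), ["bu4", "shi4"]))
```

Because the function only sees one word at a time, a word-final 一 followed by a fourth-tone character in the next word stays first tone. It cannot tell that the 一 in an ordinal such as 第一次 or a number such as 十一月 should also stay first tone; cases like these would need a list of exceptions.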
Here is another interesting example:
I'm looking at the literature on the tone change rules. Unfortunately, most sources are not clear about the boundaries, but some say the tone change rules MAY work across word boundaries. If my understanding is correct, things are more complicated. If we simply assume that all the tone change rules, including those for the third tone, 一, and 不, apply word-internally, things are simple, but I'm not sure that's true.
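The difference between the two assumptions can be made concrete with third-tone sandhi. The following is a rough sketch under stated assumptions: the function names are hypothetical, the input is a list of segmented words in numbered pinyin, and the rule is simplified to "a third tone directly before another third tone becomes second tone", which ignores the prosodic structure that real speech depends on.

```python
def third_tone_sandhi(syllables):
    """Simplified rule: a 3rd tone directly before another 3rd tone becomes 2nd.

    The check uses the ORIGINAL tones, so a run 3-3-3 becomes 2-2-3.
    """
    out = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            out[i] = syllables[i][:-1] + "2"
    return out


def apply_word_internally(words):
    """Apply the rule inside each word only; word boundaries block it."""
    return [third_tone_sandhi(word) for word in words]


def apply_across_boundaries(words):
    """Ignore word boundaries and apply the rule to the whole utterance."""
    flat = [syllable for word in words for syllable in word]
    return third_tone_sandhi(flat)


# 我 很 好, segmented as three one-syllable words
words = [["wo3"], ["hen3"], ["hao3"]]
print(apply_word_internally(words))    # no change inside one-syllable words
print(apply_across_boundaries(words))  # sandhi spreads across the boundaries
```

Under the word-internal assumption nothing changes in this example, while the cross-boundary assumption changes the first two syllables, so the choice of boundary treatment directly changes the output.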
I do not think Chinese pinyin conversion can ever be done completely correctly. There are no rules, only conventions. An enormous pinyin dictionary is indispensable for this problem. That is all we can do about it.
Okay. I've updated it to 0.9.9.3. I tried to refine the rules. Feel free to check it.
Hi Kyubyong, have you considered using machine learning, such as a CRF, to predict the tone change of 一? Thanks.
I have found a well-designed Chinese pinyin dictionary from espeak with 21567 single characters plus 36098 compound exceptions (including 332 added 'yi' exceptions, 10720 added 'bu' exceptions, and 9713 extra 2-syllable words for 3rd-tone sandhi blocking). Would you like to replace the original one with it, @Kyubyong?
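A dictionary of single characters plus compound exceptions is usually consulted with a longest-match lookup: try the longest compound first, and fall back to the single-character reading only when no compound matches. The sketch below illustrates that idea with tiny hypothetical tables; `SINGLE`, `COMPOUND`, and `to_pinyin` are made-up names, and neither the entries nor the layout reflect the espeak data or its file format.

```python
# Hypothetical toy tables; a real dictionary would hold tens of thousands of entries.
SINGLE = {"一": "yi1", "心": "xin1", "意": "yi4", "次": "ci4", "有": "you3"}
COMPOUND = {"一心一意": "yi4 xin1 yi2 yi4", "一次": "yi2 ci4"}


def to_pinyin(text, single=SINGLE, compound=COMPOUND):
    """Greedy longest-match lookup: compound exceptions first, then single characters."""
    max_len = max((len(key) for key in compound), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible compound starting at position i (length >= 2).
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in compound:
                out.extend(compound[chunk].split())
                i += size
                break
        else:
            # No compound matched: use the single-character reading,
            # or pass the character through unchanged if it is unknown.
            out.append(single.get(text[i], text[i]))
            i += 1
    return out


print(to_pinyin("一心一意"))
print(to_pinyin("有一次"))
```

With this design the tone change for 一 is stored in the compound entry rather than computed by a rule, so quality depends entirely on how many exceptions the dictionary covers.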


Should be 'yi4 xin1 yi2 yi4'.
See mozillazg/phrase-pinyin-data#20