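Hi! I ran this code on the full text of 《大话数据结构》 and found that most of the extracted keywords contain Latin letters and don't look much like real words, as shown in the screenshot below. How can I improve this?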
Modify the regular expression here, @hummingg.
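The problem seems to come from THULAC segmentation errors: it falls apart as soon as it hits English text. Tsinghua's segmentation model also doesn't seem to support custom user dictionaries very well. I'm planning to swap THULAC for jieba and give that a try. Would that work?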
The results depend heavily on the word segmenter and on the regex matching rules.
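To make the regex side of that concrete, here is a minimal sketch of a post-filter that drops candidate keywords containing anything other than Chinese characters. It is not the repository's actual code: the function name `filter_keywords`, the `CJK_ONLY` pattern, the two-character minimum, and the sample candidates are all assumptions made for illustration.

```python
import re

# Assumption: a usable keyword is made only of CJK ideographs
# (U+4E00 to U+9FFF) and is at least two characters long.
CJK_ONLY = re.compile(r"[\u4e00-\u9fff]{2,}")


def filter_keywords(candidates):
    """Keep only candidates that consist entirely of Chinese characters."""
    return [word for word in candidates if CJK_ONLY.fullmatch(word)]


# Hypothetical candidates of the kind a segmenter might emit
# when Chinese text is mixed with code identifiers.
candidates = ["链表", "p指针", "next", "二叉树", "a的", "栈"]
print(filter_keywords(candidates))
```

A filter this strict also discards legitimate mixed terms such as "B树", so a whitelist of known mixed terms or a looser pattern may suit a data structures book better.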
Hi, how can I run the extraction on an entire book?