Knowledge sources
- Wikipedia, use https://dumps.wikimedia.org/ (enwiki is around 12 GB)
- Zhihu, use https://github.com/egrcc/zhihu-python , see also https://github.com/simoncos/zhihu-analysis-python
- Google/Bing/Yahoo search results
- Khan Academy/Coursera/Udacity classes
- scholar search results
- Stack Overflow and Quora answers
- Sogou search results for Zhihu answers and WeChat articles
- …
NLP process
- Automatic summarization, see https://en.wikipedia.org/wiki/Automatic_summarization
- TextRank, see the paper "TextRank: Bringing Order into Texts" (Mihalcea and Tarau, 2004)
- Viterbi algorithm
- WordNet
- Topic model, see also https://github.com/bigartm/bigartm-book
- SyntaxNet, see https://github.com/tensorflow/models/tree/master/research/syntaxnet
- jieba (Chinese word segmentation), see https://github.com/fxsjy/jieba
- IKAnalyzer (Chinese word segmentation), see https://github.com/yozhao/IKAnalyzer
- ICTCLAS, see https://github.com/xunyuw/ICTCLASDemo
- http://utensil.github.io/tech/2011/10/26/chinses-segment.html
- TF-IDF, see also https://github.com/nkottary/Help.jl
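
As a rough illustration of the TF-IDF item above, here is a minimal sketch in plain Python. It assumes documents are already tokenized (for Chinese text, a segmenter such as jieba would do that step first) and uses the common tf = count / document length, idf = log(N / df) variant; other weightings exist, and the function name `tf_idf` and the sample corpus are made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already split into tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = tf_idf(docs)
# Highest-weighted terms of the first document.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:2]
```

Terms that occur in every document get weight 0 under this variant, which is what makes the score useful for picking out terms distinctive to one document.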
WeChat bot
- https://github.com/liuwons/wxBot
Rule-based chat bot
- https://github.com/node-webot/webot
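
The general idea of a rule-based chat bot can be sketched in a few lines of Python. This is not webot's API (webot is a Node.js library); the rules, the `reply` function, and the fallback text below are hypothetical and only show the pattern of matching an incoming message against an ordered rule list.

```python
import re

# Hypothetical rules: (compiled pattern, reply template). First match wins.
RULES = [
    (re.compile(r"\b(hi|hello)\b", re.I), "Hello! Ask me something."),
    (re.compile(r"\bwhat is (?P<topic>.+?)\??$", re.I), "Let me look up {topic}."),
]
FALLBACK = "Sorry, I did not understand that."


def reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    text = message.strip()
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Named groups captured by the pattern fill the template.
            return template.format(**match.groupdict())
    return FALLBACK
```

In a fuller setup, a rule's reply could call into the knowledge sources and NLP steps listed earlier instead of returning a fixed template.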