Author: 野指针呀
Source: https://blog.csdn.net/mjj1024/article/details/105618784
Running the following code in Jupyter Notebook:
    import nltk
    paragraph = "i am a good boy ! are you ok? hahaha i am fine"
    words_list = nltk.word_tokenize(paragraph)
    print(words_list)
produced this error:
    ModuleNotFoundError                       Traceback (most recent call last)
    <ipython-input-4-55bf564de021> in <module>
    ----> 1 import nltk
          2 paragraph = "i am a good boy ! are you ok? hahaha i am fine"
          3 words_list = nltk.word_tokenize(paragraph)
          4 print(words_list)

    ModuleNotFoundError: No module named 'nltk'
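The message says there is no module named nltk.
I then ran pip list in cmd and conda list in conda, and both showed nltk as already installed. A blog post I found afterwards said that the nltk corpus data also has to be downloaded.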
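Since pip list and conda list only describe the environment they run in, it can help to ask the running interpreter directly whether it can see the package. The following is a minimal sketch using only the standard library; the helper name locate_module is made up for illustration and is not part of nltk.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module would be imported from, or None if
    the interpreter running this code cannot find it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # spec.origin is a file path for ordinary modules; it can be None for
    # namespace packages, so fall back to a readable placeholder.
    return spec.origin or "(namespace package)"


print("interpreter:", sys.executable)
print("nltk found at:", locate_module("nltk"))
```

Running the same snippet in IDLE and in a notebook cell shows whether the two are using the same interpreter and whether each of them can find nltk.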
I tried the automatic download first.
Run this in IDLE 3.7 (use whichever version is on your own machine):
    >>> import nltk
    >>> nltk.download()
A blog post then suggested changing the Server Index field in the NLTK Downloader from:
https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
to: http://www.nltk.org/nltk_data/
Clicking download produced the same error as before: getaddrinfo failed
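getaddrinfo failed means the host name could not be resolved to an address, so the problem lies with DNS or network access rather than with nltk itself. A minimal sketch for checking this from Python follows; the helper name can_resolve is hypothetical, and port 443 is only an assumed default.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # This is the failure that surfaces as "getaddrinfo failed".
        return False
```

Calling can_resolve("raw.githubusercontent.com") and can_resolve("www.nltk.org") on the affected machine shows which of the two index hosts, if either, can be reached by name.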
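After trying a pile of blog posts without success, I had no choice but to install the nltk data manually.
Manual installation is a bit tedious, but there was no other way.
I did see that someone had written a script to do the installation, which looked impressive.
I downloaded the data by hand and extracted it myself.
Download the corpora from GitHub: https://github.com/nltk/nltk_data
After downloading, rename the packages folder inside it to nltk_data (every zip archive in it must be extracted), then put the folder in one of the locations nltk searches.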
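Extracting every archive by hand is the tedious part. The step can be sketched with the standard library as below. This is not the script mentioned above; the function name extract_all_archives is hypothetical, and it assumes each archive already contains a top-level folder named after the package, so extracting next to the archive gives the expected layout.

```python
import zipfile
from pathlib import Path


def extract_all_archives(data_root):
    """Extract every .zip under data_root into the folder that holds it.

    Returns the list of archives that were processed.
    """
    extracted = []
    for archive in sorted(Path(data_root).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # e.g. tokenizers/punkt.zip is unpacked into tokenizers/
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


# Hypothetical usage; adjust the path to wherever nltk_data was placed:
# extract_all_archives(Path.home() / "nltk_data")
```

The archives are left in place after extraction, so the function can be run again without harm.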
To find those locations, run a piece of code first (in IDLE); the error message lists the paths, for example:
    >>> import nltk
    >>> paragraph = "i am a good boy ! are you ok? hahaha i am fine"
    >>> words_list = nltk.word_tokenize(paragraph)
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        words_list = nltk.word_tokenize(paragraph)
      File "C:\Program Files\Python37\lib\site-packages\nltk\tokenize\__init__.py", line 144, in word_tokenize
        sentences = [text] if preserve_line else sent_tokenize(text, language)
      File "C:\Program Files\Python37\lib\site-packages\nltk\tokenize\__init__.py", line 105, in sent_tokenize
        tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
      File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 868, in load
        opened_resource = _open(resource_url)
      File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 993, in _open
        return find(path_, path + ['']).open()
      File "C:\Program Files\Python37\lib\site-packages\nltk\data.py", line 701, in find
        raise LookupError(resource_not_found)
    LookupError:
    **********************************************************************
      Resource punkt not found.
      Please use the NLTK Downloader to obtain the resource:

      >>> import nltk
      >>> nltk.download('punkt')

      For more information see: https://www.nltk.org/data.html

      Attempted to load tokenizers/punkt/english.pickle

      Searched in:
        - 'C:\\Users\\马静静/nltk_data'
        - 'C:\\Program Files\\Python37\\nltk_data'
        - 'C:\\Program Files\\Python37\\share\\nltk_data'
        - 'C:\\Program Files\\Python37\\lib\\nltk_data'
        - 'C:\\Users\\马静静\\AppData\\Roaming\\nltk_data'
        - 'C:\\nltk_data'
        - 'D:\\nltk_data'
        - 'E:\\nltk_data'
        - ''
    **********************************************************************
This part lists the paths where nltk_data can be placed:
    Searched in:
      - 'C:\\Users\\马静静/nltk_data'
      - 'C:\\Program Files\\Python37\\nltk_data'
      - 'C:\\Program Files\\Python37\\share\\nltk_data'
      - 'C:\\Program Files\\Python37\\lib\\nltk_data'
      - 'C:\\Users\\马静静\\AppData\\Roaming\\nltk_data'
      - 'C:\\nltk_data'
      - 'D:\\nltk_data'
      - 'E:\\nltk_data'
      - ''
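The search list is tried in order, and the first directory that holds the requested resource is used. The sketch below imitates that idea in a simplified form to check which candidate folder, if any, already contains a resource. It is not nltk's actual lookup code (the real search list is available as nltk.data.path); the function name and the candidate list are illustrative assumptions.

```python
from pathlib import Path


def first_dir_containing(search_dirs, resource):
    """Return the first directory in search_dirs that contains resource,
    or None if no directory does."""
    for directory in search_dirs:
        if not directory:
            continue  # skip the empty '' entry that appears in the list
        if (Path(directory) / resource).exists():
            return Path(directory)
    return None


# Assumed candidates, modelled on the list in the error message.
candidates = [Path.home() / "nltk_data", "C:\\nltk_data", ""]
print(first_dir_containing(candidates, "tokenizers/punkt"))
```

If this prints None after the folder has been copied, the folder is probably nested one level too deep or the archives have not been extracted yet.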
After extracting everything, I put the folder (the packages folder renamed to nltk_data) directly under C:\Users\马静静\.
Running the code again then worked:
    >>> import nltk
    >>> paragraph = "i am a good boy ! are you ok? hahaha i am fine"
    >>> words_list = nltk.word_tokenize(paragraph)
    >>> print(words_list)
    ['i', 'am', 'a', 'good', 'boy', '!', 'are', 'you', 'ok', '?', 'hahaha', 'i', 'am', 'fine']
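However, I then found that while the code ran without problems in IDLE and PyCharm, Jupyter Notebook still could not find the module.
After searching through more blog posts, I gathered that Jupyter and the python I had been using install packages to different paths, so a package installed for one is not found when the code runs in Jupyter.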
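One way to see the mismatch is to print which interpreter the notebook kernel is running and where it looks for packages, then build a pip command bound to that same interpreter. This is a minimal sketch of a common approach, not necessarily the fix from the blog post referenced in this article; the helper name pip_install_command is hypothetical.

```python
import sys


def pip_install_command(package):
    """Build a pip command tied to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


print("interpreter:", sys.executable)
print("first sys.path entries:", sys.path[:3])
print("install command:", " ".join(pip_install_command("nltk")))
```

If the interpreter printed in a notebook cell differs from the one printed in IDLE, the two are using separate package directories, and the printed command targets the one the notebook actually uses.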
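Reinstalling nltk, changing environment variables, and similar attempts still did not help. Later I found the solution in a blog post (python Jupyter不能导入外部包----解决方案, "Python Jupyter cannot import external packages: solution"); many thanks to its author.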