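How do I extract noun phrases from text using spaCy?
I am not referring to part-of-speech tags.
In the documentation, I cannot find anything about noun phrases or regular parse trees.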
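Solution:
If you want base NPs, i.e. NPs without coordination, prepositional phrases, or relative clauses, you can use the noun_chunks iterator on Doc and Span objects: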
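>>> # spacy.en is the import path from early spaCy releases; newer versions load a pipeline with spacy.load()
>>> from spacy.en import English
>>> nlp = English()
>>> doc = nlp(u'The cat and the dog sleep in the basket near the door.')
>>> for np in doc.noun_chunks:
...     np.text
...
u'The cat'
u'the dog'
u'the basket'
u'the door'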
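If you need something else, the best approach is to iterate over the words of the sentence and use the syntactic context to decide whether a word governs the phrase type you want. If it does, yield its subtree: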
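from spacy.symbols import nsubj, nsubjpass, dobj, iobj, pobj

# Dependency labels whose heads are treated as governing a noun phrase
np_labels = set([nsubj, nsubjpass, dobj, iobj, pobj])  # Probably others too

def iter_nps(doc):
    for word in doc:
        if word.dep in np_labels:
            # word.subtree is a generator over the tokens the word dominates
            yield word.subtree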
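Since word.subtree yields individual tokens rather than a ready-made phrase, it can help to convert each subtree into a Span. The following is a minimal sketch, assuming spaCy 3.x, where the Doc constructor accepts words, heads and deps. The helper name iter_np_spans, the shortened example sentence, and its hand-written dependency annotations are assumptions for illustration, not output from a trained model.

```python
import spacy
from spacy.tokens import Doc

# Same label set as np_labels above, written as strings (extend as needed).
NP_DEPS = {"nsubj", "nsubjpass", "dobj", "iobj", "pobj"}


def iter_np_spans(doc):
    """Yield a Span covering the subtree of every token with an NP-like label."""
    for word in doc:
        if word.dep_ in NP_DEPS:
            # left_edge and right_edge are the first and last tokens of the subtree.
            yield doc[word.left_edge.i : word.right_edge.i + 1]


# Hand-annotated parse (hypothetical, for illustration only), so the example
# runs without downloading a trained pipeline.
words = ["The", "cat", "sleeps", "in", "the", "basket", "near", "the", "door"]
heads = [1, 2, 2, 2, 5, 3, 5, 8, 6]  # absolute index of each token's head
deps = ["det", "nsubj", "ROOT", "prep", "det", "pobj", "prep", "det", "pobj"]

nlp = spacy.blank("en")
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps)
phrases = [span.text for span in iter_np_spans(doc)]
print(phrases)
```

Unlike noun_chunks, the span rooted at "basket" here also takes in the attached prepositional phrase, because the whole subtree is included.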