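I have started using the spaCy (spacy.io) NLP package and have worked through some of the introductory material and example code.
I am interested in the spacy.en.English.matcher.add method: what is the format for adding my own entities? The basic format is explained, but there seem to be other capabilities available. Can the entities I add be linked to DBpedia/Wikipedia entries or to other external links?
Here is the code from the spaCy matcher example: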
https://github.com/honnibal/spaCy/blob/master/examples/matcher_example.py
nlp.matcher.add(
    "GoogleNow",  # Entity ID: Not really used at the moment.
    "PRODUCT",  # Entity type: should be one of the types in the NER data
    {"wiki_en": "Google_Now"},  # Arbitrary attributes. Currently unused.
    [  # List of patterns that can be Surface Forms of the entity
        # This Surface Form matches "Google Now", verbatim
        [  # Each Surface Form is a list of Token Specifiers.
            {  # This Token Specifier matches tokens whose orth field is "Google"
                ORTH: "Google"
            },
            {  # This Token Specifier matches tokens whose orth field is "Now"
                ORTH: "Now"
            }
        ],
        [  # This Surface Form matches "google now", verbatim, and requires
           # "google" to have the NNP tag. This helps prevent the pattern from
           # matching cases like "I will google now to look up the time"
            {
                ORTH: "google",
                TAG: "NNP"
            },
            {
                ORTH: "now"
            }
        ]
    ]
)
Thanks for your time.
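Answer:
Sure, you can link them, but as far as I know spaCy does not do this out of the box. You can define your own category type (e.g. SINGER instead of PRODUCT; note that this is currently broken, so you may need to use v0.93) and then populate it with DBpedia entries (e.g. David Bowie instead of Google Now). Once you have done that, you can keep a mapping between the entities and their URLs. That last linking step may be automated at some point, as this comment suggests: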
{"wiki_en": "Google_Now"}, # Arbitrary attributes. Currently unused.
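Until something like that is built in, the entity-to-URL mapping can be kept by hand. Below is a minimal sketch in plain Python that does not call spaCy at all. The names ENTITY_ATTRS and entity_links are hypothetical, the DavidBowie entry is an illustrative assumption, and the URL prefixes are the usual English Wikipedia and DBpedia resource patterns. It assumes you record the same attribute dict you pass to nlp.matcher.add, keyed by entity ID.

```python
# Hypothetical lookup table: entity ID -> attribute dict, mirroring the
# {"wiki_en": ...} argument passed to nlp.matcher.add in the question.
ENTITY_ATTRS = {
    "GoogleNow": {"wiki_en": "Google_Now"},
    "DavidBowie": {"wiki_en": "David_Bowie"},  # illustrative entry
}


def entity_links(entity_id):
    """Return external URLs for a matched entity ID, or None if unknown."""
    attrs = ENTITY_ATTRS.get(entity_id)
    if attrs is None:
        return None
    title = attrs["wiki_en"]
    return {
        "wikipedia": "https://en.wikipedia.org/wiki/" + title,
        "dbpedia": "http://dbpedia.org/resource/" + title,
    }


if __name__ == "__main__":
    # Look up the links for an entity ID returned by your matching step.
    print(entity_links("DavidBowie"))
```

After the matcher reports an entity ID for a span, you would pass that ID to entity_links to get the external references; unknown IDs return None rather than raising.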