Research topics in NLP
-
Optical character recognition (OCR)
Given an image representing printed text, determine the corresponding text.
-
Question answering
Given a human-language question, determine its answer. Typical questions have a specific right answer (such as “What is the capital of Canada?”), but sometimes open-ended questions are also considered (such as “What is the meaning of life?”). Recent work has looked at even more complex questions.
-
Recognizing textual entailment
Given two text fragments, determine whether one being true entails the other, entails the other’s negation, or allows the other to be either true or false.
-
Relationship extraction
Given a chunk of text, identify the relationships among named entities (e.g. who is married to whom).
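As a rough illustration of the "who is married to whom" case, the sketch below uses a single hand-written pattern. The `NAME` pattern, the `married_to` relation label, and the sample sentence are assumptions made for this example; practical systems typically combine named-entity recognition with learned classifiers rather than one regular expression.

```python
import re

# Hypothetical pattern: one or more capitalized words treated as a person name.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Hypothetical surface pattern for a single relation type.
MARRIED_TO = re.compile(rf"({NAME}) is married to ({NAME})")

def extract_marriages(text):
    """Return (person_a, 'married_to', person_b) triples found in text."""
    return [(a, "married_to", b) for a, b in MARRIED_TO.findall(text)]

# Made-up sample text, for illustration only.
sample = "Alice Smith is married to Bob Jones. Carol is married to Dave."
for triple in extract_marriages(sample):
    print(triple)
```

A pattern like this only finds one phrasing of one relation, which is why it is a sketch of the task's input and output rather than a usable extractor.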
-
Sentiment analysis
Extract subjective information, usually from a set of documents, often using online reviews to determine “polarity” about specific objects. It is especially useful for identifying trends of public opinion in social media for the purpose of marketing.
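One simple way to picture “polarity” is a lexicon count: add one for each positive word and subtract one for each negative word. The sketch below assumes two tiny hand-picked word lists and made-up reviews; it ignores negation, sarcasm, and domain vocabulary, all of which real systems have to handle.

```python
import re

# Toy polarity lexicon; both word lists are illustrative assumptions.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}

def polarity(review):
    """Return 'positive', 'negative', or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up reviews, for illustration only.
reviews = [
    "Great battery and excellent screen.",
    "Terrible support, the charger arrived broken.",
    "It is a phone.",
]
for review in reviews:
    print(polarity(review), "-", review)
```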
-
Topic segmentation and recognition
Given a chunk of text, separate it into segments, each of which is devoted to a topic, and identify the topic of each segment.
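The two steps (split, then label) can be sketched with a crude lexical-cohesion heuristic: start a new segment wherever two adjacent sentences share almost no content words, then label each segment with its most frequent content word. The stopword list, the `threshold` value, and the sample sentences are assumptions for this example, and the heuristic is only loosely inspired by cohesion-based segmenters.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "at", "is", "was", "were"}

def content_words(sentence):
    """Lowercased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def segment(sentences, threshold=0.1):
    """Group sentences; break where adjacent Jaccard word overlap falls below threshold."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        a, b = set(content_words(prev)), set(content_words(cur))
        union = a | b
        similarity = len(a & b) / len(union) if union else 0.0
        if similarity < threshold:
            segments.append([cur])
        else:
            segments[-1].append(cur)
    return segments

def topic_label(segment_sentences):
    """Label a segment with its most frequent content word (None if it has none)."""
    counts = Counter(w for s in segment_sentences for w in content_words(s))
    return counts.most_common(1)[0][0] if counts else None

# Made-up sentences, for illustration only.
sentences = [
    "The team won the match.",
    "The match was played at the home stadium.",
    "Interest rates rose again.",
    "Higher rates hurt borrowers.",
]
for seg in segment(sentences):
    print(topic_label(seg), "->", seg)
```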
-
Word sense disambiguation
Many words have more than one meaning; we have to select the meaning that makes the most sense in context. For this problem, we are typically given a list of words and associated word senses, e.g. from a dictionary or from an online resource such as WordNet.
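A minimal sketch of this setup, in the spirit of a simplified Lesk-style overlap heuristic: given a word and its candidate senses, pick the sense whose dictionary gloss shares the most words with the surrounding context. The `SENSES` inventory, its glosses, and the stopword list are hand-written assumptions for this example and are not taken from WordNet.

```python
import re

# Hypothetical sense inventory: word -> {sense label: gloss}.
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts deposits and lends money",
        "river_edge": "sloping land beside a river or other body of water",
    }
}
# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "and", "the", "of", "or", "that", "to", "on", "in", "we", "she"}

def tokens(text):
    """Set of lowercased words with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(word, context):
    """Pick the sense whose gloss overlaps most with the context words."""
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(tokens(gloss) & context_words)
        if overlap > best_overlap:  # ties keep the earlier-listed sense
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("bank", "She went to the bank to deposit money"))
print(disambiguate("bank", "We sat on the bank of the river and watched the water"))
```

Exact word matching is brittle (here "deposit" does not match "deposits"), so practical variants usually add stemming or use learned representations.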
-
Automatic summarization
Produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, such as articles in the financial section of a newspaper.
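One of the simplest ways to approximate this is extractive: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences in their original order. The sketch below assumes a naive sentence splitter, a small stopword list, and a made-up sample text; it selects existing sentences rather than writing new ones.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "is", "was", "were"}

def words(sentence):
    """Lowercased words of a sentence with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in original order."""
    # Naive split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(w for s in sentences for w in words(s))

    def score(sentence):
        ws = words(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return " ".join(sentences[i] for i in keep)

# Made-up text, for illustration only.
text = (
    "Shares rose on Monday. "
    "The bank said shares rose because profits rose. "
    "Analysts were surprised."
)
print(summarize(text, n_sentences=1))
```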
-
Coreference resolution
Given a sentence or larger chunk of text, determine which words (“mentions”) refer to the same objects (“entities”). Anaphora resolution is a specific example of this task, and is specifically concerned with matching up pronouns with the nouns or names to which they refer. The more general task of coreference resolution also includes identifying so-called “bridging relationships” involving referring expressions. For example, in a sentence such as “He entered John’s house through the front door”, “the front door” is a referring expression, and the bridging relationship to be identified is the fact that the door being referred to is the front door of John’s house (rather than of some other structure that might also be referred to).
-
Discourse analysis
This rubric includes a number of related tasks. One task is identifying the discourse structure of connected text, i.e. the nature of the discourse relationships between sentences (e.g. elaboration, explanation, contrast). Another possible task is recognizing and classifying the speech acts in a chunk of text (e.g. yes-no question, content question, statement, assertion, etc.).
-
Speech recognition
Given a sound clip of a person or people speaking, determine the textual representation of the speech. This is the opposite of text to speech and is one of the extremely difficult problems colloquially termed “AI-complete” (see above). In natural speech there are hardly any pauses between successive words (see below). Note also that in most spoken languages, the sounds representing successive letters blend into each other in a process termed coarticulation, so the conversion of the analog signal to discrete characters can be a very difficult process.
-
Speech segmentation
Given a sound clip of a person or people speaking, separate it into words. A subtask of speech recognition and typically grouped with it.
-
Text to speech