NLU -- A New NLU Theory and Research Method Based on Information Architecture -- 2.2 Structure Chunk

The paper is available here: http://arxiv.org/abs/2010.12789
I am serializing the paper here in both English and Chinese. (The Chinese in the original posts is machine-translated, so if it reads awkwardly please refer to the English. My English, in turn, was learned over two months specifically to write this paper, so in short, both versions may contain a few bugs. I will keep updating the paper on arXiv; if you find problems, feel free to leave me a message.)

II. NEW CLASSIFICATION OF LEXICAL CHUNKS, INFORMATION ARCHITECTURE

B. Structure Chunk
Connections between data chunks can be interpreted as various kinds of relations, for example, the representation relation (defining relation), the inclusion relation, the causal relation, and so on. In this section, we will discuss the defining relation and the inclusion relation represented by the structure chunks “Be,” “Of,” “’s,” and “Have,” and elaborate on two corresponding data reading modes: the defining reading mode and the set reading mode.

  1) “Be”: In dictionaries, “Be” and “Have” are classified as verbs, which is against the verb classification rule introduced in the data chunk section. From natural language usage habits, we can observe that the data chunk after “Be” is always used to represent or define the data chunk before “Be”. Although the author has said that an entity is defined and represented by all of its attribute information, people neither need to nor can fully describe an entity in natural language usage. Usually, people just partially describe an entity by giving one or several of its attributes. In natural language, the data chunk after “Be” is used to describe and define the data chunk before “Be”.


  2) “Of”, “’s”, and “Have”: These words interpret the connections as inclusion relations between data chunks. We assume that there are no equal sets in a memory-graph, so the inclusion relation between sets can be written as:

     A ⊃ B  ⟺  “A has B.” or “A’s B.”
     B ⊂ A  ⟺  “B of A.”
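
As a rough illustration, here is a minimal Python sketch of how these two relations could be stored as edges in a memory-graph; the MemoryGraph class, its method names, and the Queen examples are my own assumptions, not the paper's implementation:

    # Minimal sketch: a memory-graph whose edges carry the two relations
    # interpreted by the structure chunks "Be", "Of", "'s", and "Have".
    # Class and method names are illustrative assumptions only.
    class MemoryGraph:
        def __init__(self):
            self.defines = {}   # "A is B": B partially defines/represents A
            self.contains = {}  # A ⊃ B:   "A has B." / "A's B." / "B of A."

        def add_be(self, a, b):
            """Record the defining relation expressed by "Be"."""
            self.defines.setdefault(a, set()).add(b)

        def add_have(self, a, b):
            """Record the inclusion relation expressed by "Have"/"Of"/"'s"."""
            self.contains.setdefault(a, set()).add(b)

    g = MemoryGraph()
    g.add_be("Queen", "a monarch")     # "The Queen is a monarch."
    g.add_have("Queen", "crown")       # "The Queen has a crown." / "the Queen's crown"
    print(g.contains)                  # {'Queen': {'crown'}}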

Now, we can simulate how the brain reads the data in the memory-graph (Fig. 5) in the two modes below; the read-out sentences are listed in Table III.
Figure 5. Queen’s memory-graph

a) Defining Reading Mode (DRM): or we can call it the full reading mode, which reads out the whole data chunks from a selected reading chunk. As we can see in Table III, though articles and punctuation are missing in those sentences, we can still roughly get the information they convey. In the defining reading mode, if the connotation of the following attribute chunk (the data chunk after “Be”) can cover that of the current ASC, the ASC can be omitted in the expression. This phenomenon is rare in Modern Chinese because the classification coding system of Modern Chinese is more efficient.
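A minimal sketch of the defining reading mode follows, using a toy fragment of the Queen's memory-graph; the entries below are hypothetical stand-ins, not the actual data of Fig. 5 or Table III. Starting from the selected reading chunk, every reachable data chunk is read out in full:

    # Defining Reading Mode (DRM) sketch: read out the whole data chunks
    # reachable from a selected reading chunk. Graph data is hypothetical.
    defines = {"Queen": ["a monarch"],
               "crown": ["a golden headdress"]}   # "X is Y" edges
    contains = {"Queen": ["crown"]}               # "X has Y" edges

    def defining_read(chunk):
        """Full reading mode: emit every sentence reachable from `chunk`."""
        sentences = [f"{chunk} is {d}" for d in defines.get(chunk, [])]
        for member in contains.get(chunk, []):
            sentences.append(f"{chunk} has {member}")
            sentences.extend(defining_read(member))   # keep reading the whole chunk
        return sentences

    print(defining_read("Queen"))
    # ['Queen is a monarch', 'Queen has crown', 'crown is a golden headdress']

As with the sentences in Table III, articles and punctuation are missing from the read-out strings, yet the conveyed information remains roughly recoverable.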

b) Set Reading Mode (SRM): this is an inclusion-relation reading mode, which reads only the sets and the inclusion relations between them from a selected reading chunk. We can find three typical patterns in natural language usage (a small read-out sketch follows this list):

  • Pattern 1: “b” is an element of “ASC” and b is not a set; thus, we only read the “A ⊃ ASC” part from the chain “A ⊃ ASC ∋ b” and omit the “b ∈ ASC” part. E.g., items 1, 2, 3, and 4 in Table III.

  • Pattern 2: in this case, we select A as the target description object; then we can choose either the “A ⊃ ASC” part or the “A ⊃ B” part to read out, according to the requirement. E.g., items 5 and 6 in Table III.

  • Pattern 3: because there is only a virtual connection between A and B, structurally there is no inclusion relation between them, just the virtual abstract relation. This virtual abstract relation can be unidirectional or bidirectional. In this case, we can choose the “ASC ⊂ A” part to read out. E.g., items 7a, 8a, 9a, and 9c in Table III.
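
Below is a minimal sketch of the set reading mode under the same hypothetical graph fragment as before: only the sets and the inclusion relations between them are read out, and the caller chooses the “A has B” or “B of A” surface form. The data and function names are assumptions for illustration only:

    # Set Reading Mode (SRM) sketch: unlike DRM, only inclusion relations are
    # read out; defining information ("X is Y") is skipped. Data is hypothetical.
    defines = {"Queen": ["a monarch"]}            # ignored by SRM
    contains = {"Queen": ["clothing"],            # A ⊃ ASC
                "clothing": ["white hat"]}        # member of ASC

    def set_read(chunk, form="has"):
        """Read out only the inclusion relations from the selected reading chunk."""
        out = []
        for member in contains.get(chunk, []):
            if form == "has":
                out.append(f"{chunk} has {member}")   # A ⊃ B  ->  "A has B."
            else:
                out.append(f"{member} of {chunk}")    # B ⊂ A  ->  "B of A."
        return out

    print(set_read("Queen"))           # ['Queen has clothing']
    print(set_read("Queen", "of"))     # ['clothing of Queen']
    print(set_read("clothing"))        # ['clothing has white hat']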

3) Punctuation and conjunctions: Punctuation and conjunctions segment information on a larger scale, for the following purposes (a small segmentation sketch follows this list):

  • Distinguish the task type of segmented information chunks (e.g., periods, question marks, exclamation marks).
  • Distinguish the processing order of segmented information chunks (e.g., punctuation: commas, semicolons, parentheses).
  • Indicate structural relations of segmented information chunks (e.g., conjunctions: and, or, therefore, so).
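
As a rough sketch of this segmentation, a sentence can be split at punctuation marks and conjunctions into information chunks, with sentence-final marks labelling the task type. The category names and the example sentence are my own assumptions, not the paper's terminology:

    # Sketch: segment text into information chunks at punctuation/conjunctions.
    # Category names are illustrative assumptions only.
    import re

    TASK_TYPE   = {".": "statement", "?": "question", "!": "exclamation"}
    ORDERING    = {",", ";", "(", ")"}               # marks that affect processing order
    CONNECTIVES = {"and", "or", "therefore", "so"}   # conjunctions: structural relations

    def segment(text):
        """Split `text` into chunks and tag the task type set by the final mark."""
        tokens = re.findall(r"\w+|[.,;?!()]", text)
        chunks, current = [], []
        for tok in tokens:
            if tok in TASK_TYPE:
                chunks.append({"chunk": " ".join(current), "task": TASK_TYPE[tok]})
                current = []
            elif tok in ORDERING or tok in CONNECTIVES:
                if current:
                    chunks.append({"chunk": " ".join(current), "task": None})
                current = []
            else:
                current.append(tok)
        return chunks

    print(segment("The Queen has a crown, and the crown is golden."))
    # [{'chunk': 'The Queen has a crown', 'task': None},
    #  {'chunk': 'the crown is golden', 'task': 'statement'}]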

Besides the structure chunks listed in Fig. 2, sentence structures and paragraph structures segment information chunks on an even larger scale, and the structural relations represented by those structural forms are more diversified. Sentence structures will be elaborated on in the task chunk section.
