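The paper is available here: http://arxiv.org/abs/2010.12789
The bilingual (Chinese and English) version of the paper is being serialized here in installments.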
INTRODUCTION (3) - The relation between Natural Language and Information
介绍3 – 自然语言与信息之间的关系
Let’s think again about the relation between natural language and information. Natural language is a tool that human beings use to communicate with the outside world; it is also one of the carriers of information. Information constantly changes its carriers (or forms) in the process of transmission and processing:
- in the real world, information exists in the form of electromagnetic waves, chemical molecules and ions, the kinetic energy of air, etc.; human beings perceive this information, transform it into biochemical and bioelectric signals, and then process it in the brain [1];
- when people communicate with the outside world, information is transformed into natural language, body gestures, body movements, etc.;
- in a CPU, information is processed in the form of binary code.
The variety and density of information carried by natural language are much higher than in these other forms, because natural language abstracts and conceptualizes information to a high degree. In the next section, we will take the first step toward NLU by classifying words according to the information they contain.
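As a rough illustration of the last carrier in the list above, the following minimal Python sketch shows the same piece of text moving between two forms: a natural-language string and the binary code a machine stores. The helper names `to_binary` and `from_binary` are hypothetical, and the choice of UTF-8 as the encoding is an assumption made for this example; neither comes from the paper.

```python
def to_binary(text: str) -> str:
    """Encode text as UTF-8 (an assumed encoding) and render each byte as 8 bits."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_binary(bits: str) -> str:
    """Invert to_binary: parse space-separated 8-bit groups back into text."""
    data = bytes(int(group, 2) for group in bits.split())
    return data.decode("utf-8")


if __name__ == "__main__":
    for sample in ("Hi", "自然语言"):
        encoded = to_binary(sample)
        # The carrier changes (characters vs. bits); the round trip recovers the text.
        print(sample, "->", encoded, "->", from_binary(encoded))
```

The round trip only changes the carrier: the bit string holds exactly what the characters hold, and the meaning a reader attaches to the words is not represented anywhere in the bits themselves.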
让我们再想想自然语言和信息之间的关系。自然语言是人类与外界交流的工具;它也是信息的载体之一。信息在传输和处理过程中不断变化其载体(或形式):
- 在现实世界中,信息以电磁波、化学分子和离子、空气的动能等形式存在;人类感知这些信息,并将它们转化为生化和生物电信号,然后在大脑中处理它们[1];
- 当人们与外界交流时,信息就转化为自然语言、身体姿态、身体动作等形式;
- 在 CPU 中,信息以二进制代码的形式处理。
显然,自然语言中所含信息的种类和密度远远高于其他形式。自然语言对信息进行了高度抽象和概念化。下一节,我们将采取NLU的第一步,根据词汇中所包含的信息的不同对词汇进行分类。