Natural Language Processing Terminology

Definitions are taken from *

Tokenization is the process of breaking a stream of text up into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining.
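
As a rough illustration, a minimal regex-based tokenizer in Python might look like the sketch below; the pattern and the example sentence are illustrative only and do not come from any particular library.

```python
import re

def tokenize(text: str) -> list[str]:
    # Keep runs of word characters as tokens and emit punctuation separately;
    # a real tokenizer would also handle contractions, URLs, hyphenation, etc.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokenization breaks text into tokens, doesn't it?"))
# ['Tokenization', 'breaks', 'text', 'into', 'tokens', ',', 'doesn', "'", 't', 'it', '?']
```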

 

Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. It is closely related to part-of-speech (POS) tagging, which labels each token with its grammatical category.
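
A minimal sketch of parsing against a formal grammar, here a toy recursive-descent parser for simple arithmetic; the grammar, the token format, and the function names are invented for illustration.

```python
# Toy formal grammar:
#   expr : term (('+' | '-') term)*
#   term : NUMBER

def parse_expr(tokens, pos=0):
    # Parse a term, then fold any following '+'/'-' operations into a tree.
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_term(tokens, pos):
    tok = tokens[pos]
    if not tok.isdigit():
        raise SyntaxError(f"expected a number, got {tok!r}")
    return ("num", int(tok)), pos + 1

tree, _ = parse_expr("7 + 3 - 2".split())
print(tree)  # ('-', ('+', ('num', 7), ('num', 3)), ('num', 2))
```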

 

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics.
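
A naive sentence-segmentation sketch in Python is shown below; the splitting rule is an assumption for illustration, and real segmenters must also handle abbreviations, quotes, decimal points, and similar cases.

```python
import re

def split_sentences(text: str) -> list[str]:
    # Split after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

print(split_sentences("Text segmentation splits text into units. "
                      "Sentences are one such unit! Topics are another."))
# ['Text segmentation splits text into units.',
#  'Sentences are one such unit!',
#  'Topics are another.']
```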

 

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
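
A minimal lexer sketch for a tiny, made-up expression language is given below; the token names and the language itself are hypothetical examples, not a real compiler's specification.

```python
import re

# Ordered token specification; earlier alternatives win ties.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(source: str):
    # Convert the character sequence into (token_type, lexeme) pairs,
    # discarding whitespace.
    for match in MASTER.finditer(source):
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()

print(list(lex("x = 42 + y")))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y')]
```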
