Title:
Developing a Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblogs
Keywords:
LIWC,
Traditional Chinese,
Simplified Chinese,
microblog,
text analysis.
Abstract:
The words people use can reveal their emotional states, intentions, thinking styles, individual differences, and more. LIWC (Linguistic Inquiry and Word Count) has been widely used for psychological text analysis, and its dictionary is the core of the tool. A Traditional Chinese version of the LIWC dictionary, translated from the English LIWC dictionary, has already been released. However, Simplified Chinese, the most widely used written form of Chinese, differs subtly from Traditional Chinese. Furthermore, both the English LIWC dictionary and the Traditional Chinese dictionary were developed for relatively formal text. Microblogs have become increasingly popular in China, yet the original LIWC dictionaries pay little attention to words that are popular on microblogs, which limits their applicability to microblog text analysis. In this study, a Simplified Chinese LIWC dictionary was built on the LIWC categories. After the Traditional Chinese dictionary was converted into Simplified Chinese, the five thousand words most frequently used on microblogs were added to the dictionary. Four psychology graduate students rated whether each word belonged in a given category, and the reliability and validity of the Simplified Chinese LIWC dictionary were tested using these four judges' ratings. The new dictionary can support future text analysis of microblogs.
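The abstract does not say which statistic was used to quantify agreement among the four judges. As one illustration only, the sketch below computes Fleiss' kappa, a standard chance-corrected agreement measure for more than two raters, over binary "belongs / does not belong" ratings. The function name, the 0/1 encoding, and the toy ratings are assumptions for this example, not the authors' procedure or data.

```python
import math
from typing import Sequence


def fleiss_kappa_binary(ratings: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for binary judgments (hypothetical helper).

    ratings[i][r] is 1 if judge r says word i belongs in the category, else 0.
    Assumption: every word is rated by the same number of judges (>= 2).
    """
    if not ratings:
        raise ValueError("no ratings given")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each word needs the same number (>= 2) of judges")

    n_items = len(ratings)
    pair_agreement = 0.0
    yes_total = 0
    for row in ratings:
        yes = sum(row)
        no = n_raters - yes
        yes_total += yes
        # Share of judge pairs that agree on this word.
        pair_agreement += (yes * (yes - 1) + no * (no - 1)) / (n_raters * (n_raters - 1))

    p_observed = pair_agreement / n_items
    p_yes = yes_total / (n_items * n_raters)
    p_chance = p_yes ** 2 + (1 - p_yes) ** 2  # agreement expected by chance
    if p_chance == 1.0:
        return math.nan  # every rating identical: kappa is undefined
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Toy ratings from four hypothetical judges on four words.
    ratings = [[1, 1, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
    print(round(fleiss_kappa_binary(ratings), 3))  # 0.407
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when most words are clearly in or out of a category; simple percent agreement would be another, less conservative option.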
Conclusion:
The percentage of words captured by the SCLIWC dictionary indicates that word usage in internet environments such as Sina Weibo is much more diverse than in formal text materials [9, 14]. The SCMBWC dictionary raises the percentage of captured words by more than 10 percent, capturing notably more words in the psychological processes category and its subcategories, such as social, affective, and cognitive processes. The internal reliability and external validity of both dictionaries are supported by the ratings of four groups of judges. SCLIWC bridges the gap between the LIWC software and Simplified Chinese. Moreover, SCMBWC suggests a promising approach for further analysis of Simplified Chinese text in various internet environments.
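The "percentage of words captured" is a dictionary coverage rate: the share of words in a text that appear in the dictionary. A minimal sketch of LIWC-style counting follows, assuming the text has already been segmented into words (Chinese is written without spaces, so a word segmenter would be needed first) and using a hypothetical toy dictionary rather than actual SCLIWC or SCMBWC entries. Adding one microblog slang word to the toy dictionary raises coverage, illustrating the kind of gain SCMBWC targets.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy dictionary (word -> LIWC-style categories).
# Illustrative only; these are NOT entries from SCLIWC or SCMBWC.
TOY_DICT: Dict[str, Set[str]] = {
    "我": {"pronoun"},             # "I"
    "开心": {"affect", "posemo"},  # "happy"
    "朋友": {"social"},            # "friend"
    "觉得": {"cognitive"},         # "think / feel"
}


def analyze(tokens: Iterable[str],
            dictionary: Dict[str, Set[str]]) -> Tuple[float, Dict[str, float]]:
    """Return (coverage %, {category: % of all tokens}).

    Assumes `tokens` is already word-segmented text.
    """
    words = list(tokens)
    total = len(words)
    if total == 0:
        return 0.0, {}
    matched = 0
    category_counts: Counter = Counter()
    for word in words:
        cats = dictionary.get(word)
        if cats:  # word is in the dictionary
            matched += 1
            category_counts.update(cats)  # one count per category the word belongs to
    coverage = 100.0 * matched / total
    percentages = {cat: 100.0 * n / total for cat, n in category_counts.items()}
    return coverage, percentages


if __name__ == "__main__":
    post = ["我", "觉得", "这个", "很", "给力"]  # "I think this is awesome"
    # Hypothetical microblog-style extension: add the internet slang word 给力.
    extended = {**TOY_DICT, "给力": {"affect", "posemo"}}
    print(analyze(post, TOY_DICT)[0])  # 40.0
    print(analyze(post, extended)[0])  # 60.0
```

Category percentages are taken over all words in the text, not only matched words, which is how a word can count toward several categories while coverage counts it once.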