Sentiment Prediction for Internet News Readers

張昭憲

Department of Information Management, Tamkang University

No. 151, Yingzhuan Rd., Tamsui Dist., New Taipei City

jschang@mail.tku.edu.tw

 

沈育信

Department of Information Management, Tamkang University

No. 151, Yingzhuan Rd., Tamsui Dist., New Taipei City

ahsinshen@gmail.com

 

 

Abstract

With the rapid growth of online social networks, people have become accustomed to posting their opinions on the Internet, leaving a vast number of digital footprints. Collecting these posts and applying systematic sentiment analysis makes it possible to learn quickly what the public thinks or favors. Besides social communities and instant messaging platforms, Internet news sites are an important channel of public participation. If the emotional reaction of most readers could be predicted, decision makers in public and private domains (government, elections, entertainment, sports, and so on) could adjust a news item before releasing it in order to attract more attention and more positive feedback. Unlike conventional sentiment analysis, however, predicting the emotion of news readers is inherently difficult: Internet news contains many newly coined words, reader sentiment changes quickly, and the wording of a news item is only weakly related to readers' reactions. A different approach is therefore required. To this end, this study develops a reader-sentiment prediction method that predicts the public emotion a news item is likely to provoke after publication. First, all news items under a given emotion category are collected and segmented with N-gram analysis. A ranked table of the terms used most frequently in these articles is then compiled and used to build the reader-emotion prediction models. To improve prediction accuracy, three similarity measures are used to assign the news item under test to a reader-emotion category. To verify the method, 193,489 news items gathered from Yahoo! Kimo News over nearly one year are used in experiments. The results show that the proposed method achieves good prediction accuracy when enough related news is available for building the emotion models. Accuracy also improves markedly as the number of days of news gathering increases, although the length of time a news topic stays popular must be taken into account. In addition, when major breaking news occurs, controlling the point in time at which the model is built, for example by using a shorter data-gathering period, yields better prediction results. These results indicate that the proposed method is effective and that, if applied to news releases in various domains, it can provide useful decision support.

Keywords: sentiment analysis, N-gram, classification, text mining, Internet news
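
The pipeline summarized in the abstract (N-gram segmentation, a ranked frequent-term table per emotion category, and similarity-based classification) can be illustrated with a minimal Python sketch. The abstract does not name the three similarity measures, so the cosine, Jaccard, and rank-based measures below are stand-ins chosen for illustration only. The function names, the `top_k` cut-off, the n-gram lengths, and the toy English corpus are likewise assumptions and do not come from the paper.

```python
from collections import Counter
import math

def char_ngrams(text, n_values=(2, 3)):
    """Yield overlapping character n-grams after removing whitespace."""
    chars = "".join(text.split())
    for n in n_values:
        for i in range(len(chars) - n + 1):
            yield chars[i:i + n]

def build_profile(docs, top_k=300):
    """Count n-grams over all docs of one emotion category; keep the top_k (assumed cut-off)."""
    counts = Counter()
    for doc in docs:
        counts.update(char_ngrams(doc))
    return dict(counts.most_common(top_k))

def cosine(a, b):
    """Cosine similarity between two n-gram frequency tables."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Overlap of the two n-gram sets, ignoring frequencies."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0

def rank_similarity(a, b):
    """One minus the normalised rank displacement between the two ranked tables."""
    rank_a = {g: r for r, g in enumerate(sorted(a, key=a.get, reverse=True))}
    rank_b = {g: r for r, g in enumerate(sorted(b, key=b.get, reverse=True))}
    if not rank_a or not rank_b:
        return 0.0
    max_out = max(len(rank_a), len(rank_b))  # penalty for an n-gram missing from b
    dist = sum(abs(r - rank_b[g]) if g in rank_b else max_out for g, r in rank_a.items())
    return 1.0 - dist / (max_out * len(rank_a))

def predict(text, profiles, measure=cosine):
    """Return the emotion label whose profile is most similar to the text."""
    doc = build_profile([text])
    return max(profiles, key=lambda label: measure(doc, profiles[label]))

# Toy corpus (hypothetical labels and headlines, for illustration only).
toy_profiles = {
    "happy": build_profile(["team wins the championship final", "fans celebrate the big win"]),
    "angry": build_profile(["prices rise again this month", "commuters protest the fare rise"]),
}
for measure in (cosine, jaccard, rank_similarity):
    print(measure.__name__, predict("team wins the championship", toy_profiles, measure))
```

Character n-grams are used here because they need no dictionary, which suits text containing many newly coined words. A real system would build the profiles from news items collected over a chosen time window, in line with the abstract's observations about gathering period and modeling time.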