A Survey of Opinion Mining

From IDSlab


Introduction

Opinion Mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics. It is concerned not with the topic a document is about, but with the opinion it expresses.


Opinion
private state - a state that is not open to objective observation or verification [Quirk et al., 1985]


Opinion mining can help answer a variety of questions:

  • What is the general opinion on the proposed tax reform?
  • How is popular opinion on the presidential candidates evolving?
  • Which of our customers are unsatisfied? Why?


The ultimate goal of opinion mining is to extract useful information from sizeable amounts of feedback data, automatically or semi-automatically, and to present that information in the way that best serves the chosen objectives.
Image:OM01.jpg


Main Topics of OM

  • Development of linguistic resources for OM
  • Polarity Classification of text
  • Extraction of opinion expression from text


Related Area

  • Information Retrieval
  • Computational Linguistics
  • Appraisal Analysis
  • Data Mining
    - Feature Extraction
  • Document Summarization
    - Morpheme Analysis
    - Syntactic Analysis

Data Mining? and IR?


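Traditional IR has focused mainly on the question "What is this document about?",
whereas Opinion Mining focuses on "the opinion the author wants to express about the target".

Its methods are similar to those used in IR and Data Mining.
However, where those established approaches rely mostly on the statistical distribution of the data,
Opinion Mining also makes heavy use of linguistic approaches.


By the following definitions of Data Mining, Opinion Mining should probably be regarded as a sub-discipline of Data Mining.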

"the nontrivial extraction of implicit, previously unknown, and potentially useful information from data"
- W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992). "Knowledge Discovery in Databases: An Overview". AI Magazine: pp. 213-228
"the science of extracting useful information from large data sets or databases"
- D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA


Terms used for this discipline in the literature
Sentiment Analysis, Sentiment Classification, Opinion Extraction


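  • Opinion Mining has both similarities to and differences from the following fields.
Field | Similarities | Differences
Traditional Data Mining | (blank) | (blank)
Traditional IR | (blank) | IR wants to know what a document is about, and its main task is finding documents on a given topic. OM wants to know what the author says about the topic, and how.
Document Summarization | (blank) | (blank)
Document Classification | (blank) | (blank)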

Main Topics of OM

Development of linguistic resources

Goal
Develop linguistic resources for OM that define sentiment-related properties of terms.
Main tasks
Determining term orientation, as in deciding whether a given subjective term has a positive or a negative slant
Determining term subjectivity, as in deciding whether a given term has a subjective or an objective (i.e. neutral, or factual) nature
Determining the strength of term attitude (either orientation or subjectivity), as in attributing to terms (real-valued) degrees of positivity or negativity
Example
good, excellent, best - positive terms
bad, wrong, worst - negative terms
vertical, yellow, liquid - objective terms


Sentiment Properties

  • Orientation
    - Hatzivassiloglou and McKeown, 1997
    - Turney and Littman, 2003
    - Esuli and Sebastiani, 2005
  • Subjectivity
    - Baroni and Vegnaduzzo, 2004
    - Esuli and Sebastiani, 2006
  • Combinational Approaches
    - Esuli and Sebastiani, 2006
    - Martin and White, 2005
    - Appraisal Theory


Development

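Methods for building linguistic resources
Most are semi-automatic: they start from seed terms and progressively extend the definition of sentiment properties.
Expansion using association rules
Expansion using the synonyms and antonyms in WordNet
Assumptions
A synonym has the same orientation as the original term.
An antonym has the opposite orientation to the original term.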
  • Esuli and Sebastiani, 2006
  • Martin and White, 2005
  • Hatzivassiloglou and McKeown, 1997
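
The synonym/antonym assumptions above can be illustrated with a minimal sketch. It does not reproduce any of the cited systems: the small `SYNONYMS` and `ANTONYMS` dictionaries are made-up stand-ins for WordNet, and the function names are hypothetical. Starting from one seed term, labels are propagated breadth-first, keeping the sign across synonym links and flipping it across antonym links.

```python
from collections import deque

# Toy thesaurus standing in for WordNet; these entries are made up for the example.
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
    "poor": ["lousy"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def _neighbours(term, relation):
    """Return terms linked to `term`, treating the relation as symmetric."""
    linked = set(relation.get(term, []))
    linked.update(head for head, tails in relation.items() if term in tails)
    return linked


def expand_orientation(seeds):
    """Propagate +1/-1 orientation labels from seed terms by breadth-first search.

    A synonym inherits the label of the term it was reached from; an antonym
    gets the opposite label. The first label assigned to a term is kept.
    """
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        for sign, relation in ((1, SYNONYMS), (-1, ANTONYMS)):
            for other in sorted(_neighbours(term, relation)):
                if other not in labels:
                    labels[other] = sign * labels[term]
                    queue.append(other)
    return labels


lexicon = expand_orientation({"good": 1})
for term in sorted(lexicon):
    print(term, "positive" if lexicon[term] > 0 else "negative")
```

In this toy graph the single seed "good" labels eight terms. Keeping the first label found is a simplification; a real resource would need some way to resolve conflicting paths.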


Linguistic resources

Extraction of opinion expression from text

Goal
Identify expressions of opinion in text, and eventually:
- their sentiment properties (e.g. orientation, strength)
- who is expressing them
- their target

Wiebe et al., 2005


Polarity Classification

Classification of a text
Determining the overall sentiment properties of a text
Classification of a sentence or a small unit
Split the text into sentences or smaller units and determine the sentiment properties of each unit
The overall sentiment properties of the text can then be estimated from the unit-level results

Turney, 2002 Pang et al., 2002 Pang and Lee, 2004 Whitelaw et al., 2005
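
As a rough illustration of the sentence-level route, the sketch below scores each sentence with a tiny hand-made lexicon and takes the sign of the summed sentence polarities as the text-level estimate. The lexicon entries, the naive sentence splitter and the sign-of-sum aggregation are all assumptions made for this example, not the method of any paper cited above.

```python
import re

# Tiny hand-made orientation lexicon (illustrative entries only).
LEXICON = {"good": 1, "excellent": 1, "best": 1, "bad": -1, "wrong": -1, "worst": -1}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def sentence_polarity(sentence):
    """Sum the orientation of known terms and reduce the sum to -1, 0 or +1."""
    score = sum(LEXICON.get(tok, 0) for tok in re.findall(r"[a-z']+", sentence.lower()))
    return (score > 0) - (score < 0)


def text_polarity(text):
    """Estimate text-level polarity as the sign of the summed sentence polarities."""
    per_sentence = [sentence_polarity(s) for s in split_sentences(text)]
    total = sum(per_sentence)
    return per_sentence, (total > 0) - (total < 0)


review = "The screen is excellent. The battery is bad. Overall it is the best phone I have owned."
per_sentence, overall = text_polarity(review)
print(per_sentence, overall)  # [1, -1, 1] 1
```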


Auto Classification Methods
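  • Automatic classification using a term-level vector model
The most basic method, but error-prone because it does not properly exploit syntax
Not strongly dependent on a dictionary
Negation such as "not" and "no" is hard to handle
SVM, Naive Bayes
  • Extending the vector model with term bigrams or n-grams
Captures the semantics that arise from word combinations involving "not", "no" and the like, but still ignores syntax
  • Extracting and using the key subtrees of a syntax tree
If syntactic analysis performs poorly, the semantics that can be extracted are limited
However, because accurate semantics can be extracted, this gives the most accurate results when it works well
Polarity can be classified per sentence, or per defined minimal sentiment unit, rather than per document
Requires a well-defined dictionary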
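
The difference between the unigram and bigram vector models on negation can be seen in a small sketch. It uses a hand-rolled multinomial Naive Bayes with add-one smoothing and equal class priors, trained on five made-up phrases. The training data and function names are assumptions for illustration only, and a model this small says nothing about real accuracy.

```python
import math
from collections import Counter

# Made-up training phrases with polarity labels.
TRAIN = [
    ("this is good", "pos"),
    ("really good", "pos"),
    ("not bad", "pos"),
    ("this is bad", "neg"),
    ("this is not good", "neg"),
]


def ngrams(tokens, n):
    """Return the list of n-grams (joined with '_') in a token list."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, use_bigrams):
    """Unigram features, optionally extended with bigram features."""
    tokens = text.lower().split()
    return tokens + ngrams(tokens, 2) if use_bigrams else tokens


def train(docs, use_bigrams):
    """Collect per-class feature counts for a multinomial Naive Bayes model."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in docs:
        counts[label].update(features(text, use_bigrams))
    return counts


def classify(text, counts, use_bigrams):
    """Pick the class with the higher add-one-smoothed log-likelihood (equal priors assumed)."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)
        scores[label] = sum(math.log((c[f] + 1) / total)
                            for f in features(text, use_bigrams) if f in vocab)
    return max(scores, key=scores.get)


for use_bigrams in (False, True):
    model = train(TRAIN, use_bigrams)
    print("bigrams" if use_bigrams else "unigrams", classify("not good", model, use_bigrams))
```

On this data the unigram model labels "not good" as pos, because "good" is mostly seen in positive phrases. The bigram model labels it neg, because the feature `not_good` only occurs in a negative phrase. Neither model looks at syntax.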

Applications

Product Review Analysis

Goal
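Mine opinions from product reviews, index them, and visualize the results for the user.
Eventually, the feature information gained from reviews can help customers find the products they want faster.
Rather than analyzing a single review, the aim is to analyze many reviews and provide useful information about a product or about a product category.
Example: extract the main features within a product category, score customer preference for each product on each feature, and provide a list sorted by those scores.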
Operations
Syntactic analysis
Sentiment feature extraction
With or without dictionary
Extracting minimal sentiment unit (text, sentence or subtree of parsed tree)
Allocating sentiment properties to the unit
Indexing the sentiment units
Aggregation and summarization of sentiment expressions
Visualization
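
The aggregation step can be sketched as follows, assuming the earlier steps have already produced sentiment units of the form (product, feature, orientation). The product names, the units and the simple averaging are invented for this example. Each product gets a per-feature score, and products are then listed from most to least favourable on a chosen feature.

```python
from collections import defaultdict

# Hypothetical extracted sentiment units: (product, feature, orientation).
UNITS = [
    ("camera_a", "battery", 1), ("camera_a", "battery", 1), ("camera_a", "lens", -1),
    ("camera_b", "battery", -1), ("camera_b", "battery", 1), ("camera_b", "lens", 1),
    ("camera_c", "battery", -1), ("camera_c", "lens", 1), ("camera_c", "lens", 1),
]


def feature_scores(units):
    """Average the orientation of all units per (feature, product) pair."""
    sums = defaultdict(lambda: [0, 0])  # (feature, product) -> [sum, count]
    for product, feature, orientation in units:
        entry = sums[(feature, product)]
        entry[0] += orientation
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def rank_products(units, feature):
    """Return products sorted from most to least favourable on one feature."""
    scores = {p: s for (f, p), s in feature_scores(units).items() if f == feature}
    return sorted(scores, key=lambda p: (-scores[p], p))


print(rank_products(UNITS, "battery"))  # ['camera_a', 'camera_b', 'camera_c']
print(rank_products(UNITS, "lens"))     # ['camera_b', 'camera_c', 'camera_a']
```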
Evaluation
Precision and Recall of sentiment unit extraction
Overall time savings for customers
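
Precision and recall of sentiment unit extraction can be computed by comparing the extracted units with a hand-annotated gold standard. The sketch below treats a unit as a (feature, orientation) pair and counts only exact matches as correct. The example units are invented, and exact matching is an assumption of this sketch.

```python
def precision_recall(extracted, gold):
    """Compute precision and recall of extracted units against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (feature, orientation) units for one review.
gold = {("battery", "favorable"), ("lens", "unfavorable"),
        ("price", "favorable"), ("size", "favorable")}
extracted = {("battery", "favorable"), ("lens", "favorable"), ("price", "favorable")}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```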


Mining the peanut gallery Dave et al., 2003

Mining opinion features in customer reviews Hu et al., 2004

Mining and summarizing customer reviews Hu et al., 2004

Deeper sentiment analysis using machine translation technology, Kanayama et al. 2004

  • Uses the syntactic parse tree generated during machine translation, together with the sentiment properties of each word
  • Sentiment unit
<sentiment unit> ::= <sentiment> <predicate> <argument>+ <surface>
<sentiment> ::= favorable | unfavorable | question | request
<predicate> ::= <word> <feature>*
<argument> ::= <word> <feature>*
<surface> ::= <string>
  • Operation
Full parsing and top-down tree matching
Extracting informative noun phrase
Disambiguation of sentiment polarity
Aggregation of synonymous expressions
  • Evaluation
Precision - Weak precision, Strong precision
Recall
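
One way to read the sentiment unit grammar above is as a small record type. The sketch below is only an illustration of the grammar as listed here, not code from Kanayama et al.; the class names and the example unit are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Sentiment(Enum):
    """The four alternatives of <sentiment>."""
    FAVORABLE = "favorable"
    UNFAVORABLE = "unfavorable"
    QUESTION = "question"
    REQUEST = "request"


@dataclass
class Term:
    """A <predicate> or <argument>: one word plus zero or more features."""
    word: str
    features: List[str] = field(default_factory=list)


@dataclass
class SentimentUnit:
    sentiment: Sentiment
    predicate: Term
    arguments: List[Term]
    surface: str

    def __post_init__(self):
        # The grammar writes <argument>+, so at least one argument is required.
        if not self.arguments:
            raise ValueError("a sentiment unit needs at least one argument")


unit = SentimentUnit(
    sentiment=Sentiment.FAVORABLE,
    predicate=Term("excellent"),
    arguments=[Term("lens")],
    surface="The lens is excellent.",
)
print(unit.sentiment.value, unit.predicate.word, [a.word for a in unit.arguments])
```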

Extracting product features and opinions from reviews Popescu, Etzioni, 2005

Red Opal Scaffidi et al., 2007

  • Feature extraction
Uses nouns or noun phrases, selected by lemma frequency
  • Using scores in review
  • Web application prototype
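
A lemma-frequency approach to feature extraction can be sketched as below. This is not Red Opal's actual scoring. It is a simplified illustration that keeps noun lemmas whose relative frequency in reviews is much higher than in general text. The baseline frequencies, the threshold `min_ratio` and the input nouns are all made up, and POS tagging and lemmatization are assumed to have been done upstream.

```python
from collections import Counter

# Made-up baseline: relative frequency of each noun lemma in general English text.
BASELINE_FREQ = {"battery": 0.00002, "lens": 0.00001, "day": 0.0010, "thing": 0.0012}

# Noun lemmas already extracted from reviews of one product category
# (POS tagging and lemmatization are assumed to have happened upstream).
review_nouns = ["battery", "lens", "day", "battery", "thing", "lens", "battery", "day"]


def candidate_features(nouns, baseline, min_ratio=1000.0):
    """Keep nouns whose relative frequency in reviews is at least `min_ratio` times the baseline."""
    counts = Counter(nouns)
    total = sum(counts.values())
    ratios = {n: (c / total) / baseline[n] for n, c in counts.items() if n in baseline}
    kept = [n for n in ratios if ratios[n] >= min_ratio]
    return sorted(kept, key=lambda n: (-ratios[n], n))


print(candidate_features(review_nouns, BASELINE_FREQ))  # ['lens', 'battery']
```

Common nouns such as "day" and "thing" are frequent in reviews but also frequent in general text, so they fall below the threshold.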

Others

Conclusion

  • Opinion Mining is a field that has been steadily maturing in recent years.
  • Opinion Mining is moving away from polarity classification and toward extracting minimal sentiment units, condensing them efficiently, and presenting them to the user.
text level analysis -> sentiment unit level
statistical approach -> statistical + linguistic approach
  • The weight of computational linguistics is growing.
  • Most recent papers use WordNet to extend or exploit sentiment properties, so practical use in languages other than English requires a WordNet-like dictionary that defines word meanings and the relations between them.
  • The field is becoming language-dependent.

References

  • Ana-Maria Popescu, Oren Etzioni: Extracting Product Features and Opinions from Reviews. HLT/EMNLP 2005
  • Andrea Esuli, Fabrizio Sebastiani: Determining Term Subjectivity and Term Orientation for Opinion Mining. EACL 2006
  • Appraisal Analysis, http://www.grammatics.com/appraisal/index.html
  • BOIY, Erik, HENS, Pieter, DESCHACHT, K., MOENS, Marie-Francine, Automatic Sentiment Analysis of On-line Text. In Proceedings of the 11th International Conference on Electronic Publishing, Openness in Digital Publishing: Awareness, Discovery & Access, June 13-15, 2007, Vienna Austria
  • Casey Whitelaw, Navendu Garg, Shlomo Argamon: Using appraisal groups for sentiment analysis. CIKM 2005: 625-631
  • Christopher Scaffidi, Kevin Bierhoff, Eric Chang, Mikhael Felker, Herman Ng, Chun Jin: Red Opal: product-feature scoring from reviews. ACM Conference on Electronic Commerce 2007: 182-191
  • Esuli, A. and Sebastiani, F. Determining the semantic orientation of terms through gloss analysis. In Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management, Bremen, DE. 2005
  • Esuli, A. and Sebastiani, F. Determining term subjectivity and term orientation for opinion mining. In Proceedings of EACL-06, the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, IT. (2006a).
  • Esuli, A. and Sebastiani, F. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC-06, the 5th Conference on Language Resources and Evaluation, Genova, IT, pp. 417-422. (2006b).
  • Gamon, M., A. Aue. 2005. Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the Workshop on Feature Engineering for Machine Learning in Natural Language Processing at ACL 2005., pages 57-64
  • Hatzivassiloglou, V. and Mckeown, K., 1997. Predicting the Semantic Orientation of Adjectives. In Proc. of 35th ACL/8th EACL.
  • Hiroshi Kanayama, Tetsuya Nasukawa, & Hideo Watanabe: Deeper sentiment analysis using machine translation technology. Coling 2004: 20th International Conference on Computational Linguistics, 23-27 August 2004, University of Geneva, Switzerland, Proceedings; 7pp
  • Jeonghee Yi, Wayne Niblack: Sentiment Mining in WebFountain. ICDE 2005: 1073-1083
  • Kushal Dave, Steve Lawrence, David M. Pennock: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. WWW 2003: 519-528
  • Minqing Hu, Bing Liu: Mining and summarizing customer reviews. KDD 2004: 168-177
  • Minqing Hu, Bing Liu: Mining Opinion Features in Customer Reviews. To appear in AAAI’04, 2004.
  • Shlomo Argamon, Kenneth Bloom, Andrea Esuli, Fabrizio Sebastiani, "Automatically Determining Attitude Type and Force for Sentiment Analysis"
  • Theresa Wilson, Janyce Wiebe, Paul Hoffmann: Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. HLT/EMNLP 2005
  • Tony Mullen, Incorporating topic information into sentiment analysis models, 2004 ACL Poster Session, Barcelona
