A Survey of Opinion Mining
From IDSlab
Introduction
Opinion Mining (OM) is a recent discipline at the crossroads of information retrieval and computational linguistics. It is concerned not with the topic a document is about, but with the opinion it expresses.
- Opinion
- private state - a state that is not open to objective observation or verification [Quirk et al., 1985]
Opinion mining can help answer a variety of questions, for example:
- What is the general opinion on the proposed tax reform?
- How is popular opinion on the presidential candidates evolving?
- Which of our customers are unsatisfied? Why?
The ultimate goal of opinion mining is to extract useful information from reliable amounts of feedback data, automatically or semi-automatically, and to present that information in the way that most effectively serves the chosen objectives.
Main Topics of OM
- Development of linguistic resources for OM
- Polarity Classification of text
- Extraction of opinion expression from text
Related Areas
- Information Retrieval
- Computational Linguistics
- Appraisal Analysis
- Data Mining
- - Feature Extraction
- Document Summarization
- - Morpheme Analysis
- - Syntactic Analysis
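Whereas traditional IR has focused mainly on finding "what a document is about,"
Opinion Mining focuses on finding "the opinion the author wants to express about a target."
Its methods are similar to those used in IR and Data Mining,
but whereas those approaches rely mostly on the statistical distribution of the data,
Opinion Mining also makes heavy use of linguistic approaches.
Under the following definitions of Data Mining, Opinion Mining should probably be regarded as a sub-discipline of Data Mining.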
- "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data"
- - W. Frawley and G. Piatetsky-Shapiro and C. Matheus (Fall 1992). "Knowledge Discovery in Databases: An Overview". AI Magazine: pp. 213-228
- "the science of extracting useful information from large data sets or databases"
- - D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press, Cambridge, MA
- Terms used for this discipline in the literature
- Sentiment Analysis, Sentiment Classification, Opinion Extraction
- Opinion Mining has both similarities to and differences from the following fields.
| Field | Similarities | Differences |
|---|---|---|
| Traditional Data Mining | | |
| Traditional IR | | |
| Document Summarization | | |
| Document Classification | | |
Main Topics of OM
Development of linguistic resources
- Goal
- Develop linguistic resources for OM that define sentiment-related properties of terms.
- Main tasks
- Determining term orientation, as in deciding if a given subjective term has a positive or negative slant
- Determining term subjectivity, as in deciding whether a given term has a subjective or an objective (i.e. neutral, or factual) nature
- Determining the strength of term attitude (either orientation or subjectivity), as in attributing to terms (real-valued) degrees of positivity or negativity
- Example
- good, excellent, best - positive terms
- bad, wrong, worst - negative terms
- vertical, yellow, liquid - objective terms
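The three term-level properties above (subjectivity, orientation, strength) can be pictured as fields of a lexicon entry. The following is only a minimal sketch: the `TermEntry` schema, the `term_score` helper, and the strength values are assumptions made for illustration and are not taken from any of the cited resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermEntry:
    """Sentiment-related properties of one term (hypothetical schema)."""
    subjective: bool   # term subjectivity: subjective vs. objective
    orientation: int   # +1 positive, -1 negative, 0 for objective terms
    strength: float    # degree of positivity/negativity in [0, 1]


# Toy lexicon; the strength values are made up for illustration only.
LEXICON = {
    "good":      TermEntry(True, +1, 0.5),
    "excellent": TermEntry(True, +1, 0.9),
    "bad":       TermEntry(True, -1, 0.5),
    "worst":     TermEntry(True, -1, 1.0),
    "yellow":    TermEntry(False, 0, 0.0),
    "liquid":    TermEntry(False, 0, 0.0),
}


def term_score(term: str) -> float:
    """Signed strength of a term; unknown and objective terms score 0.0."""
    entry = LEXICON.get(term.lower())
    if entry is None or not entry.subjective:
        return 0.0
    return entry.orientation * entry.strength
```

Keeping subjectivity separate from orientation lets an objective term such as "yellow" be distinguished from a term that is simply missing from the lexicon, even though both score zero here.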
Sentiment Properties
- Orientation
- - Hatzivassiloglou and McKeown, 1997
- - Turney and Littman, 2003
- - Esuli and Sebastiani, 2005
- Subjectivity
- - Baroni and Vegnaduzzo, 2004
- - Esuli and Sebastiani, 2006
- Combinational Approaches
- - Esuli and Sebastiani, 2006
- - Martin and White, 2005
- - Appraisal Theory
Development
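- Methods for building linguistic resources
- Most are semi-automatic: starting from seed terms, they expand the set of terms whose sentiment properties are defined.
- Expansion using association rules
- Expansion using the synonyms and antonyms in WordNet
- Assumptions
- A synonym has the same orientation as the original term.
- An antonym has the opposite orientation to the original term.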
- Esuli and Sebastiani, 2006
- Martin and White, 2005
- Hatzivassiloglou and McKeown, 1997
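The seed-based expansion and its two assumptions can be sketched as a breadth-first propagation over synonym and antonym links. This is an illustrative sketch, not the procedure of any cited paper: the small `SYNONYMS` and `ANTONYMS` dictionaries are toy stand-ins for WordNet, and `expand_orientation` is a hypothetical helper name.

```python
from collections import deque

# Toy synonym/antonym relations standing in for WordNet (illustrative only).
SYNONYMS = {
    "good": ["fine", "nice"],
    "nice": ["pleasant"],
    "bad": ["poor"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def _symmetric(relation):
    """Make a relation symmetric: if b is listed under a, list a under b too."""
    result = {}
    for a, others in relation.items():
        for b in others:
            result.setdefault(a, set()).add(b)
            result.setdefault(b, set()).add(a)
    return result


def expand_orientation(seeds, synonyms, antonyms):
    """Propagate +1/-1 orientation from seed terms by breadth-first search.

    Synonyms inherit the orientation of the term they are reached from;
    antonyms get the opposite one. The first label a term receives is kept.
    """
    syn, ant = _symmetric(synonyms), _symmetric(antonyms)
    orientation = dict(seeds)
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        neighbours = [(t, orientation[term]) for t in sorted(syn.get(term, ()))]
        neighbours += [(t, -orientation[term]) for t in sorted(ant.get(term, ()))]
        for other, label in neighbours:
            if other not in orientation:
                orientation[other] = label
                queue.append(other)
    return orientation


lexicon = expand_orientation({"good": +1}, SYNONYMS, ANTONYMS)
```

With a single positive seed, "poor" ends up negative because it is a synonym of "bad", which is an antonym of "good". A real system would also have to handle word senses and conflicting labels, which this sketch ignores by keeping the first label found.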
Linguistic resources
Extraction of opinion expression from text
- Goal
- Identify expressions of opinion in text, and eventually:
- - their sentiment properties (e.g. orientation, strength);
- - who is expressing them
- - their target
Wiebe et al., 2005
Polarity Classification
- Classification of a text
- Determining the overall sentiment properties of a text
- Classification of a sentence or a small unit
- Split the text into sentences or smaller units and determine the sentiment properties of each unit
- The overall sentiment properties of the text can then be estimated from these unit-level results
Turney, 2002; Pang et al., 2002; Pang and Lee, 2004; Whitelaw et al., 2005
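As a rough illustration of estimating text-level polarity from sentence-level polarity, the sketch below scores each sentence with a tiny orientation lexicon and takes the sign of the sum. The lexicon, the naive sentence splitter, and the sign-of-sum aggregation rule are all assumptions chosen for brevity; none of the cited papers is being reproduced here.

```python
import re

# Toy orientation lexicon (illustrative): +1 positive, -1 negative.
ORIENTATION = {"good": 1, "excellent": 1, "best": 1,
               "bad": -1, "wrong": -1, "worst": -1}


def split_sentences(text):
    """Naive splitter on '.', '!' and '?'; real systems need a proper tokenizer."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_polarity(sentence):
    """Sum of term orientations in the sentence, reduced to +1, -1 or 0."""
    words = re.findall(r"[a-z]+", sentence.lower())
    total = sum(ORIENTATION.get(w, 0) for w in words)
    return (total > 0) - (total < 0)


def text_polarity(text):
    """Overall polarity as the sign of the summed sentence polarities."""
    total = sum(sentence_polarity(s) for s in split_sentences(text))
    return (total > 0) - (total < 0)


review = ("The screen is excellent. The battery is bad. "
          "Overall it is the best phone I have owned!")
per_sentence = [sentence_polarity(s) for s in split_sentences(review)]
overall = text_polarity(review)
```

Keeping the per-sentence results around, rather than only the overall label, is what makes it possible to report which parts of a text are positive and which are negative.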
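- Automatic Classification Methods
- Automatic classification using a term-level vector model
- The most basic method; because it makes no real use of syntax, it produces many errors
- Not heavily dependent on a dictionary
- Hard to handle negation such as "not" and "no"
- SVM, Naive Bayes
- Extending the vector model with term bigrams or n-grams
- Takes into account the semantics that arise from word combinations involving "not", "no", etc., but still ignores syntax
- Extracting and using the key subtrees of a syntax tree
- If syntactic analysis performs poorly, there is a limit to how well semantics can be extracted
- However, because precise semantics can be extracted, this can give the most accurate results when it works well
- Polarity can be classified per sentence, or per defined minimal sentiment unit, rather than per document
- Requires a well-defined dictionary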
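To make the difference between a term-level vector model and its bigram extension concrete, here is a small sketch of a multinomial Naive Bayes classifier over bag-of-n-gram features, written in plain Python. The training sentences, the `ngram_features` helper, and the `NaiveBayes` class are hypothetical and exist only for illustration; the cited work uses far larger corpora and, in some cases, SVMs instead.

```python
import math
from collections import Counter, defaultdict


def ngram_features(text, n_max=1):
    """Bag (Counter) of all 1..n_max-grams of the lowercased, space-split text."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over n-gram bags."""

    def __init__(self, n_max=1):
        self.n_max = n_max
        self.class_docs = Counter()                 # label -> number of documents
        self.feature_counts = defaultdict(Counter)  # label -> feature counts
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            bag = ngram_features(text, self.n_max)
            self.class_docs[label] += 1
            self.feature_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, text):
        bag = ngram_features(text, self.n_max)
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_docs):
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_docs[label] / total_docs)
            for feat, k in bag.items():
                if feat in self.vocab:  # features never seen in training are skipped
                    score += k * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny made-up training set.
TEXTS = ["good camera", "great battery", "good lens",
         "bad camera", "not good battery", "bad lens"]
LABELS = ["pos", "pos", "pos", "neg", "neg", "neg"]

unigram_model = NaiveBayes(n_max=1).fit(TEXTS, LABELS)
bigram_model = NaiveBayes(n_max=2).fit(TEXTS, LABELS)
```

With `n_max=2` the phrase "not good" becomes a feature of its own, so the negation is visible to the classifier as a unit. Neither model looks at syntax, which is the limitation the subtree-based methods in the list try to address.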
Applications
Product Review Analysis
- Goal
- Mine opinions from product reviews, index them, and visualize the results for the user.
- Eventually, feature information gained from reviews can help customers find the products they want faster
- Rather than analyzing a single review, the aim is to analyze many reviews and provide useful information about a product or a product category
- Example: extract the main features within a product category, score customer preference for each product on each feature, and provide a list sorted by those scores
- Operations
- Syntactic analysis
- Sentiment feature extraction
- With or without dictionary
- Extracting minimal sentiment unit (text, sentence or subtree of parsed tree)
- Allocating sentiment properties to the unit
- Indexing the sentiment units
- Aggregation and summarization of sentiment expressions
- Visualization
- Evaluation
- Precision and Recall of sentiment unit extraction
- Overall time savings for customers
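The aggregation step, and the per-feature ranking described in the goal, might look roughly like the sketch below. It assumes that sentiment units have already been extracted and reduced to `(product, feature, orientation)` triples; the data, the averaging rule, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical extracted sentiment units: (product, feature, orientation).
UNITS = [
    ("camera A", "battery", +1),
    ("camera A", "battery", +1),
    ("camera A", "lens", -1),
    ("camera B", "battery", -1),
    ("camera B", "battery", +1),
    ("camera B", "lens", +1),
]


def feature_scores(units):
    """Average orientation per (feature, product) pair, in [-1, 1]."""
    totals = defaultdict(lambda: [0, 0])  # (feature, product) -> [sum, count]
    for product, feature, orientation in units:
        entry = totals[(feature, product)]
        entry[0] += orientation
        entry[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def rank_products(units, feature):
    """Products sorted by descending score for one feature (ties by name)."""
    scores = feature_scores(units)
    ranked = [(p, score) for (f, p), score in scores.items() if f == feature]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


battery_ranking = rank_products(UNITS, "battery")
lens_ranking = rank_products(UNITS, "lens")
```

A plain average treats every review equally. Weighting by review helpfulness or recency would be a natural variation, but that goes beyond what the text above describes.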
Mining the peanut gallery Dave et al., 2003
Mining opinion features in customer reviews Hu and Liu, 2004
Mining and summarizing customer reviews Hu and Liu, 2004
Deeper sentiment analysis using machine translation technology Kanayama et al., 2004
- Uses the syntactic parse tree produced during machine translation, together with the sentiment properties of each word
- Sentiment unit
<sentiment unit> ::= <sentiment> <predicate> <argument>+ <surface>
<sentiment> ::= favorable | unfavorable | question | request
<predicate> ::= <word> <feature>*
<argument> ::= <word> <feature>*
<surface> ::= <string>
- Operation
- Full parsing and top-down tree matching
- Extracting informative noun phrase
- Disambiguation of sentiment polarity
- Aggregation of synonymous expressions
- Evaluation
- Precision - Weak precision, Strong precision
- Recall
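The sentiment unit grammar listed above maps naturally onto a small record type. The sketch below mirrors the grammar's productions one by one. The class names, the use of plain strings for features, and the example values are assumptions for illustration and do not come from Kanayama et al.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Sentiment(Enum):
    """<sentiment> ::= favorable | unfavorable | question | request"""
    FAVORABLE = "favorable"
    UNFAVORABLE = "unfavorable"
    QUESTION = "question"
    REQUEST = "request"


@dataclass
class Node:
    """<word> <feature>*  (the shape shared by <predicate> and <argument>)."""
    word: str
    features: List[str] = field(default_factory=list)


@dataclass
class SentimentUnit:
    """<sentiment> <predicate> <argument>+ <surface>"""
    sentiment: Sentiment
    predicate: Node
    arguments: List[Node]
    surface: str

    def __post_init__(self):
        # The grammar requires at least one argument (<argument>+).
        if not self.arguments:
            raise ValueError("a sentiment unit needs at least one argument")


# Hypothetical example; the "subject" feature label is made up.
unit = SentimentUnit(
    sentiment=Sentiment.FAVORABLE,
    predicate=Node("excellent"),
    arguments=[Node("lens", ["subject"])],
    surface="The lens is excellent.",
)
```

Keeping the surface string next to the structured fields means an aggregated result can always be traced back to the original wording in the review.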
Extracting product features and opinions from reviews Popescu, Etzioni, 2005
Red Opal Scaffidi et al., 2007
- Feature extraction
- Use noun or noun phrase with lemma-frequency
- Using scores in review
- Web application prototype
Others
Conclusion
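- Opinion Mining is a field that has been steadily maturing in recent years.
- Opinion Mining is moving away from polarity classification and toward extracting minimal sentiment units, condensing them efficiently, and presenting them to the user.
- text level analysis -> sentiment unit level
- statistical approach -> statistical + linguistic approach
- The weight of computational linguistics is growing
- Most recent papers use WordNet to expand or exploit sentiment properties, so practical use in languages other than English also requires a dictionary that, like WordNet, defines the meanings of words and the relations between them
- The field is becoming language-dependent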
References
- Ana-Maria Popescu, Oren Etzioni: Extracting Product Features and Opinions from Reviews. HLT/EMNLP 2005
- Andrea Esuli, Fabrizio Sebastiani: Determining Term Subjectivity and Term Orientation for Opinion Mining. EACL 2006
- Appraisal Analysis, http://www.grammatics.com/appraisal/index.html
- Erik Boiy, Pieter Hens, K. Deschacht, Marie-Francine Moens: Automatic Sentiment Analysis of On-line Text. In Proceedings of the 11th International Conference on Electronic Publishing, Openness in Digital Publishing: Awareness, Discovery & Access, June 13-15, 2007, Vienna, Austria
- Casey Whitelaw, Navendu Garg, Shlomo Argamon: Using appraisal groups for sentiment analysis. CIKM 2005: 625-631
- Christopher Scaffidi, Kevin Bierhoff, Eric Chang, Mikhael Felker, Herman Ng, Chun Jin: Red Opal: product-feature scoring from reviews. ACM Conference on Electronic Commerce 2007: 182-191
- Esuli, A. and Sebastiani, F. Determining the semantic orientation of terms through gloss analysis. In Proceedings of CIKM-05, the ACM SIGIR Conference on Information and Knowledge Management, Bremen, DE. 2005
- Esuli, A. and Sebastiani, F. Determining term subjectivity and term orientation for opinion mining. In Proceedings of EACL-06, the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, IT. (2006a).
- Esuli, A. and Sebastiani, F. SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC-06, the 5th Conference on Language Resources and Evaluation, Genova, IT, pp. 417-422. (2006b).
- Gamon, M., A. Aue. 2005. Automatic identification of sentiment vocabulary: exploiting low association with known sentiment terms. In Proceedings of the Workshop on Feature Engineering for Machine Learning in Natural Language Processing at ACL 2005., pages 57-64
- Hatzivassiloglou, V. and Mckeown, K., 1997. Predicting the Semantic Orientation of Adjectives. In Proc. of 35th ACL/8th EACL.
- Hiroshi Kanayama, Tetsuya Nasukawa, & Hideo Watanabe: Deeper sentiment analysis using machine translation technology. Coling 2004: 20th International Conference on Computational Linguistics, 23-27 August 2004, University of Geneva, Switzerland, Proceedings; 7pp
- Jeonghee Yi, Wayne Niblack: Sentiment Mining in WebFountain. ICDE 2005: 1073-1083
- Kushal Dave, Steve Lawrence, David M. Pennock: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. WWW 2003: 519-528
- Minqing Hu, Bing Liu: Mining and summarizing customer reviews. KDD 2004: 168-177
- Minqing Hu, Bing Liu: Mining Opinion Features in Customer Reviews. AAAI 2004
- Shlomo Argamon, Kenneth Bloom, Andrea Esuli, Fabrizio Sebastiani: Automatically Determining Attitude Type and Force for Sentiment Analysis
- Theresa Wilson, Janyce Wiebe, Paul Hoffmann: Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. HLT/EMNLP 2005
- Tony Mullen, Incorporating topic information into sentiment analysis models, 2004 ACL Poster Session, Barcelona
Original Source Files
- Paper: A Survey of Opinion Mining.doc
- Presentation: A Survey of Opinion Mining.ppt