Intrinsic Evaluation of Grammatical Information within Word Embeddings
This work presents a proof-of-concept study of a framework for the intrinsic evaluation of continuous embeddings as used in NLP tasks. The evaluation method compares the geometry of such embeddings with that of ground-truth embeddings in a linguistically inspired, discrete feature space. Using model distillation (Hinton et al., 2015) as a means of extracting morphological information from models with no explicit morphological awareness (e.g. word-atomic models), we train multiple learner networks that do model morpheme composition, so as to compare the amount of grammatical information different models capture. We use Korean affixes as a case study, as they encode multiple types of linguistic information (phonological, syntactic, semantic, and pragmatic) and allow us to investigate specific types of linguistic generalizations that models may or may not be sensitive to.
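The distillation step referenced above trains a learner network to match a teacher model's output distribution. As a minimal, illustrative sketch (not the paper's implementation), the standard soft-target objective of Hinton et al. (2015) minimizes the KL divergence between temperature-softened teacher and student distributions:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2
    as in Hinton et al. (2015). Function names and the choice of
    T are illustrative assumptions, not taken from the paper."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T**2 * np.sum(p * np.log(p / q)))
```

The loss is zero when the student exactly reproduces the teacher's distribution and grows as the two diverge, which is what lets the learner absorb information the teacher encodes only implicitly.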