Tutorial on neural language models

Today I gave an extended tutorial on neural probabilistic language models and their applications to distributional semantics (slides available here). The talk took place at University College London (UCL), as part of the South England Statistical NLP Meetup @ UCL, which is organized by Prof. Sebastian Riedel, the lecturer heading the UCL Machine Reading research group, and by Dr. Andreas Vlachos, who is currently a post-doc at UCL.

The talk covered recent developments in statistical language models that go beyond n-grams and build on distributional semantics. (Language modeling consists of assigning probabilities to sentences by factorizing the joint likelihood of a sentence into the conditional likelihoods of each word given that word's history.) In these new language models, also called continuous space language models or neural probabilistic language models, each word is "embedded" into a low-dimensional vector-space representation that is learned as the language model is trained. By relying on very large corpora (millions or billions of words, such as the 3.2GB English Wikipedia corpus), these models achieve state-of-the-art perplexity and word error rates. Starting from neural probabilistic language models, I presented their extensions, including recurrent neural networks, log-bilinear models and continuous bag-of-words models, mentioned the Microsoft Sentence Completion challenge, and illustrated how these models are able to preserve semantic linguistic regularities such as:
{king} – {man} + {woman} = {queen}.
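The regularity above is usually recovered by simple vector arithmetic over the learned embeddings, followed by a nearest-neighbor search under cosine similarity. The sketch below illustrates the idea with tiny hand-picked 4-dimensional vectors (real models learn embeddings of tens to hundreds of dimensions from large corpora); the vocabulary and vector values are made up purely for illustration.

```python
import numpy as np

# Toy embeddings, hand-chosen so that the gender and royalty "directions"
# are visible; real embeddings are learned during language model training.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1, 0.7]),
    "man":   np.array([0.1, 0.8, 0.1, 0.6]),
    "woman": np.array([0.1, 0.8, 0.9, 0.6]),
    "queen": np.array([0.9, 0.8, 0.9, 0.7]),
    "apple": np.array([0.0, 0.1, 0.2, 0.0]),
}

def analogy(a, b, c):
    """Return the word whose vector is closest (cosine similarity)
    to vec(a) - vec(b) + vec(c), excluding the three query words."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    best, best_sim = None, -np.inf
    for word, vec in embeddings.items():
        if word in (a, b, c):  # standard practice: skip the query words
            continue
        sim = target @ vec / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("king", "man", "woman"))  # queen
```

Excluding the query words from the search matters in practice: the nearest neighbor of the offset vector is often one of the input words themselves.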