Dirk Gorissen, who, among many other things, organizes the London Big-O Algorithms Meetup series, invited me to give a talk at that meetup last Wednesday. I presented a shortened version of the extended tutorial on neural probabilistic language models that I gave last April at UCL. The slides are available here, and the abstract follows below. The other speaker was Dominic Steinitz, who presented a great hands-on tutorial on Gibbs sampling.
A recent development in statistical language modeling came about when distributional semantics (the idea of defining a word in relation to the words that surround it) met neural networks. Classical language modeling assigns probabilities to sentences by factorizing the joint likelihood of a sentence into the conditional likelihoods of each word given that word's context. Neural language models go further and "embed" each word in a low-dimensional vector-space representation that is learned as the language model is trained. When trained on very large corpora, these models can achieve state-of-the-art performance in many applications, such as speech recognition and sentence completion.
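To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of a Bengio-style feedforward neural language model: each context word is looked up in a learned embedding matrix, the embeddings are fed through a hidden layer, and a softmax over the vocabulary gives the conditional probability of the next word; the sentence probability is then the chain-rule product of these conditionals. This is only an illustrative sketch, not the specific models covered in the talk, and the toy vocabulary, layer sizes, and random parameters are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and sizes -- purely illustrative.
vocab = ["<s>", "the", "cat", "sat", "on", "mat", "</s>"]
word_to_id = {w: i for i, w in enumerate(vocab)}

V = len(vocab)   # vocabulary size
d = 8            # embedding dimension (the low-dimensional representation)
n = 2            # context size: predict word t from words t-2 and t-1
h = 16           # hidden layer size

# Parameters are random here; in a real model they are learned by gradient
# descent to maximize the likelihood of a training corpus.
C = rng.normal(scale=0.1, size=(V, d))        # word embedding matrix
W_h = rng.normal(scale=0.1, size=(n * d, h))  # context -> hidden
W_o = rng.normal(scale=0.1, size=(h, V))      # hidden -> vocabulary scores

def next_word_distribution(context_ids):
    """P(w | context) for every word w in the vocabulary."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenated embeddings
    hidden = np.tanh(x @ W_h)
    scores = hidden @ W_o
    exp = np.exp(scores - scores.max())              # numerically stable softmax
    return exp / exp.sum()

def sentence_log_likelihood(words):
    """Chain rule: sum_t log P(w_t | w_{t-n}, ..., w_{t-1})."""
    ids = [word_to_id[w] for w in words]
    padded = [word_to_id["<s>"]] * n + ids
    total = 0.0
    for t, w in enumerate(ids):
        context = padded[t:t + n]
        total += np.log(next_word_distribution(context)[w])
    return total

print(sentence_log_likelihood(["the", "cat", "sat", "on", "the", "mat", "</s>"]))
```

With learned parameters, the same log-likelihood computation is what lets such a model rank candidate completions of a sentence or rescore speech-recognition hypotheses.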