Practical Neural Networks for NLP: From Theory to Code

Chris Dyer, Yoav Goldberg, Graham Neubig

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


This tutorial aims to bring NLP researchers up to speed with current techniques in deep learning and neural networks, and to show them how they can turn their ideas into practical implementations. We will start with simple classification models (logistic regression and multilayer perceptrons) and cover more advanced patterns that come up in NLP, such as recurrent networks for sequence tagging and prediction problems, structured networks (e.g., compositional architectures based on syntax trees), structured output spaces (sequences and trees), attention for sequence-to-sequence transduction, and feature induction for complex algorithm states. A particular emphasis will be on learning to represent complex objects as recursive compositions of simpler objects. This representation will reflect standard objects in NLP, such as the composition of characters and morphemes into words, and of words into sentences and documents. In addition, new opportunities will be covered, such as learning to embed "algorithm states" like those used in transition-based parsing and other sequential structured prediction models (for which effective features may be difficult to engineer by hand).

Everything in the tutorial will be grounded in code: we will show how to program seemingly complex neural-net models using toolkits based on the computation-graph formalism. Computation graphs decompose complex computations into a DAG, with nodes representing inputs, target outputs, parameters, or (sub)differentiable functions (e.g., "tanh", "matrix multiply", and "softmax"), and edges representing data dependencies. These graphs can be run "forward" to make predictions and compute errors (e.g., log loss, squared error) and then "backward" to compute derivatives with respect to model parameters. In particular, we'll cover the Python bindings of the CNN library. CNN has been designed from the ground up for NLP applications, dynamically structured NNs, rapid prototyping, and a transparent data and execution model.
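
The forward and backward passes over a computation graph described above can be illustrated with a minimal sketch in plain Python. This is not the CNN library's API: the Node class and the helper functions add, mul, tanh, squared_error, and backward are hypothetical names introduced only for this example, and the graph is restricted to scalar values to keep it short.

```python
import math


class Node:
    """A scalar node in a toy computation graph (hypothetical, not the CNN API)."""

    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value        # result of the forward computation
        self.parents = parents    # edges: the nodes this node depends on
        self.grad_fns = grad_fns  # one local-derivative function per parent
        self.grad = 0.0           # d(loss)/d(this node), filled in by backward()


def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))


def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))


def tanh(a):
    t = math.tanh(a.value)
    return Node(t, (a,), (lambda g: g * (1.0 - t * t),))


def squared_error(pred, target):
    d = pred.value - target
    return Node(d * d, (pred,), (lambda g: g * 2.0 * d,))


def backward(loss):
    """Propagate derivatives from the loss node back to every ancestor."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: parents are appended before the node itself.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(loss)
    loss.grad = 1.0
    # Reversed post-order visits each node only after all of its consumers.
    for node in reversed(order):
        for parent, grad_fn in zip(node.parents, node.grad_fns):
            parent.grad += grad_fn(node.grad)


# Forward pass: build the graph for loss = (tanh(w * x + b) - y)^2.
w, b, x = Node(0.5), Node(-0.1), Node(2.0)
loss = squared_error(tanh(add(mul(w, x), b)), 1.0)

# Backward pass: derivatives of the loss with respect to the parameters.
backward(loss)
print("loss:", loss.value, "dL/dw:", w.grad, "dL/db:", b.grad)
```

Because the graph is built while the forward computation runs, a different structure (for example, one graph per sentence or per syntax tree) can be constructed for each input, which is the setting the abstract refers to as dynamically structured networks.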
Original language: English
Title of host publication: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Place of Publication: Austin, Texas
Publisher: Association for Computational Linguistics
State: Published - 1 Nov 2016

