Introduction to Neural Networks

Neural networks burst into the collective consciousness of computer science in 2012, when the University of Toronto won the ImageNet[1] Large Scale Visual Recognition Challenge with a convolutional neural network[2], smashing all existing benchmarks. In the same year, Andrew Ng worked with Google to build the largest neural network to date. Trained on 10 million YouTube videos, the network learnt to recognize objects, cats most famously, without ever being given labelled examples. Better and better results quickly followed, and computer vision was changed forever. More recently, neural networks have achieved state-of-the-art results in natural language processing, composed music[3], beaten the human world champion at Go[4], and now power Google’s search algorithm[5].

In their modern form, neural networks have existed since the 1980s. 1989 saw a breakthrough result when Yann LeCun successfully trained a neural network to recognize handwritten zip codes. Constrained by a lack of data and computing power, neural networks were superseded by other machine learning algorithms in the 1990s, and most researchers moved on. Their startling resurgence and rapid domination of the machine learning field would alone make neural networks of interest to any data scientist and machine learning researcher. However, it is their elegance, flexibility and generality that make them so fascinating to study. By giving up some control over what and how algorithms learn, computer science has gained the most powerful predictive algorithm in history.

This post is an introduction to neural networks for a reader with no background in neural networks or machine learning. It is the first in a series of four articles. It does assume a basic understanding of linear algebra and calculus: if you are comfortable with vectors, matrices, the dot product, matrix-vector multiplication, matrix-matrix multiplication, transposing vectors and matrices, element-wise operations, norms, partial derivatives and second-order partial derivatives, then you will be fine. By the end of this post, a reader should have a good understanding of what the fundamental components of a neural network are, how they are combined, how a neural network makes a prediction, and how it learns to produce good results. The post focuses on helping a reader develop good intuition for why different choices have been made in the design of neural networks, and contains an in-depth discussion of the role of the activation function. It is organized into four sections and is available as a PDF (link below):

  1. What is a neural network?
  2. Historical motivations for neural networks
  3. Role of the activation function
  4. How a neural network learns

Introduction To Neural Networks PDF

Later articles in this series will discuss regularization and provide a practical example of how to build your own neural network program in Python. Here is a quick preview of what is to come:

  • Regularization for Neural Networks: Explanation of regularization and an overview of the main techniques.
  • A Neural Network program in Python: Walk-through of a vectorized implementation of a general neural network program, with examples. With it you will be able to build an arbitrarily sized network and choose from different cost, activation, and parameter initialization functions.

[1] ImageNet challenge website

[2] See this technology review article and ImageNet Large Scale Visual Recognition Challenge, Russakovsky et al., 2015, for more details

[3] Jukedeck

[4] AlphaGo

[5] For more information on the history of machine learning and neural networks, watch Frank Chen’s video