Resources for learning about computer vision. They focus mostly on deep learning for computer vision but include the odd traditional computer vision paper. I update the list on an ongoing basis. See Neural Networks for general resources on deep learning.
Convolutional Networks
- Deep Learning for Computer Vision: Andrej Karpathy’s tutorial on Convolutional Networks from the Bay Area Deep Learning School. An excellent tutorial and 90 minutes well spent. His message “don’t be a hero” has stuck.
- ImageNet classification with Convolutional Neural Networks: THE paper which convinced the machine learning community that neural networks could achieve excellent results. Kicked off the deep learning renaissance.
- Visualizing convolutional neural networks (part 1): Keras blog post and Python code to visualize individual neurons in the VGG16 model.
- Visualizing convolutional neural networks (part 2): Deep visualization toolbox
- Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images: Important paper highlighting the ease of generating adversarial images which fool convolutional neural networks and the difficulty of training networks to robustly avoid these types of errors.
- Interactive tool for visualizing convolutional network features for Keras.
- Visualizing weights and intermediate outputs of a CNN in Keras: Great 12-minute video tutorial by Anuj Shah.
- Faster R-CNN: The successor to R-CNN and Fast R-CNN, this uses convolutional object detection and region proposal networks for almost real-time multi-object detection. The main competitor to YOLO.
- YOLO (You Only Look Once): Very fast multi-object detection. Capable of processing images at 40-90 fps! See here for an implementation in C.
- Generating image descriptions: Seminal paper by Fei-Fei Li and Andrej Karpathy, involving multi-modal training data. Google has open-sourced an implementation of their related work, Show and Tell. Finally, Microsoft Research came out with some really interesting work on open-ended Visual Question Answering.
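Several of the resources above visualize the filters and intermediate outputs of a convolutional network. As a rough illustration of what such an intermediate output (a feature map) is, here is a minimal pure-Python sketch of a single convolution followed by a ReLU. The function names, the toy image and the hand-picked edge filter are all assumptions made for this example; they are not taken from any of the linked papers or tutorials, and real networks learn their filters rather than having them written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is applied as-is, without flipping.
    """
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch currently under the kernel.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Clamp negative responses to zero, element by element."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical 4x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = relu(conv2d_valid(image, kernel))
for row in feature_map:
    print(row)
```

The printed feature map is `[3, 3, 0]` on both rows: the filter responds strongly where its window straddles the dark-to-bright edge and not at all over the uniformly bright region. The visualization tools listed above show the same kind of response maps, only for learned filters on real images.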
Traditional computer vision
- Distinctive Image Features from Scale-Invariant Keypoints: Classic paper from 2004 introducing the SIFT algorithm by David Lowe. Still widely used today.
I am always looking to learn more. Please send suggestions or comments to contact [at] learningmachinelearning [dot] org