References

[CN11]

A. Coates and A. Ng. The importance of encoding versus training with sparse coding and vector quantization. In Lise Getoor and Tobias Scheffer, editors, Proceedings of the 28th International Conference on Machine Learning (ICML-11), ICML '11, 921–928. New York, NY, USA, June 2011. ACM.

[CDF+04]

G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, 1–22. 2004.

[FvE91]

D. J. Felleman and D. C. van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1(1):1–47, 1991.

[Gir15]

Ross Girshick. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), ICCV '15, 1440–1448. Washington, DC, USA, 2015. IEEE Computer Society. URL: http://dx.doi.org/10.1109/ICCV.2015.169, doi:10.1109/ICCV.2015.169.

[GDDM14]

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. October 2014.

[GD07]

K. Grauman and T. Darrell. The pyramid match kernel: efficient learning with sets of features. J. Mach. Learn. Res., 8:725–760, May 2007. URL: http://dl.acm.org/citation.cfm?id=1248659.1248685.

[GL11]

K. Grauman and B. Leibe. Visual Object Recognition. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers, 2011.

[HS88]

C. Harris and M. Stephens. A combined corner and edge detector. In Proc. of Fourth Alvey Vision Conference, 147–151. 1988.

[HZRS16]

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778. 2016. doi:10.1109/CVPR.2016.90.

[HZRS14]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. January 2014. URL: http://dx.doi.org/10.1007/978-3-319-10578-9_23, doi:10.1007/978-3-319-10578-9_23.

[IS15]

Sergey Ioffe and Christian Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR, 2015. URL: http://arxiv.org/abs/1502.03167.

[KS14]

Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 2014.

[KSH]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks.

[LSP06]

S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, CVPR '06, 2169–2178. Washington, DC, USA, 2006. IEEE Computer Society. URL: http://dx.doi.org/10.1109/CVPR.2006.68, doi:10.1109/CVPR.2006.68.

[Low04]

D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[NS06]

D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In CVPR, 2161–2168. 2006.

[RDGF15]

Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: unified, real-time object detection. CoRR, 2015. URL: http://arxiv.org/abs/1506.02640, arXiv:1506.02640.

[RHGS15]

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: towards real-time object detection with region proposal networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL: https://proceedings.neurips.cc/paper/2015/file/14bfa6bb14875e45bba028a21ed38046-Paper.pdf.

[SC00]

B. Schiele and J.L. Crowley. Recognition without correspondence using multidimensional receptive field histograms. International Journal of Computer Vision, 36(1):31–50, 2000.

[SEZ+]

Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. OverFeat: integrated recognition, localization and detection using convolutional networks.

[SZ03]

J. Sivic and A. Zisserman. Video Google: a text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV '03, 1470–. Washington, DC, USA, 2003. IEEE Computer Society. URL: http://dl.acm.org/citation.cfm?id=946247.946751.

[SB91]

Michael J. Swain and Dana H. Ballard. Color indexing. Int. J. Comput. Vision, 7(1):11–32, 1991. URL: https://doi.org/10.1007/BF00130487, doi:10.1007/BF00130487.

[SLJ+14]

Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. 2014. arXiv:1409.4842.

[Sze10]

R. Szeliski. Computer Vision: Algorithms and Applications. Springer-Verlag New York, Inc., New York, NY, USA, 1st edition, 2010. ISBN 1848829345, 9781848829343.

[TM07]

T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision, 3(3):177–280, 2007.

[YYGTHuang09]

J. Yang, K. Yu, Y. Gong, and T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2009.

[RubnerTomasiGuibas98]

Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271), 59–66. 1998. doi:10.1109/ICCV.1998.710701.