Computer Vision

arXiv, May 2014

Enhancing Visual Fashion Recommendations with Users in the Loop

Anurag Bhardwaj, Vignesh Jagadeesh, Wei Di, Robinson Piramuthu, Elizabeth Churchill

We describe a completely automated, large-scale visual recommendation system for fashion. Existing approaches have relied primarily on purely computational models that ignore the role of users in the system.

In this paper, we propose to overcome this limitation by incorporating a user-centric design of visual fashion recommendations. Specifically, we propose a technique that augments 'user preferences' in models by exploiting elasticity in fashion choices. We further design a user study on these choices and gather results from the 'wisdom of the crowd' for deeper analysis.

The key insights learned from these results suggest that fashion preferences, when constrained to a particular class, contain important behavioral signals that are often ignored in recommendation design.

Further, the presence of such classes correlates strongly with visual perception, which can be exploited to provide aesthetically pleasing user experiences. Finally, we show that user approval of visual fashion recommendations can be substantially improved by carefully incorporating this user-centric feedback into the system framework.

CVPR 2014

Region-based Discriminative Feature Pooling for Scene Text Recognition

Chen Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu

We present a new feature representation method for the scene text recognition problem, focusing in particular on improving scene character recognition. Many existing methods rely on Histogram of Oriented Gradients (HOG) or part-based models, which do not span the feature space well for characters in natural scene images, especially given the large variation in fonts and cluttered backgrounds.

In this work, we propose a discriminative feature pooling method that automatically learns the most informative sub-regions of each scene character within a multi-class classification framework, where each sub-region seamlessly integrates a set of low-level image features through integral images.
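As a rough illustration of the integral-image trick that makes this pooling efficient, here is a minimal Python sketch (hypothetical variable names, not the authors' implementation): once the integral image is built, the sum of any rectangular sub-region reduces to a constant-time, four-corner lookup.

```python
import numpy as np

def integral_image(feature_map):
    # Cumulative sums over both spatial axes, zero-padded on top/left
    # so region sums need no boundary special-casing.
    ii = np.cumsum(np.cumsum(feature_map, axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)), mode="constant")

def pool_subregion(ii, top, left, bottom, right):
    # O(1) sum of the feature map over rows [top, bottom) and
    # columns [left, right) via the four-corner identity.
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

# Toy usage: pool one low-level feature channel over a candidate sub-region.
fmap = np.random.rand(32, 32)
ii = integral_image(fmap)
print(pool_subregion(ii, 0, 0, 16, 16))  # sum over the top-left quadrant
```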

The proposed feature representation is compact, computationally efficient, and able to effectively model distinctive spatial structures of each individual character class. Extensive experiments conducted on challenging datasets (Chars74K, ICDAR’03, ICDAR’11, SVT) show that our method significantly outperforms existing methods on scene character classification and scene text recognition tasks.

ICML 2014 workshop on New Learning Models and Frameworks for BigData

Geometric VLAD for Large Scale Image Search

Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadeesh, Robinson Piramuthu

We present a novel compact image descriptor for large-scale image search. Our proposed descriptor, Geometric VLAD (gVLAD), is an extension of VLAD (Vector of Locally Aggregated Descriptors) that incorporates weak geometry information into the VLAD framework.

The proposed geometry cues are derived as a membership function over keypoint angles, which carry distinctive and informative signal yet are often discarded. We also present a principled technique for learning this membership function by clustering angles.
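As a hedged sketch of what clustering keypoint angles might look like (illustrative only; the paper's learned membership function is not reproduced here), angles can be embedded on the unit circle so that standard k-means respects their wrap-around:

```python
import numpy as np
from sklearn.cluster import KMeans

def angle_memberships(angles_rad, n_bins=4, seed=0):
    # Embed each orientation as (cos, sin) so that 359 and 1 degrees
    # cluster together, then assign a hard membership id per keypoint.
    pts = np.stack([np.cos(angles_rad), np.sin(angles_rad)], axis=1)
    return KMeans(n_clusters=n_bins, n_init=10, random_state=seed).fit(pts).labels_

# Toy usage: descriptors in the same orientation bin would then be
# VLAD-aggregated per bin and the per-bin vectors concatenated.
angles = np.random.uniform(0, 2 * np.pi, size=200)
print(np.bincount(angle_memberships(angles)))
```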

Further, to address the overhead of iterative codebook training over real-time datasets, we outline a novel codebook adaptation strategy. Finally, we demonstrate the efficacy of the proposed gVLAD-based retrieval framework, achieving more than 15% improvement in mAP over existing benchmarks.

IEEE International Conference on Image Processing (ICIP), 2014

Cascaded Sparse Color-Localized Matching for Logo Retrieval

Rohit Pandey, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu, Anurag Bhardwaj

To appear in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2013.

Large-Scale Video Summarization Using Web-Image Priors

Aditya Khosla, Raffay Hamid, Chih-Jen Lin, Neel Sundaresan

Given the enormous growth in user-generated videos, it is becoming increasingly important to be able to navigate them efficiently. As these videos are generally of poor quality, summarization methods designed for well-produced videos do not generalize to them. To address this challenge, we propose to use web-images as a prior to facilitate summarization of user-generated videos.
 
Our main intuition is that people tend to take pictures of objects to capture them in a maximally informative way. Such images could therefore be used as prior information to summarize videos containing a similar set of objects.
 
In this work, we apply our novel insight to develop a summarization algorithm that uses the web-image based prior information in an unsupervised manner. Moreover, to automatically evaluate summarization algorithms on a large scale, we propose a framework that relies on multiple summaries obtained through crowdsourcing.
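One hedged way to picture the unsupervised use of such a prior (hypothetical descriptor shapes; not the paper's exact formulation): score each frame by its best cosine similarity to the web-image pool and keep the top scorers as the summary.

```python
import numpy as np

def frame_scores(frame_feats, web_feats):
    # Treat web images as a prior over "informative" views: a frame is
    # as good as its best match against any web-image descriptor.
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    w = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    return (f @ w.T).max(axis=1)  # best cosine similarity per frame

# Toy usage with random stand-in descriptors.
frames = np.random.rand(500, 128)   # hypothetical per-frame features
web = np.random.rand(1000, 128)     # hypothetical web-image features
summary_idx = np.argsort(frame_scores(frames, web))[-5:]  # top-5 frames
```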
 
We demonstrate the effectiveness of our evaluation framework by comparing its performance to that of multiple human evaluators. Finally, we present results for our framework tested on hundreds of user-generated videos.

To appear in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Mobile Vision Workshop, 2013.

Style Finder: Fine-Grained Clothing Style Recognition and Retrieval

Wei Di, Catherine Wah, Anurag Bhardwaj, Robinson Piramuthu, Neel Sundaresan

With the rapid proliferation of smartphones and tablet computers, search has moved beyond text to other modalities such as images and voice. For many applications, such as fashion, visual search offers a compelling interface that can capture stylistic visual elements, beyond color and pattern, that cannot be as easily described in text.

However, extracting and matching such attributes remains an extremely challenging task due to high variability and deformability of clothing items. In this paper, we propose a fine-grained learning model and multimedia retrieval framework to address this problem.

First, an attribute vocabulary is constructed using human annotations obtained on a novel fine-grained clothing dataset. This vocabulary is then used to train a fine-grained visual recognition system for clothing styles.
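As a rough sketch of that second step (toy attribute names and random stand-in features; not the paper's pipeline), one binary classifier can be trained per attribute in the vocabulary:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

vocabulary = ["double-breasted", "fur-collar", "belted"]  # illustrative only
X = np.random.rand(300, 512)                          # image features
Y = np.random.randint(0, 2, (300, len(vocabulary)))   # human attribute labels

# One one-vs-rest classifier per attribute in the vocabulary.
classifiers = {
    attr: LogisticRegression(max_iter=1000).fit(X, Y[:, j])
    for j, attr in enumerate(vocabulary)
}

# At query time, per-attribute scores support attribute-based retrieval.
scores = {a: c.predict_proba(X[:1])[0, 1] for a, c in classifiers.items()}
```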

We report benchmark recognition and retrieval results on the Women's Fashion Coat Dataset and illustrate potential mobile applications for attribute-based multimedia retrieval of clothing items and image annotation.

KDD 2013

Palette Power: Enabling Visual Search through Colors

Anurag Bhardwaj, Atish DasSarma, Wei Di, Raffay Hamid, Robinson Piramuthu, Neel Sundaresan

With the explosion of mobile devices with cameras, online search has moved beyond text to other modalities like images, voice, and writing. For many applications, such as fashion, image-based search offers a more compelling interface than text forms by better capturing visual attributes.

In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. We show that low level cues such as color can be used to quantify image similarity and also to discriminate among products with different visual appearances.
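A minimal sketch of color-only matching in this spirit (a quantized RGB histogram compared by histogram intersection; an assumption for illustration, not eBay's production feature):

```python
import numpy as np

def color_signature(image_rgb, bins=8):
    # Quantize each RGB channel into `bins` levels and build an
    # L1-normalized bins^3 joint color histogram.
    q = (image_rgb // (256 // bins)).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def similarity(h1, h2):
    # Histogram intersection: 1.0 for identical color distributions.
    return np.minimum(h1, h2).sum()

# Toy usage on random stand-in images.
img_a = np.random.randint(0, 256, (64, 64, 3))
img_b = np.random.randint(0, 256, (64, 64, 3))
print(similarity(color_signature(img_a), color_signature(img_b)))
```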

We demonstrate the effectiveness of our approach through a mobile shopping application (the eBay Fashion App, available at https://itunes.apple.com/us/app/ebay-fashion/id378358380?mt=8, and the eBay image swatch feature, which indexes millions of real-world fashion images).

Our approach outperforms several other state-of-the-art image retrieval algorithms for large scale image data.

To appear in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2013.

Dense Non-Rigid Point-Matching Using Random Projections

Raffay Hamid, Dennis DeCoste, Chih-Jen Lin

We present a robust and efficient technique for matching dense sets of points undergoing non-rigid spatial transformations. Our main intuition is that the subset of points that can be matched with high confidence should be used to guide the matching procedure for the rest.

We propose a novel algorithm that incorporates these high-confidence matches as a spatial prior to learn a discriminative subspace that simultaneously encodes both the feature similarity as well as their spatial arrangement.

Conventional subspace learning usually requires spectral decomposition of the pair-wise distance matrix across the point-sets, which can become inefficient even for moderately sized problems. To this end, we propose the use of random projections for approximate subspace learning, which can provide significant time improvements at the cost of minimal precision loss.
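The speed/precision trade-off can be sketched with a Gaussian Johnson-Lindenstrauss projection (an illustrative stand-in, not the paper's exact estimator): pairwise distances are approximately preserved while the downstream spectral step runs in a much lower dimension.

```python
import numpy as np

def random_projection(X, target_dim, seed=0):
    # Gaussian random matrix scaled so projected pairwise distances are
    # approximately preserved (Johnson-Lindenstrauss).
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(target_dim), (X.shape[1], target_dim))
    return X @ R

# Toy usage: subspace learning now operates on 128-d points, not 4096-d.
X = np.random.rand(2000, 4096)   # hypothetical point descriptors
Xp = random_projection(X, 128)
```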

This efficiency gain allows us to iteratively find and remove high-confidence matches from the point sets, resulting in high recall. To show the effectiveness of our approach, we present a systematic set of experiments and results for the problem of dense non-rigid image-feature matching.