In this work, we propose and address a new computer vision task, which we call fashion item detection, where the aim is to detect various fashion items a person in the image is wearing or carrying. The types of fashion items we consider in this work include hat, glasses, bag, pants, shoes and so on.
The detection of fashion items can be an important first step of various e-commerce applications for fashion industry. Our method is based on state-of-the-art object detection method which combines object proposal methods with a Deep Convolutional Neural Network.
Since the locations of fashion items are in strong correlation with the locations of body joints positions, we incorporate contextual information from body poses in order to improve the detection performance. Through the experiments, we demonstrate the effectiveness of the proposed method.
As the amount of user generated content on the internet grows, it becomes ever more important to come up with vision systems that learn directly from weakly annotated and noisy data. We leverage a large scale collection of user generated content comprising of images, tags and title/captions of furniture inventory from an e-commerce website to discover and categorize learnable visual attributes. Furniture categories have long been the quintessential example of why computer vision is hard, and we make one of the first attempts to understand them through a large scale weakly annotated dataset. We focus on a handful of furniture categories that are associated with a large number of fine-grained attributes. We propose a set of localized feature representations built on top of state-of-the-art computer vision representations originally designed for fine-grained object categorization. We report a thorough empirical characterization on the visual identifiability of various fine-grained attributes using these representations and show encouraging results on finding iconic images and on multi-attribute prediction.
In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNN) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories.
In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term.
In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different HD-CNNs and they lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.