In this work, we propose and address a new computer vision task, which we call fashion item detection, where the aim is to detect various fashion items a person in the image is wearing or carrying. The types of fashion items we consider in this work include hat, glasses, bag, pants, shoes and so on.
The detection of fashion items can be an important first step of various e-commerce applications for fashion industry. Our method is based on state-of-the-art object detection method which combines object proposal methods with a Deep Convolutional Neural Network.
Since the locations of fashion items are in strong correlation with the locations of body joints positions, we incorporate contextual information from body poses in order to improve the detection performance. Through the experiments, we demonstrate the effectiveness of the proposed method.
We describe a completely automated large scale visual recommendation system for fashion. Our focus is to efficiently harness the availability of large quantities of online fashion images and their rich meta-data.
Specifically, we propose two classes of data driven models in the Deterministic Fashion Recommenders (DFR) and Stochastic Fashion Recommenders (SFR) for solving this problem. We analyze relative merits and pitfalls of these algorithms through extensive experimentation on a large-scale data set and baseline them against existing ideas from color science.
We also illustrate key fashion insights learned through these experiments and show how they can be employed to design better recommendation systems.
The industrial applicability of proposed models is in the context of mobile fashion shopping. Finally, we also outline a largescale annotated data set of fashion images (Fashion-136K) that can be exploited for future research in data driven visual fashion.
Recent advances in consumer depth sensors have created many opportunities for human body measurement and modeling. Estimation of 3D body shape is particularly useful for fashion e-commerce applications such as virtual try-on or fit personalization.
In this paper, we propose a method for capturing accurate human body shape and anthropometrics from a single consumer grade depth sensor. We first generate a large dataset of synthetic 3D human body models using real-world body size distributions.
Next, we estimate key body measurements from a single monocular depth image. We combine body measurement estimates with local geometry features around key joint positions to form a robust multi-dimensional feature vector.
This allows us to conduct a fast nearest-neighbor search to every sample in the dataset and return the closest one. Compared to existing methods, our approach is able to predict accurate full body parameters from a partial view using measurement parameters learned from the synthetic dataset.
Furthermore, our system is capable of generating 3D human mesh models in real-time, which is significantly faster than methods which attempt to model shape and pose deformations.
To validate the efficiency and applicability of our system, we collected a dataset that contains frontal and back scans of 83 clothed people with ground truth height and weight. Experiments on real-world dataset show that the proposed method can achieve real-time performance with competing results achieving an average error of 1.9 cm in estimated measurements.
Fashion, and especially apparel, is the fastest-growing category in online shopping. As consumers requires sensory experience especially for apparel goods for which their appearance matters most, images play a key role not only in conveying crucial information that is hard to express in text, but also in affecting consumer's attitude and emotion towards the product.
However, research related to e-commerce product image has mostly focused on quality at perceptual level, but not the quality of content, and the way of presenting.This study aims to address the effectiveness of types of image in showcasing fashion apparel in terms of its attractiveness, i.e. the ability to draw consumer's attention, interest, and in return their engagement.
We apply advanced vision technique to quantize attractiveness using three common display types in fashion filed, i.e. human model, mannequin, and flat. We perform two-stage study by starting with large scale behavior data from real online market, then moving to well designed user experiment to further deepen our understandings on consumer's reasoning logic behind the action.
We propose a Fisher noncentral hypergeometric distribution based user choice model to quantitatively evaluate user's preference. Further, we investigate the potentials to leverage visual impact for a better search that caters to user's preference. A visual attractiveness based re-ranking model that incorporates both presentation efficacy and user preference is proposed. We show quantitative improvement by promoting visual attractiveness into search on top of relevance.
We describe a completely automated large scale visual recommendation system for fashion. Existing approaches have primarily relied on purely computational models to solving this problem that ignore the role of users in the system.
In this paper, we propose to overcome this limitation by incorporating a user-centric design of visual fashion recommendations. Specifically, we propose a technique that augments 'user preferences' in models by exploiting elasticity in fashion choices. We further design a user study on these choices and gather results from the 'wisdom of crowd' for deeper analysis.
Our key insights learnt through these results suggest that fashion preferences when constrained to a particular class, contain important behavioral signals that are often ignored in recommendation design.
Further, presence of such classes also reflect strong correlations to visual perception which can be utilized to provide aesthetically pleasing user experiences. Finally, we illustrate that user approval of visual fashion recommendations can be substantially improved by carefully incorporating these user-centric feedback into the system framework.
With the rapid proliferation of smartphones and tablet computers, search has moved beyond text to other modalities like images and voice. For many applications like Fashion, visual search offers a compelling interface that can capture stylistic visual elements beyond color and pattern that cannot be as easily described using text.
However, extracting and matching such attributes remains an extremely challenging task due to high variability and deformability of clothing items. In this paper, we propose a fine-grained learning model and multimedia retrieval framework to address this problem.
First, an attribute vocabulary is constructed using human annotations obtained on a novel fine-grained clothing dataset. This vocabulary is then used to train a fine-grained visual recognition system for clothing styles.
We report benchmark recognition and retrieval results on Women's Fashion Coat Dataset and illustrate potential mobile applications for attribute-based multimedia retrieval of clothing items and image annotation.
xplosion of mobile devices with cameras, online search has moved beyond text to other modalities like images, voice, and writing. For many applications like Fashion, image-based search offers a compelling interface as compared to text forms by better capturing the visual attributes.
In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. We show that low level cues such as color can be used to quantify image similarity and also to discriminate among products with different visual appearances.
We demonstrate the effectiveness of our approach through a mobile shopping application (eBay Fashion App available at https://itunes.apple.com/us/app/ebay-fashion/id378358380?mt=8 and eBay image swatch is the feature indexing millions of real world fashion images).
Our approach outperforms several other state-of-the-art image retrieval algorithms for large scale image data.