Raghav Gupta

Principal Research Scientist

eBay Research Labs
2145 Hamilton Ave,
San Jose CA 95125

Note: I have since left eBay, as of 10/13/2008. Visit my blog to see what I'm up to now.
The opinions expressed on this page are solely my own, and have not been endorsed by eBay or any of its subsidiaries.


Inventions
Even Selection (aka Sort By Category)
This became visible on eBay.com in early January 2007. This feature satisfies my criteria for simplicity in interface, with an incredibly complex backend implementation. I created an algorithm to do one deceptively simple thing: given a search term, show the widest variety of inventory available on eBay in the least amount of space, with the most relevant inventory at the top. Given the dynamic nature of eBay listings, where thousands of items expire and new items get listed every second, coupled with a non-uniform category hierarchy, pre-computation is not possible. The algorithm therefore performs clustering and relevance ranking in real time to show you the widest variety of current items on eBay for your search, with the most relevant category of items at the top. Try it: search eBay for "laptop" using Sort By Category, and scroll down below the featured items to see the difference. Hint: change the sort to the regular "Time Ending Soonest" to see what most users see by default when they search for "laptop". Note also that as market conditions change, as the distribution of items shifts over time, or even suddenly after a big batch of new listings, the displayed hierarchy automatically changes along with it. Esteban has been an excellent product manager for the entire project. The backend implementation is very capable; the challenge now is to extend this experience without breaking the existing search UI framework.
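The page does not describe how the clustering works, so the following is only a minimal sketch of the general idea under stated assumptions: each matching listing already has a relevance score and a single category (both hypothetical inputs), results are grouped by category per query (no pre-computation), categories are ranked by their best item, and only a few items per category are kept so a small page shows many categories. The parameter names `items_per_category` and `max_categories` are illustrative, not eBay's.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Listing:
    title: str
    category: str
    score: float  # hypothetical per-item relevance score for the current query


def even_selection(listings, items_per_category=2, max_categories=5):
    """Group live results by category at query time and show a few of the
    most relevant items from each, most relevant category first."""
    groups = defaultdict(list)
    for item in listings:
        groups[item.category].append(item)

    # Within each category, order items by relevance, highest first.
    for items in groups.values():
        items.sort(key=lambda it: it.score, reverse=True)

    # Rank categories by the relevance of their best item (an assumption;
    # the real ranking criteria are not described on this page).
    ranked = sorted(groups.items(), key=lambda kv: kv[1][0].score, reverse=True)

    # Trim each category so one dominant category cannot fill the page.
    return [(cat, items[:items_per_category]) for cat, items in ranked[:max_categories]]
```

Because the grouping runs on whatever listings match at query time, a sudden batch of new listings in another category changes the output on the very next search.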

Relevance Sort (aka Best Match)
For a long time within eBay, it was considered impossible to sort live listings by relevance in real time without human training/classification. Unlike regular web search, where the index does not need to be updated the moment someone somewhere in the world changes his/her webpage, eBay's search engine must constantly update its index as thousands of items are listed and expire every second; on top of that, there was confusion over what "relevance" even means in an auction-style e-commerce scenario. In 2005 I came up with an approach and built a prototype, and it worked remarkably well. After months of evangelizing to the business, I led the project's production implementation. It was one of the most difficult architectural, performance, and deadline-driven challenges I have faced. By far the most challenging part turned out to be the abstraction of "relevance" across all the machines in the massive eBay search grid.
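The page does not reveal the relevance model itself. The sketch below swaps in plain TF-IDF purely to illustrate one concrete reason why relevance is hard to "abstract across machines": term statistics such as document frequency depend on the whole corpus, so a score computed on one shard is not comparable to a score from another unless every shard scores with shared, grid-wide statistics. The `Shard` class and the two-phase `search` function are hypothetical.

```python
import math
from collections import Counter


class Shard:
    """One machine's slice of a live index (hypothetical); listings can be
    added or expired at any moment without rebuilding anything."""

    def __init__(self):
        self.docs = {}  # listing id -> Counter of lowercase title terms

    def add(self, doc_id, title):
        self.docs[doc_id] = Counter(title.lower().split())

    def expire(self, doc_id):
        self.docs.pop(doc_id, None)

    def stats(self, terms):
        """Local listing count and, per query term, how many listings contain it."""
        df = {t: sum(1 for counts in self.docs.values() if t in counts) for t in terms}
        return len(self.docs), df

    def score(self, terms, n_total, df_total):
        """TF-IDF scores computed with grid-wide (not shard-local) statistics."""
        results = []
        for doc_id, counts in self.docs.items():
            s = 0.0
            for t in terms:
                if counts[t]:
                    idf = math.log((1 + n_total) / (1 + df_total[t])) + 1.0
                    s += counts[t] * idf
            if s > 0:
                results.append((s, doc_id))
        return results


def search(shards, query, k=10):
    terms = query.lower().split()
    # Phase 1: collect corpus statistics from every shard and sum them.
    n_total, df_total = 0, Counter()
    for shard in shards:
        n, df = shard.stats(terms)
        n_total += n
        df_total.update(df)
    # Phase 2: each shard scores its own listings with the shared statistics,
    # so the per-shard results can be merged into a single ranking.
    merged = []
    for shard in shards:
        merged.extend(shard.score(terms, n_total, df_total))
    merged.sort(reverse=True)
    return [doc_id for _, doc_id in merged[:k]]
```

If each shard used only its own listing count and document frequencies, the same listing could score differently depending on which machine happened to hold it, and the merged order would be inconsistent.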

All in all, my personal belief is that when used appropriately, this capability will open the doors to significant revenue opportunities, not just for eBay, but also for the vast network of affiliate developers. Every developer who provides search functionality using the eBay API can now provide a better experience to end users.

It's disappointing that the name "Best Match" was ultimately chosen for the sort-by-relevance functionality on the main site, in particular because it has nothing to do with the "Best Matches" mechanism used in eBay eXpress. The two share neither algorithm, code, nor implementation; only the name. The term "Best Match" seems to be used as an umbrella term for many things, including sorting category links by relevance on the new keyword pages such as http://buy.ebay.com/laptop (this uses another algorithm I invented separately), product recommendations, and regular items. Esteban Kozak has done a great job as product manager, and I hope this feature will continue to evolve. To check out relevance sort, compare the search results for "ipod nano" under the default Time Ending Soonest sort with the results for "ipod nano" sorted by relevance.

Related Searches
When Related Searches was launched in July 2005, there was not even a whisper from the eBay community, but the click-through rates were impressive. From my point of view, this was an excellent launch. I had worked hard to sell the research prototype within eBay initially, but ultimately the quality of the recommendations sold itself. Subsequently I led the development team and wrote the code for the production backend service myself. Ben Foster did a great job as product manager defining the end-user experience, and he respected my wishes to place the feature close to the search box while keeping it as inconspicuous as possible. Try it: http://search.ebay.com/ferrari. As part of this effort, I also built the system that tracks the popular searches for all categories, as shown on eBay Pulse. I believe all of this data has also been made available to eBay API users, and I hope it's turning out to be as useful as I thought it would be.
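How the recommendations are generated is not described here, so this is only a hypothetical sketch of one common technique: treat two queries as related when the same user issues both in one visit, and recommend the most frequent co-occurring queries. A second function counts the most popular queries per category, in the spirit of the eBay Pulse lists. The session and log formats, and the `min_count` threshold, are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_related(sessions, min_count=2):
    """sessions: list of query lists, one list per user visit (hypothetical format).
    Returns query -> related queries, most frequently co-searched first."""
    co = defaultdict(Counter)
    for queries in sessions:
        unique = sorted(set(q.strip().lower() for q in queries))
        for a, b in combinations(unique, 2):
            co[a][b] += 1
            co[b][a] += 1
    # Drop rare pairs, which are mostly noise.
    return {
        q: [r for r, n in counts.most_common() if n >= min_count]
        for q, counts in co.items()
    }


def popular_searches(log, top_n=3):
    """log: iterable of (category, query) pairs; returns the top queries per category."""
    per_cat = defaultdict(Counter)
    for category, query in log:
        per_cat[category][query.strip().lower()] += 1
    return {cat: [q for q, _ in c.most_common(top_n)] for cat, c in per_cat.items()}
```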

Sorting By Distance
I remember that the launch of this feature many years back annoyed a large segment of the community, because it replaced the original "search by region" feature instead of adding to it. On top of that, the new functionality was largely hidden: you had to select the "Distance" option from the sorting drop-down. It allowed one to specify a zip code and sort the listings in the search results by distance from that zip code. The underlying problem is this: given the latitude and longitude of two points on the earth, calculate the geographical distance between them. For short distances it sounds trivial, but for longer distances the curvature of the earth requires complex trigonometric calculations to get an accurate answer. You need to find the great circle around the earth that passes through both points, which gives the shortest possible arc between them, and then find the length of that arc.
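For reference, the "direct" computation described above can be written with the standard haversine formula, which gives the great-circle arc length on a spherical earth. This is a textbook formula, not eBay's code; it assumes a perfect sphere with a mean radius of 3958.8 miles. Note the several trigonometric calls per pair of points, which is exactly the per-result cost that becomes a problem at eBay's query volume.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the sphere model is an assumption


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine formula: length of the shortest arc between two points
    (latitude/longitude in degrees) on a spherical earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

For example, one degree of latitude along a meridian, `great_circle_miles(0, 0, 1, 0)`, comes out at about 69.09 miles.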

Implementing all this directly would have required massive investments in hardware to handle the query volume this feature would get; moreover, for a query returning ten thousand results, calculating the distance for all of them would have taken many seconds. Among competing proposals, I came up with an algorithm that gets the geographic distance between two points using just a couple of integer additions and subtractions, while still being accurate to within 2 miles anywhere on the surface of the earth.
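The integer-only algorithm mentioned above is not described on this page, so it is not reproduced here. As a different, well-known illustration of how per-listing trigonometry can be avoided when sorting, the sketch below precomputes a 3D unit vector for each zip code once, then sorts by squared chord length (straight-line distance through the sphere), which costs three subtractions, three multiplications and two additions per listing. Chord length grows monotonically with great-circle distance, so the sort order is the same; only the results actually displayed are converted back to miles. The function names and the 3958.8-mile spherical radius are assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # spherical-earth assumption


def unit_vector(lat, lon):
    """Precomputed once per zip code: the point's position on a unit sphere."""
    phi, lmb = math.radians(lat), math.radians(lon)
    return (math.cos(phi) * math.cos(lmb), math.cos(phi) * math.sin(lmb), math.sin(phi))


def chord_sq(u, v):
    """Squared straight-line distance between two unit vectors; no trig per call.
    It increases monotonically with great-circle distance, so it sorts identically."""
    dx, dy, dz = u[0] - v[0], u[1] - v[1], u[2] - v[2]
    return dx * dx + dy * dy + dz * dz


def chord_sq_to_miles(c2):
    """Convert back to arc length, only for the few results actually shown."""
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(c2) / 2))


def sort_by_distance(origin, listings):
    """listings: list of (listing_id, precomputed unit vector) pairs."""
    return sorted(listings, key=lambda item: chord_sq(origin, item[1]))
```

Unlike the scheme described in the text, this is exact on a sphere rather than an approximation, but it still uses floating-point multiplications.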

Contextual Keyword Extractor (aka eBay AdContext)
This is still not live, although the basic technology was built a while back. I created the algorithm, which takes an arbitrary piece of text and produces a ranked list of its most relevant keywords. These keywords can either be shown as search links to eBay, or be used to run a relevance search on eBay and return the top few most relevant items to show as advertisements.
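The extraction algorithm itself is not described here. As a hedged stand-in, the sketch below uses plain TF-IDF: words that occur often in the given text but rarely in a background corpus rank highest. The `doc_freq` table, the corpus size `n_docs` and the small stop-word list are hypothetical inputs; a production system would need phrase detection and much more.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is", "it", "with", "my"}


def extract_keywords(text, doc_freq, n_docs, top_k=5):
    """Rank the words of `text` by TF-IDF against a background corpus.

    doc_freq: hypothetical table of how many background documents contain each word.
    n_docs: number of documents in that background corpus.
    """
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]
    tf = Counter(words)
    scores = {
        w: count * math.log((1 + n_docs) / (1 + doc_freq.get(w, 0)))
        for w, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
```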

Behavioral data search engine
We mine a huge amount of data from user activity for various purposes. Some of this data is available at the user level, some at the category level, some at the country level, and some globally; sometimes the same type of data is available at all of these resolutions. To make all of this data available for various uses in a generic manner, I led the design and architecture efforts for building a custom search engine/repository/database. We invented a brand-new hierarchical query language derived from SQL for querying this custom multi-dimensional hierarchical database, along with mechanisms to store huge amounts of data as efficiently as possible while still providing millisecond-level query response. This component cannot be seen by end users, but rest assured it is being heavily used. It is a significant part of the "feedback loop", where activity on the site is automatically collected, analysed, processed and fed back into the live system.
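Neither the query language nor the storage format is described on this page. As a hypothetical sketch of one way to serve the same metric at several resolutions quickly, the store below rolls every recorded value up into all of its ancestors at write time, so a query at the global, country, category or user level is a single dictionary lookup. The hierarchy order (country, then category, then user) and the `RollupStore` name are assumptions for illustration only.

```python
from collections import defaultdict


class RollupStore:
    """Record metrics at the finest level and pre-aggregate every ancestor,
    trading storage and write work for constant-time reads at any level."""

    def __init__(self):
        # hierarchy path (tuple) -> metric name -> running total
        self.totals = defaultdict(lambda: defaultdict(int))

    def record(self, path, metric, value=1):
        # path is a hypothetical hierarchy, e.g. ("US", "Laptops", "user42");
        # depth 0 is the global total, depth len(path) is the finest level.
        for depth in range(len(path) + 1):
            self.totals[tuple(path[:depth])][metric] += value

    def query(self, path, metric):
        # .get avoids creating empty entries for paths that were never recorded.
        return self.totals.get(tuple(path), {}).get(metric, 0)
```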

Other Stuff

Patents (pending), Demos


Why eBay is different
The eBay marketplace is not only huge, it is a fantastically dynamic ecosystem in itself. As a researcher, one could easily spend a lifetime exploring the mountains of data this well-instrumented ecosystem generates minute by minute (I know I could). It is very tempting to treat the data exactly as it looks: detailed individual activity, such as an item being listed for sale at a particular time or a search query being performed. Consequently, it looks trivial to run regular data mining algorithms on this data to "generate more understanding". Only deeper introspection reveals that behind every piece of data, behind every click, there is an individual human being who possibly earns his/her living on eBay.

In most commerce, whether online or offline, there is usually a visible distinction between buyers and sellers. On eBay, however, there is a huge overlap between the two groups, and the complexity this interaction generates can easily be compared to some of nature's greatest mysteries. Consequently, I believe it is simply not possible to computationally model all of eBay activity, as it would require the sum of human knowledge, updated right up to the current second. We can only make approximations...but herein lies the paradox. If you make a seemingly positive change to the system based upon previous data analysis, can you be absolutely sure you won't cause an opposite reaction?

eBay listings are a jumble of a wide variety of words and characters, so much so that eBay now has its own lexicon, vastly different from any other e-commerce dataset. Sellers demonstrate extreme creativity in designing their item titles to capture the viewers’ attention. Given that regular eBay search works by simple keyword matching, and that hundreds of millions of listings are for sale at any time, nearly every popular search brings back a large number of individual listings. Some folks actually like it, some don't. Some users complain that "i pod" doesn't also return "ipod", while others complain if it does. Someone says "flutes" means the plural of the musical instrument flute; someone else says it means champagne glasses.

I have lent my support to many initiatives over the years, and opposed others. I supported adding plurals in search, but not synonyms. I supported aggressive spell-check suggestions, but not auto-correction. I supported turning "stop words" in descriptions into wildcards, but not removing them altogether. In eBay's context there are no right or wrong answers to such questions, as most changes are beneficial to one segment of users and bad for another. The challenge, of course, is to first come up with the correct question, and then the correct answer, one that results in a win-win for all.
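To make two of these positions concrete, here is a hypothetical sketch (not eBay's code) of query handling that expands plurals automatically but treats spelling differently: a likely misspelling only produces a suggestion to show the user, while the query is still searched exactly as typed, so nothing is auto-corrected. The plural rule is deliberately naive, the `vocabulary` set is an assumed input, and the 0.8 similarity cutoff passed to Python's `difflib.get_close_matches` is an arbitrary illustrative choice. The stop-word policy is not modelled.

```python
import difflib


def plural_variants(word):
    """Naive singular/plural pairing: add or strip one trailing "s".
    (Real English morphology, e.g. "glasses" or "children", needs far more.)"""
    if word.endswith("s") and len(word) > 3:
        return {word, word[:-1]}
    return {word, word + "s"}


def plan_search(query, vocabulary):
    """Return (term_groups, suggestion) for a raw query.

    term_groups: one set per query word; a listing matches the word if it
    contains any member of the set (plurals are added automatically).
    suggestion: a spelling suggestion to show the user, or None. The query
    itself is never rewritten, i.e. there is no auto-correction.
    """
    words = query.lower().split()
    term_groups = [plural_variants(w) for w in words]
    candidates = sorted(vocabulary)  # deterministic order for difflib
    suggested = []
    for w in words:
        if plural_variants(w) & vocabulary:
            suggested.append(w)  # known word, or its plural/singular form
        else:
            close = difflib.get_close_matches(w, candidates, n=1, cutoff=0.8)
            suggested.append(close[0] if close else w)
    suggestion = " ".join(suggested)
    return term_groups, (suggestion if suggested != words else None)
```

With a vocabulary containing "ferrari" and "flute", the query "ferarri flutes" still searches for "ferarri" and both "flutes" and "flute", while offering "ferrari flutes" as a suggestion.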