How we Rank Hotels at OYO


Neha Garg

OYO thrives on offering a large inventory of good-quality budget hotels for the masses and has been swiftly adding properties in the premium and corporate categories. As one of the largest hotel chains in the world, OYO serves a huge and varied user base in every category it operates in.

Having such a varied user base and inventory brings new challenges. One of the biggest challenges we face at OYO is deciding the order in which properties should be displayed to a user searching on our platform.

We observed that the probability of the user clicking on a particular property, and eventually booking it, decays exponentially as we go down the list of properties displayed to the user. As a result, having a relevant ordering of these properties for our users has a very direct impact on the business that OYO brings in for its partners.

With such a huge and varied user base, we acknowledged the fact that a one-size-fits-all solution would not have worked to cater to the varying demand and supply. There was a need to provide our users with a more personalised experience on the search page.


We had originally understood user behaviour through our data and had formulated and configured some offline rules to rank the properties for a given search query. This rule-based algorithm, however, was neither scalable nor did it account for the personal preferences of users. There was a need to incorporate more advanced algorithms. Rather than moving directly to a personalised approach, we decided to make incremental changes to the algorithms we use, so that we could build from the ground up and develop a stronger understanding of the problem at hand.

The initial approach we used was pointwise ranking, which is well known in the Learning to Rank (LTR) literature.

Ranking Workflow

There was a lot of initial inertia that we faced while tackling the problem at hand.

Getting the data for analysis

The first major challenge we faced was the scarcity of data. The clickstream data being logged was minuscule and hence offered minimal insight into user behaviour.

Hence, the first logical step was to build an infrastructure that would allow us to capture a user's journey on the OYO platform: searching for a property, clicking on a few, making a booking, checking into the property and eventually checking out of it. We built an infrastructure that captured all of this through user clickstreams, and additionally enriched the collected data with hotel-level information.

Converting data to information

Having set up the infrastructure, our next big challenge was to make sense of the enormous data that was pouring in. We were logging roughly 5 million impressions on our platform for a single city in a single day. From a scarcity of data, we had now reached a stage where we had far too much of it! As processing this amount of data would be very challenging, we needed to pare the data down before we could begin making sense of it.

We started by filtering the dataset down to only those search queries in which at least one property's detail page was opened. We further reduced the dataset by keeping only the data a user actually sees, removing the data for properties that appear further down the listings.

This ensured that we had only relevant search data and reduced the dataset by a huge margin. We then had to decide which data to consider relevant for a single property. This essentially involved feature selection among the plethora of features we were logging into our databases.
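As a rough sketch of the filtering step, the two reductions described above (keeping only engaged searches, and trimming to the positions a user actually sees) could look like this in pandas. The column names and the position cutoff are illustrative assumptions, not the exact schema used at OYO:

```python
import pandas as pd

# Hypothetical impression log: one row per (search_id, hotel_id) impression.
impressions = pd.DataFrame({
    "search_id": [1, 1, 1, 2, 2, 3],
    "hotel_id":  [10, 11, 12, 10, 13, 14],
    "position":  [1, 2, 25, 1, 2, 1],
    "detail_page_opened": [1, 0, 0, 0, 0, 1],
})

# Keep only searches where at least one detail page was opened.
engaged = impressions.groupby("search_id")["detail_page_opened"].transform("max") == 1
filtered = impressions[engaged]

# Drop properties far below what a user typically scrolls to (assumed cutoff).
MAX_VISIBLE_POSITION = 20
filtered = filtered[filtered["position"] <= MAX_VISIBLE_POSITION]
```

Here search 2 is dropped entirely (no detail page opened), and the impression at position 25 is dropped for being below the assumed visibility cutoff.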

For our first model, we decided to pick features that represented only hotel-level information. The final feature set was the following:

  • Price of the hotel
  • Distance of the hotel from the desired location
  • Discount being offered on the hotel
  • Rating of the hotel on the platform
  • Number of unique users who have given ratings
  • Hotel type according to OYO's internal classification
  • Percentage of times the hotel is booked when shown in listings
  • Percentage of times the hotel is clicked when shown in listings

Outliers present in the data were then removed to make the data ready for input into our new machine learning algorithms.
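The article does not specify the exact outlier-removal method, so as an illustration, a common heuristic is the interquartile-range (IQR) rule, which drops points far outside the bulk of a feature's distribution:

```python
import numpy as np

def clip_outliers_iqr(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR].
    A common heuristic; the exact method used in production may differ."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

prices = [900, 950, 1000, 1100, 1200, 25000]  # one extreme price
cleaned = clip_outliers_iqr(prices)           # the 25000 outlier is removed
```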


Training the ranking model

So far we had prepared the data for training, but the model still needed objectives to learn against. We decided on four different outcomes to which the features attributed to every single hotel would be mapped:

  • Hotel that was not clicked
  • Hotel that was clicked but not booked
  • Hotel that was booked
  • Hotel that was booked and the user eventually checked in

Now that we had features and labels attributed to each hotel, we were ready to start feeding the data to a machine learning algorithm. For this purpose, we used a Gradient Boosting Decision Tree (XGBoost). Every hotel, with its features, was considered a single data point, and the model was expected to predict its label. Hence, the entire problem boiled down to a simple multi-class classification problem.

When fed new features, the model output four values: the probabilities that the input represents a data point in each of the four classes it had learned. A weighted sum over these probabilities gave the final score attributed to a hotel, and hotels were then ranked according to this score.

The weights given to the probability for each class were manually chosen after some iterations. These weights served as additional hyper-parameters.
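The pipeline above can be sketched as follows. This uses scikit-learn's gradient boosting as a stand-in for XGBoost, and the toy features, labels, and class weights are illustrative assumptions, not OYO's production values:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy hotel features: [price, distance_km, rating] (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((200, 3))
# Labels: 0 = not clicked, 1 = clicked, 2 = booked, 3 = checked in.
y = rng.integers(0, 4, size=200)

# The article uses XGBoost; sklearn's gradient boosting stands in here.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Manually chosen per-class weights (hypothetical values), treated as
# additional hyper-parameters as described above.
class_weights = np.array([0.0, 1.0, 3.0, 5.0])

probs = model.predict_proba(X[:5])   # shape (5, 4): one row per hotel
scores = probs @ class_weights       # weighted sum -> final hotel score
ranking = np.argsort(-scores)        # rank hotels by score, descending
```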


Evaluating the model

After training the model, we needed to evaluate its performance on certain metrics before we could go live with it.

To evaluate the model, we extracted and preprocessed clickstream data to form a test set that the model had not seen. All metrics were evaluated on the model's performance on this data.

We used mainly two kinds of metrics to evaluate the model. The first kind comprised well-known ranking metrics such as Normalised Discounted Cumulative Gain (NDCG) and Mean Average Precision. The second kind were internal metrics developed at OYO, such as the average booking rate at a particular position in the list, which showed us the average number of times our model could push a booked hotel above a particular position in the test dataset. We compared the model's performance on these metrics against the current rule-based linear algorithm being employed on our platform. The model outperformed the rule-based algorithm on all of these metrics.
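For reference, NDCG can be computed as the DCG of the model's ordering divided by the DCG of the ideal ordering. A minimal sketch, using graded relevance labels that mirror the four outcome classes above:

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance labels."""
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(relevances) + 2))
    return np.sum((2 ** relevances - 1) / discounts)

def ndcg(ranked_relevances):
    """DCG of the given ordering divided by the ideal ordering's DCG."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# Relevance per position, e.g. 3 = checked in, 2 = booked, 1 = clicked,
# 0 = not clicked (mirroring the four classes used for training).
model_ndcg = ndcg([3, 2, 0, 1])    # slightly imperfect ordering
perfect_ndcg = ndcg([3, 2, 1, 0])  # ideal ordering -> 1.0
```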

We used these metrics to tune our hyper-parameters and to understand how the classifier was interpreting the features it received as input. We also analysed how intuitively important features such as price and distance behaved as we went down a particular listing.

Deploying on live testing

In order to deploy the model, we first built the infrastructure that would extract the required features, transform them and feed them to the model in real-time.

The model that performed the best on evaluation metrics was deployed on our platform in order to perform A/B testing with the existing algorithm being used. We then compared these models in terms of how they impact business metrics such as net revenue generated.

The initial performance of the model in live testing was not as expected. Since the features used were purely hotel-level, we observed that there was little variation in the listings being shown to different users.

Adding behavioural aspects

Certain behavioural aspects were brought into the feature set and the model was retrained. The new features were based on how the user interacted with the listings page:

  • Average price of the rooms that the user books
  • Average price of the listings that the user clicks
  • Average click rate of the user
  • Average booking rate of the user
  • Average click rate for every hotel type
  • Average booking rate for every hotel type
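As a rough sketch, user-level aggregates of this kind can be computed from the interaction log with pandas. The column names and log layout here are illustrative assumptions:

```python
import pandas as pd

# Hypothetical interaction log: one row per impression shown to a user.
log = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "price":   [800, 1200, 1000, 2000, 2500],
    "clicked": [1, 0, 1, 1, 1],
    "booked":  [1, 0, 0, 0, 1],
})

clicks = log[log["clicked"] == 1]
user_features = pd.DataFrame({
    # Average price of the listings the user clicks on.
    "avg_clicked_price": clicks.groupby("user_id")["price"].mean(),
    # Fraction of impressions the user clicks / books.
    "click_rate": log.groupby("user_id")["clicked"].mean(),
    "booking_rate": log.groupby("user_id")["booked"].mean(),
})
```

The same groupby pattern extends to per-hotel-type click and booking rates by grouping on (user_id, hotel_type) instead.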

Removing Rank Biasing

In order to collect sufficient data for training, we have to rely on user clickstreams rather than manual assignment of relevance to each property, as the latter would take considerably more time. This introduces the problem of rank bias. Since the probability of a user clicking on a property decreases exponentially as we go down a listing, the clickstreams suggest that the hotels already at the top should be given more importance. This introduces a bias towards properties already at the top of the listings. A model trained on such biased data learns this bias and continues ignoring properties far down the listings, regardless of their other attributes or features. This effect is amplified with each subsequent round of training.

In order to remove this bias, we have started experimenting with various unbiasing methods, such as Inverse Propensity Weighting, among others. Unbiasing methods essentially penalise properties at the top of the listings by an amount that closely approximates the bias, thereby removing or minimising it. This ensures that our users get the best possible properties at the top of their listings page.
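The intuition behind inverse propensity weighting can be sketched as follows. The position-based propensity model (1 / position**eta) and the value of eta are assumptions for illustration; the article does not give the exact formulation used:

```python
# Sketch of inverse-propensity weighting under a simple position-bias model.
def ipw_weight(position, eta=1.0):
    """Inverse of the assumed examination propensity at a given rank."""
    propensity = 1.0 / (position ** eta)  # users rarely examine low ranks
    return 1.0 / propensity               # so clicks there get up-weighted

# A click at rank 1 gets weight 1; a click at rank 10 gets weight 10,
# compensating for how rarely users even look that far down the list.
weighted_clicks = [(pos, clicked * ipw_weight(pos))
                   for pos, clicked in [(1, 1), (5, 1), (10, 1), (3, 0)]]
```

Training on these re-weighted clicks counteracts the tendency to simply re-learn the current ordering.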

Monitoring

We created monitoring dashboards, an example of which is shown below, that would allow us to track real-time trends on how the model is behaving on business metrics. We also computed internal metrics as discussed earlier, for both the baseline and the newly deployed model. In addition to these, we also observed the distribution of price and distance across ranks.


This dashboard shows an increase in business metrics.

All of this captured data is used to draw insights into the relationship between the internal metrics we use and the final business metrics they drive. It also helps us better understand how users behave when price and distance, which are intuitively important features, vary across a listing.

This data also helps us make decisions about retraining and feature addition within the existing model, as and when the model's performance starts degrading on the metrics mentioned above.

Challenges

While observing our model's behaviour in A/B testing, we saw that the hotels being shown at the top of the listings were often quite far from the searched location; the model was unable to properly understand distance as a feature. We took care of these highly irrelevant distances by introducing distance bucketing: distances are divided into buckets, say 0–5, 5–10 and 10–15 km. In this new methodology, we first rank all hotels in the 0–5 km bucket, then those in the 5–10 km bucket, and so on, finally arranging them in that order on the listings page.
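The bucketing step above amounts to sorting by (distance bucket ascending, model score descending). A minimal sketch, with illustrative field names and scores:

```python
# Distance bucketing: group hotels into distance buckets, then rank by
# model score within each bucket.
def rank_with_distance_buckets(hotels, bucket_km=5):
    """Sort by distance bucket (ascending), then score (descending)."""
    return sorted(
        hotels,
        key=lambda h: (h["distance_km"] // bucket_km, -h["score"]),
    )

hotels = [
    {"name": "A", "distance_km": 12.0, "score": 0.9},
    {"name": "B", "distance_km": 2.0,  "score": 0.4},
    {"name": "C", "distance_km": 4.0,  "score": 0.8},
    {"name": "D", "distance_km": 7.0,  "score": 0.7},
]
ranked = [h["name"] for h in rank_with_distance_buckets(hotels)]
# Hotel A has the highest score but lands last, since its distance
# bucket (10-15 km) is ranked after the nearer buckets.
```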


In this figure, price relevance is improved in the top 10 ranks, but distance relevance is affected.

Users inherently prefer to book cheaper hotels. As a result, our model started showing very cheap hotels at the top of the listings. This raised users' booking rate, but it also reduced the revenue generated by OYO and its partners as a whole. This is one challenge that is currently being analysed further.

Business Metrics

By serving self-maintained (and sometimes self-owned) properties, OYO sets itself apart from the online travel aggregators (OTAs) in the market. The very nature of the business makes the challenges at OYO very different from those at any OTA.

The differences span multiple dimensions: how we calculate our business metrics, how we define the success and failure of projects, and how we plan to project our products as brands. Hence, we needed to identify a set of OYO-specific business metrics against which we could evaluate our work.

We identified four major metrics that directly impact OYO's revenue and customer experience.

  1. Conversion — The number of users who book a hotel compared to the number of users who browse the OYO listings. This has a direct impact on the revenue that OYO generates.
  2. Take — The amount that OYO earns per booking. If the conversion rate remains the same and we boost hotels that provide a higher take, we directly impact the company's profits.
  3. Churn — New properties do not have much data to back their claim to the top of the ranking list, so we still need a way to ensure they do not get starved of visibility.
  4. Brand Visibility — OYO offers multiple products to customers, such as budget, premium, Collection-O and Townhouse, and we need to consider these as well while doing the actual ranking.


Ongoing and future work

For the ongoing and future work we are currently focusing on the following:

  • Expanding personalised ranking to all OYO geographies.
  • Identifying and analysing more hotel and user-behaviour features.
  • Testing different machine learning algorithms beyond XGBoost.
  • Moving to a pairwise and subsequently a listwise ranking approach.