When customers search for inventory at Hotwire, we want to present them with results that are relevant and meaningful. This helps customers reach the inventory they are looking for faster and, in turn, powers the growth of the business by improving suppliers’ ability to push available inventory. Feature testing at Hotwire and Expedia has shown that sort is a very powerful lever for the business, as a majority of purchases are made from the top results. The sorting infrastructure at Hotwire was tightly coupled to a rule-based system that limited both the speed of iteration and the complexity of the sorting models that could be tested. We wanted to be able to iterate on complex machine learning sort models in a way that is scalable, rapidly deployable to production, and highly available with very low latency, with detailed, real-time monitoring of live system behavior.
The search team at Hotwire decided that any solution to the above problem must adhere to the following guiding principles.
- Separation of concerns / loose coupling: Business processes must be isolated from one another for independent evolution
- Capacity scaling for 5 year growth with no architectural inflection: Minimum 5 year shelf life so we can expand without significant “refactoring”
- Rapid and continuous deployment to the live environment: A story should go from business sign-off to live in less than 10 minutes
- High availability and super-low latency: Studies have shown that most users will abandon the site if the page load takes more than a couple of seconds. Based on that, a full destination search must complete in less than a few hundred milliseconds.
- Detailed and real-time live system behavior monitoring: We must be able to cheaply monitor and debug the live system at any time
ARCHITECTURE OF SOLUTION
We tackled the problem specifically for hotel sort and implemented a microservice-based architecture in which the call to sort hotels is decoupled from the existing Hotwire stack. The microservice, called Score Service, is a RESTful web service deployed to Amazon Web Services (AWS) that calculates each hotel’s score for a given set of search criteria. When a call to fetch hotel results reaches the existing Hotwire stack, an HTTP call is made to the Score Service with all the information needed to score every hotel in the result set. The service then responds with a score for each hotel, which is used to sort the results on the page.
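The exchange between the Hotwire stack and the Score Service can be sketched roughly as follows. This is a minimal illustration, not the production code: the payload field names (`search`, `hotels`, `hotel_id`, `features`), the function names and the toy scoring rule are all assumptions, and the real service is reached over HTTP rather than through a direct function call.

```python
from typing import Any, Dict, List


def score_hotels(request: Dict[str, Any]) -> Dict[str, float]:
    """Stand-in for the Score Service: return one score per hotel.

    The scoring rule below is a hypothetical placeholder; the real
    service evaluates a machine learning model.
    """
    scores = {}
    for hotel in request["hotels"]:
        features = hotel["features"]
        # Toy rule: higher recommendation and savings give a higher score.
        scores[hotel["hotel_id"]] = (
            0.7 * features["recommend_pct"] + 0.3 * features["savings_pct"]
        )
    return scores


def sort_results(hotels: List[Dict[str, Any]],
                 scores: Dict[str, float]) -> List[Dict[str, Any]]:
    """Caller side: order the result set by descending score."""
    return sorted(hotels, key=lambda h: scores[h["hotel_id"]], reverse=True)


# Hypothetical request carrying everything needed to score each hotel.
request = {
    "search": {"destination": "San Francisco", "nights": 2},
    "hotels": [
        {"hotel_id": "A", "features": {"recommend_pct": 80, "savings_pct": 20}},
        {"hotel_id": "B", "features": {"recommend_pct": 90, "savings_pct": 40}},
        {"hotel_id": "C", "features": {"recommend_pct": 60, "savings_pct": 50}},
    ],
}

scores = score_hotels(request)
ranked_ids = [h["hotel_id"] for h in sort_results(request["hotels"], scores)]
```

Because the caller only depends on the returned scores, the model behind `score_hotels` can change without touching the code that renders the results page.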
BUILD AND DEPLOY PIPELINE
We have an automated build pipeline that kicks off a build on every commit and pushes a “good quality” image to a private Docker registry. A good quality image is one that passes unit tests, functional integration tests, minimum code coverage and code style checks. Quick performance testing of the image happens automatically after the build, and if the performance is good, the image is ready to be deployed to production. When detailed performance testing is required, three kinds of tests are run: a load test at current production search volume, a stress test at 10x production search volume, and a stability test at current production search volume for an extended period of time. When triggered, the production deploy job does a blue-green deploy and starts switching traffic if the newly deployed image passes smoke tests. A Jenkins job can do an instant rollback to the previous containers in case of any critical issues. No manual testing is done as part of the build and deploy process.
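The deploy-and-rollback logic can be illustrated with a small sketch. The `Router` class, the `blue_green_deploy` function and the image tags are hypothetical names used only for illustration. The sketch also switches all traffic at once, whereas the actual deploy job switches traffic over progressively.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Router:
    """Tracks which image serves live traffic (hypothetical stand-in
    for the load balancer configuration)."""
    live: str
    previous: Optional[str] = None

    def switch_to(self, image: str) -> None:
        # Remember the old image so an instant rollback stays possible.
        self.previous, self.live = self.live, image

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous deployment to roll back to")
        self.live, self.previous = self.previous, self.live


def blue_green_deploy(router: Router, new_image: str,
                      smoke_test: Callable[[str], bool]) -> bool:
    """Send traffic to new_image only if it passes the smoke tests."""
    if not smoke_test(new_image):
        return False  # traffic stays on the currently live image
    router.switch_to(new_image)
    return True


router = Router(live="score-service:41")

# A failing smoke test leaves live traffic untouched.
rejected = blue_green_deploy(router, "score-service:bad", lambda image: False)
live_after_reject = router.live

# A passing smoke test switches traffic to the new image.
deployed = blue_green_deploy(router, "score-service:42", lambda image: True)
live_after_deploy = router.live

# Instant rollback to the previous image on a critical issue.
router.rollback()
live_after_rollback = router.live
```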
This architecture helps us achieve our guiding principles.
- Separation of concerns / loose coupling
  - Score Service achieves a clean separation of concerns, as the service handles scoring the inventory and nothing else. It is decoupled from the existing Hotwire architecture by being designed as a microservice deployed to AWS.
- Capacity scaling for 5 year growth with no architectural inflection
  - We used Docker to containerize the code and deployed the images to AWS. The microservice architecture coupled with containerization will help us achieve scalability.
- Rapid and continuous deployment to the live environment
  - The build and release pipeline has been completely automated, enabling us to deploy code to production in about 5 minutes.
- High availability and super-low latency
  - Using Docker to containerize the code and deploying the Docker images to AWS behind Elastic Load Balancing (ELB) helps us be highly available. The latency of the search is not yet where we want it to be, but we are looking into reducing the time to score hotels to less than a few hundred milliseconds.
- Detailed and real-time live system behavior monitoring
  - We have a detailed, real-time dashboard that monitors the total response time of each model, the number of successful and failed requests, the average response time by container, the number of requests serviced by each container, the top errors and the traffic distribution of each sort model being tested.
  - We have alerts if we see an increased number of scoring failures.
  - We have AppDynamics monitoring the AWS hosts and alerting on high CPU and memory usage.
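As an illustration of the monitoring described above, the sketch below aggregates a batch of request records into per-container counts and average response times, and decides whether the scoring failure rate should raise an alert. The record fields (`container`, `latency_ms`, `ok`), the function names and the 5% threshold are assumptions made for this example, not details of the actual dashboard.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List, Tuple


def summarize(requests: List[Dict[str, Any]]) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return per-container stats and the overall failure rate."""
    latencies = defaultdict(list)
    failures = 0
    for record in requests:
        latencies[record["container"]].append(record["latency_ms"])
        if not record["ok"]:
            failures += 1
    per_container = {
        name: {"requests": len(values), "avg_latency_ms": mean(values)}
        for name, values in latencies.items()
    }
    failure_rate = failures / len(requests) if requests else 0.0
    return per_container, failure_rate


def should_alert(failure_rate: float, threshold: float = 0.05) -> bool:
    """Alert when the failure rate exceeds a (hypothetical) threshold."""
    return failure_rate > threshold


# Hypothetical request log entries.
sample = [
    {"container": "c1", "latency_ms": 100, "ok": True},
    {"container": "c1", "latency_ms": 200, "ok": True},
    {"container": "c2", "latency_ms": 300, "ok": False},
    {"container": "c2", "latency_ms": 100, "ok": True},
]

per_container, failure_rate = summarize(sample)
alert = should_alert(failure_rate)
```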
Here is a list of the technology stack we used in the Score Service.
| Technology | Purpose |
|---|---|
| Python | Score service code |
| Flask | Python-based web framework used to develop the web service |
| Python packages including scikit-learn (sklearn) and pandas | Scoring models |
| Jenkins | Continuous integration server |
| nginx | Load balancing across Docker containers |
| Amazon Web Services (AWS) | Cloud computing service |
| Elastic Load Balancing (ELB) | Load balancing across AWS hosts in different availability zones |
MACHINE LEARNING FOR SORT
To determine the score for a given hotel, the Score Service models use dozens of signals, usually derived from the hotel features or from the search criteria provided by the customer. Hotel features such as the percentage of guests who recommend the hotel, savings, free amenities, and neighborhood quality have a significant impact on the hotel’s position on the results page.
Score Service will enable data scientists to build models using not only information about the customer, their past purchases and their behavior on our site, but also various data gathered from social media. Using these signals gives customers an accurate and transparent way to assess the results and delivers a personalized experience by displaying the most relevant hotels at the top of the results page. The machine learning techniques the data science team has been experimenting with include Boosted Decision Trees (BDT), Logistic Regression and search/customer segmentation.
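To make the idea of a score concrete, here is a minimal sketch of how a logistic regression model turns hotel signals into a value between 0 and 1. The feature names, weights and bias are hypothetical and hand-picked for illustration. A real model would learn its coefficients from data, for example with scikit-learn, and would use many more signals.

```python
import math
from typing import Dict

# Hypothetical coefficients; a trained model would supply these.
WEIGHTS = {"recommend_pct": 0.03, "savings_pct": 0.02, "free_amenities": 0.1}
BIAS = -3.0


def logistic_score(features: Dict[str, float]) -> float:
    """Weighted sum of the signals passed through the logistic function."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical hotels that differ only in savings.
modest = logistic_score({"recommend_pct": 85, "savings_pct": 10, "free_amenities": 2})
bargain = logistic_score({"recommend_pct": 85, "savings_pct": 45, "free_amenities": 2})
```

With positive weights, raising any signal raises the score, so `bargain` ranks above `modest` when the results are sorted by descending score.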
Going forward, we plan to:
- Automate the deployment of new models to production, including running detailed performance testing in the pipeline
- Improve the speed of the score models
- Improve the accuracy of the score models
- Use Chaos Monkey to test the reliability of the infrastructure
In conclusion, the microservice architecture adopted for tackling the hotel sort problem has given us a platform that is scalable, rapidly deployable to production and highly available with low latency, with detailed, real-time monitoring of live system behavior. It enables the data science team to develop complex machine learning models by moving out of the rule-based engine. This gives us the flexibility to iterate and learn at a rapid rate and hence fail fast and fail cheaply.
Engineers on Hotwire Search Team
Adam Guthrie, Kunal Sachdeva, Anju Suryawanshi, Madhavi Suryadevara, Sanjay Assao, Preethi Bhat
 Blue-green deployment: http://martinfowler.com/bliki/BlueGreenDeployment.html
 Chaos Monkey: https://github.com/Netflix/SimianArmy/wiki/Chaos-Monkey