Mapping Uber's price surge in San Francisco
Uber’s surge pricing is a bit of a mystery. The ride-sharing company applies a proprietary algorithm that automatically sets a surge multiplier, which scales up the ride cost to balance rider demand against the supply of drivers in an area. The process probably takes into account things like the number of people who have the Uber app open, the number of drivers in the area, and, interestingly, crime statistics, among other factors.
Instead of attempting to decipher the algorithm, or study its effect, I simply decided to create a visualization of the price surges in different areas of San Francisco throughout the day. I’ll be going through how I obtained the data, and created the heat map.
Obtain San Francisco neighborhood boundary data, and calculate the center coordinates of each neighborhood.
Collect price surge data 24/7 at each center coordinate via the Uber API, using a script running on AWS EC2.
Make the map above with d3.js and leaflet.js, and add the timeline using vis.js.
Since Uber’s surge pricing process is essentially a black box, I decided to arbitrarily pick different locations to get a general representation. I was aware that a surge pricing area could be very localized, as small as a couple of blocks. In fact, someone made an app precisely to take advantage of this. I didn’t confirm how accurate this assumption actually is. How I divided San Francisco into surge areas depended more on what sort of meaningful boundaries I could draw on the map. These ended up being the neighborhood boundaries created by the Department of City Planning that I came across here.
1) Mapping the neighborhoods
I had never made a map before, and knew this would definitely be the most challenging portion of the project. Before collecting any data, I wanted to make sure I could draw a map of San Francisco and dynamically display information on it. I looked to d3.js and leaflet.js, and went through various basic walkthroughs I found on the web. Through trial and error, I was able to create a map of San Francisco with the neighborhood layer. The process also involved converting an ESRI shapefile into GeoJSON format and setting the center of the map with appropriate scaling. Moreover, I was able to use Leaflet.awesome-markers to place markers at arbitrary coordinates.
2) Collecting data through Uber API
The Uber API lets you get information such as the price estimate between two coordinates, the surge multiplier in effect, and the wait time estimate at the time of the request. Here’s a portion of the JSON that is returned by the UberX price endpoint.
Note that the price surge depends only on the pickup location, so I only needed to vary the pickup coordinates to get the surge price of different areas. The shapefile that I found contains 37 neighborhoods. Since the API allows up to 1000 requests per hour, I decided to collect data every 3 minutes at 37 locations, because 37*(60/3) = 740 < 1000. Next, I calculated the center of each neighborhood’s polygon from the GeoJSON file. I assumed that the surge price at the center represents the whole area. I know that this is a very big assumption, but for the sake of practicality we’ll go with it. To get a better idea of how localized the surges are, I could request the prices at many locations at once. I’ll get around to that sometime.
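For illustration, here is one way the center calculation could be done in Python. This is a sketch, not the original code: it assumes "center" means the area-weighted centroid (shoelace formula) of each polygon's outer ring, and that each GeoJSON feature stores the neighborhood name under a property called `name`, which is a hypothetical key. Holes are ignored, and for a MultiPolygon only the first polygon is used.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude), the GeoJSON coordinate order


def ring_centroid(ring: Sequence[Sequence[float]]) -> Point:
    """Area-weighted centroid of one polygon ring, via the shoelace formula."""
    n = len(ring)
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = ring[i][0], ring[i][1]
        # Wrap around so the ring is closed even if the last point is not repeated.
        x1, y1 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


def neighborhood_centers(geojson: dict, name_key: str = "name") -> Dict[str, Point]:
    """Map each feature's name to the centroid of its outer ring.

    name_key is an assumed property name; adjust it to match the real file.
    """
    centers = {}
    for feature in geojson["features"]:
        geometry = feature["geometry"]
        if geometry["type"] == "Polygon":
            outer_ring = geometry["coordinates"][0]
        elif geometry["type"] == "MultiPolygon":
            # Simplification: use only the first polygon of a multi-part shape.
            outer_ring = geometry["coordinates"][0][0]
        else:
            continue  # skip points, lines, and anything else
        centers[feature["properties"][name_key]] = ring_centroid(outer_ring)
    return centers
```

Treating longitude and latitude as flat x/y coordinates is a reasonable approximation at city scale. Keep in mind that the centroid of an oddly shaped (concave) neighborhood can fall outside its boundary.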
To collect the data, I wrote a Python script that makes requests for all 37 locations every 3 minutes and stores the surge prices. I deployed the script on Amazon EC2 to keep it running 24/7 in the cloud. It’s still running now, and I can SSH into the instance to get the latest data. I made the mistake of storing the data in a plain text file instead of setting up a database. This led to some lag in storing the data once the file started to get large, resulting in some bad data. In the case of a bad data point, I simply display a question mark in place of the surge multiplier.
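As a rough sketch of what such a collector could look like with a database instead of a text file, the Python below polls every location and writes the results to SQLite. It is not the original script. The `fetch_surge` callable is a hypothetical stand-in for the actual Uber API request, and the table layout is an assumption. A failed request is stored as NULL, which corresponds to the question mark shown on the map.

```python
import sqlite3
import time
from typing import Callable, Dict, Optional, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude)

POLL_INTERVAL_SECONDS = 3 * 60  # one round of requests every 3 minutes


def requests_per_hour(n_locations: int, interval_seconds: int) -> float:
    """Request budget check: how many API calls per hour a schedule uses."""
    return n_locations * 3600 / interval_seconds


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (assumed) table: one row per neighborhood per poll."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS surge ("
        "ts REAL NOT NULL, neighborhood TEXT NOT NULL, multiplier REAL)"
    )
    conn.commit()


def collect_once(
    conn: sqlite3.Connection,
    centers: Dict[str, LatLng],
    fetch_surge: Callable[[float, float], float],  # hypothetical API wrapper
    now: Optional[float] = None,
) -> int:
    """Poll every center once and store the results; returns rows written."""
    ts = time.time() if now is None else now
    rows = []
    for name, (lat, lng) in centers.items():
        try:
            multiplier = fetch_surge(lat, lng)
        except Exception:
            multiplier = None  # bad data point, stored as NULL
        rows.append((ts, name, multiplier))
    conn.executemany("INSERT INTO surge VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def run_forever(conn, centers, fetch_surge, interval=POLL_INTERVAL_SECONDS):
    """Repeat collect_once on a fixed schedule, accounting for request time."""
    while True:
        started = time.monotonic()
        collect_once(conn, centers, fetch_surge)
        time.sleep(max(0.0, interval - (time.monotonic() - started)))
```

With 37 locations and a 180 second interval, `requests_per_hour(37, 180)` gives 740.0, matching the budget worked out earlier. Because SQLite appends rows without rewriting the whole file, write time should stay roughly constant as the data grows.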
3) Adding information to the map
Once I had the data, I needed to display the surge price at each location and update it as the selected time changed. I modified Leaflet.awesome-markers to display the surge multiplier inside the marker at each location. Next, I used vis.js to add an interactive timeline. The timeline component from the vis library came with very nice built-in features, which made my life a lot easier. I bound the movement of the time bar to a function that updates the data in each marker. In the last step, I also updated the color of each area based on the data, and added the play/pause feature.
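The map itself runs in JavaScript, but the lookup behind the time bar is easy to show in a few lines of Python. The sketch below assumes each neighborhood has a list of (timestamp, multiplier) samples sorted by time, and picks the latest sample at or before the selected time. The color thresholds are made up for illustration and are not the ones used on the map.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# (unix timestamp, surge multiplier), with None marking a bad data point
Sample = Tuple[float, Optional[float]]


def surge_at(samples: List[Sample], t: float) -> Optional[float]:
    """Latest multiplier recorded at or before time t, or None if unavailable.

    samples must be sorted by timestamp.
    """
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, t)
    if i == 0:
        return None  # t is earlier than the first sample
    return samples[i - 1][1]


def marker_label(multiplier: Optional[float]) -> str:
    """Text shown inside the marker; a question mark for bad or missing data."""
    return "?" if multiplier is None else f"{multiplier:.1f}"


def marker_color(multiplier: Optional[float]) -> str:
    """Color bucket for an area (thresholds are illustrative assumptions)."""
    if multiplier is None:
        return "gray"
    if multiplier <= 1.0:
        return "green"
    if multiplier < 2.0:
        return "orange"
    return "red"


def markers_at(data: Dict[str, List[Sample]], t: float) -> Dict[str, Tuple[str, str]]:
    """Label and color for every neighborhood at the selected time."""
    result = {}
    for name, samples in data.items():
        multiplier = surge_at(samples, t)
        result[name] = (marker_label(multiplier), marker_color(multiplier))
    return result
```

A callback attached to the time bar would call something like `markers_at` with the newly selected time and redraw each marker and area from the result.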
Thoughts on the data?
I haven’t found any non-obvious patterns in the data. The price surges are definitely more concentrated in the downtown areas. I’ll play around with it a little more, and update here if something interesting comes up. Feel free to let me know in the comments if you see something.