Visualizing Taxi Fares.
Note: The content of this post is also posted on the website of the Human Mobility and Networks Lab.
A couple of weeks ago I participated in an MIT Transportation Hack-a-thon. The idea of a hackathon is pretty simple. Put a bunch of geeks in a room, give them some data and a theme, set a timer for 12 hours and see what projects come out. In this case, the theme was transportation and the data ranged from surveys to online searches for airfare to taxi cab rides.
Upon arrival, I quickly settled in with a team. We did some brainstorming and came up with a project inspired by the collaborative consumption boom, CollabCab. The idea was fairly simple. Public transit options (trains and buses) are great in that they are cheap and sustainable, but they lack flexibility and their routes are fixed despite known differences in morning and evening commutes. Private cabs are much more flexible in terms of origins and destinations, but they aren’t particularly sustainable and can be expensive. Carpooling and ride sharing are great, but there are fairly high coordination costs that make them difficult to set up and sustain.
CollabCab would offer a solution to this. It would entail some type of medium-occupancy vehicle, perhaps a van or a minibus, and would be cheaper than a cab but probably more expensive than a bus (we never really worked out a business plan). The key would be that it would use differences in temporal travel patterns (like asymmetries in morning and evening commutes) to shift routes based on time of day. To make sure everyone knows the updated routes and schedules, a web and smartphone app would be used.
With the concept mapped out, the bulk of my day was spent exploring what data might be used to discover these asymmetries in travel demand. I chose to look at Hubway (Boston’s young bike sharing program) data and taxi cab data showing the origins and destinations (along with distance, fares, etc.) of all the taxi cab trips in Boston for four non-contiguous days. This, we hoped, would give us a good idea of where people who might use our service (aka those who didn’t have their own cars) were traveling to and from at different times of the day.
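To make "asymmetry in travel demand" concrete, here is a minimal Python sketch of one way it could be measured: count trips between two zones in a morning window and an evening window, then compare the net flow. This is not the analysis from the hackathon; the zone names, the time windows, and the sample trips are all made up for illustration.

```python
from collections import Counter

# Hypothetical trip records: (hour_of_day, origin_zone, destination_zone).
# The zones and values are invented for illustration only.
trips = [
    (8, "Suburb", "Downtown"),
    (8, "Suburb", "Downtown"),
    (9, "Downtown", "Suburb"),
    (18, "Downtown", "Suburb"),
    (18, "Downtown", "Suburb"),
    (19, "Downtown", "Suburb"),
]


def flow_counts(trips, start_hour, end_hour):
    """Count trips per (origin, destination) pair with start_hour <= hour < end_hour."""
    return Counter(
        (origin, dest)
        for hour, origin, dest in trips
        if start_hour <= hour < end_hour
    )


def net_flow(counts, a, b):
    """Trips from a to b minus trips from b to a (a Counter returns 0 for unseen pairs)."""
    return counts[(a, b)] - counts[(b, a)]


# Assumed windows: 6-10 for the morning commute, 16-20 for the evening commute.
morning = flow_counts(trips, 6, 10)
evening = flow_counts(trips, 16, 20)

print("Morning net flow into Downtown:", net_flow(morning, "Suburb", "Downtown"))
print("Evening net flow into Downtown:", net_flow(evening, "Suburb", "Downtown"))
```

A net flow that flips sign between the two windows is the kind of asymmetry a time-shifting route could try to exploit.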
I had time to create two visualizations to explore this. The first I call “Pew Pew” because it reminds me of firing lasers. The video shows cabs making trips over the course of a Wednesday. Specifically, it’s four different Wednesdays displayed at once. Though the data is much more sparse, there are also some Hubway bike trips thrown in there. One of the things that immediately jumps out at me is the increase in late night cab fares after 12:30am. This is when the T (Boston’s subway) closes down for the night. The next big wave appears between 5 and 6am, and it seems like the city is firing its lasers directly at the airport. Again, the T is likely the culprit. It doesn’t start running until 5:15 in most places, so if you have an early flight, you won’t make it on time unless you call a cab. Morning rush hour kicks in with a volley back from the airport as redeye flights and early morning commuter shuttles arrive. The day then proceeds in a very chaotic way. Visualizing the direction of each cab (east or west) doesn’t reveal any overarching patterns.
Note: I would highly recommend watching this video in HD.
The second visualization aims at exploring spatiotemporal patterns in where people are requesting cabs to and from. The origins and destinations (OD) of cabs are an important indicator of the asymmetric flows we want to capture. In the interactive tool I built, the user can select a small region of the city and immediately see the destinations of all cabs originating from within that square. These trips can be filtered by time as well. The user can also choose to plot the distance of the trip, the total duration, or the fare (with data encoded in color). It’s a bit technicolor and shows some irregularities, such as a surprising number of very short cab rides, but the visualization is fun to play with and does show some asymmetries.
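The core of the selection described above can be sketched in a few lines of Python. This is only an illustration, not the Processing code behind the tool: the `Trip` fields, the bounding-box format, and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Trip:
    """Hypothetical trip record; field names are assumptions, not the real data schema."""
    hour: int        # hour of day the trip started (0-23)
    origin: tuple    # (longitude, latitude)
    dest: tuple      # (longitude, latitude)
    fare: float      # fare in dollars


def in_box(point, box):
    """Return True if (lon, lat) lies inside box = (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def destinations_from(trips, box, start_hour=0, end_hour=24):
    """Destinations and fares of trips that start inside box with start_hour <= hour < end_hour."""
    return [
        (t.dest, t.fare)
        for t in trips
        if in_box(t.origin, box) and start_hour <= t.hour < end_hour
    ]


# Made-up sample trips with rough Boston-area coordinates.
trips = [
    Trip(5, (-71.060, 42.355), (-71.020, 42.365), 25.0),
    Trip(13, (-71.061, 42.356), (-71.100, 42.350), 12.5),
    Trip(5, (-71.120, 42.370), (-71.020, 42.365), 40.0),
]

# The user-selected square, as (min_lon, min_lat, max_lon, max_lat).
selected_box = (-71.07, 42.35, -71.05, 42.36)

early_morning = destinations_from(trips, selected_box, 4, 7)
all_day = destinations_from(trips, selected_box)
print(len(early_morning), "early-morning trips;", len(all_day), "trips all day")
```

Each returned fare (or, with extra fields, distance or duration) could then be mapped to a color when drawing the destination points.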
As a technical note, the visualizations were all coded in Processing. I cleaned the data using a couple of short Python scripts and made the background for the apps using QGIS, an open source GIS platform. Because of the time constraints of the hackathon, the code as written is almost unreadable and certainly not optimized, but I am happy to share it if anyone is interested.
In all, the event was a lot of fun and fantastically organized. As a helpful observation to anyone planning their own hackathons, I think everyone would be served by setting aside designated time (maybe 30 minutes) for practicing the pitches at the end of the event. Technical difficulties and “winging it” ended up wasting A LOT of time at the end of a long day. I should also mention that this project won Most Innovative Idea from the panel of judges. Many thanks to my collaborators as well as the co-sponsor of the event, Amadeus.
Here is a link to the Hack-a-thon’s website and other great submissions!