1.1 Dataset

1.1.1 TLC Yellow Taxi 2019 Data

This visual report uses the 2019 subset of the Yellow Taxi trip data provided by TLC [6], which contains over 80 million instances with 17 features. The full dataset, available as monthly CSV files, was serialised with Pandas and PyArrow during pre-processing and later aggregated for visualisation and analysis. While this allows a month-by-month comparison of trip records, the report focuses on analysis at an hourly temporal resolution, which benefits greatly from the representativeness of big data. Assuming the seasonal effect on taxi usage is consistent from year to year, the hourly-scale findings should generalise to other years that are free of anomalies. The spatial resolution of the dataset is set by the table of 131 taxi zones (with the 2 Unknown IDs excluded), which is also provided by TLC.
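The serialisation and hourly aggregation described above could look roughly like the following. This is a minimal sketch, not the report's actual pipeline: the function names are hypothetical, the use of Parquet as the serialisation format is an assumption, and the two Unknown IDs are assumed to be LocationIDs 264 and 265. The column names `tpep_pickup_datetime` and `PULocationID` come from the TLC yellow taxi schema.

```python
import pandas as pd

# Assumption: the two "Unknown" zone IDs are 264 and 265.
UNKNOWN_ZONE_IDS = [264, 265]


def csv_to_parquet(csv_path, parquet_path):
    """Serialise one monthly CSV file to Parquet using the PyArrow engine."""
    df = pd.read_csv(csv_path, parse_dates=["tpep_pickup_datetime"])
    df.to_parquet(parquet_path, engine="pyarrow", index=False)


def hourly_pickups(trips):
    """Count pick-ups per (zone, hour of day), dropping the Unknown zones."""
    known = trips[~trips["PULocationID"].isin(UNKNOWN_ZONE_IDS)]
    hour = known["tpep_pickup_datetime"].dt.hour.rename("hour")
    return (
        known.groupby(["PULocationID", hour])
        .size()
        .rename("pickups")
        .reset_index()
    )


# Tiny made-up sample standing in for the real trip records.
trips = pd.DataFrame(
    {
        "tpep_pickup_datetime": pd.to_datetime(
            [
                "2019-01-01 08:05:00",
                "2019-01-01 08:40:00",
                "2019-02-03 08:15:00",
                "2019-01-01 17:30:00",
                "2019-01-01 09:00:00",
            ]
        ),
        "PULocationID": [161, 161, 161, 237, 264],
    }
)
result = hourly_pickups(trips)
print(result)
```

In this sample, the three trips from zone 161 that start between 08:00 and 08:59 are pooled into one hourly bin regardless of month, and the trip from the Unknown zone 264 is dropped.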

1.1.2 MTA Static Transport Accessibility Data

The locations of the NYC subway stations, in geodatabase format, are obtained from NYC OpenData [7] and then mapped to the corresponding taxi zones. Static subway schedule data for 2019 are available in General Transit Feed Specification (GTFS) format from MTA [8]. Both datasets are used to calculate the accessibility of public transport for each taxi zone at hourly intervals throughout the day. They cover the NYC subway system as well as three commuter rails (Long Island Rail Road, Metro-North Railroad, and Staten Island Railway) within NYC boundaries.
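The report does not spell out how accessibility is computed here, so the sketch below uses one possible proxy: the number of scheduled departures per taxi zone per hour of day. The `stop_id` and `departure_time` fields follow the GTFS `stop_times.txt` layout; the stop-to-zone lookup (which would come from the spatial mapping of stations to zones), the stop IDs, the zone IDs, and the function names are all hypothetical.

```python
from collections import Counter


def gtfs_hour(time_str):
    """Return the hour of day from a GTFS HH:MM:SS string.

    GTFS allows hours of 24 or more for trips that run past midnight,
    so the hour is wrapped back into the 0-23 range.
    """
    return int(time_str.split(":")[0]) % 24


def hourly_departures_by_zone(stop_times, stop_to_zone):
    """Count scheduled departures per (zone, hour of day).

    stop_times: iterable of dicts with 'stop_id' and 'departure_time'.
    stop_to_zone: dict mapping stop_id to taxi zone ID (hypothetical lookup).
    """
    counts = Counter()
    for row in stop_times:
        zone = stop_to_zone.get(row["stop_id"])
        if zone is None:
            continue  # stop lies outside the mapped taxi zones
        counts[(zone, gtfs_hour(row["departure_time"]))] += 1
    return counts


# Made-up rows standing in for stop_times.txt records.
stop_times = [
    {"stop_id": "A", "departure_time": "08:05:00"},
    {"stop_id": "A", "departure_time": "08:35:00"},
    {"stop_id": "B", "departure_time": "08:50:00"},
    {"stop_id": "B", "departure_time": "25:10:00"},
    {"stop_id": "C", "departure_time": "09:00:00"},
]
stop_to_zone = {"A": 100, "B": 100}  # stop "C" is unmapped

counts = hourly_departures_by_zone(stop_times, stop_to_zone)
print(counts)
```

Wrapping hours of 24 or more matters because late-night services would otherwise fall outside the 24 hourly bins used for the taxi data.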

1.1.3 qri Transport Usage Data

For public transport usage, the 2019 preprocessed Turnstile Count Data by qri [9] is used, which removes the need to pre-process the raw MTA dataset. The daily turnstile entry count measures the total number of people entering a given subway station (the three commuter rails are excluded) and is compared against the number of taxi pick-ups from the TLC dataset.
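As an illustration of how daily turnstile entries might be set against taxi pick-ups, the sketch below sums station entries to taxi-zone level and joins them with daily pick-up counts. It is only a sketch under stated assumptions: the column names (`station`, `date`, `entries`, `zone`, `pickups`), the station-to-zone lookup, and the ratio used for comparison are hypothetical and are not taken from the qri or TLC schemas.

```python
import pandas as pd


def compare_daily_usage(entries, pickups, station_to_zone):
    """Join daily subway entries with daily taxi pick-ups per taxi zone."""
    zone_entries = (
        entries.assign(zone=entries["station"].map(station_to_zone))
        .dropna(subset=["zone"])  # drop stations without a zone match
        .astype({"zone": "int64"})
        .groupby(["zone", "date"], as_index=False)["entries"]
        .sum()
    )
    merged = zone_entries.merge(pickups, on=["zone", "date"], how="inner")
    # Hypothetical comparison metric: taxi pick-ups per 1000 subway entries.
    merged["pickups_per_1000_entries"] = (
        1000 * merged["pickups"] / merged["entries"]
    )
    return merged


# Made-up values standing in for the turnstile and taxi aggregates.
entries = pd.DataFrame(
    {
        "station": ["X", "Y", "Z"],
        "date": ["2019-01-01", "2019-01-01", "2019-01-01"],
        "entries": [4000, 1000, 500],
    }
)
pickups = pd.DataFrame(
    {"zone": [10], "date": ["2019-01-01"], "pickups": [250]}
)
station_to_zone = {"X": 10, "Y": 10}  # station "Z" has no zone match

merged = compare_daily_usage(entries, pickups, station_to_zone)
print(merged)
```

Because the turnstile counts are daily, the taxi pick-ups in this sketch are assumed to have been aggregated to daily totals before the join.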


  6. TLC Trip Record Data - TLC. (2020). Retrieved 30 August 2020, from https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

  7. Subway Stations. (2020). Retrieved 30 August 2020, from https://data.cityofnewyork.us/Transportation/Subway-Stations/arq3-7z49

  8. Other data, including static data. (2020). Retrieved 30 August 2020, from http://web.mta.info/developers/developer-data-terms.html#data

  9. NYC Subway Turnstile Counts - 2019 | Dataset Published on qri.cloud. (2019). Retrieved 30 August 2020, from https://app.qri.io/nyc-transit-data/turnstile_daily_counts_2019/at/ipfs/QmduJkH9H9JQyseo1jfKaQAFoPKhZSKZ7aoPh7dW71jTXp