Preface
1 Data Visualisation
1.1 Dataset
1.1.1 TLC Yellow Taxi 2019 Data
1.1.2 MTA Static Transport Accessibility Data
1.1.3 qri Transport Usage Data
1.2 Metrics & Feature Selection
1.2.1 Pick-up Zone Profitability
1.2.2 Hourly Demands
1.2.3 Public Transport Competition Factor
1.3 Data Preprocessing
1.3.1 TLC Yellow Taxi 2019 Data
1.3.2 MTA Static Transport Accessibility Data
1.3.3 qri Transport Usage Data
1.4 Exploratory Data Analysis
1.4.1 Overall trends of taxi trips
1.4.2 Hotspot Analysis
1.4.3 Effect of Public Transport Access
1.5 Conclusion
2 Explanatory Modelling
2.1 Dataset & Sampling
2.1.1 Feature Engineering
2.1.2 Sampling
2.2 Feature Transformation
2.2.1 Log Transformation of Numerical Features
2.2.2 Factorization of Categorical Features
2.3 Feature Analysis
2.3.1 Feature Distribution
2.3.2 Pairwise Correlation
2.3.3 Comparison of Spatial Resolution
2.3.4 Categorical – Numerical Feature Interactions
2.3.5 Feature Selection for Explanatory Models
2.4 Model Selection
2.4.1 Linear Additive Models
2.4.2 Linear Interaction Models
2.4.3 Linear Mixed Models
2.4.4 Experimental Design
2.4.5 Model Selection Metrics
2.5 Modelling Results
2.5.1 Benchmark Model Term Relevance Analysis
2.5.2 Model Comparison on the 10000-Instance Subsample
2.5.3 Final Model Inference
2.5.4 Error Analysis
2.6 Conclusion
Code Section
2.7 Data Preprocessing (Python)
2.7.1 Phase 1
2.7.2 Phase 2
2.8 Visualisation (Python)
2.8.1 Sankey diagram
2.8.2 Profitability
2.8.3 Airport stripe
2.9 Linear Modelling (R)
✪ About the author (Melbourne, AU)
Exploring Yellow Taxi Profitability in NYC: A Spatio-Temporal Analysis
Code Section
Main tools used:
Python: pandas, geopandas, scipy, pyfeather, matplotlib, plotly
R: lme4, plotly, dplyr, tidyr