EXPLORING URBAN DATA WITH MACHINE LEARNING

Urban Mobility Index

Kit Nga Chou | Kirthi Balakrishnan | Michelle Chen | Lizzie Lee

Research Question

How can we use Walkscore.com's existing datasets for major cities to train a model that predicts the Walk Score of any city or neighborhood from its street connectivity and transit density?

methodology

Framework + Pipeline

methodology

Datasets

Three open-source API-based datasets to attempt reverse-engineering Walkscore.com's methodology

1. Road Maps

Image classification with Keras to identify correlation between the visual street network & Walk Scores

2. Bus Stops

Neighborhood-wise identification of bus stop locations & occurrence counts

3. Intersection Nodes

Extracting intersection nodes from OpenStreetMap plots & calculating densities for each neighborhood

CITIES

Training Data

Boulder, CO | Ann Arbor, MI | Chicago, IL
Washington, D.C. | New York, NY | San Francisco, CA

CITIES

Validation Data

Madison, WI | Seattle, WA | Tulsa, OK

data preparation

Web Scraping for Existing Walk Scores

using Beautiful Soup

web scraping

Extracting Boundaries

using regex & javascript via js2py
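
A minimal sketch of the regex side of this step, assuming the page embeds the neighborhood boundary as a JavaScript array literal of [lat, lng] pairs. The variable name `boundary_coords` and the sample snippet are hypothetical, since the actual page markup is not shown here. js2py is left out: the sketch only handles array literals that are also valid JSON.

```python
import json
import re

# Hypothetical page fragment; the real markup may look different.
SAMPLE_JS = (
    "var boundary_coords = "
    "[[40.01, -105.27], [40.02, -105.25], [40.00, -105.24]];"
)


def extract_boundary(js_source, var_name="boundary_coords"):
    """Return the (lat, lng) pairs assigned to `var_name`, or [] if absent."""
    # Capture the outer [[...], [...]] literal that follows "var name =".
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\[.*?\]\s*\])\s*;",
        re.DOTALL,
    )
    match = pattern.search(js_source)
    if match is None:
        return []
    # A plain nested array of numbers is valid JSON, so json.loads can parse it.
    return [tuple(pair) for pair in json.loads(match.group(1))]


print(extract_boundary(SAMPLE_JS))
```

When the embedded value is an expression rather than a plain literal, evaluating it with a JavaScript interpreter is the more robust route.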

IMAGE CLASSIFICATION WITH KERAS

URL → EnPath → Polygon
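
If "EnPath" refers to a path in Google's encoded polyline format (an assumption; the slide does not say), the EnPath → Polygon step amounts to decoding that string into coordinates. The sketch below implements the published encoded polyline algorithm; the function name `decode_polyline` is illustrative.

```python
def decode_polyline(encoded, precision=5):
    """Decode a Google encoded polyline string into (lat, lng) tuples."""
    coords = []
    index = 0
    lat = lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        # Each point stores a latitude delta, then a longitude delta.
        for axis in range(2):
            shift = 0
            result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: last chunk of this value
                    break
            # The lowest bit carries the sign; the remaining bits the magnitude.
            delta = ~(result >> 1) if result & 1 else result >> 1
            if axis == 0:
                lat += delta
            else:
                lng += delta
        coords.append((lat / factor, lng / factor))
    return coords


# Sample string from Google's documentation of the format.
polygon = decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
print(polygon)
```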

IMAGE CLASSIFICATION WITH KERAS

1

Dynamic Google Maps API to Static Image

Static images do not accept overlaid polygons with holes, which we needed in order to extract street data for a specific boundary only

keras image classifier: categorical model

Two Types of Images Compared

ISSUES FACED

Overfitting + Low Validation Accuracy

Dropped from methodology

Unprocessed Image

Accuracy: training accuracy increases; validation accuracy is fickle
Loss: training loss decreases; validation loss is fickle

Processed Image

Accuracy: training accuracy increases; validation accuracy is stagnant
Loss: training loss decreases; validation loss increases

linear regression data preparation

2

Bus Stop Density Mapping

STEP 1: Query

Use the Overpass Turbo wizard to generate the query

STEP 2: Extract

Use the Overpass API to extract the points into Python

STEP 3: Count

Use a bounding box + count to find the number of bus stops

STEP 4: Get Density

Per unit area & per 1,000 residents of the neighborhood
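
The four steps above can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the function names are hypothetical, the query uses the standard OpenStreetMap tag `highway=bus_stop`, and the network call to an Overpass endpoint is left out so the example runs offline on already-extracted points.

```python
def build_bus_stop_query(south, west, north, east):
    """Overpass QL query for bus stop nodes inside a bounding box."""
    # Overpass expects bounding boxes as (south, west, north, east).
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json];node["highway"="bus_stop"]({bbox});out;'


def count_in_bbox(points, south, west, north, east):
    """Count (lat, lon) points that fall inside the bounding box."""
    return sum(
        1 for lat, lon in points
        if south <= lat <= north and west <= lon <= east
    )


def densities(count, area_sq_km, population):
    """Bus stop density per square kilometer and per 1,000 residents."""
    return {
        "per_sq_km": count / area_sq_km,
        "per_1000_people": count / (population / 1000),
    }


# Made-up coordinates and neighborhood figures, for illustration only.
stops = [(40.05, -105.25), (40.07, -105.22), (40.50, -105.25)]
n = count_in_bbox(stops, 40.0, -105.3, 40.1, -105.2)
print(build_bus_stop_query(40.0, -105.3, 40.1, -105.2))
print(n, densities(n, area_sq_km=4.0, population=8000))
```

A bounding box over-counts when the neighborhood is not rectangular; a point-in-polygon test against the scraped boundary would be tighter.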

LINEAR REGRESSION DATA PREPARATION

3

Intersection Density Mapping

Extracting line plots from OpenStreetMap via the osmnx package in Python

OSMNX → Street Graph → Graph Nodes
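
To make the Street Graph → Graph Nodes step concrete, here is a small sketch that works on a plain edge list instead of an osmnx graph, so it runs without network access. It assumes an intersection is a node where at least three street segments meet; that threshold and the function names are assumptions, not something the slides state.

```python
from collections import defaultdict


def intersection_nodes(edges, min_streets=3):
    """Return nodes that connect to at least `min_streets` distinct neighbors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node for node, adj in neighbors.items() if len(adj) >= min_streets}


def intersection_density(edges, area_sq_km):
    """Intersections per square kilometer for one neighborhood."""
    return len(intersection_nodes(edges)) / area_sq_km


# Toy street graph: node 0 is a four-way crossing, node 1 only a bend.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
print(intersection_nodes(toy_edges))
print(intersection_density(toy_edges, area_sq_km=0.5))
```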

APPLICATION + ALGORITHM

Prediction Models

3 Clustering Models Attempted

Three different clustering methods were used after splitting the data into 10 classes based on Walkscore
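
The class split can be sketched as follows. The slides do not state how the 10 classes were cut, so this assumes equal-width bins over the 0 to 100 Walk Score range; the function name is illustrative.

```python
def walk_score_class(score, n_classes=10, max_score=100):
    """Map a Walk Score in [0, max_score] to a class index 0..n_classes-1."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score out of range: {score}")
    width = max_score / n_classes
    # The top score would land in bin n_classes, so clamp it into the last bin.
    return min(int(score // width), n_classes - 1)


print([walk_score_class(s) for s in (0, 9.9, 10, 55, 100)])
```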

PREDICTIVE MODELS

Linear Regression

Diagonal Correlation in raw data pattern

parameters

Bus stop and intersection densities, per square kilometer and per 1,000 residents, are used as predictors of the Walk Scores
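
A minimal sketch of such a regression, using ordinary least squares from NumPy. The four predictor columns follow the parameters named above; the data values are made up for illustration, and the helper names are hypothetical rather than the project's own.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients to new predictor rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Columns: bus stops per sq. km, bus stops per 1,000 residents,
# intersections per sq. km, intersections per 1,000 residents.
# All numbers below are invented for illustration.
X = [[4.0, 1.2, 60.0, 18.0], [9.5, 2.0, 95.0, 21.0], [1.5, 0.8, 30.0, 15.0],
     [12.0, 2.6, 120.0, 25.0], [6.0, 1.1, 70.0, 12.0], [3.0, 1.9, 45.0, 27.0]]
y = [55.0, 78.0, 32.0, 91.0, 63.0, 48.0]
coef = fit_linear_regression(X, y)
print(coef, r_squared(y, predict(coef, X)))
```

R-squared on the training rows is optimistic; scoring the held-out validation cities gives the fairer measure.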

RESEARCH FINDINGS

Results + Implications

We are confident that with more parameters and stronger datasets, for instance by adding cities whose neighborhoods differ widely in walkability, we could predict a city's Walk Scores more accurately.

RESEARCH FINDINGS

Limitations

Disparity between the Walk Score distributions of the training and test datasets

RESEARCH FINDINGS

Improvement Gaps

1

Accuracy

Insufficient; needs more data points; needs more computing power

2

Parameters

The R-squared of the current parameters is too low; more parameters can be added

3

Scalability

Front-end development to accept different kinds of input and return a Walk Score

Conclusion

Next Steps

Our ultimate goal in creating a predictive Walk Score is to encourage planners to create dynamic, walkable neighborhoods, which provide health and sustainability benefits and also improve neighborhood connectivity for disadvantaged populations who might not have access to vehicles.

As a next step, we hope to devote more time to classifying each neighborhood by its street pattern, such as grid vs. loop patterns, so that we can test whether street patterns correlate with a higher or lower Walk Score.

INTRODUCTION

Why Mobility?

The Need for Better Walkability & Transit

APPLICATION + ALGORITHM

Prediction Models

3 Clustering Models Attempted

K-MEANS

AGGLOMERATIVE CLUSTERING

gaussian mixture

PREDICTIVE MODELS

Linear Regression

Model Metrics

Bus Stop & Intersection densities form 38% of the parameters used in evaluation of Walk Scores
