PGH_TRANSIT_ATLAS // 2025


System Overview: PRT serves ~834K daily riders while POGOH contributes ~3.4K trips (Oct 2025 peak).

Campus Dominance: 68.1% of all bike trips originate or terminate in the CMU/Pitt corridor, showing university-centric adoption.

Integration Gap: Only 11.5% of bus stops have meaningful bike activity within 400m walking distance.


Scale Difference: PRT serves ~834K daily riders while POGOH contributes ~3.4K daily trips (247x difference), showing buses as primary transit mode.

Seasonal Volatility: POGOH shows extreme seasonal fluctuation (13x from winter low to fall peak), while PRT remains stable (~611K-835K range).

Academic Calendar Impact: Sep/Oct surge (+86% from August) coincides with fall semester, confirming student-driven demand.

Winter Resilience: PRT ridership actually grows in winter months while POGOH drops 63%, suggesting bikes fail as a winter alternative.

Archetypes: Commuters (47.9%) dominate with short, consistent trips, followed by Last-Mile users (32.8%). Leisure rides are minimal (3.6%).

Flow: Strong North-South axis alignment suggests heavy movement between Oakland/Shadyside and East Liberty corridors.

Seasonality: Fall semester (Sep-Nov) drives peak hourly usage. Winter usage retains only ~37% of peak volume, indicating weather sensitivity.

Durations: Median trip is ~7 mins. The sharp decay after 15 mins confirms POGOH is used primarily for efficient A-to-B transit, not recreation.

Membership Economy: ~90–99% of trips at the top 10 stations are from Members; casual rides are a thin sliver concentrated in Downtown/Strip.

Network Effect: A moderate positive correlation (R²=0.37) indicates that high-volume bus stops tend to see more nearby bikeshare usage, but proximity gaps remain.

The "Last-Mile" Leaders: These 10 PRT stops are the most critical multimodal nodes. They have high bus volume AND are within 400m of a POGOH station.

Gap Analysis: While Forbes & Morewood sees massive bus traffic (Carnegie Mellon), bike uptake is lower compared to Liberty & Gateway, suggesting infrastructure quality varies.

What This Measures: The Integration Index rewards bus stops that have high ridership AND high nearby bike turnover.

Winner: Central Business District dominates due to density. Strip District and North Shore follow, showing the value of flat terrain and bike lanes.

Top Stop: 7th St @ Penn Ave is the "Golden Node" of the network, proving that bus lanes + protected bike lanes = maximum multimodal flow.

🚴 COMMUTER HUBS

🔗 LAST-MILE CONNECTORS

🛒 ERRAND CENTERS

🎨 LEISURE DESTINATIONS

Commuter Hubs: Schenley Dr (64.8%) and Forbes @ CMU (61.2%) are pure commuter stations—peak hour, A-to-B efficiency dominates.

Last-Mile Connectors: Boulevard of the Allies (46.9%) shows high last-mile percentage, serving as a critical transit feeder.

Errand Centers: Wilkinsburg Park & Ride (68.4%!) is overwhelmingly errand-focused, suggesting suburban shopping/service trip patterns.

Leisure Destinations: South Side Trail (19.0%) captures recreational riders—longer durations, lower displacement (circular routes).

Policy Insight: Stations have behavioral "DNA". Tailor rebalancing schedules and pricing to match: peak hour density for commuter hubs, leisure pricing at recreational nodes.

🎯 POLICY RECOMMENDATIONS

1. Fill the Homewood & Squirrel Hill Gaps

Issue: Two high-traffic corridors lack adequate POGOH coverage:
Homewood: 4 high-volume bus stops (>800 boardings/day) have NO bike stations within 800m walking distance
Squirrel Hill: Despite significant bus traffic along Forbes Ave and Murray Ave, POGOH stations are sparse, forcing residents to walk 600-1000m to access bikes
Action: Deploy 2 new stations in Homewood (Homewood-Brushton Busway, N. Homewood Ave) and 2 in Squirrel Hill (Forbes @ Murray, Murray @ Forward) to serve 18K+ daily bus riders.

EQUITY FIRST/LAST MILE

2. Winter Gear & Fleet Resilience

Issue: 63% ridership drop in January shows weather deterrence. Riders freeze and batteries drain.
Action: Distribute subsidized POGOH-branded winter riding kits (gloves/buffs) to students and prioritize battery heating/swapping for e-fleet reliability in <30°F.

SEASONAL RETENTION

3. Downtown Triangle Densification

Issue: 7TH & PENN shows highest integration score (3,732) but surrounding blocks are underserved.
Action: Add 3 micro-hubs (10 docks each) within 200m of Point State Park to capture tourist + commuter demand.

HIGH-ROI TOURISM

METHODOLOGY LOG

Documentation of the exploratory data analysis process, which combines raw POGOH and PRT data (XLSX and CSV) into features and insights for the web dashboard. All processed data are stored in ./processed_data/

📁 CSV Outputs (./processed_data/):

archetypes.csv (4 rows)
bike_stations_geo.csv (60 rows)
bus_stops_geo.csv (100 rows)
correlation.csv (2,720 rows)
daily_timeseries.csv (365 rows)
demographics.csv (10 rows)
directionality.csv (8 rows)

duration_distribution.csv (30 rows)
heatmap_hour_day.csv (24 rows)
heatmap_hour_season.csv (24 rows)
monthly_trends.csv (12 rows)
prt_historical.csv (8 rows)
top_prt_pogoh.csv (10 rows)

📊 EDA Report (Jupyter Notebook)

1. Research Question & Data Pipeline

Core Question: How can we optimize micro-mobility integration with public transit in a student-dominated urban environment?

Pittsburgh's bikeshare system operates in a unique context: 68% of trips occur in the Campus Corridor (CMU/Pitt bounding box). This creates extreme seasonal volatility—ridership drops 63% during academic breaks. Traditional transit planning assumes stable demand; this analysis reveals the necessity of dynamic fleet scaling tied to the academic calendar.

Data Sources:

  • POGOH Bikeshare: 556,437 trips (2024 full year)
  • PRT Bus Stops: 1,223 stops with coordinates and annual boardings
  • Schema: Start/End timestamps, Station names, Duration, Rider type (Member/Casual), Geolocation

Data Quality Controls:

  • Removed trips >180 min (outliers/theft)
  • Geocoded station coordinates via fuzzy matching
  • Haversine distance calculation for spatial joins (400m threshold)


import pandas as pd

# Load the raw 2024 POGOH trip export
pogoh = pd.read_excel('dataset/POGOH_2024.xlsx')

# Parse trip start and end timestamps
pogoh['Start Date'] = pd.to_datetime(pogoh['Start Date'])
pogoh['End Date'] = pd.to_datetime(pogoh['End Date'])

# Trip duration in seconds
pogoh['Duration'] = (pogoh['End Date'] - pogoh['Start Date']).dt.total_seconds()

# Drop trips longer than 180 min (10,800 s); .copy() avoids SettingWithCopyWarning below
trips_clean = pogoh[pogoh['Duration'] <= 10800].copy()

# Temporal features used in later sections
trips_clean['hour'] = trips_clean['Start Date'].dt.hour
trips_clean['day_of_week'] = trips_clean['Start Date'].dt.day_name()
trips_clean['month'] = trips_clean['Start Date'].dt.month
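
The fuzzy-matching step listed under Data Quality Controls is not shown in this log. A minimal sketch of how raw station names might be reconciled against a coordinate lookup, assuming Python's standard `difflib`; the helper name `match_station`, the placeholder coordinates, and the 0.8 similarity cutoff are illustrative assumptions, not the notebook's actual pipeline.

```python
from difflib import get_close_matches

# Hypothetical lookup of canonical station names -> (lat, lon).
# The coordinates are placeholders, not surveyed values.
STATION_COORDS = {
    "Schenley Dr & Schenley Dr Ext": (40.44, -79.95),
    "Liberty Ave & Stanwix St": (40.44, -80.00),
}

def match_station(raw_name, lookup=STATION_COORDS, cutoff=0.8):
    """Return (canonical_name, (lat, lon)) for the closest name, or None if nothing is similar enough."""
    # Compare in lower case so capitalisation differences do not lower the similarity score.
    lowered = {name.lower(): name for name in lookup}
    hits = get_close_matches(raw_name.lower().strip(), list(lowered), n=1, cutoff=cutoff)
    if not hits:
        return None
    canonical = lowered[hits[0]]
    return canonical, lookup[canonical]

# A trailing period or different casing still resolves to the canonical name.
print(match_station("liberty ave & stanwix st."))
```

Names that fall below the cutoff return None, so they can be reviewed by hand instead of being silently assigned to the wrong station.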

✓ Data Loaded: 556,437 trips (2024 full year)
✓ Key Insight: Unlike typical bikeshare systems that serve commuters year-round, Pittsburgh's system functions as a "Campus Mobility Extension" requiring different operational strategies than traditional urban bikeshare.

Exported to: ./processed_data/daily_timeseries.csv

2. Campus Geofencing & Temporal Segmentation

Methodology: Trips flagged as "Campus Corridor" if start OR end coordinates fall within the CMU/Pitt bounding box. This spatial segmentation enables analysis of the "Student Effect" on ridership patterns.

Bounding Box:

  • Latitude: 40.435°N to 40.450°N
  • Longitude: 79.970°W to 79.940°W (-79.970 to -79.940 in signed decimal degrees)

This captures CMU, University of Pittsburgh, and Shadyside neighborhoods.


# Campus Corridor bounding box (signed decimal degrees)
CAMPUS_LAT_MIN, CAMPUS_LAT_MAX = 40.435, 40.450
CAMPUS_LON_MIN, CAMPUS_LON_MAX = -79.970, -79.940

# A trip is a campus trip if its start OR its end falls inside the box
trips_clean['is_campus'] = (
  ((trips_clean['Start Lat'] >= CAMPUS_LAT_MIN) &
   (trips_clean['Start Lat'] <= CAMPUS_LAT_MAX) &
   (trips_clean['Start Lon'] >= CAMPUS_LON_MIN) &
   (trips_clean['Start Lon'] <= CAMPUS_LON_MAX)) |
  ((trips_clean['End Lat'] >= CAMPUS_LAT_MIN) &
   (trips_clean['End Lat'] <= CAMPUS_LAT_MAX) &
   (trips_clean['End Lon'] >= CAMPUS_LON_MIN) &
   (trips_clean['End Lon'] <= CAMPUS_LON_MAX))
)

# Share of all cleaned trips that touch the campus corridor
campus_pct = trips_clean['is_campus'].sum() / len(trips_clean) * 100
print(f"Campus trips: {campus_pct:.1f}%")

✓ Campus Segmentation Results:
Campus Corridor: 379,039 trips (68.1% of system volume)
Winter Drop: Campus ridership drops 63% in Jan vs Sep (academic break)
City Stability: Non-campus trips remain stable year-round
Policy Implication: Fleet sizing must be dynamically adjusted based on academic calendar. Operating a full fleet during winter break wastes capital on underutilized bikes.
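
The winter drop reported above is a ratio of two monthly campus trip counts. A sketch of that arithmetic follows; the monthly counts are made-up values chosen only to illustrate the formula (the real monthly totals are not listed in this log), and `pct_drop` is a hypothetical helper name.

```python
def pct_drop(peak, low):
    """Percentage decline from `peak` to `low`; a positive result means a drop."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100 * (peak - low) / peak

# Illustrative campus trip counts per month (NOT the real data).
campus_by_month = {"Jan": 18_500, "Sep": 50_000}

drop = pct_drop(campus_by_month["Sep"], campus_by_month["Jan"])
retained = 100 - drop
print(f"Jan vs Sep campus drop: {drop:.0f}% (winter retains {retained:.0f}% of the Sep volume)")
```

The same figure read the other way gives the retention rate, which is why a 63% drop and "~37% of peak volume" describe the same pattern.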

Exported to: ./processed_data/daily_timeseries.csv

3. Unsupervised Learning: Trip Archetypes

Using K-Means clustering on trip duration, displacement, and start hour to categorize rider behavior patterns without labeled training data. This reveals latent behavioral segments for targeted policy interventions.

Algorithm Rationale:

  • K-Means (k=4): Chosen for interpretability. Silhouette analysis validated 4 as optimal cluster count (score: 0.68).
  • Feature Scaling: StandardScaler ensures duration (seconds), displacement (meters), and hour (0-23) contribute equally.
  • Feature Selection: Duration + Displacement capture trip purpose better than speed alone. Hour captures temporal behavior (commute vs leisure).
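
To make the scaling step concrete: StandardScaler replaces each value with its z-score, (x - mean) / standard deviation, using the population standard deviation. A minimal standard-library sketch of the same transformation on three toy trips; the numbers and the helper name `standardize` are illustrative assumptions.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Z-score one feature column: (x - mean) / population std (ddof=0)."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Three toy trips (illustrative values only).
durations = [420, 480, 4400]      # seconds
displacements = [850, 900, 740]   # meters
hours = [9, 17, 14]               # start hour, 0-23

scaled = [standardize(col) for col in (durations, displacements, hours)]
for name, col in zip(("Duration", "displacement", "hour"), scaled):
    print(name, [round(v, 2) for v in col])
```

Before scaling, duration spans thousands of seconds while hour spans at most 23, so Euclidean distance in K-Means would be driven almost entirely by duration. After scaling, every column has mean 0 and unit spread.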


from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


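# Clustering features; rows with missing values are excluded
# ('displacement' is assumed to have been computed earlier in the notebook)
features = trips[['Duration', 'displacement', 'hour']].dropna()
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Fit K-Means and write labels back only to the rows that were clustered,
# so the assignment still lines up if dropna() removed any trips
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
trips.loc[features.index, 'archetype'] = kmeans.fit_predict(features_scaled)

# Per-cluster means, used to decide which archetype each cluster represents
cluster_stats = trips.groupby('archetype').agg({
  'Duration': 'mean',
  'displacement': 'mean',
  'hour': 'mean'
})

# Cluster IDs are arbitrary: this mapping comes from inspecting cluster_stats
# and must be re-checked whenever the data or random_state changes
archetype_map = {
  0: 'Commuter',
  1: 'Errand',
  2: 'Last-Mile',
  3: 'Leisure'
}
trips['archetype_label'] = trips['archetype'].map(archetype_map)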

✓ Identified 4 Behavioral Archetypes:
1. Commuter (47.9% — 266,603 trips): Avg 7.7 min, 836m displacement, peak at 5:47 PM (evening commute)
2. Last-Mile (32.8% — 182,663 trips): Avg 7.0 min, 910m, peak at 9:18 AM (connects to morning transit)
3. Errand (15.7% — 87,128 trips): Avg 20.1 min, 3,359m displacement, peak at 2:48 PM (mid-day shopping/errands)
4. Leisure (3.6% — 20,043 trips): Avg 73.2 min, 737m (!), peak at 2:24 PM (weekend exploration, circular routes)

Unexpected Finding: Leisure trips (3.6%) have an average duration of 73.2 minutes with displacement of only 737m. This suggests recreational "circular routes" along the riverfront trail system—users exploring rather than commuting. These trips require different bike availability (longer rental periods, trail-adjacent stations).

Interpretation Note: "Last-Mile" trips (32.8%) peak at 9:18 AM with 7-minute duration. These are not standalone trips—they're bikeshare-to-bus connections. Cross-referencing with PRT data confirms high overlap with major bus hubs (Boulevard of the Allies, S Millvale Ave).

Exported to: ./processed_data/archetypes.csv

3B. Station Behavioral Profiling

After identifying trip archetypes, we reverse the analysis: which stations generate which behaviors? This reveals "station personalities" critical for targeted operational decisions.

Methodology:

  • Percentage Calculation: For each station, calculate % of trips matching each archetype (Commuter/Last-Mile/Errand/Leisure)
  • Statistical Significance: Only stations with 50+ total trips included (prevents noise from low-volume stations)
  • Top 3 Selection: Identify top 3 stations with highest percentage for each archetype



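# Label each trip with its archetype name
trips_clean['archetype_label'] = trips_clean['archetype'].map(archetype_map)

# Trips per (station, archetype) and per station, computed once
station_counts = trips_clean.groupby(['Start Station Name', 'archetype_label']).size()
station_totals = trips_clean.groupby('Start Station Name').size()

# Share of each archetype within a station, in percent
station_archetype_pct = station_counts.div(station_totals, level='Start Station Name') * 100

# Keep only stations with 50+ trips to suppress low-volume noise
eligible = station_totals[station_totals >= 50].index
station_names = station_archetype_pct.index.get_level_values('Start Station Name')
station_archetype_pct = station_archetype_pct[station_names.isin(eligible)]

for archetype in ['Commuter', 'Last-Mile', 'Errand', 'Leisure']:
  # Top 3 stations by share for this archetype
  top_3 = station_archetype_pct.xs(archetype, level='archetype_label').nlargest(3)
  print(f"{archetype}:\n{top_3}")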

✓ Station Behavioral Profiles Identified:

Commuter Hotspots:
Schenley Dr & Schenley Dr Ext: 64.8% (12,721 of 19,637 trips) — Pure commuter station at CMU campus edge
Forbes Ave @ TCS Hall (CMU): 61.2% (10,168 of 16,607 trips) — Academic commuter hub

Last-Mile Leaders:
Boulevard of the Allies & Parkview Ave: 46.9% (13,859 of 29,538 trips) — Critical transit feeder
S Millvale Ave & Centre Ave: 43.4% (4,045 of 9,310 trips) — East End connector

Errand Centers:
Wilkinsburg Park & Ride: 68.4%! (444 of 649 trips) — Suburban shopping/service trips dominate
Second Ave & Tecumseh St: 61.4% (181 of 295 trips) — South Side errand node

Leisure Destinations:
South Side Trail & S 4th St: 19.0% (712 of 3,739 trips) — Recreational waterfront
Liberty Ave & Stanwix St: 18.9% (1,218 of 6,438 trips) — Downtown leisure hub

Operational Insight: Schenley Dr (64.8% commuter) vs South Side Trail (19.0% leisure) require completely different operational strategies. Schenley needs predictable 8 AM bike availability for class commutes; South Side needs afternoon/weekend capacity for exploratory rides. One-size-fits-all rebalancing fails both station types.

Exported to: ./processed_data/station_archetypes.csv

3C. Bus Stop Integration (First-Mile/Last-Mile Connectivity)

Research Question: How well integrated is bikeshare with public transit? We measure this using the 400m walkability threshold — a bus stop is "integrated" if within 400m walking distance of any bike station.

Methodology: Calculate the Haversine (straight-line) distance between each of the 7,075 PRT bus stops (full system) and all 60 bike stations. Flag stops within 400m (roughly a 5-minute walk).

Visualization: The interactive map (Page 1) shows ● Red (Integrated) vs ● Gray (Not Integrated) stops.


def is_near_bike_station(bus_lat, bus_lon, bike_stations, threshold=400):
  """Check if bus stop within 400m of any bike station"""
  for bike_lat, bike_lon in bike_stations:
    distance = haversine_distance(bus_lat, bus_lon, bike_lat, bike_lon)
    if distance <= threshold:
      return True
  return False


bus_stops = pd.read_csv('dataset/PRT_Bus_Stop_Usage_Unweighted.csv')
integrated_count = bus_stops.apply(lambda row: is_near_bike_station(
  row['latitude'], row['longitude'], active_bike_stations), axis=1).sum()
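
The snippet above calls `haversine_distance` without showing it. A minimal sketch of that helper using the standard Haversine formula, assuming a mean Earth radius of 6,371 km and coordinates in decimal degrees; the sample coordinates are illustrative, not real stop locations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed constant)

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def is_near_bike_station(bus_lat, bus_lon, bike_stations, threshold=400):
    """True if any (lat, lon) in bike_stations lies within `threshold` meters of the bus stop."""
    return any(
        haversine_distance(bus_lat, bus_lon, lat, lon) <= threshold
        for lat, lon in bike_stations
    )

# Illustrative coordinates: 0.003 degrees of latitude is roughly 334 m, 0.005 roughly 556 m.
print(is_near_bike_station(40.440, -79.950, [(40.443, -79.950)]))  # True
print(is_near_bike_station(40.440, -79.950, [(40.445, -79.950)]))  # False
```

Haversine measures straight-line distance, so the actual walk along the street grid will usually be somewhat longer than the figure returned here.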

✓ Integration Analysis Complete:

System-Wide Integration:
Total bus stops: 7,075
Integrated stops (within 400m): 817 stops
Integration rate: 11.5%

Detailed Insights:

  • System Insight: While only 11.5% of the total bus network is within walking distance of POGOH, these stops account for a disproportionate share of transit volume. This highlights the strategic placement of current bike stations near high-frequency transit corridors.
  • Geographic Pattern: Integrated stops (Red dots) cluster heavily in the Campus Corridor and East Liberty. The vast majority of the system (Gray dots) remains disconnected from the bikeshare network, illustrating the potential for "Last Mile" expansion into residential neighborhoods.
  • Regional Disparities:
    • Campus: Highest integration. Dense network means most stops are reachable.
    • Downtown: Surprisingly low connectivity relative to bus volume. Stations are clustered in the Golden Triangle, leaving the periphery disconnected.
    • Strip District: Linear disconnection. Long, thin geography means stops on Smallman/Liberty often miss Penn Ave stations.

Policy Implication: To function as a true multimodal system, the bikeshare network needs to expand in Downtown, the Strip District, and South Side, areas with high bus ridership but poor bike integration.

3D. Trip Archetype Methodology (K-Means Clustering)

To move beyond simple volume counts, we used Unsupervised Learning (K-Means) to identify distinct rider behaviors. We selected 3 key features that define a trip's "purpose":

  • Duration (Dwell Time): Short (<10 min) = transport; Long (>60 min) = leisure.
  • Displacement (Distance): High = commuting; Low (loops) = leisure.
  • Hour of Day (Temporal): Morning/Evening peaks = commuting; Mid-day = errands.

Rationale for K=4: Through Elbow Method analysis, 4 clusters provided the most distinct behavioral separation:

  • Commuter: Peak hours, medium distance, point-to-point.
  • Last-Mile: Short duration, short distance, connects to transit hubs.
  • Errand: Mid-day, medium duration, non-peak hours.
  • Leisure: Long duration, low displacement (loops), weekends.

3E. Quantifying the Connectivity Gap

While the map shows where the gaps are, this analysis quantifies the magnitude of the disconnection. We analyze the distance from every bus stop to its nearest bikeshare station.

Connectivity Insight: The histogram reveals a "Long Tail" of disconnection. Most bus stops are >1km away from bikeshare, making multimodal transfer impractical. Scaling to 20% integration would require doubling current station density.
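
A sketch of how the nearest-station distances behind such a histogram could be summarised into distance bands. The band edges (400 m and 1,000 m), the helper name `distance_shares`, and the ten sample distances are assumptions for illustration; the real input would be one nearest-station distance per bus stop.

```python
def distance_shares(distances_m, edges=(400, 1000)):
    """Percentage of stops in each band: <= 400 m, 400-1000 m, > 1000 m (for the default edges)."""
    if not distances_m:
        raise ValueError("need at least one distance")
    counts = [0] * (len(edges) + 1)
    for d in distances_m:
        # The band index is the number of edges the distance exceeds.
        band = sum(d > edge for edge in edges)
        counts[band] += 1
    return [100 * c / len(distances_m) for c in counts]

# Ten illustrative nearest-station distances in meters (NOT the real data).
sample = [120, 380, 650, 900, 1200, 1500, 2500, 3100, 4000, 5200]

within_400, mid_band, beyond_1km = distance_shares(sample)
print(f"<=400 m: {within_400:.0f}%, 400-1000 m: {mid_band:.0f}%, >1 km: {beyond_1km:.0f}%")
```

The first band is the integration rate under the 400m threshold; the last band is the long tail where a bike-to-bus transfer is impractical.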

3F. Regional Connectivity Breakdown

We break down the "Last Mile Gap" by corridor to see how connectivity varies across the city's key activity centers.

Regional Insight:

  • Campus: Highest integration due to dense station network.
  • Downtown: Surprisingly low connectivity; stations clustered in Golden Triangle.
  • Squirrel Hill: CRITICAL GAP. High bus ridership but >1km median distance to bike stations.
  • Homewood: EQUITY GAP. Bimodal distribution shows recent expansion helps, but key arteries remain disconnected.

4. Temporal Dynamics: Academic Calendar Dependency

The most striking feature of Pittsburgh's bikeshare is its dependency on the academic calendar. Daily ridership ranges from 412 trips (winter break nadir) to 3,800+ trips (fall semester peak), a roughly 9× swing.

Why Daily Granularity Matters:

  • 365-Day Timeseries: Monthly averages hide spikes (orientation week, finals week). Daily data preserves these anomalies.
  • Peak Day: September 26, 2024 (3,800+ trips — Fall semester peak + ideal weather)
  • Trough Day: January 4, 2024 (412 trips — winter break nadir)



# Daily trip counts for the whole system
trips['date'] = trips['Start Date'].dt.date
daily_pogoh = trips.groupby('date').size().reset_index(name='trips')
daily_pogoh['date_str'] = pd.to_datetime(daily_pogoh['date']).dt.strftime('%Y-%m-%d')

# Daily counts for campus-corridor trips
daily_campus = trips[trips['is_campus']].groupby('date').size().reset_index(name='trips')
daily_campus['date_str'] = pd.to_datetime(daily_campus['date']).dt.strftime('%Y-%m-%d')

# Daily counts for non-campus (city) trips
daily_city = trips[~trips['is_campus']].groupby('date').size().reset_index(name='trips')
daily_city['date_str'] = pd.to_datetime(daily_city['date']).dt.strftime('%Y-%m-%d')

# Align both subsets to the full list of dates; days with no trips become 0
all_dates = pd.DataFrame({'date_str': daily_pogoh['date_str']})
daily_campus_full = all_dates.merge(daily_campus[['date_str', 'trips']], on='date_str', how='left').fillna(0)
daily_city_full = all_dates.merge(daily_city[['date_str', 'trips']], on='date_str', how='left').fillna(0)

# Payload for the dashboard's daily time series
daily_timeseries = {
  'dates': daily_pogoh['date_str'].tolist(),
  'pogoh_trips': daily_pogoh['trips'].tolist(),
  'pogoh_campus_trips': daily_campus_full['trips'].astype(int).tolist(),
  'pogoh_city_trips': daily_city_full['trips'].astype(int).tolist()
}
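
Once the daily series exists, the peak day, trough day, and the ratio between them fall out of a simple scan. A sketch using the two extreme values reported in this section plus one made-up mid-year value; 3,800 is used as a floor for the "3,800+" peak, and the June entry is purely illustrative.

```python
# Daily trip counts keyed by date string. The January and September values
# come from the reported trough and peak; the June value is illustrative only.
daily = {
    "2024-01-04": 412,
    "2024-06-15": 1900,
    "2024-09-26": 3800,
}

peak_day = max(daily, key=daily.get)
trough_day = min(daily, key=daily.get)
ratio = daily[peak_day] / daily[trough_day]

print(f"Peak: {peak_day} ({daily[peak_day]} trips)")
print(f"Trough: {trough_day} ({daily[trough_day]} trips)")
print(f"Peak-to-trough ratio: {ratio:.1f}x")
```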

✓ Temporal Segmentation Results:
Campus Corridor: 379,039 trips (68.1% of system volume)
Winter Drop: Campus ridership drops 63% in Jan vs Sep (academic break)
City Stability: Non-campus trips remain stable year-round, indicating resident commuter dependence
Peak Day: September 26, 2024 (3,800+ trips, fall semester peak plus ideal weather)
Trough Day: January 4, 2024 (412 trips, winter break nadir)

Policy Implication: Campus fleet should be dynamically scaled with academic calendar. City corridors need year-round minimum service.

Exported to: ./processed_data/daily_timeseries.csv (365 rows × 4 columns)

Deep Dive: The divergence between Campus and City patterns in winter (Dec-Feb) is the strongest evidence for two distinct user bases. Campus ridership effectively hibernates (dropping 90%+), while City ridership persists at ~40% volume, proving that resident commuters rely on the system year-round regardless of weather. This suggests the "Fair Weather" hypothesis only applies to the student population.

5. Season × Hour Matrix

A heatmap revealing the exact time-windows of highest demand. This drives our rebalancing schedules.


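import seaborn as sns
# Trip counts by season (rows) and start hour (columns); 'season' is assumed to be derived from 'month' earlier
matrix = trips.groupby(['season', 'hour']).size().unstack()
sns.heatmap(matrix, cmap='Blues')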

Peak Load: Fall Semester, 5:00 PM (Commute + Class End).
Policy Insight: The "Winter Gap" is visible as a uniform cooling across all hours, not just peaks.
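
The heatmap relies on a `season` label that is not defined in the excerpt above. A sketch of one way to derive it and count trips per (season, hour) cell, assuming meteorological seasons (Dec-Feb winter, Sep-Nov fall), which matches the month ranges used elsewhere on this page; the mapping, the helper name, and the sample timestamps are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime

# Assumed month -> season mapping (meteorological seasons).
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

def season_hour_counts(start_times):
    """Count trips in each (season, hour) cell of the heatmap."""
    counts = Counter()
    for ts in start_times:
        counts[(SEASON_BY_MONTH[ts.month], ts.hour)] += 1
    return counts

# Illustrative trip start times (NOT real trips).
starts = [
    datetime(2024, 9, 26, 17, 5),
    datetime(2024, 10, 1, 17, 40),
    datetime(2024, 1, 4, 8, 15),
]
counts = season_hour_counts(starts)
print(counts[("Fall", 17)], counts[("Winter", 8)])
```

Each key of the counter corresponds to one cell of the Season × Hour matrix; cells with no trips simply read as zero.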

6. Statistical Correlation: Bus vs Bike

Do busy bus stops actually generate bike trips? We test this hypothesis with linear regression.


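from scipy.stats import linregress
# linregress returns five values, so keep the result object and read .rvalue / .pvalue from it
result = linregress(bus_vol, integration_score)
print(f"R-Squared: {result.rvalue**2:.3f} (p = {result.pvalue:.3g})")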

Result: R² = 0.372 (p < 0.001)
There is a moderate positive correlation. While bus volume is a predictor, it's not the only factor. Bike infrastructure (lanes) and topography play huge unmeasured roles.
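
For a single-predictor linear regression, R² equals the squared Pearson correlation between the two variables. A dependency-free sketch of that calculation; the sample values are made up to show the mechanics and will not reproduce the 0.372 reported above.

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R² for simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Illustrative values only: daily boardings and an integration score per stop.
bus_vol = [200, 450, 800, 1200, 1500]
integration_score = [150, 300, 900, 700, 1600]

print(f"R-Squared: {r_squared(bus_vol, integration_score):.3f}")
```

An R² of 0.37 means bus volume accounts for roughly 37% of the variation in integration score, leaving most of it to other factors.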

Outlier Analysis: The scatter plot reveals high-leverage outliers. Stops like Liberty Ave at Gateway show massive bus volume but moderate integration scores due to distance. Conversely, stops in North Oakland have lower bus volume but near-perfect bike integration. With bus volume on the x-axis and integration score on the y-axis, the "sweet spot" for intervention is the lower-right quadrant: high bus volume, low current integration.

Strategic Integration: Top 10 Multimodal Nodes

New analysis identifying high-impact integration opportunities. We filter for the top 10 bus stops (by volume) that are within 400m of a bikeshare station.


bus_stops_near_pogoh = bus_stops[bus_stops['bike_trips_nearby'] > 0]
top_10 = bus_stops_near_pogoh.nlargest(10, 'bus_boardings')
print(top_10[['stop_name', 'bus_boardings', 'bike_trips_nearby']])

[Table: Top 10 PRT Stops Near POGOH Stations (stop name, bus boardings, nearby bike trips)]

Key Finding: #1 Node shows proven multimodal success. Gaps exist where high bus volume doesn't translate to bike usage (infrastructure opportunity).

Relocating from Jakarta to Pittsburgh fundamentally shifted my mobility baseline from 'Park & Ride' to 'Bike & Bus'. In Jakarta, my daily commute often involved a car trip to the MRT followed by 45+ minutes of friction on the Tol Desari.

Arriving here introduced a new variable: Micro-mobility in a Winter Climate. Thanks to integrated transit access (via my CMU ID), POGOH became my critical First/Last-mile connector. The challenge shifted from traffic volume to thermal endurance. But the result—avoiding the despair of gridlock—is transformative. This project explores that efficiency.

NODE A: POGOH in Bridge City

NODE B: PRT in AAA East Liberty

ORIGIN: MRT JAKARTA

FRICTION: TOL DESARI JAM

rutomo@andrew.cmu.edu
(Public Policy, Analytics, AI Management @ CMU)

I am on a mission to architect responsible AI solutions for developing markets, ensuring that technology helps people thrive rather than just survive.

My professional journey began at the intersection of policy and analytics in Indonesia’s telecommunications sector. During seven years at Telkomsel, I saw how mobility data can drive critical decisions... deploying credit-scoring models that expanded financial inclusion and designing campaign algorithms that lifted revenue.

Curiosity carried me from Bandung to Pittsburgh. A full-ride LPDP scholarship made it possible to study at Carnegie Mellon, where I bring an underrepresented international perspective to AI policy. Recognized with the Ganesha Karsa award (ITB).

Tinkering is in my DNA. EcoFlow grew from GIS into a crowd-intelligence tourism system (incubated by Singtel). Bukugambar.ai began as weekend playtime with my son and became an AI sketchbook that turns text or photos into coloring-page outlines.

Seeking Summer 2026 Internship in Analytics or AI Solutions in the US.