The Global Airport Observations dataset is a Parquet-formatted, 1.75B-record collection of weather observations from 1940 up to today. The 655 MB Parquet file containing the 2024 data collected observation from thousands of stations in 14 countries. Often, stations reported observations hourly if not more frequently.
The data is hosted with one file per year on Cloudflare on behalf of the Source Cooperative. The data was originally sourced from the Iowa Environmental Mesonet.
In this post, I'll analyse this dataset.
My Workstation
I'm using a 5.7 GHz AMD Ryzen 9 9950X CPU. It has 16 cores and 32 threads and 1.2 MB of L1, 16 MB of L2 and 64 MB of L3 cache. It has a liquid cooler attached and is housed in a spacious, full-sized Cooler Master HAF 700 computer case.
The system has 96 GB of DDR5 RAM clocked at 4,800 MT/s and a 5th-generation, Crucial T700 4 TB NVMe M.2 SSD which can read at speeds up to 12,400 MB/s. There is a heatsink on the SSD to help keep its temperature down. This is my system's C drive.
The system is powered by a 1,200-watt, fully modular Corsair Power Supply and is sat on an ASRock X870E Nova 90 Motherboard.
I'm running Ubuntu 24 LTS via Microsoft's Ubuntu for Windows on Windows 11 Pro. In case you're wondering why I don't run a Linux-based desktop as my primary work environment, I'm still using an Nvidia GTX 1080 GPU which has better driver support on Windows and ArcGIS Pro only supports Windows natively.
Installing Prerequisites
I'll use jq to help analyse the data in this post.
$ sudo apt update $ sudo apt install \ jq
I'll use DuckDB, along with its H3, JSON, Lindel, Parquet and Spatial extensions in this post.
$ cd ~ $ wget -c https://github.com/duckdb/duckdb/releases/download/v1.5.1/duckdb_cli-linux-amd64.zip $ unzip -j duckdb_cli-linux-amd64.zip $ chmod +x duckdb $ ~/duckdb
INSTALL h3 FROM community; INSTALL lindel FROM community; INSTALL json; INSTALL parquet; INSTALL spatial;
I'll set up DuckDB to load every installed extension each time it launches.
.timer on .width 180 LOAD h3; LOAD lindel; LOAD json; LOAD parquet; LOAD spatial;
The maps in this post were rendered with QGIS version 4.0.1. QGIS is a desktop application that runs on Windows, macOS and Linux. The application has grown in popularity in recent years and has ~15M application launches from users all around the world each month.
I used QGIS' HCMGIS plugin to add a satellite imagery basemaps from Bing and Esri to this post.
Friendly Field Names
I'll import the 2024 data into a DuckDB table. I'll structure the columns so that they explain themselves better than the original, abbreviated column names.
CREATE OR REPLACE TABLE weather AS SELECT location: { altitude: alti, elevation: elevation, latitude: latitude, longitude: longitude, country: country, county: county, name: name, state: state, station: station, wfo: wfo, }, time_: { tzname: tzname, valid: valid }, temperature: { celsius: tmpc, fahrenheit: tmpf, }, visibility: vsby, wind: { gust: gust, direction: drct, knots: sknt }, dewpoint: { celsius: dwpc, fahrenheit: dwpf, }, precipitation: { one_hour_inches: p01i, one_hour_metres: p01m }, mean_sea_level_pressure: mslp, relative_humidity: relh FROM 'https://data.source.coop/dynamical/asos-parquet/year=2024/data.parquet';
This is the last observation made by Calgary International Airport in 2024.
$ echo "FROM weather WHERE location.name = 'CALGARY INTL' ORDER BY time_.valid DESC LIMIT 1" \ | ~/duckdb -json asos.duckdb \ | jq -S .
[ { "dewpoint": { "celsius": -11.0, "fahrenheit": 12.2 }, "location": { "altitude": 30.15, "country": "CA", "county": null, "elevation": 1084.0, "latitude": 51.1139, "longitude": -114.0203, "name": "CALGARY INTL", "state": "CA_AB", "station": "CYYC", "wfo": null }, "mean_sea_level_pressure": 1028.3, "precipitation": { "one_hour_inches": 0.0, "one_hour_metres": 0.0 }, "relative_humidity": 85.4, "temperature": { "celsius": -9.0, "fahrenheit": 15.8 }, "time_": { "tzname": "America/Edmonton", "valid": "2024-12-31 23:48:00+00" }, "visibility": 1.5, "wind": { "direction": 60.0, "gust": null, "knots": 7.0 } } ]
Below is a breakdown of unique values and NULL coverage across each column.
SELECT column_name, column_type, null_percentage, approx_unique, min, max FROM (SUMMARIZE SELECT * EXCLUDE(dewpoint, location, precipitation, temperature, time_, wind), dewpoint_celsius: dewpoint.celsius, dewpoint_fahrenheit: dewpoint.fahrenheit, location.*, precipitation.*, temperature_celsius: temperature.celsius, temperature_fahrenheit: temperature.fahrenheit, time_.*, wind.* FROM weather);
┌─────────────────────────┬──────────────────────────┬─────────────────┬───────────────┬────────────────────────┬────────────────────────┐ │ column_name │ column_type │ null_percentage │ approx_unique │ min │ max │ │ varchar │ varchar │ decimal(9,2) │ int64 │ varchar │ varchar │ ├─────────────────────────┼──────────────────────────┼─────────────────┼───────────────┼────────────────────────┼────────────────────────┤ │ visibility │ DOUBLE │ 4.86 │ 231 │ 0.0 │ 1010009.0 │ │ mean_sea_level_pressure │ DOUBLE │ 72.95 │ 1947 │ 889.0 │ 1131.3 │ │ relative_humidity │ DOUBLE │ 0.55 │ 7241 │ 0.52 │ 100.42 │ │ dewpoint_celsius │ DOUBLE │ 0.46 │ 794 │ -98.0 │ 37.11 │ │ dewpoint_fahrenheit │ DOUBLE │ 0.46 │ 806 │ -144.4 │ 98.8 │ │ altitude │ DOUBLE │ 4.16 │ 866 │ 0.0 │ 295.27 │ │ elevation │ DOUBLE │ 0.00 │ 2058 │ -499.14362 │ 3792.0 │ │ latitude │ DOUBLE │ 0.00 │ 3778 │ -45.0211 │ 82.5178 │ │ longitude │ DOUBLE │ 0.00 │ 3485 │ -177.3756 │ 177.5667 │ │ country │ VARCHAR │ 0.00 │ 11 │ AU │ ZA │ │ county │ VARCHAR │ 26.09 │ 1294 │ Accomack │ Zapata │ │ name │ VARCHAR │ 0.00 │ 4381 │ ABBOTSFORD │ Zihuatanejo │ │ state │ VARCHAR │ 0.00 │ 76 │ AK │ ZA │ │ station │ VARCHAR │ 0.00 │ 5698 │ 00U │ ZZV │ │ wfo │ VARCHAR │ 26.09 │ 126 │ ABQ │ VEF │ │ one_hour_inches │ DOUBLE │ 0.25 │ 486 │ 0.0 │ 75.69 │ │ one_hour_metres │ DOUBLE │ 0.25 │ 467 │ 0.0 │ 1922.53 │ │ temperature_celsius │ DOUBLE │ 0.00 │ 888 │ -66.89 │ 242.0 │ │ temperature_fahrenheit │ DOUBLE │ 0.00 │ 1019 │ -88.4 │ 467.6 │ │ tzname │ VARCHAR │ 0.00 │ 111 │ Africa/Johannesburg │ Pacific/Norfolk │ │ valid │ TIMESTAMP WITH TIME ZONE │ 0.00 │ 592837 │ 2024-01-01 02:00:00+02 │ 2025-01-01 01:59:00+02 │ │ gust │ DOUBLE │ 86.42 │ 299 │ 0.0 │ 402.0 │ │ direction │ DOUBLE │ 4.54 │ 229 │ 0.0 │ 360.0 │ │ knots │ DOUBLE │ 1.30 │ 562 │ 0.0 │ 1796.11 │ └─────────────────────────┴──────────────────────────┴─────────────────┴───────────────┴────────────────────────┴────────────────────────┘
Canadian Data
The 2024 dataset has observations from 525 weather stations in Canada.
SELECT location.name, COUNT(*) FROM weather WHERE location.country = 'CA' GROUP BY 1 ORDER BY 2 DESC;
┌──────────────────────┬──────────────┐ │ name │ count_star() │ │ varchar │ int64 │ ├──────────────────────┼──────────────┤ │ ESQUIMALT HARBOUR │ 30370 │ │ INUKJUAK ARPT │ 25546 │ │ Dease Lake │ 24046 │ │ SCHEFFERVILLE A │ 22918 │ │ KUUJJUARAPIK ARPT │ 21664 │ │ PUVIRNITUQ │ 21509 │ │ BELLA BELLA CAMPBELL │ 21413 │ │ MATAGAMI │ 21144 │ │ CHAPAIS │ 21119 │ │ PORT HAWKESBURY │ 20741 │ │ REVELSTOKE AIRPO │ 20329 │ │ GJOA HAVEN │ 20181 │ │ MUSKOKA │ 20044 │ │ Peawanuck │ 20001 │ │ SABLE ISLAND │ 19836 │ │ LA GRANDE IV ARP │ 19693 │ │ FORT SEVERN │ 19244 │ │ PETERBOROUGH │ 19207 │ │ WATERLOO │ 18946 │ │ PICKLE LAKE A │ 18918 │ │ · │ · │ │ · │ · │ │ · │ · │ │ UMIUJAQ │ 2413 │ │ PEORIA AGDM ALTA │ 2407 │ │ OLDS AGDM ALTA │ 2290 │ │ FORT RESOLUTION │ 2271 │ │ LOWER CARP LAKE NWT │ 2230 │ │ MORRIN AGDM ALTA │ 2198 │ │ FOREMOST AGDM ALTA │ 2196 │ │ FORT LIARD │ 2161 │ │ GRISE FIORD AIRPORT │ 2116 │ │ DE WEMINDJI │ 2074 │ │ ENCHANT AGDM ALTA │ 2074 │ │ EASTMAIN RIVER │ 2066 │ │ BARNWELL AGDM ALTA │ 2051 │ │ HOLMAN │ 1841 │ │ AKULIVIK │ 1793 │ │ OLD CROW │ 1767 │ │ FORT NORMAN AIRPORT │ 1688 │ │ ST. PAUL AGDM ALTA │ 1572 │ │ SACHS HARBOUR │ 1526 │ │ WASKAGANISH AIRPORT │ 1144 │ └──────────────────────┴──────────────┘
Below, I'll produce a Parquet file with the locations of each station.
COPY ( SELECT DISTINCT geometry: ST_ASWKB(ST_POINT(location.longitude, location.latitude)), location.name FROM weather WHERE location.country = 'CA' ) TO 'asos.canada.parquet' ( FORMAT 'PARQUET', CODEC 'ZSTD', COMPRESSION_LEVEL 22, ROW_GROUP_SIZE 15000);
The whole of Canada looks well-covered.
This shows the density of observation stations in Southern Alberta and BC.
Observation Frequency
Calgary Airport looks to submit at least one observation per hour, though several hours have even more.
WITH a AS ( SELECT yyyy_ww: strftime(time_.valid, '%Y-%W'), day_num: strftime(time_.valid, '%u (%a)'), rec_num: COUNT(*) FROM weather WHERE location.name = 'CALGARY INTL' GROUP BY 1, 2 ORDER BY 1, 2 ) PIVOT a ON day_num USING SUM(rec_num) GROUP BY yyyy_ww ORDER BY yyyy_ww;
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐ │ yyyy_ww │ 1 (Mon) │ 2 (Tue) │ 3 (Wed) │ 4 (Thu) │ 5 (Fri) │ 6 (Sat) │ 7 (Sun) │ │ varchar │ int128 │ int128 │ int128 │ int128 │ int128 │ int128 │ int128 │ ├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤ │ 2024-01 │ 27 │ 25 │ 49 │ 32 │ 28 │ 25 │ 32 │ │ 2024-02 │ 25 │ 24 │ 45 │ 40 │ 27 │ 28 │ 26 │ │ 2024-03 │ 25 │ 28 │ 39 │ 34 │ 25 │ 24 │ 26 │ │ 2024-04 │ 28 │ 27 │ 25 │ 28 │ 26 │ 25 │ 27 │ │ 2024-05 │ 28 │ 25 │ 25 │ 26 │ 25 │ 34 │ 34 │ │ 2024-06 │ 30 │ 37 │ 59 │ 47 │ 29 │ 24 │ 26 │ │ 2024-07 │ 25 │ 35 │ 24 │ 34 │ 24 │ 24 │ 25 │ │ 2024-08 │ 25 │ 26 │ 25 │ 26 │ 24 │ 36 │ 24 │ │ 2024-09 │ 41 │ 30 │ 24 │ 25 │ 46 │ 48 │ 31 │ │ 2024-10 │ 27 │ 25 │ 33 │ 26 │ 24 │ 24 │ 27 │ │ 2024-11 │ 26 │ 25 │ 25 │ 26 │ 26 │ 25 │ 26 │ │ 2024-12 │ 25 │ 25 │ 44 │ 47 │ 39 │ 40 │ 33 │ │ 2024-13 │ 27 │ 29 │ 25 │ 29 │ 33 │ 36 │ 24 │ │ 2024-14 │ 27 │ 25 │ 26 │ 44 │ 54 │ 31 │ 27 │ │ 2024-15 │ 27 │ 24 │ 30 │ 24 │ 26 │ 27 │ 26 │ │ 2024-16 │ 24 │ 46 │ 36 │ 25 │ 26 │ 25 │ 27 │ │ 2024-17 │ 24 │ 25 │ 24 │ 25 │ 25 │ 25 │ 24 │ │ 2024-18 │ 24 │ 49 │ 47 │ 43 │ 29 │ 26 │ 24 │ │ 2024-19 │ 30 │ 27 │ 25 │ 24 │ 24 │ 24 │ 28 │ │ 2024-20 │ 25 │ 29 │ 24 │ 32 │ 43 │ 28 │ 34 │ │ · │ · │ · │ · │ · │ · │ · │ · │ │ · │ · │ · │ · │ · │ · │ · │ · │ │ · │ · │ · │ · │ · │ · │ · │ · │ │ 2024-35 │ 24 │ 24 │ 35 │ 25 │ 26 │ 24 │ 24 │ │ 2024-36 │ 24 │ 27 │ 25 │ 24 │ 24 │ 24 │ 24 │ │ 2024-37 │ 24 │ 35 │ 34 │ 42 │ 25 │ 26 │ 27 │ │ 2024-38 │ 24 │ 24 │ 24 │ 26 │ 47 │ 25 │ 24 │ │ 2024-39 │ 24 │ 24 │ 24 │ 35 │ 24 │ 22 │ 26 │ │ 2024-40 │ 28 │ 24 │ 29 │ 26 │ 25 │ 25 │ 24 │ │ 2024-41 │ 24 │ 24 │ 26 │ 26 │ 25 │ 24 │ 25 │ │ 2024-42 │ 24 │ 24 │ 24 │ 26 │ 24 │ 27 │ 24 │ │ 2024-43 │ 31 │ 42 │ 25 │ 26 │ 26 │ 27 │ 25 │ │ 2024-44 │ 25 │ 32 │ 26 │ 26 │ 41 │ 25 │ 30 │ │ 2024-45 │ 27 │ 32 │ 26 │ 26 │ 24 │ 27 │ 27 │ │ 2024-46 │ 31 │ 25 │ 26 │ 24 │ 36 │ 29 │ 28 │ │ 2024-47 │ 34 │ 38 │ 47 │ 43 │ 45 │ 45 │ 26 │ │ 2024-48 │ 25 │ 28 │ 26 │ 56 │ 45 │ 34 │ 26 │ │ 2024-49 │ 25 │ 25 │ 44 │ 33 │ 25 │ 25 │ 31 │ │ 2024-50 │ 24 │ 24 │ 27 │ 37 │ 26 │ 27 │ 27 │ │ 2024-51 │ 38 │ 51 │ 38 │ 24 │ 26 │ 31 │ 27 │ │ 2024-52 │ 27 │ 25 │ 29 │ 26 │ 25 │ 23 │ 24 │ │ 2024-53 │ 37 │ 38 │ NULL │ NULL │ NULL │ NULL │ NULL │ │ 2025-00 │ NULL │ NULL │ 5 │ NULL │ NULL │ NULL │ NULL │ └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
Below is one such day with more than one observation per hour.
SELECT time_.valid, temperature, visibility, wind FROM weather WHERE DATE(time_.valid) = '2024-01-11'::DATE AND location.name = 'CALGARY INTL' ORDER BY time_.valid;
┌──────────────────────────┬───────────────────────────────────────────┬────────────┬─────────────────────────────────────────────────────┐
│ valid │ temperature │ visibility │ wind │
│ timestamp with time zone │ struct(celsius double, fahrenheit double) │ double │ struct(gust double, direction double, knots double) │
├──────────────────────────┼───────────────────────────────────────────┼────────────┼─────────────────────────────────────────────────────┤
│ 2024-01-11 00:00:00+02 │ {'celsius': -21.0, 'fahrenheit': -5.8} │ 3.0 │ {'gust': 21.0, 'direction': 360.0, 'knots': 15.0} │
│ 2024-01-11 00:54:00+02 │ {'celsius': -22.0, 'fahrenheit': -7.6} │ 2.25 │ {'gust': 18.0, 'direction': 20.0, 'knots': 13.0} │
│ 2024-01-11 01:00:00+02 │ {'celsius': -22.0, 'fahrenheit': -7.6} │ 1.5 │ {'gust': 19.0, 'direction': 30.0, 'knots': 14.0} │
│ 2024-01-11 02:00:00+02 │ {'celsius': -22.0, 'fahrenheit': -7.6} │ 2.5 │ {'gust': 17.0, 'direction': 30.0, 'knots': 12.0} │
│ 2024-01-11 03:00:00+02 │ {'celsius': -23.0, 'fahrenheit': -9.4} │ 2.5 │ {'gust': 19.0, 'direction': 20.0, 'knots': 12.0} │
│ 2024-01-11 03:47:00+02 │ {'celsius': -23.0, 'fahrenheit': -9.4} │ 8.0 │ {'gust': NULL, 'direction': 20.0, 'knots': 16.0} │
│ 2024-01-11 04:00:00+02 │ {'celsius': -23.0, 'fahrenheit': -9.4} │ 8.0 │ {'gust': 22.0, 'direction': 20.0, 'knots': 15.0} │
│ 2024-01-11 04:34:00+02 │ {'celsius': -23.0, 'fahrenheit': -9.4} │ 7.0 │ {'gust': 20.0, 'direction': 40.0, 'knots': 12.0} │
│ 2024-01-11 05:00:00+02 │ {'celsius': -23.0, 'fahrenheit': -9.4} │ 7.0 │ {'gust': NULL, 'direction': 30.0, 'knots': 13.0} │
│ 2024-01-11 06:00:00+02 │ {'celsius': -24.0, 'fahrenheit': -11.2} │ 7.0 │ {'gust': 19.0, 'direction': 30.0, 'knots': 12.0} │
│ 2024-01-11 06:34:00+02 │ {'celsius': -24.0, 'fahrenheit': -11.2} │ 5.0 │ {'gust': 20.0, 'direction': 30.0, 'knots': 13.0} │
│ 2024-01-11 07:00:00+02 │ {'celsius': -24.0, 'fahrenheit': -11.2} │ 4.0 │ {'gust': 17.0, 'direction': 40.0, 'knots': 12.0} │
│ 2024-01-11 08:00:00+02 │ {'celsius': -24.0, 'fahrenheit': -11.2} │ 4.0 │ {'gust': NULL, 'direction': 50.0, 'knots': 15.0} │
│ 2024-01-11 08:24:00+02 │ {'celsius': -24.0, 'fahrenheit': -11.2} │ 2.0 │ {'gust': NULL, 'direction': 50.0, 'knots': 13.0} │
│ 2024-01-11 09:00:00+02 │ {'celsius': -25.0, 'fahrenheit': -13.0} │ 1.75 │ {'gust': NULL, 'direction': 50.0, 'knots': 17.0} │
│ 2024-01-11 10:00:00+02 │ {'celsius': -25.0, 'fahrenheit': -13.0} │ 2.5 │ {'gust': 23.0, 'direction': 20.0, 'knots': 14.0} │
│ 2024-01-11 11:00:00+02 │ {'celsius': -26.0, 'fahrenheit': -14.8} │ 2.5 │ {'gust': NULL, 'direction': 40.0, 'knots': 11.0} │
│ 2024-01-11 12:00:00+02 │ {'celsius': -26.0, 'fahrenheit': -14.8} │ 2.5 │ {'gust': NULL, 'direction': 50.0, 'knots': 13.0} │
│ 2024-01-11 13:00:00+02 │ {'celsius': -27.0, 'fahrenheit': -16.6} │ 2.5 │ {'gust': 23.0, 'direction': 10.0, 'knots': 14.0} │
│ 2024-01-11 14:00:00+02 │ {'celsius': -28.0, 'fahrenheit': -18.4} │ 3.0 │ {'gust': NULL, 'direction': 10.0, 'knots': 19.0} │
│ 2024-01-11 15:00:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 3.0 │ {'gust': 21.0, 'direction': 10.0, 'knots': 15.0} │
│ 2024-01-11 16:00:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 3.0 │ {'gust': 25.0, 'direction': 10.0, 'knots': 20.0} │
│ 2024-01-11 16:19:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 3.0 │ {'gust': 22.0, 'direction': 10.0, 'knots': 16.0} │
│ 2024-01-11 16:48:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.75 │ {'gust': 22.0, 'direction': 360.0, 'knots': 14.0} │
│ 2024-01-11 17:00:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.75 │ {'gust': 16.0, 'direction': 20.0, 'knots': 9.0} │
│ 2024-01-11 17:17:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.0 │ {'gust': NULL, 'direction': 10.0, 'knots': 13.0} │
│ 2024-01-11 17:51:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.75 │ {'gust': 18.0, 'direction': 10.0, 'knots': 12.0} │
│ 2024-01-11 18:00:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 3.0 │ {'gust': NULL, 'direction': 10.0, 'knots': 14.0} │
│ 2024-01-11 19:00:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 1.75 │ {'gust': 15.0, 'direction': 20.0, 'knots': 9.0} │
│ 2024-01-11 19:50:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 4.0 │ {'gust': NULL, 'direction': 40.0, 'knots': 11.0} │
│ 2024-01-11 20:00:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 4.0 │ {'gust': NULL, 'direction': 30.0, 'knots': 11.0} │
│ 2024-01-11 20:18:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.75 │ {'gust': NULL, 'direction': 50.0, 'knots': 14.0} │
│ 2024-01-11 20:31:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 1.0 │ {'gust': 16.0, 'direction': 30.0, 'knots': 8.0} │
│ 2024-01-11 20:45:00+02 │ {'celsius': -29.0, 'fahrenheit': -20.2} │ 4.0 │ {'gust': NULL, 'direction': 40.0, 'knots': 11.0} │
│ 2024-01-11 21:00:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 5.0 │ {'gust': 15.0, 'direction': 20.0, 'knots': 10.0} │
│ 2024-01-11 21:48:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 2.5 │ {'gust': 16.0, 'direction': 10.0, 'knots': 11.0} │
│ 2024-01-11 22:00:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 2.5 │ {'gust': NULL, 'direction': 10.0, 'knots': 14.0} │
│ 2024-01-11 22:08:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 5.0 │ {'gust': NULL, 'direction': 10.0, 'knots': 14.0} │
│ 2024-01-11 22:49:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 9.0 │ {'gust': 16.0, 'direction': 350.0, 'knots': 11.0} │
│ 2024-01-11 23:00:00+02 │ {'celsius': -30.0, 'fahrenheit': -22.0} │ 9.0 │ {'gust': 16.0, 'direction': 10.0, 'knots': 10.0} │
└──────────────────────────┴───────────────────────────────────────────┴────────────┴─────────────────────────────────────────────────────┘
Global Coverage
There were observations from 14 countries in 2024.
SELECT DISTINCT location.country FROM weather ORDER BY 1;
┌─────────┐ │ country │ │ varchar │ ├─────────┤ │ AU │ │ BR │ │ CA │ │ CN │ │ DE │ │ FR │ │ GB │ │ IN │ │ JP │ │ KR │ │ MX │ │ RU │ │ US │ │ ZA │ └─────────┘
Below is a heatmap of their locations with the brightest hexagons representing the greatest number of observations.
CREATE OR REPLACE TABLE h3_stats AS SELECT hexagon: H3_LATLNG_TO_CELL( location.latitude, location.longitude, 3), num_recs: COUNT(*) FROM weather GROUP BY 1; COPY ( SELECT geometry: ST_ASWKB(H3_CELL_TO_BOUNDARY_WKT(hexagon)::geometry), num_recs FROM h3_stats ) TO 'asos.h3s.parquet' ( FORMAT 'PARQUET', CODEC 'ZSTD', COMPRESSION_LEVEL 22, ROW_GROUP_SIZE 15000);
Observations by Decade
I'll get the number of records per year in this dataset.
CREATE OR REPLACE TABLE yearly_counts ( year INT, num_obs BIGINT );
$ for YEAR in {1940..2026}; do echo $YEAR echo "INSERT INTO yearly_counts (year, num_obs) SELECT $YEAR, COUNT(*) FROM 'https://data.source.coop/dynamical/asos-parquet/year=$YEAR/data.parquet'" \ | ~/duckdb asos.duckdb done
There have been more than 1.75B observations across this entire dataset.
SELECT SUM(num_obs) FROM yearly_counts;
┌────────────────┐ │ sum(num_obs) │ │ int128 │ ├────────────────┤ │ 1750088324 │ │ (1.75 billion) │ └────────────────┘
Below are the largest number of observations seen within any one year within any given decade.
SELECT decade: FLOOR(year / 10)::INT * 10, MAX(num_obs) FROM yearly_counts GROUP BY 1 ORDER BY 1;
┌────────┬──────────────┐ │ decade │ max(num_obs) │ │ int32 │ int64 │ ├────────┼──────────────┤ │ 1940 │ 5284124 │ │ 1950 │ 6046249 │ │ 1960 │ 5789299 │ │ 1970 │ 10288125 │ │ 1980 │ 12507342 │ │ 1990 │ 24676905 │ │ 2000 │ 49115404 │ │ 2010 │ 58220234 │ │ 2020 │ 59742202 │ └────────┴──────────────┘
I checked the latest forecast in this year's dataset as it was valid up to about 30 minutes before I ran this command.
SELECT MAX(valid) FROM 'https://data.source.coop/dynamical/asos-parquet/year=2026/data.parquet'
┌──────────────────────────┐
│ max("valid") │
│ timestamp with time zone │
├──────────────────────────┤
│ 2026-05-26 20:40:00+03 │
└──────────────────────────┘
Thank you for taking the time to read this post. I offer both consulting and hands-on development services to clients in North America and Europe. If you'd like to discuss how my offerings can help your business please contact me via LinkedIn.