Settings

Theme

Show HN: Pcapsql – SQL interface for PCAP analysis

github.com

1 points by __padding a month ago · 1 comment · 2 min read

Reader

I was chatting with a colleague a while ago, and they wanted to run some SQL queries against a PCAP – aggregate by source IP, that sort of thing. They went looking and found PacketQ (DNS/ICMP only), Apache Drill's PCAP support (outer headers only), DuckDB has a PCAP reader community extension (same deal, limited to outer headers). You can always write a one-off script with scapy or gopacket, but it gets old fast.

We deal a lot with tunneled traffic, none of those tools really seemed very feature rich on that front, e.g. for VXLAN every tool just showed UDP/4789, not the encapsulated packets.

So as a weekend project that got a bit out of hand, I built what they were asking for. Protocol layers become tables, you query with SQL, it parses through tunnels:

      -- Traffic inside VXLAN tunnels
      SELECT ip4_to_string(src_ip) as src, ip4_to_string(dst_ip) as dst, COUNT(*)
      FROM ipv4
      WHERE tunnel_type = 'vxlan'
      GROUP BY 1, 2;

      -- Top talkers
      SELECT ip4_to_string(src_ip) as src, SUM(total_length) as bytes
      FROM ipv4
      GROUP BY 1
      ORDER BY bytes DESC
      LIMIT 10;
Also handles TLS decryption (SSLKEYLOGFILE), HTTP/2 frame parsing, GRE/MPLS/GTP, export to Parquet, and querying directly from S3.

Built with Rust on Apache Arrow and DataFusion.

https://github.com/mtottenh/pcapsql

__paddingOP a month ago

Some fun stuff you can do - RTT analysis, just using SQL :-) :

  -- RTT Analysis using TCP timestamp echo
  -- Measures actual network RTT by tracking when our ts_val gets echoed back
  -- For client-side captures: dst_port in (80,443) = outbound, src_port in (80,443) = inbound


  WITH outbound AS (
      -- Packets TO server (dst_port is well-known)
      SELECT
          t.frame_number,
          f.timestamp as send_ts,
          t.dst_port as server_port,
          t.ts_val,
          i.src_ip as local_ip,
          t.src_port as local_port,
          i.dst_ip as remote_ip
      FROM tcp t
      JOIN ipv4 i ON t.frame_number = i.frame_number
      JOIN frames f ON t.frame_number = f.frame_number
      WHERE t.ts_val IS NOT NULL
         AND t.dst_port IN (80, 443, 8080, 8443)
  ),
  inbound AS (
      -- Packets FROM server (src_port is well-known)
      SELECT
          t.frame_number,
          f.timestamp as recv_ts,
          t.src_port as server_port,
          t.ts_ecr,
          i.dst_ip as local_ip,
          t.dst_port as local_port,
          i.src_ip as remote_ip
      FROM tcp t
      JOIN ipv4 i ON t.frame_number = i.frame_number
      JOIN frames f ON t.frame_number = f.frame_number
      WHERE t.ts_ecr IS NOT NULL
        AND t.ts_ecr > 0
        AND t.src_port IN (80, 443, 8080, 8443)
  ),
  -- Match: find when our ts_val was echoed back by the server
  rtt_samples AS (
      SELECT
          o.frame_number as send_frame,
          MIN(i.frame_number) as recv_frame,
          o.server_port,
          o.send_ts,
          MIN(i.recv_ts) as recv_ts
      FROM outbound o
      JOIN inbound i
          ON o.local_ip = i.local_ip
          AND o.local_port = i.local_port
          AND o.remote_ip = i.remote_ip
          AND o.server_port = i.server_port
          AND i.ts_ecr = o.ts_val
          AND i.frame_number > o.frame_number
      GROUP BY o.frame_number, o.server_port, o.send_ts
  ),
  rtt_values AS (
      SELECT
          server_port,
          EXTRACT(EPOCH FROM (recv_ts - send_ts)) * 1000.0 as rtt_ms
      FROM rtt_samples
      WHERE recv_ts > send_ts
  )
  SELECT
      server_port,
      hdr_count(hdr_histogram(rtt_ms)) as samples,
      ROUND(hdr_min(hdr_histogram(rtt_ms)), 2) as min_ms,
      ROUND(hdr_percentile(hdr_histogram(rtt_ms), 0.50), 2) as p50_ms,
      ROUND(hdr_percentile(hdr_histogram(rtt_ms), 0.75), 2) as p75_ms,
      ROUND(hdr_percentile(hdr_histogram(rtt_ms), 0.95), 2) as p95_ms,
      ROUND(hdr_percentile(hdr_histogram(rtt_ms), 0.99), 2) as p99_ms,
      ROUND(hdr_max(hdr_histogram(rtt_ms)), 2) as max_ms,
      ROUND(hdr_mean(hdr_histogram(rtt_ms)), 2) as mean_ms
  FROM rtt_values
  WHERE rtt_ms > 0 AND rtt_ms < 30000
  GROUP BY server_port
  ORDER BY samples DESC;

Results from my local machine to a speedtest server:

  +-------------+---------+--------+--------+--------+--------+--------+--------+---------+
     | server_port | samples | min_ms | p50_ms | p75_ms | p95_ms | p99_ms | max_ms | mean_ms |
     +=======================================================================================+
     | 443         | 315     | 1.0    | 75.0   | 263.0  | 349.0  | 3007.0 | 3007.0 | 177.27  |
     |-------------+---------+--------+--------+--------+--------+--------+--------+---------|
     | 80          | 6       | 70.0   | 71.0   | 72.0   | 72.0   | 72.0   | 72.0   | 71.17   |
     +-------------+---------+--------+--------+--------+--------+--------+--------+---------+

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection