Waking up at 3 AM because of an outage is already tough. There’s a lot of pressure to find and fix issues quickly, but what if you also have to deal with complex log data that has hundreds of fields? In these scenarios, writing efficient diagnostic queries is a lot to ask, even for experienced operations engineers. You’re sleep-deprived, working as fast as possible, and trying to make sense of a complex schema.
This is the problem we set out to solve when we created a Claude skill to help SREs and other practitioners better understand their CDN logs. The goal: let users ask questions of their full-fidelity CDN data in plain English and get correct SQL, accurate results, and helpful visualizations without needing to know schema details, query optimization patterns, or domain-specific knowledge about Hydrolix.
Here’s how we built it, why it works, and what we learned along the way.
Before we get started, here’s a real-world use case for this Claude skill. The prompt:
Review CDN performance for the past 12 hours based on the akamai.logs table. Create visualizations to highlight any issues I need to address.
And here’s the response. The following visualization shows hourly bandwidth and key findings.

At a glance, a practitioner can scan these key findings to understand critical information about cache hit ratio, which resources can be cached and optimized, and which services are having problems. It also provides a helpful dashboard with metrics, as the next image shows. The metrics include total requests, cache hit ratio, error rate, and P99 latency—all metrics that practitioners need for understanding how their CDN infrastructure is performing.

The Real Challenge Isn’t Writing SQL—It’s Writing the Right SQL
TrafficPeak customers have petabytes of CDN logs stored in Hydrolix. A single table can contain billions of rows with hundreds of fields covering everything from cache behavior to compression ratios to streaming quality metrics. Maximizing insights from this data traditionally requires specialized knowledge, including:
- Schema expertise: Understanding which of 100+ fields answers your question. For example, do you need `cacheStatus`, `cacheRefreshSrc`, or `cacheTTL`?
- Summary table mechanics: Knowing that `cnt_all` is an ALIAS column that expands to `countMerge(count())` and should never be wrapped in `sum()` or `count()`.
- Performance optimization: Writing queries that include time filters and appropriate LIMIT clauses so they complete in seconds rather than timing out.
- Table selection logic: Deciding when to use detailed raw logs versus a summary table.
Skills solve this problem by giving the AI the specific instructions it needs to answer your questions.
What Is a Skill? Encoding Expert Knowledge for AI
A skill is a knowledge package that guides an AI through a specific domain. Think of it as encoding the expertise of your best engineer into a format that an AI can follow consistently, every time.
The CDN Analytics Skill has three core components:
1. Orchestration logic: A 10-step workflow that guides Claude from understanding a user’s question through query generation, execution, visualization, and error recovery.
2. Comprehensive schema documentation: Detailed information on tables and fields.
3. Query template library: 30+ proven query patterns across 10 analysis categories, showing both correct usage and common mistakes to avoid.
The skill integrates with Hydrolix through MCP (Model Context Protocol), which connects Claude directly to the query engine. Before generating any SQL, Claude reads the relevant reference documentation. This isn’t optional. The workflow requires it. This simple requirement of reading before writing the query prevents most of the common mistakes we saw in early testing.
The 10-Step Workflow: How Questions Become Answers
Every user question goes through the same systematic process. Let’s walk through it.
Steps 1-2: Understanding and Clarifying
When you ask “what’s my cache hit ratio?”, the skill first categorizes your question. Is this cache performance analysis? Error analysis? Cost attribution? There are 10 major categories, and classification determines which schema documentation and query templates Claude will reference.
Next comes clarification. The skill asks targeted questions to fill in gaps such as:
- Time range: Should the range be “last 24 hours” (default), “last week”, or “January 15-20”?
- Scope: Do you need specific hostnames, geographic regions, or all traffic?
- Grouping: Should the data be grouped by time buckets, by country, by URL, or should everything be aggregated?
This upfront clarification prevents the back-and-forth of “I need more information” that makes AI interactions frustrating. Claude asks everything it needs in one go, then proceeds.
Step 3: Reading the Schema (The Critical Step)
Before generating SQL, Claude loads detailed documentation for the relevant tables. This documentation includes far more than column names and data types. It explains when to use each field, what values mean, and how fields relate to each other.
Consider a 1-minute summary table for CDN logs. It may contain billions of rows of 1-minute pre-aggregated rollups. This can be the workhorse table for most analytics queries. But using it correctly requires understanding how summary tables work in Hydrolix.
Summary tables store intermediate aggregation states as AggregateFunction types. This enables “aggregates of aggregates.” You can roll up 1-minute data to hourly or daily data without losing accuracy. The table has two types of columns:
- Raw AggregateFunction columns like `count()`, `sum(totalBytes)`, `uniqMerge(cliIP)`. These store binary states, and you cannot SELECT them directly.
- ALIAS columns like `cnt_all`, `sum_totalBytes`, `pct_offLoadHit`. These apply -Merge combinators to produce final values.
Here’s the critical insight that took us several iterations to encode properly: ALIAS columns are already aggregate functions. When you SELECT cnt_all, it expands to countMerge(count()). This automatically aggregates within each GROUP BY group.
The common mistake, one we saw repeatedly in early testing, is wrapping ALIAS columns in sum() or count():
```sql
-- WRONG: Nests aggregate functions, produces errors or wrong results
SELECT
    toStartOfHour(datetime) AS hour,
    sum(cnt_all) AS total_requests
FROM akamai.summary
GROUP BY hour

-- RIGHT: ALIAS column already performs aggregation
SELECT
    toStartOfHour(datetime) AS hour,
    cnt_all AS total_requests
FROM akamai.summary
GROUP BY hour
```

The skill documentation includes sections explicitly titled “WRONG” and “RIGHT” with these examples. When Claude reads this documentation before generating SQL, it avoids the mistake.
Step 4: Table Selection Logic
With multiple tables available, choosing the right one matters. The skill encodes a decision tree to choose between raw and multiple summary tables at different time granularities.
The performance difference is significant. A 7-day cache hit ratio analysis that scans 160 billion rows of raw log data is terribly inefficient. The same query against the right summary table scans roughly 10 million rows (7 days × 1,440 minutes per day × ~1,000 unique dimension combinations) and completes in under 2 seconds, while returning the same answer.
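To make the contrast concrete, here’s a hedged sketch of both approaches. The table names follow the `akamai.logs` and `akamai.summary` examples used elsewhere in this post; the `cacheStatus` value and the `pct_offLoadHit` ALIAS column are illustrative, not a definitive TrafficPeak schema.

```sql
-- Raw logs: scans every request row in the window (slow at billions of rows)
SELECT countIf(cacheStatus = 'TCP_HIT') / count() AS hit_ratio
FROM akamai.logs
WHERE datetime >= now() - INTERVAL 7 DAY;

-- Summary table: 1-minute rollups. pct_offLoadHit is an ALIAS column that
-- already applies the -Merge combinator, so no aggregate wrapper is needed.
SELECT pct_offLoadHit AS hit_ratio
FROM akamai.summary
WHERE datetime >= now() - INTERVAL 7 DAY;
```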
Step 5: Circuit Breakers as Performance Guards
Every query must include at least one performance guard:
- Time filter on the primary key (typically the `datetime` field)
- LIMIT clause
- Both (recommended for exploratory queries)
The skill enforces hard limits:
- Maximum time range: 30 days
- Query timeout: 120 seconds (enforced by Hydrolix)
When a user asks for something that exceeds these limits, such as “show me cache hit ratio for all of 2024,” Claude explains the constraints and offers alternatives such as multiple queries for different months, coarser time granularity (daily instead of minute-by-minute), or focusing on specific time windows of interest.
This prevents wasting time and resources on inefficient queries.
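As a sketch, a query that satisfies the circuit breakers might look like the following. Field names beyond `datetime` are illustrative, not the actual schema:

```sql
-- Both guards: a primary-key time filter plus a LIMIT
-- (the recommended combination for exploratory queries)
SELECT datetime, reqHost, statusCode
FROM akamai.logs
WHERE datetime >= now() - INTERVAL 1 HOUR  -- time filter on the primary key
ORDER BY datetime DESC
LIMIT 100;                                 -- caps the result set
```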
Step 6: Query Generation From Templates
The skill includes 30+ query templates across 10 categories that encode best practices.
Let’s look at an example: cache performance analysis. The template library includes:
- Cache hit ratio over time with configurable time buckets
- Byte hit ratio by hostname
- Cache status distribution by geographic region
- Comparison of edge hits versus origin hits
Each template shows the correct pattern and explains when to use it. When a user asks “what’s my cache hit ratio by country?”, Claude doesn’t generate SQL from scratch. It adapts the “cache hit ratio by geographic dimension” template, substituting the appropriate grouping field and time range.
This approach dramatically reduces errors. Instead of solving the general problem of generating arbitrary SQL, we’re solving the specific problem of adapting a proven template for the user’s requirements.
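For a sense of what adapting a template looks like, here’s a hedged sketch of a “cache hit ratio by geographic dimension” pattern. The `country` field is illustrative; `cnt_all` and `pct_offLoadHit` follow the summary-table ALIAS conventions described above, but this is not the actual template library.

```sql
-- Template: cache hit ratio by {dimension} over {time_range}
-- Claude substitutes the grouping field and window for each question
SELECT
    country AS dimension,         -- substituted: could be hostname, URL, etc.
    pct_offLoadHit AS hit_ratio,  -- ALIAS column: no sum()/count() wrapper
    cnt_all AS requests           -- ALIAS column: expands to countMerge(count())
FROM akamai.summary
WHERE datetime >= now() - INTERVAL 24 HOUR  -- circuit breaker: time filter
GROUP BY country
ORDER BY requests DESC
LIMIT 15;                                   -- keeps the resulting chart readable
```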
Steps 7-8: Execution and Visualization
Claude generates a query and runs it through the MCP integration with Hydrolix. Results return with full metadata, including row counts, execution time, and data types.
But raw query results still need to be translated into insights. The skill decides whether visualization would help. For time-series data, Claude creates interactive line or area charts. For category comparisons, bar charts. For proportions, pie charts.
These aren’t static images. They’re React components using Recharts, which are fully interactive with hover tooltips and formatted axis labels. When showing bytes transferred, values appear as “2.3 TB” not “2534231134208”. When showing cache hit ratios, they’re percentages with one decimal place, not raw floats with 15 digits.
The skill encodes visualization best practices: maximum 10-15 data points for readability, readable color palettes, clear titles, and formatted values. The user gets a professional-quality chart without specifying any of these details.
Step 9: Presenting Results with Context
Claude doesn’t just display results. It summarizes key findings in plain English: “Your cache hit ratio averaged 87.3% over the last 24 hours, with a dip to 82.1% between 2:00-4:00 AM on January 15.”
It also adds context: “This dip coincided with increased traffic to newly published content, which typically has lower cache hit ratios for the first few hours.”
And it suggests follow-ups: “Want to see cache hit ratio by specific hostname? Or examine which URLs had the most cache misses during that time window?”
This transforms raw data into actionable insights.
Step 10: Error Handling and Recovery
When queries fail, the skill includes a recovery workflow:
1. Check the error message against known patterns.
2. Apply the appropriate fix:
- Wrong table selection → switch to the correct table.
- Summary table ALIAS mistakes → fix the column usage.
- Timeout → add stricter time filters or reduce date range.
- Missing fields → substitute available alternatives.
3. Explain what went wrong and what was changed.
4. Retry automatically.
If the retry fails, Claude explains the limitation and suggests alternatives.
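Here’s a hedged sketch of the timeout branch of that workflow, reusing the illustrative `akamai` tables from the earlier examples:

```sql
-- Original query (timed out): hourly request counts from raw logs over 30 days
SELECT toStartOfHour(datetime) AS hour, count() AS requests
FROM akamai.logs
WHERE datetime >= now() - INTERVAL 30 DAY
GROUP BY hour;

-- Automatic retry: the same question against the summary table, where cnt_all
-- (an ALIAS column) already performs the per-group aggregation
SELECT toStartOfHour(datetime) AS hour, cnt_all AS requests
FROM akamai.summary
WHERE datetime >= now() - INTERVAL 30 DAY
GROUP BY hour;
```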
Why This Approach Works
You might be thinking: couldn’t any AI with access to the schema generate these queries? We thought so too, initially. Then we tested it.
Without the skill, Claude (and other LLMs) make predictable mistakes:
- Wrapping ALIAS columns in aggregate functions
- Querying wrong tables for multi-day aggregations
- Omitting time filters, leading to timeouts
- Using fields that don’t exist or misunderstanding what they represent
- Generating syntactically correct but semantically wrong SQL
The skill solves this by making Claude read authoritative documentation before generating queries. Common pitfalls are explicitly documented as “WRONG” patterns with explanations. Circuit breakers prevent costly mistakes upfront. The example library provides proven starting points. The error handling workflow guides automatic recovery.
The end result: a user asks a plain English question and gets correct table selection, proper ALIAS column usage, appropriate time filters, formatted results with context, optional visualization, and follow-up suggestions, all without needing to know SQL, schema details, or Hydrolix internals.
You can ask questions like:
- “What’s my cache hit ratio for the last week?”
- “Show me error rates by status code for hostname X.”
- “Which URLs consumed the most bandwidth yesterday?”
- “What’s my edge versus origin traffic breakdown by region?”
You’ll get correct SQL, accurate results, and helpful visualizations, all without writing a single line of code. And if you’re curious what SQL was generated, just ask. Claude will show you the query and explain how it works.
That’s the power of encoding expertise. The knowledge is there when you need it, invisible when you don’t, and reliable every single time.
Next Steps
- With CDN Insights, you can start monitoring your CDN infrastructure in minutes. Compare how multiple CDNs are performing at any scale, segment data by region, ASN, and more, then drill down into individual logs for further troubleshooting. You can also keep full-fidelity data long-term (15 months by default) without needing to sample or throw data away.
- If you’re using Akamai, consider TrafficPeak for Akamai, a managed observability solution for Akamai Connected Cloud powered by Hydrolix that can give you deep insights on your CDN infrastructure and more.