Harness Blog: DevOps, CI/CD Insights

AI Coding Security Risks Demand Dependency Firewalls

AI coding tools can introduce vulnerable dependencies fast. Learn how dependency firewalls block risky packages at the registry level. Explore now.

June 22, 2026


AI coding security risks emerge the moment your assistant suggests `npm install suspicious-package` and your team accepts without question. In production environments, AI-generated code recommendations bypass traditional review workflows, introducing vulnerable dependencies at a pace human oversight cannot match. One accepted suggestion can pull in dozens of transitive dependencies, each a potential supply chain entry point.

This is not about slowing developers down. It is about giving platform and security teams a scalable control point while developers keep moving fast.


How AI Assistants Accelerate Dependency Risk

AI coding tools operate on pattern recognition trained across millions of public repositories. When a developer asks for authentication logic, the assistant suggests popular packages based on usage frequency, not security posture. Unless it is explicitly connected to security tooling, the assistant has no visibility into CVE databases, package maintainer history, or recent compromise patterns. It recommends what worked statistically, not what remains safe operationally.

This creates volume problems traditional security gates cannot address. A team of ten engineers using AI assistance can introduce 50 new external dependencies per sprint. Manual security review of each package, its maintainers, and its transitive tree becomes a bottleneck that development velocity simply routes around. The dependencies enter `package.json`, pass CI checks that only verify build success, and deploy to production before anyone evaluates supply chain risk.
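The sketch below makes the volume problem concrete. It shows how a single direct install expands into its full transitive closure. The package names and graph are hypothetical. A real tool would read the resolved tree from a lockfile such as `package-lock.json` instead of a hand-written dict.

```python
from collections import deque


def transitive_dependencies(graph: dict[str, list[str]], root: str) -> set[str]:
    """Breadth-first walk returning every package reachable from `root` (root excluded)."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen or pkg == root:  # skip repeats and cycles back to the root
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


# Hypothetical graph: one accepted suggestion ("auth-helper") fans out to six packages.
graph = {
    "auth-helper": ["jwt-lib", "http-client"],
    "jwt-lib": ["crypto-utils", "base64-url"],
    "http-client": ["retry", "crypto-utils"],
    "retry": ["timers"],
}
print(sorted(transitive_dependencies(graph, "auth-helper")))
# ['base64-url', 'crypto-utils', 'http-client', 'jwt-lib', 'retry', 'timers']
```

Every package in that set is a separate maintainer, release history, and potential entry point to review. This is why per-package manual review falls behind.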


The npm Supply Chain Attack Surface

Recent npm ecosystem compromises demonstrate how attackers exploit this acceleration. Package maintainers get compromised through credential theft or social engineering. Attackers publish malicious updates to widely-used packages. AI assistants continue recommending these packages based on historical popularity metrics. Development teams install them automatically as part of normal workflow.

The May 2026 TanStack supply chain attack illustrates the current threat landscape. Attackers published malicious npm packages impersonating TanStack libraries, targeting developer credentials, secrets, and CI/CD-related access tokens. Because TanStack packages are widely used across React ecosystems, AI coding assistants readily suggested these typosquatted variants based on name similarity and perceived popularity. Teams relying on AI suggestions without registry-level controls had no automated way to prevent these packages from entering their dependency trees. The attack specifically exploited the trust developers place in AI-generated recommendations, harvesting credentials that could enable deeper supply chain compromise.

The older `event-stream` incident in 2018 followed a similar pattern. A new maintainer of a legitimate package with millions of weekly downloads added a malicious dependency that targeted cryptocurrency wallet credentials. The compromise went undetected for weeks because the package kept its reputation and continued to be installed as a trusted dependency. An assistant ranking packages by popularity would have kept suggesting it too.


Why Traditional Security Checks Miss AI-Introduced Dependencies

Standard vulnerability scanning happens too late in the development lifecycle. Most teams run security checks after code reaches staging or pre-production environments. By this point, vulnerable dependencies have already integrated into application logic, created transitive dependency chains, and potentially exposed sensitive data in development environments.

SCA Tells You What Risk Exists. Dependency Firewalls Stop It From Entering.

This is the critical distinction. Software Composition Analysis (SCA) tools scan your codebase and report which vulnerable packages you already have. They are reactive: they tell you about risk after it exists in your environment. Dependency firewalls are preventive: they stop risky packages at the registry boundary before they ever reach your codebase, builds, or pipelines.

SCA scanning remains valuable for ongoing visibility. But when AI coding tools introduce dependencies at high velocity, you need a control that operates before installation, not after. The registry boundary is that control point.

CVE-based vulnerability scanners detect known CVEs but miss zero-day vulnerabilities and recently compromised packages. The gap between package compromise and CVE publication creates a window where AI assistants continue recommending dangerous dependencies while security databases report clean status. Teams operating on daily or weekly vulnerability scan schedules remain exposed to supply chain attacks that evolve hourly.

License compliance presents another blind spot. AI coding tools suggest packages based on functionality, not licensing terms. A developer receives a suggestion for an AGPL-licensed package when building a proprietary commercial application. The licensing conflict only surfaces months later during audit preparation, requiring expensive refactoring or license negotiation.

The Speed-Versus-Security Tension

Development teams adopt AI assistance specifically for velocity gains. Asking developers to manually verify every AI-suggested dependency contradicts the efficiency goal that justified the AI tool investment. This creates a cultural pressure where security verification becomes the exception rather than the rule.


Why the Registry Boundary Is the Natural Control Point

The upstream proxy is where external dependencies enter your organization. When it is configured as the package source for developers and CI, every npm install, pip install, or Maven dependency resolution that would otherwise reach a public registry passes through this layer. This makes it the natural enforcement point for open-source governance.

Think of it the same way you think about network firewalls. You do not let arbitrary external traffic into your internal network without evaluation. The same logic applies to software packages. The upstream proxy fetches and caches artifacts from external registries, so placing policy evaluation at this layer means every external package gets assessed before it becomes available to any developer, any build, or any pipeline in your organization.

A dependency firewall at the registry boundary evaluates external packages before they enter your organization's artifact ecosystem. Rather than scanning for vulnerabilities after installation, it blocks retrieval of packages that fail security, compliance, or policy checks. When a developer or AI assistant attempts to install a package, the request routes through the firewall, which evaluates:

- Known vulnerabilities: CVSS severity scores checked against your defined threshold.

- Package age and stability: Newly published packages can be flagged or blocked.

- License compatibility: Alignment with organizational compliance requirements.

- Custom policy rules: Organization-specific policies written in Rego for nuanced control.

- Transitive dependency risk: Security posture of the complete dependency tree.

Packages that fail evaluation are not cached and are not available for download. Developers receive immediate feedback about why the package was blocked. This shifts security decisions from post-integration remediation to pre-installation prevention.


How Harness Artifact Registry Implements Dependency Governance

Harness Artifact Registry implements dependency firewall capabilities as part of its upstream proxy architecture. When configured as your primary package source, it evaluates external dependencies against configurable policies before caching them for internal use.

The Flow

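In practice, every package version requested through the upstream proxy is evaluated against policy and ends up in one of three states: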


- **Blocked** versions are not cached and are not available for download. The install fails with a clear policy violation message.

- **Warning** versions are cached and available for download, but flagged for visibility. Teams can review warnings on the dashboard and decide whether to tighten policy.

- **Passed** versions are cached normally and available without restriction.
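The three outcomes can be sketched as a caching proxy in a few lines of Python. This illustrates the behavior under stated assumptions and is not how Harness Artifact Registry is implemented. `evaluate` and `download` are hypothetical hooks that stand in for the policy engine and the public registry, and the toy policy is invented for the example.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    PASSED = "passed"
    WARNING = "warning"
    BLOCKED = "blocked"


@dataclass
class UpstreamProxy:
    # Hypothetical hooks: `evaluate` stands in for the policy engine,
    # `download` for the call to the public registry.
    evaluate: Callable[[str, str], tuple[Verdict, str]]
    download: Callable[[str, str], bytes]
    cache: dict[tuple[str, str], bytes] = field(default_factory=dict)
    warnings: list[tuple[str, str, str]] = field(default_factory=list)

    def fetch(self, name: str, version: str) -> bytes:
        key = (name, version)
        if key in self.cache:  # already evaluated and cached
            return self.cache[key]
        verdict, reason = self.evaluate(name, version)
        if verdict is Verdict.BLOCKED:
            # Blocked: never cached, so the install fails with the policy reason.
            raise PermissionError(f"{name}@{version} blocked by policy: {reason}")
        artifact = self.download(name, version)
        self.cache[key] = artifact  # Passed and Warning versions are both cached
        if verdict is Verdict.WARNING:
            self.warnings.append((name, version, reason))  # surfaced for review
        return artifact


def toy_policy(name: str, version: str) -> tuple[Verdict, str]:
    if name == "evil-pkg":
        return Verdict.BLOCKED, "known malicious release"
    if version.startswith("0."):
        return Verdict.WARNING, "pre-1.0 release"
    return Verdict.PASSED, ""


proxy = UpstreamProxy(evaluate=toy_policy, download=lambda n, v: f"{n}-{v}.tgz".encode())
proxy.fetch("left-pad", "1.3.0")
proxy.fetch("tiny-lib", "0.2.1")
try:
    proxy.fetch("evil-pkg", "1.0.0")
except PermissionError as err:
    print(err)  # evil-pkg@1.0.0 blocked by policy: known malicious release
```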

Policy Configuration

The platform supports configurable policy sets that you apply to your upstream proxy registries:

- CVSS severity threshold: Block packages with known vulnerabilities at or above your defined severity level (e.g., block all Critical and High CVEs).

- License policy: Block or warn on packages with specific license types (GPL, AGPL, or any license incompatible with your distribution model).

- Package age policy: Flag or block packages published within a configurable window (e.g., packages less than 30 days old receive a warning).

- Custom Rego policies: Write organization-specific rules for nuanced evaluation logic that goes beyond built-in checks.
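The sketch below shows one way such a policy set could combine into a single verdict, using the example values above. The data model, the SPDX license identifiers placed in each list, and the rule that any block reason outranks warnings are all assumptions for illustration. In Harness AR these rules are configured as policy sets (or Rego), not written in Python.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PackageVersion:
    name: str
    version: str
    published_at: datetime
    license: str     # SPDX identifier
    max_cvss: float  # highest CVSS base score among known CVEs, 0.0 if none


# Hypothetical policy values mirroring the examples above.
CVSS_BLOCK_THRESHOLD = 7.0  # CVSS v3.x rates 7.0-8.9 High and 9.0-10.0 Critical
BLOCKED_LICENSES = {"AGPL-3.0-only", "AGPL-3.0-or-later"}
WARNED_LICENSES = {"GPL-3.0-only", "GPL-3.0-or-later"}
MIN_AGE = timedelta(days=30)


def evaluate(pkg: PackageVersion, now: datetime) -> tuple[str, list[str]]:
    """Return ("blocked" | "warning" | "passed", reasons)."""
    block: list[str] = []
    warn: list[str] = []
    if pkg.max_cvss >= CVSS_BLOCK_THRESHOLD:
        block.append(f"CVSS {pkg.max_cvss} >= {CVSS_BLOCK_THRESHOLD}")
    if pkg.license in BLOCKED_LICENSES:
        block.append(f"license {pkg.license} is not allowed")
    elif pkg.license in WARNED_LICENSES:
        warn.append(f"license {pkg.license} needs review")
    if now - pkg.published_at < MIN_AGE:
        warn.append(f"published less than {MIN_AGE.days} days ago")
    if block:
        return "blocked", block
    if warn:
        return "warning", warn
    return "passed", []


now = datetime(2026, 6, 22, tzinfo=timezone.utc)
fresh = PackageVersion("new-lib", "1.0.0", now - timedelta(days=3), "MIT", 0.0)
print(evaluate(fresh, now))  # ('warning', ['published less than 30 days ago'])
```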

Policy sets group related rules and can be applied across multiple registries. This means security rules defined once apply consistently, whether developers are pulling JavaScript packages, Java libraries, or container images. Beyond policy enforcement, Harness AR handles the full artifact lifecycle, from ingestion to deployment, as part of a unified artifact management strategy.

Visibility and Governance

The Dependency Firewall dashboard provides visibility into policy evaluations, showing which packages were blocked, which received warnings, and which passed. This supports both incident response and continuous policy refinement based on actual development patterns.

Integration with CI/CD pipelines ensures build environments use the same controlled package sources as local development. A package that passes firewall evaluation in development remains available in CI without re-fetching from public registries. This consistency eliminates scenarios where local and build environments reference different package versions or bypass controls.

For organizations managing multiple package formats (npm, Maven, Docker, Helm), Harness AR provides unified policy management across registries, reducing policy management overhead while maintaining comprehensive supply chain governance.

Operational Implementation

Initial firewall deployment involves cataloging currently used dependencies and establishing baseline policies. Most organizations start with blocking known high-severity CVEs and expand policy coverage incrementally. This prevents disrupting existing workflows while building the approved package catalog that development teams and AI tools can safely reference.

Learn more about implementing these controls with Harness Artifact Registry.


Building Sustainable AI-Assisted Development Workflows

Dependency firewalls enable rather than restrict AI coding tool adoption. When developers know that package suggestions route through security evaluation, they can accept AI recommendations with appropriate confidence. The firewall handles security verification automatically, removing the burden of manual package vetting from development workflows.

This creates a sustainable balance between velocity and governance. AI assistants continue accelerating development by suggesting relevant packages. The dependency firewall ensures those suggestions meet organizational security standards before integration. Development teams focus on building features while platform teams maintain supply chain integrity through policy rather than through manual review queues.

Dependency firewalls also speed up incident response when a supply chain compromise occurs. Instead of searching codebases for usage of a newly compromised package, teams can use firewall logs to identify which projects requested it and whether each request was served or blocked. Remediation becomes targeted rather than organization-wide.
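Suppose the firewall records one log entry per request, with project, package, version, and verdict. That schema is hypothetical. Under that assumption, the triage query above reduces to a simple filter. Projects that were served a compromised version need remediation. Projects whose requests were blocked only need confirmation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallLogEntry:
    project: str
    package: str
    version: str
    verdict: str  # "passed", "warning", or "blocked"


def triage(logs: list[FirewallLogEntry], package: str, bad_versions: set[str]) -> tuple[set[str], set[str]]:
    """Split projects that requested a compromised version into (served, blocked)."""
    served: set[str] = set()
    blocked: set[str] = set()
    for entry in logs:
        if entry.package != package or entry.version not in bad_versions:
            continue
        if entry.verdict == "blocked":
            blocked.add(entry.project)
        else:
            served.add(entry.project)  # passed or warning: the artifact was delivered
    return served, blocked


logs = [
    FirewallLogEntry("checkout-web", "some-lib", "4.2.1", "passed"),
    FirewallLogEntry("payments-api", "some-lib", "4.2.2", "blocked"),
    FirewallLogEntry("search-ui", "some-lib", "4.1.0", "passed"),
]
print(triage(logs, "some-lib", {"4.2.1", "4.2.2"}))
# ({'checkout-web'}, {'payments-api'})
```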

The investment in dependency firewall infrastructure pays forward as AI coding tools become more capable. Future assistants generating entire microservices will introduce even more dependencies even faster. The control point established now scales to handle that acceleration without requiring fundamental workflow changes or security architecture redesign.

If AI is accelerating how your teams write code, Harness Artifact Registry helps ensure the dependencies entering that code are governed before they reach builds, pipelines, or production. The registry boundary is where supply chain security starts.

Ship From Where You Build: Harness Delivery Intelligence, Now Inside Antigravity

Connect Harness MCP Server inside Google Antigravity to let AI agents inspect pipelines, debug deployments, trigger approved runs, and act on real-time delivery context with RBAC, audit logs, and human-in-the-loop controls.

June 22, 2026


Key takeaway: The Harness MCP Server now connects directly inside Google Antigravity. Developers can link Harness in under two minutes and give the agent structured, real-time access to their pipelines, execution history, services, environments, and policies, without leaving the editor. What makes it reliable isn't the connection itself. It's the Harness Software Delivery Knowledge Graph underneath, which gives the agent the context to act accurately, fast, and within your guardrails.


AI has made the inner loop faster than ever. Inside Antigravity, you can write, refactor, and test code in seconds. But the moment a change needs to be built, deployed, or debugged in production, you leave the editor entirely, back to juggling pipelines, approvals, scan results, and failed runs across a half-dozen browser tabs. That gap between fast code and slow delivery is the part AI hasn't fixed yet.

The Harness MCP integration closes that gap. Connect once, and Antigravity gains direct access to your Harness delivery environment. The agent now understands your delivery system the same way it understands your codebase. So you can ask it to list pipelines, explain a failure, or trigger a deployment, and it acts on live Harness context instead of generic knowledge.

Connect Antigravity to Harness in a Few Steps

There's no YAML to write and no manual server config. You generate a Personal Access Token in Harness, open Antigravity's Customizations panel, add the Harness MCP server, and paste the token. That's the entire setup.

1. Generate a PAT. In Harness, go to Account Settings → Personal Access Tokens and scope the token to the org, project, and pipelines you want the agent to reach.

2. Open Customizations. In Antigravity, go to Settings → Customizations to configure default behaviors, skills, and MCP server connections.

3. Add the Harness MCP server. Click + MCP Servers, search Harness, select it, and paste your PAT.

4. Start building. The agent now operates with your Harness account, org, and project context. Describe what you need and it acts on real pipeline and execution data.

*Settings → Customizations → + MCP Servers: search “Harness,” connect, done.*

All The Software Delivery Use Cases Within Antigravity

Once Harness is connected, you interact with your delivery system the same way you interact with your code, in plain language, from the same window.

| Use case | How it works with Harness MCP |
| --- | --- |
| Create a pipeline from code | Describe your service and target. The agent reads your existing templates, connectors, and services in Harness to generate a YAML-valid pipeline grounded in your actual account configuration. |
| Inspect pipelines & executions | Ask what exists and what ran. The agent lists pipelines by org and project, and pulls execution history with status, duration, and run IDs. |
| Debug a failed deployment | Ask why a run failed. The agent queries the execution context, isolates the failing step, surfaces the root cause, and recommends a fix, without leaving the editor. |
| Trigger a deployment with approval | Tell the agent to run a pipeline. It shows the execution details and asks for confirmation before triggering. You approve in chat, and the run is logged with a full audit trail. |

Ask it what's in your delivery environment

Start simple: "Can you list pipelines in <Your Project> project in the default org in my Harness account?" The agent resolves the project identifier, pages through every pipeline, and returns a structured report (names, identifiers, creation times, descriptions, and tags) with links straight back to the Harness UI.

*All pipelines, resolved and rendered with live Harness links.*


Drill into execution history

From there, ask about recent activity: "List out my recent executions in this project." The agent reads the execution history, converts raw timestamps and durations into something readable, and lays out every run, including the one that came back ApprovalRejected, so you can see exactly what happened and when.

*Execution history with status, duration, and IDs, the context the agent reasons over to explain a failure.*


Triggering a deployment, with a human in the loop

This is where most "AI in delivery" stories get nervous, and where the design matters most. When you ask the agent to run something, it doesn't just act. It shows you the exact tool it wants to call and the arguments it will send, then waits for your approval.

*Every tool call surfaces its arguments and pauses for explicit approval; nothing runs silently.*


Approve it, and the run goes through. The agent triggers the pipeline using the Harness run action and returns the live execution (pipeline ID, status, trigger type, and a link to the execution in Harness), so you can follow it from the same chat.

*Confirmed and triggered: the run is live in Harness, with a full execution record and audit trail.*
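The confirm-before-act pattern is simple to express in code. The sketch below is a generic illustration, not the Harness MCP Server or Antigravity implementation. `call_with_approval`, the `run_pipeline` tool name, and its arguments are all hypothetical. The sketch shows the three properties described above: the exact call is previewed, nothing runs without approval, and the decision is recorded either way.

```python
import json
from collections.abc import Callable
from typing import Any


def call_with_approval(
    tool_name: str,
    args: dict[str, Any],
    tool: Callable[..., Any],
    approve: Callable[[str], bool],
    audit_log: list[dict[str, Any]],
) -> Any:
    """Show the exact call, wait for a human decision, and record it either way."""
    preview = f"{tool_name}({json.dumps(args, sort_keys=True)})"
    approved = approve(preview)  # e.g. a confirm prompt in the chat UI
    entry: dict[str, Any] = {"tool": tool_name, "args": args, "approved": approved}
    audit_log.append(entry)
    if not approved:
        raise PermissionError(f"rejected by user: {preview}")
    entry["result"] = tool(**args)
    return entry["result"]


# Hypothetical tool standing in for a pipeline trigger.
run_pipeline = lambda pipeline_id, branch: {"pipeline_id": pipeline_id, "status": "Running"}
audit: list[dict[str, Any]] = []
result = call_with_approval(
    "run_pipeline",
    {"pipeline_id": "build_and_deploy", "branch": "main"},
    run_pipeline,
    approve=lambda preview: True,  # a real UI would show `preview` and wait
    audit_log=audit,
)
```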


This Is Not AI Without Guardrails

The natural question, once an agent can trigger pipelines, is what stops it from doing something it shouldn't. The answer: the same controls that govern everything else in Harness.

| Trust dimension | How it works |
| --- | --- |
| RBAC enforcement | Every MCP tool call runs within the permissions of the authenticated user's PAT. The agent operates with exactly your access, no privilege escalation. |
| Human in the loop | Execution triggers, policy changes, and production actions require explicit confirmation in chat before they proceed. |
| Audit logging | All tool calls, approvals, and outputs are logged. Full traceability for compliance and security teams. |
| Zero data training | Your pipeline definitions, execution logs, and org context are never used to train AI models. |

Why Context Beats Raw API Access

MCP lets a model call external tools by reading API descriptions and deciding which to invoke. That flexibility is useful, but when an agent needs to reason across an entire delivery lifecycle (CI, CD, security scans, approvals, environments, cost signals), raw API access creates a reliability problem. The agent has to discover which endpoints exist, call them in the right order, paginate correctly, and infer how fields relate across systems. Every inferred join is a place to guess. Guessing is where hallucinations happen.
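Consider what "paginate correctly" demands of an agent that works only from raw endpoints. Take a hypothetical list API that accepts a zero-based page number and a page size, and returns the items plus a total count. This contract is an assumption, not the Harness API. A single call silently truncates the results. The loop has to keep requesting pages and must also know when to stop.

```python
from collections.abc import Callable
from typing import Any

Page = tuple[list[dict[str, Any]], int]  # (items on this page, total item count)


def fetch_all(fetch_page: Callable[[int, int], Page], page_size: int = 50) -> list[dict[str, Any]]:
    """Collect every item from a zero-based, page-numbered list API."""
    items: list[dict[str, Any]] = []
    page = 0
    while True:
        batch, total = fetch_page(page, page_size)
        items.extend(batch)
        # Stop on an empty page as well as at the reported total, so a total
        # that changes mid-listing cannot cause an endless loop.
        if not batch or len(items) >= total:
            return items
        page += 1


# Fake endpoint with 123 pipelines: a naive single call would return only 50.
pipelines = [{"identifier": f"pipeline_{i}"} for i in range(123)]
fake = lambda page, size: (pipelines[page * size:(page + 1) * size], len(pipelines))
print(len(fetch_all(fake)))  # 123
```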

The Harness Software Delivery Knowledge Graph removes the guesswork. It's a purpose-built model of everything that happens after code is written (builds, test runs, deployments, approvals, scans, environment states, feature flags, infrastructure changes, cost signals, and rollbacks) represented as a connected, typed, semantically annotated graph. Every field carries metadata telling the agent how to use it, and relationships between entities are explicitly declared, not inferred.

This is the difference between an agent that can access your delivery system and one that understands it.

When Antigravity connects to Harness via MCP, it isn't handed a list of endpoints. It gets a structured model of your delivery organization, where relationships are known, data types are enforced, and the agent can construct precise queries rather than guessing at field semantics. The same controls apply structurally, too: an approval gate isn't an optional step the agent might skip; it's a typed relationship with state. The agent can't promote past a gate that hasn't cleared, because the graph reflects that clearly. Speed and governance aren't a tradeoff; they coexist by design.
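Here is a rough sketch of what it means for a gate to be "a typed relationship with state": promotion checks the gate's state itself, instead of trusting the caller to remember to check. The classes are hypothetical and far simpler than the Knowledge Graph's actual schema, which this post does not describe.

```python
from dataclasses import dataclass, field
from enum import Enum


class GateState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalGate:
    name: str
    state: GateState = GateState.PENDING


@dataclass
class Stage:
    name: str
    gates: list[ApprovalGate] = field(default_factory=list)

    def blocking_gates(self) -> list[str]:
        """Names of gates that have not been approved yet."""
        return [g.name for g in self.gates if g.state is not GateState.APPROVED]


def promote(stage: Stage) -> str:
    # Promotion is decided by gate state, not by whether a caller remembered to check.
    blocking = stage.blocking_gates()
    if blocking:
        raise RuntimeError(f"cannot promote {stage.name}: waiting on {blocking}")
    return f"{stage.name} promoted"


prod = Stage("prod", [ApprovalGate("change-approval")])
try:
    promote(prod)
except RuntimeError as err:
    print(err)  # cannot promote prod: waiting on ['change-approval']
prod.gates[0].state = GateState.APPROVED
print(promote(prod))  # prod promoted
```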

Software Delivery Context At Your Fingertips

If you're already a Harness customer, you're a couple of minutes away from having software delivery control inside Antigravity. New to Harness? Sign up for free and connect from day one. For enterprise onboarding and design-partner access, contact your Harness account team.

The Harness connection gives the agent the ability to act in your delivery system. The Knowledge Graph gives it the understanding to act well. Together, that's what reliable AI in software delivery actually looks like, now available wherever you build, including inside Antigravity.


Real-Time CPU and Memory Insights for Harness CI Cloud Builds

Get real-time CPU, memory, and disk I/O insights for Harness CI Cloud builds. Right-size machines, debug OOMs faster, detect regressions, and optimize CI performance with zero setup.

June 17, 2026


When a CI pipeline runs on cloud infrastructure, the build machine is ephemeral. It spins up, executes your build, and disappears. During that window, you have zero visibility into how much CPU and memory your pipeline actually consumes.

This blind spot creates real problems. Teams over-provision VMs "just in case," wasting compute spend. Others under-provision and deal with silent OOM-kills or CPU throttling — the only clue being a cryptic exit code 137. Without historical resource profiles, there's no data-driven way to right-size pipelines or catch regressions introduced by dependency upgrades.

We built CPU and Memory Insights to solve this. It gives you real-time and historical visibility into resource consumption during every Harness CI Cloud build — with zero configuration and zero impact on build performance.

Why Resource Visibility Matters

Consider a typical scenario: your build takes 12 minutes on a Large machine (4 vCPU, 8GB RAM). Is it CPU-bound during compilation? Memory-bound during docker build? Or is it I/O-bound pulling dependencies? Without metrics, you're guessing.

With CPU and Memory Insights, you can:

- Right-size your machines: see that a "Large" build peaks at 30% CPU and safely downgrade to "Medium," cutting your cloud spend.

- Debug failures faster: watch the memory ramp leading to an OOM kill and pinpoint which step caused it.

- Detect regressions: compare P90 CPU across builds to catch when a dependency update made things worse.

How It Works

The system collects resource metrics from inside the ephemeral VM, streams them in real-time to the Harness platform, and renders interactive charts in the execution view.

Architecture

Harness CI Cloud uses a multi-layered architecture for pipeline execution. The metrics flow is overlaid on the same path used for build orchestration.

The key insight: lite-engine is the only component running inside the VM — it's the only one with access to actual resource utilization. But it has no persistent storage. Everything must be streamed out before the VM is destroyed.

Data Collection

When a VM is provisioned for your build, lite-engine starts a background process that samples system metrics every second:

- CPU utilization: aggregate percentage across all cores

- Memory usage: total and available, in GB

- Disk I/O: read and write throughput in bytes/sec

Each sample is written as a single JSON line (NDJSON format) to the Harness Log Service using a dedicated stream key. This is the same battle-tested infrastructure that powers step-level log streaming — we reuse its real-time SSE transport, blob storage, and access control. No new infrastructure needed.
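The post does not say how lite-engine reads these metrics. This sketch assumes one plausible approach on Linux. CPU utilization is derived from two snapshots of the aggregate `cpu` line in `/proc/stat`, and each sample is serialized as a single NDJSON line. The JSON field names are hypothetical.

```python
import json


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate `cpu` line of Linux /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    user, nice, system, idle, iowait, irq, softirq, steal = (fields + [0] * 8)[:8]
    idle_all = idle + iowait
    total = user + nice + system + idle_all + irq + softirq + steal
    return idle_all, total


def cpu_percent(prev: tuple[int, int], curr: tuple[int, int]) -> float:
    """Aggregate utilization across all cores between two snapshots."""
    idle_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def ndjson_line(ts: float, cpu: float, mem_total_gb: float, mem_available_gb: float,
                read_bps: float, write_bps: float) -> str:
    """One sample as a single JSON line (field names are illustrative)."""
    record = {
        "ts": ts,
        "cpu_pct": round(cpu, 2),
        "mem_total_gb": mem_total_gb,
        "mem_available_gb": mem_available_gb,
        "disk_read_bps": read_bps,
        "disk_write_bps": write_bps,
    }
    return json.dumps(record, separators=(",", ":")) + "\n"


prev = parse_cpu_line("cpu  100 0 50 800 50 0 0 0")
curr = parse_cpu_line("cpu  250 0 150 1500 100 0 0 0")
print(cpu_percent(prev, curr))  # 25.0
```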

Real-Time Streaming

The metrics stream opens during VM setup and closes during VM destroy, giving continuous coverage regardless of how many steps run or fail in between. The stream is independent of step execution — there are no gaps between steps.

During execution, the UI connects via Server-Sent Events (SSE) to receive metrics as they're collected. For completed builds, the same data is available from blob storage. The UI handles both transparently — same visualization whether you're watching a live build or reviewing a historical one.

Summary Statistics

When the VM is destroyed, lite-engine computes a final summary before closing the stream:

- Peak CPU: maximum utilization observed

- Average CPU: mean utilization across the entire stage

- P90 CPU: 90th percentile utilization (useful for right-sizing decisions)

- Total Disk I/O: cumulative bytes read and written

The frontend also computes P50, P90, P95, and P99 percentiles client-side, which means you get full statistics even for in-progress executions.
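Here is a minimal sketch of these summary statistics. It assumes one CPU percentage and one read/write byte count per 1-second sample. The post does not say which percentile definition is used. This sketch uses the nearest-rank method.

```python
import math
from statistics import fmean


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(cpu: list[float], read_bytes: list[int], write_bytes: list[int]) -> dict[str, float]:
    """Stage summary; assumes one byte count per 1-second sample interval."""
    return {
        "peak_cpu": max(cpu),
        "avg_cpu": fmean(cpu),
        "p90_cpu": nearest_rank_percentile(cpu, 90),
        "total_disk_io_bytes": sum(read_bytes) + sum(write_bytes),
    }


cpu = [float(v) for v in range(10, 101, 10)]
print(summarize(cpu, [1000] * 10, [500] * 10))
# {'peak_cpu': 100.0, 'avg_cpu': 55.0, 'p90_cpu': 90.0, 'total_disk_io_bytes': 15000}
```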

What You See in the UI

Click the resource indicator button in the execution view (it shows your platform and size, e.g., "Linux (Large)"). A drawer opens with three charts:

CPU Usage

An area chart showing utilization percentage over time, with a P90 reference line. The stats bar shows total cores, peak utilization, average, and percentiles (P50/P90/P95/P99).

Memory Usage

An area chart with dual Y-axes: percentage on the left, GB on the right. Helps you understand both relative and absolute consumption at a glance.

Disk I/O

A line chart showing read and write throughput in MB/s. Useful for identifying I/O-bound steps like image pulls or large file operations.

A stage selector dropdown at the top lets you switch between stages in multi-stage pipelines.

Cross-Platform Support

CPU and Memory Insights works across all Harness Cloud infrastructure:

| Platform | Support |
| --- | --- |
| Linux (x86_64) | Full metrics (CPU, memory, disk I/O) |
| Linux (arm64) | Full metrics |
| macOS (Apple Silicon) | Full metrics |
| Windows | Full metrics |

A normalization layer smooths over platform-specific differences. Whether the underlying OS reports per-core or aggregate CPU, or uses different disk I/O naming conventions, the metrics are always presented consistently: aggregate CPU as a single percentage, memory in GB, and disk throughput as a delta rate.
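Two of the normalizations described above are sketched here in Python, under assumptions. Disk counters that only grow are converted into a rate per interval. Per-core readings are averaged into one aggregate figure. Skipping a sample where the counter goes backwards, on the assumption that it is a counter reset, is a design choice for this sketch and not documented behavior.

```python
def delta_rates(samples: list[tuple[float, int]]) -> list[float]:
    """Turn (timestamp_s, cumulative_bytes) samples into bytes/sec between neighbours."""
    rates: list[float] = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or b1 < b0:  # skip out-of-order samples and counter resets
            continue
        rates.append((b1 - b0) / dt)
    return rates


def aggregate_cpu(per_core_pct: list[float]) -> float:
    """Collapse per-core utilization into one aggregate percentage (equal core weights)."""
    return sum(per_core_pct) / len(per_core_pct)


print(delta_rates([(0, 0), (1, 1000), (2, 3000), (4, 7000)]))  # [1000.0, 2000.0, 2000.0]
print(aggregate_cpu([100.0, 50.0, 0.0, 50.0]))  # 50.0
```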

Performance Impact

Resource collection runs with negligible overhead:

| Metric | Value |
| --- | --- |
| CPU overhead on build VM | < 0.1% |
| Memory footprint | ~2 MB |
| Data generated per hour of build | ~800 KB |
| Sampling interval | 1 second |

For long-running builds, the frontend intelligently downsamples to 120 data points for chart rendering while preserving visual accuracy — peaks and valleys are maintained using the LTTB (Largest-Triangle-Three-Buckets) algorithm.
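For reference, here is a compact Python version of the standard LTTB algorithm. It always keeps the first and last points and divides the rest into equal buckets. From each bucket it keeps the point that forms the largest triangle with two others: the previously kept point and the average of the next bucket. This is a sketch of the published algorithm, not the Harness frontend code.

```python
import math


def lttb(points: list[tuple[float, float]], threshold: int) -> list[tuple[float, float]]:
    """Largest-Triangle-Three-Buckets downsampling of (x, y) points sorted by x."""
    n = len(points)
    if threshold >= n or threshold < 3:
        return list(points)
    sampled = [points[0]]                # always keep the first point
    bucket = (n - 2) / (threshold - 2)   # interior points per bucket
    a = 0                                # index of the last selected point
    for i in range(threshold - 2):
        # The average of the *next* bucket is the third triangle vertex.
        nxt_start = int((i + 1) * bucket) + 1
        nxt_end = min(int((i + 2) * bucket) + 1, n)
        nxt = points[nxt_start:nxt_end]
        avg_x = sum(p[0] for p in nxt) / len(nxt)
        avg_y = sum(p[1] for p in nxt) / len(nxt)
        # Keep the point in the current bucket that forms the largest triangle.
        start = int(i * bucket) + 1
        end = int((i + 1) * bucket) + 1
        ax, ay = points[a]
        best, best_area = start, -1.0
        for j in range(start, end):
            x, y = points[j]
            area = abs((ax - avg_x) * (y - ay) - (ax - x) * (avg_y - ay)) / 2
            if area > best_area:
                best, best_area = j, area
        sampled.append(points[best])
        a = best
    sampled.append(points[-1])           # always keep the last point
    return sampled


# One hour of 1-second samples with a single short spike.
points = [(float(x), math.sin(x / 50)) for x in range(3600)]
points[1800] = (1800.0, 5.0)
small = lttb(points, 120)
print(len(small), (1800.0, 5.0) in small)  # 120 True
```

Because the point kept in each bucket is the one farthest from the line through its neighbors, short spikes such as a momentary memory peak survive downsampling. A plain every-Nth-point sampler would usually drop them.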

Reliability

Builds can end in many ways: graceful completion, timeout, infrastructure failure, or force-kill. We handle all of them:

- Happy path: lite-engine writes the summary and closes the stream on VM destroy.

- Crash path: The platform-level cleanup phase independently closes the metrics stream if lite-engine didn't. This runs regardless of how the VM terminated.

This dual-closure approach ensures metrics data is never orphaned — you always get at least the raw timeline, even if the summary couldn't be computed.
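The dual-closure approach comes down to an idempotent close. Whichever party closes the stream first wins, and a second close does nothing. The `MetricsStream` class below is a hypothetical in-memory stand-in for the Log Service stream.

```python
import threading
from typing import Optional


class MetricsStream:
    """Toy stand-in for a log stream that two parties may try to close."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.lines: list[str] = []
        self.closed_by: Optional[str] = None

    def write(self, line: str) -> None:
        with self._lock:
            if self.closed_by is None:  # drop writes after close
                self.lines.append(line)

    def close(self, closer: str, summary: Optional[str] = None) -> bool:
        """Close exactly once; returns False if someone else already closed it."""
        with self._lock:
            if self.closed_by is not None:
                return False
            if summary is not None:
                self.lines.append(summary)
            self.closed_by = closer
            return True


# Happy path: lite-engine writes a summary and closes; cleanup finds it closed.
happy = MetricsStream()
happy.write('{"cpu_pct":41.0}')
happy.close("lite-engine", summary='{"peak_cpu":41.0}')
happy.close("platform-cleanup")  # returns False: already closed

# Crash path: lite-engine never closes; cleanup closes without a summary.
crashed = MetricsStream()
crashed.write('{"cpu_pct":97.5}')
crashed.close("platform-cleanup")
print(crashed.lines, crashed.closed_by)  # ['{"cpu_pct":97.5}'] platform-cleanup
```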

What's Next

We're continuing to invest in resource intelligence for CI builds:

- Step-level attribution: correlating resource spikes with specific pipeline steps to pinpoint exactly which step is expensive.

- Automated right-sizing recommendations: using historical P90 data to suggest optimal machine sizes for your pipelines.

- Resource threshold alerts: notifying you when builds consistently approach memory limits, before they are OOM-killed.

- Build-over-build comparison: overlaying metrics from the current build against previous runs to visualize the resource impact of code changes.

Get Started

CPU and Memory Insights is enabled by default for all pipelines running on Harness CI Cloud, with no setup required.

To explore the feature:

1. Open any pipeline execution running on a Harness Cloud machine.

2. Click the resource indicator in the stage execution header (for example, Linux (Large)).

3. Open the insights drawer to view real-time and historical CPU and memory usage for your build.

No YAML changes. No additional agents. No configuration needed.

Use this visibility to quickly identify resource bottlenecks, right-size your build infrastructure, and improve overall CI efficiency.

Ready to optimize your builds? Try it in your next pipeline run or learn more in the Harness CI documentation.

From Commit to Approval, Without Leaving VS Code

The Harness VS Code Extension is now on the Marketplace. Monitor pipelines, debug logs, approve deployments, and query failures with Claude Code, Copilot, or Cursor, without leaving VS Code.

Your Harness pipelines, logs, and deployment approvals are now a sidebar panel away inside VS Code.

The Harness VS Code Extension is live on the VS Code Marketplace today: no .vsix download, no manual install. Search "Harness" in the Extensions view, and you're a click away from real-time CI/CD visibility without leaving your editor.

Everything Software Delivery in One Panel

| Capability | What it does |
| --- | --- |
| Pipeline monitoring | Live status for active runs, with automatic Git context detection: executions for your current branch and commit surface automatically. |
| Log viewer | Click any pipeline step to open its logs in a dedicated, syntax-highlighted editor tab. Failed steps are flagged immediately. |
| Inline approvals | Approve or reject Harness native, Jira, and ServiceNow deployment gates directly in the editor, without navigating to the Harness UI. |
| AI-assisted debugging | Ask Cursor, GitHub Copilot, or Claude Code about a failure from inside the IDE. Pipeline context (name, status, execution ID, URL) is injected automatically, with no copy-pasting. |

Ask Your AI. It Already Has the Context.

When a pipeline fails, the default loop is: open Harness UI, find the execution, read the logs, copy the relevant output, open your AI assistant, paste, and ask. That's four context switches before you've started fixing anything.

The extension collapses that into one step. An input sits at the bottom of the Harness panel. Type your question, select Claude Code, GitHub Copilot, or Cursor from the dropdown, and the extension packages the current execution context automatically before sending.
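The post lists the injected context as pipeline name, status, execution ID, and URL. Below is a hypothetical sketch of packaging those fields together with a question. The exact layout the extension sends is not documented here, and the URL is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionContext:
    pipeline_name: str
    status: str
    execution_id: str
    url: str


def build_prompt(question: str, ctx: ExecutionContext) -> str:
    """Prepend structured pipeline context to the user's question (layout is illustrative)."""
    return "\n".join([
        "Harness pipeline context:",
        f"- Pipeline: {ctx.pipeline_name}",
        f"- Status: {ctx.status}",
        f"- Execution ID: {ctx.execution_id}",
        f"- URL: {ctx.url}",
        "",
        f"Question: {question}",
    ])


ctx = ExecutionContext("build_and_deploy", "Failed", "exec-123",
                       "https://example.invalid/executions/exec-123")
print(build_prompt("Why did the build step fail?", ctx))
```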


What makes the context useful, not just present, is the Harness Software Delivery Knowledge Graph. The Knowledge Graph is a structured data model that connects every entity across your SDLC: pipelines, services, deployments, environments, artifacts, policy results, and more. When the extension sends your AI tool the execution context for a failing pipeline, it's pulling from that graph. So Claude Code, Copilot, or Cursor isn't just reading a raw log dump. It's receiving structured, relationship-aware data about what ran, what it depends on, and where it broke. That's the difference between an AI that can technically answer a question about your pipeline and one that can accurately answer it.

Claude Code responses appear directly in the Harness sidebar (CLI mode) or open the Claude Code panel with the prompt pre-loaded (extension mode). Click Configure MCP in the AI footer to wire up your Harness credentials, at project scope or globally.

GitHub Copilot is auto-detected when the Copilot extension is installed. Context and prompt open in Copilot Chat, ready to go.

Cursor is auto-detected when you're running inside Cursor. For the simplest setup, install the Harness plugin from the Cursor marketplace, which uses OAuth authentication and needs no manual configuration.

Install in Two Minutes

Install:

Open the Extensions view (Ctrl+Shift+X, or Cmd+Shift+X on macOS), search "Harness", and click Install. Or from the terminal:

`code --install-extension harness-inc.harness-vscode`

Connect your account:

Click the Harness icon in the Activity Bar → run Harness: Configure API Key → enter your instance URL and Personal Access Token. Your Account ID is extracted from the PAT automatically.

Select your org and project. Pipelines load immediately.

Requirements: VS Code 1.85.0+, active Harness account.
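Extracting the account ID from the token, as described in the connection step, works if Harness PATs embed the account ID in the token string. The sketch below assumes the format `pat.<accountId>.<tokenId>.<secret>`. Treat that format as an assumption and confirm it against the Harness API documentation before relying on it.

```python
def account_id_from_pat(token: str) -> str:
    """Return the account ID segment of a PAT assumed to look like pat.<accountId>.<tokenId>.<secret>."""
    parts = token.strip().split(".")
    if len(parts) != 4 or parts[0] != "pat" or not all(parts[1:]):
        raise ValueError("expected a token of the form pat.<accountId>.<tokenId>.<secret>")
    return parts[1]


print(account_id_from_pat("pat.abc123XYZ.tok789.s3cr3t"))  # abc123XYZ
```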

Watch it in action

Watch the walkthrough from our very own Luis Redda.

Stay in VS Code. Your Pipelines Will Follow.

The context-switching loop (open Harness, find the execution, copy the log, switch to your AI tool, paste, and ask) doesn't have to be part of how you work. Pipeline status, logs, approvals, and AI-assisted debugging all live in the same panel as your code. Install the extension, connect your account, and the next time something breaks, you'll already be where you need to be.

For more information, check out the docs.

Azure Deployment Strategies & CI/CD Best Practices

Master Azure deployment with CI/CD, canary releases, feature flags, GitOps, and IaC. Learn how progressive delivery and Harness help teams ship faster, safer, and with fewer incidents.

Modern Azure deployment goes beyond basic pipelines. Teams that combine CI/CD automation with progressive delivery and feature flags ship faster and with far fewer incidents.
Choosing the right deployment strategy for each workload type dramatically reduces blast radius and makes rollbacks a matter of seconds, not hours.
Embedding feature management and experimentation directly into Azure deployments lets teams decouple deployment from release and validate changes before a full rollout.

Learn how to master Azure deployment with CI/CD pipelines, progressive delivery, and feature flags. See how Harness helps engineering teams ship faster and safer on Azure.

Azure deployment sounds straightforward. Push code, it runs in the cloud. But if you've managed a 2 a.m. production incident because a deployment went sideways on AKS, you know the gap between "it deploys" and "it deploys safely at scale" is significant.

This guide covers the deployment strategies, pipeline structures, and operational patterns that close that gap -- from how to sequence a canary rollout to how Harness Continuous Delivery makes the whole operation measurably safer.

What Is Azure Deployment?

Azure deployment is the process of releasing application code, configuration, or infrastructure changes to Microsoft Azure. That can target VMs, AKS clusters, Azure App Service, Azure Functions, Azure Container Instances -- whatever your workload runs on.

At the artifact level, a deployment pushes a container image, a build package, or a Terraform plan into an Azure environment. What distinguishes a mature deployment workflow from a basic one is the control layer around that push:

CI gates every commit. No artifact reaches Azure without passing build, test, and static analysis stages.
CD automates the path from staging to production. Humans approve; pipelines execute.
Deployment strategy determines blast radius. Canary, blue-green, and rolling deployments each make a different tradeoff between speed, safety, and cost.
IaC keeps environments consistent. If a resource change isn't in code, it doesn't happen.
Observability triggers rollback. Post-deployment verification watches metrics automatically. If error rates cross the threshold, the pipeline acts -- no engineer needs to catch it first.

Azure Deployment Strategies: Pick the Right Tradeoff

The strategy you choose determines how much of your user base absorbs a bad release before you can respond. The tradeoffs are clear.

Blue-Green Deployment

Blue-green keeps two identical environments live: blue handles production traffic; green runs the new version. When green passes validation, traffic cuts over instantly.

What this means in practice on Azure:

You're running double the infrastructure during every deployment window -- parallel App Service slots, duplicate AKS node pools, or mirrored Container Apps environments.
Rollback is instant: flip traffic back to blue.
Validation happens before any user sees the new version.

Use blue-green when: rollback speed matters more than infrastructure cost, and you need zero-downtime cutover with the option to abort completely.

Skip blue-green when: your workload has stateful dependencies or database schema changes that make running parallel environments operationally complex.
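
To make the cutover and rollback mechanics concrete, here is a minimal JavaScript sketch of a blue-green switch. The `BlueGreenRouter` class and the `validate` callback are hypothetical stand-ins for whatever actually routes traffic on Azure (an App Service slot swap, an ingress change, or a Front Door rule); only the control flow is meant to carry over.

```javascript
// Minimal blue-green switch. BlueGreenRouter is hypothetical: on Azure the
// "flip" would be a slot swap or routing change, not an in-memory field.
class BlueGreenRouter {
  constructor(blueVersion) {
    this.envs = { blue: blueVersion, green: null };
    this.live = 'blue'; // environment currently receiving production traffic
  }

  idle() {
    return this.live === 'blue' ? 'green' : 'blue';
  }

  // Deploy to the idle environment; production traffic is untouched.
  deploy(version) {
    this.envs[this.idle()] = version;
  }

  // Validate the idle environment first, then cut all traffic over at once.
  cutover(validate) {
    const candidate = this.idle();
    if (!validate(this.envs[candidate])) return false; // abort: no user saw it
    this.live = candidate;
    return true;
  }

  // Rollback is a traffic flip; the previous version is still running.
  rollback() {
    this.live = this.idle();
  }

  liveVersion() {
    return this.envs[this.live];
  }
}

const router = new BlueGreenRouter('v1');
router.deploy('v2');
router.cutover((v) => v === 'v2'); // validation passes, traffic moves to green
console.log(router.liveVersion()); // v2
router.rollback();
console.log(router.liveVersion()); // v1
```

The double-infrastructure cost is visible in the sketch: both `envs` entries must stay populated for `rollback()` to be instant.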

Canary Deployment

Canary deployments send a defined percentage of traffic to the new version while the rest stays on stable. Start small, watch metrics, and expand only when data supports it.

A standard canary ramp on a high-traffic Azure workload:

1% of traffic to canary. Watch p95 latency and error rate for 15-30 minutes.
5% if metrics hold. Watch for another 30 minutes.
25% if metrics hold.
100% once you're confident.

At each stage, define a specific rollback trigger before the deployment starts -- not while you're watching dashboards. For example: if error rate rises more than 0.2% above baseline, or p95 latency increases more than 50ms, auto-roll back and alert.

The blast radius of a bad release tops out at whatever percentage is currently on canary. Catch a problem at 1%, and one in a hundred users hits it -- not all of them.
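
The ramp and its rollback trigger can be written down as data before the deployment starts. The sketch below is a hypothetical JavaScript illustration, not Harness's implementation. `getMetrics` is an assumed callback that shifts traffic, waits out the observation window, and returns the canary's error rate (in percent) and p95 latency (in ms). "0.2% above baseline" is interpreted here as 0.2 percentage points.

```javascript
// Canary ramp defined up front: traffic percentages and the rollback trigger.
const RAMP = [1, 5, 25, 100];
const TRIGGER = { maxErrorRateDelta: 0.2, maxP95DeltaMs: 50 };

// True if canary metrics breach the trigger relative to the stable baseline.
function breachesTrigger(baseline, canary, trigger = TRIGGER) {
  return (
    canary.errorRate - baseline.errorRate > trigger.maxErrorRateDelta ||
    canary.p95Ms - baseline.p95Ms > trigger.maxP95DeltaMs
  );
}

// Walk the ramp; stop and roll back at the first stage whose metrics breach.
// getMetrics(percent) is a hypothetical hook returning the canary's metrics.
function runCanary(baseline, getMetrics) {
  for (const percent of RAMP) {
    const canary = getMetrics(percent);
    if (breachesTrigger(baseline, canary)) {
      return { outcome: 'rolled_back', atPercent: percent };
    }
  }
  return { outcome: 'promoted', atPercent: 100 };
}

const baseline = { errorRate: 0.5, p95Ms: 180 };
// Healthy at 1% and 5%, then p95 jumps by 70ms at 25%.
const result = runCanary(baseline, (p) =>
  p < 25 ? { errorRate: 0.55, p95Ms: 190 } : { errorRate: 0.6, p95Ms: 250 }
);
console.log(result); // { outcome: 'rolled_back', atPercent: 25 }
```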

Rolling Deployment

Rolling deployments replace instances of the old version in batches. No double infrastructure -- each batch of pods gets updated and validated before the next batch rolls out.

This is resource-efficient, but old and new versions run simultaneously during the rollout. That creates two constraints:

API calls from old instances can reach new instances. If your API contract changed, backward compatibility is required.
Database schema changes need to be backward-compatible before the rollout starts. Migrate first, then deploy.

Use rolling when: your workload is stateless, API changes are backward-compatible, and infrastructure cost is a constraint.
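
A rolling update is essentially a loop over batches. This JavaScript sketch shows why mixed versions are unavoidable: until the last batch finishes, some instances still run the old version. The instance list, batch size, and `healthy` callback are illustrative assumptions. On AKS, the corresponding knobs are the Kubernetes Deployment's `maxSurge` and `maxUnavailable` rolling-update settings.

```javascript
// Replace instances in fixed-size batches, validating each batch before the next.
function rollingUpdate(instances, newVersion, batchSize, healthy) {
  const fleet = instances.map((i) => ({ ...i })); // work on copies
  const snapshots = [];
  for (let start = 0; start < fleet.length; start += batchSize) {
    const batch = fleet.slice(start, start + batchSize); // same objects as in fleet
    batch.forEach((inst) => { inst.version = newVersion; });
    if (!batch.every(healthy)) {
      // Halt the rollout; earlier batches already run the new version.
      return { ok: false, failedAtBatch: start / batchSize, fleet };
    }
    // Record which versions are serving traffic after this batch.
    snapshots.push([...new Set(fleet.map((i) => i.version))].sort());
  }
  return { ok: true, snapshots, fleet };
}

const pods = ['pod-a', 'pod-b', 'pod-c', 'pod-d'].map((name) => ({ name, version: 'v1' }));
const rollout = rollingUpdate(pods, 'v2', 2, () => true);
console.log(rollout.snapshots); // [ [ 'v1', 'v2' ], [ 'v2' ] ]
```

The first snapshot is the window in which old and new instances call each other, which is why API and schema changes must be backward-compatible.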

Building a CI/CD Pipeline for Azure

A reliable Azure deployment pipeline runs the same automated process on every commit. Here's how the stages flow using Harness-powered pipelines.

Stage 1: Source Trigger

A commit or PR kicks off the pipeline. Every change -- bug fixes, config updates, dependency bumps -- goes through the same stages. No exceptions for "small" changes; that's where incidents come from.

Stage 2: Build and Unit Test

Code compiles. Container images build. Unit tests run. If anything fails here, the pipeline stops. Don't let a broken build consume downstream compute.

Tag images with the pipeline sequence ID or commit SHA -- never "latest" in production. You need to be able to redeploy any version from six months ago without guessing which image it was:

yaml

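- step:
    type: BuildAndPushDockerRegistry
    name: Build and Push
    identifier: Build_and_Push
    spec:
      # Docker registry connector pointing at Azure Container Registry
      connectorRef: azure_container_registry
      repo: myapp
      tags:
        # Immutable tags: pipeline run number and the triggering commit
        - <+pipeline.sequenceId>
        - <+trigger.commitSha>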

Stage 3: Static Analysis and Security Scanning

Run SAST on every PR. DAST is often run asynchronously (e.g., nightly or pre-release) due to runtime and environment requirements -- it's slower and will add minutes to every commit if you run it inline. Container scanning happens before the image lands in Azure Container Registry. Block the push if critical vulnerabilities are found; don't flag and continue.
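
Blocking the push, rather than flagging and continuing, is a simple rule to encode. A minimal JavaScript sketch, assuming the scanner's report has already been normalized into a list of findings with `id` and `severity` fields (those field names, the severity labels, and the placeholder IDs are assumptions, not any specific scanner's output format):

```javascript
// Decide whether an image may be pushed to the registry.
const SEVERITY_ORDER = ['low', 'medium', 'high', 'critical'];

function scanGate(findings, blockAt = 'critical') {
  const threshold = SEVERITY_ORDER.indexOf(blockAt);
  const blocking = findings.filter((f) => {
    const rank = SEVERITY_ORDER.indexOf(f.severity);
    return rank === -1 || rank >= threshold; // unknown severities fail closed
  });
  return { allowPush: blocking.length === 0, blocking: blocking.map((f) => f.id) };
}

const scanReport = [
  { id: 'CVE-A', severity: 'medium' },
  { id: 'CVE-B', severity: 'critical' },
];
console.log(scanGate(scanReport)); // { allowPush: false, blocking: [ 'CVE-B' ] }
```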

Stage 4: Artifact Publishing

Validated images push to Azure Container Registry. Deployment packages go to your artifact store. Nothing reaches Azure environments without passing stages 2 and 3.

Stage 5: Infrastructure Provisioning

IaC definitions -- Bicep, ARM, or Terraform -- apply any environment changes before application artifacts deploy. Infrastructure and application deployments should be independent pipelines where possible. Coupling them couples their blast radii.

Stage 6: Staging Deployment and Integration Tests

Deploy to staging first. Run smoke tests and integration tests against real infrastructure. Review testing methodologies for CD pipelines to validate the release before production. This is where environment-specific bugs surface: network policies, service mesh configs, secrets management -- things unit tests don't catch.

Stage 7: Production Deployment with Progressive Delivery

Deploy to production using your chosen strategy. For canary: configure traffic weights in Azure Front Door, Application Gateway, or your AKS ingress controller. Automate the traffic ramp -- don't rely on manual weight adjustments at each stage.

Stage 8: Post-Deployment Verification

Harness AI-assisted deployment verification watches error rates, p95 latency, pod restart counts, and relevant business metrics (conversion rate, checkout completion) for at least 30 minutes post-deployment. If a threshold is breached, the pipeline rolls back without waiting for a human to notice.

Example rollback trigger thresholds:

Error rate increases more than 0.2% over baseline → auto rollback
p95 latency increases more than 50ms over baseline → auto rollback
Pod restart count increases more than 3x → halt rollout, alert on-call
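
The three example thresholds map to two different responses, so they can be encoded as a single verdict function. This is a JavaScript sketch under stated assumptions: the metric field names are hypothetical, "0.2%" is read as percentage points, and this is not how Harness verification is configured.

```javascript
// Map post-deployment metrics to an action using the example thresholds.
// Rollback conditions are checked first because they are the more severe response.
function verify(baseline, current) {
  if (current.errorRate - baseline.errorRate > 0.2) return 'rollback';
  if (current.p95Ms - baseline.p95Ms > 50) return 'rollback';
  // Guard against a zero baseline so a single restart does not trip the 3x rule.
  if (current.podRestarts > 3 * Math.max(baseline.podRestarts, 1)) return 'halt_and_alert';
  return 'continue';
}

const verifyBaseline = { errorRate: 0.4, p95Ms: 200, podRestarts: 2 };
console.log(verify(verifyBaseline, { errorRate: 0.45, p95Ms: 210, podRestarts: 7 })); // halt_and_alert
```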

Infrastructure as Code for Azure: Keep Environments Consistent

Manual Azure resource changes create configuration drift. When production diverges from what your IaC defines, incidents become harder to diagnose because you can't be certain what state the environment is actually in.

The rule: if a change isn't in code, it doesn't happen in production. That applies to VM sizes, network security groups, Key Vault access policies, AKS node pool configs -- everything.

What IaC actually gives you:

Version control for infrastructure. Every change is in a PR, reviewable, and revertible.
Reproducible environments. Spin up a staging environment that mirrors production exactly, run your tests, tear it down.
Drift detection. Automated checks compare the live Azure environment against your IaC definitions. If they diverge, you get an alert or auto-remediation.
Audit trails. Compliance teams can see what changed, when, and who approved it -- without digging through Azure activity logs.
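
Drift detection boils down to diffing desired state against live state. A minimal JavaScript sketch, assuming both sides have already been fetched and flattened into plain objects keyed by resource ID with primitive property values (the resource IDs and properties below are made up for illustration):

```javascript
// Compare desired (IaC) state with live (Azure) state and report every difference.
function detectDrift(desired, live) {
  const drift = [];
  const ids = new Set([...Object.keys(desired), ...Object.keys(live)]);
  for (const id of ids) {
    if (!(id in live)) { drift.push({ id, kind: 'missing' }); continue; }     // in code, not deployed
    if (!(id in desired)) { drift.push({ id, kind: 'unmanaged' }); continue; } // created by hand
    const props = new Set([...Object.keys(desired[id]), ...Object.keys(live[id])]);
    for (const prop of props) {
      if (desired[id][prop] !== live[id][prop]) {
        drift.push({ id, kind: 'changed', prop, want: desired[id][prop], got: live[id][prop] });
      }
    }
  }
  return drift;
}

const desiredState = { 'vm-web': { size: 'Standard_D2s_v5' }, 'nsg-web': { allowSsh: false } };
const liveState = {
  'vm-web': { size: 'Standard_D4s_v5' }, // resized manually in the portal
  'nsg-web': { allowSsh: false },
  'vm-debug': { size: 'Standard_B1s' },  // never defined in IaC
};
const driftFindings = detectDrift(desiredState, liveState);
console.log(driftFindings);
```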

Harness Infrastructure as Code Management adds drift detection, cost visibility, and policy enforcement directly in the pipeline. A Terraform plan that would provision resources over a budget threshold fails the policy check before apply runs.

Progressive Delivery in Azure

Traditional deployments push everything to everyone at once. If something is broken, every user hits it simultaneously. Progressive delivery replaces that with a controlled ramp.

The technical mechanics depend on your Azure service:

AKS: Weighted ingress routing using NGINX ingress or Azure Application Gateway Ingress Controller.
Azure App Service: Deployment slots with traffic splitting configured via Azure CLI or portal.
Multi-region: Weighted routing rules in Azure Front Door.

The operational pattern is the same regardless: start at 1-5% of traffic, define automated rollback triggers before the deployment starts, measure for at least 15-30 minutes per stage, and expand only when metrics confirm the release is healthy.

What makes this work at scale is automated deployment verification. Instead of an engineer watching dashboards at every ramp stage, the system watches metrics and halts or rolls back if guardrails are breached.

Feature Flags in Azure Deployments: Separate Deployment from Release

Deploying code and releasing features to users are two different pipeline stages. Feature flags are how you keep them separate.

When you ship behind flags, code deploys to Azure in an off state. The flag controls which users see it, when, and at what percentage. No high-stakes launch moment -- you ramp exposure the same way you'd ramp a canary.

This matters most in complex Azure architectures where services deploy independently. A new API version can deploy across your AKS cluster while the flag gates user-facing exposure until every downstream service is ready. No coordinated rollout timing. No deployment freeze while other services catch up.

How Flags Integrate with the Azure CI/CD Pipeline

The flag lives in application code. The pipeline deploys the code; Harness Feature Management controls flag state. Those are independent systems.

javascript

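// Feature flag check in application code.
// `featureFlags` stands for your feature-management SDK client (initialized elsewhere);
// `newCheckoutFlow` and `legacyCheckoutFlow` are the two code paths.
async function checkout(user, cart) {
  const isNewCheckoutEnabled = await featureFlags.isEnabled('new-checkout', {
    userId: user.id,
    region: user.region
  });

  // Both paths ship in the same deployment; the flag picks one at runtime.
  if (isNewCheckoutEnabled) {
    return newCheckoutFlow(cart);
  }
  return legacyCheckoutFlow(cart);
}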

Patterns That Work Well for Azure Deployments

Ship dark, release progressively. Deploy to all Azure regions behind a flag. Enable for internal users first. Validate against real infrastructure without external exposure. Then ramp: 1%, 5%, 25%, 100% -- each step gated by metrics.

Region-by-region rollouts. Target Azure regions sequentially using flag targeting rules. East US first; if error rates hold for 24 hours, enable in West Europe. No new deployment required to expand.
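
Percentage ramps and region targeting both rely on evaluating the flag consistently per user, so the same user stays in the same bucket as the percentage grows. The JavaScript sketch below shows one common approach: hash the user ID into a bucket from 0 to 99. The rule shape and the FNV-1a hash are assumptions for illustration. A feature-management SDK such as Harness FME evaluates targeting rules with its own hashing and rule format.

```javascript
// FNV-1a 32-bit hash: cheap and deterministic, so a user always lands in the same bucket.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Hypothetical flag rule: enabled regions plus a rollout percentage within them.
function isEnabled(flag, user) {
  if (!flag.regions.includes(user.region)) return false;
  // Salting with the flag name keeps buckets independent across flags.
  const bucket = fnv1a(`${flag.name}:${user.id}`) % 100;
  return bucket < flag.percentage;
}

const flag = { name: 'new-checkout', regions: ['eastus'], percentage: 5 };
console.log(isEnabled(flag, { id: 'user-42', region: 'westeurope' })); // false
```

Expanding from East US to West Europe is then a configuration change (`regions: ['eastus', 'westeurope']`), not a new deployment. Because the bucket depends only on the flag name and user ID, raising `percentage` from 5 to 25 keeps every user from the original 5% enabled.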

A/B test infrastructure changes. Testing a new AKS node type or a different caching layer? Harness Experimentation lets you route a percentage of workloads to the new configuration and compare against guardrail metrics with statistical validity -- not gut feel.

Release monitoring at the feature level. System-level monitoring tells you error rate is up 0.3%. Harness Release Monitoring tells you the new checkout variant is adding 40ms of p95 latency. The second tells you what to fix.

Warehouse-Native Experimentation

For teams running Azure Synapse Analytics or Azure Databricks, warehouse-native experimentation computes experiment results directly in your data warehouse -- no ETL pipelines, no data export, no additional latency in your analysis.

GitOps for Azure: Git as the Source of Truth

GitOps applies the same version-control workflow you use for application code to your Azure infrastructure and deployment configuration. Desired state lives in the repo. The live Azure environment is continuously reconciled against it.

For AKS workloads, the GitOps loop runs like this:

Engineer opens a PR with a Kubernetes manifest change.
PR is reviewed, approved, and merged to main.
GitOps controller detects the diff between desired state (repo) and live state (cluster).
Controller applies the change to the AKS cluster automatically.
If the live state drifts from the repo at any point -- manual kubectl change, failed sync -- the controller flags it or auto-remediates.
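
The controller's job in steps 3 to 5 is a reconcile loop. This is a simplified JavaScript sketch of one pass, not how Argo CD or Harness GitOps is implemented: manifests are plain objects keyed by name, comparison is by serialized content, and a `selfHeal` option (named after the similar setting in Argo CD's automated sync policy) decides between auto-remediation and flagging drift. The `apply` callback is a hypothetical stand-in for applying a manifest to the AKS cluster.

```javascript
// One reconcile pass: make live state match the desired state in Git.
// JSON.stringify comparison assumes both sides serialize keys in the same order.
// Pruning resources that exist only in the cluster is omitted for brevity.
function reconcile(desired, live, { selfHeal, apply }) {
  const report = { applied: [], outOfSync: [] };
  for (const [name, manifest] of Object.entries(desired)) {
    if (JSON.stringify(live[name]) === JSON.stringify(manifest)) continue; // in sync
    if (selfHeal) {
      apply(name, manifest); // auto-remediate toward Git
      report.applied.push(name);
    } else {
      report.outOfSync.push(name); // flag for a human to review
    }
  }
  return report;
}

const clusterState = { api: { image: 'api:abc123', replicas: 5 } }; // someone ran kubectl scale
const gitState = { api: { image: 'api:abc123', replicas: 3 } };
const syncReport = reconcile(gitState, clusterState, {
  selfHeal: true,
  apply: (name, manifest) => { clusterState[name] = { ...manifest }; },
});
console.log(syncReport.applied, clusterState.api.replicas); // [ 'api' ] 3
```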

Every infrastructure change goes through code review. Every rollback is a revert commit. Audit trail is automatic.

Harness GitOps provides enterprise-grade GitOps with the audit trails, RBAC, and governance controls that Azure production environments demand -- without the operational overhead of managing Argo CD clusters yourself. The same discipline applies beyond Kubernetes: GitOps principles on ARM definitions, Bicep modules, or Terraform workspaces mean every Azure environment change follows the same review-approve-apply workflow as application code.

Governance and Policy in Azure Deployments

At enterprise scale, governance needs to be pipeline-native -- not a checklist that runs after deployment. Policy as Code applies compliance rules directly inside your Azure deployment pipelines, replacing manual approval checklists with automated checks that run before anything reaches production.

Harness DevOps Pipeline Governance enforces this at every stage:

Required security gates. SAST, SCA, and container scanning run automatically on every PR and build. Critical findings block promotion to production. Policy enforcement is in the pipeline -- no human bottleneck.
Immutable audit logs. Every deployment, approval, flag change, and rollback is timestamped and attributed. Required for SOX, HIPAA, or ISO 27001 compliance in Azure environments.
Environment-specific approvals. Staging promotes automatically; production requires sign-off. The approval workflow lives in the pipeline definition, not in someone's email inbox.
Cost guardrails. Policy checks block Terraform plans that would provision Azure resources over budget thresholds. Catch infrastructure cost overruns before apply runs, not after the invoice arrives.
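
Policy as Code means these rules are evaluated by the pipeline, not by a reviewer with a checklist. Harness policies are written in OPA's Rego language; to stay in one language here, this is a JavaScript sketch of the same kind of promotion check. The shape of the `release` and `env` objects, the scan names, and the budget figures are hypothetical.

```javascript
// Evaluate promotion policies; every violation is collected so a failure is explainable.
const REQUIRED_SCANS = ['sast', 'sca', 'container'];

function evaluatePromotion(release, env) {
  const violations = [];
  for (const scan of REQUIRED_SCANS) {
    if (!release.scans[scan]) violations.push(`missing required scan: ${scan}`);
    else if (release.scans[scan].critical > 0) violations.push(`critical findings in ${scan}`);
  }
  if (env.name === 'production' && !release.approvedBy) {
    violations.push('production requires an approval');
  }
  if (release.estimatedMonthlyCost > env.monthlyBudget) {
    violations.push(`estimated cost ${release.estimatedMonthlyCost} exceeds budget ${env.monthlyBudget}`);
  }
  return { allowed: violations.length === 0, violations };
}

const candidateRelease = {
  scans: { sast: { critical: 0 }, sca: { critical: 0 }, container: { critical: 1 } },
  approvedBy: null,
  estimatedMonthlyCost: 1200,
};
console.log(evaluatePromotion(candidateRelease, { name: 'production', monthlyBudget: 1000 }).violations);
```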

Azure Deployment Best Practices

These are the patterns that separate teams shipping confidently on Azure from teams that dread release day.

Never deploy directly to production. Even for "tiny" changes. Every change goes through at least one pre-production environment with automated testing.
Make every deployment artifact immutable. Tag container images with commit SHAs. You should be able to redeploy any version from six months ago in under five minutes, without digging through Slack to figure out which image tag it was.
Decouple infrastructure and application deployments. Changing Azure resources and changing application code should be separate pipelines. Coupling them couples their blast radii.
Define rollback before you deploy. Every deployment needs a rollback plan -- and ideally, an automated one. If rollback requires more than a button click, simplify the pipeline.
Monitor at the feature level, not just the system level. "Error rate is up 0.3%" tells you something is wrong. "The new checkout variant is causing a 12% increase in cart abandonment" tells you what to fix.
Treat configuration as code. Azure App Configuration values, Key Vault references, and environment variables belong in version control and deploy through the same pipeline as application code.
Ship continuously, not on a schedule. The longer the gap between deployments, the more changes are bundled, the harder it is to isolate what broke. Continuous delivery with small, frequent deploys reduces the cost of every individual change.

How Harness Powers Azure Deployment at Scale

Teams shipping to Azure need CI, CD, feature management, infrastructure automation, and observability connected into a single workflow -- with the governance controls that enterprise Azure environments require.

Harness gives Azure teams:

Continuous Integration with intelligent test selection, incremental builds and pipeline caching, and pipeline analytics that eliminate build bottlenecks.
Continuous Delivery with canary, blue-green, and rolling strategies built in -- including AI-assisted deployment verification that watches metrics and rolls back without human intervention.
Infrastructure as Code Management for Terraform and Bicep workflows with drift detection, cost visibility, and policy enforcement.
Feature Management & Experimentation to decouple deployment from release, run A/B tests against real Azure traffic, and monitor at the feature level.
CD data visualization to track deployment frequency, lead time, and change failure rate across your Azure environments.

The result: Azure deployments that are faster, safer, and measurably better -- with the data to prove it.

Azure Deployment: Frequently Asked Questions

What is the difference between Azure deployment and Azure DevOps?

Azure deployment is the process of releasing application code or infrastructure changes to Azure cloud resources. Azure DevOps is Microsoft's platform for managing source control, CI/CD pipelines, work items, and artifact management. You can use Azure DevOps to orchestrate deployments, but it's one of several tools that can do so. Harness provides Azure deployment capabilities with enterprise-grade progressive delivery, feature management, and governance that extend beyond native Azure Pipelines.

What Azure deployment strategy should I use for a high-traffic application?

For high-traffic Azure applications, canary deployments offer the best balance of safety and speed. Start at 1% of traffic, watch error rates and p95 latency closely, and ramp to 5%, 25%, and 100% as metrics confirm health. Define automated rollback triggers at each stage before the deployment starts.

Blue-green deployments work well when you need instant rollback capability and can absorb double the infrastructure cost during deployment windows. Rolling deployments suit stateless workloads where brief mixed-version operation is acceptable, as long as API and schema changes are backward-compatible.

How do feature flags fit into an Azure CI/CD pipeline?

Feature flags integrate at the application code level, not the pipeline level. Code deploys to Azure with new features disabled behind flag checks. The deployment pipeline handles getting code to Azure; the feature flag controls which users see the new functionality and when. This lets your pipeline run continuously -- shipping every commit -- while you control feature exposure independently through feature management.

How do I prevent configuration drift in Azure?

Define all Azure resources in Infrastructure as Code -- Bicep, ARM templates, or Terraform -- and enforce a policy that no manual changes are made to production environments directly. Automated drift detection continuously compares the live Azure environment against the desired state in your IaC definitions and alerts (or auto-remediates) when they diverge.

What metrics should I watch during an Azure deployment?

At minimum: HTTP error rates (watch for increases above 0.2% over baseline), p95 and p99 latency (degradation shows here before average latency moves), pod restart counts for AKS workloads, and relevant business metrics like conversion rate or checkout completion.

Monitor at the feature or deployment level, not just at the infrastructure level. "Error rate is up" tells you something is wrong. "Feature X caused a 15% increase in checkout errors" tells you what to fix.

Can I run A/B tests on Azure infrastructure changes, not just product features?

Yes. Experimentation works for engineering validation as well as product changes. Route a percentage of AKS workloads to a new node type, compare caching strategies, or test a new database configuration -- all with the same statistical guardrails you'd apply to a UI experiment. For teams with Azure Synapse Analytics, warehouse-native experimentation computes results directly in your data warehouse without additional ETL overhead.

With AI, The Proof Is in Production

Human and AI code review can’t guarantee production safety. Learn why feature flags, progressive delivery, and metrics-driven releases are essential in the AI software era.

Human review, and AI review, can only get you so far

Let's be frank: the last few years in software engineering have been earth-shattering. The foundations of the discipline have changed. Code can be written, rewritten, tested, and shipped faster than ever before. Agents are burning through trillions of tokens, and every month they get better at turning vague intent into working software.

That is exciting. It is also destabilizing.

Many teams are still built around the assumption that every meaningful change can be understood by a human before it merges. A developer opens a pull request, a reviewer reads it, a test suite runs, and the team decides whether the change is safe enough to deploy.

That model was already under pressure before AI, but now it is breaking.

LLMs can produce code far faster than any team can review it. The volume problem is obvious: if one engineer with an agent can generate several times more change than before, the review queue grows faster than the organization can absorb. The harder problem is trust. Even when a change looks reasonable, and even when another model reviews it, the system still cannot guarantee the behavior of that change in production.

AI review does not eliminate this problem. You can ask a different model, use a different prompt, or build an entire agentic code-review workflow. That can catch real issues. It can improve consistency. It can reduce the burden on humans. But it is still a non-deterministic system evaluating the output of another non-deterministic system. It can tell you what looks wrong. It cannot prove that a change will not degrade production.

Even staging and QA only get you so far. A non-production environment is not, and cannot be, exactly the same as production. It will not have the same traffic shape, data distribution, customer behavior, integrations, timing, scale, noisy neighbors, or failure modes. The closer you make it, the more useful it becomes, but it is still a model of production. It is not production.

So the question is not, "How do we review everything perfectly?"

The better question is, "How do we release in a way that assumes review is imperfect?"

The Old Idea That Suddenly Matters Again

Would you believe that one of the best answers to this problem has existed for a long time?

In December 2009, Flickr published an unassuming engineering post called Flipping Out. The idea was simple: release new features without deploying new code for every feature launch. Flickr described a model where code was merged continuously, deployed from the main branch, and gated behind small runtime switches. A feature could exist in production but remain unavailable until a configuration value flipped it on.

At first, that may not seem directly related to AI-generated code. But follow the thread.

What Flickr was describing is what we now call feature flagging. Combined with trunk-based development, feature flags let teams deploy code continuously without releasing every behavior immediately. The key distinction is simple but profound: deployment and release are not the same thing.

Deployment is getting code into an environment.

Release is exposing behavior to users.

Those two actions are often treated as one event, but they do not have to be. Feature flags are a way to choose between code paths at runtime and explicitly decouple deployment from release. With AI-accelerated engineering, that separation becomes a basic safety requirement.

If AI can generate more changes than humans can manually reason through, then the release system has to become more empirical. It has to answer: what is this feature actually doing to real users, real systems, and real business metrics?

The Game Is Production Feedback

Hiding unfinished work behind if statements is only the beginning. The real value is controlled exposure. A feature can be deployed to production, then released first to internal testers. Then to one percent of users. Then five. Then ten. At every step, you observe the impact before deciding whether to continue.

Production is where the unknowns live. Your tests can tell you whether the code behaves as expected in known scenarios. Your reviewers can tell you whether the change looks reasonable. Your static analysis tools can tell you whether it violates known rules. But only production can show you whether the change behaves well under the messy reality of actual usage.

Why APM Is Not Enough

Most teams already have observability. They have dashboards, logs, traces, alerts, and APM tools. You still need all of that, but aggregate system health is a blunt instrument when the risk is tied to one feature in a partial rollout.

APM tools are usually excellent at telling you something changed in the system. They are much less reliable at telling you which feature caused the change, especially during progressive delivery.

Imagine an AI-generated change increases crash rate by 10 percent for users who receive it. If that feature is only enabled for five percent of traffic, the total crash rate across the whole application may move by only half a percent. That can look like noise. It may not page anyone. It may not even be visible until the rollout expands to 20, 30, or 50 percent of traffic.
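
The arithmetic behind that example is worth spelling out. With a 10 percent relative increase for exposed users and 5 percent exposure, the blended rate moves by 0.05 × 10% = 0.5% relative. A short JavaScript sketch makes the dilution explicit, and shows why comparing exposed users against unexposed users recovers the full signal. The 2% baseline crash rate is an arbitrary assumption.

```javascript
// How a per-cohort regression is diluted in aggregate metrics during a partial rollout.
function blendedRate(baselineRate, relativeIncrease, exposure) {
  const exposedRate = baselineRate * (1 + relativeIncrease);
  return (1 - exposure) * baselineRate + exposure * exposedRate;
}

const crashBaseline = 0.02; // assumed 2% crash rate
const crashIncrease = 0.10; // +10% relative for users who get the feature
const exposure = 0.05;      // feature enabled for 5% of traffic

const aggregate = blendedRate(crashBaseline, crashIncrease, exposure);
const aggregateChange = aggregate / crashBaseline - 1; // what a system-wide dashboard sees
const perFeatureChange = (crashBaseline * (1 + crashIncrease)) / crashBaseline - 1; // exposed vs unexposed

console.log(aggregateChange.toFixed(4), perFeatureChange.toFixed(4)); // 0.0050 0.1000
```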

Harness FME Release Monitoring is designed around that gap. Rather than looking only at aggregate platform health, Release Monitoring measures the impact of feature flags and experiments on performance and behavioral metrics. If multiple features are rolling out at once, you do not want to know only that the application got worse. You want to know which feature is responsible, which users saw it, and which metric moved.

Metrics Become the Review Layer

Code review does not go away. Human review still matters. AI review still helps. Tests still matter. Security scanning still matters. Production metrics add the control those systems cannot provide on their own: measured impact.

In Harness FME, metrics evaluate the impact of feature flags and experiments on user behavior and system performance. They can measure errors, conversions, page load performance, interactions, satisfaction, sessions, shopping cart behavior, and any other event stream that matters to the product.

"Safe" is not a purely technical word. Depending on the feature, safety might mean error rates stay flat, page loads do not slow down, conversion does not drop, support tickets do not spike, or customers do not start rage clicking their way through a broken flow.

The right guardrails depend on the feature. Engineering leadership may care about latency and error rate. Product leadership may care about adoption and retention. Support may care about ticket volume. The power of a metric-driven release process is that all of those concerns can be defined before the rollout, measured during the rollout, and used to decide whether the feature keeps moving forward.

That changes the AI conversation. Reviewers are no longer being asked to predict every possible effect of a change from the diff alone. The release system is responsible for measuring the effects that actually matter.

Alert, Kill, Learn, Continue

Once metrics are attached to a rollout, the next step is automation.

Harness FME alerts and monitoring can notify teams when metrics cross critical thresholds or when statistically significant impact is detected on key or guardrail metrics. If the impact is negative, the team can stop the rollout, kill the flag, and investigate with a much narrower blast radius than a traditional deploy-and-pray release.
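
Whether a metric difference is "statistically significant" comes down to comparing the flag's treatment group with its control group. As one simple, standard illustration, and not a description of the statistics Harness FME actually uses, here is a two-proportion z-test on error counts in JavaScript. The counts are hypothetical, and 1.96 is the two-sided 95% critical value. Checking only the positive direction means only regressions trigger the guardrail.

```javascript
// Two-proportion z-test: is the treatment error rate different from control's?
function twoProportionZ(errorsT, totalT, errorsC, totalC) {
  const pT = errorsT / totalT;
  const pC = errorsC / totalC;
  const pooled = (errorsT + errorsC) / (totalT + totalC);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalT + 1 / totalC));
  return (pT - pC) / se;
}

// Hypothetical guardrail: alert (and consider killing the flag) only on a significant regression.
function guardrailBreached(treatment, control, zCritical = 1.96) {
  const z = twoProportionZ(treatment.errors, treatment.total, control.errors, control.total);
  return z > zCritical;
}

console.log(guardrailBreached({ errors: 120, total: 10000 }, { errors: 100, total: 10000 })); // false
console.log(guardrailBreached({ errors: 200, total: 10000 }, { errors: 100, total: 10000 })); // true
```

A 1.2% versus 1.0% error rate over 10,000 users per group is not yet distinguishable from noise (z is roughly 1.36). A 2.0% versus 1.0% split clearly is, which is the point at which the rollout should stop.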

The operational model starts to look different:

AI helps generate the change.
Humans and AI review the change where review adds value.
The change merges behind a feature flag.
The code deploys to production without immediately releasing to everyone.
The feature ramps through controlled production exposure.
Metrics determine whether the rollout continues, pauses, rolls back, or gets killed.

That loop is much more realistic for the AI era than pretending review can scale linearly with code generation.

With FME pipelines, this can also become part of the delivery workflow itself. Harness pipelines can include FME steps for operations like creating or updating feature flags, changing rollout behavior, modifying targets, setting default allocations, and killing a flag. Feature release can move from an ad hoc manual process to an auditable automation path.

Keeping up with AI velocity does not mean tolerating chaos and building better dashboards to watch it. It takes disciplined automation with measurable gates.

Production Is the Proof

Software engineering has changed permanently. The amount of code that can be produced by a small team is going up. The number of ideas that can be prototyped is going up. The number of changes waiting to be reviewed, validated, merged, and released is also going up.

But some things have not changed.

Production is still the only environment that is truly production. Users still behave in ways you did not predict. Distributed systems still fail in ways your test plan did not imagine. Business metrics still matter more than whether the diff looked elegant.

So yes, keep reviewing code. Use AI reviewers where they help. Keep improving tests. Keep scanning for vulnerabilities. Keep investing in non-production environments.

None of that is proof by itself.

When features are being written faster than humans can comprehensively review them, the release process has to become empirical. Put the code behind a flag. Release it progressively. Measure the impact per feature. Alert on guardrails. Kill the feature when the data says it is hurting users.

In the age of the LLM, the proof is in production.

The Future of IaC: Continuous Governance Through a Control Plane

Learn how platform engineering teams use infrastructure control planes to reduce drift, enforce governance, and scale self-service safely.

Infrastructure failures increasingly happen after provisioning through drift, unmanaged changes, and fragmented workflows.
Traditional IaC pipelines validate infrastructure at a single point in time, but modern cloud environments require continuous governance.
Effective infrastructure control planes unify provisioning, configuration, policy enforcement, drift detection, and self-service workflows.
Platform engineering teams scale faster when governance is embedded directly into developer workflows instead of layered on afterward.
Internal developer portals only succeed when backed by standardized templates, policy guardrails, and centralized infrastructure controls.

Infrastructure provisioning is no longer the hard part.

Most engineering organizations have already standardized on Infrastructure as Code (IaC), GitOps workflows, Terraform or OpenTofu, and CI/CD pipelines. Provisioning cloud infrastructure has become relatively repeatable.

But operating infrastructure at scale remains deeply fragmented.

That’s the tension platform engineering teams are now dealing with: infrastructure doesn’t typically fail during provisioning anymore. It fails after deployment, through drift, inconsistent runtime configuration, policy violations, and unmanaged operational changes.

As cloud environments become more dynamic, traditional infrastructure automation models are showing their limits.

During the recent Harness webinar Designing a Control Plane for Cloud Infrastructure, Rohit, Product Manager for ICM at Harness, and Mrinalini Sugosh, Product Marketing Manager at Harness, outlined why platform teams are shifting from static provisioning workflows toward continuous infrastructure control. That shift fundamentally changes how platform engineering teams need to think about governance, self-service, and infrastructure operations.

Provisioning Isn’t the Hard Part Anymore

The industry has spent the last decade solving infrastructure provisioning.

Terraform, OpenTofu, GitOps workflows, CI/CD automation, and cloud-native APIs dramatically improved infrastructure consistency and repeatability. Most teams can now provision infrastructure reliably through declarative workflows.

But provisioning is only one moment in the infrastructure lifecycle.

Modern environments continuously change:

Auto-scaling modifies infrastructure dynamically
Managed cloud services evolve underneath applications
Teams introduce manual changes during incidents
Runtime tooling drifts independently from IaC definitions
Multiple infrastructure systems operate without shared governance

That distinction matters because most IaC pipelines still operate like transactional systems:

Run plan
Validate configuration
Apply changes
Exit

The problem is that cloud infrastructure does not remain static after deployment.

Traditional infrastructure workflows validate infrastructure at a single point in time. Modern infrastructure requires continuous observation and enforcement.

Infrastructure Drift Is the Real Operational Problem

Infrastructure drift is no longer an edge case.

It’s the default operating condition for most large-scale cloud environments.

A developer updates a security group directly in AWS during an incident. An engineer modifies a Kubernetes runtime configuration outside GitOps. A platform team upgrades infrastructure dependencies manually to unblock production.

The infrastructure technically “works,” but the declared state and actual state no longer match.

Over time, that creates:

Governance gaps
Security inconsistencies
Audit failures
Cost overruns
Broken deployment assumptions
Operational fragility

Rohit described this reality during the webinar as the “glass break” problem:

“In incident scenarios, the instinct is to fix things with ClickOps, the easiest way possible, which leads to drift if it isn’t remediated after the incident.”

Most organizations attempt to solve this operationally through:

Manual reviews
Separate policy engines
Ticketing workflows
Ad hoc approvals
Disconnected scanning tools

But fragmented tooling compounds the problem.

Infrastructure provisioning, runtime configuration, deployment workflows, security scanning, and self-service portals often evolve independently. Each layer introduces its own operational logic, approval models, and governance controls.

Eventually, the platform itself becomes the source of complexity.

What a Modern Infrastructure Control Plane Actually Does

A control plane changes the operating model.

Instead of treating infrastructure governance as a one-time validation step, platform teams move toward continuous governance:

Desired state is continuously observed
Actual state is continuously measured
Drift is continuously identified
Policy violations are continuously enforced
Remediation becomes operationalized

This is the difference between infrastructure automation and infrastructure operations.

According to the webinar speakers, modern control planes are designed to unify several traditionally disconnected functions into a single operational layer, including infrastructure provisioning, runtime configuration management, policy enforcement, cost governance, drift detection, security scanning, self-service infrastructure workflows, and deployment orchestration. The major architectural shift is that governance is no longer treated as a separate overlay added after deployment, but instead becomes embedded directly into the system itself, including at the design stage.

This approach enables organizations to enforce controls such as blocking unsupported OpenTofu versions, preventing GPU provisioning in development environments, enforcing tagging standards, validating security posture before provisioning, and surfacing projected infrastructure cost changes during approval workflows. As Rohit explained, “You want these gates as part of the release process rather than as an afterthought in production.” This philosophy aligns closely with modern platform engineering models, where governance is automated, centralized, and reusable across teams and environments.
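Harness expresses controls like these as OPA policies written in Rego. Purely to illustrate the logic, the sketch below encodes three of the checks mentioned above in Python. The supported OpenTofu versions, required tag keys, and GPU instance prefixes are assumptions that you would replace with your own standards. `PlanRequest` is a hypothetical summary of a provisioning request, not a Harness data structure.

```python
from dataclasses import dataclass, field

SUPPORTED_OPENTOFU = {"1.8", "1.9"}      # assumption: minor versions your platform allows
REQUIRED_TAGS = {"owner", "cost-center"}  # assumption: your tagging standard
GPU_PREFIXES = ("p3.", "p4d.", "g5.")     # assumption: GPU instance families to block in dev


@dataclass
class PlanRequest:
    """Hypothetical summary of a provisioning request; not a Harness data structure."""
    environment: str
    opentofu_version: str
    instance_types: list[str]
    tags: dict[str, str] = field(default_factory=dict)


def evaluate(req: PlanRequest) -> list[str]:
    """Return human-readable violations; an empty list means the request may proceed."""
    violations = []
    # Compare on major.minor so any patch release of a supported line is accepted.
    minor = ".".join(req.opentofu_version.split(".")[:2])
    if minor not in SUPPORTED_OPENTOFU:
        violations.append(f"unsupported OpenTofu version {req.opentofu_version}")
    if req.environment == "dev":
        gpus = [t for t in req.instance_types if t.startswith(GPU_PREFIXES)]
        if gpus:
            violations.append(f"GPU instances not allowed in dev: {gpus}")
    missing = REQUIRED_TAGS - set(req.tags)
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    return violations
```

Returning every violation at once, rather than stopping at the first one, gives the developer the full list to fix in a single pass.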

The 4 Core Capabilities of an Effective Infrastructure Control Plane

1. Unified Provisioning and Configuration Workflows

Most enterprises still manage infrastructure provisioning and runtime configuration through separate operational systems. Infrastructure is commonly provisioned with Terraform, runtime environments are configured with Ansible, deployments are managed through CI/CD pipelines, and security tooling operates independently from the rest of the delivery process. This fragmented approach creates operational silos, duplicate governance workflows, policy inconsistencies, fragile integrations, and significant platform maintenance overhead.

Modern control planes address this problem by consolidating these functions into a unified operational model. During the webinar, Harness demonstrated how OpenTofu and Terraform provisioning, Ansible configuration management, CI/CD orchestration, security scanning, approval workflows, cost visibility, and drift monitoring can all operate within a single system. By reducing the amount of platform “wiring” required between tools, organizations can establish more consistent governance patterns across the entire software delivery lifecycle while simplifying operational management.

This approach also aligns with broader trends in continuous testing in CI/CD, AI-driven software delivery, and GitOps deployment automation, where operational consistency and automation become foundational platform capabilities.

2. Embedded Policy and Security Controls

Governance at scale cannot rely on tribal knowledge or manual review processes. High-performing platform engineering teams operationalize governance through reusable policies, standardized templates, and inheritance-based control models that can be applied consistently across environments and teams.

The webinar highlighted several examples of this model in practice, including OPA policy enforcement at the account, organization, and project levels, design-time validation before provisioning, embedded security scanning with tools such as Checkov, approval gates enriched with cost and compliance data, and reusable “golden provisioning pipelines.” These capabilities demonstrate how governance can be integrated directly into platform workflows instead of being treated as a separate operational layer.
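Inheritance is what lets a policy attached once at the account level reach every project beneath it. The minimal sketch below assumes a simple parent-pointer hierarchy and policies identified by name; both are hypothetical. It computes the effective policy set for any scope as the union of that scope's own policies and its ancestors' policies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """A node in a hypothetical account -> organization -> project hierarchy."""
    name: str
    policies: set[str] = field(default_factory=set)
    parent: Optional["Scope"] = None


def effective_policies(scope: Scope) -> set[str]:
    """Union of the policies attached to this scope and to every ancestor scope."""
    result: set[str] = set()
    node: Optional[Scope] = scope
    while node is not None:
        result |= node.policies
        node = node.parent
    return result
```

A production system would also need rules for whether a lower scope may override or relax an inherited policy. This sketch only ever adds policies.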

Manual governance processes do not scale effectively in modern infrastructure environments. Policy-as-code approaches allow platform teams to standardize controls globally while still preserving flexibility for individual development teams. This reduces approval bottlenecks, accelerates compliance workflows, and increases developer autonomy without compromising security or operational consistency.

Well-designed guardrails often improve delivery speed rather than slowing it down because developers can operate within predefined safe boundaries. This principle has become central to modern platform engineering, where governance is designed to be automated, centralized, and reusable across the organization.

3. Drift Detection and Remediation

Many infrastructure as code systems still approach drift detection reactively, and in some environments, drift may go undetected entirely. Modern control planes instead provide continuous monitoring of infrastructure state and compare deployed resources against declared configurations in real time.

Harness demonstrated several capabilities designed to improve operational visibility and auditability, including full infrastructure state version history, attribute-level drift visibility, continuous monitoring for external configuration changes, and historical comparisons across versions. These features help platform teams identify configuration deviations earlier while also improving traceability during incident investigations and operational reviews.

More importantly, continuous drift monitoring enables organizations to move toward proactive remediation models rather than depending entirely on manual operational intervention. As infrastructure environments continue to scale, automated drift detection and remediation are becoming increasingly important because manual review processes cannot keep pace with the volume and complexity of modern cloud infrastructure.
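Attribute-level drift visibility comes down to comparing declared state with observed state, one attribute at a time. The sketch below assumes both states have already been loaded as nested dictionaries (for example, from a plan file and a cloud API). The resource names and values in the usage note are hypothetical. Lists are compared as whole values rather than element by element.

```python
from typing import Any


def diff_state(declared: dict[str, Any], actual: dict[str, Any],
               path: str = "") -> list[tuple[str, Any, Any]]:
    """Return (attribute_path, declared_value, actual_value) for every attribute that differs.

    A missing attribute is treated as None, so additions and removals also show up.
    """
    drift: list[tuple[str, Any, Any]] = []
    for key in sorted(set(declared) | set(actual)):
        attr = f"{path}.{key}" if path else key
        want, have = declared.get(key), actual.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            drift.extend(diff_state(want, have, attr))  # recurse into nested blocks
        elif want != have:
            drift.append((attr, want, have))
    return drift
```

Suppose a security group's declared ingress CIDR is 10.0.0.0/8 and someone widened it to 0.0.0.0/0 during an incident. In that case, `diff_state` returns a single entry for the `ingress.cidr` path holding both values, and that entry is exactly what a remediation step or an incident review needs.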

4. Self-Service With Guardrails

Self-service infrastructure without governance often leads to uncontrolled infrastructure sprawl, which is one reason many Internal Developer Portal initiatives struggle after initial adoption. Exposing powerful infrastructure capabilities without consistent operational guardrails can create additional complexity instead of improving developer productivity.

Modern platform engineering requires organizations to balance several competing priorities simultaneously, including developer autonomy, operational consistency, security requirements, cost governance, and compliance enforcement. The most effective platform teams solve this challenge through standardized operational patterns such as golden templates, centralized policy inheritance, reusable provisioning pipelines, embedded approval workflows, and carefully controlled abstractions.

This model allows developers to provision and manage infrastructure independently while still operating within safe and compliant boundaries. By embedding governance directly into self-service workflows, organizations can improve developer experience without requiring every engineering team to develop deep expertise in the underlying complexity of cloud infrastructure and platform operations.
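At its core, a golden template is a set of safe defaults plus limits on what a developer may change. The sketch below uses a hypothetical database template; the parameter names, instance classes, and storage sizes are illustrative. It shows one way to merge a developer's request into the template while rejecting unknown or out-of-bounds values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    default: object
    allowed: tuple = ()  # an empty tuple means any non-None value is accepted


# Hypothetical golden template for a service database; names and limits are illustrative.
POSTGRES_TEMPLATE = {
    "instance_class": Param("db.t3.medium", ("db.t3.medium", "db.t3.large")),
    "storage_gb": Param(20, (20, 50, 100)),
    "environment": Param("dev", ("dev", "staging", "prod")),
    "service_name": Param(None),  # no default: the developer must supply it
}


def render(template: dict[str, Param], request: dict[str, object]) -> dict[str, object]:
    """Merge a developer request into the template, rejecting unknown or out-of-bounds values."""
    unknown = set(request) - set(template)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, param in template.items():
        value = request.get(name, param.default)
        if value is None:
            raise ValueError(f"missing required parameter: {name}")
        if param.allowed and value not in param.allowed:
            raise ValueError(f"{name}={value!r} not in {param.allowed}")
        resolved[name] = value
    return resolved
```

Rejecting unknown parameters outright is deliberate: it stops a request from quietly reaching for an option the template author never intended to expose.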

The Shift From Infrastructure Automation to Infrastructure Operations

Infrastructure automation solved provisioning.

Platform engineering now needs to solve operations.

That requires shifting from:

Static validation → continuous governance
Tool-centric workflows → system-centric workflows
Manual reviews → embedded controls
Infrastructure provisioning → infrastructure lifecycle management

The control plane model reflects that evolution.

It’s not simply another IaC orchestration layer.

It’s an operational framework for continuously governing infrastructure delivery across provisioning, configuration, deployment, security, and self-service systems.

As infrastructure complexity grows, this architectural shift is becoming less optional.

It’s becoming foundational to how modern platform engineering organizations operate at scale.

FAQ

What is an infrastructure control plane?

An infrastructure control plane is a centralized operational system that continuously manages provisioning, governance, policy enforcement, drift detection, and infrastructure lifecycle workflows across cloud environments.

How is a control plane different from Infrastructure as Code?

Infrastructure as Code defines desired infrastructure state. A control plane continuously observes, governs, validates, and operationalizes infrastructure after deployment.

Why is infrastructure drift a major problem?

Drift creates inconsistencies between declared infrastructure and actual runtime environments, increasing security risk, operational instability, audit failures, and troubleshooting complexity.

What role does platform engineering play in infrastructure governance?

Platform engineering teams create standardized workflows, templates, guardrails, and self-service systems that allow developers to provision infrastructure safely and consistently.

How do control planes improve developer self-service?

Control planes provide reusable templates, embedded governance, and policy enforcement that allow developers to self-service infrastructure without introducing operational risk.

What are “golden paths” in platform engineering?

Golden paths are standardized workflows, templates, and operational patterns that simplify software delivery while enforcing security, governance, and operational best practices.

Why do Internal Developer Portals need governance?

Without governance, self-service platforms can increase infrastructure sprawl, security gaps, and operational inconsistency by exposing powerful infrastructure workflows without guardrails.

How does Harness support infrastructure control planes?

Harness combines Infrastructure as Code Management (IaCM), Internal Developer Portals (IDP), CI/CD, governance, security scanning, and drift detection into a unified software delivery platform.

Conclusion

Cloud infrastructure has evolved far beyond static provisioning workflows, making infrastructure deployment alone insufficient for maintaining governance, operational consistency, security, and reliability at scale. Modern platform engineering teams require systems that continuously observe infrastructure state, enforce policies, validate configurations, detect drift, and operationalize governance throughout the entire infrastructure lifecycle rather than only during deployment events. This shift is driving the emergence of infrastructure control planes as a foundational operating model for modern platform teams. By embedding governance, automation, visibility, and self-service capabilities directly into infrastructure workflows, organizations can improve developer autonomy while maintaining centralized operational control. Solutions such as Harness Infrastructure as Code Management and Internal Developer Portal capabilities are designed to help platform teams operationalize continuous governance, proactive drift detection, and scalable self-service infrastructure delivery across increasingly complex cloud environments.

Announcing OPA Policy Evaluation on Your Own Infrastructure

Harness solves the firewall dilemma for OPA. Shift-left governance-as-code while keeping API tokens and internal systems secure within your local perimeter.

Let's face it: "move fast and break things" is a great way to end up sitting in a war room at 3:00 AM. Engineer burnout is at record highs; we don’t need sloppiness to make it worse.

Look. Here’s the reality: thanks to AI code generation tools, we are writing more code than ever before. Delivering that with pipelines built for human-speed development? That’s become the chokepoint. Everything in delivery needs to get faster and better. That includes governance.

We’ve long used Open Policy Agent (OPA) to embed automated governance directly into delivery pipelines and stop teams from cutting corners. OPA brings Policy as Code to Harness, and by default, policies are evaluated on our secure cloud infrastructure. But for large, highly regulated enterprises, corporate firewalls and strict data residency rules present a classic dilemma:

What happens when a policy needs to access data that resides within a corporate firewall? How do we run these policies so that they connect to internal systems securely and access that data within the corporate trust boundary?

We’re tackling that challenge now. New to Harness is the ability to evaluate OPA Policies on Local Infrastructure.

The Architectural Hurdle: Firewalls & Local Secrets

Platform and security engineering teams love OPA because it allows them to gate pipelines based on real-time business logic. For example, you may want to implement a waiver or exceptions workflow that grants a one-time exception to a specific policy, and you may want to record that the waiver was issued in a ticketing system like ServiceNow.

However, executing this evaluation in a standard SaaS model breaks down when:

The Target System you are querying is Inbound-Protected: Your internal ticketing system, database schema verifier, or proprietary security scanner lives deep behind your corporate firewall.
Secrets Must Stay Local: To query that internal system, OPA needs an API token, certificate, or password. Sending that credential to an external cloud environment—even one as secure as Harness—is often an immediate veto from Chief Information Security Officers (CISOs).

Historically, teams had to choose between drilling holes in their firewall, duplicating infrastructure, or reverting to manual spreadsheets and agonizing verification meetings.

Enter Local OPA Evaluation on Kubernetes

With this new capability, Harness lets you direct the OPA evaluation engine to run in your own environment (specifically on your local Kubernetes clusters).

How It Works

Instead of pulling your secure internal metrics out to the cloud for policy validation, Harness sends the evaluation intent down to your local cluster. The evaluation triggers locally, pulls secrets natively from your secure environment, queries your private behind-the-firewall tools, and passes a simple, immutable Pass/Fail status back to the Harness pipeline.

This approach delivers the best of both worlds: the ease and scalability of a unified platform control plane, backed by the security of local execution.

See It: Gating Pipelines on Secure Ticket States

Consider a classic enterprise scenario: gating a production deployment based on an internal ticketing system.

If the ticket is approved, the sync proceeds automatically. If the ticket is canceled, pending, or in an unexpected state, the pipeline halts or triggers an automated rollback strategy before any risk is introduced to production. Because the execution stays within your perimeter, your ticketing credentials remain entirely untouched by external systems.
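The shape of such a gate is simple. What matters most is what crosses the boundary: only a pass or fail result. Harness itself evaluates Rego policies, so the Python language here, along with `gate` and `fetch_ticket_state`, is an assumption for illustration. In this sketch, the ticketing lookup is a hypothetical stub that you would connect to your internal system. Anything other than an approved ticket fails closed, including an unreachable ticketing system.

```python
from typing import Callable

TicketLookup = Callable[[str, str], str]


def fetch_ticket_state(ticket_id: str, token: str) -> str:
    """Hypothetical stub: replace with a call to your internal ticketing API."""
    raise NotImplementedError("connect this to the ticketing system behind your firewall")


def gate(ticket_id: str, token: str, lookup: TicketLookup = fetch_ticket_state) -> str:
    """Return only "pass" or "fail"; the token and ticket details stay inside this process."""
    try:
        state = lookup(ticket_id, token)
    except Exception:
        # Fail closed: an unreachable or misconfigured ticketing system must not let a deploy through.
        return "fail"
    return "pass" if state.strip().lower() == "approved" else "fail"
```

Inside the cluster, the token would come from a local secret, such as an environment variable populated from a Kubernetes Secret. Only the returned string would be reported back to the pipeline.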

Check out this quick demo video to see exactly how to configure your Kubernetes cluster to handle OPA evaluations locally:

Use Cases

Use Case 1 - Allowing for OPA waivers/exceptions

A common pattern among our customers was the need for an “exceptions” or “waiver” workflow, in which teams could waive a failed OPA policy in specific scenarios. Consider the following example:

You have a pipeline that has an OPA policy mandating that there’s >95% test coverage before a deployment is done
A hotfix comes in at the last minute that fails the 95% test coverage
Given the urgency of the situation, you want to bypass the OPA policy

In situations like this, teams want a mechanism to grant a waiver that lets the pipeline run this one time because of special circumstances. They also want to record that the waiver was issued in a third-party ticketing system such as Jira or ServiceNow. With Local OPA evaluation, you can now write policies that query the internal ticketing system, as shown above.
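Scoping the waiver to a single execution is what keeps it a one-time exception. Below is a minimal Python sketch of that decision. It assumes a hypothetical `Waiver` record that the policy has already fetched from the ticketing system; in Harness, the real check would be written in Rego. The 95% threshold mirrors the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Waiver:
    """Hypothetical waiver record as it might be stored in an internal ticketing system."""
    policy: str
    execution_id: str
    approved: bool


def coverage_gate(coverage_pct: float, execution_id: str, waivers: list[Waiver],
                  threshold: float = 95.0, policy: str = "min-test-coverage") -> bool:
    """Pass if coverage exceeds the threshold, or if an approved waiver covers this exact run."""
    if coverage_pct > threshold:
        return True
    # A waiver only applies to the execution it names, so it cannot silently carry over.
    return any(w.approved and w.policy == policy and w.execution_id == execution_id
               for w in waivers)
```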

Use Case 2 - Using OPA to check for Pipeline Tampering

Another common authorization workflow we saw was customers trying to ensure that their pipeline YAML had not been tampered with. Customers often want to ensure that the pipeline they authored and stored in Harness SaaS is exactly the one that runs at deployment time, and that no third party modifies the pipeline YAML before it runs. The approach we saw customers take was the following:

They would author a pipeline in Harness SaaS
They would take the pipeline YAML and take a hash of the pipeline
They would store the hash of the pipeline within their internal database/system
At the time of the pipeline actually running, they compare the hash of the “correct pipeline” with the hash of the pipeline being run to check for equivalence

These steps ensure that nobody has tampered with the pipeline YAML before it runs. However, a Rego policy that performs the hash equivalence check (step 4) must call the internal database where the hash of the correct pipeline lives. That again requires the Rego policy to read credentials and connect to a third-party system. Once more, the way to solve this problem is to let customers run these OPA policies on their own Kubernetes clusters.
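The hash itself is the easy part. Below is a minimal Python sketch of steps 2 and 4, assuming that the approved digest has already been retrieved from your internal system. Before hashing, it normalizes line endings and trailing whitespace, which is a design choice; hashing the raw bytes would be stricter. The comparison runs in constant time.

```python
import hashlib
import hmac


def pipeline_digest(yaml_text: str) -> str:
    """SHA-256 of the pipeline YAML after normalizing line endings and trailing whitespace."""
    lines = yaml_text.replace("\r\n", "\n").split("\n")
    # Leading whitespace is kept because indentation is meaningful in YAML.
    normalized = "\n".join(line.rstrip() for line in lines).strip("\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_untampered(runtime_yaml: str, approved_digest: str) -> bool:
    """Compare the digest of the YAML about to run with the digest recorded at authoring time."""
    return hmac.compare_digest(pipeline_digest(runtime_yaml), approved_digest)
```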

Use Case 3 - Very Large or Sensitive Payloads

Finally, some customers use our custom policy step action to perform an authorization check midway through a pipeline. In several of these situations, the data they want the OPA policy to check is sensitive, so they don't want the payload sent to the OPA service running in Harness SaaS. Instead, they want it sent to the Rego policy running in their own infrastructure.

Zero Friction, Maximum Compliance

So, what does this mean for your daily operations?

The beauty of local OPA evaluation is that your developers won't notice a single change in their daily workflow. They continue to leverage the fastest builds and automated continuous delivery pipelines they love.

Meanwhile, Platform Leaders gain a comprehensive, immutable audit trail of every single evaluation, ensuring painless compliance reviews without hampering developer velocity.

Ready to eliminate toolchain chaos and secure your deployment guardrails? Get started with Harness Continuous Delivery & GitOps today.

Mainframe DevOps: Modern CI/CD for Big Iron

Drive developer productivity by replacing brittle, legacy mainframe scripts with declarative, secure, and fully automated multi-tier release pipelines.

June 10, 2026

For Platform Engineering teams, the goal has always been clear: build a secure, scalable internal developer platform that reduces cognitive load and accelerates time-to-market. Yet, a massive obstacle often remains hidden in plain sight: the mainframe.

While your distributed teams are shipping cloud-native microservices multiple times a day, your core backend mainframe applications frequently remain locked in an isolated silo, lagging behind on slow monthly or quarterly cadences.

The reality of modern enterprise software is deeply interconnected. A single customer-facing feature might require an update to a mobile front-end running in the cloud, an API layer, and a core COBOL application running on a mainframe. When these components are fractured across disconnected deployment tools, it creates an operational nightmare for platform teams.

It is time to eliminate the legacy boundaries. Here is how you can bring mainframe applications out of isolation and orchestrate them alongside your distributed, cloud-native stack using a single, unified developer platform.

One Strategic CI/CD Platform

Maintaining separate toolchains (modern CI/CD platforms for the cloud, legacy script-heavy workflows for the mainframe) forces platform teams to absorb massive technical debt.

Eliminate Toolchain Chaos: Operating disparate point solutions for different hosting tiers compounds your team's maintenance overhead and integration toil.
Consolidate Visibility and Insights: Fragmented tools create a complete blind spot. Without a single pane of glass, it is nearly impossible for platform leads to pull accurate, process-agnostic DORA metrics across the entire enterprise portfolio.
Mitigate Release Coordination Risk: When complex applications have mainframe backends and distributed front-ends, cross-tier releases quickly turn into a chaotic mess of manual spreadsheets, endless sync meetings, and high change failure rates.

By pulling mainframe applications into the same automated platform that governs your cloud environments, you deliver a consistent developer experience, enforce centralized standards, and significantly reduce total cost of ownership (TCO).

With advances in mainframe build-and-deploy tooling, orchestration is easier than ever.

See Mainframe CI/CD in Action

Want to see how easy it is to replace manual compilation and deployment routines with an elegant, visual pipeline template? Watch this brief demonstration highlighting the end-to-end integration between modern orchestration, IBM DBB, and Wazi Deploy:

Modern Mainframe Pipelines: Declarative, Automated, and Secure

Bringing modern CI/CD to the mainframe doesn't require a risky architectural rewrite; it requires wrapping your "Big Iron" infrastructure in a modern, pipeline-driven automation layer. Harness seamlessly integrates with your existing IBM ecosystem and your broader DevSecOps toolchain to make mainframe delivery as repeatable and secure as any cloud deployment.

1. Automated, Smart Builds with IBM DBB

Instead of relying on tribal knowledge or manual build scripts, your platform can natively trigger utilities like IBM Dependency Based Build (DBB). Your centralized continuous integration pipeline orchestrates the workflow, while DBB analyzes code changes and manages dependencies to compile only what is necessary directly on z/OS.
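Building only what is necessary relies on impact analysis over a dependency graph. When a copybook changes, every program that includes it must be rebuilt, whether the inclusion is direct or goes through another copybook. DBB performs its own dependency scanning. The Python sketch below is not DBB's API; it only illustrates the technique, and the member names in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def impacted_programs(changed: set[str], uses: dict[str, set[str]],
                      programs: set[str]) -> set[str]:
    """Return the programs to rebuild after the `changed` members are modified.

    `uses` maps each member (program or copybook) to the members it includes directly;
    `programs` lists which members are buildable programs rather than copybooks.
    """
    # Invert the include graph so we can walk from a changed member to whatever includes it.
    used_by: dict[str, set[str]] = defaultdict(set)
    for member, deps in uses.items():
        for dep in deps:
            used_by[dep].add(member)

    # Breadth-first search over reverse dependencies collects everything transitively affected.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, set()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected & programs
```

For example, take `uses = {"PAYROLL": {"EMPREC", "TAXCALC"}, "REPORTS": {"EMPREC"}, "TAXCALC": {"RATES"}}` with programs `{"PAYROLL", "REPORTS"}`. A change to `RATES` rebuilds only `PAYROLL`, while a change to `EMPREC` rebuilds both.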

2. Shift-Left Security Gates

Incorporate policy-as-code and automated security scanning tools directly into the mainframe lifecycle. By embedding static analysis or open-source vulnerability scans straight into the pipeline, you can flag risks early and prevent security issues from escaping into production without adding developer friction.

3. Standardized Deployments with Wazi Deploy

When binaries are ready to move through your testing and production environments, the platform handles the deployment mechanics by executing IBM Wazi Deploy. This replaces highly customized, brittle deployment scripts with a structured, declarative configuration that updates application components natively on z/OS.

Taming Complex, Multi-Service Releases

The biggest win for a Platform Engineering Lead is solving the "pipeline of pipelines" dilemma. When a synchronized product release requires coordinating dependencies across separate teams, technologies, and cadences, you need a powerful orchestration engine.

Harness moves beyond isolated, single-service pipelines to provide Enterprise Release Orchestration. This gives your platform team a visual, unified calendar and workflow engine to cleanly sequence dependencies across both distributed and mainframe pipelines.

Every action is governed by granular, environment-aware role-based access control (RBAC), built-in approval workflows (such as Jira or ServiceNow integrations), and a comprehensive, immutable audit trail. If a deployment fails at any tier, the platform provides immediate visibility into the root cause, protecting system uptime and shielding your organization from compliance risks.
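At its heart, sequencing a pipeline of pipelines is a dependency-ordering problem. The sketch below illustrates the idea only; it is not how Harness Release Orchestration is implemented. It uses Python's standard `graphlib` module to group pipelines into waves, where each wave depends only on earlier ones. The pipeline names are hypothetical.

```python
from graphlib import TopologicalSorter


def release_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group pipelines into waves: each pipeline runs after all of its dependencies,
    and pipelines in the same wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Consider a release where the API layer needs both a schema migration and the mainframe COBOL deployment, and the mobile front end needs the API layer. This produces three waves: the schema and COBOL pipelines first, then the API, then the mobile release.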

Shai-Hulud Miasma: Inside the Compromise of Red Hat’s Packages

An in-depth look at the Miasma supply chain attack that compromised Red Hat npm packages. Learn how the malware spread, stole credentials, abused trusted publishing, and the steps teams can take to mitigate risk.

The Shai-Hulud lineage has a new face. On June 1, 2026, security teams independently flagged a fresh supply chain compromise inside the @redhat-cloud-services npm namespace: 32 packages and 96 versions were republished with a credential-stealing worm.

These aren't typosquats. They are the official packages in a trusted scope, pulling somewhere between 80,000 and 117,000 downloads per week on average. This article walks through how one compromised maintainer account turned Red Hat's own CI/CD pipeline into a malware channel, what is actually new under the hood versus earlier Shai-Hulud waves, and how to clean it up without tripping the worm's self-destruct.

Preface

Open-source ecosystems run on trust, and most of that trust is now automated. A modern build pulls hundreds of transitive dependencies, publishes through CI/CD with nobody watching, and checks provenance to prove an artifact came from where it claims. Provenance can tell you where a package was built, but it can't tell you whether the build environment was clean.

“Miasma is what happens when an attacker stops trying to fake that trust signal and just earns it from inside a pipeline that already has it.”

Introduction

Miasma is a multi-stage dropper. It runs during npm installation, scans the machine and any reachable cloud for credentials, then republishes itself through every package the stolen tokens can reach. It's a direct descendant of the Mini Shai-Hulud worm. What changed is the packaging: the wrapping, the staging, and the disguise. Where Shai-Hulud used Dune references, Miasma switches to Greek mythology, hence naming things "spartan" and labeling its exfiltration repos "Miasma: The Spreading Blight."

Here's what actually separates this wave from earlier Shai-Hulud activity:

Every infection gets its own encryption.
- Instead of copying itself byte for byte, the malware generates a uniquely encrypted payload per infection. So a hash-based IOC is only good for one package version, which quietly breaks both version tracking and signature detection.
It goes after cloud identities, not just secrets.
- New collectors enumerate every GCP and Azure identity the infected machine can reach. Earlier variants mostly grabbed the static keys.
It abuses trusted publishing from a hijacked account.
- Rather than steal a long-lived npm token, the attacker pushed commits that requested short-lived OIDC tokens inside GitHub Actions and published with valid SLSA provenance. It's the same trick we saw in the TanStack and Bitwarden compromises.

Timeline

Date	Target	What happened
Apr 22, 2026	`@bitwarden/cli`	Compromised via a poisoned GitHub Actions workflow. Payload self-identifies as "Shai-Hulud: The Third Coming."
Apr 29, 2026	4 SAP npm packages	Compromised using an npm token leaked through a malicious CircleCI pull-request build.
May 12, 2026	160+ packages (incl. Mistral, TanStack)	Mini Shai-Hulud hits in one wave; malicious releases carry valid provenance.
May 12, 2026	Shai-Hulud source code	TeamPCP publishes the full source to GitHub and posts on BreachForums urging others to run their own campaigns.
May 29, 2026	First Miasma marker	First repo carrying the Miasma: The Spreading Blight description appears, likely the actor testing the variant.
Jun 1, 2026	30+ `@redhat-cloud-services` packages	Red Hat Cloud Services npm packages compromised ("Miasma").

Deep Dive Into the Miasma Compromise

This wasn't a stolen token push. It happened inside Red Hat's own release infrastructure. A Red Hat employee's GitHub account was taken over and used to commit straight into internal repositories, skipping the code review step entirely. Here's how it played out:

Step 1: The Big Picture

Red Hat publishes free software building blocks (called "packages") that thousands of other developers download and use in their own apps. An attacker found a way to poison those building blocks so that anyone who downloaded them would get secretly hacked. This kind of attack is called a "supply chain attack." Instead of breaking into your house, they forge the lock before it ever reaches the store.

Step 2: How they got in

The attacker didn't steal a password or a key. They hijacked a Red Hat employee's GitHub account and quietly slipped their own code into Red Hat's project. Normally, any code change gets reviewed by another human first. But they used a sneaky trick, orphan commits (commits with no parent that sit outside the normal branch history), to get the changes past that review so that nobody saw them go in.

Step 3: The clever part about "trust"

The industry recently moved to a system where, instead of using permanent passwords to publish software, the publishing system hands out temporary, single-use permission slips (short-lived tokens).

The idea was "no permanent password to steal means safer."

But the attacker had taken over the machine that creates those permission slips. So every poisoned package came stamped with a legitimate "this was built by a trusted system" seal of approval, which was technically true and completely useless, because the trusted system itself was compromised.

Step 4: The trap springs instantly

They rigged the poisoned packages so the malicious code runs the instant you install them, before you can read the code and before anything looks wrong. One giant red flag researchers pointed out: one of the infected packages was supposed to contain only text definitions (no programs at all), yet it was set up to run a program on install. That's like a sealed envelope that somehow starts ticking.
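That red flag can be checked mechanically before anything runs. Below is a minimal sketch, assuming you can read the `package.json` of the package you are about to install. It lists the install-time hooks npm runs automatically for a dependency; the "definitions-only" heuristic (a package that declares `types` but nothing executable, or lives under `@types/`) and the HIGH/REVIEW labels are our own illustration, and `@acme/defs` is a made-up package.

```python
import json

# npm runs these lifecycle scripts automatically when a package is installed as a dependency.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(manifest: dict) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json."""
    scripts = manifest.get("scripts") or {}
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}


def looks_definitions_only(manifest: dict) -> bool:
    """Heuristic: the package ships definitions and declares nothing to execute."""
    if manifest.get("name", "").startswith("@types/"):
        return True
    has_types = bool(manifest.get("types") or manifest.get("typings"))
    has_runtime = any(key in manifest for key in ("main", "bin", "exports"))
    return has_types and not has_runtime


def audit(manifest_text: str) -> list:
    """Flag install hooks; a hook in a definitions-only package is the 'ticking envelope' case."""
    manifest = json.loads(manifest_text)
    severity = "HIGH" if looks_definitions_only(manifest) else "REVIEW"
    return [
        f"{severity}: {manifest.get('name')} runs {hook}: {cmd}"
        for hook, cmd in install_hooks(manifest).items()
    ]


# Hypothetical package: declares only type definitions, yet runs a script on install.
print(audit('{"name": "@acme/defs", "types": "index.d.ts", "scripts": {"preinstall": "node setup.js"}}'))
# prints ['HIGH: @acme/defs runs preinstall: node setup.js']
```

A REVIEW finding is not proof of malice, since packages with native code legitimately build on install; the point is that a hook in a package with nothing to run deserves a human look.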

Step 5: Hiding the malware

The actual malicious code was buried under layers of disguise: it was scrambled, encrypted, and rebuilt from lists of numbers, specifically to fool automated security scanners. It also quietly downloads its own tools if your machine doesn't already have them, so that it works on almost any computer.
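One cheap signal for the "rebuilt from lists of numbers" trick is a long run of small integers that decodes to readable text. The sketch below is a heuristic of our own, not how any particular scanner works; the default 32-number threshold is an arbitrary assumption, and a real payload may encrypt the decoded bytes too, in which case nothing printable comes out and this check stays silent.

```python
import re
from typing import List, Optional


def numeric_runs(source: str, min_len: int = 32) -> List[List[int]]:
    """Find runs of at least `min_len` comma-separated integers (0-999)."""
    pattern = re.compile(r"\b\d{1,3}(?:\s*,\s*\d{1,3}){%d,}\b" % (min_len - 1))
    return [[int(n) for n in m.group(0).split(",")] for m in pattern.finditer(source)]


def decode_preview(values: List[int]) -> Optional[str]:
    """Decode a run as character codes if every value is printable ASCII or whitespace."""
    if all(32 <= v < 127 or v in (9, 10, 13) for v in values):
        return "".join(chr(v) for v in values)
    return None


def hidden_strings(source: str, min_len: int = 32) -> List[str]:
    """Return readable text hidden in numeric arrays inside a JavaScript file."""
    found = []
    for values in numeric_runs(source, min_len):
        text = decode_preview(values)
        if text is not None:
            found.append(text)
    return found


# Toy sample: a string smuggled as a list of character codes.
sample = "const k = [" + ",".join(str(ord(c)) for c in "require('child_process')") + "];"
print(hidden_strings(sample, min_len=16))  # prints ["require('child_process')"]
```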

Step 6: Stealing credentials

Once running, it grabs everything it can. It reads environment variables, host details, and local credential files, pulls GitHub CLI tokens with `gh auth token`, and scans the filesystem for secrets that match known patterns. It doesn't stop at files on disk. If it has a valid identity, it queries cloud metadata services, reads from AWS Secrets Manager and SSM Parameter Store, pulls Azure Key Vault and GCP Secret Manager values, and lists Kubernetes and Vault secrets. On CI runners it can even read secrets out of the runner's memory, which gets around log masking because the secret is never written to a log.
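Since the payload pattern-matches for secrets, it is worth running the same kind of sweep on your own runners and workstations first, so you know what an infection could have taken. This minimal sketch uses a few publicly documented token formats (GitHub's `ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_` and `github_pat_` prefixes, npm's `npm_` tokens, and AWS access key IDs). It is our own defensive starting list, not the payload's pattern set, and dedicated secret scanners cover far more formats.

```python
import re
from pathlib import Path
from typing import Iterator, List, Tuple

# A few publicly documented credential formats. Illustrative, not exhaustive.
TOKEN_PATTERNS = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36,}"),
    "github_oauth_or_app_token": re.compile(r"\bgh[ousr]_[A-Za-z0-9]{36,}"),
    "github_fine_grained_pat": re.compile(r"\bgithub_pat_[A-Za-z0-9_]{22,}"),
    "npm_token": re.compile(r"\bnpm_[A-Za-z0-9]{36,}"),
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
}


def scan_text(text: str) -> List[Tuple[str, str]]:
    """Return (kind, redacted_value) for every match, keeping only an 8-char prefix."""
    hits = []
    for kind, pattern in TOKEN_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group(0)[:8] + "..."))
    return hits


def scan_tree(root: str, max_bytes: int = 1_000_000) -> Iterator[Tuple[str, str, str]]:
    """Walk a directory and yield (path, kind, redacted_value) for small readable files."""
    for path in Path(root).rglob("*"):
        try:
            if not path.is_file() or path.stat().st_size > max_bytes:
                continue
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable files, broken symlinks, permission errors
        for kind, redacted in scan_text(text):
            yield str(path), kind, redacted
```

Redacting to a short prefix keeps the report itself from becoming one more place where live secrets are stored.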

Representative token patterns searched by the payload include:

Step 7: Hiding the exfiltration as Anthropic traffic

To smuggle the stolen secrets out without setting off alarms, the malware sent data to a web address that looks normal. The full address is hxxps[:]//api[.]anthropic[.]com/v1/api, which is on a real Anthropic host. A plain GET to it returns Anthropic's normal 404 not_found_error, so /v1/api isn't a real route, and Anthropic's systems were not compromised. The point is camouflage: the domain looks harmless in network logs, and the path looks like an API call. It's also awkward to block, since lots of companies legitimately call Anthropic.
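Because the exfiltration host is legitimate, blocking the domain outright is rarely an option, but the path is a usable indicator. A minimal sketch for sweeping proxy or egress logs, assuming a hypothetical whitespace-separated format of `<timestamp> <client> <method> <url>`; adapt the parsing to whatever your proxy actually writes. The `ai_clients` allowlist is also an assumption: the set of hosts you expect to call AI APIs at all.

```python
from typing import AbstractSet, Iterable, Iterator, Tuple
from urllib.parse import urlsplit

IOC_HOST = "api.anthropic.com"
IOC_PATH = "/v1/api"  # not a real Anthropic route: a plain GET returns 404 not_found_error


def flag_exfil(
    lines: Iterable[str], ai_clients: AbstractSet[str] = frozenset()
) -> Iterator[Tuple[str, str, str]]:
    """Yield (client, url, reason) for requests that match the Miasma exfil path
    or that reach the AI API from a host not expected to use it."""
    for line in lines:
        parts = line.split()
        if len(parts) < 4:
            continue  # skip malformed lines
        _timestamp, client, _method, url = parts[:4]
        parsed = urlsplit(url)
        if parsed.hostname != IOC_HOST:
            continue
        if parsed.path.rstrip("/") == IOC_PATH:
            yield client, url, "matches Miasma exfil path"
        elif client not in ai_clients:
            yield client, url, "AI API call from unexpected host"
```

For CI runners that never call AI APIs, the allowlist can simply exclude them, so any such request surfaces regardless of path.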

The malware reuses the same "GitHub dead-drop" trick from earlier Shai-Hulud versions. If it finds a working GitHub token, it uses it to create a public repo on the victim's account and saves stolen data there as JSON files (under a `results/` folder, named with a timestamp and counter). The repo gets a random name in the form adjective-noun-number, and its description is set to a fixed string:

“Miasma: The Spreading Blight”

When the payload includes a stolen token in a commit message, it uses the threat marker:

IfYouInvalidateThisTokenItWillNukeTheComputerOfTheOwner
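These dead-drop conventions give you something concrete to search for. A minimal sketch that checks repository metadata, assuming you have already fetched each repo as a dict with `name`, `description`, and `visibility` fields (GitHub's REST API repository objects carry these fields). The name regex is loose and will also match some legitimate repos, so treat those hits as leads rather than verdicts; `quiet-river-4821` in the test data is a made-up example.

```python
import re
from typing import Dict, List, Tuple

MIASMA_DESCRIPTION = "Miasma: The Spreading Blight"
THREAT_MARKER = "IfYouInvalidateThisTokenItWillNukeTheComputerOfTheOwner"
DEAD_DROP_NAME = re.compile(r"^[a-z]+-[a-z]+-\d+$")  # adjective-noun-number


def suspicious_repos(repos: List[Dict]) -> List[Tuple[str, str]]:
    """Return (repo_name, reason) for repos that look like Miasma dead-drops."""
    flagged = []
    for repo in repos:
        description = (repo.get("description") or "").strip()
        if description == MIASMA_DESCRIPTION:
            flagged.append((repo["name"], "known Miasma description"))
        elif repo.get("visibility") == "public" and DEAD_DROP_NAME.match(repo["name"]):
            flagged.append((repo["name"], "adjective-noun-number name; check for results/*.json"))
    return flagged


def has_threat_marker(commit_message: str) -> bool:
    """A hit means a stolen token may be in the message: isolate before revoking it."""
    return THREAT_MARKER in commit_message
```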

Step 8: Spreading itself

Why it spreads like a worm: this is the nastiest part. When the malware finds credentials that can publish software, it infects those packages too and republishes them, so the infection jumps from victim to victim automatically, much like a real worm or virus. Researchers found it in over 200 infected projects. It also buries multiple hidden backup copies of itself around GitHub, so even if you clean up one copy, it is designed to crawl back.

Affected Packages

The names of some of the affected packages are:

@redhat-cloud-services/vulnerabilities-client
@redhat-cloud-services/tsc-transform-imports
@redhat-cloud-services/topological-inventory-client
@redhat-cloud-services/sources-client
@redhat-cloud-services/rule-components
@redhat-cloud-services/remediations-client
@redhat-cloud-services/rbac-client

Mitigation

The single biggest takeaway for a normal developer: be suspicious when installing a package triggers programs to run, especially a package that has no business running anything. That `preinstall` behavior was the whole foundation of the attack.

Because of the dead-man switch, sequence is the whole game. Work through these in order:

1. Isolate first: take infected machines and CI runners offline, save logs, and then remove the malware's persistence.
2. Then rotate the keys.
3. Remove bad packages: uninstall any affected `@redhat-cloud-services` version, reinstall a clean one, and regenerate the lockfiles.
4. Block install scripts: run `npm ci --ignore-scripts` in CI so the `preinstall` hook can't run.
5. Fix the pipeline: don't allow workflows that run on any branch to hold `id-token: write`, and review every commit.
6. Search for traces: look for new repos with the description "Miasma: The Spreading Blight" and for odd patch versions you didn't publish.
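The pipeline fix can be partly automated. A minimal sketch that inspects a GitHub Actions workflow after it has been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and flags `id-token: write` on workflows that a push to any branch can trigger. It only looks at `push` triggers and ignores other events and reusable workflows, so it complements commit review rather than replacing it. PyYAML follows YAML 1.1, where a bare `on:` key is parsed as the boolean `True`, so the code checks both keys.

```python
from typing import Dict, List


def triggers(workflow: Dict) -> Dict:
    """Normalize the `on:` section into {event: config}."""
    # PyYAML (YAML 1.1) turns a bare `on:` key into the boolean True.
    on = workflow.get("on", workflow.get(True))
    if isinstance(on, str):
        return {on: None}
    if isinstance(on, list):
        return {event: None for event in on}
    return on or {}


def grants_id_token(permissions) -> bool:
    if permissions == "write-all":
        return True
    return isinstance(permissions, dict) and permissions.get("id-token") == "write"


def any_branch_push(workflow: Dict) -> bool:
    """True if a push to any branch can start this workflow."""
    on = triggers(workflow)
    if "push" not in on:
        return False
    config = on["push"] or {}
    if "branches" in config:
        branches = config["branches"]
        if isinstance(branches, str):
            branches = [branches]
        return "**" in branches
    # With only a `tags` filter, branch pushes do not trigger the workflow.
    return "tags" not in config


def flag_workflow(name: str, workflow: Dict) -> List[str]:
    if not any_branch_push(workflow):
        return []
    findings = []
    if grants_id_token(workflow.get("permissions")):
        findings.append(f"{name}: workflow-level id-token: write on any-branch push")
    for job_id, job in (workflow.get("jobs") or {}).items():
        if grants_id_token((job or {}).get("permissions")):
            findings.append(f"{name}: job '{job_id}' can mint OIDC tokens on any-branch push")
    return findings
```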

How Harness Supply Chain Security Helps

Harness SCS helps you quickly detect and contain compromised dependencies like the `@redhat-cloud-services` packages before they impact your pipelines. With real-time visibility into your SBOMs and dependency graph, you can identify affected versions, trace their usage across builds and environments, and block them using OPA policies. This keeps malicious packages from propagating through your CI/CD or AI workflows.

Detect Compromised Packages

Harness SCS enables instant search across all repositories and artifacts to quickly identify if compromised package versions exist in your environment. The moment such a malicious package is disclosed, you can pinpoint its presence and assess impact across your entire supply chain in seconds.
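Whatever tool runs it, the underlying check is a lookup over resolved dependencies. As an illustration of the idea (not of how Harness SCS implements it), here is a minimal sketch that reads an npm `package-lock.json` in the lockfileVersion 2/3 format, whose `packages` map is keyed by install path, and lists every package in the affected scope. This post does not list the compromised version numbers, so `bad_versions` is left for you to fill from the published advisories; `1.2.3` in the test data is a placeholder.

```python
import json
from typing import Dict, List, Optional, Set, Tuple

MARKER = "node_modules/"


def find_scope(
    lock_text: str,
    scope: str = "@redhat-cloud-services/",
    bad_versions: Optional[Dict[str, Set[str]]] = None,
) -> List[Tuple[str, str, str, bool]]:
    """Return (install_path, name, version, known_bad) for each package in `scope`."""
    lock = json.loads(lock_text)
    bad_versions = bad_versions or {}
    hits = []
    for path, meta in lock.get("packages", {}).items():
        if MARKER not in path:
            continue  # the "" key describes the root project itself
        name = path.rsplit(MARKER, 1)[1]  # innermost package for nested installs
        if name.startswith(scope):
            version = meta.get("version", "")
            hits.append((path, name, version, version in bad_versions.get(name, set())))
    return hits
```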

Block Compromised Packages

Harness AI streamlines response to incidents like the `@redhat-cloud-services` compromise through simple natural-language prompts. With a single prompt, you can generate OPA policies that block affected versions of, for example, `@redhat-cloud-services` packages across all pipelines, preventing malicious packages from entering builds or deployments. As new compromised versions emerge, these policies can be quickly updated to maintain strong preventive controls across your SDLC.
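The generated policy itself would be Rego, but the decision it encodes is simple enough to show in Python. A minimal sketch of that deny logic over a simplified SBOM component list (`name` and `version` only); the glob-style rules and the choice to block the whole scope with `*` until versions are confirmed are our assumptions, not Harness output.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Package-name glob -> version globs. "*" blocks every version of matching packages.
DENY: Dict[str, List[str]] = {
    "@redhat-cloud-services/*": ["*"],  # hypothetical rule: block the scope until versions are confirmed
}


def violations(components: List[Dict[str, str]], deny: Dict[str, List[str]] = DENY) -> List[str]:
    """Mirror of an OPA-style `deny` set: one message per blocked component."""
    messages = []
    for comp in components:
        for name_glob, version_globs in deny.items():
            if fnmatchcase(comp["name"], name_glob) and any(
                fnmatchcase(comp["version"], v) for v in version_globs
            ):
                messages.append(f"deny: {comp['name']}@{comp['version']} matches {name_glob}")
                break  # one message per component is enough
    return messages


def gate(components: List[Dict[str, str]]) -> bool:
    """The build may proceed only when the deny set is empty."""
    return not violations(components)
```

Once advisories pin the affected versions, the `["*"]` entry can be narrowed to those exact versions so clean releases flow again.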

Harness SCS automatically detects compromised versions across both production and non-production environments. Teams can track remediation, assign fixes and monitor progress through to deployment, ensuring exposed credentials and vulnerable dependencies are addressed quickly. This end-to-end visibility helps contain the impact and prevents compromised packages from persisting in your supply chain.

Next Steps In The Face Of Supply Chain Attacks

The Mini Shai-Hulud worm highlights how quickly a malicious package can expose high-value secrets once it is embedded deep within registries and CI runners. Because compromised packages sit in the dependency trees of many projects, the impact extends beyond code to API keys, prompt data, and downstream systems, often bypassing traditional security checks.

Defending against such attacks requires more than reactive fixes. Teams need real-time visibility into dependencies, the ability to enforce policies to block compromised versions and continuous tracking to ensure remediation is complete across all environments. Harness SCS enables teams to quickly identify where affected package versions are used, prevent them from entering new builds and ensure fixes are consistently rolled out.

With these controls in place, organizations can limit credential exposure, contain threats early, and secure their supply chain against attacks like the `@redhat-cloud-services` compromise.

Get Ship Done: Everything We Shipped in May 2026

See 60+ Harness updates from May 2026 across measuring AI ROI, AI-native development, software delivery, and security.

AI coding tools promise faster development. What they don't show you is the queue forming at the pipeline, the security scanner you bypassed to stay fast, or the cost dashboard with a line now labeled "unknown" that is steadily growing. In May, we shipped 60+ features in 31 days across the entire delivery system: not just the editor, but everything downstream of it.

May Highlights

Measuring AI investment with two extremely relevant capabilities: AI spend finally has a home in your cost dashboard, and AI adoption finally has a metric. Cloud and AI Cost Management now tracks AI infrastructure as a first-class spend category alongside traditional cloud costs. AI DLC Insights now correlates AI assistant adoption against the productivity outcomes it is supposed to drive. Read the announcement.
Harness landed in the Claude Connectors Directory, giving Claude users direct access to pipelines, builds, deployments, security scans, and approvals from inside the Claude interface. Read the announcement.


AI-Native Development: MCP at Pace

Software Delivery Intelligence, Now Inside Claude (Code and Desktop)

The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.

The MCP Server in May: From Early Access to Production-Ready

Our MCP Server is evolving fast! Seven releases across 31 days. The month started with control and safety work: configurable autonomy levels, per-session trust boundaries, human-in-the-loop execution waits, six CVEs patched, and guardrails around destructive operations. It ended with expanded reach: IaCM workspaces, full DBSchema CRUD for database operations, Ansible support, and GPT app readiness with structured output and tool annotations. If you are building agentic pipelines on top of Harness, or want your AI coding assistant to drive deployments, infrastructure changes, and database schemas without leaving the IDE, this is the server to connect to. Read the docs.

Skills Library

A curated library of skills distills common prompt patterns from internal usage into structured instruction files. The library includes security-specific skills and is packaged for use with the MCP Server, Claude Code, Cursor, and GitHub Copilot. The model follows the skill; the engineer describes what they want. Read the docs.

Google Code Wiki and Deepwiki Integration

The Harness MCP server is now indexed by Google Code Wiki and Deepwiki (Cognition/Windsurf). Devin and Windsurf users can analyze the MCP server architecture and ask questions about it directly. The Code Wiki updates automatically from commits.

Know What Your AI Is Doing and Keep AI Secure

AI APIs, MCP tools, and models are now first-class assets in the platform, not afterthoughts in a traditional API inventory.

Sensitive Data Detection in AI Prompts and Responses

Open any discovered AI API from the AI Assets inventory and see what sensitive data is being processed in prompts and model responses. Exposure trends, data locations, and classifications are surfaced inline. This identifies high-risk AI APIs based on actual runtime behavior, not how they are configured. Learn more.


Service, MCP Server, and Environment on Issue Details

Issue Details now surfaces exactly where an issue is occurring: which service, which MCP server, and which environment, without leaving the side sheet. Previously, pinpointing issue context required navigating across views.

Span Attributes for Live Traffic Policy Scoping

Live traffic policies now evaluate only spans that match specific attributes, such as HTTP status codes. Detections are contextual rather than applied universally to all traffic. The evidence in each detection shows which spans actually triggered it. Docs

UI for Span-Attribute-Based API Exclusion Rules

Define API exclusion rules based on span attributes directly in the UI. Select status codes or specific headers to exclude APIs from discovery, giving precise control over what appears in the API inventory.

Entity Derivation for Bot and Abuse Protection

Extract, transform, and standardize application-specific attributes from API traffic and use them in Bot and Abuse Protection policies. Previously, detection rules were limited to predefined attributes. Custom entities derived from traffic patterns can now feed directly into policy evaluation. Docs

Rule Evaluation Point Support in Exclusion Policies

Configurable rule evaluation behavior for exclusion rules enables exclusions to be applied based on your deployment model, whether through a tracing agent or Traceable Edge. Docs

Granular RBAC and Environment-Level Scoping

Environment-level scoping now covers APIs, policies, configurations, and security insights consistently across the platform. Access is restricted to authorized environments, and policy management is environment-aware. Docs

Security in the Pipeline

Keyless Artifact Signing

Sign and verify artifacts without managing long-lived cryptographic keys. Identity-based authentication replaces key management, eliminating the rotation burden that makes key-based signing operationally painful at scale. Docs

License Family Classification for SBOM

SBOM components are now automatically grouped by license family. Teams get a portfolio-level view of open-source license risk without reviewing individual component licenses one by one. Docs.

Typosquatting and Malicious Package Detection

Two new risk signals are now checked during OSS dependency scanning: packages named to look like popular libraries (typosquatting) and known malicious packages. Both join the existing supply chain risk checks. Docs
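A common way to catch typosquats is to compare a new package name against a list of popular names and flag anything within a small edit distance. The sketch below uses plain Levenshtein distance with a threshold of 1; Harness doesn't document its exact method here, and the five-name `POPULAR` set is purely illustrative.

```python
from typing import List, Set


def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


POPULAR = {"react", "lodash", "express", "axios", "chalk"}  # illustrative, not a real top-N list


def typosquat_candidates(name: str, popular: Set[str] = POPULAR, max_distance: int = 1) -> List[str]:
    """Popular packages this name is suspiciously close to (an exact match is not a squat)."""
    if name in popular:
        return []
    return sorted(p for p in popular if levenshtein(name, p) <= max_distance)


print(typosquat_candidates("expres"))  # prints ['express']
```

Plain Levenshtein counts a swapped pair of letters (`axois` for `axios`) as two edits; a Damerau-Levenshtein variant, which counts adjacent transpositions as one, catches that class too.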

Faster, More Reliable Builds

Flaky Test Detection (Beta)

Test Intelligence now identifies tests that pass and fail intermittently even when no code change explains the difference. Flaky tests can be quarantined, removing them from pipeline gate decisions while tracking their instability over time. Previously, flaky tests failed pipelines with no actionable root cause. Read the docs.
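The core signal is simple to state: the same test, on the same commit, both passed and failed. A minimal sketch of that rule plus quarantine, assuming you have run history as `(test, commit, passed)` tuples; Harness's actual detection may weigh more signals, such as retries and frequency over time, and the test names in the test data are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def flaky_tests(runs: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Flag tests that both passed and failed on at least one commit."""
    outcomes: Dict[Tuple[str, str], Set[bool]] = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(passed)
    return sorted({test for (test, _commit), seen in outcomes.items() if len(seen) == 2})


def gating_failures(failed: Iterable[str], quarantined: Iterable[str]) -> List[str]:
    """Failures that should still block the pipeline once quarantined tests are set aside."""
    return sorted(set(failed) - set(quarantined))
```

A test that fails only on a different commit is not flagged, because a code change could explain that failure.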

Docker Connector Support for Custom Build Images

Bring Your Own Image (BYOI) workflows in Harness Cloud now support Docker connectors pointing to private registries. Teams with custom build container images hosted in private registries can use them for Harness Cloud builds without pushing to a public registry first. Release notes

Network Egress Restrictions in UI

Configure egress allow lists for Harness Cloud Linux and Windows build VMs directly from the Harness UI. This previously required manual configuration outside the product.

Test Splitting Accuracy

Test Intelligence now uses historical average durations for more balanced test parallelism. The split_tests binary previously required timing data in a specific format; it now also supports average-based timing, making accurate splitting available to more test suites.
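Splitting by historical averages is a bin-packing problem. A common greedy approach, sketched below, is longest-processing-time-first: sort tests slowest first and always hand the next one to the least-loaded worker. This illustrates the technique and is not the algorithm inside `split_tests`; `split_by_average` and the test names are hypothetical.

```python
import heapq
from typing import Dict, List


def split_by_average(avg_seconds: Dict[str, float], workers: int) -> List[List[str]]:
    """Longest-processing-time-first: place each test, slowest first,
    on the worker with the smallest running total."""
    heap = [(0.0, i) for i in range(workers)]  # (total seconds, worker index)
    buckets: List[List[str]] = [[] for _ in range(workers)]
    # Sort by duration descending, then by name so the split is deterministic.
    for test, secs in sorted(avg_seconds.items(), key=lambda kv: (-kv[1], kv[0])):
        total, i = heapq.heappop(heap)
        buckets[i].append(test)
        heapq.heappush(heap, (total + secs, i))
    return buckets


print(split_by_average({"a": 30, "b": 20, "c": 20, "d": 10, "e": 10}, 2))
# prints [['a', 'd', 'e'], ['b', 'c']]
```

Tests with no history need a fallback duration (for example, the median of known tests) before they can be placed.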

Connector validation tasks and SCM tasks for proxy-enabled connectors are now routed through Harness Cloud delegates, ensuring both validation and source code operations work correctly for PrivateLink setups. These are behind feature flags.

Deploy More Safely

OIDC Delegate Selectors for AWS

Pass delegate selector information as AWS session tags in OIDC tokens. IAM policies can now restrict which Harness delegates execute which tasks, providing environment-level secret isolation without relying on environment naming conventions. Works across connector validation, deployment stages, and custom stages. Release notes
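On the AWS side, session tags become principal tags for the session, so IAM policies can match them with the `aws:PrincipalTag/<key>` condition key. A minimal sketch that builds such a policy, assuming a hypothetical tag key `harness-delegate` (check the release notes for the key Harness actually sends) and example ARNs. For the tags to be accepted at all, the role's trust policy must also allow `sts:TagSession`.

```python
import json


def delegate_scoped_policy(tag_key: str, delegate: str, secret_arn_prefix: str) -> dict:
    """Allow reading secrets under a prefix only for sessions tagged with this delegate."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OnlyThisDelegateReadsTheseSecrets",
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": f"{secret_arn_prefix}*",
                "Condition": {"StringEquals": {f"aws:PrincipalTag/{tag_key}": delegate}},
            }
        ],
    }


policy = delegate_scoped_policy(
    "harness-delegate",  # hypothetical tag key
    "prod-delegate",  # hypothetical delegate selector value
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/",
)
print(json.dumps(policy, indent=2))
```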

Dry Run Validation API

A new API endpoint validates pipeline YAML changes before they are committed to Git. Runs schema validation, template expansion, and OPA policy evaluation without executing the pipeline. Useful for pre-commit checks in IDEs or CI gates on pipeline repositories.

Artifact Registry

Soft Delete for Packages

Deleting a package or version now moves it to a recoverable state rather than removing it immediately. Teams that accidentally delete an artifact a running deployment still depends on can recover it before anything breaks. Permanent deletion is available from the same dialog when that is the intent.

Swift and Raw Package Support

Two new formats are now supported. Swift packages work with full SwiftPM compatibility: authenticate, publish, and resolve dependencies using the registry URL with no changes to existing workflows. Raw artifact storage handles arbitrary files by path: binaries, archives, reports, configuration files, anything that does not belong to a package manager ecosystem.

Dependency Firewall: Exemptions and Notifications

The Dependency Firewall now supports exemptions and policy action notifications. Whitelist trusted dependencies that should bypass firewall rules, and configure alerts that fire when the firewall blocks or flags a package. Teams get granular control over what gets blocked without having to audit the firewall log manually to know when it acted.
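The interaction between block rules, exemptions, and notifications fits in a few lines. This sketch assumes exemptions win over block rules and that a rule is either `name@version` or a bare `name` covering every version; Harness's actual rule model isn't described here, and `evil-pkg` and `legacy-lib` are made-up names.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Firewall:
    blocked: Set[str]  # "name@version" or bare "name" for every version
    exemptions: Set[str] = field(default_factory=set)
    notify: Callable[[str], None] = print  # swap in a chat or email sender

    def _matches(self, rules: Set[str], name: str, version: str) -> bool:
        return f"{name}@{version}" in rules or name in rules

    def allow(self, name: str, version: str) -> bool:
        if self._matches(self.exemptions, name, version):
            return True  # an explicit exemption bypasses block rules
        if self._matches(self.blocked, name, version):
            self.notify(f"blocked {name}@{version}")
            return False
        return True
```

Because the notification fires inside `allow`, every block produces an alert without anyone having to read the firewall log.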

Audit Dashboard for Package Uploads and Downloads

A new dashboard records every package upload and download across all registries with full attribution: who performed the action, when, and on which package and version. Provisioned automatically for accounts with Artifact Registry enabled. Useful for compliance reviews, security investigations, and understanding artifact consumption patterns across teams. Release notes

Database DevOps Updates

Harness Code Repositories as a schema source

Harness Code Repositories can now be used as a source during DB Schema configuration and execution workflows.

Tagging Behavior

Enhanced tagging for database changesets improves consistency and traceability during migration workflows. Release notes here.

Purchase Credits API reliability

Database operations in the Purchase Credits API are now atomic, with enhanced logging for overage details during credit resets.

Know What Your AI Costs

Software Engineering Insights is now AI DLC Insights (Development Lifecycle Insights), and Cloud Cost Management is now Cloud and AI Cost Management. Both renames reflect an expanded scope for the existing products: AI is now a first-class dimension in both, not a filter you apply after the fact. Read the announcement

Cost Explorer with AI/ML Workload Visibility

Cloud and AI Cost Management's Cost Explorer now surfaces AI/ML spending alongside traditional cloud costs in a unified view. As teams add GPU instances, inference endpoints, and model API spend, that usage now appears in the same dashboards as the rest of the cloud bill. Docs

Data Job Status

Real-time visibility into the cloud cost data pipeline. When billing data from AWS, Azure, or GCP is delayed, failed, or stale, the Data Job Status page now shows the actual state. Previously, stale billing data produced incorrect recommendations and anomaly alerts with no indication that the underlying data had a problem. Docs

Cost Settings for Recommendations

A rebuilt, tabbed configuration experience for AWS and Azure recommendation cost preferences. AWS supports Passthrough Cost for both uniform and mixed account configurations, with per-account cost-type visibility. Azure adds selectable options for Amortized and List Price views of recommendation costs. Release notes

Engineering Metrics That Reflect Actual Human Work

AI Summaries and Insights Dashboard Enhancements

AI DLC Insights dashboards now surface AI-generated summaries alongside DORA metrics, productivity data, and workflow visualizations. The goal is to reduce the gap between "here is the chart" and "here is what to do about it." Docs

PR Cycle Time Excludes Bot-Generated Review Comments

The Productivity Insights dashboard now strips bot-generated review comments from PR Cycle Time calculations. Cycle time now reflects human reviewer activity only, which is the number that matters for understanding team throughput. Release notes
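One piece of that calculation, time to first human review, shows the effect clearly. A minimal sketch, assuming comments arrive as dicts with `author` and an ISO-8601 `created_at` that includes an offset such as `+00:00` (Python before 3.11 can't parse a trailing `Z`). It treats any login ending in `[bot]`, GitHub's suffix for app accounts, as a bot; that is a simplification, since some bots run under ordinary user accounts.

```python
from datetime import datetime
from typing import Dict, List, Optional


def first_human_review_hours(opened_at: str, comments: List[Dict[str, str]]) -> Optional[float]:
    """Hours from PR open to the first review comment not written by a bot."""
    opened = datetime.fromisoformat(opened_at)
    human_times = [
        datetime.fromisoformat(c["created_at"])
        for c in comments
        if not c["author"].endswith("[bot]")  # GitHub App accounts end in "[bot]"
    ]
    if not human_times:
        return None
    return (min(human_times) - opened).total_seconds() / 3600
```

Without the filter, a bot comment two minutes after opening would make review look nearly instant.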

Custom Date Range on Dashboards

All dashboards on the Insights page now support a custom date range beyond the default presets. Analyze metrics over any time window, which is useful for quarterly reviews, post-incident periods, and year-over-year comparisons. Docs

Enable or Disable Developer Filtering for Lead Time for Changes

Control whether Lead Time for Changes honors developer filters at the team level from Team Settings. Gives engineering teams more precision in how DORA metrics are calculated and attributed across distributed or shared-team structures. Docs

ServiceNow Integration

ServiceNow is now a data source for engineering insights. Ingest, normalize, and analyze ITSM data directly within dashboards. DORA metrics can be calculated from ServiceNow incident and change management records for teams where ServiceNow is the system of record. Docs

qTest Integration

Test management data from qTest Cloud now flows into AI DLC Insights via API key authentication. Docs

Feature Flag Governance

FME Policy as Code: Environments and Segments

The OPA-based policy framework for Feature Management now covers environments, segments, and segment definitions. Teams can enforce consistent governance standards across the full FME configuration surface, not just flag-level rules. Release notes

Your Software Catalog, Smarter

Catalog Roundup: Modeling, Connections, and Surface Area

A set of enhancements expands what the developer portal catalog can model, connect, and display. The changes are incremental, but together they close gaps that platform teams have been routing around.

Integrations Overview on Entity Pages

The entity details page now includes a dedicated card showing key integration data directly on the overview. Platform engineers and developers can see the health and status of an entity's connected integrations at a glance rather than navigating to a separate integrations view. Docs

GitHub Integration: Secondary Entity Kinds

When configuring GitHub integration, you can now select secondary entity kinds to map discovered repository entities to. The data from those kinds surfaces directly on the entity details page, giving platform teams more flexibility in how GitHub content is represented in the catalog. Docs

AI Asset Instructions Tab

Entity pages for AI Assets now include a dedicated Instructions tab that renders the associated documentation file from GitHub directly within the portal. Teams discover and read AI asset documentation without leaving the catalog. Docs

Blueprints at Organization and Project Levels

Environment Blueprints can now be created and managed at the Organization and Project scope levels, in addition to the Account level. The blueprint listing page shows the scope for each blueprint, and managed roles have been updated with the appropriate permissions at each scope.

Resilience Testing

Kubernetes Load Testing

Load tests can now run against Kubernetes infrastructure. Previously load tests required Linux infra, meaning chaos testing and load testing needed different tooling and separate infrastructure even when targeting the same cluster. Resilience testing is now fully Kubernetes-aware end to end. Docs

Chaos Enhancements

A set of improvements landed across the chaos platform this month: filtering support for chaos experiment lists in the REST API, step name editing in Chaos Studio, NOT_EQUAL_TO operator for ChaosGuard namespace label selectors, tag-based filters on the DR Tests screen, probe chain logic, DR Test ACL permissions and audit events, user-based filters in the Experiments API, support for output variables in chaos resources, and the Chaos NG experience reaching general availability. Release notes

AI Test Automation

Playwright Execution Service (Beta)

Harness AI Test Automation now runs native Playwright test suites directly on the platform. Your playwright.config, spec files, and package.json scripts work as-is: connect your repo, point to your project root, and run. No grids to configure, no browser images to maintain, no infrastructure to scale. Tests run in the cloud with parallel workers out of the box.

When tests fail, Harness automatically classifies the failure as regression, flaky, performance, or environment issue, so engineers spend time fixing problems instead of determining whether a problem is real. Playwright runs are first-class pipeline steps: results live in the Tests tab alongside build and deploy stages, and tests block deployments by default. Existing Playwright investments stay intact; scripts can evolve into AI-generated intent-based tests gradually when teams are ready.

Available now in beta. Release notes | Docs | Blog

AI SRE

CEL Expression Engine

Common Expression Language is now the full expression engine for AI SRE runbook conditions. Write dynamic conditions using regex matching, datetime formatting, list comprehensions, and math anywhere logic is evaluated or data is transformed. Docs

Google Chat Integration

Teams using Google Workspace can now run incident response from Google Chat: dedicated incident spaces, bidirectional message mirroring between the AI SRE UI and Google Chat, automatic responder adds, and real-time incident timeline sync. Built on Pub/Sub for reliable message delivery. One-time admin setup per organization. Docs

Platform-Level Updates

Service Account Token Notifications

Configure alerts for service account token events: creation, rotation, updates, expiration, deletion, and upcoming expiration. Delivered across notification channels already configured in your account. Expiring service account tokens are a common cause of silent pipeline failures; this makes them visible before they cause an outage. Docs

Platform Alerts

An in-app notification framework now surfaces important account-level events automatically within the Harness UI: approaching resource limits, system release announcements, and other account-wide signals. No external configuration required. Docs

In Closing...

The teams compounding fastest on AI are the ones where the whole system accelerated, not just the part that writes code. May brought 60+ feature releases, a Skills Library that makes any AI coding assistant fluent in Harness, artifact registries that know what they are serving and to whom, and the first dashboards that connect AI spend to AI output. The bottleneck keeps moving. We help you unblock the bottleneck in your software delivery.

See you in June.


Software Delivery Context, Now Inside Claude

Harness is now available in the Claude Connectors Directory, giving teams real-time AI access to pipelines, deployments, approvals, and software delivery context.

Key Takeaway: The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.


AI agents are only as good as the context they operate in. That's not a design philosophy. It's a practical constraint. An AI agent that doesn't understand how the underlying software delivery entities relate to each other, or what the data actually means, will get things wrong. In software delivery, wrong looks like a botched deployment, a misread failure, or an approval granted when it shouldn't have been, which directly affects your users.

Today, we're announcing that the Harness MCP Server is in the official Claude Connectors Directory, making Harness discoverable and connectable for every team using Claude. But the announcement isn't really about the directory listing. It's about what Harness + Claude can actually do in your delivery system.

What You Can Do with Claude and Harness

Claude can work across the full Harness delivery platform:

Capability	What Claude can do
Pipeline execution	Trigger and monitor builds across GitHub, GitLab, Bitbucket, or Harness Code
Deployment management	Promote services across environments with approval gate verification
Failure diagnosis	Pull structured execution context and surface root cause analysis
Approval workflows	Retrieve pending approvals and take governed delivery actions
Environment state	Query what's deployed where, in real time
Security posture	Review SBOMs, vulnerability scan results, and SSCA compliance status
Resilience testing	Initiate chaos experiments and retrieve structured results
Cost signals	Surface cloud cost anomalies tied to deployment activity


All of it is grounded in the Knowledge Graph: not raw API responses, but a structured model of your delivery system that Claude can reason over precisely.

The Problem With Giving AI Agents Raw API Access

MCP lets AI models call external tools by reading API descriptions and deciding which to invoke. That flexibility is useful. But when you're building an agent that needs to reason across an entire software delivery lifecycle (CI, CD, security scans, approvals, feature flags, cost signals, and environments), raw API access creates a deep reliability problem.

Consider a question a platform engineering lead might ask:

"Show me the pipelines with the highest failure rate over the last 30 days, and for each one, tell me which services they deploy and whether any of those services have open critical vulnerabilities."

That question spans four domains: pipeline execution history, service-to-pipeline relationships, environment state, and security scan results. An agent working off raw APIs has to discover which APIs exist across each domain, call them in the right order, paginate correctly, infer how field names correspond across systems, and synthesize the results without misinterpreting nested objects or guessing at relationships.

The result is 5+ sequential LLM calls, hundreds of thousands of input tokens, high latency, and an agent that had to guess at every join. Guessing is where hallucinations happen.

What the Harness + Claude Integration Changes

The Harness Software Delivery Knowledge Graph is a purpose-built model of everything that happens after code is written: builds, test runs, deployments, approvals, security scans, environment states, feature flags, infrastructure changes, cost signals, and rollbacks. Not as raw data but as a connected, typed, semantically annotated graph of entities and relationships.

Every field in the graph carries metadata that tells an agent exactly how to use it: whether a value is a number or a string, whether it can be aggregated or only filtered, what its unit is, and how it joins to related entities. Cross-module relationships (between a pipeline and the services it deploys, between a deployment and the security scan results for that artifact, between an environment change and the cost anomaly that followed) are explicitly declared, not inferred.
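To make "metadata on every field" concrete, here is a minimal sketch of what field-level annotations buy an agent: a query can be checked against declared types and join targets before it runs. The entity and field names below are hypothetical and are not the Harness Knowledge Graph schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: str  # "string", "number", or "timestamp"
    aggregatable: bool  # may appear in SUM/AVG, not only in filters
    unit: Optional[str] = None
    joins_to: Optional[str] = None  # declared relationship, e.g. "Service.id"


# Hypothetical schema for one entity type.
PIPELINE_EXECUTION = [
    FieldSpec("pipeline_id", "string", False, joins_to="Pipeline.id"),
    FieldSpec("service_id", "string", False, joins_to="Service.id"),
    FieldSpec("status", "string", False),
    FieldSpec("duration", "number", True, unit="seconds"),
]


def check_query(fields: List[FieldSpec], group_by: List[str], aggregate: str) -> List[str]:
    """Reject unknown fields and aggregations over fields not declared aggregatable."""
    by_name = {f.name: f for f in fields}
    errors = [f"unknown field {n}" for n in [*group_by, aggregate] if n not in by_name]
    if aggregate in by_name and not by_name[aggregate].aggregatable:
        errors.append(f"{aggregate} cannot be aggregated")
    return errors
```

An agent that hallucinates `pipelineId` gets a precise error instead of an empty or silently wrong result.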

This is the difference between an agent that can access your delivery system and one that understands it.

When Claude connects to Harness via MCP, it doesn't receive a set of API endpoints. It's getting access to a structured model of your entire delivery organization, one where the relationships are known, the data types are enforced, and the agent can construct precise queries rather than guessing at field semantics.

The practical effect with Harness + Claude: that same cross-domain question above becomes 2–3 structured queries against a known schema. The agent selects the right entity types from the graph, generates queries with exact fields and declared relationships, and returns a deterministic answer. No guesswork. No hallucinated field names. No silent wrong answers.

What This Looks Like in Practice

Debugging a failed pipeline without context switching

A build has failed. Normally, you'd open the Harness UI, navigate to the execution, copy the relevant logs, paste them into a conversation, and wait for analysis. The AI reasons over whatever you managed to capture.

With the Harness MCP connection active in Claude, you ask what failed. Claude doesn't just pull logs; it queries the Knowledge Graph to understand the structure of that pipeline, which stage failed, what services were involved, whether similar failures have occurred before, and what changed since the last successful run. The answer it surfaces reflects the full delivery context, not just the stack trace you happened to copy.

Promoting a deployment through governed gates

Your team is ready to move a service from staging to production. Claude checks the current environment state, verifies that required approval gates have been satisfied, confirms the security scan passed for the artifact version you're promoting, and initiates the deployment — with every action running through your existing RBAC policies and logged for audit.

The agent isn't guessing about whether conditions are met. It's querying a graph where those conditions are modeled as typed relationships with known states. The answer is deterministic because the data is structured to make it so.

This Is Not AI Without Guardrails

The natural question when Claude can trigger pipelines and manage deployments: what stops it from doing something it shouldn't?

The same controls that govern everything else in Harness. Every action taken through the MCP server runs through your existing RBAC permissions, OPA policy enforcement, approval gates, and audit logging. Claude operates with exactly the permissions you have, nothing more. Every action is tracked. Nothing bypasses the governance layer.

The Knowledge Graph reinforces this: because Harness AI understands your delivery system structurally, it also understands the constraints within it. Approval gates aren't just optional steps the agent might skip; they're modeled as typed relationships with state. The agent can't promote past a gate that hasn't cleared because the graph reflects that clearly.
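Here is a minimal Python sketch of the idea that gates are typed state an agent checks, not steps it might skip. All class and function names are hypothetical. In Harness itself, enforcement comes from the platform's RBAC, OPA policies, and approval gates, not from client-side code like this.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class GateState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Gate:
    name: str
    state: GateState


@dataclass
class Promotion:
    service: str
    artifact_version: str
    target_env: str
    gates: List[Gate] = field(default_factory=list)


def blocking_gates(promotion: Promotion) -> List[str]:
    """Names of gates that have not cleared; any non-approved state blocks."""
    return [g.name for g in promotion.gates if g.state is not GateState.APPROVED]


def promote(promotion: Promotion, deploy: Callable[[Promotion], str]) -> str:
    """Call deploy() only when every modeled gate is approved."""
    blockers = blocking_gates(promotion)
    if blockers:
        raise PermissionError(
            f"cannot promote {promotion.service}@{promotion.artifact_version} "
            f"to {promotion.target_env}: blocked by {', '.join(blockers)}"
        )
    return deploy(promotion)
```

The agent's answer to "can this go to production?" is a lookup over gate states, so it comes out the same every time.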

Speed and governance aren't a tradeoff. They coexist by design.

Why the Claude Connectors Directory Matters

The Claude Connectors Directory is a curated, reviewed set of integrations. Anthropic evaluates each server before listing it. Being approved is a signal of trust that carries weight for enterprise teams deciding which AI integrations to enable.

It also means discoverability at scale: engineering teams using Claude for DevOps workflows can find Harness directly in the directory. One-click OAuth connection, no API key management, no manual configuration.

This fits a broader pattern. The Google Cloud partnership brought Harness into Google's AI ecosystem through Vertex AI and Gemini CLI. The Cursor plugin brought it into the IDE. The Claude Connectors Directory brings it into conversational AI. In each case, the goal is the same: wherever developers are doing their best thinking and wherever AI is being asked to help with software delivery, Harness should be present with the right context for that AI to act reliably.

Getting Started

If you're already a Harness customer:

Open Claude and then the Connectors page
Search for Harness in the MCP directory
Authenticate with OAuth, no API keys, no manual configuration
Start asking Claude about your pipelines, deployments, and delivery workflows

If you're new to Harness, sign up for free and connect from day one. Detailed steps are listed in the documentation.

The Harness Connector gives Claude the ability to act in your delivery system. The Knowledge Graph gives it the understanding to act well. Together, that's what reliable AI in software delivery actually looks like.


BigQuery CI/CD and Database DevOps with Harness

Automate BigQuery schema deployments with Harness using secure OIDC authentication and CI/CD pipelines.

Modern data platforms are evolving rapidly, and Google Cloud BigQuery has become a core part of analytics, AI, and large-scale reporting architectures. Teams (including Harness) rely on BigQuery to process and analyze massive datasets, but managing schema changes in a secure, repeatable way can still be challenging.

Today, we’re excited to announce BigQuery support for Harness Database DevOps, enabling teams to bring the same automation, governance, and reliability they expect from application DevOps to their BigQuery deployments.

With this release, organizations can now manage BigQuery schema changes using pipeline-driven Database DevOps workflows directly within Harness, while also leveraging secure OIDC-based authentication for keyless access.

The Challenge: Managing BigQuery Changes at Scale

BigQuery helps organizations move fast with data, but database change management often remains manual and fragmented.

Common challenges include:

Manual schema deployments that slow down releases
Limited visibility into schema changes across environments
Inconsistent promotion workflows between development, staging, and production
Managing long-lived service account keys
Difficulty enforcing governance and approvals

Without a standardized deployment process, teams struggle to balance speed, reliability, and security.

Bringing Database DevOps to BigQuery

Harness Database DevOps now supports BigQuery as a first-class database platform, allowing teams to manage schema changes through automated, pipeline-driven workflows.

This means BigQuery schema changes can now be treated just like application code: versioned, tested, approved, and promoted through environments using Harness pipelines.

With BigQuery support, teams can:

Automate schema deployments using Harness pipelines
Version control database changes alongside application code
Promote changes consistently across environments
Enforce approvals and governance policies before production releases
Track and audit deployments with full visibility
Eliminate static credentials using OIDC authentication

The result is a modern Database DevOps workflow for BigQuery that helps teams release faster without sacrificing security or reliability.

Key Capabilities

Native BigQuery Integration

Harness Database DevOps can now connect directly to BigQuery environments using a BigQuery JDBC connector powered by the Simba BigQuery JDBC driver.

Example JDBC URL:

jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=YOUR_PROJECT_ID;DefaultDataset=YOUR_DATASET;Location=YOUR_REGION;

OAuth access tokens are injected automatically during authentication, removing the need for manual credential management.
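If you template connector settings per environment, a small helper can keep the URL consistent. The sketch below is hypothetical Python. The function name and validation rules are ours, not part of Harness or the Simba driver. It reproduces the example URL format above and leaves out authentication properties, since tokens are injected at runtime.

```python
def bigquery_jdbc_url(
    project_id: str,
    dataset: str,
    location: str,
    endpoint: str = "https://www.googleapis.com/bigquery/v2",
    port: int = 443,
) -> str:
    """Assemble a BigQuery JDBC URL in the example format (hypothetical helper).

    No OAuth properties are added here, because access tokens are
    injected automatically during authentication.
    """
    properties = {
        "ProjectId": project_id,
        "DefaultDataset": dataset,
        "Location": location,
    }
    for key, value in properties.items():
        # ';' and '=' delimit properties, so they cannot appear in values
        if not value or ";" in value or "=" in value:
            raise ValueError(f"invalid value for {key}: {value!r}")
    suffix = "".join(f"{key}={value};" for key, value in properties.items())
    return f"jdbc:bigquery://{endpoint}:{port};{suffix}"
```

For example, `bigquery_jdbc_url("my-project", "analytics", "US")` yields the same shape as the example URL, with the placeholders filled in.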

Secure OIDC-Based Authentication

Harness supports OIDC authentication using GCP Workload Identity Federation, allowing teams to securely authenticate to BigQuery without storing long-lived service account keys.

During pipeline execution:

Harness generates a short-lived OIDC token
GCP Security Token Service (STS) exchanges the OIDC token for a short-lived federated token
Temporary credentials are generated dynamically
Harness securely authenticates to BigQuery at runtime

This improves:

Security posture
Compliance readiness
Credential management
Operational reliability

No static JSON keys are stored in Harness or delegate environments.
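The exchange step can be sketched in Python. The request below follows the RFC 8693 token-exchange parameters that Google documents for Workload Identity Federation. The sketch only builds the request and does not send it. The project number, pool ID, and provider ID are placeholders, and the function name is hypothetical. This is not Harness's internal implementation, and it omits the optional service-account impersonation step that many setups add afterward.

```python
from urllib.parse import urlencode
from urllib.request import Request

STS_TOKEN_URL = "https://sts.googleapis.com/v1/token"


def build_sts_exchange_request(
    project_number: str, pool_id: str, provider_id: str, oidc_token: str
) -> Request:
    """Build (but do not send) an STS token-exchange request for an OIDC token."""
    # Full resource name of the workload identity pool provider
    audience = (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )
    form = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": "https://www.googleapis.com/auth/cloud-platform",
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token": oidc_token,  # short-lived token issued for this run
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    }
    return Request(
        STS_TOKEN_URL,
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending this request with `urllib.request.urlopen` should return JSON containing a short-lived `access_token`. No stored JSON key is involved at any point.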

Automated Database Change Pipelines

Use Harness pipelines to automate BigQuery schema deployments with repeatable workflows across environments.

Teams can:

Trigger deployments from Git changes
Standardize promotion workflows
Validate changes before production releases
Automate schema delivery using CI/CD

Governance and Control

Leverage Harness approval gates, RBAC, and policy enforcement to ensure safe production changes. This helps organizations introduce governance into analytics database deployments without slowing down delivery velocity.

Deployment Visibility and Auditability

Track every BigQuery deployment with:

Pipeline execution history
Deployment logs
Approval records
Change visibility across environments

This creates a more transparent and auditable deployment process for data teams.

Why This Matters

As organizations increasingly rely on BigQuery to power analytics and AI workloads, database changes require the same level of automation and governance as application deployments.

By bringing BigQuery into Harness Database DevOps, teams can:

Reduce manual deployment risk
Improve collaboration between platform and data teams
Standardize analytics database release processes
Improve security with keyless authentication
Accelerate delivery of data platform changes

Getting Started

BigQuery support for Harness Database DevOps is now available.

To get started:

Configure a BigQuery JDBC connector in Harness
Enable OIDC authentication using GCP Workload Identity Federation
Add BigQuery change scripts to your repository
Create a Harness pipeline to deploy and promote changes
Automate BigQuery releases with confidence
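To illustrate the "change scripts in your repository" step, here is a minimal Python sketch of one common convention: numbered script files applied in order, with only unapplied versions deployed. The `V<number>__<description>.sql` naming scheme and the `pending_changes` helper are hypothetical. Harness Database DevOps has its own change-tracking model, so check the documentation for the actual format.

```python
import re
from typing import Iterable, List, Set

# Hypothetical naming convention: V<version>__<description>.sql
VERSIONED_SCRIPT = re.compile(r"^V(\d+)__(\w+)\.sql$")


def pending_changes(repo_files: Iterable[str], applied: Set[int]) -> List[str]:
    """Return unapplied change scripts, ordered by numeric version."""
    versioned = []
    for name in repo_files:
        match = VERSIONED_SCRIPT.match(name)
        if match is None:
            continue  # README files, docs, etc. are not change scripts
        versioned.append((int(match.group(1)), name))

    versions = [version for version, _ in versioned]
    if len(versions) != len(set(versions)):
        raise ValueError("two scripts share a version number")

    # Numeric sort keeps V10 after V2 (a plain string sort would not)
    versioned.sort()
    return [name for version, name in versioned if version not in applied]
```

A pipeline step could pass in the versions already recorded for the target environment. Promoting from staging to production would then deploy exactly the scripts production hasn't seen.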

Learn more about setup in our documentation.

Learn More

To learn more about using BigQuery with Harness Database DevOps, check out our documentation or schedule a demo.

Additional Resource - Warehouse Native BigQuery Integration

Feature Flag Tools Compared: 10 Best Platforms for Safer Releases

Compare 10 feature flag tools across rollout controls, experimentation, governance, self-hosting, and observability. Find the best platform for startups, enterprises, and data-driven teams.

Modern feature flag tools have evolved past simple on/off toggles into full experimentation platforms.
The right platform plugs directly into your CI/CD pipeline and observability stack, so experimentation becomes a daily developer practice instead of an off-to-the-side project.
Choosing a feature flag tool ultimately comes down to scale, governance, and how clearly each release ties to the business KPIs your leadership actually cares about.

The 10 Best Feature Flag Tools for 2026

Releasing new software used to be a big deal. You would set aside a Saturday night, wake up the on-call engineer, push the code, and hope that nothing broke before Monday morning.

Then came feature flags, which changed everything without anyone noticing.

Feature flags let you separate deployment from release, so you can send code to production in a dormant state and turn it on for users when you're ready. No more 1 a.m. maintenance windows. We don't have to ship every feature in a release together anymore, or scramble to pull one back with a hotfix. Just code in production, off by default, and ready when you say so.

But the tools have improved a lot. Feature flag tools these days are more than just on/off switches. The best ones have flag management, progressive delivery, real-time release monitoring, A/B testing, and AI-driven guardrail metrics all built right into your CI/CD pipeline. That changes how a release looks, how a rollback feels, and how confident your team is when they ship.

Here's a look at the best feature flag tools available, along with what each one does well and what to look for when picking the right one for your team.

What Feature Flag Tools Really Do

A feature flag, or feature toggle, is a conditional block in your code that controls whether a new feature is active for a given user. Wrap a flag around a checkout page redesign, and you can push the code to production while keeping the new flow hidden from 99% of users. Set it to 1% as a canary, monitor your metrics, and gradually increase the rollout percentage if everything looks good.
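Percentage rollouts like the 1% canary above usually rely on deterministic bucketing. A given user then stays in or out of the rollout as the percentage grows. Here is a minimal Python sketch of that idea. The hashing scheme, the 10,000-bucket granularity, and the function names are assumptions for illustration, not any particular vendor's SDK.

```python
import hashlib

BUCKETS = 10_000  # 0.01% granularity


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enabled(flag_key: str, user_id: str, rollout_percent: float) -> bool:
    """True if this user falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(flag_key, user_id) < rollout_percent * BUCKETS / 100
```

Because a user's bucket never changes, raising the rollout from 1% to 5% only adds users. Everyone who saw the new checkout at 1% still sees it at 5%. Including the flag key in the hash keeps different flags from always landing on the same users.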

Feature flag tools handle the whole lifecycle: creating flags, targeting users, rolling them out incrementally, monitoring their impact, and retiring flags once they've served their purpose.

Modern platforms add a few more layers on top of that:

Progressive delivery. Instead of releasing everything at once, release features to bigger groups of users over time, based on performance metrics.
Experimentation. Use proper sample size calculations and significance testing to run statistically sound A/B tests.
Release monitoring. Find out how feature exposure affects error rates, latency, and business KPIs in real time.
Governance. RBAC, audit trails, and approval workflows for organizations operating in regulated industries.
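The experimentation item mentions sample size calculations and significance testing. For a conversion-rate A/B test, the textbook versions are the two-proportion sample size formula and a two-sided, two-proportion z-test. Here is a minimal Python sketch of both using only the standard library. The function names and the defaults (α = 0.05, 80% power) are illustrative choices. Real platforms often use more sophisticated methods, such as sequential testing.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def sample_size_per_arm(baseline_rate: float, relative_lift: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect a relative lift in a conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must be in (0, 1) and differ")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = _STD_NORMAL.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # all or no conversions in both arms: no evidence of a difference
    z = (p_b - p_a) / se
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))
```

Take a 5% baseline conversion rate and a 10% relative lift: `sample_size_per_arm(0.05, 0.10)` comes out to roughly 31,000 users per variant. That is a useful reality check before running a test on a low-traffic feature.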

The toggle itself isn't worth much. The safety net around it is.

What to Look for in a Feature Flag Tool

Before you start looking at different tools, make sure you know what your team really needs. Some questions you should ask are:

Does it work with the CI/CD pipeline you already have? Your developers will work around a flag platform that is outside of your delivery workflow, not with it.

Can it connect flag exposure to your observability stack? You don't want three dashboards to cross-reference when something breaks at 3 a.m. You want one screen that tells you which feature caused the spike.

Will it scale with your traffic and your team? When you have millions of users, SDK performance, evaluation latency, and offline fallback are all important.

Does it cover governance for regulated environments? In healthcare, fintech, or anything touching PII, capabilities like RBAC, approval workflows, immutable audit trails, and Policy as Code aren't optional.

How does it handle flag lifecycle management? Stale flags are technical debt. The best platforms include ownership assignment, sunset policies, and dashboards that surface flag age and usage frequency.

With those criteria in mind, here are the best tools to consider.

The 10 Best Feature Flag Tools

1. Harness Feature Management & Experimentation (FME)

Harness FME is a developer-first platform that brings feature management, A/B testing, and release monitoring into one unified system. Built on the combined Split and Harness lineage, FME is designed for enterprise teams that want experimentation baked into their CI/CD pipeline, not bolted on as a separate workflow.

What makes FME stand out:

Unified flags and experimentation. Feature management and A/B testing share the same flag, SDK, and data pipeline. No parallel systems to reconcile.
AI-driven release monitoring. Release monitoring automatically connects flag exposure to error rates, latency, and business KPIs. You know which feature broke something right away, not hours later.
Warehouse-native experimentation. Run analysis directly on your Snowflake, BigQuery, or Databricks data, so experiment results live alongside the rest of your business intelligence.
Automated rollback and progressive delivery. If p95 latency climbs 10% for 84 seconds, FME handles the rollback automatically while you sleep.
Enterprise governance. RBAC, SAML federation, immutable audit logs, and approval workflows for regulated industries.

Best for: Enterprise engineering teams that want a single platform for feature flags, experimentation, and release monitoring, with deep CI/CD integration.
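The automated-rollback example in the FME description (p95 latency up 10% for 84 seconds) describes a sustained-threshold guardrail. Here is a minimal Python sketch of that logic. It is not Harness FME's implementation. The function names, the shape of the observations, and the use of the standard library's `statistics.quantiles` for p95 are assumptions.

```python
from statistics import quantiles
from typing import Iterable, List, Optional, Tuple


def p95(latencies_ms: List[float]) -> float:
    """95th percentile: quantiles(n=100) returns 99 cut points; index 94 is p95."""
    return quantiles(latencies_ms, n=100, method="inclusive")[94]


def should_roll_back(
    observations: Iterable[Tuple[float, float]],
    baseline_p95_ms: float,
    max_increase: float = 0.10,
    sustain_seconds: float = 84.0,
) -> bool:
    """observations: time-ordered (timestamp_seconds, p95_ms) pairs for the flag's cohort.

    Returns True once p95 has stayed above baseline * (1 + max_increase)
    continuously for at least sustain_seconds.
    """
    limit = baseline_p95_ms * (1 + max_increase)
    breach_started: Optional[float] = None
    for ts, value in observations:
        if value > limit:
            if breach_started is None:
                breach_started = ts
            if ts - breach_started >= sustain_seconds:
                return True
        else:
            breach_started = None  # any healthy reading resets the clock
    return False
```

Requiring the breach to be sustained, rather than reacting to a single bad reading, keeps one noisy sample from triggering a rollback.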

2. LaunchDarkly

LaunchDarkly is one of the oldest feature flag platforms on the market. It's a popular choice for teams that want a flag-first product with mature SDK support for most major languages.

Its strengths include broad SDK support, good targeting options, and a long history in feature management. Some teams may prefer other vendors for bundled analytics or warehouse-native analysis. Teams that do a lot of A/B testing often pair LaunchDarkly with a separate analytics or stats engine, which adds complexity.

Best for: Teams whose primary need is feature flag management, with separate tooling for testing and observability.

3. Statsig

Statsig has become a popular platform for product-led growth teams, largely because its free tier includes feature flags, experimentation, and product analytics in one place.

The platform's statistical engine is solid, with support for sequential testing and significance testing. With warehouse-native mode, you can run analysis on your own data infrastructure. Statsig is still maturing in enterprise governance: its RBAC and audit features aren't as strong as some regulated industries require.

Best for: Product-led growth teams that want flags, experiments, and analytics in one system without heavy enterprise requirements.

Ownership note: Statsig announced in September 2025 that it would join OpenAI. OpenAI said Statsig would continue operating independently and serving current customers, so buyers may want to watch how the roadmap evolves under new ownership.

4. Optimizely Feature Experimentation

Optimizely's roots are in web-based A/B testing, and it brings that history of experimentation into its feature flag product. The platform's statistical methods are well-established, and marketing teams that have used other Optimizely products are likely to choose it.

The downside is that Optimizely's origins still show in places. The product is stronger for web and front-end use cases and less suited to the deep backend, infrastructure-level flag management that engineering teams often need. Product engineering teams without a marketing focus tend to find more developer-native tools a better fit.

Best for: Marketing-engineering hybrid teams already invested in the Optimizely ecosystem who want to extend it to product feature testing.

5. PostHog

PostHog is an open-source platform that bundles product analytics, feature flags, experimentation, and session replay together. It's a popular pick for early-stage companies that want a lot of capability without paying for multiple platforms.

The all-in-one approach works well at a smaller scale. As you grow, you may find that specialized tools go deeper on individual capabilities, particularly enterprise-level flag management and statistical rigor. The self-hosted option is a meaningful advantage for teams with strict data residency requirements.

Best for: Startups and growth teams that want product analytics and feature flags in one place, with a self-hosting option.

6. Flagsmith

Flagsmith is a feature flag platform that is completely open source and can be hosted in the cloud or on your own server. It's a good choice for teams that need open-source flexibility (or strict self-hosting) but don't want to lose the polished product experience.

The platform covers the basics well: targeting, segmentation, multivariate flags, and SDK support for most languages. It doesn't go as deep as enterprise platforms on advanced experimentation, AI-driven release monitoring, and deeply automated guardrails.

Best for: Teams with privacy requirements, self-hosting mandates, or a strong preference for open-source software.

7. Unleash

Unleash is another open-source option with a strong following in Kubernetes-native shops. It's known for being straightforward to set up, easy to understand, and well-suited to teams that want full control over their tooling.

Like Flagsmith, Unleash handles flag management well but doesn't extend as far into experimentation or release intelligence. If your team primarily needs to safely gate features and host the platform yourself, Unleash is a solid choice.

Best for: Open-source-first teams, especially those running Kubernetes infrastructure.

8. ConfigCat

ConfigCat markets itself as a simple, inexpensive feature flag service with transparent pricing and an easy setup. Many small to medium-sized teams choose it because they want flag management without the overhead of a bigger platform.

The product includes the basics, such as targeting, segmentation, percentage rollouts, and integrations with popular tools. It wasn't built as an experimentation platform, so teams that need statistical analysis will have to pair it with something else.

Best for: Small-to-midsize teams that want lightweight, budget-friendly flag management without enterprise complexity.

9. GrowthBook

GrowthBook is an open-source feature flag platform originally built around warehouse-native experimentation. The premise: your experiment data is already in BigQuery, Snowflake, or Redshift, so it should be analyzed there rather than piped to a separate vendor.

For data teams that have invested heavily in their warehouse, GrowthBook is a strong fit. The statistical methods are rigorous, with Bayesian and frequentist options, sequential testing, and CUPED variance reduction, and the open-source model gives you full control over the platform.

Best for: Data teams that want serious warehouse-native experimentation with open-source control.
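CUPED, mentioned in the GrowthBook description, reduces metric variance. It adjusts each user's outcome using a pre-experiment covariate, often the same metric measured before the test. Here is the textbook adjustment as a minimal Python sketch. It is not GrowthBook's code, and the function name is ours.

```python
from statistics import fmean
from typing import List, Sequence


def cuped_adjust(metric: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Return CUPED-adjusted values: y - theta * (x - mean(x)).

    theta = cov(x, y) / var(x) is the regression slope of the
    experiment metric on the pre-experiment covariate.
    """
    if len(metric) != len(covariate) or len(metric) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = fmean(covariate), fmean(metric)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    if var_x == 0:
        return list(metric)  # a constant covariate carries no information
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]
```

Because the covariate is centered, the adjusted values keep the same mean, so the estimated treatment effect is unchanged in expectation. Variance shrinks in proportion to how strongly the covariate correlates with the metric.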

10. AWS AppConfig

AWS AppConfig is Amazon's native configuration and feature flag service for teams operating entirely within the AWS ecosystem. It integrates cleanly with Lambda, ECS, EKS, and EC2, and runs as a fully managed service under your existing AWS account.

The trade-off is depth. AppConfig treats flags as part of broader application configuration. It isn't a purpose-built platform for experimentation or release intelligence. Teams that need advanced targeting, A/B testing, and release monitoring at the level of a dedicated tool will outgrow it quickly.

Best for: AWS-native teams with modest flag requirements who want to stay within the AWS ecosystem.

How to Pick the Right Feature Flag Tool for Your Team

Once you've narrowed down your list, here are a few things to think about.

Match the tool to your scale. A platform that works for a 10-person startup probably won't work for a business with 500 engineers, and the other way around. Check how well the SDK works when it's under load, how deep the governance is, and how the platform handles thousands of flags across hundreds of services.
Look for pipeline-native integration. If turning on a flag means a developer has to stop what they're doing and switch tools, that flag won't be used as much. The best platforms let you manage flags through GitOps workflows and trigger updates with CLI commands or pipeline steps.
Build in flag hygiene from day one. Old flags are a type of technical debt. Look for dashboards that show each flag's lifecycle, sunset policies, and clear ownership. Amazon requires flag removal tasks to be created at the same time as the flag, a practice worth copying.
Plan for governance before you need it. RBAC, audit trails, approval workflows, and policy-as-code may seem like overkill for a small project, but they are expensive to add later. Set up governance early.
Run a two-week pilot with one team before rolling out company-wide. You can learn more about a platform in two weeks with just one engineering team than you can with a dozen vendor demos. Don't just look at how well it works on its own; make sure it fits with your current tools.
Tie your tool choice to KPIs. You should be able to measure the tool's impact on deployment frequency, change failure rate, time to recover from failures, and (ideally) business outcomes for specific experiments. If you can't connect the tool to those numbers, it's hard to justify the spend.

Stop Guessing and Start Shipping with Confidence

Feature flag tools started as a clever way to ship code that wasn't quite ready without breaking production. They've grown into something much larger: the foundation for safer releases, faster experimentation, and a development culture where shipping doesn't feel like gambling.

The best platforms bring feature flags, progressive delivery, real-time monitoring, and AI-driven guardrails together in one place, integrated with your CI/CD pipeline, so every release becomes a controlled experiment rather than a leap of faith.

Harness Feature Management & Experimentation brings flags, experimentation, and release monitoring into a single enterprise-grade platform, with AI-driven guardrails and deep CI/CD integration built in. Every deployment becomes a measurable, recoverable experiment instead of a gamble.

Feature Flag Tools: Frequently Asked Questions (FAQs)

What's the difference between a feature flag and a feature toggle?

They mean the same thing. "Feature flag" and "feature toggle" are used interchangeably across the industry. Some teams use "toggle" for simple on/off switches and "flag" for more complex multivariate or targeted releases, but most platforms and engineers treat them as the same concept.

Are open-source feature flag tools production-ready?

Flagsmith, Unleash, and GrowthBook are all capable of running in production at scale. The trade-off is usually in advanced experimentation, AI-driven release monitoring, and enterprise governance. If those aren't requirements, open source is a legitimate path. For teams where they are requirements, a managed enterprise platform typically saves more in engineering time than it costs.

Can I use feature flags without a dedicated platform?

Yes. Many early-stage products start with homegrown approaches using config files or environment variables. The cracks show later: targeting becomes hard to manage, there are no audit trails, and stale flags accumulate as silent technical debt. Most teams hit a threshold (usually around 20 to 30 active flags) where a dedicated platform pays for itself in saved engineering time.

How do feature flag tools integrate with CI/CD pipelines?

The best platforms integrate directly with your CI/CD pipeline so flag updates can flow through GitOps workflows, CLI commands, or pipeline steps. That keeps flag changes in the same review and audit flow as code deployments. During an incident, you have one place to look: what changed, when, and who changed it.

Do I need separate tools for A/B testing and feature flags?

You can run them separately, but you'll spend ongoing effort keeping data consistent across two systems. Unified platforms like Harness FME use the same flag, SDK, and exposure pipeline for both flag management and experimentation, which eliminates an operational pain point that most teams don't appreciate until they've lived with the split-system version.

How do you prevent feature flag debt?

Three habits cover most of it:

Assign an owner and an expiration date when you create a flag.
Maintain a flag hygiene dashboard that surfaces age, usage frequency, and removal candidates.
Treat flag removal as a normal engineering task, not an afterthought. File the removal ticket before the flag goes live.
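The three habits above are easy to encode. Here is a minimal Python sketch of a flag registry check. It assumes a hypothetical `FlagRecord` with an owner, an expiry date, and a pre-filed removal ticket. The field names, the 30-day idle threshold, and the ticket IDs are illustrative, not a feature of any specific platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FlagRecord:
    key: str
    owner: str
    created: date
    expires: date
    removal_ticket: str                  # filed before the flag goes live
    last_evaluated: Optional[date] = None


def validate_new_flag(flag: FlagRecord) -> None:
    """Habits 1 and 3: require an owner, an expiry date, and a removal ticket."""
    if not flag.owner:
        raise ValueError(f"{flag.key}: every flag needs an owner")
    if flag.expires <= flag.created:
        raise ValueError(f"{flag.key}: expiry must be after creation")
    if not flag.removal_ticket:
        raise ValueError(f"{flag.key}: file the removal ticket first")


def removal_candidates(flags: Iterable[FlagRecord], today: date,
                       idle_days: int = 30) -> List[FlagRecord]:
    """Habit 2: flags past expiry or not evaluated recently, oldest first."""
    candidates = []
    for flag in flags:
        expired = today >= flag.expires
        idle = (flag.last_evaluated is None
                or (today - flag.last_evaluated).days >= idle_days)
        if expired or idle:
            candidates.append(flag)
    return sorted(candidates, key=lambda f: f.created)
```

Running `validate_new_flag` in the pipeline step that creates a flag, and `removal_candidates` on a schedule, turns flag hygiene from a good intention into a routine check.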

Anthropic’s Mythos, Glasswing, and how the industry must move forward

This is not a security problem. As we’ve settled into the speed of AI, it’s become clear that security isn’t a job solely for the security team. Here’s why.

When Anthropic broke the news of Mythos and Project Glasswing, the security community did what it always does. It published a flurry of papers asking "What does this mean for security?" It's a reasonable instinct, but it's the wrong question.

The real question is: who actually owns the problem?

The Advice Is Right. The Audience Is Wrong.

Even Anthropic's own guidance on preparing your security team for the AI era, comprehensive and well-reasoned as it is, lands squarely on steps that security teams can influence but cannot execute: maintaining accurate inventories of exposed systems, decommissioning legacy services, and minimizing API exposure. These are all the right steps. They are also, unambiguously, engineering steps.

Security teams have owned these conversations for years, not because they were ever truly equipped to act on them, but because engineering was remarkably effective at passing the responsibility to someone else. That era is over.

The Eng & Sec Silos Have to Go

Take attack surface reduction as a concrete example. Anthropic's recommendations are sound: know what you're exposing, shut down what you don't need, lock down your APIs. But a security team cannot decommission a legacy service. They cannot refactor an API. They can nag, escalate, and document, then watch the ticket sit in a backlog for six months.

Engineering has to take this on. Not reluctantly, not after repeated escalations, but as a core ownership responsibility. The framing of "security's job" versus "engineering's job" is a liability the industry can no longer afford.

The Path Forward Is Uncomfortable — But It Starts Now

This transition won't be easy. Changing ownership models inside organizations is political, slow, and often painful. But the alternative means maintaining siloed teams while AI-accelerated vulnerability exploitation scales faster than any manual process can respond. That isn't a strategy. It's a countdown.

Here's what needs to happen immediately:

Security and engineering must jointly review what we know about threats like Mythos and the recommendations Anthropic has put forward, in the same room, with shared accountability.
Joint planning sessions aren't optional. Shared war-gaming, shared roadmaps, shared ownership of remediation timelines.
Cross-industry knowledge sharing is no longer optional. Threat actors collaborate, share tooling, and iterate in the open. The industry has to build the same sharing culture attackers already have.

The Wave Is Already Here

This isn't a theoretical future risk. The wave is already forming offshore, and most organizations are still debating whether to build a seawall.

AI hasn't just made attackers faster; it has fundamentally changed the economics of exploitation. What once required a skilled threat actor, weeks of reconnaissance, and significant resources can now be automated, scaled, and deployed by someone with a capable model and a motivated prompt. Zero-day vulnerabilities that previously had a window of days or weeks before widespread exploitation are now being weaponized in hours. The asymmetry between attack and defense has never been more extreme.

Here's the uncomfortable truth: the traditional security model was never built for this speed. It was built for a world where humans attacked and humans defended, where there was time to deliberate, escalate, and patch. That world is gone.

Mythos doesn't wait for your quarterly security review. Glasswing doesn't care that your legacy service decommission is "on the roadmap for H2." AI-powered exploit tooling operates at machine speed. And right now, the defense side of that equation is still running on organizational clock time.

Two Futures

Organizations that recognize this moment and act on it will look very different in three years. Security and engineering will share OKRs, not just Slack channels. Remediation won't be a ticket handed off between teams; it will be a joint sprint. Attack surface reduction will be an engineering hygiene standard, not a security audit finding.

Organizations that don't adapt will face a different outcome. It won’t be a gradual decline, but a sudden, forced reorganization triggered by a breach that exposes exactly how brittle the old model was. The silo walls won't come down in a planned migration. They'll come down in an incident post-mortem.

This Is the Moment

Industry inflection points rarely announce themselves clearly, but this one is. The research is public and the threat models are documented. Anthropic, and others, have laid out precisely what needs to happen. The gap between knowing and doing is entirely organizational — and that gap is where the real risk lives.

The teams that start the hard conversations now about ownership, accountability, and shared responsibility are the ones that will be positioned to respond when the wave hits. And it will hit. The question isn't whether your organization needs to change. The question is whether you'll choose the terms.