The Hard Part Begins After The Lab Experiments End

Bridging the "valley-of-death" for advanced materials for electrolysis

Recently, I got a chance to present the work we are doing at the 249th ECS meeting in Seattle. The depth and breadth of research being presented there always surprises me. However, there was one aspect of the conference which left me with mixed feelings. Most presentations on electrocatalysts for water electrolysis were about novel materials tested at lab scale under ambient conditions. This is valuable fundamental work, but it's only part of the story. Very few presentations addressed what happens when you have to scale up these materials and test them in a real electrolyzer stack.

As alkaline electrolyzer capacity scales from megawatts to gigawatts, catalyst performance and durability are turning from lab metrics into a test of whether the project will pass Final Investment Decision (FID). Green-hydrogen plants are financed on multi-year off-take agreements that assume the stack will maintain its efficiency for 10-15 years, and any drastic reduction in efficiency can derail a viable project. Therefore, it is not enough for a novel catalyst material to just show high performance at the start of operations. It also needs to be stable enough to survive for the lifetime of the plant. This is the often-talked-about "valley-of-death" for advanced materials or any new technology which needs to be scaled from TRL 2-3 to TRL 7-8.

This is a problem we think about every day at Newtrace. We deal with the challenges of developing best-in-class catalysts and scaling them from cm² to m² scales. These advanced materials not only need to demonstrate high efficiency but also need to survive for the lifetime of a commercial electrolyzer plant. The gap between what works in the lab and what survives in industrial conditions is the central challenge of our work.

The first gap between studies done in the lab and those needed to be successful at industrial scale is the catalyst deposition process. A catalyst coating optimised on a 1 cm² coupon behaves very differently when deposited on a 10,000 cm² electrode. At these scales, material properties are much harder to reproduce. Uniformity across the substrate becomes difficult to achieve. Repeatability across batches drops. Much of this comes down to complex, interdependent process parameters that become difficult to control at production scale.

The preparation and treatment of the substrate also start to play a role. At lab scale, you can prepare and clean a small coupon to near-ideal conditions. At production scale, pre-treatment of the substrate is its own engineering problem: surface consistency and cleanliness both affect how well the coating adheres and whether it ultimately survives. A coating process that works reliably on a coupon will show inconsistent results on a full-size electrode if the substrate pre-treatment processes aren't controlled to the same standard as the coating process itself.

Solving these problems takes quality control and process engineering, and both depend on material-property data to understand why the coating behaves as it does at scale. I like to call this the 'process-material-electrochemistry trifecta'. [Note 1: More on this in a future post.] Understanding this trifecta is where deep expertise in electrochemistry and materials characterisation becomes essential, along with the ability to analyse large volumes of process data.

The second gap is between lab testing conditions and real-world operations. A typical lab-scale electrocatalyst test runs in a glass beaker with a standard reference electrode in 1 M KOH at ambient conditions. Compare that to an actual commercial alkaline electrolyzer: a bipolar stack running at 80–90°C, in 30–32 wt% KOH, at 16 barg. These industrially relevant conditions usually improve short-term performance but compromise the long-term stability of the catalyst-coated electrodes, so a material that showed great promise at lab scale can still fail early.

For example, higher temperature lowers the overpotential and improves cell efficiency, but affects the stability of various stack components including the catalyst layer. [Note 2: Lohmann-Richters et al. report cell-voltage reductions of 3.4-4 mV/K between 100 and 200°C, set against accelerated degradation of catalysts, diaphragms, and other components in the more corrosive high-temperature electrolyte. For the 60-90°C range typical of commercial operation, Zeng et al. document the same fall in cell voltage with rising temperature.] Higher KOH concentration reduces ohmic losses and lowers cell voltage, but is far more corrosive. [Note 3: Gilliam et al. reported that specific conductivity rises with KOH concentration to a maximum that depends on temperature: near 20 wt% at room temperature, shifting toward ~32 wt% at 80-90°C.] Higher operating pressure reduces bubble-related overpotential but carries a thermodynamic penalty. [Note 4: Rox et al. quantify the trade-off: at -25 mA/cm², pressure adds a ~23 mV Nernstian penalty at 6 bar, but at 100 mA/cm² the smaller bubbles cut the overpotential by up to ~60 mV, outweighing it.] As a result, several failure modes appear only when testing under these conditions at the stack level.
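To make the pressure penalty concrete, here is a minimal sketch of the Nernst term for a gas-evolving electrode. It assumes ideal-gas behaviour, a hydrogen half-cell with two electrons per H₂, 25°C, and a 1 bar reference; the function name and defaults are my own, and I have not checked that Rox et al. used exactly these assumptions. Under them, 6 bar gives roughly the 23 mV quoted above.

```python
import math

R = 8.314462618   # molar gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_pressure_penalty_mv(p_bar, t_kelvin=298.15, p_ref_bar=1.0,
                               electrons=2, gas_exponent=1.0):
    """Shift in reversible potential (mV) when gas pressure rises from p_ref_bar to p_bar.

    Assumes ideal gases. gas_exponent=1.0 models the hydrogen half-cell only
    (an assumption); 1.5 would cover H2 plus half an O2 at the same pressure.
    """
    if p_bar <= 0 or p_ref_bar <= 0:
        raise ValueError("pressures must be positive")
    volts = gas_exponent * R * t_kelvin / (electrons * F) * math.log(p_bar / p_ref_bar)
    return 1000.0 * volts


# Hydrogen half-cell at 6 bar and 25 degC (illustrative inputs).
print(round(nernst_pressure_penalty_mv(6.0), 1))  # prints 23.0
```

The penalty grows only logarithmically with pressure, which is why the bubble-related gains at higher current density can outweigh it.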

The most persistent of these is iron contamination. The bipolar plates and other structural components in an alkaline stack are typically made from carbon steel (CS) rather than stainless steel (SS), because carbon steel is cheaper per kilogram and easier to machine. But CS is also far more prone to corrosion in 30 wt% KOH at 80°C when compared to SS. To counter this, CS stack components, such as bipolar plates and end plates, are coated with nickel which is stable in concentrated alkaline environments.

The Ni coating prevents rapid corrosion, but it cannot hold up for the full lifetime of the plant. Pinholes and peel-off of the nickel layer mean that iron inevitably leaches into the electrolyte, deposits on the cathode, and poisons the HER catalyst, causing a gradual increase in voltage over time. This is a slow efficiency loss that accumulates over thousands of hours. It doesn't appear in a lab test with analytical-grade KOH because there is very little Fe contamination to start with, and no sources that add to it over time. [Note 5: Huo et al. identify iron electrodeposition as the dominant degradation mechanism in industrial AWE, showing that an Fe concentration of just 3 ppm leads to an approximately 10% voltage increase over a 10-year operational period through loss of cathode catalyst surface area and membrane porosity.]

Reverse and shunt currents are two more problems that appear only at stack level. When a bipolar stack shuts down or undergoes load changes, current can briefly flow in the reverse direction through the cells. This, in turn, accelerates catalyst degradation. Unlike Fe poisoning, which is gradual, reverse current damage can appear during these discrete events and lead to abrupt cell or stack failure. [Note 6: Kim et al. showed that reverse current after shutdown irreversibly oxidises the Ni cathode to NiOₓ, which hampers HER performance.]

Shunt currents affect the system differently. They may not directly lead to catalyst damage, but reduce the efficiency of H₂ production nonetheless. In a bipolar stack, shunt current flows through the electrolyte manifold between cells, reducing the effective current at the centre of the stack due to parasitic losses. The effective current in the central cells can be 10-20% lower than the applied current, which shows up directly as lower faradaic efficiency. [Note 7: Sakas et al. reported a faradaic efficiency of 86% at nominal load, with shunt currents as the main contributor to the losses.]
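As a rough illustration of how bypassed current turns into lost hydrogen, the sketch below treats the shunt-limited faradaic efficiency as the mean current actually crossing the electrodes divided by the applied current, and converts it to a hydrogen rate with Faraday's law. The five-cell current profile is invented for the example, the function names are my own, and gas crossover and other losses are ignored.

```python
F = 96485.33212          # Faraday constant, C/mol
M_H2_KG_PER_MOL = 2.016e-3


def stack_faradaic_efficiency(applied_current_a, effective_cell_currents_a):
    """Mean effective cell current divided by applied current.

    Counts shunt losses only (a simplifying assumption of this sketch).
    """
    if applied_current_a <= 0:
        raise ValueError("applied current must be positive")
    if not effective_cell_currents_a:
        raise ValueError("need at least one cell")
    mean_effective = sum(effective_cell_currents_a) / len(effective_cell_currents_a)
    return mean_effective / applied_current_a


def hydrogen_rate_kg_per_h(applied_current_a, n_cells, faradaic_efficiency):
    """Hydrogen output from Faraday's law: two electrons per H2 molecule, per cell."""
    mol_per_s = faradaic_efficiency * n_cells * applied_current_a / (2 * F)
    return mol_per_s * M_H2_KG_PER_MOL * 3600


# Hypothetical 5-cell stack at 1000 A, with the central cells losing the most current.
currents = [980.0, 900.0, 850.0, 900.0, 980.0]
eta = stack_faradaic_efficiency(1000.0, currents)
print(round(eta, 3))  # prints 0.922
print(hydrogen_rate_kg_per_h(1000.0, len(currents), eta))
```

In a real stack the per-cell currents would come from a manifold resistance model or from measurement, not from a hand-written list.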

The third gap is in studying the durability of an electrode material. For catalyst-coated electrodes in an alkaline electrolyzer, end-of-life (EoL) is defined as a 10% increase in cell voltage. [Note 8: The DOE Technical Targets for Liquid Alkaline Electrolysis set a 10% voltage rise as the end-of-life threshold and use the average degradation rate as the basis for electrode lifetime.] The required durability in alkaline electrolysis is 80,000 hours. That's over nine years of continuous operation. Most materials that show good initial activity cannot sustain it for that long. And no academic study runs that long. In fact, very few industrial tests do either. Regardless, you need at least 1,000 hours of demonstrated stability across multiple runs, at a degradation rate of <5 µV/hr, and at a relevant size, to convince customers that the electrode is commercially viable.
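The link between a degradation rate and the 10% end-of-life criterion is simple arithmetic, sketched below. It assumes the voltage rises linearly with time and uses a beginning-of-life cell voltage of 1.8 V, which is a placeholder I picked for illustration; the function names and the synthetic test data are hypothetical too.

```python
import math


def degradation_rate_uv_per_h(hours, cell_voltages_v):
    """Least-squares slope of cell voltage against time, in microvolts per hour."""
    n = len(hours)
    if n < 2 or n != len(cell_voltages_v):
        raise ValueError("need two or more matching time/voltage points")
    mean_t = sum(hours) / n
    mean_v = sum(cell_voltages_v) / n
    sxx = sum((t - mean_t) ** 2 for t in hours)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, cell_voltages_v))
    return sxy / sxx * 1e6


def hours_to_end_of_life(bol_voltage_v, rate_uv_per_h, eol_rise_fraction=0.10):
    """Hours until the voltage has risen by eol_rise_fraction, assuming a constant rate."""
    if rate_uv_per_h <= 0:
        return math.inf
    return eol_rise_fraction * bol_voltage_v / (rate_uv_per_h * 1e-6)


def max_rate_for_lifetime_uv_per_h(bol_voltage_v, lifetime_h=80_000, eol_rise_fraction=0.10):
    """Average rate that just uses up the end-of-life voltage budget over lifetime_h."""
    return eol_rise_fraction * bol_voltage_v / lifetime_h * 1e6


# Synthetic 1,000 h test: 1.8 V at start, drifting upward by 4 microvolts per hour.
t = [0, 250, 500, 750, 1000]
v = [1.8 + 4e-6 * h for h in t]
print(degradation_rate_uv_per_h(t, v))
print(hours_to_end_of_life(1.8, 5.0))
print(max_rate_for_lifetime_uv_per_h(1.8))
```

With the hypothetical 1.8 V starting voltage, a constant 5 µV/hr would use up the 10% budget in about 36,000 hours, and an 80,000-hour life corresponds to an average of about 2.25 µV/hr. Real degradation is rarely linear, so treat this as a rough yardstick only.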

Even running these long-duration tests is its own engineering problem. Keeping the balance of plant (BoP) stable enough to run continuously for 1,000+ hours at high pressure and temperature is not simple. Temperature fluctuations affect reaction rates and material stability. Pressure swings affect gas crossover and safety margins. Electrolyte purity degrades over time as components corrode and contaminants accumulate.

Beyond these technical gaps, there is the question of cost. A material that is both efficient and stable, but is 10x more expensive than the current incumbent, prices itself out of the market. Therefore, scaling the material is not only about which elements to use in the catalyst coating (PGM vs non-PGM), but also about the final impact on the levelised cost of hydrogen (LCOH), considering both capex and opex.

This does not mean that academic labs need to do the industry's job and expend massive resources to purchase and test MW-scale electrolyzer systems. But some things would help.

First, we need closer collaboration between academia and industry. Industrial partners can help define the operating conditions, failure modes, and performance targets that matter. Academic labs can bring the fundamental tools and analytical rigour to understand why materials fail.

We also need lab-scale testing to be done under industrially relevant conditions, using standardised durability protocols, even if the electrode area is a few cm². Reporting catalyst performance measured only in 1 M KOH at room temperature reveals little about how it will perform in a stack. Thankfully, steps are being taken in this direction. For example, the EU JRC has published harmonised testing protocols for low-temperature water electrolysers, but adoption remains slow. [Note 9: EU harmonised protocols for testing of low temperature water electrolysers.] This needs to become standard practice.

More importantly, the electrolysis community needs standardised accelerated stress test (AST) protocols, comparable to what already exists for PEM fuel cells. [Note 10: Kuhnert et al. reviewed AST for PEM Water Electrolysis Cells.] The PEM AST protocols developed by the US DOE Fuel Cell Technical Team (FCTT) exist largely because automotive companies like Toyota, Honda, GM, and Hyundai were commercialising PEM fuel cell vehicles and needed standardised ways to benchmark the durability of the catalyst, membrane, and other stack components. [Note 11: "DOE Fuel Cell Program: Durability Technical Targets and Testing Protocols," US DOE FCTT.]

That commercial pressure drove the creation of common testing frameworks. No equivalent push has happened for electrolysis yet. Without a common framework for testing durability, it's difficult to compare results across labs or translate them to real-world expectations, although recent work has made progress in this direction. [Note 12: Shviro et al. established standardized testing procedures for single-cell liquid alkaline water electrolysis cells, demonstrating reproducibility through round-robin testing with harmonized protocols.]

These are the questions I keep returning to at conferences like ECS. The academic community does fundamental work that matters. Catalyst discovery, mechanistic understanding, and materials innovation all start in the lab. The question is how to make that work more relevant to the challenges of scaling green hydrogen technologies to gigawatt scale.

If any of this resonates with you, or if you would like to collaborate with us in driving the scale-up of novel catalyst materials, let's talk.