System overview
To carry out the present experiments, several key upgrades have been made. We provide here an overview of the experimental system (Extended Data Fig. 1).
A cloud containing millions of cold 87Rb atoms is loaded into a magneto-optical trap inside a glass vacuum cell. The Rb atoms are then loaded stochastically into programmable, static arrangements of 852-nm traps generated with an SLM (Hamamatsu X13138-02) and rearranged with a set of 852-nm moving traps, generated by a pair of crossed acousto-optic deflectors (AODs, DTSX-400, AA Opto-Electronic), to realize defect-free arrays61,62,63. We use D1 lambda-enhanced grey-molasses cooling to achieve a loading efficiency of 75% (ref. 64). Atoms are imaged with a 0.65-NA (numerical aperture) objective (Special Optics) onto a CMOS camera (Hamamatsu ORCA-Quest C15550-20UP), chosen for fast electronic readout times. The qubit state is encoded in mF = 0 hyperfine clock states in the 87Rb ground-state manifold, with T2 > 1 s (refs. 35,65), and fast, high-fidelity single-qubit control is executed by two-photon Raman excitation35,66. A global Raman path illuminating the entire array is used for global rotations (Rabi frequency of about 0.5 MHz, resulting in around 5-μs rotations with composite pulse techniques35) as well as for dynamical decoupling throughout the entire circuit (typically 1 global π pulse per movement). For this work, we upgrade our microwave source (Rohde and Schwarz, SMW200A) and increase our intermediate-state detuning to 550 GHz (measured scattering error of \(5\times {10}^{-5}\) per robust SCROFULOUS pulse). Fully programmable local single-qubit rotations are realized with the same Raman light but redirected through a local path, which is focused onto targeted atoms by an additional set of 2D AODs. To realize high-fidelity, programmable single-qubit pulses, we have upgraded our single-qubit addressing to use direct Raman X-type rotations (see section ‘Local single-qubit gate details’). Entangling gates (270-ns duration) between clock qubits are performed with fast two-photon excitation to n = 53 Rydberg states using 420-nm and 1,013-nm Rydberg beams, with a time-optimal two-qubit gate pulse67 detailed in ref. 36; in this work, the 420-nm laser is red-detuned by 4.8 GHz from the intermediate state. During the computation, atoms are rearranged with the AOD traps to enable arbitrary connectivity35. An important upgrade in this work is the ability to perform non-destructive qubit readout, enabling loss detection as well as qubit reuse. We realize this with a 1D optical lattice16, which pins one of the two spin states; we then use optical tweezers to separate the pinned and unpinned states and image the atom positions. To further enable mid-circuit qubit measurement and reuse on large arrays, we develop methods of low-loss, high-fidelity qubit readout and re-initialization, while needing only moderate trap depths (see below). We use these techniques here for reusing atoms and extending the depth of error-corrected computation.
The quantum circuits are programmed with a control infrastructure consisting of five arbitrary waveform generators (AWGs) (Spectrum Instrumentation), as shown in Extended Data Fig. 1b, synchronized to less than 10-ns jitter. The two-channel rearrangement AWG is used for real-time rearrangement; the two channels of the Rydberg AWG are used for entangling gate pulses and for local SLM detunings; the four channels of the Raman AWG are used for IQ (in-phase and quadrature) control of a 6.8-GHz source35,66 (the global phase reference for all qubits) and pulse-shaping of the global and local Raman driving; the two channels of the Raman AOD AWG are used for displaying tones that create the programmable light grids for local single-qubit control; and the two channels of the Moving AOD AWG are used for controlling the positions of all atoms during the circuit.
In this work, we realize circuits as long as 1.1 s for the experiments in Figs. 5 and 6. To realize this with the AWGs, we generate a memory segment for one circuit layer for the Moving AWG, Rydberg AWG and Raman AOD AWG, and then loop these identical memory segments for each layer. This is complicated for the Raman AWG as phase continuity needs to be ensured, and so, for simplicity in this work, we program the whole Raman waveform directly. We fill the entire memory of the Spectrum AWG, and this is what limits our experiments to 27 layers here (and then we choose an appropriately sized reservoir to have atoms for that many layers). Future work will benefit markedly from improved waveform streaming.
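As a rough consistency check of this memory limit, the back-of-the-envelope calculation below estimates how many layers fit in AWG memory; all hardware numbers are illustrative assumptions, not the actual specifications of our Spectrum units.

# Back-of-the-envelope estimate of how many circuit layers fit in AWG memory.
# All hardware numbers below are illustrative assumptions, not device specs.
sample_rate = 0.625e9        # samples per second per channel (assumed)
bytes_per_sample = 2         # 16-bit DAC samples (assumed)
memory_bytes = 2 * 2**30     # 2 GiB of waveform memory per channel (assumed)
layer_duration = 41.9e-3     # seconds per layer (from 'Processor clock speed')

samples_per_layer = sample_rate * layer_duration
max_layers = memory_bytes / (bytes_per_sample * samples_per_layer)
print(f"~{max_layers:.0f} layers fit")   # order of magnitude only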
Details of processor configuration
Our approach to quantum processing is highly programmable. However, we find that each new atomic layout design behaves slightly differently11,35,68. A close analogy here is that we can design and print a new chip every time we change our processor design, but each one requires its own specific characterization and calibration. We observe that—although each configuration we create can be slightly different and can have its own specific challenges—with sufficient characterization and optimization, we can recover ‘nominal’ performance (that is, consistent with a simple single-qubit and two-qubit error model) and that such a configuration is stable and reproducible once it has been properly set up.
We detail here some example circuit configurations that required different degrees of characterization in this work. In the repeated QEC rounds on the surface code, we were careful to engineer the circuit structuring such that the time would perfectly echo on each qubit. This was greatly facilitated by the symmetric four-gate structure of the stabilizer syndrome extraction circuit. For example, although the local Raman pulses are applied row by row, we ensure that the overall amount of time in superposition—although different for each atom—echoes around a central global π pulse. However, although this enabled us to ensure the total time echoed, the specific structuring and parity of pulses prevented us from ensuring that the overall atomic trajectory echoed11. As such, we had to be more careful about homogenizing the AOD trap power over the surface code region. Conversely, to realize the programmable hypercube codes, we confined ourselves to the general encoding circuit, in which even the total time did not echo on each qubit, which significantly degraded performance. These examples illustrate that each circuit we realize is different and, although we find it is always possible to achieve correct, ‘nominal’ fidelities, sometimes our layout and circuit design require multiple iterations to find a suitable approach.
We now detail some more specific aspects of the processor designs used in this work.
Surface code
For surface code experiments (Figs. 1–3), the same static traps are used for mid-circuit storage of ancilla blocks and for readout of all qubits at the end of the computation. The readout zone is 12 rows tall (55 μm) with two rows of traps per atom for the lattice readout (Fig. 2a). Six blocks of qubits are interlaced horizontally for storage, corresponding to five 6 × 6 ancilla blocks and one 5 × 5 data block in Fig. 2 and four 6 × 6 ancilla blocks and two 5 × 5 data blocks in Fig. 3. This interlacing ensures that the dimensions of each qubit block are the same in both the readout and entangling zones, preventing heating from AOD intermodulation effects that we observe when compressing or expanding the AOD grid. Two additional columns of traps form a small reservoir used for initial rearrangement, resulting in a total array width of 165 μm.
The 420-nm and 1,013-nm Rydberg tophat beams cover 7 rows of gate sites in the entangling zone and are homogenized to about 1% peak-to-peak variation over a vertical extent of 60 μm. The entangling zone is separated by 40 μm from the storage and readout zone (overlapping in these measurements) to ensure negligible error on stored qubits from the tails of the Rydberg beams.
Deep circuits
For deep-circuit experiments (Figs. 5 and 6 and the same configuration used in Fig. 4), we choose the same 60 μm vertical extent for the entangling zone as above. Within this zone, entangling gates are performed simultaneously on up to 256 qubits across 8 rows and 16 columns of gate sites with a horizontal extent of 175 μm. Below the entangling zone is the readout zone, used for measurement and re-initialization of up to 128 atoms arranged in four rows. This region is illuminated by counterpropagating imaging and cooling beams (beam waist 50 μm) as well as the 1D lattice beams (average waist 60 μm) as shown in Extended Data Fig. 1d.
During mid-circuit imaging, atoms are always held in the storage zone, 50 μm from the entangling zone. To preserve coherence of qubits during the imaging, the storage zone is illuminated by a 1,529-nm shielding beam with a beam waist of 35 μm, matching the vertical extent of the zone. These design parameters ensure both that stored atoms do not pick up errors from Rydberg beams, as described above, and also that negligible 1,529-nm light reaches the readout zone and so does not cause spurious lightshifts on the imaging and cooling transitions (see ‘The 1,529-nm shielding beam’). Finally, the reservoir is located directly below the readout zone and contains up to 196 atoms in six rows.
The trap intensities in the entangling and storage zones are set to half of those in the readout and reservoir zones to improve qubit coherence. This is achieved by modifying the target trap intensities in the trap generation algorithm63. We centre the array on the zeroth diffraction order of the trap SLM to maximize the deflection efficiency.
In our first attempt at deep-circuit processing, we made multiple processor design decisions that were suboptimal and affected our fidelity, which we list here. None of these limitations is fundamental, and now that the first circuits have been implemented, they can be readily improved for future experiments.
- Re-initialization with local Raman led to an overly sensitive re-initialization procedure that complicated calibration.
- Imperfect echoing due to a lack of symmetry in the generalized hypercube encoding circuit led to high sensitivity to trap depth variations.
- We shifted the trap path, and this led to an exacerbated AOD intermodulation effect that seems to cause significant heating and a reduced T1.
- Our SLM array had trap depth inhomogeneity, affecting cooling performance (Fig. 5c) and exacerbating improper echoing issues.
- The specific Rydberg tophat beams we used here had an exacerbated inhomogeneity of about 2–3% peak-to-peak variation.
- We found that performance is sensitive to the 1,529-nm beam profile, requiring homogeneous coverage in the storage zone due to complex resonances, while preventing illumination of the readout zone.
- Magnetic field noise from the current supply affected coherence preservation during the between-layer idle times.
These imperfections particularly limited the deep-circuit measurements; by fixing them, we can recover nominal performance, corresponding to operating at about 2× below threshold as measured in Fig. 2. The section ‘Error budget and path to 10× below threshold’ describes expectations on how we can improve this nominal performance further to scales approaching 5–10× below threshold.
Spin-to-position conversion with a 1D optical lattice
We realize non-destructive qubit readout throughout this work through spin-to-position conversion16,69,70,71 (Extended Data Fig. 2). A 1D optical lattice is formed by two 795-nm counterpropagating local beams, both sourced from the same titanium:sapphire laser (M Squared) and operated 50–200 GHz blue-detuned from the D1 line. Both beams are σ− polarized such that |F = 2; mF = −2⟩ is a dark state and |F = 2; mF = +2⟩ experiences a maximum lightshift of approximately 6 MHz, corresponding to an approximately 300-kHz trap frequency in one axis. The close detuning balances minimizing off-resonant coupling of the dark state to the D2 line against scattering and heating from the lattice light. Because the clock-state qubit is used for computation, readout begins by optically pumping |F = 2; mF = 0⟩ into the dark state with 780-nm σ−-polarized light resonant with F = 2 to F′ = 3, which is co-propagating with one port of the lattice. To suppress the probability of scattering into the dark state during readout, we further transfer |F = 1; mF = 0⟩ to |F = 2; mF = +2⟩ (bright state), which also increases the trap depth. This is achieved either by a coherent Raman transfer or with an incoherent σ+-polarized 780-nm repumper from F = 1 to F′ = 3. We use the former approach in all surface code experiments (Figs. 1c, 2 and 3) and the latter in deep-circuit experiments (Figs. 1b, 4 and 6), finding comparable performance from both methods.
Following these state transfers, the lattice is ramped up adiabatically over approximately 100 μs. AOD tweezers pick up atoms in the dark state and move them by approximately 2 μm over approximately 500 μs; during this, atoms in the bright state are pinned in place by the stronger confinement of the lattice. Finally, the lattice is ramped down and a conventional camera-based readout then images the position of the atom, allowing identification of the spin state as well as loss detection. Using the data in Fig. 1b, we measure an error probability of 0.87(7)% for the dark state, 0.05(5)% for the bright state, and a 0.24(2)% probability of loss. The asymmetric error arises from trade-offs when simultaneously optimizing for loss and readout fidelity and can be tuned to be more balanced. Typically, owing to the pumping fidelity, the dark-state error is at least about 0.3% higher than that of the bright state.
We remark that non-destructive qubit readout has previously been realized using stretched states72,73; however, that approach requires traps several times deeper than the present one. More recently, related protocols for fast, high-fidelity readout have been realized across several platforms74,75,76,77.
One-dimensional and finite-field operation for imaging and cooling
For local cooling and imaging, we use two counterpropagating 780-nm beams with opposite circular polarization47,78 (Extended Data Fig. 3). The beams are red-detuned from F = 2 to F′ = 3 and have a variable relative detuning; the σ+-polarized beam additionally contains a small repump component. Conventional methods based on polarization-gradient cooling (PGC) require zero magnetic field; however, mid-circuit operation requires a finite magnetic field to maintain the quantum state of active qubits. To this end, we develop a scheme for 1D PGC in a finite magnetic field. PGC is based on a linear polarization rotating along the beam propagation direction, which produces a population imbalance within the hyperfine levels47; in a finite field, this imbalance is disturbed, and the cooling mechanism breaks down47. By transforming to a frame in which the polarization rotates in time, a fictitious field appears that cancels the external field and restores the cooling effect. This condition is achieved by detuning the two counterpropagating beams—which are parallel/antiparallel to the external magnetic field—by two times the Zeeman splitting of adjacent mF levels. As shown in Extended Data Fig. 3e, this detuning method works across the full range of magnetic fields studied (up to 8.6 G). Furthermore, it is broadly applicable to finite-field implementation of any 1D technique based on the same polarization configuration used here, for example, grey-molasses cooling64,79.
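As a concrete illustration of this detuning condition, the short sketch below evaluates it for the 87Rb 5S1/2, F = 2 ground state (gF = 1/2, so adjacent mF levels are split by about 0.70 MHz per gauss); this is a minimal calculation of the condition stated above, not part of our control software.

MU_B_MHZ_PER_G = 1.3996      # Bohr magneton in MHz per gauss
G_F = 0.5                    # Lande g-factor of the 87Rb F = 2 ground state

def pgc_relative_detuning_mhz(b_field_gauss):
    """Relative detuning between the counterpropagating beams that cancels the
    external field via the fictitious field: twice the adjacent-mF splitting."""
    zeeman_splitting_mhz = G_F * MU_B_MHZ_PER_G * b_field_gauss
    return 2.0 * zeeman_splitting_mhz

print(pgc_relative_detuning_mhz(8.6))   # ~12 MHz at the 8.6-G operating field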
Although this finite-field PGC is sufficient to image without loss, we add a second stage of EIT cooling to further reduce the atom temperature48. The scheme, shown in Extended Data Fig. 3f, uses the same beams as the PGC imaging and requires only shifting the detuning to be blue of the F = 2 to F′ = 2 transition (by about 80 MHz) and reducing the power in one of the beams. As the cooling is uniaxial, it is a priori unclear whether all three motional degrees of freedom can be cooled with these techniques. Using both drop-recapture measurements and adiabatic ramp-down measurements of the atom temperature80, we probe the radial and axial atom temperatures and find them both to be comparable to 3D techniques (Fig. 5c and Extended Data Fig. 3f). Furthermore, the steady-state temperature and loss are set only by the EIT cooling fidelity and are independent of the degree of heating introduced by the previous circuit.
The 1,529-nm shielding beam
To preserve the coherence of qubits in the storage zone, we illuminate them with a single beam of 1,529-nm light (Extended Data Fig. 4). By coupling the 5P3/2 state to the 4D5/2 state, we impart a strong Stark shift on the excited 5P3/2 state49. This causes probe light in the readout zone to appear off-resonant to the storage-zone atoms while maintaining qubit information in the hyperfine manifold of the ground state (Extended Data Fig. 4a). The beam is generated by a Connet CoSF-D series 10-W fibre laser and is focused down to an elliptical waist of 35 μm × 65 μm. The shorter waist of the beam is aligned vertically to the centre of the storage zone. We image the beam in a 4f system and apply a knife-edge in the image plane, approximately four beam waists from its centre, to suppress its Gaussian tail. Stray 1,529-nm light, even at low powers, can degrade the imaging quality in the readout zone. We find, therefore, that beam shaping is important for maintaining stable imaging quality and coherence of the storage-zone atoms for the layout of our array.
We measure dephasing of the storage-zone qubits as a function of detuning from the bare transition while readout-zone qubits are illuminated with local probe and repumper light (Extended Data Fig. 4b). We capture the key features of the spectrum with a simple model in which the additional dephasing at each drive power scales as \(\exp \left(-\frac{{\varOmega }_{{\rm{probe}}}^{2}{\varGamma }_{{\rm{probe}}}t}{4{\varDelta }_{{\rm{LS}}}^{2}}\right)\), where Ωprobe and Γprobe are the Rabi frequency and scattering rate of the local imaging beams, respectively, t is the illumination time and ΔLS is the calculated lightshift of 5P3/2 due to the coupling to 4D5/2 and 4D3/2. More complex on-resonance or multi-level features are not captured by this simple model and are particularly pronounced at detunings between the resonances of the 4D levels81. During all experiments with qubit reuse and local imaging, we address the storage zone at 1,529.49 nm with about 1.2 W. This corresponds to an approximate lightshift of 6 GHz on the 5P3/2 state. To further characterize the 1,529-nm laser, we vary the detuning of the local imaging light and observe a clear Autler–Townes splitting (Extended Data Fig. 4c). We find that, as expected, the separation of the two fitted Lorentzian peaks scales with the square root of the drive power.
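For reference, the snippet below evaluates the dephasing model above for representative numbers; the parameter values are placeholders chosen for scale, not our calibrated experimental values.

import numpy as np

def storage_dephasing(omega_probe, gamma_probe, t, delta_ls):
    """Additional coherence decay of storage-zone qubits while the readout
    zone is illuminated: exp(-Omega^2 * Gamma * t / (4 * Delta_LS^2))."""
    return np.exp(-(omega_probe**2) * gamma_probe * t / (4 * delta_ls**2))

# Assumed-for-scale values: 5-MHz probe Rabi frequency, 6-MHz scattering rate,
# 1-ms illumination and the ~6-GHz lightshift quoted in the text.
omega = 2 * np.pi * 5e6
gamma = 2 * np.pi * 6e6
print(storage_dephasing(omega, gamma, 1e-3, 2 * np.pi * 6e9))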
Repeated rearrangement from reservoir
The mid-circuit image identifies the qubit state as well as which atoms are lost. Before rearrangement, the atoms are recombined into their original tweezers, balancing the trap depth between the AOD and SLM tweezers to minimize loss and using cooling throughout. After this recombination, we fill empty sites using the reservoir.
In each round of rearrangement, target rows are refilled sequentially with one parallel step per row. All atoms in each step are sourced from a single reservoir row. We choose efficient horizontal moves and optimize the reservoir-to-target row pairings to minimize travel distance. Finally, as the local imaging beams do not cover the full extent of the reservoir, the reservoir site occupancies are stored from a global image before the circuit begins, and used reservoir atoms are tracked in software. Because this occupancy record is not refreshed by re-imaging, errors in it (for example, from background loss) accumulate, leading to a slowly growing rearrangement infidelity.
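A minimal sketch of the row-pairing step is shown below, using a generic assignment solver on a toy travel-distance cost; the row positions are hypothetical, and our actual cost model and solver may differ.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy cost: absolute vertical travel distance between row positions (in um).
target_rows = np.array([10.0, 15.0, 20.0, 25.0])                # rows to refill
reservoir_rows = np.array([40.0, 45.0, 50.0, 55.0, 60.0, 65.0])

cost = np.abs(target_rows[:, None] - reservoir_rows[None, :])
t_idx, r_idx = linear_sum_assignment(cost)   # minimize total travel distance
for t, r in zip(t_idx, r_idx):
    print(f"target row at {target_rows[t]} um <- reservoir row at {reservoir_rows[r]} um")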
Mid-circuit re-initialization
After qubits have been measured and atom loss refilled, the spin state is re-initialized to reuse the qubit. This local state preparation is performed in the readout zone using a Raman-assisted optical pumping scheme35,82. Local Raman light is used for the coherent π pulses, and the local probe beams are used for resonant depumping of the F = 2 manifold. Owing to the close horizontal spacing of traps in the readout zone, we minimize crosstalk between local Raman tweezers by alternating the applied π pulses between odd and even columns. We perform 24 cycles of pumping per atom over a few hundred microseconds.
Local single-qubit gate details
Single-qubit gates are performed using Raman transitions as described in ref. 11, with several changes to allow X(θ) rotations to be directly implemented with high fidelity. The key challenge for local X gates is ensuring polarization homogeneity, as the Rabi frequency is sensitive to the degree of circularity. We find inhomogeneity both across the array, introduced by a sharp dichroic cut-off noted in ref. 11, and within each optical tweezer, due to polarization breakdown near the tweezer focus. To reduce the first effect, we add a second copy of the dichroic into the path with a half-waveplate between the pair, such that any angle-dependent phase shifts on reflection from the dichroics are equally applied to both the s- and p-polarized components and the polarization remains close to circular. Second, polarization breakdown of a circularly polarized tweezer results in an off-axis fictitious field with components both parallel and perpendicular to the external magnetic field (in the plane of the tweezer focus)83; the parallel components can drive Raman transitions and result in dephasing of the clock qubit. As the magnitude of the maximum off-axis field falls off linearly with tweezer waist, we mitigate this by increasing the waist to 2.5 μm. Finally, to increase the projection of the Raman drive along the magnetic field axis, we displace the Raman beam by roughly 1.5 mm within the back aperture of the objective (5.5-mm diameter), so that the Raman beam arrives at an angle. For all single-qubit gates in this work, we use robust SCROFULOUS pulses84.
AOD intermodulation effects
We observe several intermodulation effects from the AODs that can result in degraded performance for specific AOD moves. First, it is important to ensure that the frequency tones in a given AOD axis are in an exact frequency comb, as intermodulation can lead to interference and beating near trap frequencies. Second, we observe here that the relative frequencies of the X-frequency spacing and Y-frequency spacing are also important and that when beat notes of these are near trap frequencies, it can also lead to heating. As such, we now primarily use the AODs for translations, avoiding compressions or expansions of the grid when possible, and choose incommensurate spacings for X and Y to avoid accidental cross-resonances.
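The sketch below illustrates the kind of bookkeeping this implies: given candidate tone combs for the two axes, it flags difference frequencies that land near the trap frequency, where intermodulation can parametrically heat atoms. All frequency values are illustrative assumptions.

import numpy as np

trap_freq = 80e3                           # radial trap frequency, Hz (assumed)
x_tones = 75e6 + 2.17e6 * np.arange(16)    # X-axis frequency comb (assumed)
y_tones = 96e6 + 2.61e6 * np.arange(8)     # Y-axis comb, incommensurate spacing

def risky_beats(tones_a, tones_b, trap_freq, margin=2.0):
    """Pairwise difference frequencies within `margin` x the trap frequency."""
    diffs = np.abs(tones_a[:, None] - tones_b[None, :]).ravel()
    return np.unique(diffs[(diffs > 0) & (diffs < margin * trap_freq)])

print(risky_beats(x_tones, x_tones, trap_freq))   # within-axis beat notes
print(risky_beats(x_tones, y_tones, trap_freq))   # cross-axis beat notes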
Analysis of error correlations
Correlations in errors, in either space or time, can have important implications on QEC. Here we explain various correlation analyses in our system.
De-correlation of global coherent errors by projective measurement
Parallel control enables us to, for example, realize a transversal entangling gate with a single global pulse of our entangling laser11. One may be concerned that such global control could lead to globally correlated errors that affect error-correction performance. However, error correction natively de-correlates these errors.
Consider a code block of qubits with X and Z stabilizers. Applying a global rotation by a small angle θ maps each of the X operators to X + iθY = X(1 − θZ) to first order in θ. Consequently, measuring the X-basis component of this qubit will probabilistically lead to a Pauli Z error on this site with probability \({\theta }^{2}\). Note that, for a global rotation θ, the logical operator XL = XXXX… maps to (X + iθY)(X + iθY)(X + iθY)(X + iθY)… = XXXX… + iθYXXX… + … + \({(i\theta )}^{d}\)YYYY…. As such, for small θ, logical rotations are exponentially suppressed with the code distance d. Thus, although all the physical qubits receive a global rotation θ, the logical qubit state does not receive that same rotation, and after syndrome measurements, these errors are converted into incoherent-type errors and can be corrected. This is the basis behind the observed suppression in Fig. 1. In Extended Data Fig. 7b, we further show that the error correction prevents an unintended logical rotation. The logical rotation here is even further suppressed by the random stabilizer signs (see below).
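The exponential suppression can be checked numerically in a toy setting: under a global rotation of every qubit, the amplitude for all d qubits of a block to flip together scales as \({\theta }^{d}\). The sketch below (a minimal numpy illustration with an assumed Ry rotation convention, not our analysis code) verifies this.

import numpy as np
from functools import reduce

def ry(theta):
    """Single-qubit rotation about Y by angle theta."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

theta, d = 0.05, 5
global_rot = reduce(np.kron, [ry(theta)] * d)   # Ry(theta) on every qubit
amp_flip_all = global_rot[-1, 0]                # <1...1| Ry(theta)^(tensor d) |0...0>
print(amp_flip_all, np.sin(theta / 2) ** d)     # both ~ (theta/2)^d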
Role of stabilizer signs under coherent errors
Stabilizer signs affect the response of the logical qubit to global coherent rotations. For transversal non-Clifford gates, deterministic stabilizer eigenvalues (for example, all +1) are necessary to correctly implement the logical gate (Fig. 4). By contrast, Clifford circuits allow the eigenvalues to be either +1 or −1, as the signs can be simply tracked through the circuit, giving freedom to engineer how coherent errors interfere. For example, choosing negative signs can generate decoherence-free subspaces85 and give greater robustness of the logical operator against coherent errors. The same principle also suppresses logical coherent errors during computation38. In particular, stabilizer measurement projects the logical state onto a specific stabilizer configuration with random ±1 values, which corresponds to a random configuration of physical X and Z flips, which do not commute with the coherent rotation. On top of the exponential suppression of logical coherent errors, this makes the residual rotation angle, of scale about \({\theta }^{d}\), random on each shot, effectively converting these once more into incoherent errors at the logical level.
Decay to Rydberg P states
In ref. 36 (extended data figure 7), we analysed the presence of weak correlations between CZ gate errors seen in a repeated randomized benchmarking sequence and speculated that these originate from decay to atomic Rydberg P states. Concretely, during Rydberg gates, roughly 0.07% of the error budget is decay of Rydberg atoms to adjacent Rydberg P states. These states have a strong, long-ranged interaction with the Rydberg S states that are used for the gates and can thereby affect gates occurring at a different site; moreover, they can have lifetimes of more than 100 μs. During repeated benchmarking sequences, such as those in ref. 36, we have only 4 μs between gates, and consequently, Rydberg P atoms can survive for many layers of gates and corrupt gates at distant sites.
In Extended Data Fig. 6g, we plot the CZ gate fidelity in a repeated benchmarking sequence as a function of the duration between the gates and find that the gate fidelity in this array increases from 99.3% to 99.5% when the duration between gates is increased to 100 μs, as the Rydberg atoms decay or are ejected during that time. This also implies that the reported gate fidelity in ref. 36 may have been affected by this effect, thereby underestimating the maximum gate fidelity. Analogously, we observe that this gate-fidelity reduction is removed by reducing the atom density. In quantum circuits based on atom motion, the duration between gates is sufficiently long (for example, 400 μs for surface code repeated stabilizer measurements) for these Rydberg P states to decay or be ejected86, which natively fixes this issue, and consequently we do not observe these effects during quantum circuits.
Surface code measurements
In our repeated QEC on the surface code, we search for unexpected error correlations by plotting the distribution of detector errors in a shot. We find these are closely consistent with the expected distribution from Clifford simulations that assume uncorrelated one- and two-qubit errors. Although these data comprise only approximately \({10}^{5}\) detector rounds (14,855 shots, 96 detectors per shot across five rounds), they are nevertheless indicative of the absence of such events. We have yet to observe error burst events such as those observed in solid-state systems7.
Logical teleportations and deep-circuit measurements
In a physical system, diverse physical errors and imperfections can cause complex correlations. For instance, a leakage event can lead to complex correlations that—if undetected—can greatly affect QEC performance. As discussed in the next section, incorporating logical teleportations in an architecture can ensure that these errors are removed. In Fig. 6, we verify that these teleportations rapidly remove errors and ensure that errors are not correlated in either time or space. It is important that the atomic qubits are re-initialized properly for this to work. In Extended Data Fig. 10c,d, we plot the correlations when turning off cooling and sorting and find that correlations in this case do not rapidly decay.
Loss detection for improved QEC
Leakage types with neutral atoms
Leakage errors, which take the qubit out of the two-level computational subspace, are important to account for in error correction. The three dominant leakage errors with neutral rubidium (or other alkali) atoms are as follows:
- Loss events: these occur when the atom is physically lost from the optical trap. Owing to the blockade nature of the gate, performing a gate with a lost atom simply turns off the gate while still applying gate error (it is identical to the atom being in state |0⟩, which is also dark to the Rydberg laser).
- Leakage to other hyperfine states in the ground-state manifold: in the limit of a large magnetic field, these states are off-resonant and behave the same as a lost atom (turning off CZ gates). However, they are not detected through loss detection. Moreover, at the practical operating field of 8.6 G, the level spacings of 6 MHz (compared with the Rabi frequency of 4.6 MHz) mean that adjacent hyperfine states can still off-resonantly couple to the Rydberg state and could lead to repeated errors.
- Leakage to Rydberg states: population left in the Rydberg manifold can affect subsequent gates and can lead to large error correlations. For example, many-body Rydberg evolution in dense systems exhibits so-called avalanche errors, in which a macroscopic fraction of the system has an error87. In our approach, with a low atomic density and several hundred microseconds between gates, the Rydberg atoms (theoretically) either decay to the ground state or are expelled from the tweezer. In this way, such Rydberg leakage converts into an error within the computational subspace, a leakage into adjacent hyperfine states, or a loss event.
We observe that with several hundred microseconds between gates, the effects of Rydberg leakage are not apparent, and during our repeated QEC data, we observe that our leakage is at least 80% loss (Extended Data Fig. 5b).
Effect of loss during repeated QEC
Although losses simply turn off subsequent gates, they lead to distinct signatures that are important to account for in the QEC design. Whereas ancilla loss is detected in the projective measurement, losing a data qubit corresponds to the unknown loss of a degree of freedom from the system39,88. Without adjusting the stabilizer measurement pattern to account for such a loss, the ancilla qubits are now measuring operators that anti-commute with each other, which leads to a ‘flickering’ pattern around the lost data atom. This flickering pattern is akin to the expected behaviour in a subsystem code89 and means that the flickering can continue for arbitrarily long times. Without accounting for the loss, this then appears as the strong time correlations that we observe in Extended Data Fig. 5b.
Erasure information and superchecks
It is useful to detect atom loss for two reasons. First, knowing about the lost atom greatly enhances the decoding performance. Although bit-flip and phase-flip errors can be inferred by stabilizers, direct detection of qubit errors—or so-called erasures—means that we already have direct information about where the errors are. This erasure information can thereby greatly improve decoding performance8,9,20,31,90,91. For example, although only (d − 1)/2 Pauli-type errors can be corrected, up to (d − 1) erasure-type errors can be corrected. We do not detect erasures as soon as they occur; instead, we detect them at the final qubit measurement, constituting delayed-erasure information.
Second, although lost atoms lead to anti-commuting stabilizer measurements and a flickering error pattern, these can be accounted for with the use of so-called superchecks39,88 (Extended Data Fig. 5a). Although individual stabilizer checks around a lost atom are anti-commuting, taking products of multiple checks creates superchecks, which again commute with each other. We find in Fig. 2b that these superchecks are able to remove the sharp rise in detected error that occurs with increasing data loss.
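As a minimal illustration, the product of two Z-type checks sharing a lost qubit is again a commuting check: its support is the symmetric difference of the two original checks (shared qubits, including the lost one, cancel because Z² = I), and its measured value is the product of the two outcomes (an XOR in bit convention). The qubit indices below are hypothetical.

def supercheck_support(check_a, check_b, lost_qubit):
    """Support of the product of two Z-type checks sharing a lost qubit:
    the symmetric difference; shared qubits (including the lost one) cancel."""
    assert lost_qubit in check_a and lost_qubit in check_b
    return check_a ^ check_b

def supercheck_outcome(bit_a, bit_b):
    """Measured +/-1 outcomes multiply, that is, the bits XOR."""
    return bit_a ^ bit_b

# Hypothetical example: two weight-4 plaquettes sharing the lost qubit 5 (and qubit 6).
print(supercheck_support({1, 2, 5, 6}, {5, 6, 9, 10}, lost_qubit=5))   # {1, 2, 9, 10}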
Decoding
MLE and error-model tuning
To decode the surface code experiments in Figs. 1–3, we use the delayed-erasure MLE decoder described in ref. 9, augmenting the MLE decoder developed in ref. 40 to leverage loss information. In particular, the MLE decoder takes as input the stabilizer measurements and the probabilities of the physical error sources in the circuit, and outputs the most likely combination of errors consistent with the syndrome. We construct the circuit error model using Stim92 to initially contain information about the Pauli error sources in the circuit, then update it for each shot to reflect the detected atom losses. In particular, after an atom is lost, all subsequent gates are cancelled, generating different potential errors depending on when the loss occurred. We, therefore, consider all potential locations a qubit loss could have originated (for example, initialization, gates, movement, or idling before measurement), then add each of the resulting error patterns and their probabilities to the error model for that shot. Errors producing the same syndrome are combined into a composite error mechanism, and their probabilities are correspondingly reweighted, as in ref. 92. Note that this process explicitly accounts for both propagated Pauli errors from the gate cancellations and the invalidation of stabilizers, which are handled using superchecks.
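A minimal sketch of this reweighting step (in the spirit of ref. 92, with hypothetical syndromes and probabilities) is shown below: two independent mechanisms with the same detector syndrome merge into one composite mechanism whose probability is that of an odd number of them firing.

from collections import defaultdict

def merge_same_syndrome(mechanisms):
    """mechanisms: iterable of (frozenset_of_detector_indices, probability)."""
    merged = defaultdict(float)
    for syndrome, p in mechanisms:
        q = merged[syndrome]
        merged[syndrome] = q * (1 - p) + p * (1 - q)   # XOR of independent events
    return dict(merged)

print(merge_same_syndrome([(frozenset({0, 3}), 0.01),
                           (frozenset({0, 3}), 0.02)]))   # ~0.0296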
To optimize the performance of the MLE decoder, we fine-tune the probabilities of different error sources in the circuit error model. In particular, we associate each physical operation with both a Pauli and a loss error rate. The error probabilities in these channels are then treated as variables, which we optimize using the covariance matrix adaptation evolution strategy93 to minimize the logical error rate on a dataset of approximately 10,000 shots (different shots from the final dataset used for evaluating the fidelity).
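A hedged sketch of this tuning loop using the cma package is shown below; decode_logical_error_rate is a hypothetical stand-in for running the delayed-erasure MLE decoder on the tuning dataset with candidate error rates, and the dimensionality, initial rates and step size are illustrative.

import numpy as np
import cma

def decode_logical_error_rate(error_rates):
    """Placeholder objective: in practice, build the circuit error model with
    these Pauli/loss rates and return the decoded logical error rate."""
    return float(np.sum((error_rates - 0.005) ** 2))   # dummy quadratic bowl

x0 = 0.003 * np.ones(8)                     # initial per-operation error rates
es = cma.CMAEvolutionStrategy(x0, 0.001)    # initial step size (assumed)
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [decode_logical_error_rate(np.abs(c)) for c in candidates])
best_rates = es.result.xbest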
To quantify the benefit of using loss information, in Fig. 2 the ‘bare MLE’ decoder does not update the circuit error model based on the loss information, and assigns each loss event to a |0⟩ measurement. We find the loss information improves the measured d = 3/d = 5 error ratio from 1.24(5) to 1.69(8).
Finally, we quantify the confidence of the MLE correction for each shot by comparing the probability P0 of the most-likely error with the probability P1 of the most-likely error that yields the opposite correction to the logical Pauli observable (refs. 94,95,96,97). The more similar these two error probabilities are, the less confident the decoder is in its correction. We can, therefore, postselect on increasing P0/(P0 + P1) to improve the accuracy of the results, which we use in Fig. 3d when studying repeated logical gates.
Machine learning decoder for surface code
We use a fully connected neural network to decode measurement outcomes from the Fig. 2 surface code experiment using machine learning32,98,99,100. The decoding task is formulated as a supervised binary classification problem: the input features are measurement outcomes from the experiment, and the output is a label indicating whether the initial state was |0L⟩ or |1L⟩. The machine learning architecture is a fully connected feedforward network comprising four linear layers, each followed by batch normalization and a Gaussian Error Linear Unit (GELU) activation, as shown below:
import torch.nn as nn

# input_size = 3 x (number of measurements), from the one-hot {0, 1, loss} encoding
decoder = nn.Sequential(
    nn.Linear(input_size, 1024),
    nn.BatchNorm1d(1024),
    nn.GELU(),
    nn.Linear(1024, 512),
    nn.BatchNorm1d(512),
    nn.GELU(),
    nn.Linear(512, 256),
    nn.BatchNorm1d(256),
    nn.GELU(),
    nn.Linear(256, 1),
    nn.Sigmoid(),
)
Training proceeds in three stages: raw training, ensembling and fine-tuning.
Raw training
We begin by training the decoder on simulated data generated through circuit-level simulations that incorporate both Pauli and loss errors. Measurement outcomes take values of 0, 1 or 2, corresponding to the qubit being in the |0⟩ state, the |1⟩ state or being lost, respectively. These are one-hot encoded, so the feature vector of a given shot has length 3 × (number of measurements). To create balanced training data, random software flips are applied with probability 1/2 along the relevant logical operator, yielding ensembles of |0L⟩, |1L⟩ for the Z memory and |+L⟩, |−L⟩ for the X memory. Apart from the raw {0, 1, 2} measurement values, we provide the neural network with calculated detector outcomes and logical operator values. These additional features help the model learn from structured correlations in the data. Detector values are computed as binary parities (0 or 1) over specified stabilizer regions; if a measurement gives a loss (2), it is assigned a value of 0 when computing detector parities. Logical operator values are calculated along each row or column, depending on the basis. We use a hidden layer size of 1,024, the Adam optimizer with an initial learning rate of \({10}^{-3}\), and a weight decay of \({10}^{-2}\). The learning rate is decreased by a factor of 0.3 if the validation loss does not improve for 10 epochs. Training is performed independently for 10 total experimental configurations: two with code distance d = 5 (in the Z and X bases) and eight with d = 3 (covering four spatial quadrants in both bases). In the pre-training phase, each model is trained on 200 million simulated shots and validated on 20 million simulated shots. We find that decoder performance is largely robust to small perturbations in the error model, and thus, precise tuning of simulation parameters is not necessary. For a batch size of about \({10}^{4}\) shots, the inference time per shot is 0.33 μs on a GPU (NVIDIA A100).
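A hedged sketch of this training setup in PyTorch is shown below, assuming the decoder block above has been constructed with a matching input_size; the dummy data, epoch count and validation stand-in are placeholders for our actual pipeline, and only the optimizer and schedule settings are taken from the text.

import torch
import torch.nn as nn

# Dummy stand-ins so the sketch runs end to end; replace with the real pipeline.
features = torch.randn(64, input_size)       # placeholder one-hot feature batch
labels = torch.randint(0, 2, (64,)).float()  # placeholder |0_L>/|1_L> labels
train_loader = [(features, labels)]

optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.3, patience=10)      # cut LR when validation stalls
loss_fn = nn.BCELoss()                       # binary logical-state classification

for epoch in range(20):                      # illustrative epoch count
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(decoder(x), y.unsqueeze(-1))
        loss.backward()
        optimizer.step()
    scheduler.step(loss.item())              # stand-in for the validation loss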
Although the machine learning decoder used here is not directly scalable to high-distance codes—for example, requiring re-training for each code distance and specific circuit, with the number of training samples growing exponentially—exploring different extensions of these neural network architectures for scalable decoding is an interesting direction of ongoing research32,50,100.
Ensembling
To account for training variability and enhance robustness, we repeat the full training procedure with 10 different random seeds, resulting in 10 independently trained models per experiment. These are ensembled together by computing the geometric mean of their output probabilities. The resulting ensembled machine learning decoder achieves a logical error per round (LEPR) of 0.78(4)% for d = 5 and 1.37(3)% for d = 3.
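The ensembling step can be expressed compactly as a sketch, where models is the list of the 10 independently trained decoders:

import torch

@torch.no_grad()
def ensemble_probability(models, features):
    """Geometric mean of the output probabilities of the ensemble members."""
    probs = torch.stack([m(features) for m in models])     # (n_models, batch, 1)
    return probs.clamp_min(1e-9).log().mean(dim=0).exp()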
Fine-tuning
To improve decoding performance, we fine-tune each pre-trained decoder on experimental data taken from designated training sets (independent of the final dataset). For the d = 5 decoders, we fine-tune on approximately 37,000 shots per basis. For the d = 3 decoders, we use approximately 2,500 shots per basis, per quadrant. The neural network architecture remains unchanged, and fine-tuning is performed using the Adam optimizer with a learning rate of \({10}^{-3}\) and a weight decay of \(8\times {10}^{-2}\). The resulting ensemble of fine-tuned machine learning decoders achieves an LEPR of 0.71(4)% for d = 5 and 1.33(4)% for d = 3.
Hybrid
When comparing the MLE and machine learning decoders, we find they do not predict the same logical state on all shots and, in particular, differ on shots in which one of the decoders has low confidence in its prediction. To further enhance performance, we, therefore, construct a hybrid decoder that combines the output confidences of the ensembled machine learning decoder with those from the delayed-erasure MLE decoder, in which the MLE confidence is derived from comparing the probabilities of the most-likely error and the most-likely error that gives the opposite logical outcome. The final prediction is given by the weighted geometric mean of the two confidence values with weights of 0.4 and 1 for the MLE and machine learning, respectively. This results in a final value for the reported LEPR of 0.62(3)% for d = 5 and 1.33(4)% for d = 3, which corresponds to the machine learning with loss decoder reported in Fig. 2.
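A minimal sketch of this combination rule is shown below, interpreting each decoder's output as the probability of one logical outcome and applying the stated weights (0.4 for MLE, 1 for machine learning); the exact normalization used in our pipeline may differ.

import torch

def hybrid_prediction(p_ml, p_mle, w_mle=0.4, w_ml=1.0):
    """p_ml, p_mle: probabilities (in (0, 1)) assigned to logical outcome 1."""
    p_ml = p_ml.clamp(1e-9, 1 - 1e-9)
    p_mle = p_mle.clamp(1e-9, 1 - 1e-9)
    # Weighted geometric means for outcome 1 and outcome 0, compared in log space.
    log_one = w_mle * p_mle.log() + w_ml * p_ml.log()
    log_zero = w_mle * (1 - p_mle).log() + w_ml * (1 - p_ml).log()
    return (log_one > log_zero).long()       # predicted logical outcome per shot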
MLE decoder for lattice surgery
In the lattice surgery experiment (Fig. 3c,d), we perform a joint ZZ measurement using additional stabilizer checks along the common vertical edge between the two surface codes (‘seam’). We start with both codes prepared in |+L⟩ and perform two rounds of stabilizer checks on the effective d × 2d surface code lattice, measuring the stabilizer checks of both the codes and the new seam checks in each round. To measure the ZZ parity of the resulting logical Bell state, with the delayed-erasure MLE decoder9,40, we use two decoding procedures. First, we use only the ancilla measurements to obtain the result of the lattice surgery \({Z}_{1}^{L}{Z}_{2}^{L}\) measurement given by the product of the seam Z stabilizers. Second, we obtain \({Z}_{1}^{L}{Z}_{2}^{L}\) directly from the data qubit measurements, using the previous ancilla measurements in decoding. This measures the ZZ parity of the logical Bell state obtained using lattice surgery. Note that the seam checks from the final data qubit measurement are not included. A shot is counted as an error if these two decoding procedures disagree.
To obtain the XX Bell state parity, we measure all data qubits in the X basis and decode the joint \({X}_{1}^{L}{X}_{2}^{L}\) operator spanning both codes. The final logical error probability is given by the mean of the XX and ZZ parities.
Machine learning decoder for deep circuits
To decode the 1D and 2D cluster states of logical [[7, 1, 3]] and [[16, 6, 4]] codes in Fig. 6, we use a convolutional neural network. As error correlations in the cluster state do not propagate beyond two CZ gates (Fig. 6e), a convolutional window of size 3 is sufficient to capture the relevant correlations. The decoder architecture comprises three components: an encoder, a convolutional block and a readout module. Both the encoder and readout are constructed from linear layers interleaved with GELU activations, with hidden_size = 128.
import torch.nn as nn

hidden_size = 128   # per-site embedding dimension (from the text)

encode = nn.Sequential(
    nn.Linear(input_size, 1024),
    nn.GELU(),
    nn.Linear(1024, 512),
    nn.GELU(),
    nn.Linear(512, 256),
    nn.GELU(),
    nn.Linear(256, hidden_size),
)
readout = nn.Sequential(
    nn.Linear(hidden_size * 8, 512),
    nn.GELU(),
    nn.Linear(512, 256),
    nn.GELU(),
    nn.Linear(256, 128),
    nn.GELU(),
    nn.Linear(128, out_size),
)
The convolutional block applied between the encoder and readout modules is defined as follows:
conv = nn.Sequential(
    nn.Conv2d(hidden_size, hidden_size * 2, kernel_size=3, padding='same'),
    nn.GELU(),
    nn.BatchNorm2d(hidden_size * 2),
    nn.Conv2d(hidden_size * 2, hidden_size * 4, kernel_size=3, padding='same'),
    nn.GELU(),
    nn.BatchNorm2d(hidden_size * 4),
    nn.Conv2d(hidden_size * 4, hidden_size * 8, kernel_size=3, padding='same'),
    nn.GELU(),
    nn.BatchNorm2d(hidden_size * 8),
)
For 1D cluster state decoders, we replace the 2D convolutions with 1D convolutions and omit the batch normalization layers.
Training is performed using circuit-level simulations. The decoder is tasked with inferring the signs of the logical cluster state stabilizers, which are of the form X on a given qubit and Z on its neighbours. By performing measurements in alternating X and Z bases, half of the stabilizers can be reconstructed. The remaining stabilizers are recovered by repeating the experiment with the measurement bases swapped. The decoder predicts the stabilizer signs by inferring the initial state of the qubits measured in the X basis. All logical qubits are initialized in the |+L⟩ state, and software logical Z flips are applied with probability 1/2 to those measured in the X basis, to generate a balanced training set.
The decoder input includes the raw measurement outcomes (0, 1 or loss), detector values computed from the measurements, and the raw logical operator values, similar to the input format used in the surface code decoder. We train four distinct decoders: one for each combination of code type ([[7, 1, 3]], [[16, 6, 4]]) and cluster state geometry (1D, 2D). Each model is trained on more than 100 million simulated shots.
For further details on this decoder architecture, and on machine-learning-based decoders for general quantum algorithms, see ref. 50.
Benchmarking surface code performance
NZNZ stabilizer gate pattern
Here we describe the effective distance, defined as the minimum number of physical errors required to create a logical error, of d rounds of repeated syndrome extraction using alternating N or Z movement patterns (Extended Data Fig. 6d). By alternating gate orderings, the effective distance is close to the optimal value. To see this, note that without alternating orderings, the effective code distance in the rotated surface code is reduced by a factor of 2 because of hook errors6 (for a theoretical perspective, see the discussion at the end of this section). A hook error is a physical error on the ancilla qubit halfway through the stabilizer measurement cycle that propagates onto two data qubits oriented parallel to the corresponding logical operator (for example, XL for a physical X error). One of these propagated data qubit errors is immediately detected, whereas the other is detected in the following round by the next-nearest stabilizer along the direction of error propagation. As a result, if the same gate ordering is used for each round of stabilizer measurements, a sequence of \(\lceil \frac{d+1}{4}\rceil \) hook errors, one occurring in each round along the direction of error propagation, can generate a logical error upon correction. This issue is circumvented by alternating gate orderings between rounds, as only every other round has the unfavourable propagation. In this case, \(\frac{d-1}{2}\) physical errors on consecutive rounds are needed to generate a logical error.
In our experiments in particular, we choose an ordering of NZZrNr, where r represents performing the reverse ordering (Extended Data Fig. 6d). Apart from this structuring helping preserve fault tolerance against hook errors, we also note that the dominance of Z-type errors means that most errors do not lead to propagated errors between the middle two CZ gate layers. For these reasons, in simulations, we do not observe that having spatially alternating N and Z patterns helps performance (not plotted). Although the d = 3 colour codes studied here could also suffer from hook errors under repeated stabilizer measurement, we similarly expect that they would be robust to these errors with increased code distance. Moreover, we note that, as studied in ref. 11, Steane-style QEC can be effectively used for fault-tolerant syndrome extraction on colour codes in neutral-atom systems.
Simulations
We perform simulations using the Stim simulation package92. We sample both Pauli errors and qubit losses. Pauli errors are generated using the sampling routines of Stim, based on circuit-level noise models. Qubit losses are sampled according to the loss probabilities associated with each instruction, and when a loss occurs, subsequent gates acting on the lost qubit are removed to reflect the absence of the qubit. The simulations report {0, 1, loss} during qubit readout, mirroring our experiments. For each set of physical parameters, we estimate the logical error rate by Monte Carlo sampling. Logical errors are declared when the prediction of the decoder for the logical observable differs from the true value. See Supplementary Information for details of the noise model, as well as the discussion below.
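The skeleton below sketches the loss-sampling step in a deliberately simplified form (losses are attached only to two-qubit gates, single-qubit gates on lost qubits are dropped, and measurements are left untouched so that the measurement record stays valid); our full pipeline additionally flags measurements of lost qubits as 'loss' and attaches loss probabilities to every instruction type.

import numpy as np
import stim

rng = np.random.default_rng(0)
TWO_QUBIT_GATES = {"CZ", "CX"}
SINGLE_QUBIT_GATES = {"H", "X", "Y", "Z", "S", "SQRT_X"}

def one_shot_lossy_circuit(circuit, p_loss):
    """Return one shot's effective circuit, with gates on lost qubits removed."""
    lost = set()
    out = stim.Circuit()
    for inst in circuit.flattened():                  # expand REPEAT blocks
        targets = inst.targets_copy()
        values = [t.value for t in targets]
        if inst.name in TWO_QUBIT_GATES:
            kept = []
            for a, b in zip(values[::2], values[1::2]):
                for q in (a, b):                      # each atom may be lost here
                    if rng.random() < p_loss:
                        lost.add(q)
                if a not in lost and b not in lost:
                    kept.extend([a, b])
            if kept:
                out.append(inst.name, kept)
        elif inst.name in SINGLE_QUBIT_GATES:
            kept = [v for v in values if v not in lost]
            if kept:
                out.append(inst.name, kept)
        else:                                         # noise, measurements, annotations
            out.append(inst.name, targets, inst.gate_args_copy())
    return out

circuit = stim.Circuit.generated("surface_code:rotated_memory_z", distance=3,
                                 rounds=3, after_clifford_depolarization=0.003)
shot = one_shot_lossy_circuit(circuit, p_loss=0.005).compile_detector_sampler().sample(1)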
Analysis of below-threshold performance for deep circuits
In Fig. 2, we perform four rounds of repeated QEC as a benchmark. However, increasing the circuit depth can affect the threshold in various ways, depending on the particular circuit. Extended Data Fig. 8a shows how the LEPR ratio r changes for a single logical qubit as we increase the number of QEC rounds using a theory error model, showing a roughly 17% decrease in r from 4 rounds to 20. Similarly, Extended Data Fig. 8b plots the same quantity for a single logical qubit with an approximate experimental error model, showing an analogous 9% decrease in r from 4 rounds to 50. Furthermore, by interspersing one transversal gate per QEC round under an approximate experimental error model, we find the ratio r changes by 2% at 25 QEC rounds. Similarly, previous work with a theory error model has shown that the threshold can change by about 10% with one gate per QEC round40. These simulations indicate that the benchmark studied in Fig. 2 is representative, but depending on context, it can differ on the scale of about 15% for deep circuits. We note, however, that in transversal architectures, the prevalence of logical gate teleportations (for example, in magic state distillation and angle synthesis) means that there are typically only several stabilizer measurement rounds before transversal measurement.
Our benchmark results are comparable to those in ref. 7. For instance, although a one-to-one comparison is not direct because of the presence of loss information, using the supercheck metric shows a 9.04% mean detector error, comparable to the 8.5–8.7% mean detector error in ref. 7.
Error budget and path to 10× below threshold
To reach algorithmically relevant logical error rates of about \({10}^{-10}\) (refs. 101,102), operating at a factor of 5–10× below threshold can achieve the required error rates with several hundred qubits in a code block103. Our performance is captured by the error budget in Fig. 2f, which we now describe in further detail.
We first list our single-qubit errors and their possible improvements:
- Local single-qubit gates have approximately 99.9% fidelity, arising from a 0.05% scattering error and residual miscalibrations. Increasing the Raman detuning to 2.5 THz will further reduce scattering errors and miscalibrations from the Raman differential light shift, and improving calibration routines can thereby achieve 99.99% fidelity.
- Our coherence time in 852-nm traps is approximately 1–2 s, depending on the dynamical decoupling sequence applied. Comparable systems have achieved coherence times of 12.6 s with further tweezer detunings57.
- We experience a total loss from movement of roughly 1% on the ancilla atoms, arising from transfers and moves between and within the zones. We have previously observed performance in ref. 11 with transfer-limited loss that would correspond to 0.2% movement loss here; we speculate that the higher loss in the present work arises from using too high an AOD radiofrequency power. Our repeated QEC sequence also experienced 0.6% background loss from vacuum, which can be readily reduced to <0.01% using an improved vacuum lifetime and a shorter sequence (for example, about 4-ms cycle times in Fig. 5b).
- Our lattice readout currently operates with a loss rate of 0.3% and a bit-flip fidelity of 99.5%. Although this is a new technique, similar methods in purely lattice-based systems have achieved fidelities of 99.94% (ref. 16).
To improve two-qubit gate performance, one example approach is as follows:
- Improve system stability, homogeneity and fast, automated calibration. Although we achieve CZ fidelities of 99.6%, drifts since the last calibration (several days in the context of the surface code benchmarking) often contribute 0.05–0.1% error during final data taking.
- Use the smooth-amplitude gate, higher magnetic fields or the 6P1/2 intermediate state to suppress coupling to the adjacent mj = +1/2 state, reducing the error from about 0.06–0.15% to near-zero.
- Increase both 420-nm and 1,013-nm Rydberg laser power by a factor of 4×. This can allow us to simultaneously (numbers are from simulation; see ref. 36):
  - increase the Rydberg detuning from 4.8 GHz in the present work to 9.6 GHz, reducing the scattering error from 0.094% to 0.052%; and
  - decrease the gate time from 270 ns to 135 ns, reducing the Rydberg T1 error from 0.113% to 0.057% and the dephasing error from 0.134% to 0.034%.
- These changes can reduce the two-qubit gate error from roughly 0.5% to 0.15% through simple system improvements. The AOM pulse profile should be compensated to realize these gate times, and the fractional inhomogeneity of the 1,013-nm beam needs to be improved commensurately with the increase in power.
Altogether, by reducing single-qubit gate errors by a factor of 5× and improving two-qubit errors from 0.5% to 0.15% through the improvements listed, operation at roughly 8× below threshold would be achieved. The two-order-of-magnitude increase in cycle rate shown in Fig. 5b will be instrumental in enabling these improvements. These estimates highlight that straightforward improvements can deliver the performance required for large-scale computation. We also emphasize that this performance needs to be tested and optimized in deep-circuit settings.
Processor clock speed
Future operation will eventually be limited by clock speed once algorithms with, for example, trillions of operations need to be realized101,102. In the present work, we do not optimize for clock speed and often choose slower speeds for our components so that they can function reliably without detailed characterization on existing infrastructure. However, we here report multiple measurements of our circuit durations.
In the repeated surface code experiments in Fig. 2, each QEC round was 4.45 ms. This included a 0.47-ms time between gates and a total of 2.57 ms for moving the ancilla atoms to the storage zone and bringing the next group into the entangling zone. In the transversal CNOT experiments in Fig. 3, we fix the overall circuit duration (independent of the number of CNOTs) at 17.7 ms, corresponding to the time of the longest circuit of 27 total transversal CNOTs. This corresponds to 0.655 ms per transversal CNOT on average. In the deep-circuit experiments in Fig. 6, our cycle rate was bottlenecked by the use of desktop computers for all data processing for the mid-circuit image analysis and rearrangement, and so we did not attempt to reduce any times. For this reason, each logical teleportation layer was 41.9 ms.
In the repeated Rabi calibration in Fig. 5b, we optimized for speed and achieved a cycle time of 4 ms. Although the imaging here was global as a demonstration of fast calibration, we expect that these speeds can also be achieved in a zoned manner. Destructive measurement can be faster than the qubit reuse approach used here, but the absence of loss information degrades QEC performance (Fig. 2) and further increases required qubit reloading rates for continuous operation. Although non-destructive readout is slower, this may or may not bottleneck operations depending on, for example, when the next non-Clifford operation occurs. Comparing these holistically in the context of a whole architecture is an important avenue of future research.
We thereby expect that, with optimization for speed in the deep-circuit context and various simple improvements, we should be able to achieve a logical teleportation cycle with a cycle time comparable to the 4-ms repeated Rabi calibration. We emphasize that in a planar architecture, this logical teleportation step involves multiple logical gates and can require several hundred QEC rounds for large-distance codes, for example, 200–300 stabilizer measurement rounds. As such, we estimate the present methods are slower by a factor of about 10–20 relative to a conventional planar architecture (associated, for example, with superconducting qubits) with a 1-μs stabilizer measurement cycle7,101,104.
Physical entropy removal
Types of entropy
QEC enables removing entropy from the physical qubits, and this entropy can take many different forms. As discussed above, error-correction operations such as stabilizer measurement serve to convert generic quantum errors, such as coherent ones, into incoherent bit- and phase-flip errors. Detecting and tracking these errors further removes entropy from the system. Finally, physical systems such as atoms have entropy in other degrees of freedom, such as loss, leakage or atom heating. We would like to design our QEC strategy to remove all of these entropy types.
Overview of entropy removal methods
Ancilla-based stabilizer extraction, as used in Figs. 1–3, is one form of entropy removal, in which stabilizer information is mapped onto the ancilla and then the ancilla is measured. Shor-style syndrome extraction operates by entangling ancillas into a GHZ state and extracting the stabilizer in a single step1; Steane-style syndrome extraction operates by creating an ancilla logical qubit and extracting stabilizers by a transversal CNOT105; and measurement-based quantum computing (MBQC)-style syndrome extraction operates by sequential entanglement with adjacent layers19,20,106. Leveraging teleportation native to the algorithm is another related method of entropy removal, without ever ‘directly’ correcting the initial logical qubit block after it was used in computation. Although these methods all vary in their specific implementations, their core mechanism of entropy removal is similar. These methods can be used interchangeably depending on specific practical considerations, such as those discussed in the next section.
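To make the first of these mechanisms concrete, the following is a minimal, illustrative Stim sketch of ancilla-based extraction of a single weight-4 Z stabilizer (a toy example of the mechanism, not the experimental circuit or error model):

import stim

# Illustrative ancilla-based extraction of one weight-4 Z stabilizer
# (Z0 Z1 Z2 Z3): the parity is copied onto a fresh ancilla with four CNOTs
# and only the ancilla is measured. The noise strength is a placeholder.
circuit = stim.Circuit()
circuit.append("R", [4])                          # ancilla reset to |0>
circuit.append("X_ERROR", [0, 1, 2, 3], 0.1)      # toy bit-flip noise on data
circuit.append("CX", [0, 4, 1, 4, 2, 4, 3, 4])    # map parity onto ancilla
circuit.append("M", [4])                          # measure the ancilla only

sampler = circuit.compile_sampler()
print(sampler.sample(shots=5))                    # one syndrome bit per shot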
Use of teleportation for ensuring error removal
Logical teleportation is a method to ensure that an architecture natively removes not only physical bit- and phase-flip errors but also physical errors such as loss, leakage and heating19. In particular, by teleporting a logical qubit from one block to another, the logical information propagates but the physical errors—both Pauli-type and other complex errors—are all left behind. This method ensures that errors of all types are removed. Because transversal gates lead to only O(1) QEC rounds per logic gate, algorithms can be composed of a high density of logical gate teleportations. This highlights that, as shown in Fig. 6, teleportation can perform logical operations while natively removing all these errors without additional overhead. We note that the same behaviour can be achieved in codes with transversal CNOTs with appropriate preparation of basis states or transversal Hadamards, such as in conventional MBQC with surface codes, and is not predicated on having a transversal CZ gate in the code.
An alternative method for removing physical errors is teleportation at the physical level—specifically, by swapping quantum information between a physical data qubit and a physical ancilla. This approach underlies various implementations of leakage reduction units9,20,107,108,109. However, it necessitates pairing each data qubit with a dedicated ancilla, which can present challenges. For example, in high-rate quantum LDPC codes encoding many logical qubits, this one-to-one pairing can become increasingly impractical. In general, the number of unpaired data qubits in each round of error correction is lower bounded by k = number of data qubits − number of independent checks, where k is the number of encoded qubits. For example, in hypergraph product codes constructed from (u, v)-biregular expanders—bipartite graphs in which checks have degree u and bits have degree v—the compact rearrangement scheme of ref. 110 implies that there will be O((v − u)d) unpaired data qubits per error-correction cycle, where d is the distance of the code. By contrast, logical-level teleportation is directly accessible in all CSS codes (as they all have a transversal CNOT), as demonstrated in the high-rate [[16, 6, 4]] code in Fig. 6. This analysis highlights that leveraging the transversal teleportations native to an algorithm yields a robust, low-overhead procedure that ensures all physical errors are removed, independent of the specific code.
Feedforward in universal processing
Once bit- or phase-flip errors have been detected, a natural question is whether they need to be physically corrected in hardware to return to a configuration with all stabilizers equal to +1. For conventional computation based on transversal (or planar) Clifford gates, stabilizer measurements, and universality achieved by teleportation of |TL⟩ states (realized by physical Clifford gates), we do not have to apply these physical qubit corrections. This can be most directly seen from the fact that universal computation on the logical-qubit level is realized by physical Clifford gates18,103, and so the Pauli corrections can be deterministically tracked in-software as a Pauli frame update without additional overhead on the decoding.
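As an illustration of such in-software tracking, the following minimal sketch (our own toy example, not the control software used in this work) stores pending corrections as bits and commutes them through Clifford gates:

# Minimal Pauli-frame tracker: pending X/Z corrections are stored per qubit
# and commuted through Clifford gates in software, with phases dropped.
class PauliFrame:
    def __init__(self, n):
        self.x = [0] * n          # pending X corrections
        self.z = [0] * n          # pending Z corrections

    def h(self, q):               # H exchanges X and Z
        self.x[q], self.z[q] = self.z[q], self.x[q]

    def s(self, q):               # S maps X -> XZ (i.e. Y, up to phase)
        self.z[q] ^= self.x[q]

    def cnot(self, c, t):         # X propagates control->target, Z target->control
        self.x[t] ^= self.x[c]
        self.z[c] ^= self.z[t]

    def flips_z_outcome(self, q): # a pending X flips a Z-basis measurement
        return bool(self.x[q])

frame = PauliFrame(2)
frame.x[0] = 1                    # record a correction instead of applying it
frame.cnot(0, 1)                  # commute it through a CNOT in software
print(frame.x, frame.z)           # -> [1, 1] [0, 0]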
When realizing transversal non-Cliffords, such as the transversal T gate in the [[15, 1, 3]] Reed–Muller code111, X Pauli corrections do not commute through the non-Clifford gate, and so in such a case the stabilizers do need to be returned to a deterministic +1 eigenvalue. However, in the results here, for example, we realize deterministic initialization of the Reed–Muller code with +1 eigenvalues as a method of ensuring constant-entropy operation, and in this case, mid-circuit correction of individual physical qubits is not required.
In both of these settings, feedforward is required, but only on the logical-qubit level (feedforward S for T teleportation, and feedforward X for H teleportation). We implemented this logical feedforward in figure 4 of ref. 11, in which feedforward logical S gates were realized to entangle two qubits that did not directly interact.
Transversal logic with O(1) stabilizer measurements per gate
As, in the transversal setting, the role of stabilizer measurements is simply to remove entropy, we do not require the conventional d rounds of stabilizer measurement per logic gate, as shown in Fig. 3. We note that these techniques directly apply to universal computation. Concretely, universal computation is implemented by a transversal Clifford circuit, in which T gates are realized by a transversal teleportation circuit with |TL⟩ inputs that have already been prepared fault-tolerantly. It has been shown that this universal processing can proceed with O(1) stabilizer measurements per transversal gate and that the decoding can also be done efficiently, with a decoding complexity that can be even less than in the conventional lattice surgery setting12,40,43,112,113,114. As such, our experimental results directly apply to universal computation, under the assumption that the |TL⟩ inputs are prepared to high quality.
Methods of universality
Universality, transversal gates and Eastin–Knill
Universality means that any unitary can be closely approximated using sequences of gates from a universal gate set24. An example universal gate set is {H, T, CNOT}. The 2D topological codes can have a discrete gate set of {H, S, CNOT}, but cannot transversally implement the T gate. 3D topological codes can have a transversal T gate115, and the [[15, 1, 3]] 3D colour code, in particular, has a transversal gate set of {CZ, CCZ, CNOT, T}. The Eastin–Knill theorem forbids having a unitary transversal gate set that is universal44. This is expected: such a gate set would, for example, allow realizing a transversal logical θ rotation by a sequence of transversal operations on the underlying physical qubits, and this rotation could not be protected, as it would be sensitive to small imperfections in the physical rotations.
The Eastin–Knill theorem is circumvented simply by the introduction of logical measurement, which breaks unitarity and enables universality. This is directly achieved with 3D codes, as realizing a CZ gate between state |ψL⟩ and |+L⟩, followed by logical measurement and feedforward, teleports a Hadamard gate directly onto |ψL⟩. As such, X-basis preparation and X-basis measurements (guaranteed in all CSS codes), combined with transversal CZ gates, can be used to straightforwardly implement a universal gate set of {H, T, CNOT} using fully transversal operations. This is the basis behind our implementation of universality in Fig. 4.
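The single-qubit analogue of this Hadamard-teleportation gadget can be checked numerically. The following sketch (our illustration on unencoded qubits, not the logical-level experiment) verifies that CZ on |ψ⟩ ⊗ |+⟩, followed by an X-basis measurement of the first qubit with outcome m, leaves X^m H|ψ⟩ on the second qubit:

import numpy as np

# Numerical check of the one-bit teleportation gadget on bare qubits.
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
CZ = np.diag([1, 1, 1, -1])
plus = np.array([1, 1]) / np.sqrt(2)
minus = np.array([1, -1]) / np.sqrt(2)

state = CZ @ np.kron(psi, plus)
for m, proj in enumerate([plus, minus]):
    out = np.kron(proj.conj(), np.eye(2)) @ state   # project qubit 1, keep qubit 2
    out /= np.linalg.norm(out)
    target = np.linalg.matrix_power(X, m) @ H @ psi  # expected X^m H |psi>
    print(m, np.isclose(abs(np.vdot(target, out)), 1.0))  # True for both outcomes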
We note that in many protocols, universality is directly generated by the measurement of a 3D code. Code switching is an example, in which we switch between codes that transversally implement T and H, respectively116. For example, we realize a code-switching protocol in Extended Data Fig. 9e, in which we teleport a logical T from a 3D [[15, 1, 3]] colour code117 onto a 2D [[7, 1, 3]] colour code118. These operations between codes of different dimensionality can often be realized; here, it simply involves entangling the 2D surface of the 3D pyramid with the 2D colour code face. Although teleportation onto the 2D colour code now admits transversal H gates, this is already accomplished by the transversal measurement and feedforward from the 3D code.
Although these techniques are easily understood in the context of topological codes, they can also be used for qLDPC or general high-rate codes. In particular, as shown in Fig. 6f, the teleportation protocols are effective for high-rate codes, and in principle, teleportation-based small-angle synthesis could also be used here. At the same time, efficient, parallel generation of logical magic in these high-rate codes is an outstanding problem and an active area of theoretical research.
Connection to magic state distillation
We note that the protocols studied here are similar to those underlying magic state distillation111,119,120. In the conventional 15-to-1 magic state distillation, 16 surface code logical qubits are entangled such that the first surface code qubit is entangled with the logical qubit of a [[15, 1, 3]] code made out of the other 15 surface codes. Subsequently, noisy T gates with some error p are realized on the surface codes by teleportation, which the outer [[15, 1, 3]] code distils to an error of about p² with correction or about p³ with postselection121. By measuring the Reed–Muller code, the resulting distilled |T⟩ state is teleported onto the first surface code.
The protocol in Fig. 4 is a more compact representation of the same magic state distillation circuit, but with the inner surface codes replaced by unencoded physical qubits. Whereas in conventional distillation the |T⟩ is teleported onto the surface code, we note that, for example, in small-angle synthesis with sequential HTHT… gates, we do not even need the step of teleporting onto the surface code—we can simply leave the |T⟩ encoded in the Reed–Muller code and then realize a transversal CZ gate between the two concatenated Reed–Muller codes.
Small-angle synthesis
Arbitrary logical unitaries can be approximated using a sequence of discrete gates, as stated by the Solovay–Kitaev theorem. Considering the single-qubit gate set {H, T, X} as implemented in Fig. 4, T gates are transversal in the [[15, 1, 3]] code and Hadamard gates are implemented by teleportation. Without logical feedforward, this teleportation protocol, using N > 0 T gates, randomly synthesizes one of \(2^{N-1}\) possible rotations with equal probability. The remaining angles plotted in Fig. 4c are related by a final Clifford. Adding the appropriate feedforward at each teleportation step would render this protocol deterministic.
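The branch counting can be illustrated with a toy numerical sketch. The gate sequence below, T(X^m H T)^(N−1) with uncorrected outcomes m, is our illustrative assumption rather than the exact circuit of Fig. 4:

import numpy as np
from itertools import product

# Toy enumeration of the outcome branches: each teleported Hadamard carries
# a random, uncorrected X^m; enumerating all outcome strings gives the
# 2^(N-1) branches counted in the text.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
T = np.diag([1.0, np.exp(1j * np.pi / 4)])

def branch_unitary(outcomes):
    U = T.copy()
    for m in outcomes:                  # one teleported H per additional T
        U = T @ np.linalg.matrix_power(X, m) @ H @ U
    return U

N = 3
branches = [branch_unitary(ms) for ms in product([0, 1], repeat=N - 1)]
print(len(branches))                    # -> 4 = 2^(N-1) equally likely branches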
To quantify the fidelity of the produced logical states, here we calculate tr(ρ|ψ⟩⟨ψ|), where ρ is the logical density matrix obtained from full state tomography and |ψ⟩ is the target pure state. Averaged over all generated angles, we find fidelities of 98.98(7)%, 98.2(2)%, 98.9(5)% and 93.7(1.3)% for N = 0, 1, 2 and 3 T gates, respectively. The corresponding acceptance fractions are 29%, 28%, 5.2% and 0.54%.
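For a pure target, this fidelity reduces to ⟨ψ|ρ|ψ⟩; a minimal sketch (with an arbitrary density matrix standing in for the tomography output):

import numpy as np

# Fidelity tr(rho |psi><psi|) = <psi|rho|psi> for a pure target state; in the
# experiment rho comes from full state tomography, here it is a stand-in.
def logical_fidelity(rho, psi):
    psi = psi / np.linalg.norm(psi)
    return float(np.real(np.conj(psi) @ rho @ psi))

psi_T = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)  # a |T>-like target
rho = 0.95 * np.outer(psi_T, psi_T.conj()) + 0.05 * np.eye(2) / 2
print(logical_fidelity(rho, psi_T))                         # -> 0.975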
These protocols are scalable to larger codes and can broadly be understood as the same protocol as 15-to-1 magic state distillation but with the inner surface codes being of distance d = 1. To improve the performance of this protocol and further suppress errors, we anticipate a path of using concatenated surface codes for building each Reed–Muller code.
Physical resources for QEC
In this work, we study the relationships of many different physical resources and how they are used in QEC. We overview here some of our observations and discuss how these can be useful for developing future QEC protocols and architectures.
Logical entanglement and physical entanglement
In a transversal gate setting, logical entanglement can be generated using only physical entanglement between the code blocks. This is in contrast to lattice surgery, in which entanglement within the blocks is necessary to mediate the interaction between non-overlapping logical operators, and so robust entanglement within the block is required. This is the origin of the sensitivity to measurement errors in the lattice surgery context and the insensitivity to measurement errors in the transversal gate context. Logical entanglement within the code block also plays an interesting role. The [[16, 6, 4]] codes, for example, contain many logical qubits within the block, which can be entangled, but only with a sufficient degree of physical entanglement present (discussion below).
Motivated by these observations, one way to re-frame efficient encodings is to find methods that generate the target logical entanglement with the minimum amount of physical entanglement. To this end, we first explore how the amount of logical entanglement—even generated with techniques such as permutation gates—is bounded by the amount of physical entanglement.
Operator entanglement quantifies the maximum entanglement a gate can produce on separable inputs. For any gate acting on k qubits, the operator entanglement is bounded above by ⌊k/2⌋. Thus, for a quantum code with parameters [[n, k, d]], the logical operator entanglement satisfies SLO ≤ ⌊k/2⌋. To characterize the physical entanglement entropy, note that logical operators cannot be supported on any set of d − 1 or fewer physical qubits. Therefore, for any region A with |A| ≤ d − 1, all logical codewords yield identical reduced density matrices on A (not necessarily maximally mixed). We consider these regions to remove state dependence. In stabilizer codes, SPS(A) = |A| − rA, where rA counts stabilizers fully supported in A (ref. 122). We assume rA = 0 for all |A| ≤ d − 1 (no stabilizer fully inside any correctable region); this holds, for example, for every size-(d − 1) subregion in the [[16, 4, 4]] code. Then SPS(A) = |A|, so for |A| = d − 1, we have SLO ≤ SPS(A) whenever ⌊k/2⌋ ≤ d − 1 (that is, k ≤ 2d − 1).
Thus, for an [[n, k, d]] code with k ≤ 2d − 1 and rA = 0 for all |A| ≤ d − 1 (for example, [[16, 4, 4]]), the logical operator entanglement cannot exceed the physical entanglement available in any region of size d − 1.
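The quantity SPS(A) = |A| − rA can be evaluated mechanically from a stabilizer generator matrix. The following sketch computes rA over GF(2), using the small [[4, 2, 2]] code as our own illustrative example (not a code used in this work):

import numpy as np

# S_PS(A) = |A| - r_A for a stabilizer code; r_A counts independent
# stabilizer group elements supported entirely inside region A.
def gf2_rank(M):
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivots = np.nonzero(M[rank:, col])[0]
        if len(pivots) == 0:
            continue
        piv = rank + pivots[0]
        M[[rank, piv]] = M[[piv, rank]]          # move pivot row up
        mask = M[:, col] == 1
        mask[rank] = False
        M[mask] ^= M[rank]                       # clear the column elsewhere
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

def region_entropy(gens_xz, n, region):
    comp = [q for q in range(n) if q not in region]
    cols = comp + [n + q for q in comp]          # X and Z columns outside A
    r_A = gens_xz.shape[0] - gf2_rank(gens_xz[:, cols])
    return len(region) - r_A

gens = np.array([[1, 1, 1, 1, 0, 0, 0, 0],       # XXXX (x-part | z-part)
                 [0, 0, 0, 0, 1, 1, 1, 1]])      # ZZZZ
print(region_entropy(gens, n=4, region=[0]))     # -> 1, since r_A = 0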
These observations may be applicable to finding efficient algorithm compilations with high-rate codes and transversal operations, both of which we observe here can reduce the amount of physical entanglement needed to realize the target logical entanglement. For instance, each transversal CNOT in the [[16, 6, 4]] code generates 16 physical CNOTs worth of entanglement and 6 logical CNOTs worth of entanglement, but realizing in-block permutation CNOTs can generate an additional 4 × 2 logical CNOTs, totalling 14 logical CNOTs worth of entanglement, close to the bound of 16 physical CNOTs.
Physical entanglement and logical magic
Although physical entanglement is the underpinning of logical entanglement, it is also the underpinning of logical magic. In particular, we find here that states with logical magic require more in-block entanglement than states without any logical magic. This can be understood by the fact that, whereas logical Pauli states such as |+L⟩ are represented by operators XL = X1X2X3… (in CSS codes), which is a tensor product of physical operators, states such as |TL⟩ are represented by \(\frac{1}{\sqrt{2}}({X}_{1}{X}_{2}{X}_{3}\ldots +{Y}_{1}{Y}_{2}{Y}_{3}\ldots )\), involving a macroscopic superposition of operators spanning the code that is necessarily entangled123,124. Analogously, any physical product state that has deterministic X-type stabilizers must have zero expectation value for YL. The need for well-defined stabilizers in both bases is thereby another way to see that the code must be entangled. Similarly, CSS codes are constructed from two classical codes1,2,23,125, and Pauli states are ‘classical’ in that they store 1 bit of information in one of the two classical codes (and 0 bits in the other), whereas |T⟩ states truly require both codes.
These observations suggest a potentially more fundamental mechanism of which algorithmic outputs do and do not need full protection. For example, consider making a remote entangled Bell pair. If we probe its fidelity with XLXL and ZLZL entanglement witnesses, then with correlated decoding methods we do not need a high degree of entanglement within the individual code blocks—just between the blocks. However, if we would instead like to perform an error-corrected Bell inequality test46, to provide evidence that quantum mechanics is real, then we have to measure in the |T⟩ basis and require the full entanglement within the block. It has been argued that so-called quantum contextuality, which arises from measurements in non-Pauli bases, is the core aspect of quantum mechanics that cannot be described by classical theories126. Relatedly, theoretical work has shown a connection between contextuality and computational hardness51, and in this work, we find that both of these are also linked to the minimum amount of entanglement required to perform the requisite error correction. Understanding the essence of these connections may hint at further avenues to reduce resource requirements for protecting the relevant algorithmic outputs. An experimental error-corrected Bell inequality test is shown in Extended Data Fig. 9f.
Logical gate fidelity and physical entropy
With physical qubits, which are two-level systems, fidelity is a descriptive and accurate concept, as noise can often be decomposed into realizing either the correct operation or the exact opposite (for example, a bit-flip error). Conversely, logical qubits are many-level systems, and so this property does not hold. This fact is related to our observation in Fig. 3d that the error per logical operation is not constant as a function of the number of applied logical gates. Instead, there is a logical fidelity associated with the probability of decoding correctly103, which depends on the internal density of errors p.
Theoretically, the per-step logical error from decoding scales approximately as \({P}_{L}\propto {(p/{p}_{th})}^{(d+1)/2}\) (ref. 103). The results shown in Fig. 3d indicate that quantifying logical gate performance should encapsulate \(\{{F}_{L}({p}_{\det }),\Delta {p}_{\det }\}\), which capture how the logical fidelity FL depends on the internal density of errors p, as well as the increase Δp of the local error density contributed by the gate. We study this quantitatively in Extended Data Fig. 7.
Additional experiment and data analysis details
In Fig. 1d, we prepare either \(| {+}_{L}\rangle ={| +\rangle }^{\otimes 25}\) or \(| {0}_{L}\rangle ={| 0\rangle }^{\otimes 25}\) and apply up to five rounds of stabilizer measurement followed by measurement in the X or Z basis, respectively. A global Z(θ) rotation is applied to the data qubits at every gate layer (20 time steps in total). For fewer than five stabilizer measurement rounds, the relevant CZ gates are removed, but single-qubit rotations are still applied. One QEC round has gates in the first round only, two QEC rounds have gates in the first and fourth rounds, and five QEC rounds have gates in all five rounds. The data are averaged over both initial states and use MLE decoding with a 50% acceptance fraction for visual clarity, as well as pre-selection on perfect initial qubit filling. The right plot uses an injected error of θ/2π = 0.016, and additional error rates are shown in Extended Data Fig. 7b.
No postselection is used in the analysis of the surface code in Fig. 2. Pre-selection of initial qubit rearrangement (standard in the literature) is used. Data in Fig. 2b–d are averaged over |+L⟩ and |0L⟩, and the distributions in Fig. 2e,g aggregate the two bases. Figure 2b,c plots the detector error probability averaged over all rounds. Figure 2b uses the same dataset with shots binned according to data qubit loss. The four metrics are (1) ‘bare’, where loss is converted to qubit state 0, effectively corresponding to no loss detection; (2) ‘detect loss’, where projective measurements whose value is ‘loss’ are not erroneously counted; (3) ‘supercheck’, where stabilizers with a lost data qubit are formed into superchecks for all prior rounds; and (4) ‘postselected’, where detectors involving any lost atoms are ignored. The plots show the mean error of all deterministic detectors (96 per basis). The supercheck error is calculated over all samples per round per basis, and the mean of these eight values is plotted. Superchecks paired to the boundary are removed from the averaging as these return no error by construction; if included, the mean error decreases from 9.0% to 8.8%. The contribution of each supercheck is normalized by the supercheck weight, for example, a weight-6 supercheck contributes an error of 4/6 (to account for the greater amount of information in the check—for example, multiplying checks even in the absence of loss raises the detector error without reducing the amount of information). Without reweighting, the error probability increases to 9.6%.
In Fig. 2d,e, the LEPR is calculated as \({\rm{LEPR}}=0.5(1-{(1-2{p}_{L})}^{1/r})\) where pL is the final logical error after r = 4 rounds, same as the definition in ref. 7. The d = 5 dataset contains 9,021 shots in the X basis and 5,834 in the Z basis. The d = 3 dataset contains 2,523 shots for X and 2,534 for Z (on average per quadrant). To make a d = 3 surface code in each of four possible quadrants, we only remove atoms and do not modify the circuit. The specific circuit for the repeated stabilizer measurement is shown in Extended Data Fig. 6c. Not shown are local Y(π) and Y(π/2) gates on the boundary ancillas (see Supplementary Information and Stim circuit). Additionally, local detunings127 are applied to the lowest row of gate sites (where there are only isolated ancilla qubits) to mitigate inhomogeneity in the 1,013-nm lightshift during entangling gates.
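The LEPR definition above can be transcribed directly; a minimal snippet with illustrative numbers:

# Direct transcription of the LEPR definition: invert the r-round decay
# 1 - 2*p_L = (1 - 2*LEPR)**r for the per-round error.
def lepr(p_L, rounds):
    return 0.5 * (1.0 - (1.0 - 2.0 * p_L) ** (1.0 / rounds))

print(lepr(p_L=0.05, rounds=4))   # -> ~0.013, i.e. about 1.3% per round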
In Fig. 2f, the error budget shows the contributions to the detector error (with loss detection) and is obtained by removing error sources individually from the simulation error model. We obtain a similar error model breakdown by simulating the relative contribution to the logical error. Figure 2g shows the detector distribution with loss detection. Extended Data Fig. 6f compares the bare detector error with the simulation.
See Supplementary Information for the error model, including quantitative error budget and pseudocode for simulation, an animation showing the moves realized experimentally and an annotated version of the raw experimental command strings used to realize the circuit. See ref. 41 for all raw experimental shots, the analysis notebook and trained machine learning decoders.
All data in Fig. 3 use MLE decoding and are pre-selected on perfect initial qubit filling. In Fig. 3c,d, we prepare logical Bell states using either transversal gates or lattice surgery and measure the mean error in the resulting XX and ZZ parities. The error per logical operation is defined as \(\varepsilon =0.5(1-{(1-2{p}_{L})}^{1/3N})\) for N transversal gates per round and Bell state infidelity pL, and ε = pL for the lattice surgery logical product measurement. In Fig. 3c, the transversal CNOT is shown for three CNOTs per QEC round. For each experimental shot, the measurement error is applied in-software with probability P independently for every ancilla qubit (an error on a lost atom does nothing). The modified measurements are then decoded, with the ancilla measurement error in the MLE noise model increased appropriately. An acceptance fraction of 1 is used for the transversal CNOT plots unless otherwise stated. In Fig. 3d, the lattice surgery point uses error detection on the middle three ancillas (requiring each to have the same value in both rounds of stabilizer measurement) to compensate for having fewer than d rounds of repeated syndrome measurement for this result. We find in numerical simulations using our experimental error model that the optimal number of QEC rounds for this circuit is approximately 3 (as opposed to 5), and that by using error detection with two rounds, we recover a similar performance to this optimal value found in numerics (Extended Data Fig. 8d). Although transversal operations are sensitive only to space-like errors with correlated decoding, additional time-like errors in lattice surgery lead to the higher logical error and the sensitivity to ancilla measurement errors studied in Fig. 3c.
To modify the stabilizer signs in Fig. 4a, local π pulses are applied at the end of the encoding circuit. Negative stabilizers correspond to flipped qubits on the four corners of the Reed–Muller tetrahedron. We use a lookup table for decoding and plot all three 3D colour code curves with an acceptance fraction of 46% and the 2D colour code with 74%, corresponding to a rescaling by the number of physical qubits in the code. This slightly sharpens the plateaus, which are otherwise smoothed in the presence of logical Z errors. For postselection, the shots are ordered by the weight of the detected error. To highlight the key features, the curves are further normalized by the purity (Extended Data Fig. 9a, top) and the maximum stabilizer expectation value (Extended Data Fig. 9a, bottom), with unnormalized data shown in Extended Data Fig. 9a. Figure 4c uses error detection and plots the angles for ≤N T gates. All plots are postselected on no loss and perfect initial qubit filling.
In Fig. 5c, we study the atom temperature and loss as a function of cycle using the circuit for state preparation of Steane codes (Fig. 6b) with only entangling gates removed. In the fifth cycle, we turn off all imaging and cooling light. For comparison, the same measurements are repeated with conventional 3D PGC imaging and cooling in place of the local techniques. To extract the atom temperature shown in Fig. 5c (top), we use a drop-recapture measurement after N cycles and fit the resulting loss to a Monte Carlo simulation80. Shaded regions indicate the range of fitted temperatures due to uncertainty in trap parameters.
The [[7, 1, 3]] and [[16, 6, 4]] codes in Fig. 6, as well as the [[15, 1, 3]] code in Fig. 4, are members of the family of quantum Reed–Muller codes based on the hypercube encoding circuit shown in Extended Data Fig. 10a (see also Supplementary Videos; ref. 128). For each code, a different pattern of local Y(π/2) pulses is applied, whereas the entangling gate structure is the same; for the 2D [[7, 1, 3]] code, the fourth layer of gates is turned off. Although the encoding circuit alone is not fault-tolerant, a verification protocol128 or ancilla flag qubits11 can be directly added in future experiments. In Fig. 6b, groups of 16 independent [[7, 1, 3]] codes are prepared in parallel in each time layer, repeated for 27 layers. The stabilizer error probability as a function of layer is plotted for shots with no loss in the code block.
To characterize the propagation of physical and logical information in deep circuits, we further entangle the codes into 1D and 2D cluster states129. Starting with two groups of logical qubits, group A and group B in Fig. 6a, these are entangled to form the first two time layers of the cluster state. Owing to the local entanglement structure of a cluster state, group A undergoes no further entangling gates and is idle until its measurement (in the appropriate basis); the measurement can therefore be performed and the same physical qubits reused to form the third layer of the cluster state (in typical MBQC fashion). Group B can then be measured and reused to form the fourth layer of the cluster state, and so on. This alternating structure is typical of MBQC using cluster states19.
The physical correlations in Fig. 6c,d,g are calculated as the covariance between errors (stabilizer = −1) on the same stabilizer between codes at different coordinates in the cluster state. The covariance is then averaged across all co-propagating cluster states and the different stabilizers (three for [[7, 1, 3]] and five for [[16, 6, 4]]). The logical correlations in Fig. 6c,g are calculated as the appropriate product of cluster state stabilizers between the two target coordinates (cluster states have stabilizers corresponding to XiΠjZj, where i is a specific site and j runs over its neighbours). For example, we define ⟨Z0Z4⟩ ≡ ⟨(Z0X1Z2) ⋅ (Z2X3Z4)⟩. The single-qubit expectation values ⟨Zi⟩ are calculated using a lookup table decoder for [[7, 1, 3]] and raw values for [[16, 6, 4]]. Owing to the underlying assumption of time and space invariance, that is, that correlations depend only on relative coordinates, we truncate the time layers in which the reservoir begins to be depleted and this assumption breaks down. This corresponds to 16 layers for Fig. 6c, 13 layers for Fig. 6d (correlations plot only) and 12 layers for Fig. 6g. Truncation has only a small effect on the measured logical correlations; without it, a longer tail of physical correlations appears because atoms are not properly refilled once the reservoir begins depleting.
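The correlation estimator is simply a covariance between binary error indicators. A minimal sketch with synthetic data (the error probabilities are our own illustrative choices):

import numpy as np

# Covariance between binary error indicators (stabilizer = -1) for the same
# stabilizer on codes at two cluster-state coordinates.
def error_covariance(errs_i, errs_j):
    return np.mean(errs_i & errs_j) - np.mean(errs_i) * np.mean(errs_j)

rng = np.random.default_rng(1)
shared = rng.random(10_000) < 0.02             # a common error source
errs_i = shared | (rng.random(10_000) < 0.05)  # plus independent noise
errs_j = shared | (rng.random(10_000) < 0.05)
print(error_covariance(errs_i, errs_j))        # positive, roughly 0.018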
All logical operators in Fig. 6 are decoded with machine learning, which directly predicts the cluster state stabilizers. The 2D cluster state stabilizers in Fig. 6d use a global acceptance fraction of 0.24%. In Fig. 6c,g, we instead use a global confidence threshold for each curve, which is then converted to a mean acceptance fraction. In this way, each curve corresponds to a constant effective error rate (equivalently, constant effective entropy) for the logical operator independent of its weight, resulting in a reduced acceptance fraction for higher-weight operators. By contrast, fixed-acceptance postselection would bias higher-weight operators to a higher entropy compared with lower-weight operators. The confidence for products of the weight-3 logical stabilizers is given as the geometric mean of the constituent confidences. Figure 6g uses a mean acceptance fraction of 3.4% (same data for both curves). The 2D [[16, 6, 4]] cluster state in Fig. 6i also uses the confidence-based postselection, in which the confidence per cluster state stabilizer is the geometric mean of the six decoded co-propagating 2D cluster states. On top of this decoding postselection, the logical stabilizer expectation value is shown as a function of the minimum number of co-propagating operators, N, with the same measurement outcome. We take the mean of all combinations of choosing N out of 6 such operators.
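The confidence-based postselection can be sketched as follows (a toy illustration in which the random confidences and threshold are placeholders, not decoder outputs):

import numpy as np

# Keep shots whose decoder confidence exceeds a global threshold; products
# of operators use the geometric mean of the constituent confidences.
rng = np.random.default_rng(2)
conf_a = rng.random(1000)                     # decoder confidence, operator A
conf_b = rng.random(1000)                     # decoder confidence, operator B
combined = np.sqrt(conf_a * conf_b)           # geometric mean for the product
accepted = combined >= 0.8
print(f"mean acceptance fraction: {accepted.mean():.1%}")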
The permutation CNOT in Fig. 6g is applied in software, and its effect here is to increase the weight of the operator connecting coordinates ti and tj, labelled as an effective separation i − j. Following the definitions in ref. 25, two permutation CNOTs (swapping a pair of rows and a pair of columns) convert the cluster state stabilizers supported on logical qubits 3–6 from four weight-3 to one weight-3, two weight-6 and one weight-12 operator.
See Supplementary Information for an annotated version of the raw experimental command strings used to realize the circuit.