Abstract
Memristive in-memory computing has demonstrated potential for solving matrix equations in scientific computing. However, the inherent inaccuracies of analog mechanisms create challenges in achieving high-precision solutions while maintaining low energy consumption. This study introduces a memristive matrix equation solver that considerably accelerates solutions by performing mathematical iterations directly within the analog domain. Our approach facilitates rapid approximate solutions with a scalable circuit topology and expedites the high-precision refinement process by substantially reducing the digital-to-analog conversion overhead. We experimentally validated this methodology using a heterogeneous computing system. We performed simulations of multiple scientific problems on these circuits, including solving the diffusion equation and modeling equilibration in silicon P-N junctions. Notably, our memristive solver, combined with digital refinement, achieved software-equivalent precision (with errors below 10^−12). Compared to conventional digital processing units, this approach offered a 128-fold improvement in solution speed and a 160-fold reduction in energy consumption. This work establishes a foundation for future scientific computing using imprecise analog devices.
INTRODUCTION
Modern scientific and engineering simulations span a wide range of fields, from microscale phenomena to macroscale systems, including density functional theory, technology computer-aided design, and computer-aided engineering. These simulations inherently involve solving partial differential equations (PDEs) (1, 2), which are typically converted into matrix equations using mathematical discretization methods (Fig. 1A) (3). Consequently, solving matrix equations efficiently is central to contemporary computational modeling techniques. In high-precision simulations, modern computing systems face increasing demands for solving complex, high-dimensional matrix equations. However, mainstream digital computers are constrained by the high polynomial-time complexity of digital solution methods, such as iterative algorithms [typically O(N^3), where N represents the problem size]. In addition, the von Neumann bottleneck—stemming from the separation of memory and processing units—further exacerbates processing time and energy consumption (Fig. 1B) (4–6).

Fig. 1. Concepts of the memristor-based fully analog iteration.
(A) Computing simulations from the macro to micro scale—such as computer-aided engineering, technology computer-aided design, and density functional theory—are commonly realized by solving matrix equations. (B) Architecture of the von Neumann computer and processing workflow of the MVM. (C) Analog MVM using a memristor array. (D) Comparison of the processing time complexity of the digital computer and the memristor array. (E) Workflow of the previous hybrid iteration-solving concept; while capable of achieving high solution precision, it suffers from enormous data-conversion overhead and is constrained by the convergence speed of digital iteration. (F) Workflow of the previous analog inverse-solving concept; while enabling high-efficiency processing with low time complexity, it faces challenges due to high-precision multilevel programming requirements and restricted scalability. (G) Workflow of the proposed fully analog iteration-solving concept. This concept overcomes the limitations of existing memristive matrix equation solvers and can substantially accelerate the solution of matrix equations while maintaining remarkable energy efficiency.
Recently, memristive in-memory analog computing has emerged as a promising solution for these challenges (7). Leveraging Ohm’s and Kirchhoff’s laws, memristor-based crossbar arrays facilitate highly parallel analog matrix-vector multiplication (MVM) (Fig. 1C) (8, 9). This O(1)-complex analog computing mechanism substantially outperforms traditional digital computing platforms, such as central processing units (CPUs) and graphical processing units, in terms of data-intensive matrix processing efficiency (Fig. 1D). Memristor technology has demonstrated its effectiveness in diverse applications, including signal and image processing, deep neural networks, and data mining (9–17).
Although memristor arrays hold the potential for accelerating the solutions of matrix equations (18–20), they encounter unique challenges compared to other applications, where analog computing concepts achieve high efficiency without severe accuracy degradation. Solving matrix equations requires balancing energy efficiency, time complexity, and solution precision. This trade-off arises primarily from the intrinsic imprecision of analog MVM concepts (21). A promising strategy to mitigate analog computing errors involves a memristive hybrid iteration approach (Fig. 1E) (18, 19, 22). This approach uses memristor arrays to accelerate imprecise MVM processing, combined with high-precision mathematical iterations using peripheral digital processors. While a recent hybrid approach enhances the energy efficiency of analog MVM operations through improved circuit structures and programming methods, it still relies heavily on frequent analog-to-digital conversion, leading to additional energy overhead (23). Furthermore, the processing speed remains constrained by the digital iteration procedure, limiting the complete utilization of the advantages offered by analog computing. Beyond this architecture, the solution of matrix equations can also be accelerated entirely in the analog domain using traditional analog matrix inversion techniques (24). Recent studies have demonstrated that this approach can achieve O(1) solution time complexity (20, 25–27). However, these analog computing implementations require precise multilevel conductance programming of memristors, which poses scalability challenges and limits solution precision (Fig. 1F). Despite these drawbacks, fully analog processing still offers a pathway for accelerating matrix equation solutions.
To address these challenges, we developed a concept integrating hybrid iteration approaches into the analog domain to solve matrix equations (Fig. 1G). This approach offers the following advantages: (i) The proposed analog computing approach achieves accelerated solutions with high efficiency and O(1) time complexity. In addition, it emphasizes hardware scalability and enables fully analog solutions using binary memristors, eliminating the need for stable multilevel devices. (ii) This approach supports integration with digital refinement procedures, creating a rapid, high-precision solver with substantially reduced analog-to-digital conversion overhead. We experimentally validated this analog computing concept using fabricated W/AlOx/Al2O3/Pt memristor arrays. The analog iteration method provided approximate solutions of the matrix equation. Furthermore, an integrated heterogeneous computing system demonstrated the effectiveness of analog iterations with digital refinement (AIDR) for achieving high-precision results. Our method was applied to solve multiple scientific problems, including modeling diffusion phenomena and equilibrium in semiconductor P-N junctions. The AIDR solver achieved software-equivalent solution precision (solution errors below 10^−12) and delivered remarkable improvements in solution speed (approximately 41× to 128×) and energy efficiency (approximately 82× to 160×) compared to state-of-the-art digital processors. We believe that the superior performance of the AIDR indicates its potential as a promising memristor-based computing platform for the efficient acceleration of scientific computing.
RESULTS
Experimental implementation of fully analog iteration with memristor arrays
Our concept stems from applying a mathematical iteration formula to solve the matrix equation A·x = b (where A is the known coefficient matrix, b is the known target vector, and x is the unknown solution vector) within the fully analog domain (25). Specifically, the mathematical iteration involves finding a matrix B and a vector f such that I − B is invertible (I being the identity matrix) and constructing a convergent sequence of approximate solutions x_k that converges to the precise solution x* = (I − B)^−1·f, where

x_{k+1} = B·x_k + f    (1)
For example, in the Jacobi method, the matrix equation is converted into Eq. 1 using B = −D^−1·(L + U) and f = D^−1·b, where A = D + L + U, with D the diagonal part of A and L and U its strictly lower and upper triangular parts (refer to Materials and Methods) (28). Notably, both the matrix B and the vector f remain fixed during iterations, and the convergence condition requires the spectral radius of B to be less than one [ρ(B) < 1].
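As a software reference, the Jacobi splitting and the iteration of Eq. 1 can be sketched in a few lines of NumPy (a minimal illustration of the mathematics only, not of the analog hardware; the test matrix is an arbitrary diagonally dominant example):

```python
import numpy as np

def jacobi_split(A, b):
    """Split A x = b into the fixed-point form x = B x + f (Jacobi method)."""
    D = np.diag(np.diag(A))                      # diagonal part of A
    B = np.eye(len(A)) - np.linalg.inv(D) @ A    # B = I - D^-1 A = -D^-1 (L + U)
    f = np.linalg.solve(D, b)                    # f = D^-1 b
    return B, f

def jacobi_solve(A, b, iters=100):
    """Iterate x_{k+1} = B x_k + f until (approximate) convergence."""
    B, f = jacobi_split(A, b)
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = B @ x + f                            # Eq. 1
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])           # diagonally dominant test matrix
b = np.array([1.0, 2.0])
x = jacobi_solve(A, b)
```

For a diagonally dominant matrix such as this one, ρ(B) < 1 holds and the sequence converges geometrically toward the exact solution.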
Typically, numerical iteration methods may require thousands of iterations for convergence, particularly when the matrix scale or condition number is high. In contrast, as illustrated in Fig. 2A, the proposed circuit executes the iteration of Eq. 1 in the analog domain as a single-step operation, substantially reducing time complexity. On the basis of the structure of Eq. 1, the analog iteration hardware comprises two components: the memristor cores used to calculate the product B·x_k and the peripheral analog circuits used to add the vector f and feed the result back for subsequent iterations. To minimize the need for high-precision multilevel conductance programming of the device, we introduced a binary shift-and-add operation in the analog domain. Consequently, matrix B was sliced into multiple bits, with each bit mapped to an individual memristor core. The analog shift-and-add (aSAA) operation accumulates the partial products and delivers the complete product B·x_k. Because memristor conductance states can only map positive values, we used two arrays (connected through analog inverters) as the memristor core to map the signed matrix values (see Materials and Methods).
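The bit-slicing and shift-and-add accumulation can be mimicked in software as follows (a hypothetical fixed-point sketch: the 4-bit width and the scaling are illustrative choices, and in the hardware the shifted sums are formed by the aSAA circuit and the sign by paired arrays with inverters, not digitally):

```python
import numpy as np

def bit_slice(B, n_bits=4):
    """Quantize |B| (assumed scaled into [0, 1)) into binary bit planes.
    Signs are kept separately, mirroring the paired positive/negative arrays."""
    sign = np.sign(B)
    q = np.round(np.abs(B) * (2 ** n_bits)).astype(int)   # fixed-point magnitude
    planes = [(q >> k) & 1 for k in range(n_bits)]        # one binary matrix per bit
    return sign, planes

def shift_and_add_mvm(sign, planes, x, n_bits=4):
    """Accumulate shifted binary partial products: sum_k 2^k (plane_k x) / 2^n_bits."""
    acc = np.zeros(len(x))
    for k, plane in enumerate(planes):
        acc += (2 ** k) * ((sign * plane) @ x)            # kth partial product
    return acc / (2 ** n_bits)                            # undo fixed-point scaling

B = np.array([[0.5, -0.25], [0.125, 0.75]])               # entries exact in 4 bits
x = np.array([1.0, 2.0])
sign, planes = bit_slice(B)
y = shift_and_add_mvm(sign, planes, x)                    # equals B @ x here
```

Because each plane contains only binary values, each memristor core needs only two stable conductance states, which is the scalability argument made above.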

Fig. 2. Experimental demonstration of the fully analog iteration concept.
(A) Schematic design of the memristive fully analog iteration circuit for matrix equation solution. The matrix equation is converted to the iterative solution format, and the analog iteration is performed by mapping matrix B to the arrays and using vector f as the circuit excitation. The inset shows the detailed design of the aSAA circuit, where 2·R1 = R2. (B) Image of the constructed analog iteration hardware with eight memristor arrays. (C) Cross-sectional TEM image of the W/AlOx/Al2O3/Pt-stacked memristor. (D) Robust retention characteristics of the memristor (plotted over 3 × 10^4 s) at 125°C. (E) Device-to-device variation of approximately 1% for both LRS and HRS among the 64 devices in the eight arrays, achieved by the coarse-to-fine write-with-verify strategy. (F) Experimental demonstration of the analog iteration. The analog iteration outputs converge within approximately 30 μs, on average, over 10 tests. (G) Comparison between the measured analog iteration outputs and the analytical solution. The average vector residual error is evaluated to be 1.2 × 10^−3 over 10 tests. (H) Measured output voltages as a function of the scaling parameter β, which controls the input vector f. The experimental results agree with the analytical solutions. (I) Experimental solution of the matrix inverse A^−1 using the analog iteration hardware. The matrix product A·M approaches the identity matrix (inset), and the evaluated square absolute solution error is 0.013.
The fully analog iteration concept was demonstrated using our custom memristor hardware platform, as shown in Fig. 2B. This platform consists of eight fabricated 4 × 2 memristor arrays, each featuring two independent bottom electrode lines to prevent sneak-path currents. Figure 2C shows the transmission electron microscope (TEM) image of one W/AlOx/Al2O3/Pt-stacked memristor in the arrays. The AlOx layer serves as a source of oxygen vacancies, while the stoichiometric Al2O3 layer acts as an oxygen vacancy receiver (fig. S1) (29). The memristors exhibit remarkable endurance (10^10 cycles), high speed (25 ns), and low energy consumption (11.99 fJ for set operations, 49.40 pJ for reset operations), as shown in fig. S2. Furthermore, conductance stability, high-precision conductance programming, and linear readout regions are crucial for analog MVM methods. Figure 2D presents robust retention of the low-resistance state (LRS) and high-resistance state (HRS) observed for more than 3 × 10^4 s at 125°C. High device-to-device uniformity (among the 64 devices in the eight arrays) was obtained using a high-precision coarse-to-fine write-with-verify strategy (fig. S3, A and B, which provide details on the conductance programming methods and high-precision binary-state programming). The device-to-device variation was 1.018% for LRS and 1.013% for HRS, as shown in Fig. 2E. Analog conductance programming, as depicted in figs. S3C and S4C, proved less efficient and carried the risk of misreading due to the nonlinearity of the readout. Furthermore, our device supports a linear readout region spanning approximately 0 to 1 V, which indicates a broad dynamic voltage range for analog iteration computing (fig. S4, A and B).
Figure 2F presents an example of solving a 4 × 4 matrix equation using the hardware and the Jacobi iteration method (test bench details are provided in table S1). The analog iteration procedure converges within approximately 30 μs (average latency based on 10 tests). The average residual solution error for the analog iteration solution is approximately 1.2 × 10^−3 when compared to the analytical solution (Fig. 2G). This result confirms that the hardware can provide an accurate approximate solution. To further analyze the solution, the input voltage vector f was varied linearly with a scaling parameter β, which was swept uniformly over the range of −1 to 1. Figure 2H compares the measured output voltages to the analytical solution. The average relative error remains below 0.3% across this range, demonstrating the effectiveness of analog iterations in achieving accurate solutions under large-input voltage conditions. Notably, the matrix equations were solved physically by the hardware in a single step.
The concept of a fully analog iteration was extended to compute the inverse of matrix A, satisfying A·A^−1 = I. The ith column of A^−1 was obtained by setting the ith column of I as the target vector b in the equation A·x = b. Consequently, the 4 × 4 matrix inversion was processed in four steps, whereas classical digital methods exhibit O(N^3) time complexity. Figure 2I displays the experimentally measured product A·M, which approximates I, where M represents the measured analog iteration solution. The squared absolute solution error, averaged across 10 cycles, was 0.013 relative to the analytical solution, verifying the ability to approximate the matrix inverse.
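Column-wise inversion reduces to N independent solves of A·x = e_i, one per column of the identity; a sketch with a generic solver standing in for the analog hardware:

```python
import numpy as np

def inverse_by_columns(solve, A):
    """Assemble A^-1 one column at a time: column i is the solution of A x = e_i."""
    N = A.shape[0]
    I = np.eye(N)
    cols = [solve(A, I[:, i]) for i in range(N)]   # N independent solves
    return np.column_stack(cols)

A = np.array([[4.0, 1.0], [1.0, 3.0]])
M = inverse_by_columns(np.linalg.solve, A)         # stand-in for the analog solver
err = np.linalg.norm(A @ M - np.eye(2)) ** 2       # squared solution error of A*M vs I
```

Because the N solves are mutually independent, they could in principle also be dispatched to the hardware in parallel.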
Because the analog iteration circuit functions as a feedback circuit, analysis reveals that satisfying the numerical convergence condition [ρ(B) < 1] is imperative for stable operation in the analog domain (text S2). This implies that the analog iteration concept ensures absolute stability of the feedback circuit, allowing the convergence of Eq. 1. Following this guideline, additional benchmarks show reduced convergence speed as ρ(B) approaches one (fig. S6). Moreover, the condition number k of matrix A substantially affects the stability of solutions to A·x = b (30, 31). We also evaluated the effect of k on the solution precision of the analog iteration by using matrices A with increasing condition numbers. The results indicate that the solution error increases with the condition number of the matrices, which is critical for practical applications (fig. S7).
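Both quantities discussed here are easy to check numerically before mapping a problem onto the hardware; a sketch for the Jacobi iteration matrix of a small example system:

```python
import numpy as np

def spectral_radius(B):
    """Largest eigenvalue magnitude; the iteration converges iff rho(B) < 1."""
    return np.max(np.abs(np.linalg.eigvals(B)))

A = np.array([[4.0, 1.0], [1.0, 3.0]])
D = np.diag(np.diag(A))
B = np.eye(2) - np.linalg.inv(D) @ A   # Jacobi iteration matrix B = -D^-1 (L + U)
rho = spectral_radius(B)               # well below one: fast convergence expected
kappa = np.linalg.cond(A)              # condition number k of A
```

A small ρ(B) predicts fast settling of the feedback circuit, while a large κ(A) warns of amplified solution error, matching the trends in figs. S6 and S7.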
Framework and experimental demonstration of the AIDR solver
In the aforementioned fully analog iteration demonstration, the solution error was limited to the order of 10^−3, and solution precision was influenced by matrix condition numbers. However, for practical applications, the solution error must be less than 10^−7, comparable to the numerical precision of single-precision floating-point processing (32). To address this, we introduced the AIDR solver, as shown in Fig. 3A (the internal connection of the hardware is illustrated in fig. S8). This framework achieves high-precision solutions using the Richardson refinement algorithm (refer to Materials and Methods) (18, 33). This algorithm computes the error correction term d by solving A·d = r within the analog iteration process, where the residual r = b − A·x is calculated using a digital processor. Iterations continue until the norm of the residual r falls below a predefined tolerance, tol (Fig. 3B).
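The refinement loop can be sketched as follows, with a deliberately noisy inner solver standing in for the analog iteration (the ~10^−3 noise level mirrors the approximate-solution error reported above; the noise model itself is purely illustrative):

```python
import numpy as np

def aidr_refine(A, b, approx_solve, tol=1e-12, max_iter=100):
    """Richardson-style refinement: an imprecise inner solver (standing in for
    the analog iteration) supplies a correction d for A d = r in each cycle."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        r = b - A @ x                  # high-precision residual (digital domain)
        if np.linalg.norm(r) < tol:
            break
        d = approx_solve(A, r)         # approximate analog-domain correction
        x = x + d                      # digital solution update
    return x

def noisy_solve(A, r):
    """Toy inner solver with ~1e-3 relative error, mimicking analog imprecision."""
    rng = np.random.default_rng(0)
    d = np.linalg.solve(A, r)
    return d * (1 + 1e-3 * rng.standard_normal(len(d)))

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = aidr_refine(A, b, noisy_solve)
residual = np.linalg.norm(b - A @ x)   # drops below tol despite the noisy solver
```

Each cycle shrinks the residual by roughly the relative error of the inner solver, which is why only a handful of refinement cycles suffice.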

Fig. 3. Experimental demonstration of the AIDR solver.
(A) Image of the constructed AIDR solver. (B) Operational flowchart of the AIDR solver for a high-precision solution. (C) Measured hardware iteration procedure for solving a matrix equation. The residual error was reduced to below 10^−12. (D) Measured iteration procedure for the high-precision solution of the matrix inverse. The solution error is less than 10^−12 for each input. (E) Absolute error of the hardware-measured matrix inverse compared to the analytical results. The average solution error was calculated to be 1.07 × 10^−11.
As r converges throughout the iteration process, a voltage signal with finite resolution (e.g., 0.01 V) becomes insufficient to represent residuals smaller than this value. Consequently, the precision of the solution generated by the AIDR solver is constrained by the voltage resolution of the analog iteration (refer to fig. S9). To overcome this constraint, we introduce a scaling method for the Richardson refinement approach. This scaling adjusts the vector f to occupy the desired voltage range V_max using f_s = s·f, where the factor s is determined by V_max (refer to Materials and Methods). The vector f_s was used in the analog domain to derive an approximate solution d_s, and the error correction term d was recovered as d = d_s/s. Figure 3C illustrates the high-precision capabilities of the scaling method. The solution error converged to values below 10^−12 within 10 cycles of digital refinement, meeting practical application requirements.
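Because the exact rule for choosing s is given in Materials and Methods and not reproduced here, the sketch below assumes a simple choice, s = V_max/max|f|, purely to illustrate the scale-and-unscale pattern:

```python
import numpy as np

def scaled_correction(A, r, approx_solve, v_max=1.0):
    """Rescale a shrinking residual r to fill the usable voltage range before
    the analog solve, then undo the scaling on the returned correction."""
    s = v_max / np.max(np.abs(r))       # assumed rule: map max|r| onto V_max
    fs = s * r                          # scaled excitation vector f_s = s * f
    ds = approx_solve(A, fs)            # analog-domain solution of A d_s = f_s
    return ds / s                       # recover d = d_s / s

A = np.array([[4.0, 1.0], [1.0, 3.0]])
r = np.array([1e-6, -2e-6])             # residual far below the voltage resolution
d = scaled_correction(A, r, np.linalg.solve)
```

Since the system is linear, scaling the excitation by s scales the solution by s as well, so dividing by s exactly recovers the correction term.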
The efficacy of the AIDR solver was further validated by solving the inverse matrix from the test bench shown in Fig. 2I. For each input column of the identity matrix, the residual error converged to values below 10^−12 within eight refinement cycles (Fig. 3D). Figure 3E compares the absolute error of the measured inverse M with the analytical result. The average absolute error was approximately 1.5 × 10^−12, and the norm of the absolute error was ~1.1 × 10^−11, confirming the high precision of the matrix inverse solution.
Solving diffusion equation with a simulated AIDR solver
To evaluate the overall performance of the AIDR system in practical scientific computing problems, we used the one-dimensional (1D) diffusion equation, a fundamental PDE commonly found in semiconductor physics, as a benchmark (34). The diffusion equation is expressed as follows

∂u/∂t = D·∂²u/∂x²    (2)

where u is the substance concentration, D = 1 is the diffusion coefficient, and Dirichlet boundary conditions are imposed. Figure 4A illustrates the solution to the diffusion equation. This time-evolving PDE was converted into a matrix equation format using implicit finite difference grids (refer to text S4) (35, 36). Subsequently, the AIDR solver was used to solve the matrix equations, providing solutions at each time point.
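The implicit finite-difference conversion described above can be sketched as follows (a backward-Euler discretization with homogeneous Dirichlet boundaries; the grid size, time step, and the Gaussian initial profile are illustrative assumptions, and a direct solver stands in for the AIDR hardware):

```python
import numpy as np

def implicit_diffusion_matrix(n, dx, dt, D=1.0):
    """Backward-Euler system for u_t = D u_xx on an n-point interior grid:
    (I + a*T) u_next = u, with a = D dt/dx^2 and T the -1/2/-1 Laplacian stencil."""
    a = D * dt / dx ** 2
    A = np.zeros((n, n))
    np.fill_diagonal(A, 1 + 2 * a)
    for i in range(n - 1):
        A[i, i + 1] = -a
        A[i + 1, i] = -a
    return A

def step(A, u):
    """One implicit time step: solve A u_next = u (direct solver stand-in)."""
    return np.linalg.solve(A, u)

n, dx, dt = 128, 1.0 / 129, 0.01
A = implicit_diffusion_matrix(n, dx, dt)
xg = np.linspace(dx, 1 - dx, n)
u0 = np.exp(-((xg - 0.5) ** 2) / 0.01)   # assumed initial concentration bump
u1 = step(A, u0)                         # concentration after one time step
```

The resulting A is tridiagonal and diagonally dominant, which is exactly the sparse, well-conditioned structure exploited by the matrix slicing method below.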

Fig. 4. Simulated solution of the diffusion equation.
(A) Schematic of solving the diffusion equation using the AIDR solver. (B) Simulated analog iteration outputs during the solution procedure for the 128-point 1D grid. The analog iteration procedure converged within approximately 1 μs. (C) Digital refinement procedure at the time points of 0, 5, 15, and 19.9 s. The solution residual error drops below 10^−12 within 12 cycles. (D) Simulated AIDR outputs for the initial condition (at t = 0 s) and for t = 5, 15, and 19.9 s. (E) 3D reconstruction of the solution, showing the changes in substance concentration over time. (F) Comparison bar charts of the fully analog iteration and AIDR against the CPU regarding processing speed and energy consumption in the cases of approximate and high-precision solutions.
The obtained matrix equations contain a sparse coefficient matrix A, which yields a sparse iteration matrix B for the analog iteration. Directly mapping the entire matrix B onto the crossbar array can lead to unpredictable hardware overhead and compromise the system’s overall energy efficiency (37). To enhance the performance of the analog iteration procedure for large-scale sparse problems, we incorporated a matrix slice processing method into our analog iteration approach, as shown in fig. S10 (19). Because the fully analog iteration circuit uses an uncoupled analog multiplication and accumulation concept, multiple MVM units can precisely map nonzero subblocks of sparse matrices, thereby accelerating the solution.
Figure 4B depicts the simulated analog iteration procedure for a 128-point 1D grid, indicating convergence in approximately 1 μs. This result illustrates the rapid convergence capabilities of the analog iteration concept in diffusion modeling. Concurrently, we assessed the convergence speeds with various grid point densities, revealing that the analog iteration speeds remained consistent across scales in the diffusion modeling tasks (fig. S11). Further analysis indicated that the iteration matrix B corresponding to diffusion modeling at different grid points exhibited only slight variations in the spectral radius ρ(B). As observed in the tests, the convergence speed of the analog iteration process is governed by the value of ρ(B). Thus, with the effect of the spectral radius controlled across the diffusion modeling tasks, we evaluated the time complexity of our analog iteration concept as approximately O(1), substantially lower than the O(N^3) time complexity of traditional digital computing.
Using the AIDR architecture, the initial approximate analog solution was further refined to obtain a high-precision solution. Figure 4C illustrates the simulated digital refinement process to achieve a high-precision solution for diffusion modeling tasks. Across different time points, the residual norm reached the preset tol of 10^−12 within 12 cycles. The simulated solution vectors for the initial condition (at t = 0) and at t = 5, 15, and 19.9 s are shown in Fig. 4D. In addition, Fig. 4E presents a three-dimensional reconstruction of the solution vector at different time points, providing a snapshot of the changes in substance concentration over time. These results validate the system's ability to execute high-precision scientific computing tasks, particularly those involving time-evolving phenomena.
In the diffusion modeling task, we systematically assessed the overall system time complexity of AIDR across various grid point densities. Using different linear and sublinear models (as shown in fig. S13) to establish a correlation between the refinement iterations (preset tol = 10^−12) and problem scale, we identified a sublinear time complexity of O(N^b) for the AIDR system, with b ≈ 0.38. This represents substantial acceleration compared to digital computers (38). Following the assessment of time complexity, the benchmark in Fig. 4F reveals that the fully analog iteration offers an approximately 380-fold reduction in solution latency and an approximately 2.5 × 10^4-fold reduction in energy consumption compared with a state-of-the-art CPU. Furthermore, the overall AIDR solver shows a 128× improvement in processing speed and achieves energy consumption levels that are approximately 160 times lower (refer to text S4 for additional evaluation details).
Modeling of carriers’ equilibration in P-N junctions with a simulated AIDR solver
In the previously discussed diffusion modeling, the spectral radius of the corresponding iteration matrix B plays a pivotal role in facilitating the rapid convergence of the analog iteration. However, in more complex real-world tasks, the corresponding ρ(B) may approach one, and a larger spectral radius can substantially impede the processing speed and increase energy consumption. To further evaluate the performance of the AIDR system in handling these tasks, we modeled the potential profile of an equilibrium silicon P-N junction by solving Poisson's equation
d²ψ/dx² = −(q/ε)·(p − n + N_d − N_a)    (3)

where ψ is the electrostatic potential, q is the elementary charge, ε is the permittivity of silicon, p and n denote the hole and electron densities, respectively, and N_d and N_a denote the ionized donor and acceptor densities, respectively. The doping concentrations are specified following (39). The solution framework for this static 1D ordinary differential equation is depicted in Fig. 5A, which outlines the iterative procedure used to establish the built-in electric field (40). In this framework, the AIDR system is used to solve the matrix equations derived from each iteration using the finite difference method (refer to text S5 for detailed insights into the solution algorithm).
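As an illustration, the 1D finite-difference system assembled at each outer iteration can be sketched as follows (the charge-density profile and grid parameters are placeholders, not the paper's doping profile, and a direct solver stands in for the AIDR hardware):

```python
import numpy as np

def laplacian_1d(n, dx):
    """Second-order finite-difference operator for d^2(psi)/dx^2 with
    homogeneous Dirichlet boundary values folded into the right-hand side."""
    A = np.zeros((n, n))
    np.fill_diagonal(A, -2.0)
    for i in range(n - 1):
        A[i, i + 1] = 1.0
        A[i + 1, i] = 1.0
    return A / dx ** 2

n, dx = 64, 1.0
A = laplacian_1d(n, dx)
# Placeholder net charge term -(q/eps)*(p - n + Nd - Na):
# positive on the P side of the junction, negative on the N side.
rhs = np.concatenate([np.full(n // 2, 1.0), np.full(n // 2, -1.0)])
psi = np.linalg.solve(A, rhs)   # one linear solve per outer (nonlinear) iteration
```

In the full model, the carrier densities p and n depend on ψ, so this linear solve is repeated inside the outer loop until the built-in field self-consistently converges.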

Fig. 5. Simulated solution of carriers’ equilibration in the P-N junction.
(A) Schematic of modeling the equilibrium silicon P-N junction using the AIDR solver. (B) Simulated analog iteration outputs during the solution procedure for the 64-point 1D grid. The hardware convergence latency is less than 40 μs. (C) Digital refinement procedure for the high-precision solution during the modeling. In this problem, nearly 300 iterations are required to reach a preset tolerance of 10^−12 under the worst convergence condition. (D) Establishment of the built-in electric field in the equilibrium P-N junction during the modeling procedure. (E) Simulated built-in field of the equilibrium P-N junction. (F) Comparison of the fully analog iteration and AIDR with the CPU in processing speed and energy consumption for approximate and high-precision solutions, respectively.
Figure 5B illustrates the analog iteration procedure for the 64-point 1D grid in P-N junction modeling, indicating a solution latency of approximately 34 μs according to SPICE simulations. Subsequently, the solution error decreased below the preset tolerance of 10^−12 within a 300-step digital refinement procedure based on the approximate analog iteration (Fig. 5C). Notably, in this test bench, the spectral radius of the iteration matrix B gradually approached one as the problem size increased. Consequently, the convergence speed of AIDR degraded exponentially with problem size, resulting in an overall solution latency of approximately 1 ms for the 64-point grid case (fig. S15). The establishment of the built-in electric field as a function of the number of iterations is shown in Fig. 5D, requiring approximately 822 rounds to achieve a precise solution. Figure 5E shows the simulated built-in electric field, which accurately aligns with the analytical results. This simulation highlights the capability of the AIDR system to handle high-precision scientific computing tasks, even under challenging convergence conditions.
In addition, we measured the solution latency and energy consumption of a digital Jacobi iteration for equilibrium P-N junction modeling as a comparative method (fig. S16). The benchmark results in Fig. 5F demonstrate that fully analog iterations provide an approximately 1654× increase in solution speed and a 2.15 × 10^5-fold reduction in energy consumption compared with the CPU when delivering approximate solutions. Furthermore, the AIDR solver exhibited an approximately 41× speedup compared with the CPU for high-precision solutions and reduced energy consumption by 82× (refer to text S5 for further details). Given the performance of the AIDR system in this fundamental problem, the AIDR solver shows excellent promise for efficiently processing advanced technology computer-aided design tasks, particularly in simulations of emerging 2D material devices.
DISCUSSION
Our fully analog iteration concept has been validated and shown to substantially outperform digital processors in processing speed and energy consumption while maintaining comparable precision. To further assess the effectiveness of our approach, we benchmarked it against state-of-the-art in-memory matrix equation solvers using equilibrium P-N junction modeling tasks.
The AIDR solver was initially compared to the hybrid in-memory iteration solver outlined in (19) for high-precision solutions. This solver employs a memristor array for fast matrix multiplication, while a digital processor handles solution updates. To achieve a high-precision solution, it incorporates an inner precision-improvement approach known as the bit-slice concept (fig. S17A) (41). Similar to our analog iteration concept, this mapping approach employs multiple devices, each representing a given bit. However, in this hybrid iteration solver, the shift-and-add operation for partial products relies on digital circuits rather than analog ones (42). For equilibrium P-N junction modeling, we implemented the Jacobi iteration algorithm using the matrix slicing concept (fig. S10B). This process required 20,755 iterations to reach a preset tolerance of 10^−12 for a 64 × 64 problem (fig. S17B). Additional benchmarks revealed that this high-precision solver requires 1.67 × 10^−2 s to obtain a high-precision solution, with an energy consumption of approximately 0.26 J (refer to text S6 for detailed evaluations). Consequently, our AIDR solver provided a 1.7× improvement in solution speed and exhibited a 2× reduction in energy consumption (table S4).
Subsequently, we compared our fully analog iteration approach with the previously reported analog inverse solver detailed in (20) for fast approximation. The analog inverse solver was used for the fast calculation of the approximate solution x = A^−1·b (fig. S18A), demonstrating a convergence speed of approximately 100 μs (fig. S18B) for the 64 × 64 equilibrium P-N junction modeling problem. Its energy consumption was approximately 13 μJ (detailed calculations are provided in text S6). These findings indicate that our fully analog iteration concept achieves an ~2.94× higher solution speed with ~2× lower energy consumption than the analog inverse solver for approximate solutions (table S4).
Note that while binary conductance states were used in this study to enhance computing precision, we recognize that analog conductance states can offer superior computational efficiency. Recent research indicates that analog conductance states can achieve arbitrary computing precision (23). We anticipate integrating this technology with our circuit topology to develop highly precise, highly efficient, and fully analog computing hardware for practical scientific computing workloads. This endeavor will require further optimization of device performance and experimental validation.
In summary, we proposed and experimentally demonstrated a memristive, fully analog iteration concept for solving matrix equations. This concept exhibits remarkable improvements in processing speed, energy efficiency, and scalability for real-world applications. By leveraging the fast-processing capability of the analog iteration circuit combined with digital refinement, the AIDR approach efficiently obtained approximate solutions in a single-step operation, substantially expediting the process of obtaining high-precision solutions. In both the diffusion and equilibrium silicon P-N junction modeling tasks, the AIDR solver achieved software-equivalent solution precision relative to state-of-the-art CPUs. Moreover, it demonstrated impressive improvements of 128× and 41× in solution speed, accompanied by reductions of 160× and 82× in energy consumption, respectively. This study underscores the potential of the proposed approach to provide a high-speed, energy-efficient platform that surpasses existing in-memory computing approaches for solving high-dimensional matrix equations. We believe this study expands the realm of applications for memristor technologies and paves the way for the future acceleration of fundamental scientific computing and engineering research using imprecise analog devices.
MATERIALS AND METHODS
Fabrication of the memristor array and measurements
The bottom electrodes of the memristor arrays were patterned using ultraviolet lithography. This process involved the deposition of a 10-nm Ti adhesion layer and a 100-nm Pt layer via direct current (dc) magnetron sputtering, followed by a lift-off process using acetone and alcohol. Next, a 100-nm SiO2 isolation layer was grown by plasma-enhanced chemical vapor deposition at 300°C. Electron-beam lithography was then used to form holes with a diameter of 500 nm on the bottom electrode. The SiO2 isolation layer was selectively removed by inductively coupled plasma etching with CF4 and O2. Subsequently, a 2.8-nm Al2O3 layer and a 7.1-nm AlOx layer were deposited using atomic layer deposition and radio frequency (rf) magnetron sputtering, respectively. Following this, 50-nm W was deposited as the top electrode via dc magnetron sputtering. The bilayer resistive switching layers outside the crossbar region were selectively etched using BCl3 and Cl2 to suppress line-to-line leakage and expose the bottom electrode pads. Last, a 50-nm-thick Pt layer was deposited on the top electrode pads for wire bonding. The cross section of the crossbar array was prepared using a focused ion beam (FEI Helios 450S), and the TEM results were obtained using an FEI Titan Themis 200. Energy-dispersive spectroscopy (EDS) was performed using a Bruker Super-X EDS system. Scanning electron microscopy analysis was conducted using a Zeiss Gemini-300 instrument.
Programming the memristor array
The memristor array was programmed using a two-step coarse-to-fine write-with-verification approach to reduce the device variability. The coarse step expedited the convergence of the device resistance toward the target value, whereas the fine step minimized residual errors. Pulse signals for programming were generated using a Keysight B1500A semiconductor parameter analyzer. These voltage pulses were delivered to the packaged memristor array through a custom-built printed circuit board (PCB), which consisted of a dual in-line package switch for device selection and an rf interface connection with the B1500A (shown in fig. S3). The electrical performance of the devices was measured using this platform.
Iterative mathematical method for the solution of matrix equations
The concept of the mathematical iterative method for solving the matrix equation Ax = b involves finding a matrix B and a vector f such that I − B is invertible and the unique solution x of Ax = b equals the unique solution x of x = Bx + f. Starting from an arbitrary vector x(0), we compute the sequence x(k) as follows
x(k+1) = Bx(k) + f, k = 0, 1, 2, … (4)
where the convergence condition is ρ(B) < 1, and ρ(B) is the spectral radius of matrix B, defined as
ρ(B) = max{|λ| : λ is an eigenvalue of B} (5)
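As a software reference for this scheme, the fixed-point iteration can be sketched in a few lines of NumPy; the 2 × 2 matrix, tolerance, and iteration cap below are illustrative choices, not values from this work:

```python
import numpy as np

def fixed_point_iteration(B, f, x0=None, tol=1e-10, max_iter=10000):
    """Iterate x(k+1) = B x(k) + f until the update falls below tol."""
    x = np.zeros_like(f) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        x_new = B @ x + f
        if np.linalg.norm(x_new - x, ord=np.inf) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical example with spectral radius rho(B) = 0.5 < 1, so it converges.
B = np.array([[0.0, 0.5], [0.5, 0.0]])
f = np.array([0.5, 0.5])
x = fixed_point_iteration(B, f)
# The fixed point satisfies x = Bx + f, i.e., (I - B) x = f.
```

The returned vector is the unique solution of (I − B)x = f, which by construction coincides with the solution of the original system.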
Given a linear system Ax = b with invertible A, we can write A in the following form
A = M − N (6)
where M is an invertible matrix (typically diagonal or triangular, making the inverse of M easy to compute). Therefore, the equation Ax = b is equivalent to
x = M^(−1)Nx + M^(−1)b (7)
According to Eq. 7, matrix B and vector f in Eq. 4 were determined as B = M^(−1)N and f = M^(−1)b, respectively. To describe various choices of M and N, it is convenient to express A in terms of three submatrices: a diagonal matrix D, a strictly lower triangular matrix L, and a strictly upper triangular matrix U, such that
A = D − L − U (8)
Here,
D = diag(a11, a22, …, ann) (9)
Lij = −aij for i > j, Uij = −aij for i < j, with all other entries zero (10)
For typical iterative methods, the matrices M and N were calculated as follows:
1) Jacobi method: M = D, N = L + U.
2) Gauss-Seidel method: M = D − L, N = U.
3) Successive overrelaxation method: M = D/ω − L, N = (1/ω − 1)D + U, where ω is the relaxation factor.
In this study, we applied the Jacobi iterative method within the analog iteration process. The matrix B and vector f mapped to the analog domain were calculated accordingly as B = D^(−1)(L + U) and f = D^(−1)b. On the basis of this relationship, matrix B can be expressed as
B = I − D^(−1)A (11)
As described in Eq. 9, D is a diagonal matrix, and determining its inverse requires minimal computational effort. Therefore, the inversion process has a negligible impact on the precalculation and can be disregarded.
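As an illustration of this precalculation step, the Jacobi splitting can be sketched in NumPy as follows; the 2 × 2 test system is a hypothetical example, not one of the benchmark matrices:

```python
import numpy as np

def jacobi_split(A, b):
    """Build the Jacobi iteration matrix B = I - D^{-1} A and vector f = D^{-1} b.

    Inverting the diagonal matrix D costs only O(N), which is why this
    precalculation is negligible compared with the iteration itself.
    """
    d = np.diag(A)                        # diagonal entries of A
    B = np.eye(len(b)) - A / d[:, None]   # I - D^{-1} A (row-wise scaling)
    f = b / d                             # D^{-1} b
    return B, f

A = np.array([[4.0, 1.0], [1.0, 3.0]])    # hypothetical diagonally dominant system
b = np.array([1.0, 2.0])
B, f = jacobi_split(A, b)
rho = max(abs(np.linalg.eigvals(B)))      # spectral radius; must be < 1 to converge
```

Note that B has a zero diagonal by construction, and the fixed point of x = Bx + f is exactly the solution of Ax = b.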
Note that if the chosen iteration method does not converge for a given equation, another iterative method may still converge. For example, there exist coefficient matrices for which the iteration matrix BJacobi calculated with the Jacobi method satisfies ρ(BJacobi) ≥ 1, so the Jacobi method does not converge, whereas the iteration matrix BGauss calculated with the Gauss-Seidel method satisfies ρ(BGauss) < 1, so the Gauss-Seidel method converges. Although we chose the Jacobi method for demonstration in our study, fully analog iteration circuits can accommodate a wider range of problems by selecting an appropriate iterative method.
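This method-selection check can be automated by comparing the spectral radii of the candidate iteration matrices. The sketch below uses a hypothetical symmetric matrix for which Jacobi diverges while Gauss-Seidel converges; the matrix is an illustrative textbook-style example, not one from this work:

```python
import numpy as np

def spectral_radius(B):
    return max(abs(np.linalg.eigvals(B)))

def iteration_matrix(A, method):
    """Return B = M^{-1} N for the splitting A = D - L - U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)   # strictly lower triangle (sign absorbed into L)
    U = -np.triu(A, 1)    # strictly upper triangle (sign absorbed into U)
    M, N = (D, L + U) if method == "jacobi" else (D - L, U)  # else Gauss-Seidel
    return np.linalg.solve(M, N)

# Hypothetical matrix: Jacobi diverges (rho = 1.6) but Gauss-Seidel converges.
A = np.array([[1.0, 0.8, 0.8],
              [0.8, 1.0, 0.8],
              [0.8, 0.8, 1.0]])
rho_j = spectral_radius(iteration_matrix(A, "jacobi"))         # 1.6 > 1
rho_gs = spectral_radius(iteration_matrix(A, "gauss-seidel"))  # < 1
```

Because the matrix is symmetric positive definite, Gauss-Seidel is guaranteed to converge even though Jacobi does not.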
Implementation of the fully analog iteration
A custom-built testing PCB was developed to demonstrate the analog iteration. The board can operate the analog iteration for a 4 × 4–scale problem with two-bit precision using eight memristor arrays (considering the mapping of the mixed matrix B with two-bit precision). The analog shift-and-add units and voltage inverters were built from OP37GSZ-REEL7 operational amplifiers with a power supply of ±12 V. This PCB supports four parallel voltage inputs for vector f and four parallel voltage outputs for solution x. The voltage signals were generated and measured using an ART Technology ACTS2110-PCIe5650 signal capture card. The capture card was connected to the host personal computer via a PCIe channel and included a software interface to monitor the analog iteration procedure.
Here, we provide a point-to-point example to illustrate the mapping of the iterative matrix B onto our hardware.
Consider a matrix equation, Ax = b, with the coefficient matrix A given in table S1. We can obtain the iteration matrix B using the Jacobi method, that is, B = I − D^(−1)A.
The decimal value b of the binary-coded fixed-point data (b1b2…bn)2 is calculated as
b = b1 × 2^(−1) + b2 × 2^(−2) + … + bn × 2^(−n) (12)
such that the two-bit iteration matrix B can be represented by
B = BMSB × 2^(−1) + BLSB × 2^(−2) (13)
In our hardware schematic, the binary most significant bit (MSB) matrix is mapped to the MSB part of the hardware, while the binary least significant bit (LSB) matrix is mapped to the LSB part of the hardware (Fig. 2A).
Because the analog conductance can only represent positive values, we used two devices to represent each MSB/LSB matrix element: −1 = HRS − LRS, 0 = HRS − HRS, and 1 = LRS − HRS. Meanwhile, the memristor array computes the vector left multiplication v^T G, whereas the iterative formula requires the vector right multiplication Bx. Therefore, we must transpose matrix B before mapping, such that x^T B^T = (Bx)^T.
Accordingly, the MSB matrix in Eq. 13 was mapped onto two 4 × 4 crossbar arrays (involving the positive and negative arrays; Fig. 2A) using this differential HRS/LRS scheme.
Similarly, the LSB matrix of B was mapped onto two 4 × 4 crossbar arrays in the same manner (involving both positive and negative arrays; Fig. 2A). Considering the layout and hardware design, each package contains a 4 × 2 array; combining two 4 × 2 arrays forms a 4 × 4 array (Fig. 2B). Therefore, mapping the two-bit matrix B requires eight 4 × 2–scale packaged arrays.
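The two-bit slicing and differential HRS/LRS mapping described above can be sketched in NumPy as follows; the conductance values, the example matrix, and the quantization helper are illustrative assumptions, not the measured device parameters:

```python
import numpy as np

G_LRS, G_HRS = 1e-4, 1e-7  # illustrative on/off conductances (S), not measured values

def bit_slice(B):
    """Split a matrix quantized in steps of 2^-2 into signed ternary
    MSB/LSB planes such that B = B_msb * 2**-1 + B_lsb * 2**-2."""
    q = np.rint(B * 4).astype(int)    # entries in units of 2^-2
    msb = np.fix(q / 2).astype(int)   # MSB plane, entries in {-1, 0, 1}
    lsb = q - 2 * msb                 # LSB plane, entries in {-1, 0, 1}
    return msb, lsb

def to_conductance_pair(plane):
    """Differential pair per element: 1 = LRS - HRS, 0 = HRS - HRS,
    -1 = HRS - LRS. The plane is transposed so that the hardware's
    left multiplication v^T G realizes the required product B x."""
    Gp = np.where(plane.T > 0, G_LRS, G_HRS)  # positive array
    Gn = np.where(plane.T < 0, G_LRS, G_HRS)  # negative array
    return Gp, Gn

B = np.array([[0.0, -0.5], [0.25, 0.75]])     # hypothetical iteration matrix
msb, lsb = bit_slice(B)
Gp_msb, Gn_msb = to_conductance_pair(msb)
```

Each bit plane thus occupies a positive and a negative crossbar, which is why a two-bit matrix consumes four arrays in total.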
High-precision solution within AIDR system
The high-precision solution inside the AIDR system was implemented using the Richardson refinement algorithm as follows:
Algorithm 1. Richardson refinement iteration on AIDR architecture.
1: Given the matrix equation Ax = b;
2: Given the initial solution x0 = 0 and the initial residual r0 = b − Ax0;
3: Set the tolerance tol and for k = 0, 1, 2, …, do;
4: Compute the solution update vector dk by solving the matrix equation Adk = rk (computed using analog Jacobi iterations);
5: Update the solution with xk+1 = xk + dk;
6: Calculate the residual rk+1 = b − Axk+1 and exit the algorithm if ‖rk+1‖ < tol;
7: Return xk+1;
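A software analog of Algorithm 1 can be sketched as follows, with the analog solve replaced by a plain NumPy Jacobi loop as a stand-in for the memristor circuit; the test system and iteration counts are illustrative:

```python
import numpy as np

def jacobi_solve(A, r, n_iter=50):
    """Stand-in for the analog iteration: approximately solve A d = r."""
    d_inv = 1.0 / np.diag(A)
    d = np.zeros_like(r)
    for _ in range(n_iter):
        d = d + d_inv * (r - A @ d)   # Jacobi update d <- d + D^{-1}(r - A d)
    return d

def richardson_refine(A, b, tol=1e-12, max_outer=100):
    """Algorithm 1: approximate inner solves plus high-precision outer updates."""
    x = np.zeros_like(b)
    r = b - A @ x
    for _ in range(max_outer):
        d = jacobi_solve(A, r)        # update vector from the (emulated) analog domain
        x = x + d                     # digital refinement of the solution
        r = b - A @ x                 # high-precision residual
        if np.linalg.norm(r, np.inf) < tol:
            break
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # hypothetical diagonally dominant system
b = np.array([1.0, 2.0])
x = richardson_refine(A, b)
```

The outer loop drives the residual toward the tolerance even though each inner solve is only approximate, which is the essence of the refinement scheme.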
Matrix A is converted into iterative matrix B using the Jacobi iteration algorithm and mapped to the memristor array. The computation of dk in Algorithm 1 is performed using the following algorithm.
Algorithm 2. Compute dk using analog iteration.
1: Calculate fk from the residual rk according to the Jacobi iteration method, fk = D^(−1)rk;
2: Calculate fk′ = s·fk, where the scaling parameter s is chosen so that fk′ spans the full input voltage range;
3: Feed fk′ into the analog domain, and measure the output dk′ from the analog iterations;
4: Calculate dk = dk′/s;
As previously mentioned, the scaling operation addresses the challenge of limited voltage resolution, enabling the AIDR system to achieve high-precision solutions.
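The scaling step can be sketched as follows; the ±V_MAX input range, the choice of s, and the ideal inner solver are assumptions for illustration, not the hardware's measured parameters:

```python
import numpy as np

V_MAX = 1.0  # assumed usable input-voltage amplitude (illustrative)

def scaled_analog_solve(analog_solve, f_k):
    """Algorithm 2: rescale a tiny residual-driven input to the full
    voltage range, solve in the analog domain, then undo the scaling."""
    s = V_MAX / np.max(np.abs(f_k))   # assumed choice of the scaling parameter s
    d_scaled = analog_solve(s * f_k)  # analog iteration on the scaled input
    return d_scaled / s               # restore the true magnitude

# Ideal stand-in for the analog circuit: d = (I - B)^{-1} f
B = np.array([[0.0, 0.25], [0.25, 0.0]])
ideal = lambda f: np.linalg.solve(np.eye(2) - B, f)

f_k = np.array([1e-9, -2e-9])         # residual far below the voltage resolution
d = scaled_analog_solve(ideal, f_k)
```

Because the iteration is linear, scaling the input and dividing the output by s recovers the exact update while keeping the applied voltages within the resolvable range.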
Hardware demonstration of the AIDR system
The ACTS2110-PCIe5650 signal capture card provides a control program based on Python. Accordingly, we implemented the Richardson refinement algorithm (Algorithm 1), incorporating the embedded scaling algorithm (Algorithm 2) and the capture card control routine, in Python (version 3.9). The integrated development environment used was PyCharm (version 2023.2.1), and the hardware connection is illustrated in fig. S9. For the selected test bench, the iterative matrix B was preprogrammed into the memristor array, and we configured a latency of 100 μs to measure the output of the analog iteration circuits using the signal capture board. This setup enabled the hardware demonstration of the AIDR system.
SPICE simulations of the fully analog iteration circuits
SPICE simulations of the fully analog iteration circuit were performed using LTspice. The memristor was implemented as a linear resistance model with a stochasticity of approximately 1%. Functional analog circuits, including the analog inverter and the analog shift-and-adder, were constructed using the AD823 operational amplifier model.
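In software, this device model amounts to a linear conductance with ~1% multiplicative spread; a minimal Python stand-in (with an illustrative target conductance) could read:

```python
import numpy as np

rng = np.random.default_rng(0)

def programmed_conductance(G_target, sigma=0.01, size=None):
    """Linear-resistance memristor model: the programmed conductance
    deviates from its target by ~1% multiplicative Gaussian noise."""
    return G_target * (1.0 + sigma * rng.standard_normal(size))

G = programmed_conductance(1e-4, size=10_000)  # 10,000 sampled devices
spread = G.std() / G.mean()                    # empirical relative spread, ~0.01
```

Sampling many devices this way reproduces the ~1% programming stochasticity assumed in the circuit-level simulations.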
Simulated solutions to real-world problems
We developed a Python algorithm to generate the coefficient matrices for the diffusion and equilibrium P-N junction modeling tasks. The matrices were then input into the AIDR solver. Simulated solutions to these real-world problems were obtained using LTspice (the construction of the fully analog iteration circuit is illustrated in fig. S10) and by transferring the approximate analog solution to the Python simulator for the refinement iteration. We chose the 13th-generation Intel Core i5-13600K CPU for digital refinement, and the operating power was monitored using the Intel Extreme Tuning Utility (Intel XTU) software (43).
Schematic of the performance comparison
In this study, we selected a 13th-generation Intel Core i5-13600K CPU as the digital processor for comparison. We executed a Python-based Jacobi iteration algorithm on the CPU to benchmark its performance. The processing latency was recorded using the Python time module, whereas the operating power was recorded using the Intel XTU software.
For comparison with state-of-the-art memristive solvers, we constructed a high-precision hybrid solver in Python following the method outlined in (19) and developed an analog inverse solver as described in (20) using LTspice. The hybrid solver ran Jacobi iterations with 16-bit fixed-point precision (using a binary-state device model) to achieve better convergence in the equilibrium P-N junction modeling task. The analog inverse solver comprised two 64 × 64 crossbar arrays, with the coefficient matrix of the equilibrium P-N junction modeling task mapped to two-bit conductance states (we simulated this circuit using the same operational amplifier model as our fully analog iteration circuit). Notably, we scaled both implementations to the 22-nm node according to the approach outlined in (13); both implementations were based on the W/Al2O3/AlOx/Pt memristor model. Further details of the implementation are presented in text S6.
Acknowledgments
J. Li and Y. Xue contributed equally to this work. This work is supported by Hubei Engineering Research Center on Microelectronics and the Chua Memristor Institute.
Funding: This article has received funding from the National Key Research and Development Plan of China (grant no. 2022YFB4500101), STI 2030—Major Projects (grant no. 2021ZD0201201), Fundamental Research Funds for the Central Universities (grant HUST: 5003190006), and the National Science Foundation of Hubei Province (grant no. 2024AFA043).
Author contributions: Y.L., J.L., and X.M. conceived and designed the study. J.L. performed the experiments, simulations, and data analyses. Y.X. contributed to device fabrication and measurements. H.J. and Z.Z. contributed to the construction of the hardware demonstration platforms. J.L., Y.X., and Y.L. wrote the manuscript with input from all the authors. All authors discussed the results and implications and commented on the manuscript at all stages.
Competing interests: The authors declare that they have no competing interests.
Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
Supplementary Materials
The PDF file includes:
Supplementary Text S1 to S6
Figs. S1 to S18
Tables S1 to S4
Legend for movie S1
References
Other Supplementary Material for this manuscript includes the following:
Movie S1
REFERENCES AND NOTES
1
D. Kincaid, E. W. Cheney, Numerical Analysis: Mathematics of Scientific Computing (American Mathematical Society, ed. 3, 2009).
2
Y. Achdou, F. J. Buera, J.-M. Lasry, P.-L. Lions, B. Moll, Partial differential equation models in macroeconomics. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 372, 20130397 (2014).
3
W. F. Ames, Numerical Methods for Partial Differential Equations (Academic Press, ed. 3, 2014).
4
R. M. Badia, E. Ayguade, J. Labarta, Workflows for science: A challenge when facing the convergence of HPC and Big Data. Supercomput. Front. Innov. 4, 10.14529/jsfi170102 (2017).
5
A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, E. Eleftheriou, Memory devices and applications for in-memory computing. Nat. Nanotechnol. 15, 529–544 (2020).
6
M. M. Waldrop, The chips are down for Moore’s law. Nat. News 530, 144–147 (2016).
7
D. Ielmini, H.-S. P. Wong, In-memory computing with resistive switching devices. Nat. Electron. 1, 333–343 (2018).
8
M. Hu, C. E. Graves, C. Li, Y. Li, N. Ge, E. Montgomery, N. Davila, H. Jiang, R. S. Williams, J. J. Yang, Q. Xia, J. P. Strachan, Memristor-based analog computation and neural network classification with a dot product engine. Adv. Mater. 30, 1705914 (2018).
9
F. Cai, J. M. Correll, S. H. Lee, Y. Lim, V. Bothra, Z. Zhang, M. P. Flynn, W. D. Lu, A fully integrated reprogrammable memristor–CMOS system for efficient multiply–accumulate operations. Nat. Electron. 2, 290–299 (2019).
10
C. Li, M. Hu, Y. Li, H. Jiang, N. Ge, E. Montgomery, J. Zhang, W. Song, N. Dávila, C. E. Graves, Z. Li, J. P. Strachan, P. Lin, Z. Wang, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, Q. Xia, Analogue signal and image processing with large memristor crossbars. Nat. Electron. 1, 52–59 (2018).
11
P. Yao, H. Wu, B. Gao, J. Tang, Q. Zhang, W. Zhang, J. J. Yang, H. Qian, Fully hardware-implemented memristor convolutional neural network. Nature 577, 641–646 (2020).
12
C. Li, Z. Wang, M. Rao, D. Belkin, W. Song, H. Jiang, P. Yan, Y. Li, P. Lin, M. Hu, N. Ge, J. P. Strachan, M. Barnell, Q. Wu, R. S. Williams, J. J. Yang, Q. Xia, Long short-term memory networks in memristor crossbar arrays. Nat. Mach. Intell. 1, 49–57 (2019).
13
W. Wan, R. Kubendran, C. Schaefer, S. B. Eryilmaz, W. Zhang, D. Wu, S. Deiss, P. Raina, H. Qian, B. Gao, S. Joshi, H. Wu, H.-S. P. Wong, G. Cauwenberghs, A compute-in-memory chip based on resistive random-access memory. Nature 608, 504–512 (2022).
14
P. M. Sheridan, F. Cai, C. Du, W. Ma, Z. Zhang, W. D. Lu, Sparse coding with memristor networks. Nat. Nanotechnol. 12, 784–789 (2017).
15
F. Cai, S. Kumar, T. Van Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. D. Lu, J. P. Strachan, Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks. Nat. Electron. 3, 409–418 (2020).
16
L. Yang, X. Huang, Y. Li, H. Zhou, Y. Yu, H. Bao, J. Li, S. Ren, F. Wang, L. Ye, Y. He, J. Chen, G. Pu, X. Li, X. Miao, Self-selective memristor-enabled in-memory search for highly efficient data mining. InfoMat 5, e12416 (2023).
17
Y. Yu, L. Yang, H. Zhou, R. Zhao, Y. Li, H. Tong, X. Miao, In-memory search for highly efficient image retrieval. Adv. Intell. Syst. 5, 2200268 (2023).
18
M. L. Gallo, A. Sebastian, R. Mathis, M. Manica, H. Giefers, T. Tuma, C. Bekas, A. Curioni, E. Eleftheriou, Mixed-precision in-memory computing. Nat. Electron. 1, 246–253 (2018).
19
M. A. Zidan, Y. Jeong, J. Lee, B. Chen, S. Huang, M. J. Kushner, W. D. Lu, A general memristor-based partial differential equation solver. Nat. Electron. 1, 411–420 (2018).
20
Z. Sun, G. Pedretti, E. Ambrosi, A. Bricalli, W. Wang, D. Ielmini, Solving matrix equations in one step with cross-point resistive arrays. Proc. Natl. Acad. Sci. U.S.A. 116, 4123–4128 (2019).
21
B. Feinberg, U. K. R. Vengalam, N. Whitehair, S. Wang, E. Ipek, “Enabling Scientific Computing on Memristive Accelerators,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) (IEEE, 2018), pp. 367–382; https://ieeexplore.ieee.org/document/8416841/.
22
Y. Feng, B. Chen, J. Liu, Z. Sun, H. Hu, J. Zhang, X. Zhan, J. Chen, “Design-technology co-optimizations (dtco) for general-purpose computing in-memory based on 55nm NOR flash technology,” in 2021 IEEE International Electron Devices Meeting (IEDM) (IEEE, 2021), pp. 12.1.1–12.1.4.; https://ieeexplore.ieee.org/document/9720625/.
23
W. Song, M. Rao, Y. Li, C. Li, Y. Zhuo, F. Cai, M. Wu, W. Yin, Z. Li, Q. Wei, S. Lee, H. Zhu, L. Gong, M. Barnell, Q. Wu, P. A. Beerel, M. S.-W. Chen, N. Ge, M. Hu, Q. Xia, J. J. Yang, Programming memristor arrays with arbitrarily high precision for analog computing. Science 383, 903–910 (2024).
24
R. H. Sturges, Analog matrix inversion (robot kinematics). IEEE J. Robot. Autom. 4, 157–162 (1988).
25
T. Song, X. Chen, Y. Han, “Eliminating iterations of iterative methods: Solving large-scale sparse linear system in O (1) with RRAM-based in-memory accelerator” in Proceedings of the 2021 Great Lakes Symposium on VLSI (ACM, 2021), pp. 71–76; https://dl.acm.org/doi/10.1145/3453688.3461510.
26
Z. Sun, E. Ambrosi, G. Pedretti, A. Bricalli, D. Ielmini, In-memory pagerank accelerator with a cross-point array of resistive memories. IEEE Trans. Electron Devices 67, 1466–1470 (2020).
27
Z. Sun, G. Pedretti, D. Ielmini, “Fast solution of linear systems with analog resistive switching memory (RRAM),” in 2019 IEEE International Conference on Rebooting Computing (ICRC) (IEEE, 2019), pp. 1–5; https://ieeexplore.ieee.org/document/8914709/.
28
A. Greenbaum, Iterative Methods for Solving Linear Systems (Society for Industrial and Applied Mathematics, 1997).
29
X. D. Huang, Y. Li, H. Y. Li, Y. F. Lu, K. H. Xue, X. S. Miao, Enhancement of DC/AC resistive switching performance in AlOx memristor by two-technique bilayer approach. Appl. Phys. Lett. 116, 173504 (2020).
30
A. K. Cline, C. B. Moler, G. W. Stewart, J. H. Wilkinson, An estimate for the condition number of a matrix. SIAM J. Numer. Anal. 16, 368–375 (1979).
31
K. Chen, Matrix Preconditioning Techniques and Applications (Cambridge Univ. Press, 2010).
33
V. Kalantzis, A. Gupta, L. Horesh, T. Nowicki, M. S. Squillante, C. Wah Wu, T. Gokmen, H. Avron, “Solving sparse linear systems with approximate inverse preconditioners on analog devices,” in 2021 IEEE High Performance Extreme Computing Conference (HPEC) (IEEE, 2021), pp. 1–7; https://ieeexplore.ieee.org/document/9622816/.
34
M. Renardy, R. C. Rogers, An Introduction to Partial Differential Equations (Springer, ed. 2, 2004).
35
U. M. Ascher, S. J. Ruuth, B. T. R. Wetton, Implicit-explicit methods for time-dependent partial differential equations. SIAM J. Numer. Anal. 32, 797–823 (1995).
36
J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations (Society for Industrial and Applied Mathematics, ed. 2, 2004).
37
J. Li, S. Ren, Y. Li, L. Yang, Y. Yu, R. Ni, H. Zhou, H. Bao, Y. He, J. Chen, H. Jia, X. Miao, Sparse matrix multiplication in a record-low power self-rectifying memristor array for scientific computing. Sci. Adv. 9, eadf7474 (2023).
38
Z. Sun, G. Pedretti, P. Mannocci, E. Ambrosi, A. Bricalli, D. Ielmini, Time complexity of in-memory solution of linear systems. IEEE Trans. Electron Devices 67, 2945–2951 (2020).
39
J. Moll, The evolution of the theory for the voltage-current characteristic of P-N junctions. Proc. IRE 46, 1076–1082 (1958).
40
F. R. Shapiro, The numerical solution of Poisson’s equation in a pn diode using a spreadsheet. IEEE Trans. Educ. 38, 380–384 (1995).
41
M. Le Gallo, S. R. Nandakumar, L. Ciric, I. Boybat, R. Khaddam-Aljameh, C. Mackin, A. Sebastian, Precision of bit slicing with in-memory computing based on analog phase-change memory crossbars. Neuromorph. Comput. Eng. 2, 014009 (2022).
42
J.-M. Hung, C.-J. Jhang, P.-C. Wu, Y.-C. Chiu, M.-F. Chang, Challenges and trends of nonvolatile in-memory-computation circuits for AI edge devices. IEEE Open J. Solid-State Circuits Soc. 1, 171–183 (2021).
43
E. Rotem, A. Yoaz, L. Rappoport, S. J. Robinson, J. Y. Mandelblat, A. Gihon, E. Weissmann, R. Chabukswar, V. Basin, R. Fenger, M. Gupta, A. Yasin, Intel Alder Lake CPU architectures. IEEE Micro 42, 13–19 (2022).
44
H. Shan, S. Rausch, A. Jou, N. J. Conrad, S. Mohammadi, “A low power programmable dual-slope ADC for single-chip RFID sensor nodes,” in 2017 IEEE 17th Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems (SiRF) (IEEE, 2017), pp. 118–120; http://ieeexplore.ieee.org/document/7874388/.
45
S. Sarkar, S. Banerjee, An 8-bit low power DAC with re-used distributed binary cells architecture for reconfigurable transmitters. Microelectron. J. 45, 666–677 (2014).
46
Y. Liang, Z. Li, D. Chen, X. Wang, Z. Li, “A 22nm CMOS 2.4–5.25GHz mixer resilient to out-of-band blockers,” in 2021 6th International Conference on Integrated Circuits and Microsystems (ICICM) (IEEE, 2021), pp. 179–183; https://ieeexplore.ieee.org/document/9660241/.