Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge

52 min read Original article ↗

Introduction

The dynamics of cell states, namely, the temporal changes in cell states, underpin all biological phenomena, including development, regeneration, differentiation, and pathology. One of the primary goals of life sciences is to understand these cell state dynamics. However, these dynamics represent a high-dimensional, multibody system involving interactions among tens of thousands of proteins, nucleic acids, and lipid molecules, constituting a complex, nonequilibrium system1,2. Consequently, traditional hypothesis-driven approaches have focused on small subsystems formed by the interactions among the molecules of interest, creating building blocks for understanding the whole system. Recent advancements in single-cell omics3,4,5,6 have enabled comprehensive evaluations of cell states represented by all gene expression profiles, which could allow a data-driven understanding of complex cell state dynamics. However, single-cell measurement techniques are invasive, and the resulting data are static, lacking temporal changes for each cell. Therefore, pseudotime analysis techniques7,8,9 have been employed to construct temporal information based on of large-scale static snapshot data to analyze gene expression dynamics. Nevertheless, conventional pseudotime analysis methods provide only an order of time, and the speed and acceleration/deceleration cannot be evaluated, making it challenging to analyze detailed dynamics, including the stability of states that represent sensitivity of gene expression responses to perturbations, as well as more diverse types of state changes.

Therefore, RNA velocity10,11 has frequently been used to trace changes in cell states as trajectories in gene expression space by utilizing the time derivative of the gene expression (velocity) of each cell. Based on this time-derivative inference approach involving gene expression data, methods for reverse engineering these dynamics (vector-field reconstruction) have been proposed, allowing analyses of the interactions among genes that drive local dynamics and the least action paths of the cell state12. However, owing to the lack of a sufficient theoretical foundation to ensure accurate reconstruction of vector fields on the basis of high-dimensional data, a methodology for precisely reconstructing cell state dynamics, appropriately summarizing the dynamics occurring in high-dimensional space, and evaluating the stability of each cell state has yet to be developed.

Here, we developed a high-dimensional dynamics analysis framework called ddHodge. Our framework is based on the mathematical theorem of Hodge decomposition, which provides the three fundamental components of vector fields on manifolds13,14. ddHodge offers a unified perspective on the problem of high-dimensional vector-field reconstruction on manifold and provides a practical method to precisely reconstruct dynamics in unprecedented detail. Through simulations and validation using single-cell RNA sequencing (scRNA-seq) data, we demonstrated that ddHodge can provide a summary of the overall unknown high-dimensional dynamics and identify genes associated with fate determination by affecting the stability of cell states.

Results

ddHodge provides a framework for analyzing the high-dimensional dynamics of cell states

The cell states represented by each data point are generated from an unknown differential equation system F(x, y) (Fig. 1A, left). These states are pairs of observed positions (x, y) and their corresponding velocities F(x, y). Each velocity is given as the gradient of an unknown potential function U. The magnitude of the potential value at each point reflects the temporal ordering of the cell states (Fig. 1A, middle). The convex region in the potential landscape, indicates the stability of states that are diversifying; branching is likely to occur, and the concave region is considered to be converged; the cell states adopt specific lineages (Fig. 1A, right). Therefore, the shape of the potential landscape is a critical indicator of dynamics. Here, we used a 2D gradient system as an illustrative example to explain the interpretation of potential. Note that gradient systems are the special cases where vector fields are represented solely by the gradients of a scalar function, and only the potential determines the dynamics. Generally, dynamics are non-gradient systems.

Fig. 1: The ddHodge framework provides a method for analyzing unknown high-dimensional dynamics.
figure 1

A Shape of the landscape: The local shape of the landscape represents the local dynamics of cell states. The underlying dynamics of the cell state generates single-cell data (left), and the dynamics are represented by a potential landscape (middle). The shape of the landscape indicates the type of change in the cell state, such as diversification or convergence (right). A negative divergence (concave shape) indicates a stable state change in which velocities converge toward the center. A positive divergence (convex shape) indicates an unstable state change in which the cell state trajectories become more diverse. A divergence of zero (flat shape) indicates a change that occurs at a constant speed (i.e., no acceleration or deceleration). B Complex dynamics in high-dimensional space (left) contain both rotational and gradient components, shown here, consist of spiral source (S1) and sink (S2) components. After sampling scRNA-seq data, conventional approaches using nonlinear dimensionality reduction (UMAP) and velocity embedding lose geometric information, distorting the original dynamics. The split-and-conquer strategy of ddHodge better represents the resulting 2D representation. This manifold-aware approach maintains the integrity of the original dynamical structure. C ddHodge strategy: Reconstruct dynamics in the low-dimensional tangent space patches and combine these patches smoothly. The low-dimensional tangent spaces at each data point are estimated using a local PCA approach. A complete description of the vector field (components of Hodge decomposition: grad, div, and curl) is reconstructed smoothly along the near-parallel coordinate system designed in the curved space of the data manifold (the details are provided in the Methods). D ddHodge provides rich information about high-dimensional dynamics: the stability of states (Divergence), the best section to visualize (Schur vector), and the changes of involved genes (Grassmann).

Full size image

Reconstructing cell state dynamics after nonlinear dimensionality reduction involving data deformation is a considerably difficult problem15,16,17. Nevertheless, nonlinear dimensionality reduction is useful for visualization. We set the original high-dimensional (3D) dynamical system to have a region of instability (S1) with a large positive divergence and a region of stability (S2) with a large negative divergence (Fig. 1B, left). However, in commonly used nonlinear dimensionality reduction methods such as UMAP18, the density of the data points in original space is equalized in the reduced space; as a result, geometric features such as length and volume are lost. This deformation affects the landscape shape, making it difficult to identify regions S1 and S2 existed in the original space (Fig. 1B, right). Additionally, the velocity embedding approach10,15 used for visualizing velocity information can misrepresent dynamics by embedding high-dimensional velocity data with neighboring data points in the reduced space, potentially presenting nonexistent closed trajectories at the data boundaries. Hu et al. also showed that trajectories on cylindrical manifolds—formed by combining circular cell cycles trajectory with straight path trajectory—lose essential geometric information under nonlinear dimensionality reduction, which makes appropriate dynamical interpretation difficult19.

To address the challenge of reconstructing high-dimensional dynamics, we propose a combination of two approaches. First, we introduce a high-precision vector-field reconstruction method based on Hodge decomposition. A vector field on a manifold can be decomposed into three independent components: a gradient with divergence, a divergence-free rotational component, and a harmonic component as the residual (Hodge decomposition). The goal is to completely reconstruct these three components. Next, to make this method applicable in high-dimensional spaces, we consider approximating the Hodge components on a low-dimensional manifold (Fig. 1C; dimensionality reduction preserving data geometry). By using local principal component analysis (PCA) to approximate tangent spaces20 and patchworking these spaces, we build a low-dimensional curved space embedded in the high-dimensional space (hereafter referred to as the data manifold) (Fig. 1C, left). During the reconstruction process, we smooth the estimated values along the curved axes on the data manifold to compensate for noise and sample size limitations (Fig. 1C, top), enabling the reconstruction of the potential, divergence, and curl in high dimensions (Fig. 1C, bottom).

We implemented this framework as ddHodge.

ddHodge allows us to evaluate the temporal order and stability of data points on the basis of the potential and divergence, respectively. The periodic and oscillatory behavior of dynamic processes, such as the cell cycle and circadian rhythms, can be evaluated using the curl and harmonic components, independent of the gradient component. Even after dimensionality reduction, the Grassmann distance (the degree of bending when tangent spaces are patchworked) allows us to identify the timing of significant changes of dimensions (genes). The Jacobian matrix reconstructed on the data manifold can be used to select cross-sections of the gene expression space via Schur decomposition, where the trajectories undergo significant changes. By extracting and visualizing these rich dynamic indicators (Fig. 1D), we can identify genes associated with the diversification and convergence of dynamic cell states.

ddHodge accurately reconstructs the true dynamics from partial velocity sampling

We evaluated the performance of ddHodge with synthetic data (Fig. 2). The evaluation metric was the mean squared error (MSE) between the theoretical values of the predefined potential U(x, y) and the reconstructed values. Additionally, we assessed the ability of ddHodge to reconstruct the shape of the potential landscape (divergence) and compared it with that of conventional methods.

Fig. 2: ddHodge accurately reconstructs dynamics from a partial sample of velocities.
figure 2

A Workflow of the simulation validation approach. We used the nonlinear system F(x, y), which is completely determined by the gradient of a predefined potential U(x, y) in a low-dimensional flat plane (2D Euclidean space) (left). The initial values were set around the highest potential peak (2nd to 3rd panels). Using 100 data points sampled after random time intervals from 0 to 4 seconds (3rd to 4th panels), the potential values were reconstructed (right). B Recovering the shape of the landscape: The local shape of the landscape (Hessian matrix of U) can be recovered from the 2nd derivative of the piecewise quadratic functions fitted along the paths of the points to the k-nearest neighbors. These two landscapes (middle column, top and bottom rows) have different shapes with the same potential values. The flat shape with zero divergence (top-right) indicates constant changes of states, whereas the bumpy shape with large positive and negative divergence (bottom-right) indicates unstable states. C Comparison with the DEC method in terms of the divergence reconstruction performance: DEC with Delaunay triangular mesh vs. ddHodge with a k-NN graph. The yellow arrowheads at the sampling boundaries indicate the marked difference in the divergence estimated by the DEC and ddHodge methods. D Direct comparison of the performance of the two methods: The x-axis is the predicted value of the divergence by the DEC and ddHodge methods. The y-axis is the analytical (true) divergence of F(x, y). The yellow marker indicates the estimated values at the arrowhead on the left side of Fig. 2C. E Toggle-switch model: A nonlinear mathematical model of cell differentiation for generating multistable states is visualized with streamline plots. F ddHodge produces substantially more accurate reconstruction results than does Dynamo: the divergence, curl, and Jacobian (detJ) results obtained using Dynamo and ddHodge are shown. G Accurate recovery of the Jacobian matrix elements with ddHodge. Each panel indicates the estimated value of the Jacobian matrix elements (J11 J12 and J21 J22).

Full size image

To evaluate the performance of ddHodge in reconstructing dynamic information without dimensionality reduction, we conducted simulations of a dynamical system in which the trajectory of each point on a 2D plane was determined solely by the gradient of the potential U, defined as F(x, y) = −U(x, y) (Fig. 2A, 1st and 2nd panels, Methods). We randomly initialized points near the peak of the potential in the lower left region and simulated their trajectories for random durations between 0-4 seconds, obtaining 100 pairs of position and velocity data (Fig. 2A, 3rd and 4th panels). When ddHodge was applied to estimate the potential via the least squares method introduced in HodgeRank13, the reconstructed values agreed well with the theoretical values (Fig. 2A, 5th panel; MSE: 0.001). This result demonstrates that the proposed method can provide high-precision potential reconstruction results even with small datasets and partial sampling of data points.

We next leveraged the geometric interpretation of the estimated potential to reconstruct the shape of the potential landscape. We viewed the least squares potential estimation method as fitting a piecewise quadratic function to the edges of a graph with given slopes at the ends (Supplementary Notes). This approach can be extended to estimate the shape of the potential function (the Hessian matrix H of U). The shape is approximated by collecting fitted piecewise quadratic functions of each cross-section to the neighboring vertex (Fig. 2B, Methods). The shape of the potential indicates whether the cell state traverses the landscape at a constant speed (flat slope; upper case in Fig. 2B) or experiences acceleration, deceleration, or passes through branching points (changing slope; lower case in Fig. 2B).

Next, we verified the reliability of ddHodge’s results. We compared the divergence reconstruction performance of ddHodge with that of the discrete exterior calculus (DEC) method21,22, which can be realized as a type of finite element method (FEM), a well-established numerical analysis method (Fig. 2C). The results showed that ddHodge achieved a reconstruction accuracy comparable to or better than that of the DEC method, even when the data boundary, where the DEC method had lower reproducibility, was excluded (yellow triangles in Fig. 2C, D) (Fig. 2D, DEC: MSE = 0.502, ddHodge: MSE = 0.026).

Next, to evaluate the reconstruction performance of ddHodge, we compared ddHodge with Dynamo, which has similar vector-field reconstruction capabilities. In this comparison, we used simulation data from the toggle-switch model, a 2D nonlinear differential equation model (Fig. 2E). We compared the reconstruction results of the two methods using N = 5000 uniformly sampled pairs of position (x, y) and velocity F(x, y) data from the toggle-switch model12. The curl was reconstructed using the ddHodge’s divergence of the dual velocity fields (Methods, Supplementary Notes). The results showed that ddHodge significantly outperformed Dynamo in all metrics: the divergence (ddHodge vs. Dynamo: 0.0027 vs. 0.1142), curl (0.0027 vs. 0.0742), and Jacobian (0.0051 vs. 0.1050) (Fig. 2F).

The vector-field reconstruction function in Dynamo, utilizing SparseVFC23, is based on the theory of reproducing kernel Hilbert spaces (RKHS). In the context of numerical analysis, as explained in the comparison of DEC and ddHodge, this kernel method can also be viewed as a variant of the numerical method. Specifically, this is a variant of the Galerkin method that utilizes radial basis functions, which applies to higher dimensions. Hence, we further verified whether both could achieve equivalent reconstruction performance in terms of the numerical method with appropriate parameter settings. We thus anticipated that increasing the number of kernel basis functions (parameter M) in SparseVFC would improve accuracy. By varying M in the 2D toggle-switch model simulation, we confirmed that Dynamo’s reconstruction accuracy is comparable to ddHodge’s around M = 1000 (Supplementary Figure S1). In contrast, ddHodge retained advantages in computational efficiency and exhibited a scaling law with respect to the sampling density (cell number), owing to its use of sparse matrices derived from the local k-nearest-neighbor structure, combined with the efficient Krylov subspace solver.

We also evaluated the robustness of ddHodge to noise by varying the smoothing parameter λ and neighbor count k at different noise levels (SNR = 1000 to 10; see Supplementary Figure S2 and Methods for the definition of SNR). In high-noise environments (SNR = 10), increasing λ from 0.1 to 10.0 improved MSE from 22.3 to 1.27, and increasing k from 6 to 30 under the same conditions dramatically improved MSE from 1.27 to 0.23. These results suggest that increasing k is effective for noisy data.

Furthermore, the Jacobian matrix J can be reconstructed from the accurately reconstructed Hessian matrix and the curl (Methods, Supplementary Notes). We then evaluated the applicability of the proposed method for interaction analyses of gene pairs using J. The clear red and blue bands in the reconstruction results of each element of the Jacobian matrix (∂ẋ/∂x, ∂ẏ/∂x, ∂ẋ/∂y, and ∂ẏ/∂y where represents dx/dt) faithfully reproduced the regions that respond sensitively to perturbations in genes x and y (Fig. 2G).

These results demonstrate the ability of ddHodge to reconstruct unknown nonlinear dynamics with high accuracy and efficiency, even from partially sampled data.

Approximating low-dimensional dynamics on the data manifold

Next, we aimed to develop a dimensionality reduction method suitable for high-dimensional dynamical systems (Fig. 1C, top). For example, the surface of a sphere in three dimensions can be represented as a collection of small two-dimensional planes (Fig. 3A, top). Based on this concept, we hypothesized that by reconstructing and stitching dynamics within local lower-dimensional spaces (tangent spaces) using a local PCA approach20, both dimensionality reduction and faithful reproductions of high-dimensional dynamics could be achieved while preserving the geometry of the data manifold. The number of parameters in the Hessian matrix estimation process is reduced to m2 < d2, where d is the number of genes and m is the dimension of the tangent spaces. We also considered smoothing (regularization) along the curved axes on the data manifold. In the smoothing process, the axes of adjacent tangent spaces should be matched (Fig. 3A, middle; xu and xv, yu and yv); therefore, using idea of parallel transport in differential geometry20,24,25, we developed a method for designing near-parallel coordinate axes for high-dimensional data using a discrete connection Laplacian (sheaf Laplacian)25,26,27,28 (Fig. 3A, bottom; Supplementary Notes).

Fig. 3: Rebuilding high-dimensional dynamics by patchworking lower-dimensional local dynamics.
figure 3

A Designing near-parallel coordinates on curved spaces: The surface of a sphere, as an example of a curved 2D space embedded in 3D, is represented by small patches of flat tangent planes (top). The arrows (local x- and y-axes, xu and yu) to the adjacent tangent planes involves the problem of parallel transport (middle). The solution is to find the minimum eigenvalue of a sheaf Laplacian (bottom). B Near-parallel coordinates on a sphere: A triangulated spherical mesh is used as the input graph. The eigenvalues of the sheaf Laplacian reveal the degrees of freedom in designing near-parallel axes. The insets show two examples of designed near-parallel coordinates. C Near-perfect reconstruction of dynamics on a sphere: Scatter plot of N = 2000 random points comparing the reconstructed (horizontal) and analytical (vertical) values of the potential, divergence, and curl on a sphere. The insets show the analytical values, with the peaks of the potential and divergence on the west and east sides and the peaks of the curl on the north and south sides. D Sampling scheme from the Lorenz system: N = 2000 points and velocities sampled at random intervals from 0 to 5 s, starting from a random initial point. E Lower-dimensional representation of Lorenz system: The dynamic indices (potential, curl, and divergence) reconstructed in estimated tangent planes via a local PCA approach, visualized on a global (i.e., including all sampled points) PCA plane. F Grassmann distance reveals hinge points: The distance indicates the folding angle when patching local PCA planes. G Schur decomposition: The section that highlights critical changes at a specified point is selected. The local dynamics can be approximated by the Jacobian matrix J. The rotating dynamics around x₀ (point with the highest curl) on the right side are highlighted by projecting the velocities onto the section spanned by Schur vectors SV1 and SV2.

Full size image

First, we present a simple example of designing near-parallel coordinates on a sphere (Fig. 3B). We used the graph structure of a triangulated mesh as the input. By adopting the eigenvector corresponding to the smallest eigenvalue of the connection Laplacian of the input graph, we obtained a near-parallel axis that is consistent24,25 on the surface. We use the term ‘near’-parallel to indicate directions of movement close to parallel transport on curved space. Streamlines on the sphere trace the directions obtained from the eigenvector of the sheaf Laplacian. Singularities, the points that consolidate and resolve coordinate distortions across the entire sphere, appeared to be vortices24 (Fig. 3B right). The leftmost six identical eigenvalues indicate that the total amount of coordinate distortion remains unchanged, regardless of which vector in the 6-dimensional eigenspace is used to construct the axis. Furthermore, there was a gap29 in the spectrum of the eigenvalues from the 7th value onward, corresponding to the doubling of the number of visible vortices.

Next, using velocities sampled from the artificial dynamics simulated on a sphere (Methods), we reconstructed the dynamics with smoothing on the data manifold (Fig. 3C). The potential and divergence were reconstructed with peaks at the east and west ends of the sphere, and the curl was concentrated at the north and south ends, reproducing values approximately identical to the theoretical values (Fig. 3C; inset images, MSEs: 1.0 × 10−3, 4.0 × 10−4, and 1.0 × 10−3, respectively). We further verified the importance of the subspace alignment procedure using an asymmetric saddle-shaped potential example (Supplementary Figure S3). While smoothing results with unaligned axes significantly deviate from theoretical values of the Hessian matrix, subspace alignment using the sheaf Laplacian enables estimation that perfectly matches the theoretical values. These results indicate that both the unknown low-dimensional manifold and the dynamics on the manifold can be accurately recovered from the sampled data.

ddHodge enables qualitative interpretation of unknown complex systems

Next, we validated the interpretability of the lower-dimensional approximation of high-dimensional dynamics using the Lorenz system (Fig. 3D). The dynamics of the Lorenz system are characterized by trajectories that irregularly move between two disks in three-dimensional space. Therefore, we attempted to approximate the dynamics in the space of two joint planes as a data manifold and capture the characteristics of the original dynamics (Fig. 3E). As expected, the main characteristics of rotational behavior were identified by the positive (anticlockwise) and negative (clockwise) curl values on the left and right sides of this space. Additionally, the acceleration in the trajectory in the lower half of the space and deceleration in the upper half of the space were confirmed by the sign of the divergence value. In summary, as indicated by the smaller divergence values compared with the curl values, the dynamics of this system were considered dominated by rotations rather than gradients. Furthermore, the Grassmann distance was used to identify the planar region of the trajectory (cyan) and the hinge region at the center (magenta; points of dimensional change) (Fig. 3F).

The representative variables (genes) driving the dynamics can be identified by the Schur decomposition15,30 of the Jacobian matrix (Fig. 3G, left). Here, we used the ddHodge results in three dimensions without approximating the Lorenz system in 2D space (Fig. 3G, middle). As expected, prominent rotational dynamics were observed on the SV1-SV2 plane at the point at which the curl has its maximum value (Fig. 3G right). We also verified the accuracy of Jacobian matrix reconstruction in tangent space using the Lorenz system and a feedback-loop system (Supplementary Fig. S4). When the estimated tangent space lies within an invariant subspace of the true Jacobian matrix (i.e., when PCA well captures the tangent space), the reconstructed eigenvalues matched the theoretical ones (Supplementary Figure S4A–C). Despite information loss due to dimensionality reduction, qualitative consistency with the true Jacobian matrix will be maintained, demonstrating its applicability to gene interaction analysis (Supplementary Figure S4D–F). These results indicate that the Schur decomposition of the Jacobian matrix is effective for identifying the cross-section of the space containing the representative dynamics nearby specified point (in this example, the cross-section along the disk).

Before scRNA-seq data analysis, we conducted a 100-gene dynamics simulation based on the RNA velocity model31 to verify ddHodge’s ability to detect changes in gene-expression dynamics (Supplementary Figure S5A-D). Note that, in standard scRNA-seq analysis, thousands of genes are reduced to less than 100 dimensions through PCA preprocessing, so ddHodge’s local PCA approach naturally integrates with existing analysis workflows. By introducing cell-type–specific transcription-rate changes at \(t=3\) and \(t=7\), we simulated cell-type–dependent expression changes. Unlike the Lorenz example, this RNA-velocity simulation presents a difficult setting for ddHodge: because 100 genes evolve independently, no low-dimensional data manifold exists. Nevertheless, after low-dimensional approximation (\(r=4\)), decreases in potential and changes in divergence (Supplementary Figure S5C-D) captured transitions to equilibrium states, the presence of perturbations (rate changes), and points of fate diversification.

These results demonstrate the utility of ddHodge in inferring the behavior associated with the original dynamics, even in the approximate reconstruction of the high-dimensional dynamics.

The data analysis framework of ddHodge provides a global view of embryogenesis

Next, the scRNA-seq data were analyzed using ddHodge to quantify the potential, which represents the order of cell state transitions, and divergence, which reflects the stability of each cell state (Fig. 4A). In our analysis of scRNA-seq data, the following assumptions are adopted: (1) cells progress through state changes governed by an autonomous dynamical system, (2) both expression levels and velocity measurements are obtainable via existing methods, and (3) the resulting dynamics can be represented on a low-dimensional data manifold. These assumptions are widely used in existing RNA velocity and general scRNA-seq data analysis, including dimensionality reduction processes, and ddHodge operates faithfully within these constraints.

Fig. 4: Potential landscape provides a succinct global view of gene expression dynamics in embryogenesis.
figure 4

A scRNA-seq data analysis framework: ddHodge takes scRNA-seq data paired with velocity information (e.g., RNA velocity) as input (left). The recovered dynamic indices are overlaid on visualizations such as those obtained via UMAP (middle), providing a global view of high-dimensional dynamics through landscape representations. From this global view, cells of interest can be determined by the extracted dynamical indicators (right), allowing detailed investigation of specified regions (local view) in which gene expression state changes are prominent, as well as the genes involved. B Dynamical indicators extracted during mouse embryogenesis. In addition to clusters and embryonic stages, four dynamical indicators extracted by ddHodge are shown. VASA-seq data from mouse embryos from E6.5 to E9.5 were analyzed here. C The majority of the gene expression dynamic indicators can be explained by a potential landscape: The value of the gradient and the residual of the divergence-free (curl + harmonic) components of recovered dynamics data are shown. For comparison, the results of ddHodge based on the FUCCI dataset, which is expected to have a dominant cyclic component, are also shown. D Changes in the stability of the cell state during development: Box plots show the distribution of divergence values for each developmental stage (Q1, median and Q3; whiskers extend to 1.5 × IQR). The sample sizes (n) are 1,277, 7,134, 22,808 and 14,905 individual cells, respectively. The right image shows a canalization-type landscape, where cells transition from diversification (unstable) to convergence (stable). E Summarized dynamics of embryonic development using the height and shape of the potential: For each embryonic day, the vertical axis shows the potential, which indicates the temporal order of the state change, and the horizontal axis shows the divergence (axis is flipped), which represents the stability of the gene expression state. F Simplified landscape view of development: A potential-divergence plot of the surface ectoderm lineages extracted from (E), represented as thin lines. The inset shows the positions of the extracted clusters in the UMAP representation.

Full size image

In this analysis, we focused on VASA-seq data, which provide highly accurate velocity information due to their high degree of intron coverage6. By leveraging this dataset, we attempted to elucidate the hierarchical differentiation sequence of cell populations and the changes in the stability of the cell state for each cell lineage during the embryogenesis.

The VASA-seq data contained N = 46,124 cells from mouse embryos in the stage of E6.5–9.5. Using the gene expression levels and RNA velocity as inputs, the components of the Hodge decomposition—potential, divergence, curl, and Grassmann distance—were reconstructed and mapped to a low-dimensional UMAP representation using different colors (Fig. 4B). We found that the Grassmann distance remained low across the dataset (median value: 0.43), while the reduced dimension was m = 4. The small Grassmann distance can be interpreted as indicating that the trajectory was confined to almost a flat space, meaning there were no significant changes in the utilized gene sets (dimensions). Conversely, if one finds hinge-like points as shown in the Lorenz system, it can be interpreted that there were points where the utilized gene groups changed significantly. This result suggests that the trajectories of the gene expression dynamics associated with embryonic development reside in a nearly four-dimensional manifold.

We also found a correlation between the potential value and developmental stage, i.e., the potential progressively decreases as development progresses, indicating its utility as a pseudotime indicator12 (Supplementary Figure S6). Similarly, the divergence became increasingly negative as development progressed. Notably, the value of the curl, which reflects cyclic/oscillatory dynamics such as those associated with the cell cycle, was substantially lower than that of the divergence. These findings suggest that the majority of the gene expression dynamics during embryogenesis can be described by a gradient system governed by a potential landscape.

To further confirm that the dynamics of embryogenesis is mainly driven by a gradient system, we quantified the proportion of cyclic or periodic components separated from the gradient component (Fig. 4C). In this dataset, the gradient component accounted for 88.2% of the gene expression dynamics, whereas the divergence-free cyclic component accounted for only 11.7% of the gene expression dynamics. In contrast, in the control dataset of FUCCI-labeled cells32, which specifically represents the progression of the cell cycle, the cyclic component accounted for 49.0% of the dynamics (Fig. 4C, magenta; Methods). In the FUCCI data, the cyclic component was dominant (Supplementary Figure S7A, B). In regions where the divergence-free flow dominates, potential cannot provide a faithful temporal ordering (Supplementary Figure S7C). For circular component visualization, subtracting the gradient component from edge weights reveals the distribution of curl and harmonic components, and we observed strong divergence-free components in the G2M region (Supplementary Figure S7D). These findings suggest that embryonic gene expression dynamics can be explained mainly by the potential rather than by periodic components such as the cell cycle. This underscores the importance of understanding the topography of the landscape to elucidate the mechanistic underpinnings of embryogenesis.

To further elucidate the biological significance of this divergence, we analyzed its changes over the course of embryogenesis (Fig. 4D). During the early stages (E6.5–E7.5), most cells exhibited positive divergence, indicating an undifferentiated and unstable (divergent) state. In contrast, from E8.5 to E9.5, the divergence values became increasingly negative, suggesting that the gene expression states of the cells tended to converge and stabilize as development progressed. This finding indicates that early embryonic cells exhibit a high degree of plasticity, which subsequently decreases as differentiation becomes more defined and cell states stabilize.

Thus, we attempted to describe the dynamics of embryogenesis using only the potential and divergence (Fig. 4E). The epiblast, which possesses pluripotency with high divergence, maintained its cellular identity with a shallow slope (gradient) between E6.5 and E7.5 but was no longer detectable after E8.5. In contrast, cell populations that had diverged into distinct lineages, including the splanchnic mesoderm, emerged beneath the epiblast from E7.5 onward and progressively descended along the landscape toward E9.5, differentiating into more stable cardiac and epithelial lineages. These findings suggest that we successfully visualized the fate decision process that occurs during development according to the height and shape of the potential landscape during gastrulation.

Next, we focused on the surface ectoderm lineage (Fig. 4F). The surface ectoderm, a nonneural ectoderm, gives rise to the epithelium and epidermis of sensory organs. Even at E8.5, pre-epidermal keratinocytes appeared at a higher positions in the plot (potential > 0.5) than the placodal area, suggesting that the epidermal lineage may be specified earlier than the epithelial lineage33. Furthermore, pre-epidermal keratinocytes and the otic epithelium exhibited stable cell states at E9.5, as indicated by the highly negative divergence values (approximately −0.01). In contrast, the divergence of the fusing epithelium, which exhibits gene expression characteristics associated with the amniotic ectoderm, remained close to zero, indicating that the cell state changed with a constant speed, which is a unique property of this type of cell.

Taken together, our results demonstrate that the developmental trajectory of the surface ectoderm lineage is consistent with a recent single-cell study of embryogenesis34,35. Moreover, we successfully quantified the potency of differentiation at each developmental stage using the potential and divergence.

ddHodge enables the extraction of critical timing information and genes related to cell state changes

As nonlinear dimensionality reduction is not performed in ddHodge, the genes that drive cell state changes can be tracked. We thus focused on individual points of interest (POIs) at which cell states undergo significant changes and investigated the local dynamics (Fig. 5).

Fig. 5: Cross-sections of the gene expression state space to elucidate the local dynamics of embryogenesis.
figure 5

A (Case 1) Stabilization of a cell state with a highly negative divergence in axon elongation: 2000 cells surrounding the selected spinal cord cells that had minimal negative divergence values in cluster 18, including the neuromesodermal progenitor (NMP) and forebrain/midbrain cell types, were selected (inset circle; enlarged image of the UMAP plot near the POI). From the Schur decomposition of the Jacobian matrix at the POI, the two dimensions (plane) with the minimal negative eigenvalues were determined. B The velocity of the 2000 selected cells projected onto the identified SV2–SV4 cross-section, along with the genes exhibiting significant changes within this plane, are shown. The cell-type annotation is shown for the top six cell types, and the numbers in parentheses indicate the number of cells in this plot. C (Case 2) Diversification of the cell state with a highly positive divergence in the amniochorionic mesoderm: 2000 cells surrounding amniochorionic mesoderm cells that had maximal positive divergence values in cluster 14, including allantois, extraembryonic mesoderm, and splanchnic mesoderm cell types, were selected. The two Schur vectors with the maximal positive eigenvalues were determined. D The velocity of the 2000 selected cells projected onto the identified SV1–SV3 cross-section, along with the genes exhibiting significant changes within this plane, are shown. The blue to red colors represents the sign (+/-) of divergence values.

Full size image

To identify cells that become more stable, namely, those with negative divergence values, we selected the POI with the smallest divergence within cluster 18 (Fig. 5A, top right). The Schur decomposition of the Jacobian matrix obtained for this POI revealed a remarkable state change in the SV2-SV4 plane, corresponding to the largest negative eigenvalues. To determine the most related genes in this section, we identified candidate genes using the correlation coefficients between the divergence and gene expression levels for 1000 neighboring cells in the POI (Supplementary Figure S8). The velocities projected onto the SV2-SV4 planes demonstrated that the cell states converged and stabilized (negative divergence); furthermore, Pax6 expression was identified36 (Fig. 5B, Supplementary Fig. S8). Additionally, Nrcam and Robo2, which are genes related to axon guidance37, exhibited strong negative correlations with the divergence values, similar to Pax6, suggesting their involvement in the stabilization of cell states during axial elongation. In contrast, another cell population annotated as neuromesodermal progenitors (NMPs) expressing Cdx2 was positively correlated with divergence (Fig. 5B, top and bottom left panels, Supplementary Figure S8). Cdx2, which is regulated by multiple extracellular signals, has been reported to play a role in maintaining bipotent NMP populations38,39. These findings suggest that the mechanism maintaining undifferentiated NMPs may involve destabilization of cell states through Cdx2 expression. In summary, through this analysis, we identified several genes associated with the spinal cord lineage involved in regulating cell state stability.

To identify genes associated with cell state diversification, we selected another POI with the highest divergence from cluster 14 (Fig. 5C, top right). The neighboring cells of this POI primarily consisted of two amnion-like cell groups (annotated as A and B) (Fig. 5C, D). The Schur decomposition of the Jacobian matrix at the POI revealed two branches along the SV1 axis, with one group diverging to the right (A) and the other to the left (B). Since the loading vector of the Schur decomposition can be used to identify genes that are strongly associated with the selected axis (Supplementary Figure S9), we attempted to identify the genes with the highest weights in the loading vector. Ptch1 presented the highest loading value along the SV1 axis, suggesting the important role of this gene in A/B fate determination (Fig. 5D, top). Ptch1, the receptor of Sonic hedgehog (Shh), has been reported to be expressed in the allantois, leading to the arrest of the epithelial‒mesenchymal transition and the progression of morphogenesis40. Therefore, only group A, which expresses Ptch1, may undergo de-repression via Shh signaling and transition to a distinct cell state. Additionally, Slc2a1, which encodes a glucose transporter of Glut1, presented the highest loading value on the SV3 axis; however, its expression level increased with developmental progression in both A and B. In summary, this axis-based approach can provide a biological interpretation similar to that of PCA.

As an extra verification with real data, we applied ddHodge to a muscle regeneration dataset41. ddHodge-derived metrics can be visualized by coloring points on UMAP, focusing on muscle satellite-cell activation and muscle differentiation processes (Supplementary Figure S10A). High-divergence regions (red) were observed on the upper side of the MuSCs cluster, suggesting the need for a detailed analysis of presumable fate-bifurcation points. This high-divergence region was selected as a POI. Schur decomposition analysis was performed (Supplementary Figure S10B). By visualizing divergence values and velocity-vector projections on the SV1–SV2 plane, we succeeded in extracting the moment of fate-determination in asymmetric-division, which is composed of Myog (muscle-differentiation marker) and Cdkn1c (cell-cycle control) expression axes (Supplementary Figure S10C).

Furthermore, compared to ordinal differential gene expression (DEG) analysis aimed at detecting marker genes selectively expressed in each cluster, ddHodge has the characteristic of detecting genes that change rapidly (accelerate/decelerate or diverge/contraction) within clusters, so the importance scores of both do not necessarily match (Supplementary Figure S11). ddHodge’s exploratory analysis using divergence as an indicator successfully captures the behavior of cellular state destabilization at this fate bifurcation point, that tend to be overlooked by conventional RNA velocity analysis and DEG analysis. In addition to divergence, other dynamical indicators calculated by ddHodge can be exported in anndata format, making it compatible with many single-cell data visualization tools (Supplementary Figure S12). This enables interactive POI selection considering biological context while confirming gene expression and cell-type annotations within each cell type.

Discussion

Here, we developed ddHodge as a method for reconstructing the original vector field using pairs of sampling points and their velocities obtained from an unknown system. By utilizing RNA velocities as inputs, ddHodge extends the capabilities of existing methods by allowing evaluations of both the direction of cell state changes and the stability of each state, which indicates a potency of change, which has not previously been evaluated. We further confirmed that the majority of the gene expression dynamics in embryogenesis can be described as a gradient system (i.e., a potential landscape), which was previously unclear. Furthermore, we quantified the difficulty of lineage transitions from the divergence values, complementing traditional trajectory analyses of cell fate conceptualized by the epigenetic landscape. Additionally, we identified genes that influence the stability of the cell state.

In this study, we demonstrated that the potential and divergence reconstructed from RNA velocity data can serve as indicators for identifying points at which cell states converge and diversify. Furthermore, ddHodge can be used to extract cross-sections of high-dimensional space and identify key regulatory genes that govern local dynamics. Since this framework is general and requires only a weak assumption that the velocity is sampled from an autonomous system, ddHodge can be broadly applied to various types of velocity data, including chromatin velocity data, protein velocity data, time-resolved measurements of cellular positions, and even simulation data from theoretical models of high-dimensional dynamical systems. Notably, as evidenced by the constructed interpretations of the high-dimensional dynamic indicators in both the simulations with the Lorenz system and the application to the RNA velocity data, the approximated velocity field on the data manifold does not necessarily need to precisely capture the true underlying dynamics.

Many methods have been proposed for scRNA-seq dynamics analysis, each with different interests and goals42. Methods such as DeepVelo43 and GraphVelo44 primarily focus on improving the estimation or embedding of RNA velocity, with the explicit vector-field reconstruction falling within the scope of Dynamo and ddHodge. By contrast, Graph-Dynamo17 takes a different conceptual approach, modeling stochastic cellular-state dynamics governed by a Fokker–Planck equation, focusing on transition probabilities among metastable states and exit times. ddHodge performs component-wise and precise reconstruction of vector fields—potential, divergence, and curl–on manifold via Hodge decomposition in a DEC–like, structure-preserving manner. It further provides principled cross-sectional analysis in tangent space (via Schur decomposition) while ensuring geometry-preserving dimensional reduction.

Reconstructing a high-dimensional Jacobian that encodes all gene interactions is fundamentally impossible after dimensionality reduction. A similar limitation arises when the dynamics are intrinsically confined to a low-dimensional data manifold, such as the disk-like structures in the Lorenz and feedback-loop examples, because velocity components orthogonal to this low-dimensional structure are never observed. In these situations, just as after dimensionality reduction, the full ambient-space Jacobian cannot be identified by any method—including ddHodge or Dynamo—unless additional assumptions or biases are imposed on the unobserved directions. Given this intrinsic constraint, the meaningful object that remains accessible is the Jacobian restricted to the tangent space of the manifold. The Schur vectors of the tangent-space Jacobian can be interpreted as composite axes, analogous to PCA loading vectors. As illustrated in Fig. 5, this enables identification and interpretation of genes that drive fate transitions, including those exhibiting rapid expression changes.

ddHodge reconstructs the fundamental components of vector fields, including the potential, divergence, curl, and harmonic by overcoming several difficulties associated with analyses performed in high-dimensional spaces. For example, in the DEC method, these difficulties include constructing the mesh, evaluating areas and volumes, which are required to determine the densities and constructing discrete Hodge stars21,22. With our approach, we could accurately estimate the divergence by applying previously intractable numerical analysis methods to higher-dimensional spaces without considering boundary conditions. Additionally, we developed a method for estimating the curl in spaces of three or more dimensions. To address concerns regarding the reliability of the results due to noise and limited sample sizes—challenges inherent in data analysis—we incorporated a dedicated smoothing strategy in ddHodge. However, due to computational constraints, we have not yet produced a strict reconstruction of the curl component with ddHodge; instead, an approximation strategy that recovers the rotational features only within the PC1-2 plane is employed in the current version of ddHodge.

We observed regions that exhibited higher curl values in the mouse gastrulation data, specifically in areas associated with hematopoietic processes. These regions may represent dynamics like strong perturbation components orthogonal to the gradient direction, rather than gentle periodic/cyclical dynamics. In this paper, we did not focus on the interpretation of strong curl components. However, in the future, we need to pursue a deeper biological interpretation, as well as a complete reconstruction of the curl components. Further interpretation of divergence-free periodic components may be feasible by developing efficient methods for computing curl by Hodge duals in high-dimensional spaces and establishing appropriate interpretative frameworks, such as optimal cycles45, to obtain a succinct representation of the harmonic components in higher-dimensional spaces.

Owing to its scalability to high-dimensional data without the need for mesh construction, this method has potential as a tool for supporting data analysis involving high-dimensional complex dynamic systems. The remaining fundamental challenges are interpretating the recovered dynamic indicators in the context of existing knowledge and experimentally validating the proposed method.

Methods

ddHodge workflow

This section outlines the ddHodge workflow. Detailed explanations and illustrations are provided in the Supplemental Notes. We assume that the input data consist of \(d\)-dimensional points \({{\mathcal{X}}}=\{{{{\boldsymbol{x}}}}_{1},{{{\boldsymbol{x}}}}_{2},\ldots,{{{\boldsymbol{x}}}}_{N}\}\) and their corresponding velocities \({{\mathcal{V}}}=\{{{{\boldsymbol{v}}}}_{1},{{{\boldsymbol{v}}}}_{2},\ldots,{{{\boldsymbol{v}}}}_{N}\}\). Additionally, a \(k\)-nearest neighbors (k-NN) graph \({{\mathcal{G}}}=\left(V,E\right)\) is constructed from \({{\mathcal{X}}}\).

Approximation of a low-dimensional data manifold

First, PCA is performed using the \(k\)-nearest neighbors of each point \({{{\boldsymbol{x}}}}_{i}\) to obtain the orthonormal basis \(\{{{{\boldsymbol{o}}}}_{1}^{i},{{{\boldsymbol{o}}}}_{2}^{i},\ldots,{{{\boldsymbol{o}}}}_{r}^{i}\}\) of the tangent vector space \({T}_{i}{{\mathcal{M}}}\), where \(r\le k\). Since the tangent spaces of neighboring vertices generally have different subspaces, we construct a discrete connection Laplacian \({\Delta }_{s}\) using sheaves on graphs to align the local coordinates within the tangent spaces. The alignment is performed by minimizing the following loss function \(L\) over the set of tangent vectors \({{{\boldsymbol{z}}}}_{i}\in {T}_{i}{{\mathcal{M}}}\):

$$L\left({{{\boldsymbol{z}}}}_{1},{{{\boldsymbol{z}}}}_{2},\ldots,{{{\boldsymbol{z}}}}_{N}\right)={\sum }_{\left(i,j\right)\in E}{||}{\widetilde{O}}_{i}{{{\boldsymbol{z}}}}_{i}-{\widetilde{O}}_{j}{{{\boldsymbol{z}}}}_{j}{\parallel }^{2}={{{\boldsymbol{z}}}}^{{{\top }}}{\Delta }_{s}{{\boldsymbol{z}}}$$

(1)

where \({{\boldsymbol{z}}}\) is the vector formed by stacking \({{{\boldsymbol{z}}}}_{1},{{{\boldsymbol{z}}}}_{2},\ldots,{{{\boldsymbol{z}}}}_{N}\) vertically. The matrices \({\widetilde{O}}_{i}\) and \({\widetilde{O}}_{j}\) are obtained from the singular value decomposition \({O}_{i}^{\top }{O}_{j}=U\varSigma {V}^{{\top }}\), where \({O}_{i}=\left[{{{\boldsymbol{o}}}}_{1}^{i}{{{\boldsymbol{o}}}}_{2}^{i}\cdots {{{\boldsymbol{o}}}}_{r}^{i}\right]\), and \({\widetilde{O}}_{i}={U}^{{\top }}\), \({\widetilde{O}}_{j}={V}^{{{\top }}}\).

The eigenvector associated with the minimum eigenvalue of \({\Delta }_{s}\) is the minimizer of \(L\).

Potential estimation

The potential \({{\boldsymbol{u}}}\in {{\mathbb{R}}}^{N}\) corresponding to the gradient component of the vector field is estimated using the gradient operator \({\delta }_{0}={{\rm{grad}}}\), which is an \(M\times N\) matrix, and the edge weights \({{\boldsymbol{w}}}\in {{\mathbb{R}}}^{M}\) of graph \({{\mathcal{G}}}\). The estimation is performed via the least squares method as follows:

$${\min }_{{{\boldsymbol{u}}}}\parallel {\delta }_{0}{{\boldsymbol{u}}}-{{\boldsymbol{w}}}{\parallel }^{2}$$

(2)

where \(N\) and \(M\) represent the number of vertices and edges in \({{\mathcal{G}}}\), respectively. In this study, the edge weights \({w}_{\left(i,j\right)}\) are defined as:

$${w}_{\left(i,j\right)}=\frac{1}{2}{\left({{{\boldsymbol{v}}}}_{i}+{{{\boldsymbol{v}}}}_{j}\right)}^{{{\top }}}\left({{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}\right)$$

(3)

In this definition, the integral of the flux is approximated along the edge direction and is equivalent to fitting a quadratic function to the cross-section of potential function \(U\) (see Supplementary Notes for more details).

Hessian matrix estimation

For a point \({{{\boldsymbol{x}}}}_{i}\), let \({{\boldsymbol{\eta }}}={{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}\) be the vector indicating the direction from this point to a neighboring point \({{{\boldsymbol{x}}}}_{j}\). The second-order directional differential coefficient of the fitted quadratic function is given by:

$${U}_{\eta \eta }=\frac{{\left({{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}\right)}^{{{\top }}}\left({{{\boldsymbol{v}}}}_{i}-{{{\boldsymbol{v}}}}_{j}\right)}{\parallel {{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}{\parallel }^{2}}$$

(4)

The Hessian matrix \(H\) in the original coordinate system can be obtained via a quadratic form. For example, in two dimensions of x-y plane, the relationship is expressed as:

$${U}_{\eta \eta }={{{\boldsymbol{\eta }}}}^{{{\top }}}H{{\boldsymbol{\eta }}} =\left({\eta }_{1}{\eta }_{2}\right)\left(\begin{array}{cc}{U}_{{xx}} & {U}_{{xy}}\\ {U}_{{xy}} & {U}_{{yy}}\end{array}\right)\left(\begin{array}{c}{\eta }_{1}\\ {\eta }_{2}\end{array}\right)\\ ={\eta }_{1}^{2}{U}_{{xx}}+2{\eta }_{1}{\eta }_{2}{U}_{{xy}}+{\eta }_{2}^{2}{U}_{{yy}}$$

(5)

By gathering the known second-order differential coefficients such as \({U}_{\eta \eta }\) in independent directions \({\bf{\varphi}},{\boldsymbol{\psi}},\ldots\), the linear equation can be constructed to estimate the elements of \(H\).

$$\left(\begin{array}{ccc}{\varphi }_{1}^{2} & 2{\varphi }_{1}{\varphi }_{2} & {\varphi }_{2}^{2}\\ {\psi }_{1}^{2} & 2{\psi }_{1}{\psi }_{2} & {\psi }_{2}^{2}\\ {\eta }_{1}^{2} & 2{\eta }_{1}{\eta }_{2} & {\eta }_{2}^{2}\end{array}\right)\left(\begin{array}{c}{U}_{{xx}}\\ {U}_{{xy}}\\ {U}_{{yy}}\end{array}\right)=\left(\begin{array}{c}{U}_{\varphi \varphi }\\ {U}_{\psi \psi }\\ {U}_{\eta \eta }\end{array}\right)$$

(6)

We denote the above equation as \(C{{\boldsymbol{\beta }}}={{\boldsymbol{y}}}\). The divergence is obtained from \(-{{\rm{tr}}}H\) after estimating \(H\). In practice, the problem is better solved as an optimization problem \({\min }_{{{\boldsymbol{\beta }}}}\parallel C{{\boldsymbol{\beta }}}-{{\boldsymbol{y}}}{\parallel }^{2}\) to mitigate issues in data analysis, such as those related to noise and the limited sample size. Additionally, since the number of Hessian matrix elements (number of estimated parameters) increases on the order of \({d}^{2}\), the Hessian matrix is estimated in the reduced-dimensional space.

Smoothing over graph

To enable the regularization of the estimation process between neighboring vertices, we merged the individual Hessian matrix estimation problems as a single linear equation:

$$C{{\boldsymbol{\beta }}}=\left(\begin{array}{ccc}{C}_{1} & & \\ & \ddots & \\ & & {C}_{N}\end{array}\right)\left(\begin{array}{c}{{{\boldsymbol{\beta }}}}_{1}\\ \vdots \\ {{{\boldsymbol{\beta }}}}_{N}\end{array}\right)$$

(7)

This allows us to introduce a penalty (Dirichlet energy on the graph) to the squares of the differences \({{{\boldsymbol{\beta }}}}_{i}-{{{\boldsymbol{\beta }}}}_{j}\) between adjacent vertices as:

$$P\left({{\boldsymbol{\beta }}}\right)={\left({D}_{0}{{\boldsymbol{\beta }}}\right)}^{{{\top }}}\left({D}_{0}{{\boldsymbol{\beta }}}\right)={{{\boldsymbol{\beta }}}}^{{\top }}{L}_{0}{{\boldsymbol{\beta }}}$$

(8)

Here, assume that the \(p\) parameters are arranged in a certain order as \({{{\boldsymbol{\beta }}}}_{i}=\left({U}_{{xx}}^{i},{U}_{{xy}}^{i},{U}_{{yy}}^{i}\right)\), such that the differences between each parameter can be individually evaluated using the extended gradient matrix \({D}_{0}={\delta }_{0}\otimes {I}_{p}\) with the Kronecker product and \({L}_{0}={D}_{0}^{{{\top }}}{D}_{0}\).

The optimization problem with this regularization term is then solved as:

$${\min }_{{{\boldsymbol{\beta }}}}\parallel {{\boldsymbol{y}}}-C{{\boldsymbol{\beta }}}{\parallel }^{2}+\lambda P\left({{\boldsymbol{\beta }}}\right)$$

(9)

with some small constant \(\lambda > 0\).

Given the large \({Np}\times {Np}\) size of the sparse matrix \({L}_{0}\), efficient solvers such as the Krylov subspace method are recommended.

Curl estimation

For differential forms in the Euclidean space \({{\mathbb{R}}}^{2}\), the relationship between \({{\rm{div}}}\) and \({{\rm{curl}}}\) is given by:

$${{\rm{div}}}\left(\star \xi \right)=\left(\star d\star \right)\left(\star \xi \right)=\left(\star d\right)\left(\star \star \right)\xi=-{{\rm{curl}}}\xi$$

(10)

where \(d\) is the exterior derivative and \(\star\) is the Hodge star operator. Therefore, the curl can be approximated by the divergence of the dual vector field \(\star \xi\), which is obtained by rotating the velocity by \(\pi /2\) in the PC1-2 plane.

Jacobian matrix estimation

According to the theory of Hodge decomposition, a vector field can be expressed as the sum of a gradient part \(-\nabla U\) and a divergence-free residual part \(r\):

Here, let \(D\) be the operator that gives the Jacobian matrix of \(f\). Since \(D\) is linear and \(D\nabla U\) gives the Hessian matrix \(H\) of \(U\), \(D\left(f\right)\) has the following decomposition:

$$D\left(f\right)=D\left(-\nabla U+r\right)=-H+D\left(r\right)$$

(12)

If \(r\) consists of only rotational components (i.e., there are no holes in the space), \(D\left(r\right)\) can be expressed using the matrix \(Q\), representing a \(\pi /2\) rotation in the PC1-2 plane, as \(D\left(r\right)=\frac{\theta }{2}Q\). Here, \(\theta\) is the magnitude of the rotation, which is calculated from the previously derived \({{\rm{curl}}}\).

Software implementation

ddHodge was implemented in Julia, a high-performance scientific programming language. The computation time depends on the number of analyzed cells and the dimensionality of the components after the reduction process is performed. The most computationally intensive parts are computing eigenvalues and solving least-squares problems with large sparse matrices. To optimize these calculations, we employ the Krylov subspace method (KrylovKit.jl), which efficiently handles sparse matrices by leveraging repeated matrix‒vector multiplication operations. With GPU acceleration (we used NVIDIA RTX A2000 12 GB), the computation takes approximately 35 seconds for 46,124 cells with the parameter: k = 12 of k-NN graph and r = 4 of reduced dimension, and approximately 2 minutes for another dataset with 98,316 cells with the same parameter (Figs. 45).

Dynamical system setup

Gradient system in the 2D plane

To create a gradient system in a 2D plane, we use the Gaussian function as follows:

$$g\left(x,y; {\mu }_{x},{\mu }_{y}\right)={e}^{-\frac{1}{2}\left\{{\left(x-{\mu }_{x}\right)}^{2}+{\left(y-{\mu }_{y}\right)}^{2}\right\}}$$

(13)

We constructed a potential function with peaks and valleys as follows:

$$U\left(x,y\right)=g\left(x,y; {\mu }_{x}=-0.5,{\mu }_{y}=-0.5\right)-g\left(x,y; {\mu }_{x}=0.5,{\mu }_{y}=0.5\right)$$

(14)

Using this potential function, we simulate the differential equation \({{\boldsymbol{F}}}\left(x,y\right)=-\nabla U\left(x,y\right)\). The simulation is performed by adding noise (Gaussian noise with a standard deviation of 0.2) to the initial positions at the center of the potential peak and running random simulations for 0 to 4 s using the Runge–Kutta method. We sampled \(N=100\) points and their velocities.

Toggle-switch model

The toggle-switch model is defined by the following set of differential equations:

$$\begin{array}{cc} & \frac{d{x}_{1}}{{dt}}={a}_{1}\frac{{x}_{1}^{n}}{{K}_{1}^{n}+{x}_{1}^{n}}+{b}_{1}\frac{{K}_{1}^{n}}{{K}_{1}^{n}+{x}_{2}^{n}}-{x}_{1}\\ & \frac{d{x}_{2}}{{dt}}={a}_{2}\frac{{x}_{2}^{n}}{{K}_{2}^{n}+{x}_{2}^{n}}+{b}_{2}\frac{{K}_{2}^{n}}{{K}_{2}^{n}+{x}_{1}^{n}}-{x}_{2}\\ & \end{array}$$

(15)

The parameters are set to \({K}_{1}={K}_{2}=1/2,n=4\), and \({a}_{1}={a}_{2}={b}_{1}={b}_{2}=1\). The same simulation result from Dynamo’s paper was reproduced by following the instructions at the following link: https://github.com/aristoteleo/dynamo_submission.

Lorenz system

The Lorenz system is defined by the following set of differential equations:

$$\begin{array}{cc} & \frac{{{\rm{d}}}x}{{{\rm{d}}}t}=\sigma \left(y-x\right)\\ & \frac{{{\rm{d}}}y}{{{\rm{d}}}t}=x\left(\rho -z\right)-y\\ & \frac{{{\rm{d}}}z}{{{\rm{d}}}t}={xy}-\beta z\end{array}$$

(16)

The parameters are set to \(\sigma=10\), \(\rho=28\), and \(\beta=8/3\). The simulation is performed by initializing \(\left(x,y,z\right)=\left(10,10,10\right)\) with added Gaussian noise \(N\left(3,0\right)\) and sampling points after a random time \(T\), which was generated from a uniform distribution \(T\sim U\left(0.5,5\right)\).

Noise settings in RNA velocity simulation

We consider RNA-velocity data with scaled degradation rate \(\widetilde{\gamma }=\gamma /\beta=1.0\). Such a condition emulates a scenario in which the steady states lie on the diagonal \({u}_{s}={s}_{s}\) in the spliced–unspliced plane (i.e., the ratio of spliced to unspliced counts is 1:1). We generated a proxy for the unspliced count \(u=\widetilde{\gamma }s+v\) because \({ds}/{dt}\propto v=u-\widetilde{\gamma }s\), where \(v\) is the (scaled) velocity. To set the signal-to-noise ratio (SNR), we fixed the signal scale at \({\sigma }_{s}=1.0\) so that the sampled region \([0,4]\times [0,4]\) in the toggle-switch simulation is nearly covered by the 2D Gaussian \({{\mathscr{N}}}((2,2),{\sigma }_{s}^{2}I)\). Then, SNR is defined by \({\sigma }_{s}/{\sigma }_{n}\) when the noise is drawn from \({{\mathscr{N}}}((0,0),{\sigma }_{s}^{2}I)\).

Proportions of gradient and cyclic component on graphs

The edge weights \({{\boldsymbol{w}}}\) are determined by velocity (see the Potential estimation section in Methods). Then, we calculate the vector \({{\boldsymbol{u}}}\) representing potential values at vertices, which allows us to determine the gradient component at edges as \(-{\delta }_{0}{{\boldsymbol{u}}}\). We set the magnitude of the gradient component as \(p\). Next, the cyclic (divergence-free) component \({{\boldsymbol{r}}}={{\boldsymbol{w}}}+{\delta }_{0}{{\boldsymbol{u}}}\) obtained by subtracting the gradient \(-{\delta }_{0}{{\boldsymbol{u}}}\) from weight \({{\boldsymbol{w}}}\). The magnitude of the cyclic component is similarly calculated as content \(q\), and the proportions of \(p\) and \(q\) represent the proportion of the gradient and cyclic components on the graph. These quantities are summarized as follows:

$$p=\parallel {\delta }_{0}{{\boldsymbol{u}}}{\parallel }^{2},q=\parallel {{\boldsymbol{w}}}+{\delta }_{0}{{\boldsymbol{u}}}{\parallel }^{2}.$$

(17)

The gradient and divergence-free components are orthogonal, and they satisfy \({{\langle }}{\delta }_{0}{{\boldsymbol{u}}},{{\boldsymbol{r}}}{{\rangle }}=0\). Therefore, their normalized contributions in \({{\boldsymbol{w}}}\) sum to one: \((p+q)/\parallel {{\boldsymbol{w}}}{\parallel }^{2}=1\).

scRNA-seq data analysis

Velocities were estimated using scvelo (version 0.3.2). For cell-type annotation, the scRNA-seq data of mouse embryos34 were used as training data for Celltypist (version 1.6.3), and the cell-type labels were given for the VASA-seq data. The analysis using ddHodge was performed with a \(k\)-NN graph where \(k=12\) and the reduced dimensionality was \(r=4\). The detailed methods for reproducing the data analysis are available on GitHub.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets analyzed in this study are publicly available. For VASA-seq data, spliced and unspliced read counts were obtained from GEO (accession: GSE176588). The processed loom file of FUCCI data was obtained from GitHub (CellProfiling/SingleCellProteogenomics [https://github.com/CellProfiling/SingleCellProteogenomics]). For the regenerating muscle tissue dataset, the raw sequence reads were obtained from NCBI (accession: SRP241205). The processed data from the VASA-seq and muscle regeneration dataset, along with various dynamic indicators calculated by ddHodge, are deposited in Figshare (https://doi.org/10.6084/m9.figshare.28723652). Source data for all figures are available in the Figshare repository and in the GitHub repository described below. No access restrictions apply.

Code availability

ddHodge46 is implemented as a Julia package and is available at https://github.com/kazumits/ddHodge.jl. The Jupyter notebooks used to reproduce the figures and the related processed data in this manuscript are also available at https://github.com/kazumits/ddHodge_figures. These codes were archived in Zenodo under the https://doi.org/10.5281/zenodo.17660331 and https://doi.org/10.5281/zenodo.17660342.

References

  1. Furusawa, C. & Kaneko, K. A dynamical-systems view of stem cell biology. Science 338, 215–217 (2012).

    Article  ADS  CAS  PubMed  Google Scholar 

  2. Fang, X., Kruse, K., Lu, T. & Wang, J. Nonequilibrium physics in biology. Rev. Mod. Phys. 91, 045004 (2019).

    Article  ADS  MathSciNet  CAS  Google Scholar 

  3. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).

    Article  CAS  PubMed  Google Scholar 

  4. Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. https://doi.org/10.1186/s13059-016-0938-8 (2016)

  5. Sasagawa, Y. et al. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads. Genome Biol. 19, 29 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  6. Salmen, F. et al. High-throughput total RNA sequencing in single cells using VASA-seq. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01361-8 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  7. Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. https://doi.org/10.1038/nbt.2859 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btv325 (2015)

  9. Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 477 (2018).

    Article  Google Scholar 

  10. La Manno, G. et al. RNA velocity of single cells. Nature. https://doi.org/10.1038/s41586-018-0414-6 (2018)

  11. Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).

    Article  CAS  PubMed  Google Scholar 

  12. Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell. S0092867421015774. https://doi.org/10.1016/j.cell.2021.12.045 (2022)

  13. Jiang, X., Lim, L. H., Yao, Y. & Ye, Y. Statistical ranking and combinatorial Hodge theory. Math. Program. https://doi.org/10.1007/s10107-010-0419-x (2011)

  14. Lim, L.-H. Hodge Laplacians on Graphs. SIAM Rev. 62, 685–715 (2020).

    Article  MathSciNet  Google Scholar 

  15. Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Marot-Lassauzaie, V. et al. Towards reliable quantification of cell state velocities. PLoS Comput Biol. 18, e1010031 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Zhang, Y. et al. Graph-Dynamo: Learning stochastic cellular state transition dynamics from single cell data. 2023.09.24.559170 Preprint at. https://doi.org/10.1101/2023.09.24.559170 (2023).

  18. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

    Article  Google Scholar 

  19. Hu, S. et al. Epithelial-mesenchymal transition couples with cell cycle arrest at various stages. Preprint at. https://doi.org/10.1101/2025.02.24.639880 (2025).

  20. Singer, A. & Wu, H.-T. Vector diffusion maps and the connection Laplacian. Commun. Pure Appl. Math. 65, 1067–1144 (2012).

    Article  MathSciNet  Google Scholar 

  21. Hirani, A. N. Discrete Exterior Calculus. https://doi.org/10.7907/ZHY8-V329 (California Institute of Technology, 2003).

  22. Desbrun, M., Hirani, A. N., Leok, M. & Marsden, J. E. Discrete Exterior Calculus. Preprint at. https://doi.org/10.48550/arXiv.math/0508341 (2005).

  23. Ma, J., Zhao, J., Tian, J., Bai, X. & Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 46, 3519–3532 (2013).

    Article  ADS  Google Scholar 

  24. Crane, K., Desbrun, M. & Schröder, P. Trivial connections on discrete surfaces. Comput. Graph. Forum 29, 1525–1533 (2010).

    Article  Google Scholar 

  25. Pfau, D., Higgins, I., Botev, A. & Racanière, S. Disentangling by subspace diffusion. in Advances in Neural Information Processing Systems, Vol. 33, 17403–17415 (Curran Associates, Inc., 2020).

  26. Hansen, J. & Ghrist, R. Toward a spectral theory of cellular sheaves. J. Appl. Comput. Topol. 3, 315–358 (2019).

    Article  MathSciNet  Google Scholar 

  27. Hansen, J. & Ghrist, R. Opinion dynamics on discourse sheaves. SIAM J. Appl. Math. 81, 2033–2060 (2021).

    Article  MathSciNet  Google Scholar 

  28. Bodnar, C., Di Giovanni, F., Chamberlain, B. P., Liò, P. & Bronstein, M. Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs. In Advances in Neural Information Processing Systems, Vol. 35, 18527–18541 (Curran Associates, Inc., 2022).

  29. Berwick-Evans, D., Hirani, A. N. & Schubel, M. D. Discrete Vector Bundles with Connection and the Bianchi Identity. Preprint at. https://doi.org/10.48550/arXiv.2104.10277 (2023).

  30. Reuter, B., Fackeldey, K. & Weber, M. Generalized Markov modeling of nonreversible molecular kinetics. J. Chem. Phys. 150, 174103 (2019).

    Article  ADS  PubMed  Google Scholar 

  31. Gorin, G., Fang, M., Chari, T. & Pachter, L. RNA velocity unraveled. PLoS Comput Biol. 18, e1010492 (2022).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  32. Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021).

    Article  ADS  CAS  PubMed  Google Scholar 

  33. Ohyama, T., Mohamed, O. A., Taketo, M. M., Dufort, D. & Groves, A. K. Wnt signals mediate a fate decision between otic placode and epidermis. Development 133, 865–875 (2006).

    Article  CAS  PubMed  Google Scholar 

  34. Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat. Genet 54, 328–341 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Qiu, C. et al. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 626, 1084–1093 (2024).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  36. Ericson, J. et al. Pax6 Controls Progenitor Cell Identity and Neuronal Fate in Response to Graded Shh Signaling. Cell 90, 169–180 (1997).

    Article  CAS  PubMed  Google Scholar 

  37. Thompson, H., Andrews, W., Parnavelas, J. G. & Erskine, L. Robo2 is required for Slit-mediated intraretinal axon guidance. Dev. Biol. 335, 418–426 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Henrique, D., Abranches, E., Verrier, L. & Storey, K. G. Neuromesodermal progenitors and the making of the spinal cord. Development 142, 2864–2875 (2015).

    Article  CAS  PubMed  Google Scholar 

  39. Gouti, M. et al. A Gene regulatory network balances neural and mesoderm specification during vertebrate trunk development. Dev. Cell 41, 243–261.e7 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Downs, K. M. The mouse allantois: new insights at the embryonic–extraembryonic interface. Philos. Trans. R. Soc. B 377, 20210251 (2022).

    Article  Google Scholar 

  41. De Micheli, A. J. et al. Single-cell analysis of the muscle stem cell hierarchy identifies heterotypic communication signals involved in skeletal muscle regeneration. Cell Rep. 30, 3583–3595.e5 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Zhang, Z., Sun, Y., Peng, Q., Li, T. & Zhou, P. Integrating dynamical systems modeling with spatiotemporal scRNA-seq data analysis. Entropy 27, 453 (2025).

    Article  ADS  CAS  PubMed  PubMed Central  Google Scholar 

  43. Zhanlin, Chen., William, C. K., Aheyon, H., Mark, G. & Jing, Z. DeepVelo: Single-cell transcriptomic deep velocity field learning with neural ordinary differential equations. Sci. Adv. 8, eabq3745 (2022).

  44. Chen, Y. et al. GraphVelo allows for accurate inference of multimodal velocities and molecular mechanisms for single cells. Nat. Commun. 16, 7831 (2025).

  45. Dey, T. K., Hirani, A. N. & Krishnamoorthy, B. Optimal Homologous Cycles, Total Unimodularity, and Linear Programming. SIAM J. Comput. 40, 1026–1044 (2011).

    Article  MathSciNet  Google Scholar 

  46. Maehara, K. Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge. Zenodo https://doi.org/10.5281/zenodo.17660331 (2025).

Download references

Acknowledgements

We thank Profs. Yusuke Imoto and Yasuaki Hiraoka at Kyoto University for insightful discussions, and Advanced Research Initiative, Research Promotion Unit, Medical Institute of Bioregulation, Kyushu University, for their support. This work was supported by JST PRESTO JPMJPR2026, JST CREST JPMJCR23N3, JPMJCR24Q1, JSPS KAKENHI JP22H04696, JP23H04288 and JP25H01484 to K.M., AMED BINDS JP22ama121017j0001, JSPS KAKENHI JP24H02323, JP23H00372, JP22H04676 and JP22K19275 to Y.O. This work was also supported in part by the MEXT Promotion of Development of a Joint Usage / Research System Project: Pan-Omics DDRIC, MRCI for High Depth Omics, CURE:JPMXP1323015486 for MIB and RIIT and Autonomous Medical Research Center in Kyushu University.

Author information

Authors and Affiliations

  1. Department of Multi-Omics, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan

    Kazumitsu Maehara

  2. Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan

    Kazumitsu Maehara & Yasuyuki Ohkawa

Authors

  1. Kazumitsu Maehara
  2. Yasuyuki Ohkawa

Contributions

K. M. and Y.O. conceived and interpretation of results, and K.M. built the theoretical framework of ddHodge and developed software, K.M. and Y.O. wrote the paper.

Corresponding author

Correspondence to Yasuyuki Ohkawa.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Jianhua Xing, who co-reviewed with Yan Zhang, Xiaojie Qiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Maehara, K., Ohkawa, Y. Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge. Nat Commun 16, 11342 (2025). https://doi.org/10.1038/s41467-025-67782-6

Download citation

  • Received:

  • Accepted:

  • Published:

  • Version of record:

  • DOI: https://doi.org/10.1038/s41467-025-67782-6