Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge

Introduction

The dynamics of cell states, namely, the temporal changes in cell states, underpin all biological phenomena, including development, regeneration, differentiation, and pathology. One of the primary goals of life sciences is to understand these cell state dynamics. However, these dynamics represent a high-dimensional, multibody system involving interactions among tens of thousands of proteins, nucleic acids, and lipid molecules, constituting a complex, nonequilibrium system^1,2. Consequently, traditional hypothesis-driven approaches have focused on small subsystems formed by the interactions among the molecules of interest, creating building blocks for understanding the whole system. Recent advancements in single-cell omics^3,4,5,6 have enabled comprehensive evaluations of cell states represented by all gene expression profiles, which could allow a data-driven understanding of complex cell state dynamics. However, single-cell measurement techniques are invasive, and the resulting data are static, lacking temporal changes for each cell. Therefore, pseudotime analysis techniques^7,8,9 have been employed to construct temporal information based on of large-scale static snapshot data to analyze gene expression dynamics. Nevertheless, conventional pseudotime analysis methods provide only an order of time, and the speed and acceleration/deceleration cannot be evaluated, making it challenging to analyze detailed dynamics, including the stability of states that represent sensitivity of gene expression responses to perturbations, as well as more diverse types of state changes.

Therefore, RNA velocity^10,11 has frequently been used to trace changes in cell states as trajectories in gene expression space by utilizing the time derivative of the gene expression (velocity) of each cell. Based on this time-derivative inference approach involving gene expression data, methods for reverse engineering these dynamics (vector-field reconstruction) have been proposed, allowing analyses of the interactions among genes that drive local dynamics and the least action paths of the cell state¹². However, owing to the lack of a sufficient theoretical foundation to ensure accurate reconstruction of vector fields on the basis of high-dimensional data, a methodology for precisely reconstructing cell state dynamics, appropriately summarizing the dynamics occurring in high-dimensional space, and evaluating the stability of each cell state has yet to be developed.

Here, we developed a high-dimensional dynamics analysis framework called ddHodge. Our framework is based on the mathematical theorem of Hodge decomposition, which provides the three fundamental components of vector fields on manifolds^13,14. ddHodge offers a unified perspective on the problem of high-dimensional vector-field reconstruction on manifold and provides a practical method to precisely reconstruct dynamics in unprecedented detail. Through simulations and validation using single-cell RNA sequencing (scRNA-seq) data, we demonstrated that ddHodge can provide a summary of the overall unknown high-dimensional dynamics and identify genes associated with fate determination by affecting the stability of cell states.

Results

ddHodge provides a framework for analyzing the high-dimensional dynamics of cell states

The cell states represented by each data point are generated from an unknown differential equation system F(x, y) (Fig. 1A, left). These states are pairs of observed positions (x, y) and their corresponding velocities F(x, y). Each velocity is given as the gradient of an unknown potential function U. The magnitude of the potential value at each point reflects the temporal ordering of the cell states (Fig. 1A, middle). The convex region in the potential landscape, indicates the stability of states that are diversifying; branching is likely to occur, and the concave region is considered to be converged; the cell states adopt specific lineages (Fig. 1A, right). Therefore, the shape of the potential landscape is a critical indicator of dynamics. Here, we used a 2D gradient system as an illustrative example to explain the interpretation of potential. Note that gradient systems are the special cases where vector fields are represented solely by the gradients of a scalar function, and only the potential determines the dynamics. Generally, dynamics are non-gradient systems.

**Fig. 1: The ddHodge framework provides a method for analyzing unknown high-dimensional dynamics.**

Reconstructing cell state dynamics after nonlinear dimensionality reduction involving data deformation is a considerably difficult problem^15,16,17. Nevertheless, nonlinear dimensionality reduction is useful for visualization. We set the original high-dimensional (3D) dynamical system to have a region of instability (S₁) with a large positive divergence and a region of stability (S₂) with a large negative divergence (Fig. 1B, left). However, in commonly used nonlinear dimensionality reduction methods such as UMAP¹⁸, the density of the data points in original space is equalized in the reduced space; as a result, geometric features such as length and volume are lost. This deformation affects the landscape shape, making it difficult to identify regions S₁ and S₂ existed in the original space (Fig. 1B, right). Additionally, the velocity embedding approach^10,15 used for visualizing velocity information can misrepresent dynamics by embedding high-dimensional velocity data with neighboring data points in the reduced space, potentially presenting nonexistent closed trajectories at the data boundaries. Hu et al. also showed that trajectories on cylindrical manifolds—formed by combining circular cell cycles trajectory with straight path trajectory—lose essential geometric information under nonlinear dimensionality reduction, which makes appropriate dynamical interpretation difficult¹⁹.

To address the challenge of reconstructing high-dimensional dynamics, we propose a combination of two approaches. First, we introduce a high-precision vector-field reconstruction method based on Hodge decomposition. A vector field on a manifold can be decomposed into three independent components: a gradient with divergence, a divergence-free rotational component, and a harmonic component as the residual (Hodge decomposition). The goal is to completely reconstruct these three components. Next, to make this method applicable in high-dimensional spaces, we consider approximating the Hodge components on a low-dimensional manifold (Fig. 1C; dimensionality reduction preserving data geometry). By using local principal component analysis (PCA) to approximate tangent spaces²⁰ and patchworking these spaces, we build a low-dimensional curved space embedded in the high-dimensional space (hereafter referred to as the data manifold) (Fig. 1C, left). During the reconstruction process, we smooth the estimated values along the curved axes on the data manifold to compensate for noise and sample size limitations (Fig. 1C, top), enabling the reconstruction of the potential, divergence, and curl in high dimensions (Fig. 1C, bottom).

We implemented this framework as ddHodge.

ddHodge allows us to evaluate the temporal order and stability of data points on the basis of the potential and divergence, respectively. The periodic and oscillatory behavior of dynamic processes, such as the cell cycle and circadian rhythms, can be evaluated using the curl and harmonic components, independent of the gradient component. Even after dimensionality reduction, the Grassmann distance (the degree of bending when tangent spaces are patchworked) allows us to identify the timing of significant changes of dimensions (genes). The Jacobian matrix reconstructed on the data manifold can be used to select cross-sections of the gene expression space via Schur decomposition, where the trajectories undergo significant changes. By extracting and visualizing these rich dynamic indicators (Fig. 1D), we can identify genes associated with the diversification and convergence of dynamic cell states.

ddHodge accurately reconstructs the true dynamics from partial velocity sampling

We evaluated the performance of ddHodge with synthetic data (Fig. 2). The evaluation metric was the mean squared error (MSE) between the theoretical values of the predefined potential U(x, y) and the reconstructed values. Additionally, we assessed the ability of ddHodge to reconstruct the shape of the potential landscape (divergence) and compared it with that of conventional methods.

**Fig. 2: ddHodge accurately reconstructs dynamics from a partial sample of velocities.**

To evaluate the performance of ddHodge in reconstructing dynamic information without dimensionality reduction, we conducted simulations of a dynamical system in which the trajectory of each point on a 2D plane was determined solely by the gradient of the potential U, defined as F(x, y) = −∇U(x, y) (Fig. 2A, 1st and 2nd panels, Methods). We randomly initialized points near the peak of the potential in the lower left region and simulated their trajectories for random durations between 0-4 seconds, obtaining 100 pairs of position and velocity data (Fig. 2A, 3rd and 4th panels). When ddHodge was applied to estimate the potential via the least squares method introduced in HodgeRank¹³, the reconstructed values agreed well with the theoretical values (Fig. 2A, 5th panel; MSE: 0.001). This result demonstrates that the proposed method can provide high-precision potential reconstruction results even with small datasets and partial sampling of data points.

We next leveraged the geometric interpretation of the estimated potential to reconstruct the shape of the potential landscape. We viewed the least squares potential estimation method as fitting a piecewise quadratic function to the edges of a graph with given slopes at the ends (Supplementary Notes). This approach can be extended to estimate the shape of the potential function (the Hessian matrix H of U). The shape is approximated by collecting fitted piecewise quadratic functions of each cross-section to the neighboring vertex (Fig. 2B, Methods). The shape of the potential indicates whether the cell state traverses the landscape at a constant speed (flat slope; upper case in Fig. 2B) or experiences acceleration, deceleration, or passes through branching points (changing slope; lower case in Fig. 2B).

Next, we verified the reliability of ddHodge’s results. We compared the divergence reconstruction performance of ddHodge with that of the discrete exterior calculus (DEC) method^21,22, which can be realized as a type of finite element method (FEM), a well-established numerical analysis method (Fig. 2C). The results showed that ddHodge achieved a reconstruction accuracy comparable to or better than that of the DEC method, even when the data boundary, where the DEC method had lower reproducibility, was excluded (yellow triangles in Fig. 2C, D) (Fig. 2D, DEC: MSE = 0.502, ddHodge: MSE = 0.026).

Next, to evaluate the reconstruction performance of ddHodge, we compared ddHodge with Dynamo, which has similar vector-field reconstruction capabilities. In this comparison, we used simulation data from the toggle-switch model, a 2D nonlinear differential equation model (Fig. 2E). We compared the reconstruction results of the two methods using N = 5000 uniformly sampled pairs of position (x, y) and velocity F(x, y) data from the toggle-switch model¹². The curl was reconstructed using the ddHodge’s divergence of the dual velocity fields (Methods, Supplementary Notes). The results showed that ddHodge significantly outperformed Dynamo in all metrics: the divergence (ddHodge vs. Dynamo: 0.0027 vs. 0.1142), curl (0.0027 vs. 0.0742), and Jacobian (0.0051 vs. 0.1050) (Fig. 2F).

The vector-field reconstruction function in Dynamo, utilizing SparseVFC²³, is based on the theory of reproducing kernel Hilbert spaces (RKHS). In the context of numerical analysis, as explained in the comparison of DEC and ddHodge, this kernel method can also be viewed as a variant of the numerical method. Specifically, this is a variant of the Galerkin method that utilizes radial basis functions, which applies to higher dimensions. Hence, we further verified whether both could achieve equivalent reconstruction performance in terms of the numerical method with appropriate parameter settings. We thus anticipated that increasing the number of kernel basis functions (parameter M) in SparseVFC would improve accuracy. By varying M in the 2D toggle-switch model simulation, we confirmed that Dynamo’s reconstruction accuracy is comparable to ddHodge’s around M = 1000 (Supplementary Figure S1). In contrast, ddHodge retained advantages in computational efficiency and exhibited a scaling law with respect to the sampling density (cell number), owing to its use of sparse matrices derived from the local k-nearest-neighbor structure, combined with the efficient Krylov subspace solver.

We also evaluated the robustness of ddHodge to noise by varying the smoothing parameter λ and neighbor count k at different noise levels (SNR = 1000 to 10; see Supplementary Figure S2 and Methods for the definition of SNR). In high-noise environments (SNR = 10), increasing λ from 0.1 to 10.0 improved MSE from 22.3 to 1.27, and increasing k from 6 to 30 under the same conditions dramatically improved MSE from 1.27 to 0.23. These results suggest that increasing k is effective for noisy data.

Furthermore, the Jacobian matrix J can be reconstructed from the accurately reconstructed Hessian matrix and the curl (Methods, Supplementary Notes). We then evaluated the applicability of the proposed method for interaction analyses of gene pairs using J. The clear red and blue bands in the reconstruction results of each element of the Jacobian matrix (∂ẋ/∂x, ∂ẏ/∂x, ∂ẋ/∂y, and ∂ẏ/∂y where ẋ represents dx/dt) faithfully reproduced the regions that respond sensitively to perturbations in genes x and y (Fig. 2G).

These results demonstrate the ability of ddHodge to reconstruct unknown nonlinear dynamics with high accuracy and efficiency, even from partially sampled data.

Approximating low-dimensional dynamics on the data manifold

Next, we aimed to develop a dimensionality reduction method suitable for high-dimensional dynamical systems (Fig. 1C, top). For example, the surface of a sphere in three dimensions can be represented as a collection of small two-dimensional planes (Fig. 3A, top). Based on this concept, we hypothesized that by reconstructing and stitching dynamics within local lower-dimensional spaces (tangent spaces) using a local PCA approach²⁰, both dimensionality reduction and faithful reproductions of high-dimensional dynamics could be achieved while preserving the geometry of the data manifold. The number of parameters in the Hessian matrix estimation process is reduced to m² < d², where d is the number of genes and m is the dimension of the tangent spaces. We also considered smoothing (regularization) along the curved axes on the data manifold. In the smoothing process, the axes of adjacent tangent spaces should be matched (Fig. 3A, middle; x_u and x_v, y_u and y_v); therefore, using idea of parallel transport in differential geometry^20,24,25, we developed a method for designing near-parallel coordinate axes for high-dimensional data using a discrete connection Laplacian (sheaf Laplacian)^25,26,27,28 (Fig. 3A, bottom; Supplementary Notes).

**Fig. 3: Rebuilding high-dimensional dynamics by patchworking lower-dimensional local dynamics.**

First, we present a simple example of designing near-parallel coordinates on a sphere (Fig. 3B). We used the graph structure of a triangulated mesh as the input. By adopting the eigenvector corresponding to the smallest eigenvalue of the connection Laplacian of the input graph, we obtained a near-parallel axis that is consistent^24,25 on the surface. We use the term ‘near’-parallel to indicate directions of movement close to parallel transport on curved space. Streamlines on the sphere trace the directions obtained from the eigenvector of the sheaf Laplacian. Singularities, the points that consolidate and resolve coordinate distortions across the entire sphere, appeared to be vortices²⁴ (Fig. 3B right). The leftmost six identical eigenvalues indicate that the total amount of coordinate distortion remains unchanged, regardless of which vector in the 6-dimensional eigenspace is used to construct the axis. Furthermore, there was a gap²⁹ in the spectrum of the eigenvalues from the 7th value onward, corresponding to the doubling of the number of visible vortices.

Next, using velocities sampled from the artificial dynamics simulated on a sphere (Methods), we reconstructed the dynamics with smoothing on the data manifold (Fig. 3C). The potential and divergence were reconstructed with peaks at the east and west ends of the sphere, and the curl was concentrated at the north and south ends, reproducing values approximately identical to the theoretical values (Fig. 3C; inset images, MSEs: 1.0 × 10⁻³, 4.0 × 10⁻⁴, and 1.0 × 10⁻³, respectively). We further verified the importance of the subspace alignment procedure using an asymmetric saddle-shaped potential example (Supplementary Figure S3). While smoothing results with unaligned axes significantly deviate from theoretical values of the Hessian matrix, subspace alignment using the sheaf Laplacian enables estimation that perfectly matches the theoretical values. These results indicate that both the unknown low-dimensional manifold and the dynamics on the manifold can be accurately recovered from the sampled data.

ddHodge enables qualitative interpretation of unknown complex systems

Next, we validated the interpretability of the lower-dimensional approximation of high-dimensional dynamics using the Lorenz system (Fig. 3D). The dynamics of the Lorenz system are characterized by trajectories that irregularly move between two disks in three-dimensional space. Therefore, we attempted to approximate the dynamics in the space of two joint planes as a data manifold and capture the characteristics of the original dynamics (Fig. 3E). As expected, the main characteristics of rotational behavior were identified by the positive (anticlockwise) and negative (clockwise) curl values on the left and right sides of this space. Additionally, the acceleration in the trajectory in the lower half of the space and deceleration in the upper half of the space were confirmed by the sign of the divergence value. In summary, as indicated by the smaller divergence values compared with the curl values, the dynamics of this system were considered dominated by rotations rather than gradients. Furthermore, the Grassmann distance was used to identify the planar region of the trajectory (cyan) and the hinge region at the center (magenta; points of dimensional change) (Fig. 3F).

The representative variables (genes) driving the dynamics can be identified by the Schur decomposition^15,30 of the Jacobian matrix (Fig. 3G, left). Here, we used the ddHodge results in three dimensions without approximating the Lorenz system in 2D space (Fig. 3G, middle). As expected, prominent rotational dynamics were observed on the SV1-SV2 plane at the point at which the curl has its maximum value (Fig. 3G right). We also verified the accuracy of Jacobian matrix reconstruction in tangent space using the Lorenz system and a feedback-loop system (Supplementary Fig. S4). When the estimated tangent space lies within an invariant subspace of the true Jacobian matrix (i.e., when PCA well captures the tangent space), the reconstructed eigenvalues matched the theoretical ones (Supplementary Figure S4A–C). Despite information loss due to dimensionality reduction, qualitative consistency with the true Jacobian matrix will be maintained, demonstrating its applicability to gene interaction analysis (Supplementary Figure S4D–F). These results indicate that the Schur decomposition of the Jacobian matrix is effective for identifying the cross-section of the space containing the representative dynamics nearby specified point (in this example, the cross-section along the disk).

Before scRNA-seq data analysis, we conducted a 100-gene dynamics simulation based on the RNA velocity model³¹ to verify ddHodge’s ability to detect changes in gene-expression dynamics (Supplementary Figure S5A-D). Note that, in standard scRNA-seq analysis, thousands of genes are reduced to less than 100 dimensions through PCA preprocessing, so ddHodge’s local PCA approach naturally integrates with existing analysis workflows. By introducing cell-type–specific transcription-rate changes at $t=3$ and $t=7$, we simulated cell-type–dependent expression changes. Unlike the Lorenz example, this RNA-velocity simulation presents a difficult setting for ddHodge: because 100 genes evolve independently, no low-dimensional data manifold exists. Nevertheless, after low-dimensional approximation ($r=4$), decreases in potential and changes in divergence (Supplementary Figure S5C-D) captured transitions to equilibrium states, the presence of perturbations (rate changes), and points of fate diversification.

These results demonstrate the utility of ddHodge in inferring the behavior associated with the original dynamics, even in the approximate reconstruction of the high-dimensional dynamics.

The data analysis framework of ddHodge provides a global view of embryogenesis

Next, the scRNA-seq data were analyzed using ddHodge to quantify the potential, which represents the order of cell state transitions, and divergence, which reflects the stability of each cell state (Fig. 4A). In our analysis of scRNA-seq data, the following assumptions are adopted: (1) cells progress through state changes governed by an autonomous dynamical system, (2) both expression levels and velocity measurements are obtainable via existing methods, and (3) the resulting dynamics can be represented on a low-dimensional data manifold. These assumptions are widely used in existing RNA velocity and general scRNA-seq data analysis, including dimensionality reduction processes, and ddHodge operates faithfully within these constraints.

**Fig. 4: Potential landscape provides a succinct global view of gene expression dynamics in embryogenesis.**

In this analysis, we focused on VASA-seq data, which provide highly accurate velocity information due to their high degree of intron coverage⁶. By leveraging this dataset, we attempted to elucidate the hierarchical differentiation sequence of cell populations and the changes in the stability of the cell state for each cell lineage during the embryogenesis.

The VASA-seq data contained N = 46,124 cells from mouse embryos in the stage of E6.5–9.5. Using the gene expression levels and RNA velocity as inputs, the components of the Hodge decomposition—potential, divergence, curl, and Grassmann distance—were reconstructed and mapped to a low-dimensional UMAP representation using different colors (Fig. 4B). We found that the Grassmann distance remained low across the dataset (median value: 0.43), while the reduced dimension was m = 4. The small Grassmann distance can be interpreted as indicating that the trajectory was confined to almost a flat space, meaning there were no significant changes in the utilized gene sets (dimensions). Conversely, if one finds hinge-like points as shown in the Lorenz system, it can be interpreted that there were points where the utilized gene groups changed significantly. This result suggests that the trajectories of the gene expression dynamics associated with embryonic development reside in a nearly four-dimensional manifold.

We also found a correlation between the potential value and developmental stage, i.e., the potential progressively decreases as development progresses, indicating its utility as a pseudotime indicator¹² (Supplementary Figure S6). Similarly, the divergence became increasingly negative as development progressed. Notably, the value of the curl, which reflects cyclic/oscillatory dynamics such as those associated with the cell cycle, was substantially lower than that of the divergence. These findings suggest that the majority of the gene expression dynamics during embryogenesis can be described by a gradient system governed by a potential landscape.

To further confirm that the dynamics of embryogenesis is mainly driven by a gradient system, we quantified the proportion of cyclic or periodic components separated from the gradient component (Fig. 4C). In this dataset, the gradient component accounted for 88.2% of the gene expression dynamics, whereas the divergence-free cyclic component accounted for only 11.7% of the gene expression dynamics. In contrast, in the control dataset of FUCCI-labeled cells³², which specifically represents the progression of the cell cycle, the cyclic component accounted for 49.0% of the dynamics (Fig. 4C, magenta; Methods). In the FUCCI data, the cyclic component was dominant (Supplementary Figure S7A, B). In regions where the divergence-free flow dominates, potential cannot provide a faithful temporal ordering (Supplementary Figure S7C). For circular component visualization, subtracting the gradient component from edge weights reveals the distribution of curl and harmonic components, and we observed strong divergence-free components in the G2M region (Supplementary Figure S7D). These findings suggest that embryonic gene expression dynamics can be explained mainly by the potential rather than by periodic components such as the cell cycle. This underscores the importance of understanding the topography of the landscape to elucidate the mechanistic underpinnings of embryogenesis.

To further elucidate the biological significance of this divergence, we analyzed its changes over the course of embryogenesis (Fig. 4D). During the early stages (E6.5–E7.5), most cells exhibited positive divergence, indicating an undifferentiated and unstable (divergent) state. In contrast, from E8.5 to E9.5, the divergence values became increasingly negative, suggesting that the gene expression states of the cells tended to converge and stabilize as development progressed. This finding indicates that early embryonic cells exhibit a high degree of plasticity, which subsequently decreases as differentiation becomes more defined and cell states stabilize.

Thus, we attempted to describe the dynamics of embryogenesis using only the potential and divergence (Fig. 4E). The epiblast, which possesses pluripotency with high divergence, maintained its cellular identity with a shallow slope (gradient) between E6.5 and E7.5 but was no longer detectable after E8.5. In contrast, cell populations that had diverged into distinct lineages, including the splanchnic mesoderm, emerged beneath the epiblast from E7.5 onward and progressively descended along the landscape toward E9.5, differentiating into more stable cardiac and epithelial lineages. These findings suggest that we successfully visualized the fate decision process that occurs during development according to the height and shape of the potential landscape during gastrulation.

Next, we focused on the surface ectoderm lineage (Fig. 4F). The surface ectoderm, a nonneural ectoderm, gives rise to the epithelium and epidermis of sensory organs. Even at E8.5, pre-epidermal keratinocytes appeared at a higher positions in the plot (potential > 0.5) than the placodal area, suggesting that the epidermal lineage may be specified earlier than the epithelial lineage³³. Furthermore, pre-epidermal keratinocytes and the otic epithelium exhibited stable cell states at E9.5, as indicated by the highly negative divergence values (approximately −0.01). In contrast, the divergence of the fusing epithelium, which exhibits gene expression characteristics associated with the amniotic ectoderm, remained close to zero, indicating that the cell state changed with a constant speed, which is a unique property of this type of cell.

Taken together, our results demonstrate that the developmental trajectory of the surface ectoderm lineage is consistent with a recent single-cell study of embryogenesis^34,35. Moreover, we successfully quantified the potency of differentiation at each developmental stage using the potential and divergence.

ddHodge enables the extraction of critical timing information and genes related to cell state changes

As nonlinear dimensionality reduction is not performed in ddHodge, the genes that drive cell state changes can be tracked. We thus focused on individual points of interest (POIs) at which cell states undergo significant changes and investigated the local dynamics (Fig. 5).

**Fig. 5: Cross-sections of the gene expression state space to elucidate the local dynamics of embryogenesis.**

To identify cells that become more stable, namely, those with negative divergence values, we selected the POI with the smallest divergence within cluster 18 (Fig. 5A, top right). The Schur decomposition of the Jacobian matrix obtained for this POI revealed a remarkable state change in the SV2-SV4 plane, corresponding to the largest negative eigenvalues. To determine the most related genes in this section, we identified candidate genes using the correlation coefficients between the divergence and gene expression levels for 1000 neighboring cells in the POI (Supplementary Figure S8). The velocities projected onto the SV2-SV4 planes demonstrated that the cell states converged and stabilized (negative divergence); furthermore, Pax6 expression was identified³⁶ (Fig. 5B, Supplementary Fig. S8). Additionally, Nrcam and Robo2, which are genes related to axon guidance³⁷, exhibited strong negative correlations with the divergence values, similar to Pax6, suggesting their involvement in the stabilization of cell states during axial elongation. In contrast, another cell population annotated as neuromesodermal progenitors (NMPs) expressing Cdx2 was positively correlated with divergence (Fig. 5B, top and bottom left panels, Supplementary Figure S8). Cdx2, which is regulated by multiple extracellular signals, has been reported to play a role in maintaining bipotent NMP populations^38,39. These findings suggest that the mechanism maintaining undifferentiated NMPs may involve destabilization of cell states through Cdx2 expression. In summary, through this analysis, we identified several genes associated with the spinal cord lineage involved in regulating cell state stability.

To identify genes associated with cell state diversification, we selected another POI with the highest divergence from cluster 14 (Fig. 5C, top right). The neighboring cells of this POI primarily consisted of two amnion-like cell groups (annotated as A and B) (Fig. 5C, D). The Schur decomposition of the Jacobian matrix at the POI revealed two branches along the SV1 axis, with one group diverging to the right (A) and the other to the left (B). Since the loading vector of the Schur decomposition can be used to identify genes that are strongly associated with the selected axis (Supplementary Figure S9), we attempted to identify the genes with the highest weights in the loading vector. Ptch1 presented the highest loading value along the SV1 axis, suggesting the important role of this gene in A/B fate determination (Fig. 5D, top). Ptch1, the receptor of Sonic hedgehog (Shh), has been reported to be expressed in the allantois, leading to the arrest of the epithelial‒mesenchymal transition and the progression of morphogenesis⁴⁰. Therefore, only group A, which expresses Ptch1, may undergo de-repression via Shh signaling and transition to a distinct cell state. Additionally, Slc2a1, which encodes a glucose transporter of Glut1, presented the highest loading value on the SV3 axis; however, its expression level increased with developmental progression in both A and B. In summary, this axis-based approach can provide a biological interpretation similar to that of PCA.

As an extra verification with real data, we applied ddHodge to a muscle regeneration dataset⁴¹. ddHodge-derived metrics can be visualized by coloring points on UMAP, focusing on muscle satellite-cell activation and muscle differentiation processes (Supplementary Figure S10A). High-divergence regions (red) were observed on the upper side of the MuSCs cluster, suggesting the need for a detailed analysis of presumable fate-bifurcation points. This high-divergence region was selected as a POI. Schur decomposition analysis was performed (Supplementary Figure S10B). By visualizing divergence values and velocity-vector projections on the SV1–SV2 plane, we succeeded in extracting the moment of fate-determination in asymmetric-division, which is composed of Myog (muscle-differentiation marker) and Cdkn1c (cell-cycle control) expression axes (Supplementary Figure S10C).

Furthermore, compared to ordinal differential gene expression (DEG) analysis aimed at detecting marker genes selectively expressed in each cluster, ddHodge has the characteristic of detecting genes that change rapidly (accelerate/decelerate or diverge/contraction) within clusters, so the importance scores of both do not necessarily match (Supplementary Figure S11). ddHodge’s exploratory analysis using divergence as an indicator successfully captures the behavior of cellular state destabilization at this fate bifurcation point, that tend to be overlooked by conventional RNA velocity analysis and DEG analysis. In addition to divergence, other dynamical indicators calculated by ddHodge can be exported in anndata format, making it compatible with many single-cell data visualization tools (Supplementary Figure S12). This enables interactive POI selection considering biological context while confirming gene expression and cell-type annotations within each cell type.

Discussion

Here, we developed ddHodge as a method for reconstructing the original vector field using pairs of sampling points and their velocities obtained from an unknown system. By utilizing RNA velocities as inputs, ddHodge extends the capabilities of existing methods by allowing evaluations of both the direction of cell state changes and the stability of each state, which indicates a potency of change, which has not previously been evaluated. We further confirmed that the majority of the gene expression dynamics in embryogenesis can be described as a gradient system (i.e., a potential landscape), which was previously unclear. Furthermore, we quantified the difficulty of lineage transitions from the divergence values, complementing traditional trajectory analyses of cell fate conceptualized by the epigenetic landscape. Additionally, we identified genes that influence the stability of the cell state.

In this study, we demonstrated that the potential and divergence reconstructed from RNA velocity data can serve as indicators for identifying points at which cell states converge and diversify. Furthermore, ddHodge can be used to extract cross-sections of high-dimensional space and identify key regulatory genes that govern local dynamics. Since this framework is general and requires only a weak assumption that the velocity is sampled from an autonomous system, ddHodge can be broadly applied to various types of velocity data, including chromatin velocity data, protein velocity data, time-resolved measurements of cellular positions, and even simulation data from theoretical models of high-dimensional dynamical systems. Notably, as evidenced by the constructed interpretations of the high-dimensional dynamic indicators in both the simulations with the Lorenz system and the application to the RNA velocity data, the approximated velocity field on the data manifold does not necessarily need to precisely capture the true underlying dynamics.

Many methods have been proposed for scRNA-seq dynamics analysis, each with different interests and goals⁴². Methods such as DeepVelo⁴³ and GraphVelo⁴⁴ primarily focus on improving the estimation or embedding of RNA velocity, with the explicit vector-field reconstruction falling within the scope of Dynamo and ddHodge. By contrast, Graph-Dynamo¹⁷ takes a different conceptual approach, modeling stochastic cellular-state dynamics governed by a Fokker–Planck equation, focusing on transition probabilities among metastable states and exit times. ddHodge performs component-wise and precise reconstruction of vector fields—potential, divergence, and curl–on manifold via Hodge decomposition in a DEC–like, structure-preserving manner. It further provides principled cross-sectional analysis in tangent space (via Schur decomposition) while ensuring geometry-preserving dimensional reduction.

Reconstructing a high-dimensional Jacobian that encodes all gene interactions is fundamentally impossible after dimensionality reduction. A similar limitation arises when the dynamics are intrinsically confined to a low-dimensional data manifold, such as the disk-like structures in the Lorenz and feedback-loop examples, because velocity components orthogonal to this low-dimensional structure are never observed. In these situations, just as after dimensionality reduction, the full ambient-space Jacobian cannot be identified by any method—including ddHodge or Dynamo—unless additional assumptions or biases are imposed on the unobserved directions. Given this intrinsic constraint, the meaningful object that remains accessible is the Jacobian restricted to the tangent space of the manifold. The Schur vectors of the tangent-space Jacobian can be interpreted as composite axes, analogous to PCA loading vectors. As illustrated in Fig. 5, this enables identification and interpretation of genes that drive fate transitions, including those exhibiting rapid expression changes.

ddHodge reconstructs the fundamental components of vector fields, including the potential, divergence, curl, and harmonic by overcoming several difficulties associated with analyses performed in high-dimensional spaces. For example, in the DEC method, these difficulties include constructing the mesh, evaluating areas and volumes, which are required to determine the densities and constructing discrete Hodge stars^21,22. With our approach, we could accurately estimate the divergence by applying previously intractable numerical analysis methods to higher-dimensional spaces without considering boundary conditions. Additionally, we developed a method for estimating the curl in spaces of three or more dimensions. To address concerns regarding the reliability of the results due to noise and limited sample sizes—challenges inherent in data analysis—we incorporated a dedicated smoothing strategy in ddHodge. However, due to computational constraints, we have not yet produced a strict reconstruction of the curl component with ddHodge; instead, an approximation strategy that recovers the rotational features only within the PC1-2 plane is employed in the current version of ddHodge.

We observed regions that exhibited higher curl values in the mouse gastrulation data, specifically in areas associated with hematopoietic processes. These regions may represent dynamics like strong perturbation components orthogonal to the gradient direction, rather than gentle periodic/cyclical dynamics. In this paper, we did not focus on the interpretation of strong curl components. However, in the future, we need to pursue a deeper biological interpretation, as well as a complete reconstruction of the curl components. Further interpretation of divergence-free periodic components may be feasible by developing efficient methods for computing curl by Hodge duals in high-dimensional spaces and establishing appropriate interpretative frameworks, such as optimal cycles⁴⁵, to obtain a succinct representation of the harmonic components in higher-dimensional spaces.

Owing to its scalability to high-dimensional data without the need for mesh construction, this method has potential as a tool for supporting data analysis involving high-dimensional complex dynamic systems. The remaining fundamental challenges are interpretating the recovered dynamic indicators in the context of existing knowledge and experimentally validating the proposed method.

Methods

ddHodge workflow

This section outlines the ddHodge workflow. Detailed explanations and illustrations are provided in the Supplemental Notes. We assume that the input data consist of $d$-dimensional points ${{\mathcal{X}}}=\{{{{\boldsymbol{x}}}}_{1},{{{\boldsymbol{x}}}}_{2},\ldots,{{{\boldsymbol{x}}}}_{N}\}$ and their corresponding velocities ${{\mathcal{V}}}=\{{{{\boldsymbol{v}}}}_{1},{{{\boldsymbol{v}}}}_{2},\ldots,{{{\boldsymbol{v}}}}_{N}\}$. Additionally, a $k$-nearest neighbors (k-NN) graph ${{\mathcal{G}}}=\left(V,E\right)$ is constructed from ${{\mathcal{X}}}$.

Approximation of a low-dimensional data manifold

First, PCA is performed using the $k$-nearest neighbors of each point ${{{\boldsymbol{x}}}}_{i}$ to obtain the orthonormal basis $\{{{{\boldsymbol{o}}}}_{1}^{i},{{{\boldsymbol{o}}}}_{2}^{i},\ldots,{{{\boldsymbol{o}}}}_{r}^{i}\}$ of the tangent vector space ${T}_{i}{{\mathcal{M}}}$, where $r\le k$. Since the tangent spaces of neighboring vertices generally have different subspaces, we construct a discrete connection Laplacian ${\Delta }_{s}$ using sheaves on graphs to align the local coordinates within the tangent spaces. The alignment is performed by minimizing the following loss function $L$ over the set of tangent vectors ${{{\boldsymbol{z}}}}_{i}\in {T}_{i}{{\mathcal{M}}}$:

$$L\left({{{\boldsymbol{z}}}}_{1},{{{\boldsymbol{z}}}}_{2},\ldots,{{{\boldsymbol{z}}}}_{N}\right)={\sum }_{\left(i,j\right)\in E}{||}{\widetilde{O}}_{i}{{{\boldsymbol{z}}}}_{i}-{\widetilde{O}}_{j}{{{\boldsymbol{z}}}}_{j}{\parallel }^{2}={{{\boldsymbol{z}}}}^{{{\top }}}{\Delta }_{s}{{\boldsymbol{z}}}$$

(1)

where ${{\boldsymbol{z}}}$ is the vector formed by stacking ${{{\boldsymbol{z}}}}_{1},{{{\boldsymbol{z}}}}_{2},\ldots,{{{\boldsymbol{z}}}}_{N}$ vertically. The matrices ${\widetilde{O}}_{i}$ and ${\widetilde{O}}_{j}$ are obtained from the singular value decomposition ${O}_{i}^{\top }{O}_{j}=U\varSigma {V}^{{\top }}$, where ${O}_{i}=\left[{{{\boldsymbol{o}}}}_{1}^{i}{{{\boldsymbol{o}}}}_{2}^{i}\cdots {{{\boldsymbol{o}}}}_{r}^{i}\right]$, and ${\widetilde{O}}_{i}={U}^{{\top }}$, ${\widetilde{O}}_{j}={V}^{{{\top }}}$.

The eigenvector associated with the minimum eigenvalue of ${\Delta }_{s}$ is the minimizer of $L$.

Potential estimation

The potential ${{\boldsymbol{u}}}\in {{\mathbb{R}}}^{N}$ corresponding to the gradient component of the vector field is estimated using the gradient operator ${\delta }_{0}={{\rm{grad}}}$, which is an $M\times N$ matrix, and the edge weights ${{\boldsymbol{w}}}\in {{\mathbb{R}}}^{M}$ of graph ${{\mathcal{G}}}$. The estimation is performed via the least squares method as follows:

$${\min }_{{{\boldsymbol{u}}}}\parallel {\delta }_{0}{{\boldsymbol{u}}}-{{\boldsymbol{w}}}{\parallel }^{2}$$

(2)

where $N$ and $M$ represent the number of vertices and edges in ${{\mathcal{G}}}$, respectively. In this study, the edge weights ${w}_{\left(i,j\right)}$ are defined as:

$${w}_{\left(i,j\right)}=\frac{1}{2}{\left({{{\boldsymbol{v}}}}_{i}+{{{\boldsymbol{v}}}}_{j}\right)}^{{{\top }}}\left({{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}\right)$$

(3)

In this definition, the integral of the flux is approximated along the edge direction and is equivalent to fitting a quadratic function to the cross-section of potential function $U$ (see Supplementary Notes for more details).

Hessian matrix estimation

For a point ${{{\boldsymbol{x}}}}_{i}$, let ${{\boldsymbol{\eta }}}={{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}$ be the vector indicating the direction from this point to a neighboring point ${{{\boldsymbol{x}}}}_{j}$. The second-order directional differential coefficient of the fitted quadratic function is given by:

$${U}_{\eta \eta }=\frac{{\left({{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}\right)}^{{{\top }}}\left({{{\boldsymbol{v}}}}_{i}-{{{\boldsymbol{v}}}}_{j}\right)}{\parallel {{{\boldsymbol{x}}}}_{j}-{{{\boldsymbol{x}}}}_{i}{\parallel }^{2}}$$

(4)

The Hessian matrix $H$ in the original coordinate system can be obtained via a quadratic form. For example, in two dimensions of x-y plane, the relationship is expressed as:

$${U}_{\eta \eta }={{{\boldsymbol{\eta }}}}^{{{\top }}}H{{\boldsymbol{\eta }}} =\left({\eta }_{1}{\eta }_{2}\right)\left(\begin{array}{cc}{U}_{{xx}} & {U}_{{xy}}\\ {U}_{{xy}} & {U}_{{yy}}\end{array}\right)\left(\begin{array}{c}{\eta }_{1}\\ {\eta }_{2}\end{array}\right)\\ ={\eta }_{1}^{2}{U}_{{xx}}+2{\eta }_{1}{\eta }_{2}{U}_{{xy}}+{\eta }_{2}^{2}{U}_{{yy}}$$

(5)

By gathering the known second-order differential coefficients such as ${U}_{\eta \eta }$ in independent directions ${\bf{\varphi}},{\boldsymbol{\psi}},\ldots$, the linear equation can be constructed to estimate the elements of $H$.

$$\left(\begin{array}{ccc}{\varphi }_{1}^{2} & 2{\varphi }_{1}{\varphi }_{2} & {\varphi }_{2}^{2}\\ {\psi }_{1}^{2} & 2{\psi }_{1}{\psi }_{2} & {\psi }_{2}^{2}\\ {\eta }_{1}^{2} & 2{\eta }_{1}{\eta }_{2} & {\eta }_{2}^{2}\end{array}\right)\left(\begin{array}{c}{U}_{{xx}}\\ {U}_{{xy}}\\ {U}_{{yy}}\end{array}\right)=\left(\begin{array}{c}{U}_{\varphi \varphi }\\ {U}_{\psi \psi }\\ {U}_{\eta \eta }\end{array}\right)$$

(6)

We denote the above equation as $C{{\boldsymbol{\beta }}}={{\boldsymbol{y}}}$. The divergence is obtained from $-{{\rm{tr}}}H$ after estimating $H$. In practice, the problem is better solved as an optimization problem ${\min }_{{{\boldsymbol{\beta }}}}\parallel C{{\boldsymbol{\beta }}}-{{\boldsymbol{y}}}{\parallel }^{2}$ to mitigate issues in data analysis, such as those related to noise and the limited sample size. Additionally, since the number of Hessian matrix elements (number of estimated parameters) increases on the order of ${d}^{2}$, the Hessian matrix is estimated in the reduced-dimensional space.

Smoothing over graph

To enable the regularization of the estimation process between neighboring vertices, we merged the individual Hessian matrix estimation problems as a single linear equation:

$$C{{\boldsymbol{\beta }}}=\left(\begin{array}{ccc}{C}_{1} & & \\ & \ddots & \\ & & {C}_{N}\end{array}\right)\left(\begin{array}{c}{{{\boldsymbol{\beta }}}}_{1}\\ \vdots \\ {{{\boldsymbol{\beta }}}}_{N}\end{array}\right)$$

(7)

This allows us to introduce a penalty (Dirichlet energy on the graph) to the squares of the differences ${{{\boldsymbol{\beta }}}}_{i}-{{{\boldsymbol{\beta }}}}_{j}$ between adjacent vertices as:

$$P\left({{\boldsymbol{\beta }}}\right)={\left({D}_{0}{{\boldsymbol{\beta }}}\right)}^{{{\top }}}\left({D}_{0}{{\boldsymbol{\beta }}}\right)={{{\boldsymbol{\beta }}}}^{{\top }}{L}_{0}{{\boldsymbol{\beta }}}$$

(8)

Here, assume that the $p$ parameters are arranged in a certain order as ${{{\boldsymbol{\beta }}}}_{i}=\left({U}_{{xx}}^{i},{U}_{{xy}}^{i},{U}_{{yy}}^{i}\right)$, such that the differences between each parameter can be individually evaluated using the extended gradient matrix ${D}_{0}={\delta }_{0}\otimes {I}_{p}$ with the Kronecker product and ${L}_{0}={D}_{0}^{{{\top }}}{D}_{0}$.

The optimization problem with this regularization term is then solved as:

$${\min }_{{{\boldsymbol{\beta }}}}\parallel {{\boldsymbol{y}}}-C{{\boldsymbol{\beta }}}{\parallel }^{2}+\lambda P\left({{\boldsymbol{\beta }}}\right)$$

(9)

with some small constant $\lambda > 0$.

Given the large ${Np}\times {Np}$ size of the sparse matrix ${L}_{0}$, efficient solvers such as the Krylov subspace method are recommended.

Curl estimation

For differential forms in the Euclidean space ${{\mathbb{R}}}^{2}$, the relationship between ${{\rm{div}}}$ and ${{\rm{curl}}}$ is given by:

$${{\rm{div}}}\left(\star \xi \right)=\left(\star d\star \right)\left(\star \xi \right)=\left(\star d\right)\left(\star \star \right)\xi=-{{\rm{curl}}}\xi$$

(10)

where $d$ is the exterior derivative and $\star$ is the Hodge star operator. Therefore, the curl can be approximated by the divergence of the dual vector field $\star \xi$, which is obtained by rotating the velocity by $\pi /2$ in the PC1-2 plane.

Jacobian matrix estimation

According to the theory of Hodge decomposition, a vector field can be expressed as the sum of a gradient part $-\nabla U$ and a divergence-free residual part $r$:

Here, let $D$ be the operator that gives the Jacobian matrix of $f$. Since $D$ is linear and $D\nabla U$ gives the Hessian matrix $H$ of $U$, $D\left(f\right)$ has the following decomposition:

$$D\left(f\right)=D\left(-\nabla U+r\right)=-H+D\left(r\right)$$

(12)

If $r$ consists of only rotational components (i.e., there are no holes in the space), $D\left(r\right)$ can be expressed using the matrix $Q$, representing a $\pi /2$ rotation in the PC1-2 plane, as $D\left(r\right)=\frac{\theta }{2}Q$. Here, $\theta$ is the magnitude of the rotation, which is calculated from the previously derived ${{\rm{curl}}}$.

Software implementation

ddHodge was implemented in Julia, a high-performance scientific programming language. The computation time depends on the number of analyzed cells and the dimensionality of the components after the reduction process is performed. The most computationally intensive parts are computing eigenvalues and solving least-squares problems with large sparse matrices. To optimize these calculations, we employ the Krylov subspace method (KrylovKit.jl), which efficiently handles sparse matrices by leveraging repeated matrix‒vector multiplication operations. With GPU acceleration (we used NVIDIA RTX A2000 12 GB), the computation takes approximately 35 seconds for 46,124 cells with the parameter: k = 12 of k-NN graph and r = 4 of reduced dimension, and approximately 2 minutes for another dataset with 98,316 cells with the same parameter (Figs. 4–5).

Dynamical system setup

Gradient system in the 2D plane

To create a gradient system in a 2D plane, we use the Gaussian function as follows:

$$g\left(x,y; {\mu }_{x},{\mu }_{y}\right)={e}^{-\frac{1}{2}\left\{{\left(x-{\mu }_{x}\right)}^{2}+{\left(y-{\mu }_{y}\right)}^{2}\right\}}$$

(13)

We constructed a potential function with peaks and valleys as follows:

$$U\left(x,y\right)=g\left(x,y; {\mu }_{x}=-0.5,{\mu }_{y}=-0.5\right)-g\left(x,y; {\mu }_{x}=0.5,{\mu }_{y}=0.5\right)$$

(14)

Using this potential function, we simulate the differential equation ${{\boldsymbol{F}}}\left(x,y\right)=-\nabla U\left(x,y\right)$. The simulation is performed by adding noise (Gaussian noise with a standard deviation of 0.2) to the initial positions at the center of the potential peak and running random simulations for 0 to 4 s using the Runge–Kutta method. We sampled $N=100$ points and their velocities.

Toggle-switch model

The toggle-switch model is defined by the following set of differential equations:

$$\begin{array}{cc} & \frac{d{x}_{1}}{{dt}}={a}_{1}\frac{{x}_{1}^{n}}{{K}_{1}^{n}+{x}_{1}^{n}}+{b}_{1}\frac{{K}_{1}^{n}}{{K}_{1}^{n}+{x}_{2}^{n}}-{x}_{1}\\ & \frac{d{x}_{2}}{{dt}}={a}_{2}\frac{{x}_{2}^{n}}{{K}_{2}^{n}+{x}_{2}^{n}}+{b}_{2}\frac{{K}_{2}^{n}}{{K}_{2}^{n}+{x}_{1}^{n}}-{x}_{2}\\ & \end{array}$$

(15)

The parameters are set to ${K}_{1}={K}_{2}=1/2,n=4$, and ${a}_{1}={a}_{2}={b}_{1}={b}_{2}=1$. The same simulation result from Dynamo’s paper was reproduced by following the instructions at the following link: https://github.com/aristoteleo/dynamo_submission.

Lorenz system

The Lorenz system is defined by the following set of differential equations:

$$\begin{array}{cc} & \frac{{{\rm{d}}}x}{{{\rm{d}}}t}=\sigma \left(y-x\right)\\ & \frac{{{\rm{d}}}y}{{{\rm{d}}}t}=x\left(\rho -z\right)-y\\ & \frac{{{\rm{d}}}z}{{{\rm{d}}}t}={xy}-\beta z\end{array}$$

(16)

The parameters are set to $\sigma=10$, $\rho=28$, and $\beta=8/3$. The simulation is performed by initializing $\left(x,y,z\right)=\left(10,10,10\right)$ with added Gaussian noise $N\left(3,0\right)$ and sampling points after a random time $T$, which was generated from a uniform distribution $T\sim U\left(0.5,5\right)$.

Noise settings in RNA velocity simulation

We consider RNA-velocity data with scaled degradation rate $\widetilde{\gamma }=\gamma /\beta=1.0$. Such a condition emulates a scenario in which the steady states lie on the diagonal ${u}_{s}={s}_{s}$ in the spliced–unspliced plane (i.e., the ratio of spliced to unspliced counts is 1:1). We generated a proxy for the unspliced count $u=\widetilde{\gamma }s+v$ because ${ds}/{dt}\propto v=u-\widetilde{\gamma }s$, where $v$ is the (scaled) velocity. To set the signal-to-noise ratio (SNR), we fixed the signal scale at ${\sigma }_{s}=1.0$ so that the sampled region $[0,4]\times [0,4]$ in the toggle-switch simulation is nearly covered by the 2D Gaussian ${{\mathscr{N}}}((2,2),{\sigma }_{s}^{2}I)$. Then, SNR is defined by ${\sigma }_{s}/{\sigma }_{n}$ when the noise is drawn from ${{\mathscr{N}}}((0,0),{\sigma }_{s}^{2}I)$.

Proportions of gradient and cyclic component on graphs

The edge weights ${{\boldsymbol{w}}}$ are determined by velocity (see the Potential estimation section in Methods). Then, we calculate the vector ${{\boldsymbol{u}}}$ representing potential values at vertices, which allows us to determine the gradient component at edges as $-{\delta }_{0}{{\boldsymbol{u}}}$. We set the magnitude of the gradient component as $p$. Next, the cyclic (divergence-free) component ${{\boldsymbol{r}}}={{\boldsymbol{w}}}+{\delta }_{0}{{\boldsymbol{u}}}$ obtained by subtracting the gradient $-{\delta }_{0}{{\boldsymbol{u}}}$ from weight ${{\boldsymbol{w}}}$. The magnitude of the cyclic component is similarly calculated as content $q$, and the proportions of $p$ and $q$ represent the proportion of the gradient and cyclic components on the graph. These quantities are summarized as follows:

$$p=\parallel {\delta }_{0}{{\boldsymbol{u}}}{\parallel }^{2},q=\parallel {{\boldsymbol{w}}}+{\delta }_{0}{{\boldsymbol{u}}}{\parallel }^{2}.$$

(17)

The gradient and divergence-free components are orthogonal, and they satisfy ${{\langle }}{\delta }_{0}{{\boldsymbol{u}}},{{\boldsymbol{r}}}{{\rangle }}=0$. Therefore, their normalized contributions in ${{\boldsymbol{w}}}$ sum to one: $(p+q)/\parallel {{\boldsymbol{w}}}{\parallel }^{2}=1$.

scRNA-seq data analysis

Velocities were estimated using scvelo (version 0.3.2). For cell-type annotation, the scRNA-seq data of mouse embryos³⁴ were used as training data for Celltypist (version 1.6.3), and the cell-type labels were given for the VASA-seq data. The analysis using ddHodge was performed with a $k$-NN graph where $k=12$ and the reduced dimensionality was $r=4$. The detailed methods for reproducing the data analysis are available on GitHub.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All datasets analyzed in this study are publicly available. For VASA-seq data, spliced and unspliced read counts were obtained from GEO (accession: GSE176588). The processed loom file of FUCCI data was obtained from GitHub (CellProfiling/SingleCellProteogenomics [https://github.com/CellProfiling/SingleCellProteogenomics]). For the regenerating muscle tissue dataset, the raw sequence reads were obtained from NCBI (accession: SRP241205). The processed data from the VASA-seq and muscle regeneration dataset, along with various dynamic indicators calculated by ddHodge, are deposited in Figshare (https://doi.org/10.6084/m9.figshare.28723652). Source data for all figures are available in the Figshare repository and in the GitHub repository described below. No access restrictions apply.

Code availability

ddHodge⁴⁶ is implemented as a Julia package and is available at https://github.com/kazumits/ddHodge.jl. The Jupyter notebooks used to reproduce the figures and the related processed data in this manuscript are also available at https://github.com/kazumits/ddHodge_figures. These codes were archived in Zenodo under the https://doi.org/10.5281/zenodo.17660331 and https://doi.org/10.5281/zenodo.17660342.

References

Furusawa, C. & Kaneko, K. A dynamical-systems view of stem cell biology. Science 338, 215–217 (2012).
Article ADS CAS PubMed Google Scholar
Fang, X., Kruse, K., Lu, T. & Wang, J. Nonequilibrium physics in biology. Rev. Mod. Phys. 91, 045004 (2019).
Article ADS MathSciNet CAS Google Scholar
Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).
Article CAS PubMed Google Scholar
Hashimshony, T. et al. CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq. Genome Biol. https://doi.org/10.1186/s13059-016-0938-8 (2016)
Sasagawa, Y. et al. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads. Genome Biol. 19, 29 (2018).
Article PubMed PubMed Central Google Scholar
Salmen, F. et al. High-throughput total RNA sequencing in single cells using VASA-seq. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01361-8 (2022).
Article PubMed PubMed Central Google Scholar
Trapnell, C. et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat. Biotechnol. https://doi.org/10.1038/nbt.2859 (2014).
Article PubMed PubMed Central Google Scholar
Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btv325 (2015)
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 477 (2018).
Article Google Scholar
La Manno, G. et al. RNA velocity of single cells. Nature. https://doi.org/10.1038/s41586-018-0414-6 (2018)
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Article CAS PubMed Google Scholar
Qiu, X. et al. Mapping transcriptomic vector fields of single cells. Cell. S0092867421015774. https://doi.org/10.1016/j.cell.2021.12.045 (2022)
Jiang, X., Lim, L. H., Yao, Y. & Ye, Y. Statistical ranking and combinatorial Hodge theory. Math. Program. https://doi.org/10.1007/s10107-010-0419-x (2011)
Lim, L.-H. Hodge Laplacians on Graphs. SIAM Rev. 62, 685–715 (2020).
Article MathSciNet Google Scholar
Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19, 159–170 (2022).
Article CAS PubMed PubMed Central Google Scholar
Marot-Lassauzaie, V. et al. Towards reliable quantification of cell state velocities. PLoS Comput Biol. 18, e1010031 (2022).
Article CAS PubMed PubMed Central Google Scholar
Zhang, Y. et al. Graph-Dynamo: Learning stochastic cellular state transition dynamics from single cell data. 2023.09.24.559170 Preprint at. https://doi.org/10.1101/2023.09.24.559170 (2023).
McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).
Article Google Scholar
Hu, S. et al. Epithelial-mesenchymal transition couples with cell cycle arrest at various stages. Preprint at. https://doi.org/10.1101/2025.02.24.639880 (2025).
Singer, A. & Wu, H.-T. Vector diffusion maps and the connection Laplacian. Commun. Pure Appl. Math. 65, 1067–1144 (2012).
Article MathSciNet Google Scholar
Hirani, A. N. Discrete Exterior Calculus. https://doi.org/10.7907/ZHY8-V329 (California Institute of Technology, 2003).
Desbrun, M., Hirani, A. N., Leok, M. & Marsden, J. E. Discrete Exterior Calculus. Preprint at. https://doi.org/10.48550/arXiv.math/0508341 (2005).
Ma, J., Zhao, J., Tian, J., Bai, X. & Tu, Z. Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit. 46, 3519–3532 (2013).
Article ADS Google Scholar
Crane, K., Desbrun, M. & Schröder, P. Trivial connections on discrete surfaces. Comput. Graph. Forum 29, 1525–1533 (2010).
Article Google Scholar
Pfau, D., Higgins, I., Botev, A. & Racanière, S. Disentangling by subspace diffusion. in Advances in Neural Information Processing Systems, Vol. 33, 17403–17415 (Curran Associates, Inc., 2020).
Hansen, J. & Ghrist, R. Toward a spectral theory of cellular sheaves. J. Appl. Comput. Topol. 3, 315–358 (2019).
Article MathSciNet Google Scholar
Hansen, J. & Ghrist, R. Opinion dynamics on discourse sheaves. SIAM J. Appl. Math. 81, 2033–2060 (2021).
Article MathSciNet Google Scholar
Bodnar, C., Di Giovanni, F., Chamberlain, B. P., Liò, P. & Bronstein, M. Neural sheaf diffusion: a topological perspective on heterophily and oversmoothing in GNNs. In Advances in Neural Information Processing Systems, Vol. 35, 18527–18541 (Curran Associates, Inc., 2022).
Berwick-Evans, D., Hirani, A. N. & Schubel, M. D. Discrete Vector Bundles with Connection and the Bianchi Identity. Preprint at. https://doi.org/10.48550/arXiv.2104.10277 (2023).
Reuter, B., Fackeldey, K. & Weber, M. Generalized Markov modeling of nonreversible molecular kinetics. J. Chem. Phys. 150, 174103 (2019).
Article ADS PubMed Google Scholar
Gorin, G., Fang, M., Chari, T. & Pachter, L. RNA velocity unraveled. PLoS Comput Biol. 18, e1010492 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Mahdessian, D. et al. Spatiotemporal dissection of the cell cycle with single-cell proteogenomics. Nature 590, 649–654 (2021).
Article ADS CAS PubMed Google Scholar
Ohyama, T., Mohamed, O. A., Taketo, M. M., Dufort, D. & Groves, A. K. Wnt signals mediate a fate decision between otic placode and epidermis. Development 133, 865–875 (2006).
Article CAS PubMed Google Scholar
Qiu, C. et al. Systematic reconstruction of cellular trajectories across mouse embryogenesis. Nat. Genet 54, 328–341 (2022).
Article CAS PubMed PubMed Central Google Scholar
Qiu, C. et al. A single-cell time-lapse of mouse prenatal development from gastrula to birth. Nature 626, 1084–1093 (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Ericson, J. et al. Pax6 Controls Progenitor Cell Identity and Neuronal Fate in Response to Graded Shh Signaling. Cell 90, 169–180 (1997).
Article CAS PubMed Google Scholar
Thompson, H., Andrews, W., Parnavelas, J. G. & Erskine, L. Robo2 is required for Slit-mediated intraretinal axon guidance. Dev. Biol. 335, 418–426 (2009).
Article CAS PubMed PubMed Central Google Scholar
Henrique, D., Abranches, E., Verrier, L. & Storey, K. G. Neuromesodermal progenitors and the making of the spinal cord. Development 142, 2864–2875 (2015).
Article CAS PubMed Google Scholar
Gouti, M. et al. A Gene regulatory network balances neural and mesoderm specification during vertebrate trunk development. Dev. Cell 41, 243–261.e7 (2017).
Article CAS PubMed PubMed Central Google Scholar
Downs, K. M. The mouse allantois: new insights at the embryonic–extraembryonic interface. Philos. Trans. R. Soc. B 377, 20210251 (2022).
Article Google Scholar
De Micheli, A. J. et al. Single-cell analysis of the muscle stem cell hierarchy identifies heterotypic communication signals involved in skeletal muscle regeneration. Cell Rep. 30, 3583–3595.e5 (2020).
Article PubMed PubMed Central Google Scholar
Zhang, Z., Sun, Y., Peng, Q., Li, T. & Zhou, P. Integrating dynamical systems modeling with spatiotemporal scRNA-seq data analysis. Entropy 27, 453 (2025).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhanlin, Chen., William, C. K., Aheyon, H., Mark, G. & Jing, Z. DeepVelo: Single-cell transcriptomic deep velocity field learning with neural ordinary differential equations. Sci. Adv. 8, eabq3745 (2022).
Chen, Y. et al. GraphVelo allows for accurate inference of multimodal velocities and molecular mechanisms for single cells. Nat. Commun. 16, 7831 (2025).
Dey, T. K., Hirani, A. N. & Krishnamoorthy, B. Optimal Homologous Cycles, Total Unimodularity, and Linear Programming. SIAM J. Comput. 40, 1026–1044 (2011).
Article MathSciNet Google Scholar
Maehara, K. Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge. Zenodo https://doi.org/10.5281/zenodo.17660331 (2025).

Download references

Acknowledgements

We thank Profs. Yusuke Imoto and Yasuaki Hiraoka at Kyoto University for insightful discussions, and Advanced Research Initiative, Research Promotion Unit, Medical Institute of Bioregulation, Kyushu University, for their support. This work was supported by JST PRESTO JPMJPR2026, JST CREST JPMJCR23N3, JPMJCR24Q1, JSPS KAKENHI JP22H04696, JP23H04288 and JP25H01484 to K.M., AMED BINDS JP22ama121017j0001, JSPS KAKENHI JP24H02323, JP23H00372, JP22H04676 and JP22K19275 to Y.O. This work was also supported in part by the MEXT Promotion of Development of a Joint Usage / Research System Project: Pan-Omics DDRIC, MRCI for High Depth Omics, CURE:JPMXP1323015486 for MIB and RIIT and Autonomous Medical Research Center in Kyushu University.

Author information

Authors and Affiliations

Department of Multi-Omics, Graduate School of Medical Sciences, Kyushu University, Fukuoka, Japan
Kazumitsu Maehara
Division of Transcriptomics, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
Kazumitsu Maehara & Yasuyuki Ohkawa

Authors

Kazumitsu Maehara
Yasuyuki Ohkawa

Contributions

K. M. and Y.O. conceived and interpretation of results, and K.M. built the theoretical framework of ddHodge and developed software, K.M. and Y.O. wrote the paper.

Corresponding author

Correspondence to Yasuyuki Ohkawa.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Jianhua Xing, who co-reviewed with Yan Zhang, Xiaojie Qiu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Maehara, K., Ohkawa, Y. Geometry-preserving vector field reconstruction of high-dimensional cell-state dynamics using ddHodge. Nat Commun 16, 11342 (2025). https://doi.org/10.1038/s41467-025-67782-6

Download citation

Received: 10 April 2025
Accepted: 09 December 2025
Published: 29 December 2025
Version of record: 29 December 2025
DOI: https://doi.org/10.1038/s41467-025-67782-6