Building a Zero-Dependency secp256k1 CUDA Engine from Scratch (2.5B ops/SEC)

github.com

2 points by shrecshrec 3 months ago · 3 comments

shrecshrecOP 3 months ago

I implemented a full secp256k1 engine from scratch in C++ and CUDA with zero external dependencies (no GMP, no OpenSSL).

The goal was to explore performance limits of:

Jacobian mixed-add

Batch inversion using Montgomery’s trick

Large-scale scalar stepping

GPU memory coalescing strategies
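For readers who have not met Montgomery's trick: it replaces n field inversions with a single inversion plus roughly 3n multiplications, which is why it pairs well with Jacobian mixed-add when many points need converting back to affine. The following is only a minimal CPU sketch of the idea, not the engine's code. It assumes a toy 32-bit prime (2^32 - 5) in place of the secp256k1 field prime, and the names `mul_mod`, `inv_mod` and `batch_invert` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy prime field for illustration only (NOT the secp256k1 field prime).
constexpr std::uint64_t P = 4294967291ULL;  // 2^32 - 5, prime

// Operands are < 2^32, so the product fits in 64 bits before reduction.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b) { return (a * b) % P; }

// Single inversion via Fermat's little theorem: a^(P-2) mod P.
std::uint64_t inv_mod(std::uint64_t a) {
    assert(a % P != 0);  // zero has no inverse
    std::uint64_t result = 1, base = a % P, exp = P - 2;
    while (exp > 0) {
        if (exp & 1) result = mul_mod(result, base);
        base = mul_mod(base, base);
        exp >>= 1;
    }
    return result;
}

// Replaces every element of v (all assumed non-zero, already reduced mod P)
// with its modular inverse, using exactly one call to inv_mod.
void batch_invert(std::vector<std::uint64_t>& v) {
    const std::size_t n = v.size();
    if (n == 0) return;

    // prefix[i] = v[0] * ... * v[i-1]  (empty product = 1)
    std::vector<std::uint64_t> prefix(n);
    std::uint64_t acc = 1;
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i] = acc;
        acc = mul_mod(acc, v[i]);
    }

    // inv = (v[0] * ... * v[n-1])^-1, the only real inversion.
    std::uint64_t inv = inv_mod(acc);

    // Walk backwards, peeling one factor off the running inverse each step.
    for (std::size_t i = n; i-- > 0;) {
        const std::uint64_t original = v[i];
        v[i] = mul_mod(inv, prefix[i]);  // (v0..vi)^-1 * (v0..v(i-1)) = vi^-1
        inv = mul_mod(inv, original);    // now (v0..v(i-1))^-1
    }
}
```

The sketch allocates a prefix vector for clarity. A GPU version would more likely keep the running products in a fixed, preallocated buffer per thread or per batch.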

On an RTX 5060 I’m getting ~2.5B mixed-add operations/sec.

Key design decisions:

Little-endian limb layout for hardware efficiency

Big-endian only for visualization

Deterministic memory layout

No dynamic allocation in hot paths
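To make the limb-layout decision concrete, here is a small sketch under stated assumptions: a 256-bit value stored as four 64-bit limbs with `limb[0]` least significant, so carries propagate in increasing index order, and a big-endian hex string produced only for display. The type `U256` and the functions `add_carry` and `to_hex_be` are hypothetical names for illustration. The actual engine may use a different limb width or layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical 256-bit integer: limb[0] holds the least significant 64 bits.
// Fixed-size storage, so arithmetic on it needs no dynamic allocation.
struct U256 {
    std::array<std::uint64_t, 4> limb{};
};

// a += b; returns the carry out of the most significant limb (0 or 1).
std::uint64_t add_carry(U256& a, const U256& b) {
    std::uint64_t carry = 0;
    for (int i = 0; i < 4; ++i) {
        const std::uint64_t s = a.limb[i] + b.limb[i];  // may wrap
        const std::uint64_t c1 = s < a.limb[i];         // wrapped on a + b
        const std::uint64_t t = s + carry;              // may wrap
        const std::uint64_t c2 = t < s;                 // wrapped on + carry
        a.limb[i] = t;
        carry = c1 | c2;  // at most one of the two can be set
    }
    return carry;
}

// Big-endian hex for display only: most significant limb is printed first.
// Uses std::string, so it is meant for debugging, not for a hot path.
std::string to_hex_be(const U256& x) {
    char buf[65];
    for (int i = 0; i < 4; ++i) {
        std::snprintf(buf + 16 * i, 17, "%016llx",
                      static_cast<unsigned long long>(x.limb[3 - i]));
    }
    return std::string(buf, 64);
}
```

With this layout, adding 1 to a value whose lowest limb is all ones carries into `limb[1]`, while the printed string still reads most-significant-first, the way keys are usually written.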

Would love feedback from people working on ECC or GPU math.

shrecshrecOP 3 months ago

I would be glad to hear any suggestions about future improvements and ideas.
