I Thought GPUs Were the Future. Then Fortran Happened.
It started as a joke.
We were benchmarking our matrix solver — the one optimized with CUDA, cuBLAS, and enough parallelism to make your GPU whine audibly.
A colleague, half-joking, said:
“You know, my old professor had this Fortran code that did the same thing. Probably slower, but might be fun to compare.”
Fun. Right.
I ran it anyway — mostly to laugh at it.
Except… it wasn’t funny.
The Fortran version ran 1.4× faster than our hand-tuned GPU kernel on an NVIDIA A100.
And honestly? I didn’t see it coming.
The Ancient Loop That Refused to Die
Here’s the kind of code we’re talking about — straight out of a dusty 1978 research archive:
      SUBROUTINE MATMUL(A, B, C, N)
      REAL*8 A(N,N), B(N,N), C(N,N)
      DO 20 I = 1, N
      DO 10 J = 1, N
      C(I,J) = 0.0D0
      DO 10 K = 1, N
   10 C(I,J) = C(I,J) + A(I,K)*B(K,J)
   20 CONTINUE
      RETURN
      END
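If you want to prod the old routine yourself, here is a minimal sketch of a driver (not our actual harness; the program name BENCH, the N = 1024 size, and the single-call timing are just placeholders). It fills two matrices with RANDOM_NUMBER, times one call with SYSTEM_CLOCK, and prints wall-clock seconds; the EXTERNAL line keeps the call from resolving to the modern MATMUL intrinsic.

      PROGRAM BENCH
C     Hypothetical driver, not the original harness: fill two
C     matrices, time one call to the routine above, print seconds.
      INTEGER N
      PARAMETER (N = 1024)
      REAL*8 A(N,N), B(N,N), C(N,N)
      INTEGER T0, T1, RATE
C     EXTERNAL stops the call binding to the intrinsic MATMUL
      EXTERNAL MATMUL
      CALL RANDOM_NUMBER(A)
      CALL RANDOM_NUMBER(B)
      CALL SYSTEM_CLOCK(T0, RATE)
      CALL MATMUL(A, B, C, N)
      CALL SYSTEM_CLOCK(T1)
      PRINT *, 'Seconds: ', DBLE(T1 - T0) / DBLE(RATE)
      END

Something like gfortran -O3 matmul.f bench.f builds the pair into one executable; gfortran still accepts the shared DO termination label, though it may warn that it is a legacy feature.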