ORNL’s Frontier breaks the exaflop ceiling

top500.org

37 points by davidmr 4 years ago · 26 comments

ckastner 4 years ago

This filled me with nostalgia. I remember how excited I was -- maybe even in awe -- when, in 2008, IBM's Roadrunner broke the PetaFLOPS barrier and was the first computer to do so. That's not even that long ago.

  • saltcured 4 years ago

    Ha, my memory is how tera-scale was the motivating target for all the high-performance research when I was in college. If I stop to think about it, it still seems dissonant to think that a random gamer's GPU has more power than that now.

davidmrOP 4 years ago

At the time of the last list, there were two systems in China that had recently broken this barrier, but their owners have chosen not to submit benchmarks to the Top500.

  • arcanus 4 years ago

    > there were two systems in China that had recently broken this barrier

    Allegedly broken this barrier. There is a reason that science is conducted in the open, in a reproducible and traceable manner. Those systems might not function properly at scale, or might not have run at an exaflop in double precision compute.

    Frontier is certainly the first publicly verified system to achieve Exascale on the internationally accepted standard measurement.

    • muxr 4 years ago

      There is also an article about how a score from one of these Chinese "exaflop" systems was submitted for a different benchmark, and it turns out the machine could only achieve the claimed performance at half precision:

      https://www.tomshardware.com/news/chinese-exascale-supercomp...

    • throwaway4good 4 years ago

      I suppose we can discuss what the source of the souring of scientific cooperation is. But it does appear that China has at least two computers that are faster than its current best-performing entry on the Top500. And that basically invalidates the list.

      • 0des 4 years ago

        China claims a lot of things that would invalidate common knowledge, but that doesn't mean any of it is real, reproducible, or novel. There is a reason why scrutiny exists.

thenoblesunfish 4 years ago

Anyone less familiar with the field should be aware that the benchmark used for this list, HPL (solving one huge dense system of linear equations), is widely known not to be representative of many real use cases in scientific computing. So while it's fun to push this one number higher, one has to ask whether it's really worth the incredible amount of money involved, when it's something of a drag race.
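
For context, the number behind the list comes from HPL, which times the solution of one huge dense linear system and converts the operation count into FLOPS. A minimal, laptop-scale sketch of the same measurement (editor's illustration, assuming NumPy; the 2/3·n³ + 2·n² operation count is the usual HPL convention and the numbers here are purely illustrative):

    import time
    import numpy as np

    n = 4096                                   # HPL runs use n in the millions
    rng = np.random.default_rng(0)
    A = rng.standard_normal((n, n))
    b = rng.standard_normal(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, b)                  # LU factorization + solve, HPL's core operation
    elapsed = time.perf_counter() - start

    flops = (2 / 3) * n**3 + 2 * n**2          # conventional HPL operation count
    print(f"residual {np.linalg.norm(A @ x - b):.2e}, ~{flops / elapsed / 1e9:.1f} GFLOPS")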

  • dragontamer 4 years ago

    ORNL is not an entity that builds supercomputers just for giggles. The Department of Energy has plenty of practical and top-secret (nuclear) programs to fund, and these kinds of computers are fundamental to sustaining the nuclear edge for the US government.

    Among other scientific purposes of course. But the 'quiet part' is that a lot of this comes down to simulated nukes. (Much like how the space program was really a nuke delivery project)

    These computers remain useful for other physics simulations of course: atom to atom interactions, protein folding, weather modeling. So they also serve the scientific community.

  • JonChesterfield 4 years ago

    Fortunately, the hardware can do other calculations as well.

Aardwolf 4 years ago

Apparently the first petaflops computer was Roadrunner in 2008 [1].

So the supercomputer speed went 1000x from 2008-2022. But home computer speeds definitely did not go up that much; maybe around 10x. Does this mean there is more potential for home computers in the future?

Of course the supercomputers are massively parallel, but it's not like they got a 100x larger building, or did they?

[1] https://en.wikipedia.org/wiki/Roadrunner_(supercomputer)

  • dagw 4 years ago

    > So the supercomputer speed went 1000x from 2008-2022. But home computer speeds definitely did not go up that much

    A lot of it is simply scale. This computer has ~8 million cores compared to ~12k full cores and ~100k "processing units" on the Roadrunner.

    Secondly, we have fundamentally changed how we do computation by learning how to utilise GPUs (and GPU-like architectures) better. This alone gives a far greater than 10x boost between 2008 and 2022.
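
    A back-of-the-envelope split of the 1000x, using only the rough core counts quoted above (editor's arithmetic, not exact figures):

      frontier_cores = 8_000_000                 # "~8 million cores" above
      roadrunner_units = 12_000 + 100_000        # "~12k full cores and ~100k processing units"

      parallelism = frontier_cores / roadrunner_units   # ~71x from sheer scale
      per_unit = 1000 / parallelism                     # ~14x left for architecture / GPUs
      print(round(parallelism), round(per_unit))        # -> 71 14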

  • Retric 4 years ago

    Home GPUs have gotten more than a 10x boost from 2008 to 2022, and current supercomputers are much more expensive than those from 2008.

    A 9800 GTX from March 2008 had 432.1 GFLOPS FP32. An RTX 3060 from February 2021 is at a similar price point and delivers 12.74 TFLOPS FP32. An RTX 3090 Ti is 40 TFLOPS FP32, or roughly 100x the performance of the 9800 GTX.
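
    Spelling out those ratios (a quick check using only the FP32 figures quoted above):

      gtx_9800   = 432.1e9    # FP32 FLOPS, March 2008
      rtx_3060   = 12.74e12   # FP32 FLOPS, similar price point
      rtx_3090ti = 40e12      # FP32 FLOPS, top of the range

      print(f"RTX 3060 vs 9800 GTX:    {rtx_3060 / gtx_9800:.0f}x")    # ~29x
      print(f"RTX 3090 Ti vs 9800 GTX: {rtx_3090ti / gtx_9800:.0f}x")  # ~93x, the "100x" above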

    • Aardwolf 4 years ago

      The GPUs aren't helping compile code 100x faster or giving us 100x faster boot times, so would it be fair to say the 1000x speed increase of the exaflop supercomputers is also mainly for specialized workloads like matrix multiplications, and that it wouldn't be 1000x faster than Roadrunner for general-purpose computation? How much faster would it be at SAT solving?

      • freemint 4 years ago

        > how much faster would it be at SAT solving?

        With modern SAT solvers or with historic (single threaded) SAT solvers?

        • Aardwolf 4 years ago

          Now that I think of it, maybe I should have given Argon2 hashing instead of SAT as an example.

          Argon2 hashing is designed to benefit as little as possible from parallel GPU computations.
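
          To make that concrete: Argon2 forces each hash to fill a large, serially dependent memory region, so thousands of GPU threads can't all keep their working sets in fast memory at once. A minimal sketch using the third-party argon2-cffi package (the package choice and parameter values are the editor's assumption, just to show which knobs do the work):

            # pip install argon2-cffi
            from argon2 import PasswordHasher

            # memory_cost (in KiB) is the GPU-hostile knob: every parallel guess
            # needs its own 64 MiB of serially filled memory.
            ph = PasswordHasher(time_cost=3, memory_cost=64 * 1024, parallelism=4)

            digest = ph.hash("correct horse battery staple")
            print(ph.verify(digest, "correct horse battery staple"))  # True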

          • freemint 4 years ago

            Why would you want to do that? What are you even trying to measure?

            • Aardwolf 4 years ago

              A measure of the speed of running general-purpose code: compiling, operating systems, GUIs, rendering websites, running Electron applications, database queries, ... to get an idea of when the 1000x speedup that supercomputers got will show up in daily applications.

      • dragontamer 4 years ago

        > specialized workloads like matrix multiplications

        They're not so specialized in the scope of supercomputers.

        Matrix multiplications underpin pretty much any "simulation of reality": be it a nuclear explosion, weather modeling, finite element analysis (aka simulated car crashes), protein folding, chemical atom-to-atom simulations, and more.
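
        As a toy illustration of that point (editor's sketch, assuming NumPy): one explicit time step of 1-D heat diffusion is literally a matrix-vector product, and real simulation codes are this same pattern scaled up enormously.

          import numpy as np

          n, alpha = 8, 0.1                          # grid points, diffusion coefficient
          # update matrix for u_new[i] = u[i] + alpha * (u[i-1] - 2*u[i] + u[i+1])
          A = (np.eye(n) * (1 - 2 * alpha)
               + np.eye(n, k=1) * alpha
               + np.eye(n, k=-1) * alpha)

          u = np.zeros(n)
          u[n // 2] = 1.0                            # a spike of heat in the middle
          for _ in range(50):                        # each step: one matrix-vector product
              u = A @ u
          print(u.round(3))                          # the spike has diffused outward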

        > how much faster would it be at SAT solving?

        Dumb WalkSAT is embarrassingly parallel and would probably scale to GPUs very easily, actually. I admit that I'm not in the field of SAT, but there seems to be plenty of research into how to get SAT onto GPUs.
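
        A minimal sketch of why it parallelizes so naturally (editor's toy in plain Python; the clause representation and restart scheme are illustrative, not any particular solver's): independent restarts share no state, so each one could run on its own core or GPU block.

          import random

          def walksat(clauses, n_vars, max_flips=10_000, p=0.5, seed=0):
              """Toy WalkSAT over DIMACS-style clauses (lists of +/- variable indices)."""
              rng = random.Random(seed)
              assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]  # 1-indexed

              def sat(lit):
                  return assign[abs(lit)] == (lit > 0)

              def break_count(var):
                  # currently-satisfied clauses that flipping `var` would break
                  before = [any(sat(l) for l in c) for c in clauses]
                  assign[var] = not assign[var]
                  broken = sum(b and not any(sat(l) for l in c)
                               for b, c in zip(before, clauses))
                  assign[var] = not assign[var]
                  return broken

              for _ in range(max_flips):
                  unsat = [c for c in clauses if not any(sat(l) for l in c)]
                  if not unsat:
                      return assign[1:]
                  clause = rng.choice(unsat)
                  if rng.random() < p:
                      var = abs(rng.choice(clause))                         # random walk step
                  else:
                      var = min({abs(l) for l in clause}, key=break_count)  # greedy step
                  assign[var] = not assign[var]
              return None

          # The "embarrassingly parallel" part: independent restarts share no state,
          # so each one could run on its own thread or GPU block.
          clauses = [[1, 2], [-1, 3], [-2, -3], [1, 3]]
          for seed in range(8):
              model = walksat(clauses, n_vars=3, seed=seed)
              if model is not None:
                  print("SAT with assignment:", model)
                  break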

        I've personally been playing with BDDs (or really, MDDs for my particular application). Traversal of BDDs / MDDs is embarrassingly parallel, assuming your BDD / MDD is wide enough (which for any "complicated" function, it should be).

        BDD / MDD based traversal of SAT / constraint satisfaction is likely going to be a growing area of research, since the problems are closely related. I've also seen some BDD/MDD "estimate" methodologies, akin to how arc-consistency / path-consistency provide estimates for a SAT/constraint solver.

        In effect, if you build one BDD that underestimates the function and a second BDD that overestimates it, you can use BDDs / MDDs to get a lower bound and an upper bound. And those structures can scale from kilobytes to gigabytes, depending on how much or how little "estimation" you want.

        Does such a thing exist yet? I don't think so. The papers I've read on this subject are from 2018, and they weren't from parallel programmers who know much about high-speed GPU programming. But I definitely believe that there's some synergy here if researchers combine the two fields. You use the GPU to compute the BDD in an embarrassingly parallel fashion at every step, to guide a sequential search algorithm over the SAT / constraint satisfaction problem.

        Using BDDs (instead of arc-consistency) as your search "heuristic" is still relatively unexplored, but it is "very obviously" a way to utilize a GPU if you know how they work.

        -----------

        Speaking of embarrassingly parallel, I'm pretty sure arc-consistency is also an embarrassingly parallel problem, but arc-consistency is "too inflexible", leading to data structures that are "too small" (not efficiently utilizing the GBs or TBs of RAM we have today).

        Extending the arc-consistency to path-consistency or K-consistency increases the size of the data-structure (and therefore the parallelism and work involved), but not very smoothly.

        Instead, the estimated BDD/MDD methodology seems to scale across different memory sizes much more gracefully. That is, restricted and relaxed BDDs for lower-bound and upper-bound estimates.
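
        To pin down the relaxed/restricted idea, here is a toy sketch (editor's illustration, not the commenter's code) of width-limited decision diagrams on a 0/1 knapsack: the restricted diagram keeps only real paths, so its best value is a lower bound, while the relaxed diagram merges states optimistically, so its best value is an upper bound. The knapsack framing and the function name are illustrative assumptions; the same construction is what the decision-diagram literature applies to constraint problems.

          def dd_bound(values, weights, capacity, max_width, relaxed):
              """Best value reachable in a width-limited decision diagram, built layer by layer.

              relaxed=False -> restricted DD: drops states, keeps only real paths (lower bound)
              relaxed=True  -> relaxed DD: merges surplus states optimistically  (upper bound)
              """
              layer = {0: 0}                                   # weight used -> best value so far
              for value, weight in zip(values, weights):
                  nxt = {}
                  for used, best in layer.items():
                      if nxt.get(used, -1) < best:             # arc 1: skip this item
                          nxt[used] = best
                      if used + weight <= capacity and nxt.get(used + weight, -1) < best + value:
                          nxt[used + weight] = best + value    # arc 2: take this item
                  if len(nxt) > max_width:
                      ranked = sorted(nxt.items(), key=lambda kv: kv[1], reverse=True)
                      if relaxed:
                          keep, surplus = ranked[:max_width - 1], ranked[max_width - 1:]
                          m_used = min(u for u, _ in surplus)  # merged state dominates the originals
                          m_val = max(v for _, v in surplus)
                          nxt = dict(keep)
                          nxt[m_used] = max(nxt.get(m_used, -1), m_val)
                      else:
                          nxt = dict(ranked[:max_width])       # just drop the weakest states
                  layer = nxt
              return max(layer.values())

          values, weights, capacity = [10, 7, 6, 4, 3], [5, 4, 3, 2, 1], 8
          lo = dd_bound(values, weights, capacity, max_width=2, relaxed=False)
          hi = dd_bound(values, weights, capacity, max_width=2, relaxed=True)
          print(lo, "<= optimum <=", hi)   # -> 16 <= optimum <= 20 (true optimum is 17)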

SMAAART 4 years ago

https://archive.ph/EEw1x#selection-1515.0-1518.0

ketanmaheshwari 4 years ago

More info about Frontier here: https://www.olcf.ornl.gov/frontier/

maliker 4 years ago

21 Megawatts of power to run at full capacity. Wow! About the same power consumption as 16,000 homes. I’m not saying it’s a bad use of power, just that it’s impressive.
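
That 16,000-homes comparison holds up as an order-of-magnitude check (editor's arithmetic; the ~1.2 kW average continuous household draw, roughly 10,500 kWh per year, is an assumed US figure):

    frontier_mw = 21          # full-load power quoted above
    home_kw = 1.2             # assumed average continuous household draw

    homes = frontier_mw * 1000 / home_kw
    print(f"~{homes:,.0f} homes")   # ~17,500, the same ballpark as 16,000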

JonChesterfield 4 years ago

top500.org is down. That seems unnecessary.

Was going to say that it's always great to see GPU machines performing well. Looking forward to seeing how far off theoretical peak the benchmark hit.
