Converting an Integer to a Decimal String in Under Two Nanoseconds

onlinelibrary.wiley.com

98 points by mpweiher 5 days ago · 55 comments

Nokinside 10 hours ago

Sounds familiar. Is one of the authors Lemire? Of course.

SIMD-accelerated integer-to-string conversion https://lemire.me/blog/2026/05/18/simd-accelerated-integer-t...

Other speedy things:

On-Demand JSON: A Better Way to Parse Documents? https://lemire.me/en/publication/arxiv231217149/

Parsing Millions of URLs per Second https://lemire.me/en/publication/arxiv231110533/

Transcoding Unicode Characters with AVX-512 Instructions https://lemire.me/en/publication/arxiv221205098/

xlii 12 hours ago

I wonder if this can be categorized as a galactic algorithm. I can't imagine systems where the bulk of processing goes into integer to decimal string conversion, but maybe there are such.

https://en.wikipedia.org/wiki/Galactic_algorithm

  • oersted 11 hours ago

    My understanding of a Galactic Algorithm is that it has better performance scaling based on input size/complexity, but its overhead is such that it will not actually be faster unless you use it for impractically large inputs.

    I don’t think it has much to do with the case of an algorithm that offers a faster solution to a problem that is rarely a bottleneck (not sure if that’s true in this case anyway).

  • Tuna-Fish 11 hours ago

    It takes a substantial amount of time when emitting lots of numbers in JSON, which happens very commonly.

    And this algorithm has low constant costs, and does not take dramatically more icache than the simple versions. There is no reason not to use this if your compile target can handle avx-512.

  • jmalicki 6 hours ago

    I don't know about this specifically, but I've seen a lot of big data jobs where 99% of the CPU was spent in JSON ser/deser. This might be a reasonable chunk of it.

    • aardvark179 an hour ago

      JSON ser deser is usually dominated by floats rather than ints, and they are more expensive to handle.

  • Aardwolf 4 hours ago

    > An example of a galactic algorithm is the fastest known way to multiply two numbers,[4] which is based on a 1729-dimensional Fourier transform.[5]

    That number looked familiar, and yep it's the taxicab number. Coincidence? Neither of the two references seems to mention it

  • jcelerier 2 hours ago

    Export to large files that represent numbers in textual format, for instance? This can be the difference between "waiting 10 seconds when hitting ctrl-s" and "the software saving automatically on each change because it's unnoticeable".

  • adrian_b 11 hours ago

    I always use binary interchange formats between programs so I am not familiar with the overhead caused by format conversions. Even when displaying numbers for reading them, in the case of floating-point numbers that are displayed in the "scientific" format, i.e. with exponents, I prefer to have only the exponent as a decimal number, but the significand as a hexadecimal number. So I do not need fast algorithms for number conversions.

    Nonetheless, there are plenty of people who advocate the use of JSON, XML and similar formats, in which case I assume that number conversions can take a non-negligible time, which might be decreased by such fast algorithms.

    • superjan 8 hours ago

      You know, if I could change code at both ends of the pipeline without overhead, using the language & library of my choice, I’d do this too. For many of us this isn’t always the case.

  • po1nt 5 hours ago

    We used it for payment processing. We got huge CSVs from various APIs and used string decimals for computing to avoid overflows/underflows and rounding errors.

  • superjan 8 hours ago

    It’s faster for 3 digits and more. 3 digits is not galactic scale. Otoh, if over half of your numbers are single digits, it will lose to other implementations. I think that is more often the case than we’d like it to be.
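
The digit-count trade-off above is easier to see against a scalar baseline. What follows is a minimal sketch, assuming the common two-digits-per-step lookup-table technique; it is not the paper's AVX-512 method, and `DIGIT_PAIRS` and `to_decimal` are hypothetical names chosen for illustration.

```python
# Hypothetical scalar baseline, not the paper's AVX-512 algorithm:
# peel off two decimal digits per step using a 100-entry lookup table.
DIGIT_PAIRS = [f"{i:02d}" for i in range(100)]  # "00" .. "99"


def to_decimal(n: int) -> str:
    """Convert a non-negative integer to its decimal string."""
    if n < 0:
        raise ValueError("sketch handles non-negative integers only")
    if n < 10:
        # Fast path: a single digit needs no loop and no table lookup.
        return chr(ord("0") + n)
    chunks = []
    while n >= 100:
        n, rem = divmod(n, 100)
        chunks.append(DIGIT_PAIRS[rem])  # zero-padded pair, e.g. "07"
    # One or two leading digits remain; they must not be zero-padded.
    chunks.append(chr(ord("0") + n) if n < 10 else DIGIT_PAIRS[n])
    return "".join(reversed(chunks))
```

The single-digit fast path is the case where a simple routine like this is hard to beat; the loop is where a wider, branch-free method has room to win.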

  • Galanwe 5 hours ago

    The reverse, string to integer, has huge applications in quant finance.

childintime 9 hours ago

What will be the lifetime of AVX512? There have been many similar extensions before it. So it's a great result, but heavily marked by the target platform. I have the hope that RISC-V vector extensions will prove to be the more durable substrate to develop on, and a result there would be much more relevant for the future.

  • adrian_b 8 hours ago

    AVX-512, originally called the "Larrabee New Instructions" has been the only decent vector extension of the Intel-AMD ISA, which has been coherently planned instead of being a heap of more or less randomly chosen instructions, each being thought to be useful to accelerate some particular benchmark or a certain workload of one of the big customers.

    MMX (Pentium MMX, 1997) sucked badly (because in designing it ease of implementation was prioritized over usefulness), SSE (Pentium III, 1999) was much worse than the simultaneously launched Motorola AltiVec, and AVX (Sandy Bridge, 2011) was much worse than the simultaneously developed Larrabee New Instructions (despite the fact that Sandy Bridge was developed by the A-team, while Larrabee was developed by the C- or D-team, which however had hired competent consultants from outside Intel, experienced in programming games and graphic applications).

    AVX-512 is for now better than any competitive vector ISA, both in the achievable energy efficiency and in the achievable performance. Obviously, it is possible that some future Aarch64 (Arm) or even RISC-V CPUs will change this, by implementing wider registers and execution units and by adding any missing operations.

    The SME ISA extension (Scalable Matrix Extension), which is available in the latest Apple CPUs and in the current 2026 generation of Arm C1 CPUs, has the potential to be more efficient than AVX-512, exploiting the fact that the current Intel AMX ISA is intended only for ML/AI and not also for general-purpose computing. Nonetheless this may happen only in a rather distant future, because neither Apple nor Qualcomm nor Arm seems interested in making products suitable for the needs of technical and scientific computing, as Intel and AMD do. Because of that, in the existing CPUs with SME the ratio between SME execution units and the general-purpose CPU cores is low, resulting in a low total throughput.

    • vardump 2 hours ago

      MMX was what they could do that time without adding a lot of new registers. It still had its uses. 3DNow! made MMX semi decent on AMD CPUs. Of course SSE was superior, but early SIMD was all about compromises.

    • fweimer 7 hours ago

      SME (like AMX) is easier in this regard because there is a clear expectation that it is used in dedicated code blocks only, so run-time dispatch becomes feasible. In contrast, with auto-vectorization, general-purpose vector ISAs such as AVX-512 and SVE tend to get used all over the place.
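
Run-time dispatch, as mentioned above, means probing the CPU once and then routing calls to a matching code path. A minimal sketch, assuming Linux's `/proc/cpuinfo` format; the kernel names `itoa_avx512` and `itoa_scalar` are hypothetical, and the required flag set (`avx512f` plus `avx512ifma`) is an assumption, not something the thread or the paper confirms.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect feature flags from /proc/cpuinfo-style text (Linux x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_kernel(flags: set[str]) -> str:
    """Return the name of the (hypothetical) conversion routine to use."""
    # Assumed requirement; adjust to whatever the real routine needs.
    if {"avx512f", "avx512ifma"} <= flags:
        return "itoa_avx512"
    return "itoa_scalar"


def detect_kernel(path: str = "/proc/cpuinfo") -> str:
    """Probe once at startup; fall back to scalar if the file is unreadable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return "itoa_scalar"
    return pick_kernel(parse_cpu_flags(text))
```

Dispatch like this works for a dedicated routine called at known sites; it does not help when the compiler has auto-vectorized code throughout the binary, which is the contrast drawn above.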

  • fweimer 7 hours ago

    Wide (especially unconditional) use of AVX-512 faces two main issues today: There's no public commitment from Intel to phase out CPUs that don't support it. And some emulation-adjacent tools (the prime example is valgrind) do not support it.

    The latter could at least be solved with some community effort, although the relevant set of instructions is quite large. It's also not specific to AVX-512. Any comparable vector ISA faces the same challenge.

    • paulf38 6 hours ago

      "some community effort" is a huge understatement. Let me rephrase that for you: "Possibly the largest ever single contribution to Valgrind".

      Initial work on this was started by an engineer at Intel. She was based in St Petersburg so that work stalled in 2022. Here is the bugzilla item https://bugs.kde.org/show_bug.cgi?id=383010. The other big issue is that we don't have enough people working on Valgrind that are experts with the virtual CPU. There are a couple of guys working on s390 and a little bit of work is being done reusing amd64 sse4 support on x86. I dabble a little bit on arm64.

      If there are any AVX512 experts that would like to help with this it would be most welcome.

      • fweimer an hour ago

        I didn't intend to make a statement about the programming effort required. I wanted to contrast it with corporate politics at CPU vendors, from which it is largely decoupled. Given the size of the task, it needs corporate funding, just not from x86 vendors. For example, we're fairly strongly incentivized to make valgrind support happen for any potential future x86-64-v4 transition because our development community really expects valgrind support as part of the core toolchain.

    • Dylan16807 6 hours ago

      > There's no public commitment from Intel to phase out CPUs that don't support it.

      They dropped the idea of having AVX10 variants that don't support the full thing, and as of Nova Lake even the E cores will have it. Is there a significant risk it doesn't get into all products starting soon?

      • fweimer an hour ago

        Historically, the edge business unit did a bit of their own thing with their CPUs. I believe the transition is finally happening once we see AVX10 CPUs over there as well. Until then I'm somewhat skeptical. (To be clear, I have no insight into their roadmaps, precisely because it's so separate.)

  • simonask 9 hours ago

    It will be literal decades before RISC-V becomes mainstream. Not because it’s not a perfectly fine ISA, but because business incentive structures are nowhere near supporting it.

    Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.

  • jcelerier 2 hours ago

    next intel cpus will have AVX 10.2 & APX

  • camel-cdr 7 hours ago

    Porting this optimization to RISC-V Vector is pretty trivial.

amelius 7 hours ago

Decimal strings are for human consumption, I suppose. Not sure if the nanosecond timescale is relevant then (unless you send these numbers to billions of people which is unlikely). Sounds like a pointless exercise, or maybe they should have picked a better example.

  • bee_rider 3 hours ago

    I do sort of agree that, at some point, there arrives a question of “are we sure we need to convert the ints to strings?” But it also serves as a convenient excuse to write fast AVX-512 code (practice and show off tricks if nothing else), the objective is immediately obvious (no need for intense numerical proofs). I like it.

  • pipe2devnull 7 hours ago

    Wow that’s a bit harsh. You can’t think of any examples where you may need to send a string to something but want to do it quickly?

    • amelius 7 hours ago

      Yeah maybe a bit harsh. But the point is if you are looking for that kind of performance, I don't see why you wouldn't send binary data and then unpack it as needed.

      • DarkUranium 4 hours ago

        Because maybe the choice of serialization format isn't under your control?

      • appreciatorBus 6 hours ago

        If you control both ends of the pipe, then sure. But for better or worse, large chunks of infrastructure expect to send or receive JSON.

        • __s 5 hours ago

          csv/tsv are alive & well for interorg data pipelines

        • amelius 6 hours ago

          But JSON uses floats, not integers.

          • Shish2k 2 hours ago

            javascript uses floats as its own default numeric data type; but other languages do have integers, and might want to convert those integers into a JSON (string) representation
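
JSON's number grammar permits plain integers, so an integer-typed producer does emit decimal digit strings; precision is only lost if the consumer parses every number as a double. A minimal sketch using Python's standard `json` module; the key name `id` is arbitrary, and `float()` stands in for a double-only parser.

```python
import json

# 2**53 + 1 is the smallest positive integer a binary64 double cannot hold.
big = 2**53 + 1

# Serializing an int emits plain decimal digits, with no float formatting.
encoded = json.dumps({"id": big})

# Python's parser keeps the value as an int, so the round trip is exact.
decoded = json.loads(encoded)["id"]

# A parser that maps every number to a double would round it; float()
# imitates that behaviour here.
as_double = int(float(big))
```

Here `encoded` is the text `{"id": 9007199254740993}`, `decoded` equals `big`, and `as_double` comes back as `2**53`.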

          • bee_rider 3 hours ago

            Does it really? I assumed it used strings.

  • whiatp 3 hours ago

    Many large distributed systems are built around pushing data through web requests, and human readable request/response formats (JSON, XML) are the most popular, and require integer to string conversions for serialization.

  • teo_zero 5 hours ago

    You assume that human consumption means immediate human consumption.

    By this metric, there's no need for video encoders faster than the movie's own FPS, either.

  • analog31 6 hours ago

    I can imagine a time in the near future when the conversion of integers to decimal strings becomes a limit to the rate at which AI can generate text that's not worth reading.

Cold_Miserable 11 hours ago

This is just a worse copy of the original ifma method. Sneller is even better for max throughput.

IshKebab 12 hours ago

Very impressive! But yeah AVX-512 is an awkward requirement.

  • adrian_b 11 hours ago

    There already exists a large installed base of AMD Zen 4 and Zen 5 CPUs.

    Next year, these AVX-512 supporting CPUs will be joined by AMD Zen 6 and Intel Nova Lake. Starting with Intel Nova Lake, all future Intel CPUs will support AVX-512.

    • matja 8 hours ago

      The problem is AVX-512 was disabled in later Intel Alder Lake CPUs, and later generation Intel desktop CPUs, so very few Intel desktop CPUs have AVX-512 now. Ironic that AMD has better support/performance for an ISA extension that Intel invented.

    • sgerenser 8 hours ago

      I don’t think that’s correct, Intel is transitioning to AVX10, which is essentially the instruction set of AVX-512 but without mandating 512-bit vector width. Future E cores, afaik, will still only be capable of 256 bit vector ops. EDIT: ok maybe not, it sounds like that was the plan a year or so ago but newer articles are saying future E cores will actually support 512b.

      • adrian_b 7 hours ago

        No, that has changed.

        About half a year ago Intel announced that they will mandate the 512-bit vector width and the full AVX-512 a.k.a. AVX10 support in all future CPUs, starting with Nova Lake. This includes all E-cores that will succeed the current Skymont/Darkmont cores, which have been the roadblock to general AVX-512 adoption.

        Obviously, they were forced to do this to align with AMD. Moreover, Intel has announced that they will coordinate the future ISA extensions with AMD and with the major customers, so that all future Intel and AMD CPUs will remain mostly ISA compatible, at least for the user applications.

        Not long ago, a joint AMD-Intel whitepaper was published about the future "AI Compute Extensions for x86", which will be present in future AMD and Intel CPUs to accelerate AI inference by extending the AVX-512 ISA. They are similar to the Advanced Matrix Extensions currently supported by some Intel server CPUs, but the new extensions are more compatible with AVX-512.

        This document demonstrates that at least for now Intel and AMD have understood that implementing a compatible ISA is their greatest moat against Arm and other competitors, so they should better coordinate their extensions instead of trying to pull in different directions.

    • fweimer 7 hours ago

      Do you have a public reference for the “all future Intel CPUs” aspect? The AVX10 change (no more 256-bit-only EVEX tier) is well-documented in compiler patches and whatnot, but what I haven't seen so far is an unambiguous commitment that starting with 2027 (say), all new CPU models will support AVX10.

      For example, Intel stated this:

      > Intel® Advanced Vector Extensions 10 (Intel® AVX10) introduces a modern vector Instruction Set Architecture (ISA) that will be supported across future Intel® processors.

      They don't actually say “all”, and it is probably meant to apply to future microarchitectures anyway. Depending on various factors, Intel may end up designing new CPUs based on existing microarchitectures well into the 2030s.

      • adrian_b 6 hours ago

        Intel® Advanced Vector Extensions 10.2 Architecture Specification, Revision 5.0

        Page 18:

        > 3.1 INTEL® AVX10 INTRODUCTION

        > ...

        > This ISA will be supported on all future processors, including Performance cores (P-cores) and Efficient cores (E-cores).

        As you see, now they actually say "all future".

        The Intel Nova Lake desktop and laptop CPUs and the Diamond Rapids server CPUs will mark a jump in the Intel ISA, by changing the CPUID CPU family number for the first time after a few decades and by introducing not only AVX-512 across all cores, to match AMD, but also the APX ISA extension, which adds features that remove some of the advantages of Arm Aarch64, by increasing to 32 the number of general-purpose registers and by adding double-register load/store instructions.

        • fweimer an hour ago

          It's confusing because this statement predates the release of Panther Lake and Amston Lake. Neither supports AVX10.

          I'm excited about Nova Lake as well. Maybe not so much for the EGPRs (maybe we should have made 4 of the new registers callee-saved?), but there are other goodies as well.

    • IshKebab 9 hours ago

      Sure, it's not just the support though. As I understand it it also has serious power and frequency implications. Also if your process uses AVX-512 you suddenly have an extra 2kB of data to save/restore on context switches. Maybe not super significant but I really doubt this will ever make it into standard libraries.
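
The "extra 2kB" figure above follows from the register file size. A small worked calculation, assuming the architectural register counts (32 ZMM, 16 YMM, 8 opmask registers) and 64 bits stored per opmask register; the constant names are only for illustration.

```python
# Architectural register counts and widths (vector state only).
ZMM_REGISTERS, ZMM_BITS = 32, 512    # AVX-512: zmm0..zmm31
YMM_REGISTERS, YMM_BITS = 16, 256    # AVX/AVX2: ymm0..ymm15
MASK_REGISTERS, MASK_BITS = 8, 64    # AVX-512 opmask: k0..k7 (assumed 64 bits each)

zmm_total = ZMM_REGISTERS * ZMM_BITS // 8     # bytes of ZMM state
ymm_total = YMM_REGISTERS * YMM_BITS // 8     # bytes of YMM state
mask_total = MASK_REGISTERS * MASK_BITS // 8  # bytes of opmask state

print(zmm_total, ymm_total, mask_total)
```

The ZMM registers alone come to 2048 bytes, against 512 bytes for the AVX register file, which is where the roughly 2 kB of save/restore state comes from.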

      • adrian_b 7 hours ago

        AVX-512 increases energy efficiency and performance unconditionally, in all AMD Zen 4 and later CPUs and in all Intel Ice Lake and later CPUs.

        Only in the now obsolete Intel server and workstation CPUs from the Skylake Server, Cascade Lake and Cooper Lake families was the result mixed: using AVX-512 was a win when a great number of vector instructions were executed, but a loss when only a small number were executed.

        This was caused by a really stupid Intel voltage and frequency controller, which could not react quickly enough to changes in power consumption. Because of this, it preemptively dropped the clock frequency before there was any need, for fear that the CPU would overheat if power consumption later rose and the controller could not lower the frequency fast enough to stay under the temperature limit. For the same reason, the clock frequency was kept low for a very long time after the last vector instruction was seen.

        Now this is pretty much history, because few have kept those inefficient 7-to-10-year-old servers, and those who have kept them normally run workloads that either use a lot of vector instructions or use none at all.

        Nowadays the only reason for not using AVX-512 is when the software is intended to be widely used and one does not want to distribute 2 versions, one with AVX-512 for AMD or P-core Intel CPUs and one with AVX for E-core or hybrid Intel CPUs or very old AMD CPUs.

jqpabc123 5 days ago

> Our design exploits the AVX-512 instruction set

AVX-512 is being discontinued in newer Intel consumer CPUs, particularly with the Alder Lake series, where it has been completely disabled through BIOS updates.

  • adrian_b 11 hours ago

    Your comment is obsolete.

    AVX-512 had been discontinued in the CPU generations from Alder Lake until the Panther Lake, Wildcat Lake and Clearwater Forest CPUs introduced during the first half of 2026, but Intel has committed that all future Intel CPUs will implement the complete 512-bit variant of the AVX-512 a.k.a. AVX10 ISA, starting with the Nova Lake desktop and laptop CPUs, to be launched by the end of this year.

    Obviously, the competition from the AMD Zen 4, Zen 5 and Zen 6 CPUs, all of which implement AVX-512 and easily beat any Intel CPU in any workload that has been updated to use the AVX-512 ISA, has forced Intel to reconsider its previous decision.

  • anematode 12 hours ago

    To the contrary, Nova Lake, coming out this year, will have it.

  • yvdriess 11 hours ago

    And that's a shame, but the relevant workloads typically run on server class CPUs.

    • adrian_b 11 hours ago

      From all the workloads that I execute on my laptops or desktops, there is only one where the speed matters yet it is not significantly affected by the use of the AVX-512 ISA: the compilation of big software projects.

      All the other things that I do and which can take a noticeable CPU time (i.e. not time used for waiting on SSDs or other peripherals) can be accelerated by AVX-512. This includes things like computing file hashes, data compression and encryption algorithms, graphics/audio/video algorithms and also EDA/CAD applications.
