Ask HN: What is your go to performance optimization?
I'm curious what back-pocket optimizations you have from your past experiences which you pull out when a workload costs too much in the cloud. For me it's:
* disable lower c-states
* enable adaptive interrupt coalescing in the network driver
* mount volumes with noatime

Reduce how often something runs and the amount of work it does when it does run. Sometimes things are done too often, or they process unnecessary data. I once had a project where a SAP system was crawling and causing company-wide stoppages. We found a job that ran every minute of every day and processed a table that contained a few thousand tasks. This was something that could be done once per hour, and only during business hours. Furthermore, it was re-processing thousands of records each time it ran; in reality, once a record had been processed, it could be deleted from the table. We emptied out the table and scheduled the job to run hourly, and the whole company noticed an immediate improvement (a rough sketch of the pattern appears below).

This pattern happens a lot. Someone builds a polling system that hits the server once a second to see if a task finished. A cron job runs every 5 minutes. All data is reprocessed in a daily job instead of using a 24-hour cutoff. The world is filled with computers doing useless work, and mostly no one notices.

How it usually goes for me: I ask the business how often something should be updated, and they say "real time" (not going into details about what real time really means). It is hard to explain that processing all the data all the time, so everything is always fresh, takes forever. After a couple of months it turns out they never open their "super important dashboard", or do so once in six months. Great: after a year of bogging everything down, I can clean up and make the jobs run once a day instead of once a minute, because now that is acceptable.

Why take the unspecific "real time" answer as gospel? Just propose daily initially (or weekly, or monthly), listen to the protests, and see if there are good reasons why it needs to be more often. Then pick a suitable interval that fulfills their needs and that you can guarantee. If you offer pink fluffy unicorns for free, people will always pick them, without thinking.

Well, it is not clear from my initial post, but I do propose daily or whatever else rather than just "every SeCoND BecAuSE we NeeD iT NOW!!!", but the people clicking around the system expect stuff to happen "right away", and no amount of explanation works. Bonus points for trying to explain that you want to implement "eventual consistency". Keep in mind I added {"real time" (not going into details what real time means really)} in the text to indicate that I am not some junior whining around, but someone with a deep understanding of computing...

Just an alternative related to polling: we had a task that checked every minute what could be executed. Instead of doing that, we scheduled the executing code to run at that time (if it wasn't scheduled already) through the service bus. Since events can be planned ahead, easy peasy, and it avoided a microservice named "scheduler", a DB, and a serverless function... (Hoboy)
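Not from the thread itself, just a minimal sketch of that "touch only what's new, and run it less often" pattern, using Python's built-in sqlite3. The task_queue table, its columns, and handle() are hypothetical stand-ins:

```
import sqlite3

def process_pending(conn: sqlite3.Connection) -> int:
    """Everything still in the table is unprocessed; handled rows get deleted."""
    rows = conn.execute("SELECT id, payload FROM task_queue").fetchall()
    for task_id, payload in rows:
        handle(payload)  # stand-in for the real per-record work
        conn.execute("DELETE FROM task_queue WHERE id = ?", (task_id,))
    conn.commit()
    return len(rows)

def handle(payload: str) -> None:
    pass  # hypothetical business logic

# Scheduled hourly (cron or the platform's job scheduler) instead of every
# minute: same outcome, a tiny fraction of the work.
```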
Push less data through wires. The memory hierarchy is so stark on modern hardware that the 30-year-old adage "the fastest code is the code you don't run" is maybe less important than "the fastest code is the code that doesn't spend much time talking to the memory controller." And it's even worse once we start talking about accessing memory that's on an entirely different computer. Serialization/deserialization, IPC, network calls, and all those other things we do with reckless abandon in modern service-oriented and distributed applications are just unbelievably expensive.

Last year I took a slow, heavily parallelized batch job and improved its throughput by 60% by getting rid of both scale-out and multithreading and taking it all down to a single thread. Everyone expected it to be slower because we were using a small fraction of the CPU cores, but in truth it was faster, because the time saved on memory fences, data copying, and network I/O was just that great. And then the performance gains kept coming: having simplified things to that extent, I was in a much better position to judiciously re-introduce parallelism in ways that weren't so wasteful.

I've seen that happen a lot as JSON (XHR) and ORMs became more common... certain queries would return way too much data from related (auto-fetched) records and it was just slow AF on remote computers. Another common one is just poor query performance from a database: lack of appropriate indexes, or other relatively easy optimizations. Similarly, finding a method of caching that's just bad (an in-memory database queried with SQL instead of a dictionary). It isn't so bad for one call, but very bad when a given request (login) makes over 200 calls to this cache for configuration settings. It wasn't a problem per request, but it was in aggregate.

> Push less data through wires.

In the systems I work on this has been a big one. In SQL people are pretty good about not writing `select *` in production code, but when querying directory servers, redis, mongodb, etc. people get sloppy. When a system is small, it's enticing to pull in lots of data and work with it in code instead of writing real queries. This doesn't scale.

Unless you use an ORM, in which case I'm used to seeing it be all SELECT * all the time. And then you get an entire generation of engineers who've never known any other way to talk to a database going around complaining about how this Miata is so slow, when really it's just that nobody ever taught them how to shift out of first gear.

Your example can really go either way. I've done the exact opposite, with crushing success: I've spent the last couple of years identifying and resolving N+1 problems in a Django codebase. https://planetscale.com/blog/what-is-n-1-query-problem-and-h... Aside from the performance gains, it's very satisfying to go from 1,000+ inefficient DB queries to 1-2 optimized queries.

That's a well-written article. For developers newer to relational databases, there's a heuristic at the beginning which I remember hearing elsewhere and keep in mind when I'm doing query work: "You might expect that many small queries would be fast and one large, complex query will be slow. This is rarely the case. In practice, the opposite is true."

It's a great heuristic. A big part of why it works out that way is that the query planner can only optimize the query you give it. If you give it a bunch of small queries, it can only make relatively inconsequential micro-optimizations. One big query gives it a lot more degrees of freedom and opportunities to make big gains. Here's another great resource for getting more out of relational databases: https://use-the-index-luke.com/
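To make the N+1-versus-one-query point concrete, here is a rough sketch (not the Django case above) using Python's built-in sqlite3 and made-up users/posts tables:

```
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
""")

def titles_n_plus_one():
    # The N+1 shape: one query for the users, then one more query per user.
    out = {}
    for user_id, name in conn.execute("SELECT id, name FROM users"):
        out[name] = [title for (title,) in conn.execute(
            "SELECT title FROM posts WHERE user_id = ?", (user_id,))]
    return out

def titles_joined():
    # One round trip; the planner sees the whole problem at once.
    out = {}
    rows = conn.execute(
        "SELECT u.name, p.title "
        "FROM users u LEFT JOIN posts p ON p.user_id = u.id")
    for name, title in rows:
        out.setdefault(name, [])
        if title is not None:
            out[name].append(title)
    return out
```

In ORM terms this is roughly what Django's select_related/prefetch_related do for you, which is why fixing an N+1 there is usually a one-line change.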
I realize you're asking about performance optimizations, but since you put it in the context of a workload's cloud bill being too large, I'll chip in and say that by far the largest impact on cost I've seen over the years comes from simply rightsizing the infrastructure the workloads run on. What I see more than anything is applications reserving 10x more CPU or memory than they actually use. In some cases the fix is amortizing resource usage over time by asynchronously consuming some kind of queue, where the extreme reservation is driven by a temporary usage spike (a downstream client doing some batch processing, for example).

Easy trick for making joins 50x faster: don't use Postgres, and give your tables a primary key that groups related items together. A lot of people don't know that an ordinary database index doesn't order the actual rows on disk; it's just a B-tree of pointers. With a clustered index that matches the table's query pattern, the rows really are ordered on disk. Most DBs load data in 8 KiB pages, so if you query 100 rows of 100 bytes each and they're not stored together, you may need to load nearly 1 MiB of pages even though the query result is only about 10 KB. Clustering speeds up joins and range queries 50x or more, causes fewer cache evictions, etc. You can do this in just about any database except Postgres, which doesn't have the ability to keep rows sorted on disk.

Although it isn't automatic, doesn't the Postgres CLUSTER command reorder the rows on disk? Or am I misunderstanding something?

It does, but it's a bit of a problem when the table is large and keeps getting new rows, since by default it locks the table and is a slow operation.

Oh, and it also does compression, which helps quite a bit with network storage (like cloud disks).

CockroachDB or YugabyteDB kind of solve for sortedness by primary key, since they use RocksDB variants / LSM trees underneath.
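For a concrete (if miniature) version of the clustered-primary-key idea: SQLite's WITHOUT ROWID tables store rows in primary-key order, so a composite key keeps one user's rows physically adjacent. Table and column names here are invented:

```
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        user_id    INTEGER,
        created_at TEXT,
        payload    TEXT,
        PRIMARY KEY (user_id, created_at)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(42, "2024-01-03", "a"), (42, "2024-01-10", "b"), (7, "2024-01-05", "c")],
)

# All of user 42's January events come back from one contiguous slice of the
# clustered B-tree instead of scattered point reads across unrelated pages.
rows = conn.execute(
    "SELECT created_at, payload FROM events"
    " WHERE user_id = ? AND created_at BETWEEN ? AND ?",
    (42, "2024-01-01", "2024-01-31"),
).fetchall()
print(rows)  # [('2024-01-03', 'a'), ('2024-01-10', 'b')]
```

The same idea shows up as clustered indexes in SQL Server, index-organized tables in Oracle, and InnoDB's primary-key clustering in MySQL.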
The fastest code is code that isn't executed. Related: do only what is necessary.

This works on so many levels and is my magic trick for making software faster. It's a pity computers are so fast and have so much memory that people can get away with not caring about minimalism.

Also the most secure.

I've seen memoization improve performance by enormous amounts, even for simple functions that do a few simple calculations before returning a result. Another go-to of mine is to take conditionals out of loops when they really only need to be checked once. For example:

```
for foo in whatever:
    for bar in foo:
        if len(foo) > some_value:
            do_something(bar)
```

Can become:

```
for foo in whatever:
    if len(foo) > some_value:
        for bar in foo:
            do_something(bar)
```

This example is trivial and wouldn't gain much, but imagine if `len(foo)` were a more computationally expensive function. You'd only need to call it once per iteration of `foo` instead of on every iteration of `foo` * `bar`.

I wouldn't even really call that loop example an "optimization"; it is just the obviously more efficient implementation. Yet I see it constantly. That pattern of `for foo in whatever: for bar in foo` is everywhere. People don't even think about it... they just write the looping part, then start thinking about the logic below. It's such a common thing I'm surprised compilers and interpreters don't just optimize it away :shrug:

Eliminate complicated stacks/frameworks (Docker, npm, React, multi-stores, clouds, etc.) and use the simplest alternative available (Solid, single `exe`s, htmx + Tailwind, only Postgres, normal hosting, etc.). Improve the data (structure) first if possible.

@njit on any numpy for loop that requires a recursive calculation. So many times I have been able to pull something out of my behind that got rid of the recursive element so the whole thing could be vectorized. Usually involves ranks and groups.

Also: less logging. Seen this pop up way too many times.

Log, but keep it in a buffer. On an unhandled exception or crash, emit all the logs relating to the exception (such as per request/RPC).

Agreed. If you're on AWS, CloudWatch can get really expensive really quickly.

On the back-end, application-level programming side:

* Look for O(N^x), that is, nested loops, even when they are not necessarily expressed as loops at the language level.
* If possible, get rid of ORMs in favor of raw SQL. Not because ORMs are very bad, but because almost nobody bothers to learn them; they often start causing issues under any non-trivial amount of load.
* Study data access patterns and figure out where and what composite indexes might help. I say composite indexes because I assume regular indexes are more or less always there, often even too many of them.

Especially with the last one I have achieved impressive results without any kind of impressive effort, just by setting aside some time to understand the code.

In my case, it was database optimizations that reduced the overall costs significantly:

* Instead of writing big complex queries with nested SELECTs, I split them into smaller bite-sized chunks that could be cached
* Better caching strategies: reducing how many caches were flushed when a change was made
* Tweaking the indexes on database tables to improve WHERE clauses
* Storing intermediate calculations in the database (for example, the number of posts a user has can be stored on the user table instead of being counted each time)

Once I had optimized the database, I could reduce the size of the DB and the server, as they no longer needed to work / wait as much.

This is a surprisingly uncommon technique, but: think deeply about what you're making the computer do, and ask it to do fewer things by being smarter about what you ask it to do. I'd say 95% of the time most of your order-of-magnitude gains will come from the above.

- Remove locks (lock-free algorithms)
- Delete as much code as possible

If the workload does not benefit from the cloud, then I just run it locally, because hardware is usually much cheaper and much faster than cloud.

umm, this is kinda tongue-in-cheek:

```
#include <time.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 0;
    time_t timep;

    /*
     * ok so now we are printing something
     **/
    printf("Greetings!\n");

    /*
     * this is a for loop from 0..9
     **/
    for (i = 0; i < 10; i++) {
        time(&timep);
        localtime(&timep);
    }

    printf("Godspeed, dear friend!\n");
    return 0;
}
```

now, the canonical

```
$> gcc tz-test.c -o obj/tz-test
```

now do this

```
$> unset TZ
$> strace -ff ./obj/tz-test 2>&1 | grep 'local' | wc
     10      77     851
$> export TZ=:/etc/localtime
$> strace -ff ./obj/tz-test 2>&1 | grep 'local' | wc
      1       5      59
```

moral: always set TZ to keep localtime(3) from stat'ing /etc/localtime on every call :o)

lol, never knew about this, thanks.

Has this actually slowed down someone's code?

Assuming you gate things like merges on test results... remove end-to-end tests. Replace them with contract tests and service-level functional tests. Much faster feedback, and at the same time better coverage. The only serious problem with the approach is that it upsets the magical thinkers in your org. Often those folks are managers.

Making sane DB indices and constraints. It's amazing how often people just don't add indices even when the access pattern is clear from the outset. "Premature optimization is the root of all evil!" OK, so when are we actually going to add that index? (Answer: never.)

Usually, just checking access patterns and data structures (like a list being used where a set should be). Also, avoiding code that keeps pointers to many small pieces scattered all over memory, which leads to a bad cache-miss score. Finally, just good old profiling.

On 32-bit it remains -fomit-frame-pointer for me. On native compiles, -march=native.

I'm conflicted about this one. -march=native, definitely. Have you seen substantial gains from omitting frame pointers?

Meta rules that are often ignored:

1. Establish what is good enough
2. Measure, don't guess
3. Fix the biggest bottleneck first
4. Measure after fixing

And some general things:

5. Avoid micro-benchmarks (i.e. anything not at the whole-system level)
6. Be careful with synthetic data
7. Know your general estimates (e.g. cache, memory, disk, network speeds)

Profiling (e.g. Pyroscope) for better understanding (+ load tests).
More performant libraries with the same interface.
DB optimizations (e.g. indexing, denormalization, tuning, connection pooling).

Use explicit huge pages.

Better than transparent ones?