Ask HN: Where can I use a 40+ core machine?

9 points by sbenario 11 years ago · 14 comments · 1 min read


When installing at a new client site, we (@UforaInc) found a bug that reproduces on their 60 core machine but that we have not been able to reproduce on 16 core (32 HT) machines from AWS.

The bug is related to lock contention on many-core machines, and we're trying to find a machine where we can reproduce the issue ourselves.

AWS, GAE, Azure, etc. don't seem to have anything this size. Does anyone know where we can rent or borrow a beefy machine like this for a couple of days?

davismwfl 11 years ago

You might check with a couple of the OS/hardware vendors too. I know Microsoft used to have labs on both the east and west coasts that let you do this type of debugging; I would imagine they still do.

Sun used to do the same thing back in the day. I'm not sure whether that continued after Oracle took over, but it would be worth a call. IBM and HP might also have a lab environment you can test in.

Maybe call the vendor your client bought that machine from and explain what you need; you'd be surprised how accommodating they can be. It won't be free, but it is likely far, far cheaper than buying a 60 core machine.

Good luck!

  • sbenarioOP 11 years ago

    Good idea! I'll check on that!

    • techdragon 11 years ago

      If the software is running on RHEL, you should talk to Red Hat. I'm pretty sure that customers with a RHEL support contract can get access to application support from Red Hat to debug issues on the OS.

      I also know first hand from Red Hat employees that they have a dedicated infrastructure for testing just this kind of "weird issue only happens on $foo hardware" stuff.

      And while I can't confirm this as directly, I'm quite sure their hardware testing pool will contain some hardware that can help with debugging an issue like this.

      • sbenarioOP 11 years ago

        Indeed, this was running on RHEL, though our client is the RHEL customer and, so far, we aren't.

techdragon 11 years ago

You may want to have a chat with IX Systems. They do a lot of pre-ship 'white glove testing', which means they may be able to let you remotely borrow a machine before they run their shakedown testing battery.

For instance, they built this 40 core / 80 logical thread monster: http://www.ixsystems.com/whats-new/megacore-freebsd-foundati...

They also sell two 64 core machines as part of their standard product lineup:

The Mercury - (AMD Opteron) - http://www.ixsystems.com/servers/families/?family=Mercury%20...

and

The Neptune - (Intel Xeon) - http://www.ixsystems.com/servers/families/?family=Neptune%20...

JoachimSchipper 11 years ago

If you're not scared of a "weird" architecture, http://labs.runabove.com/power8/ gets you a lot of threads at a very low price.

  • sbenarioOP 11 years ago

    I can't figure out how they're "weird"... am I missing something?

    EDIT: Yes, yes I am. Power8 architecture. Probably not best for attempting to reproduce client bugs :-)

valarauca1 11 years ago

Look into used UltraSPARC hardware. T3/T4/T5 should all offer >100 cores. Or start building a Beowulf cluster with in-house computers.

  • sbenarioOP 11 years ago

    Crossing machine boundaries doesn't seem to trigger the issue. We're specifically stuck looking for machines with lots of cores on a single box.

Bluecobra 11 years ago

If you don't mind AMD, you could buy a quad Opteron 6168 server and return it. This one is $8,029 and has 48 cores:

http://www.ebay.com/itm/DELL-C6145-SERVER-QUAD-AMD-6168-1-90...

BetaCygni 11 years ago

Maybe the client will allow you to test on their 60 core machine? You know for sure the problem happens there.

iSloth 11 years ago

OVH.co.uk/OVH.us do a 40 thread server

  • sbenarioOP 11 years ago

    That may be a winner! $150/week for a server that MIGHT get us a repro is a good place to start.
