How we built an auto-scalable Minecraft server for 1000+ players

worldql.com

401 points by hopfog 4 years ago · 144 comments

Jaxkr 4 years ago

Hi, I made this! Was gonna wait until it was downloadable and the documentation was more complete before posting it on HN. Thanks for sharing it though.

The Minecraft setup will be available and usable by anyone Wednesday of next week. More documentation and a roadmap will follow shortly after.

  • cdev_gl 4 years ago

    I rent a small Linode to run a Minecraft server for my extended family. Sometimes it struggles to keep up with just a half dozen of us, and when I tried swapping from a Spigot-based setup to a Forge one so the kids could have mods, it was basically unplayable. I can't imagine scaling out to thousands of users; this is some amazing work.

    While developing this did you come up with any best practice recommendations for individual nodes?

    • Jaxkr 4 years ago

      Your single-core performance is the most important thing for a Minecraft server, especially for Forge mods.

      For vanilla, I’d try Paper (a high performance spigot fork) and see if you still have problems. If you’re lagging with 6 people while running Paper, you simply need a better CPU.

      • MuffinFlavored 4 years ago

        Tangentially related: I had to move a web app off of DigitalOcean because the CPU performance was terrible compared to what you get for $80/mo from a dedicated server from OVH. I know Linode and DigitalOcean compete. Get an OVH dedicated server, run a CPU benchmark like PassMark, and compare. AWS, Linode, and DigitalOcean are all VPS offerings and pretty slow.

    • e12e 4 years ago

      It might be too pricey, but at about 25 USD/month you can get a dedicated server with an i7 from the Hetzner server auction. These cheap boxes don't have ECC RAM etc., but should work well for things like this.

      https://www.hetzner.com/sb?country=us

      • jaflo 4 years ago

        On the page, one option advertised is for €23.53/month with an Intel Core i7-4770, 2 TB HDD, and 32 GB RAM. What’s the catch? How is it possible to rent such hardware for so cheap?

        • Jweb_Guru 4 years ago

          There is actually a catch (besides very little customer service, which is par for the course with cloud stuff anyway), but it's not really their fault. Hetzner doesn't peer with a lot of bigger players, and therefore its datacenters can have really lousy / inconsistent bandwidth and latency for people outside a relatively small area in Europe (and can occasionally have bizarre outages that are presumably related to partitions). If that's not a dealbreaker for you, then they're one of the best deals out there if you don't want to manage your own hardware and don't need really efficient access to other cloud services like S3 (which obviously work much better within Amazon's datacenters).

        • xuki 4 years ago

          No catch. Cloud services are extremely expensive when it comes to computing. If you need raw power, rent a dedicated server.

          Edit: For a little more per month, you can get a Ryzen 3600, 64 GB of RAM, and a much faster NVMe SSD: https://www.hetzner.com/dedicated-rootserver/ax41-nvme

        • NavinF 4 years ago

          Ehh you can get cheaper compute in Fremont if you're willing to rent >=1 rack. I'm guessing you only think €23.53/mo is cheap because you've been getting reamed by AWS pricing :P

          • xuki 4 years ago

            Can I have a link please? I'm a sucker for cheap dedicated servers :).

            • NavinF 4 years ago

              I use he.net/colocation

              To be clear, I'm talking about racking your own servers, not cloud "dedicated servers"

              • e12e 4 years ago

                I'm not sure what you imply with the quotes around dedicated servers - this is hardware for rent, not VMs (hetzner has VMs too, starting from about 5usd/month - but that's something else).

                You do need to go a bit up in price for proper server hw - although there are a few aging xeon boxes with ecc ram on the low end now.

        • e12e 4 years ago

          They are second-hand servers - if you want brand new servers you pay a setup fee, and more per month. Once you upgrade, or cancel, that (once new) server goes into the auction at a lower price.

        • bluedino 4 years ago

          Those are ancient servers, long since paid for; you're paying for $1.50 worth of bandwidth and basically some electricity and a little bit of "support"

      • _fnqu 4 years ago

        If you go with Oracle Cloud, you can get a 4-core, 24 GB RAM aarch64 server for free. I've been using it for Minecraft and it performs excellently, especially combined with PaperMC

        • e12e 4 years ago

          Note: it looks like Oracle Cloud has a nice free tier, but that's not a dedicated Arm server - it's a VM (still, with 4 cores and 24 GB of RAM):

          https://www.oracle.com/cloud/free/?source=:ow:o:p:nav:0916BC...

          > Infrastructure

          > 2 AMD based Compute VMs with 1/8 OCPU* and 1 GB memory each.

          > 4 Arm-based Ampere A1 cores and 24 GB of memory usable as one VM or up to 4 VMs.

          > 2 Block Volumes Storage, 200 GB total.

          > 10 GB Object Storage.

          > 10 GB Archive Storage.

          Just to compare with Hetzner - you would typically be able to get 16 or 32gb ram, an i7 with 4 cores and 2x1tb disk (no ssd). I'm guessing single core performance might be higher than the arm offering - and a better fit for minecraft.

          Edit: although maybe core count would win out with PaperMC?

      • ipaddr 4 years ago

        The cheapest available right now was 140 euro. When do the 25 dollar servers come out?

        • freemint 4 years ago

          They are in a separate category called Serverbörse [1]. The cheapest currently available is an i7-3770 with 16 GB RAM and 2x3 TB of spinning rust for €28.

          [1] https://www.hetzner.com/sb

        • grp000 4 years ago

          If the cheapest is $25/month, I think it would just make sense to get a used school/office PC. I was able to pick up a used Haswell i5 PC with 8 GB of memory for $200, which would probably give you better performance, breaking even at the ~8-month mark. That being said, pandemic prices might shift those numbers.

          • e12e 4 years ago

            Nothing wrong with getting a used PC, but do you have a 1 Gbps uplink at home?

            Among the cheapest offers right now, there's an Intel Core i7-4770 with 2x 2 TB enterprise HDDs (spinning rust, not SSD) and 32 GB of RAM. And a Xeon box with ECC RAM at the same low price.

            • Jweb_Guru 4 years ago

              Also, even if you do have a 1 Gbps uplink, a lot of providers will throttle you if they suspect you're actually hosting a server that tries to utilize most of that bandwidth in anything but bursts.

              • bluedino 4 years ago

                For example, a 1 TB monthly quota (a common cap for hosted servers) works out to only about 3 Mbps sustained.
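The quota-to-bitrate arithmetic is worth making explicit. A standalone sketch (assuming a 30-day month) shows that 1 TB/month is only about 3 Mbps if spread evenly:

```rust
// Convert a monthly transfer quota (in TB) to an average sustained bitrate.
fn sustained_mbps(terabytes_per_month: f64) -> f64 {
    let bits = terabytes_per_month * 1e12 * 8.0; // TB -> bits
    let seconds = 30.0 * 24.0 * 3600.0;          // ~one month
    bits / seconds / 1e6                         // bits/s -> Mbps
}

fn main() {
    // 1 TB/month is roughly 3.1 Mbps sustained.
    println!("{:.1} Mbps", sustained_mbps(1.0));
}
```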

        • PaywallBuster 4 years ago

          I can see a few options for €23.53 right there.

          Either way, it's an auction page; it may not always have the deals you're looking for in this price range.

        • corobo 4 years ago

          When someone cancels their old server and they recycle it into the auction system

    • InvaderFizz 4 years ago

      Assuming the Minecraft server runs on Arm64, you may want to try out Oracle Cloud free tier.

      You can spin up a 4 core 24GB instance, the CPU is 4 dedicated cores and quite fast.

      • ijlx 4 years ago

        It does run on arm64, I am currently running a forge server with a few mods on this exact setup. It's just for a few friends, I don't think we've had more than 5 or 6 on at a time but it seems to be performant enough.

        It's not perfect but it ended up being a better experience than we were having with minecraft realms.

    • ianhawes 4 years ago

      What were the specs of the server and which Spigot fork did you run?

      A month ago I ran a Paper instance on a DigitalOcean instance with about 4 GB of RAM and the OpenJ9 JVM, and it never dipped below 20 TPS even with a larger render distance. This was on vanilla 1.17.

      • eropple 4 years ago

        Have you noticed much difference between a HotSpot-driven JRE and OpenJ9?

        I am somewhat irrationally biased against J9 because they made us stick it in everything at IBM, but I'm willing to reconsider for better Minecraft performance.

        • ianhawes 4 years ago

          I have yet to see hard data on it, but the folks that have been in the trenches doing client-side mods on Minecraft swear that J9 is the superior JVM for the latest versions of Minecraft (1.16+). I don't know that anyone has really done a proper benchmark.

    • zimpenfish 4 years ago

      > a small Linode [...] it struggles to keep up with just a half dozen of us

      I run my server on a 32 GB Ryzen 5 PC with the world on an SSD and it often struggles with just 2 players no matter what options I tweak. Keep dreaming of the day the Java parts get the same performance as the Bedrock parts (but I know it'll likely never happen.)

      • Hextinium 4 years ago

        Something weird about a vanilla Minecraft server is that sometimes less RAM is better. If you are using a good enough CPU, going from 4 GB allocated to 2 GB allocated massively increases performance for low numbers of users, because garbage collection runs more efficiently on the smaller heap. You can tune the garbage collector manually (the heap size is set with the JVM's -Xms/-Xmx flags), but, super counterintuitively, just shrinking the allocation fixed most of my problems.

  • gravypod 4 years ago

    This is very impressive work! Do you plan to make WorldQL highly available? It seems like if it died you'd lose the server.

    • Jaxkr 4 years ago

      Thank you for your kind words!

      WorldQL uses Postgres under the hood to store permanent information. I was inspired by projects like TimescaleDB, which build new functionality on top of Postgres’s rock-solid base.

      Any high-availability solution for Postgres will also be available for WQL.

      • gravypod 4 years ago

        Do the clients of WorldQL (the workers) know how to gracefully fail over when WorldQL fails?

        • Jaxkr 4 years ago

          No. They fail very ungracefully. But it’s something I’m planning to implement in the future.

      • xupybd 4 years ago

        Do you use postgis for spatial queries?

        • Jaxkr 4 years ago

          No. All spatial queries use WorldQL’s in-memory k-d tree implementation and are not used for permanent world alterations.

          For the Minecraft world catch-up example, it’s as simple as querying Postgres for block records in a certain chunk after a certain timestamp. No fancy spatial stuff happens on the Postgres side.
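The in-memory k-d tree mentioned above can be sketched in a few dozen lines. This is an illustrative toy, not WorldQL's actual code (the Node layout and function names are invented here): a build step over 2-D points plus a radius query, the shape of query a "who is near this position" lookup needs:

```rust
struct Node {
    point: [f64; 2],
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Build a 2-D k-d tree by recursively splitting on the median,
// alternating the split axis at each depth.
fn build(mut pts: Vec<[f64; 2]>, depth: usize) -> Option<Box<Node>> {
    if pts.is_empty() {
        return None;
    }
    let axis = depth % 2;
    pts.sort_by(|a, b| a[axis].partial_cmp(&b[axis]).unwrap());
    let mid = pts.len() / 2;
    let right = pts.split_off(mid + 1);
    let point = pts.pop().unwrap(); // the median becomes this node
    Some(Box::new(Node {
        point,
        left: build(pts, depth + 1),
        right: build(right, depth + 1),
    }))
}

// Collect all points within radius r of center, pruning half-spaces
// the query ball cannot reach.
fn within(node: &Option<Box<Node>>, center: [f64; 2], r: f64, depth: usize, out: &mut Vec<[f64; 2]>) {
    let Some(n) = node else { return };
    let axis = depth % 2;
    let d2 = (n.point[0] - center[0]).powi(2) + (n.point[1] - center[1]).powi(2);
    if d2 <= r * r {
        out.push(n.point);
    }
    if center[axis] - r <= n.point[axis] {
        within(&n.left, center, r, depth + 1, out);
    }
    if center[axis] + r >= n.point[axis] {
        within(&n.right, center, r, depth + 1, out);
    }
}

fn main() {
    let tree = build(vec![[0.0, 0.0], [5.0, 5.0], [1.0, 1.0], [9.0, 9.0]], 0);
    let mut near = Vec::new();
    within(&tree, [0.0, 0.0], 2.0, 0, &mut near);
    near.sort_by(|a, b| a[0].partial_cmp(&b[0]).unwrap());
    assert_eq!(near, vec![[0.0, 0.0], [1.0, 1.0]]);
}
```

The pruning on the split axis is what makes this beat a linear scan once the point count grows.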

          • steve-chavez 4 years ago

            Could you elaborate on why you don't use PostGIS? I've always assumed its C algorithms are high-performance.

  • Fiahil 4 years ago

    Is there a specific reason why Minecraft hasn’t been “adapted” from the original game to a robust design that could scale to larger worlds and player populations before your project?

    • forrestthewoods 4 years ago

      Pretty much no video game engine scales with cores. There are too many dependent systems. There's no game engine that I'm aware of that effectively uses more than a couple of cores for gameplay simulation.

      Distributed physics is a well understood problem with no "solution". Just different trade-offs. This solution is cool, but it's not novel or anything. Minecraft is particularly challenging because the entire world is highly mutable.

      There are no general solutions to any of this. Just a bunch of custom one-offs. SpatialOS is trying. My opinion on it is extremely, extremely negative. And I'll leave it at that.

      Unity and Unreal both have primarily single-threaded gameplay. Writing multi-threaded gameplay code is extraordinarily difficult. Unity has been working on their DOTS/ECS system for years, but it does not appear to be close to ready for the mainstream.

      So I think the short answer is "it's really really hard. Like, radically harder than you are imagining. Even if you think it's hard. The value of that work is likely not worth the cost. If most players WANT to play on small servers with close friends then the mountain range of work to effectively support a 1000 player server is not worth it".

      • Jweb_Guru 4 years ago

        Video games can scale just fine with many cores, including MMOs and games like Minecraft. Veloren, for example, uses multiple cores everywhere for just about everything--most of the gameplay loop, physics, chunk meshing, world generation, realtime simulation, rendering (can be parallelized a lot further now that we've switched to wgpu), networking, and background tasks like snapshotting persistence--and it is certainly not hard to find places where parallelization yields real speedup (quite the opposite--every time we further parallelize physics it yields significant performance improvements!). There are many hard problems in video game performance, but "games just can't make good use of multiple cores" is not one of them.

        If anything, I often wish we could rely on many more cores being available than actually are! It is certainly far far easier to see real performance wins with multicore than by using multiple servers, which introduce very heavy coordination costs.

        • forrestthewoods 4 years ago

          > There are many hard problems in video game performance, but "games just can't make good use of multiple cores" is not one of them.

          That's not quite what I said.

          I'm happy you're using specs to work on an open source game. Specs and bevy and all the ECS work being done is super exciting and fun. I <3 Rust.

          In the meantime there are no major games that effectively scale to, let's say, 64 cores. I don't know of anything shipped that can saturate a 12-core/24-thread Ryzen. ECS alone will not get us to that level of scaling.

          Yes, it's trivial to throw audio, networking, and a few other subsystems onto separate threads. Modern games definitely leverage 4 cores, although several of those cores will be severely underutilized.

          Modern ECS designs are rapidly evolving and rapidly improving our ability to better leverage multiple cores. But we're not yet to a point where games can easily and efficiently saturate 10+ cores.

          Personally I'd love to see a game like Eve Online that can effectively simulate a universe with tens of thousands of players, either spread across the universe or all in one place during one giant battle.

          > It is certainly far far easier to see real performance wins with multicore than by using multiple servers, which introduce very heavy coordination costs.

          This is extremely true.

          • Jweb_Guru 4 years ago

            If you're just using ECS to parallelize disjoint subsystems then no, it won't get you there. But if you're using it (as Veloren is, and as I hope more people do) in a more deliberate way, to further parallelize within a system, you can indeed scale quite well to large numbers of cores. I've done some theoretical bottleneck calculations and we will still have tons of work to do with 64 cores available, if the game is written properly. We can already get decent utilization out of 32 threads at busy times, and the server was not really close to peak load in terms of player count (and we are very far from done optimizing): https://media.discordapp.net/attachments/539518074106413056/....

            That's just for our server. Our clients can make use of cores in even more ways, although they have less work to do and generally have fewer cores available, and you can see Veloren taking advantage of 16 client threads with similar utilization here: https://twitter.com/sahajsarup/status/1431837669391142916.

            The most important thing I want to note is that in both cases, you are not seeing tremendous imbalance between the cores most of the time. While there are definitely single-threaded bottlenecks in games, you have to be working pretty hard before they start bottlenecking the workload! Instead, we are just suffering from a combination of general inefficiency and lack of work to do.

            So no, I'm going to push back against this notion that multicore scaling for games is some sort of crazy intractable problem. It's not. Like any other kind of parallel scaling, it's trivial in some places, more challenging in others, and depends a lot on your workload (including you actually having enough work to saturate the cores in the first place!). But there's nothing special about games here.

            • forrestthewoods 4 years ago

              The original question was:

              > Is there a specific reason why Minecraft hasn’t been “adapted” from the original game to a robust design that could scale to larger worlds and player populations before your project ?

              The answer is because it's a lot of hard work.

              I am happy that a bunch of smart and talented people are working really hard to optimize Veloren. Good for you. I hope you help push the state of the art.

              • Jweb_Guru 4 years ago

                We have plenty of contributors who have never even programmed before, and certainly have no experience with parallel programming, and the bulk of our physics parallelization (easily the trickiest part to make concurrent) has been done by people who aren't professional programmers. And I don't think I can recall a single change (proposed or implemented) by any of these contributors that resulted in a significant reduction in parallelization opportunities, nor any that found the difficulty of contributing to have been increased by the fact that much of the game is parallelized--so this is not primarily a result of benevolent guardians gatekeeping unfriendly features, or anything like that. I can also count on one hand the number of bugs we've had due to race conditions caused by parallelizing things that were previously single-threaded. I think that modern tooling, languages, and libraries have gotten good enough that correct parallel programming is no longer nearly as hard as it's reputed to be.

                Nor is scaling well on multicore the "point" of the game or even an explicit goal (though handling lots of players is)--taking advantage of multiple CPU cores is just one of many ways to improve performance, which we try to tackle on multiple fronts (including increased utilization of the GPU, explicit SIMD, smarter algorithms, allocation reduction, structure compression for improved cache locality and network utilization, etc. etc.). We haven't made any special effort to parallelize at the expense of single-threaded optimizations, and generally only do parallelization within a system, move things to the background, etc. where it is revealed as a bottleneck. And we've mostly done so by utilizing existing Rust libraries like crossbeam, specs, rayon, and wgpu, not rolling our own stuff. So again, there is nothing at all special about Veloren's design or focus here that makes it more amenable to parallelization than any other game would be, despite it being in a genre that is supposedly difficult to make scale.

                And that's the thing I'm specifically trying to push back on--the idea that multicore scaling for games is only possible if you have some dedicated cabal of programming wizards who want to push the state of the art. We live in the age of libraries, and wizards are only needed deep in the guts of the implementations of those libraries (just as they always have been, and probably will be until the end of time). A game programmer does not need to understand how a lockfree work stealing queue is implemented (or even what it is!) in order to use "parallel for" to beat the snot out of a carefully optimized single-threaded version of the same task, and it's usually far easier to do the former than the latter.
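The "parallel for" claim above is easy to demonstrate even without rayon. A std-only sketch (the 4-thread count is arbitrary) of chunking a workload across scoped threads:

```rust
use std::thread;

// A dependency-free stand-in for "parallel for": split the slice into
// chunks and sum each chunk on its own scoped thread.
fn parallel_sum(data: &[u64]) -> u64 {
    let n_threads = 4; // arbitrary for the sketch; rayon sizes this for you
    let chunk = ((data.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect();
        // Join the partial sums back into one total.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    assert_eq!(parallel_sum(&data), 500_500);
}
```

With rayon the whole function collapses to `data.par_iter().sum()`, which is the comment's point: the cleverness lives in the library, not the game code.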

                I certainly understand why for a game like Minecraft, with a lot of legacy mechanics and mods that were never designed to be threadsafe, or a game engine like Unity, Unreal, or Roblox, that similarly have lots of plugins and customer code they would like to keep working, it would be very challenging to parallelize after the fact. And naturally, there are limits to what you can do on a single system, and your game design options become far more restricted once you're talking about 10k rather than 1k concurrent players. But for a brand new game without any legacy baggage, there's really no reason why it should scale poorly on multicore systems.

                • imtringued 4 years ago

                  I apologize for the late response but here is a challenge to you.

                  Try to parallelize a simplified form of Applied Energistics.

                  Applied energistics is a mod that lets you create an item transportation network. There are storage containers and machines with an inventory (for the sake of simplicity make them hold exactly 1 item and let the machines just turn A into B, B into C, C into A). The network interacts with inventories through interfaces. A storage interface makes items in that inventory accessible to every machine. Machines receive inputs through exporter interfaces and send outputs through importer interfaces.

                  It effectively is a database for items and that is exactly what makes it difficult to parallelize. The vast majority of games have 1:1 interactions between entities. In this system interactions can be as bad as n:m. That's also why it lags so badly with large networks. A hundred machines periodically scan hundreds of inventories.

                  • Jweb_Guru 4 years ago

                    So, firstly, I'm not entirely clear on what you're asking for here. If you just need to transfer ownership in parallel between different machines, the answer is to use a channel from one machine to the other. There are very efficient channels provided in crossbeam, and we commonly use them for tasks like this. If a channel between every machine would be too costly, a hub-spoke model can be used pretty easily, with routing performed between regions.
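A std-only sketch of that ownership hand-off (crossbeam's channels would be the faster production choice; std::sync::mpsc shows the same idea, and the item name is made up):

```rust
use std::sync::mpsc;
use std::thread;

// Ownership transfer over a channel: machine A sends, machine B receives.
// No shared lock guards the item; the channel itself is the hand-off.
fn hand_off(item: &'static str) -> &'static str {
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        tx.send(item).unwrap(); // machine A exports the item
    });
    let received = rx.recv().unwrap(); // machine B imports it
    producer.join().unwrap();
    received
}

fn main() {
    assert_eq!(hand_off("iron_ingot"), "iron_ingot");
}
```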

                    Similarly, fine-grained parallelism can be employed by storing each storage container behind a mutex or reader-writer lock, or even avoiding locking entirely and just using copy-on-write to update the item state when it is changed (we can either do this by executing all our state changes for each tick at once, in parallel, using Arc::make_mut, which is usually fastest, or if we need to do it asynchronously by using a crate like arcswap, which is slower). This is less efficient than a channel, but it has the advantage that the current inventory of a machine can be read without extracting the item (something you didn't specify as a requirement, but which I'm including for completeness).
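The Arc::make_mut copy-on-write update described here, sketched with a hypothetical Inventory type:

```rust
use std::sync::Arc;

#[derive(Clone)]
struct Inventory {
    items: Vec<&'static str>,
}

// Copy-on-write: make_mut clones the inventory only if another Arc still
// points at it, so existing readers keep a consistent snapshot.
fn cow_push(shared: &Arc<Inventory>, item: &'static str) -> Arc<Inventory> {
    let mut writer = Arc::clone(shared);
    Arc::make_mut(&mut writer).items.push(item);
    writer
}

fn main() {
    let reader = Arc::new(Inventory { items: vec!["iron"] });
    let writer = cow_push(&reader, "gold");
    assert_eq!(reader.items, vec!["iron"]);         // old snapshot intact
    assert_eq!(writer.items, vec!["iron", "gold"]); // updated copy
}
```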

                    Note from what I said previously that we don't actually need to continuously scan inventories for updates at all. The obvious optimization to perform is instead to have channel writes push changes directly to a change queue (this can be parallelized or sharded with some difficulty, but from experience a single channel usually suffices). The change queue can then be read or routed (in parallel or otherwise) to the appropriate storage devices to deliver its payload. If need be (since you haven't given a lot of details), we can also track which storage interfaces are being read by players, and each tick (in parallel) iterate through any players attached to the interface to notify them of new updates to that interface. There are other crates that automatically implement the incremental updates I mentioned, such as Frank McSherry's https://github.com/TimelyDataflow/timely-dataflow, for when you have something more complex to do; however, I have never had to reach for this because (which is why I wrote this post) it's actually uncommon to have something super complicated to parallelize!
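The change-queue optimization can be sketched the same way: writers record deltas and the tick drains only what changed (the chest names and delta encoding are invented for illustration):

```rust
use std::sync::mpsc;

// Drain every delta currently queued, without blocking. Untouched
// inventories cost nothing because they never appear in the queue.
fn drain_tick(rx: &mpsc::Receiver<(&'static str, i32)>) -> Vec<(&'static str, i32)> {
    rx.try_iter().collect()
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(("chest_1", 3)).unwrap();  // an exporter added 3 items
    tx.send(("chest_2", -1)).unwrap(); // an importer removed 1
    assert_eq!(drain_tick(&rx), vec![("chest_1", 3), ("chest_2", -1)]);
}
```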

                    From what I understand, this does not sound like it has nearly the complexity of a database :) The major thing that makes database performance harder to parallelize (though to be clear--they parallelize extremely well!) is not knowing what transactions are needed. In this case, though, we have perfect forward knowledge of what kinds of transactions there could be; the only things we would likely want to serialize would be attaching and detaching storage interfaces, and we can batch them up very easily on each tick due to the relatively "low" concurrent transaction count (keep in mind that some databases can process millions of transactions per second on a 16 core machine). And even if we did need to parallelize attaching and removing storage interfaces, it's not a strict requirement that we do that serially--crates like dashmap provide parallel reads, insertions, and deletions, and are basically an off-the-shelf, in-memory key-value database.

                    Finally, the kind of load you're talking about (hundreds of machines and hundreds of inventories) does not sound remotely sufficient to lag the game if it's optimized well, particularly since if we did do the naive scan strategy, it parallelizes easily (to see why: each scan tick, we first parallelize all imports into storage, then parallelize all scans from storage).

                    I suspect the problem here is not that the challenge you've provided is difficult to parallelize, or that it implements the functionality of a database or is M:N (by the way--something that is M:N in a hard to address way are entity-entity collisions!), but that the solution is designed in a very indirect way on top of existing Minecraft mechanics. As far as I can tell from what I've read about Redstone, it's completely possible to parallelize for most purposes to which it's put, since blocks can only update other blocks in very limited, local ways on each tick--it might even be amenable to GPU optimizations (in our own game, we would make sure that updates commuted on each tick to avoid needing to serialize merging operations on adjacent Redstone tiles). However, I could easily be misunderstanding both what you're asking for, and how Redstone works. If this is the case, please let me know!

                    Even more speculatively: I think a lot of game designers, when they think about parallelizing something, think about doing it in the background, or running things concurrently at different rates. While this can be done, this is primarily useful for performing a long-running background operation without blocking the game, not for improving the game's overall performance! In fact, running in the background in this way is often slower than just running single threaded, especially if it interacts with other world state. Many game developers therefore conclude that the task can't be profitably parallelized and move on. But the best (and simplest) solutions often involve keeping a sequential algorithm, but rewriting it so that each step of the algorithm can be performed massively in parallel, as in several of the possible solutions I outlined above. This is the bulk synchronous parallel model, which is the most commonly used parallelization strategy in HPC environments, and is also the primary parallelization strategy for GPU programming. It allows mixing fine-grained state updates with partitioning to maximize your utilization of all your cores, and because you're parallelizing a single workload and partitioning by write resources, it usually has far less contention with other threads than if you were trying to parallelize many workloads at once, each hitting the same stuff. This is the model we almost always turn to to parallelize things unless it's extremely obvious that we don't want them blocking a tick (like chunk generation, for example) and it reliably gives us great speedups without making the algorithm incomprehensible.
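The bulk synchronous parallel model described in that last paragraph, reduced to a toy tick: phase one computes every entity's next state in parallel against the previous frame, phase two commits serially (spawning a thread per entity is purely illustrative; real code would chunk entities):

```rust
use std::thread;

// One bulk-synchronous tick over 1-D positions: the parallel phase only
// reads last frame's state, so no locks are needed; the commit is serial.
fn tick(positions: &mut Vec<i64>, velocities: &[i64]) {
    let next: Vec<i64> = thread::scope(|s| {
        let handles: Vec<_> = positions
            .iter()
            .zip(velocities)
            .map(|(&p, &v)| s.spawn(move || p + v)) // phase 1: parallel compute
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    *positions = next; // phase 2: barrier + serial commit
}

fn main() {
    let mut pos = vec![0, 10, 20];
    tick(&mut pos, &[1, 2, 3]);
    assert_eq!(pos, vec![1, 12, 23]);
}
```

Because every parallel task reads only the previous frame, the updates commute and no task can observe a half-written state.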

                • Kiro 4 years ago

                  You're way too humble. Why are you downplaying your achievements with Veloren? It's a miracle made by geniuses and should be advertised as such.

                  • Jweb_Guru 4 years ago

                    Not sure if you're being sarcastic, but either way, I'm not saying there aren't some very impressive developers contributing to Veloren, or some really tricky and highly optimized code. But for the parallelization part, we're really not doing anything special. Pretty much all the cleverness there is in the libraries we use, and newer programmers can parallelize stuff about as easily as the more experienced devs.

                    Of course, if you wanted to say that libraries like crossbeam or rayon are "miracles made by geniuses" then I'd be more inclined to agree :) But there are similar facilities in other languages too, e.g. folly and OpenMP for C++.

      • Animats 4 years ago

        > There are no general solutions to any of this. Just a bunch of custom one-offs. SpatialOS is trying. My opinion on it is extremely, extremely negative. And I'll leave it at that.

        I'd like to hear more about that. nagle at animats.com, if you don't want to say much in public.

        This ties into the "metaverse" business. Lots of metaverse talk about big seamless worlds full of user-created content, but not much is running. So far, nothing with user created content really scales. There are lots of little shared worlds, like Breakroom/Sinespace, Facebook Horizon, IMVU, etc. There are big-space voxel worlds, such as Dual Universe and Roblox. There are general purpose region-oriented big "seamless" worlds such as Second Life, which have trouble at the seams and can't handle crowds in one place, the same problem these Minecraft improvers hit in round 1.

        SpatialOS was going to fix all this. Their system is basically objects which can be accessed remotely and which migrate to where the most accesses are coming from. It cost over $100 million to develop, they had to do the hosting, and the first four games all went broke due to the high cost of hosting. So, since they had too much venture capital, they set up an in-house game studio and created Scavengers, which is reportedly a so-so shooter.

        Roblox has plans to solve this, somehow, by sheer money power. When you have a few billion dollars to spare, that might work.

        If we're headed for the "metaverse", this has to be cracked. Somehow.

        (I've been writing a multi-threaded Rust/Vulkan client for Second Life / Open Simulator so I'm painfully aware of these problems.)

        • evgen 4 years ago

          Forgive the late follow-up, but my first startup in the valley did this back in the mid-90s (Electric Communities; several EC founders and key hires are mentioned in the Snow Crash acknowledgements) and it is amusing to see people continue to make the same mistakes and step on the same landmines every few years. EC created an entire language (E) to try to hide the complexity and remain secure, had a PERT chart where an actual color on the chart (purple I think) was used to signify that this task would be a good PhD thesis for someone, and really only managed to get a feel for just how large this elephant was before eventually pivoting in a direction that had the possibility to lead to profit in the pre-dotCom era. At best, I think they managed to stumble in the right direction for close to a decade by collecting a large group of really smart people and somehow keeping them happy and engaged for several years and through the ability of the president of the company to get Hollywood people to keep throwing money at them year after year. If you want an interesting read for the evening you should find a copy of Chip Morningstar's 'Cyberspace Protocols Requirements' paper; it holds up incredibly well for something written almost 25 years ago when most people were still on dial-up and we thought 64MB of RAM and a Pentium Pro was too much to require on the client side...

      • 8note 4 years ago

        Couldn't Minecraft chunks run fairly independently, with some message passing when something needs to move between chunks?

        • kevingadd 4 years ago

          Yes*, but players build complex machinery fairly often that can span chunk boundaries, so that requires synchronous coordination between the chunks for those machines to work right.

        • Kiro 4 years ago

          Imagine two players fighting on a seam between chunks.

      • imtringued 4 years ago

        I agree it's hard. It's only worth doing if your business depends on it.

    • Qi_ 4 years ago

      The development priorities of Minecraft are a little strange. They have had a lot of opportunities to increase extensibility and performance (see modding API, cubic chunks, etc) but were too slow about it or focused on other things. AFAIK the long-term vision is to move the game entirely to the C++ clients which are much more performant, but a lot of the core playerbase is still on Java for modding and platform compatibility.

      • sbierwagen 4 years ago

        That would be fine, but there's substantial UI, graphical and gameplay (!) differences between the C++ and Java clients. Crafting recipes are different, redstone has very different connection rules... If the plan is to gently ease Java users to Bedrock, then I would have thought the first step would be to actually port the game, rather than rewrite it with many small pointless changes.

        • rcxdude 4 years ago

          Yeah, it's explicitly not a goal to make the two versions the same (despite being incredibly similar), and bedrock is (un)affectionately known as 'bugrock' in the community due to its tendency to glitch out even more so than the java edition, often in extremely frustrating ways like randomly applying fatal fall damage. AFAICT there's not a huge performance win from bedrock anyway (especially compared to the various performance mods for java).

      • zimpenfish 4 years ago

        > The development priorities of Minecraft are a little strange [...] focused on other things

        Cynical to say that Microsoft are focused more on milking as much out of the brand and merchandising as possible rather than actually improving the core gameplay in meaningful ways.

        "Can we have more performance?" "I can do you a glow squid?" "What about shaders and whatnot?" "Axolotl?"

  • fire 4 years ago

    Do you have any articles or information available that explain why a spatial database is useful for and helps solve problems like this?

    This is super interesting and I'm excited that someone has finally made real progress on distributed game world software

  • 3np 4 years ago

    > To accomplish this, the world state needed to be stored in a central database and served to Minecraft servers as they popped in and out of existence

    Could you elaborate on this point? To me "need" seems too strong; it could also be addressed via replication or a distributed model.

    Centralized is absolutely the most straight-forward (and may be the most suitable) but I'd love to see some reasoning as to why it's the only appropriate approach.

  • tehbeard 4 years ago

    Curious how you plan to handle Redstone and hostile mobs.

    • Jaxkr 4 years ago

      Thanks for your question! Redstone is still WIP but will follow:

      1. An optimistic execution strategy (the servers can run their own redstone)

      2. A locking system allowing only one server to have redstone current in a given chunk at once.

      3. A rollback-based system to repair race conditions caused by (1)'s optimism.

      Hostile mobs are ALSO still WIP and:

      1. Are only synced if two players are near each other on two different servers.

      2. Have their aggression entirely managed by a WorldQL script which sends messages to the appropriate servers instructing them to call LivingEntity.setTarget on the correct player.

      Was hoping to nail those both down before I shared it here, so please forgive me! Thanks for your interest and stay tuned.
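A minimal sketch of what approach (2) above could look like: a central table that grants at most one server the right to simulate redstone in a given chunk. All names here (`RedstoneLocks`, `TryAcquire`) are hypothetical illustrations, not WorldQL's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// ChunkKey identifies a chunk by its chunk coordinates.
type ChunkKey struct{ X, Z int }

// RedstoneLocks maps each chunk to the single server currently
// allowed to simulate redstone in it.
type RedstoneLocks struct {
	mu     sync.Mutex
	owners map[ChunkKey]string // chunk -> server ID holding the lock
}

func NewRedstoneLocks() *RedstoneLocks {
	return &RedstoneLocks{owners: make(map[ChunkKey]string)}
}

// TryAcquire grants the lock if the chunk is free or already owned by
// the requesting server; otherwise the caller must defer to the owner.
func (r *RedstoneLocks) TryAcquire(c ChunkKey, server string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	owner, held := r.owners[c]
	if !held || owner == server {
		r.owners[c] = server
		return true
	}
	return false
}

// Release frees the chunk so another server can take over its redstone.
func (r *RedstoneLocks) Release(c ChunkKey, server string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.owners[c] == server {
		delete(r.owners, c)
	}
}

func main() {
	locks := NewRedstoneLocks()
	chunk := ChunkKey{X: 3, Z: -2}
	fmt.Println(locks.TryAcquire(chunk, "server-a")) // true: chunk was free
	fmt.Println(locks.TryAcquire(chunk, "server-b")) // false: held by server-a
	locks.Release(chunk, "server-a")
	fmt.Println(locks.TryAcquire(chunk, "server-b")) // true: lock was released
}
```

The interesting part is what happens under approach (1)'s optimism when two servers simulate the same circuit anyway; that is where the rollback in (3) would come in.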

      • L3viathan 4 years ago

        Re (2): Wouldn't that cause wonky redstone behavior on chunk borders?

        • the_gipsy 4 years ago

          I think it’s almost impossible not to break redstone.

          Perhaps two chunks/servers could communicate quickly and precisely enough for redstone.

          But imagine the corner of a chunk receiving updates from two different servers.

  • inyorgroove 4 years ago

    Does this solution handle PvP style play? Interesting project, keep up the great work!

    • Arch-TK 4 years ago

      So obviously I have not actually played PvP on a WorldQL server, but looking at the videos, even if there is some delay: playing at 20tps with some small overhead is much, much better than playing at 4tps with no overhead. The PvP experience should be much better than the alternative.

  • mentat 4 years ago

    Have you looked at something like fly.io to do physical placement near players globally while still supporting consistency?

tdeck 4 years ago

I'm always amazed by how so much Minecraft innovation is driven entirely by hobbyists. For years Minecraft was unplayable on many machines without the Optifine plugin. Even with a huge corporation behind it, Microsoft didn't build the capacity for large-scale servers, and the very large paid servers seem to use region based sharding with portals. This project is truly exciting.

  • c7DJTLrn 4 years ago

    I liked to hack around with Minecraft in the past but I'd always end up in despair at the state of the game from a technical POV. The Java version of the game is just horribly slow and unoptimised. The game doesn't even have a real modding API, it's all reverse engineered. Seeing chunks pop in really takes away from the experience. The C++ version ("Bedrock") performs much better but is not compatible with the Java edition, doesn't have the same player count, isn't moddable, and is heavily Xbox-inated and Microsoft-ened.

  • bobsmooth 4 years ago

    Like all good things it started small and has had decades of building atop old code.

  • Natfan 4 years ago

    Minecraft servers aren't really great at region-based sharding.

    Mojang have not yet added a "transfer packet"[0] which would allow for region-based switching.

    For the most part, one would need to re-join the server under a different proxy pool located in their desired region, usually accessible via a subdomain (us.example.com, eu.example.com)

    [0]: https://hypixel.net/threads/why-do-we-need-transfer-packets....

stjo 4 years ago

A couple of weeks ago I made https://playmcnow.com with the opposite idea - have many super lightweight servers for small groups of players. Instead of saying to friends “register to digital ocean, pay $5 and get a Minecraft box”, I can host thousands of worlds for them.

I made a custom minecraft proxy (similar to bungeecord but A LOT less resource intensive) that starts the real server when someone attempts to connect to it.

200 lines of go for the proxy and about as much for the http front end in django :).

  • ianhawes 4 years ago

    You should honestly look to monetize this. The MC server community is constantly plagued with hosts that will cram as many servers onto a box as possible, but they insist on the “always on” approach which IMO is pointless. Someone did an automated survey a year ago of all public Minecraft servers and found the number of servers exceeded the number of players.

  • Kinrany 4 years ago

    Wonder if it can be pushed further: have more servers than players, with different mods but with the ability to move items (and players) between servers as long as they're compatible.

    • imtringued 4 years ago

      There are mods that let you create a new dimension for the purpose of mining. If dimensions could be hosted on different servers/cores by default that would be good enough for a lot of use cases.

  • c-fe 4 years ago

    This looks very nice, would have loved this back when I played more MC with friends. I haven't tried it since I am not on my gaming computer, but how would this work in practice? Will I have access to the server to e.g. download the world I'm playing on as a backup?

    • stjo 4 years ago

      I haven’t gotten around to implementing such functionality. Although I wish I could monetise it, right now it’s just a hobby project and I don’t put much work into it.

  • tekknik 4 years ago

    How is this different than realms? I think realms is even cheaper in some cases.

    https://www.minecraft.net/en-us/realms-for-java

63 4 years ago

Impressive work! There's a lot of potential here, but probably within a pretty limited market. Outside of 2b2t and maybe Hypixel, is there really anyone who wants 1000 players on the same world? Is the small market the only reason why a solution like this hasn't been created before, or are there bigger complications that aren't listed or haven't come up yet, or maybe the technology wasn't available? Surely someone like Hausemaster who stands to gain substantial amounts of money from this has looked into it before, right?

  • 7373737373 4 years ago

    > is there really anyone who wants 1000 players on the same world

    Of course! One of the few games that is massively multiplayer all in the same persistent, connected world is Eve Online, and the dynamics that arise from its economy and faction warfare are fascinating!

  • xboxnolifes 4 years ago

    Tons and tons of servers would love having 1000 people in 1 world. The reason that hub servers generally have multiple servers of the exact same concept (factions, towny, pixelmon, etc) is in no small part because of lag limitations due to population. Hell, even using this to support the same (roughly standard) 300-person-limited server without having to disable the more lag-inducing features is a plus.

lwansbrough 4 years ago

Hmm, I still think spatially allocated servers offer the best scalability. You would just have to approach it a little differently.

Server boundaries could move based on where the population is via Delaunay triangulation (instead of fixed boundaries), and servers could share high-importance information with their immediate bordering neighbours. (This would be recognized as ghost data on the neighbouring servers.) You could even go further and have neighbours share ghost data with their other neighbours at a lower fidelity/frequency.

A virtual distributed actor system could potentially be used to address any potential downtime or resource waste created by unpopulated zones.

I've been playing around with some of these ideas but haven't been able to turn them into an actual implementation, so kudos to you for actually doing it!
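One way to sketch the population-following boundaries described above: give each server a movable seed point and assign every player to the nearest seed (the resulting cells are the Voronoi diagram, the dual of the Delaunay triangulation). Periodically moving each seed toward the centroid of its own players pulls boundaries toward crowds. The function names here are my own illustration, not anyone's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// Point is a 2D world position (Minecraft's x/z plane).
type Point struct{ X, Z float64 }

// nearestServer returns the index of the seed closest to p.
func nearestServer(seeds []Point, p Point) int {
	best, bestDist := 0, math.Inf(1)
	for i, s := range seeds {
		d := math.Hypot(s.X-p.X, s.Z-p.Z)
		if d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// recenter moves each seed to the centroid of the players it owns, so
// crowded regions pull server boundaries toward themselves.
func recenter(seeds []Point, players []Point) []Point {
	sums := make([]Point, len(seeds))
	counts := make([]int, len(seeds))
	for _, p := range players {
		i := nearestServer(seeds, p)
		sums[i].X += p.X
		sums[i].Z += p.Z
		counts[i]++
	}
	out := make([]Point, len(seeds))
	for i := range seeds {
		if counts[i] == 0 {
			out[i] = seeds[i] // empty cell: leave the seed where it is
			continue
		}
		out[i] = Point{sums[i].X / float64(counts[i]), sums[i].Z / float64(counts[i])}
	}
	return out
}

func main() {
	seeds := []Point{{-100, 0}, {100, 0}}
	players := []Point{{-90, 10}, {-80, -5}, {120, 0}}
	fmt.Println(nearestServer(seeds, players[0])) // 0: western seed
	seeds = recenter(seeds, players)
	fmt.Printf("%.1f\n", seeds[0].X) // -85.0: centroid of the two western players
}
```

The hard part this sketch ignores is handing off entities and ghost data as the boundaries actually move.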

  • Jaxkr 4 years ago

    Thanks for your comment. Spatially allocated servers were my original implementation! There’s a link to a video demo in the blog post. You are definitely correct that spatial sharding is easier and more robust.

    This new approach was primarily made because I thought it would be fun and cool :P

  • mbo 4 years ago

    This appears to be a spatially allocated system that an acquaintance of mine worked on: https://github.com/PureGero/MultiPaper

jonathanlydall 4 years ago

Interesting timing.

I play with a few friends on a modded 1.7.10 server and we’ve decided to restart with 1.16 due to the lag having become untenable.

We run it on an i7 machine at my house dedicated to it, so it’s not a hardware issue.

Like clockwork the server would freeze for about 5s every 30s. Using opus, our best guess is that it’s unloading chunks and the GC kicking in.

We’re going to run 1.16 now which I hope has some performance enhancements and so that we can use more modern Java 16 runtime with its nicer GC systems.

I’m also hoping that Minecraft has at least since moved chunk generation off the main thread since there was no good reason for world exploration slowing down things like it did.

I also built a JavaScript redstone simulator website and I’m curious how you will handle that and other block updates at server boundaries.

  • lostlogin 4 years ago

    I run it on a headless Ghost Canyon Nuc with a Xeon and plenty of ram. There are only 2 of us and if one of us types a message you know it’s about to come in as the game lags so hard.

    There is something not quite right. 1.7.10

LennyWhiteJr 4 years ago

I used to play on a pretty popular MMO-style server called CivCraft (https://www.reddit.com/r/Civcraft/) that unfortunately became a victim of its own success due to the large player count. It would often have 250+ players who would build these massive redstone machines and the server TPS would grind to a halt, making it unplayable.

After a few years it attempted a re-launch using an approach similar to the first one mentioned in the article. There were multiple worlds and as you approached the border of one world it would teleport you to an adjacent world. It was cool, but jarring, and suffered from its own complexity issues.

Anyway, this project is super cool. I would have loved to see something like it 10 years ago.

  • the_gipsy 4 years ago

    Ah yes I immediately thought of civcraft too.

    > a victim of its own success due to the large player count. It would often have 250+ players who would build these massive redstone machines

    I always thought the redstone restrictions resulted in too many bots.

  • ianhawes 4 years ago

    FWIW the sharding-at-worldborder approach is still the preferred way to shard in CivCraft-inspired servers. IIRC the CivCraft 3 sharding happened in the middle of a continent, so you could feasibly be chasing someone and both of you would be switched to a separate world mid-chase.

  • imtringued 4 years ago

    There is a dedicated game like that: https://play.eco/

z3t4 4 years ago

Being able to serve 1000+ players in a spatial "MMO" game is like the holy grail of netcode programming...

> Here's a demonstration showcasing 1000 cross-server players, this simulation is functionally identical to real player load. The server TPS never dips below 20 (perfect) and I'm running the whole thing on my laptop.

If it can run on one laptop, why does it need horizontal server scaling? :P

You don't really know where the bottle-necks are until you put 1000 actual players on the same "server".

  • paraph1n 4 years ago

    > Being able to serve 1000+ players in a spatial "MMO" game is like the holy grail of netcode programming...

    What do you mean by holy grail? Is it not something that's already accomplished by several games/MMOs?

    Unless "spatial MMO" means something specific here.

    • z3t4 4 years ago

      With spatial I mean that the players have a position in the game world, like x,y,z coordinates. Most games solve it by dividing the world into shards/instances/zones. The article mentions this and the issues you get. There is no game that I am aware of that is able to handle 1000+ connected players (1) nearby like in the "wall of copycats" in the article. Most games have a limit of around 100 players on a beefy server.

      It's also about the level of trust you are willing to give the clients; you can for example offload all logic to the clients and just have the server broadcast all messages. But then you will have a problem with cheaters that use modified clients.

      The same "holy grail" exist in database too, where you want low latency, high throughput/concurrency, and high availability. Where the solution is, just like in "MMO" games, to use "sharding".

      1) The Battle of B-R5RB in Eve Online had 2,670 players on the same shard according to Wikipedia. Their solution to the problem was/is to lower the game physics tick-rate.

      https://en.wikipedia.org/wiki/Battle_of_B-R5RB
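The interest-management trick most MMOs use to tame the N^2 broadcast problem described above can be sketched in a few lines: only forward a player's position update to players within view range. The 128-block radius and all names here are illustrative assumptions, not taken from WorldQL or any particular game.

```go
package main

import (
	"fmt"
	"math"
)

// Player is a named entity with a position on the x/z plane.
type Player struct {
	Name string
	X, Z float64
}

const viewRange = 128.0 // arbitrary stand-in for a real view distance

// recipients returns the names of players who should receive a
// position update about p; everyone else is skipped entirely,
// cutting the naive N^2 fan-out down to the local neighbourhood.
func recipients(p Player, all []Player) []string {
	var out []string
	for _, q := range all {
		if q.Name == p.Name {
			continue
		}
		if math.Hypot(p.X-q.X, p.Z-q.Z) <= viewRange {
			out = append(out, q.Name)
		}
	}
	return out
}

func main() {
	players := []Player{
		{"alice", 0, 0},
		{"bob", 50, 50},     // within 128 blocks of alice
		{"carol", 500, 500}, // far away: no update needed
	}
	fmt.Println(recipients(players[0], players)) // [bob]
}
```

The "wall of copycats" scenario is exactly the worst case for this filter: everyone is inside everyone else's view range, so the fan-out is N^2 again, which is why it makes a good stress test.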

  • Jaxkr 4 years ago

    Thanks for your comment!

    That demo is primarily meant to demonstrate the efficiency of the message broker and packet code as if there were 1000 players on different MC servers all forwarding their positions through WorldQL. I’ll make it more clear.

    • Jweb_Guru 4 years ago

      I would be a bit cautious about inferring server tick rates from the performance on one machine unless you are mostly planning to scale to lots of cores rather than lots of servers, since dealing with access latency can otherwise kill many promising attempts at parallelization even when the contention itself is no big deal. It's likely you are already aware of this and have designed the benchmark to compensate, of course :)

  • zachrip 4 years ago

    The laptop can run multiple copies of a single threaded program (and spread those on diff cores)

kaolabear 4 years ago

I had the impression that most of the server's CPU time is used for simulating the game's content, not the players themselves. Most of the available CPU time is usually spent on monster AI and monster physics.

I assume that WorldQL is also used to store monsters besides the blocks, otherwise my understanding is that players cannot interact with monsters from other servers. Is it possible to create redstone circuits on different servers that then interact with each other?

  • L3viathan 4 years ago

    Near the end of the article:

    > We're planning to introduce redstone, hostile mob, and weapon support ASAP.

mintplant 4 years ago

Just to clarify, WorldQL is hosted-only software, right? No on-premise? If I tie my game to WorldQL, do I have any options if WorldQL goes under (other than re-engineering everything)? Not to diminish the impressive work on display here - I've just been wondering this since I saw your r/gamedev post.

  • Jaxkr 4 years ago

    It’ll be self-hostable too. I need to be clear about that on the homepage. Thanks for your question.

jayd16 4 years ago

Interesting but this is kind of a big caveat.

>redstone, hostile mob, and weapon support

Aren't these the slow things? Is it just player position and map data that's synced through a central DB? I would assume the bulk of the work isn't done yet.

TheFreim 4 years ago

I've been running and doing minor plugin work on minecraft servers for years, getting close to a decade, and I've wanted this for so long. This would be a game changer for some modes I play.

denvaar 4 years ago

Love the font on this site. Looks like you can find it here: https://brailleinstitute.org/freefont

crazy_horse 4 years ago

How did MMOs like Asheron's Call and WoW handle this?

  • mysterydip 4 years ago

    For WoW, at a very high level, instancing and sharding. Players are spread across realms.

    The zones are so large that by nature players will be spread out leading to less interactions. Raids are instanced to just your party. In areas with natural congestion (such as auction houses), things could lag at times.

    While combat and movement is realtime, it's mostly waiting for timers, so the latency and bandwidth requirements are reduced compared to a first person shooter for example.

    • jonathanlydall 4 years ago

      With WoW they would obviously run each continent as an independent “instance”, but for Wrath, they clearly had multiple servers handling Northrend. There were places where mobs would not path over, which was where the two “instances” would overlap.

      WoW later introduced a kind of dynamic overlay of the same open world area from two realms where it was not very populated in the area so that players would be more likely to actually see other players.

      There was also a feature where you could party up with a friend on a different realm and you would transparently be playing on their server in the open world (with restrictions like not being able to trade with them).

      I haven’t played WoW since Mists, so I don’t know how it’s changed since, but WoW was definitely more complicated than just sharding and instancing.

      • mysterydip 4 years ago

        You're correct, I was referring to vanilla WoW which is all I played. It has definitely evolved since then, thanks for the info!

    • tialaramex 4 years ago

      Yeah, WoW's combat is effectively just a MUD, to the extent your position matters it's only very approximate even though the animations look much more precise. The Darkmoon rabbit fight made this extremely obvious, because the mob is tiny (it's just a rabbit, it's a reference to The Killer Rabbit of Caerbannog in Monty Python and the Holy Grail) but it's a public fight with potentially dozens of people trying to attack this rabbit at the same time with huge swords and casting spells and stuff, which looks completely ridiculous, but it's a joke so, it's OK that it's silly.

  • vblanco 4 years ago

    WoW used to run updates only twice a second, which gives a lot of time for logic. The WoW game logic is very simple, as the players can't really affect the world other than the enemies they are fighting at that moment. Also, the players are spread around the world and that limits the N^2 problem of players sending data to other players. Events like the Gates of Ahn'Qiraj made servers break due to clumping too many players in one location. The game also used to do a lot of logic on the clients; for example collision checks would only run on clients, allowing easy flight-hacks. The server really only broadcasts player positions to other players, and manages the combat logic, which is basically turn based.

    In private servers using code you can look at (MaNGOS) they manage to run 4000+ players on a single gaming-spec machine without much trouble. It's more of a design problem compared to something like Minecraft where the simulation is much more detailed. In some research projects for WoW private servers people have reached 20,000 simulated players on a given machine.

  • ScaleneTriangle 4 years ago

    If you look at the developer blogs for EVE Online, you'll have endless reading about massive-scale multiplayer servers. Possibly the most technically impressive massively multiplayer experience.

    • Thaxll 4 years ago

      EvE is one of the worst architectures for an MMO; there is almost nothing to do and yet they need to slow down the simulation loop when there are too many players. I mean, when your gameserver is based on Python, what else can you do?

      • rcxdude 4 years ago

        I think despite an unfortunate underlying architecture (which has especially bitten them in terms of development speed), they've achieved an impressive density. No other game I know of supports as many other players 'on screen' as EVE. The number of players you can have in a single fight is similar to the max number of online players on the larger WoW realms.

      • ScaleneTriangle 4 years ago

        Is there another game that supports direct interaction between everyone in a group of as many players as EVE supports?

      • bruce343434 4 years ago

        They also call servers not being able to keep up, and thus the gameloop slowing down, "tidi" (time dilation). Like yeah, that's just what happens when a gameserver can't keep up.

        • JCharante 4 years ago

          Well no, TIDI had to be added in. It's purposefully slowing down in-game ticks.

          Normally your requests would just be dropped if the server couldn't process everything within a 1s tick. Now all reload times etc. get slowed down by a huge factor depending on load.

          • rcxdude 4 years ago

            Exactly. It was not something that happened automatically (nor would it in almost any architecture). Instead the game would either cease to function properly, or players would get disconnected, or the server would crash, usually all of those in roughly that order. With TiDi at least things can happen, albeit at a frustratingly glacial pace.

ZoomerCretin 4 years ago

I’m curious to know how or if this solved any of the issues inherent with multithreading. Moving the threads to new processes can’t fix race conditions.

  • Jaxkr 4 years ago

    It utilizes a rollback-based system to mitigate races between Minecraft servers.

    I’m writing up some formal documentation on it now, I really wanted to have that done before this project was exposed to the scrutinizing Hacker News community, but I can’t control what people share! :)

    Stay tuned.

    • winrid 4 years ago

      Using rollback is surprisingly popular in games. For example, CSGO servers will roll back if a client fires a shot at a player ducking around a corner, and on the shooter's screen the shot lands but on the other player's screen they had dodged the bullet - the server will "rewind" and rule that the shot hit.
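The lag-compensation idea described above can be sketched as: the server keeps a short history of each player's positions, and when a shot arrives it rewinds the target to where they were at the shooter's timestamp before testing the hit. The timestamps, hit radius, and function names here are illustrative, not from any real engine.

```go
package main

import (
	"fmt"
	"math"
)

// Sample is one recorded position of a player at a server timestamp.
type Sample struct {
	TimeMs int
	X, Z   float64
}

// positionAt returns the most recent recorded position at or before t.
// It assumes history is non-empty and starts at or before t.
func positionAt(history []Sample, t int) Sample {
	best := history[0]
	for _, s := range history {
		if s.TimeMs <= t && s.TimeMs >= best.TimeMs {
			best = s
		}
	}
	return best
}

// hitScan rewinds the target to the shot's timestamp, then checks the
// shot against a 1-block hit radius around the rewound position.
func hitScan(target []Sample, shotTimeMs int, aimX, aimZ float64) bool {
	p := positionAt(target, shotTimeMs)
	return math.Hypot(p.X-aimX, p.Z-aimZ) <= 1.0
}

func main() {
	// The target ducked around a corner between t=100 and t=150.
	history := []Sample{{100, 10, 0}, {150, 20, 0}}
	// The shooter fired at t=110, aiming at where the target *was*.
	fmt.Println(hitScan(history, 110, 10, 0)) // true: rewound position is hit
	fmt.Println(hitScan(history, 160, 10, 0)) // false: target already moved
}
```

The downside is exactly the "dodged the bullet" experience: the victim's authoritative present is overruled by the shooter's compensated past.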

  • imtringued 4 years ago

    It didn't, but it's a very beautiful bandaid. So beautiful, in fact, that people might ignore that it is hiding a scar.

xupybd 4 years ago

How many hours have gone into this?

Was this all on top of a 9 to 5?

saulrh 4 years ago

Is there any chance of this supporting modded servers? Performance is universally the limiting factor on modded minecraft; if you're not careful you can start having performance problems only thirty hours into a single-player world, and I've never seen a modded server support more than three or four players without severe lag setting in quickly.

  • Jaxkr 4 years ago

    I’ve gotten this question a lot. At the moment it’s not a priority because it’s frankly too challenging right now.

    However, once the documentation is complete, someone could hypothetically implement it using https://github.com/magmafoundation/Magma-1.16.x

  • tialaramex 4 years ago

    I've never ended up playing modded multi-player. I mostly play skyblock type packs, and it doesn't feel like a good match to that experience.

    It would be cute if Azure credits could be paying for a VM that's just running Compact Machines I never visit the inside of anyway, while the places my character actually goes are running near me. For example, once you've built it, who visits the inside of that first Compact Machine in Claustrophobia that's just a self-powering battery full of uranium, water and thermoelectric generators? You'd drown in there anyway if you visit for more than a few seconds. But it still needs simulating.

chews 4 years ago

This is nice magical software. Well done.

longcommonname 4 years ago

There are very few servers that would really benefit from this. But perhaps the reason is that it's incredibly difficult to form large communities in minecraft.

I could see this really changing how users interact with each other.

  • lostlogin 4 years ago

    Perhaps, but the comments here suggest a lot of players would love to try it.

funshed 4 years ago

Getting two players on different servers to sync and see each other seems a headache. Instead, can you not dynamically change the border(s)? Think gerrymandering.

binkHN 4 years ago

FWIW: WorldQL is free for your first $50k in gross revenue.

bobsmooth 4 years ago

That's really freakin' neat. Something MC players have been dreaming about forever. I hope you're successful and every server starts using it!

andrewzah 4 years ago

Really excellent work. Is this going to be open-sourced at some point?

ThatPlayer 4 years ago

I wonder how well this would run on a Raspberry Pi 4 cluster.

blackcat333 4 years ago

Amazing work. I have been designing my own back end for an MMO in my free time off work. This will help me a lot. Much thanks for this.
