The Game Engine that would not have been made without Rust

#rust #programming #gamedev

Read about the gotchas of making a multiplayer engine in Rust and how it came to be

Software Engineer, CEO
Edited: October 1, 2025

In 2018, a man had a dream: make a multiplayer game with an emphasis on cooperation. Fresh out of engineering school, he decided to create his company and get started on the project alone.

A few years down the line, enough experience had been gathered to write about it in a formal blog post, and now here we are.

Let me tell you about the creation of a game engine from scratch, and how it might have never been born without Rust.

You can take this article just as a raw experience dump, but also as an answer to the eternal question "Why didn't you make it in Unity / Unreal / Godot / Something else /... ?"

I want to make it extremely clear that this is not a criticism of other game engines. I'm not saying "mine is better and yours is bad"; it's more along the lines of "the existing ones didn't really fit my needs, so I went my own way". Different people at different times will probably reach different outcomes, and that's perfectly fine.

First Design Considerations

It's well known that you can't just bolt multiplayer onto an existing game without major refactoring, major drawbacks, or, if you're unlucky, both. The famous game "Don't Starve Together" is not just an extension of "Don't Starve" for exactly this reason: the only way they could add multiplayer was to redo everything with multiplayer in mind. Because of that, I needed to consider what kind of network syncing strategy I would use for my game engine.

There are a lot of techniques available to synchronise multiplayer states, and none of them are perfect; some are even hybrids of multiple ideas. A lot of articles exist online describing them all, like this thorough article from Glenn Fiedler, but dozens of other excellent ones exist too.

As a side note, if you are interested in making your own, Gaffer on Games is definitely a website to bookmark. Sure, most of it you could figure out or adapt yourself, but why reinvent the wheel when other people have done it for you?

I knew I wanted reactive gameplay with two or more players, like an action game or a platformer, and not to be limited to turn-based games.

I also knew I wanted highly interactive gameplay, so distributed authority or parallel simulations were out. How would you sync player state when a mage casts a fireball at the exact moment an enemy stuns everyone?

Then the final nail in the coffin was the requirement to be playable over long distances. It's hard to find games to play together when you don't live on the same continent, often because games require low ping between players to be playable. COVID especially showed us that the demand is there: remember when Valheim hit 1M sales in one week in the middle of the pandemic?

So, in summary: real-time reactive gameplay, highly interactive mechanics that need a single authority, and playability over long distances.

You probably guessed it: that crossed off everything but rollback-based multiplayer.

I am not going to explain how rollback works; there is already a ton of content available on the matter, from this article by the GGPO team to this excellent high-level overview by Core-A Gaming on YouTube. A good chunk of this article is going to assume you know what it is; you've been warned.

I should mention that rollback is usually used in the context of peer-to-peer fighting games; here, the base is server-authoritative, with each client running rollback on its own.

Why Rust

Then came the choice of stack. At the time, GGPO (the de facto reference when talking about rollback netcode) was not open source yet, and it only targeted peer-to-peer games. No game engine shipped any kind of rollback implementation, either out of the box or as an open-source add-on, so I knew I had to build my own.

I looked a bit at C++ and C# to see if some kind of helpful library existed, but found nothing that really fit my needs. Since by that point I had been writing Rust personally for a few years, I decided to start my own, in Rust. By far one of the best decisions I've made.

The Ingredients of a Rollback-based Multiplayer Engine

A lot of pieces are necessary to make a rollback-based multiplayer engine, and a few of them would have been a lot more difficult without Rust. Strap in, because there are a lot of them.

Determinism

A core part of what makes rollback possible is cross-platform determinism. Not only do players need to see the exact same thing on their screens, but rollback also means a single client will replay the same inputs on the same frames multiple times in a row.

There were a few important considerations.

Numbers

This is a highly debated topic, and thankfully Gaffer on Games has an excellent breakdown of the problem. In summary, floating-point operations are deterministic most of the time, but not reliably so: some claim you can achieve determinism with a few compiler flags, others say it's impossible across different architectures, and so on.

To clarify, some have managed to build deterministic floating-point engines; rapier is a good example. However, unlike what some developer comments on the above article suggest, they don't just enable a few flags: they use non-std functions in their examples.

If you go up the dependency tree, you'll notice it comes down to libm functions like sin, whose implementations are made of basic operations only.

Basically, it's a mess, and while I don't doubt there is probably a way to make this work, I chose not to.

Rust has an excellent fixed-point arithmetic crate, and I preferred using this over risking having problems with floating points in the future. Is this less efficient? Probably, although I haven't checked.

Did I need to make a few lookup tables for cos, acos functions? Yes.

Are the computations not accurate? Also yes.

But it gave me several advantages: guaranteed bit-identical results on every platform, no compiler-flag or architecture worries, and full control over precision.

And the drawbacks of not being perfectly accurate? It's a game, if a projectile is moving at 5.56950 instead of 5.56945, realistically no one will care as long as everyone sees the same result on their screens.
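To make the idea concrete, here is a minimal, self-contained sketch of fixed-point arithmetic. This is not the engine's code (in practice you would reach for a crate like `fixed`): a 48.16 value stored in an `i64`, where every operation is plain integer math and therefore bit-identical on every platform.

```rust
/// A 48.16 fixed-point number: 48 integer bits, 16 fractional bits.
/// All arithmetic is integer-only, so results are deterministic
/// across platforms, unlike floating point.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct Fix(pub i64);

pub const FRAC_BITS: u32 = 16;

impl Fix {
    pub fn from_int(n: i64) -> Self {
        Fix(n << FRAC_BITS)
    }
    pub fn add(self, o: Fix) -> Fix {
        Fix(self.0.wrapping_add(o.0))
    }
    pub fn sub(self, o: Fix) -> Fix {
        Fix(self.0.wrapping_sub(o.0))
    }
    /// Widen to i128 for the intermediate product, then shift back.
    pub fn mul(self, o: Fix) -> Fix {
        Fix(((self.0 as i128 * o.0 as i128) >> FRAC_BITS) as i64)
    }
    pub fn to_int(self) -> i64 {
        self.0 >> FRAC_BITS
    }
}

fn main() {
    let half = Fix(1 << (FRAC_BITS - 1)); // 0.5 in 48.16
    let ten = Fix::from_int(10);
    assert_eq!(ten.mul(half).to_int(), 5);
}
```

Whatever the precision loss, the same inputs always yield the same bits, which is the property rollback actually needs.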

As a sidenote, I wish fixed point were more available, not only in Rust's std but in other languages in general. It's always a third-class citizen when it shouldn't be: fixed point has its uses just as much as floating point.

Random

This is one of the points where Rust shines. In most languages, the standard library's random is OS-based and definitely not deterministic.

If you went looking for another random generator in most ecosystems, you would likely struggle to find a good implementation that fits the determinism requirement. In Rust, rand_pcg is perfect: it exposes the same API as ThreadRng or OsRng, uses no floating point, its state is just two u128 words (so it's trivially cloneable and guaranteed deterministic cross-platform), and it's available with one line in Cargo.toml.

One of the most important aspects of game simulation, and the problem was solved with just one line. That's nothing but a win in my book.
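To show why a PCG generator is so well suited to rollback, here is a self-contained sketch of the PCG32 core algorithm (a sketch of what rand_pcg implements, not the crate's actual code; use the crate in real projects). The state is two plain integers, so cloning the struct forks the exact sequence, and replays are bit-exact.

```rust
/// Minimal PCG32: a 64-bit LCG step plus an output permutation.
/// No OS entropy, no floats; same seed means same sequence, always.
#[derive(Clone, Debug, PartialEq)]
pub struct Pcg32 {
    state: u64,
    inc: u64,
}

impl Pcg32 {
    /// Seeding procedure from the PCG reference implementation.
    pub fn new(seed: u64, stream: u64) -> Self {
        let mut rng = Pcg32 { state: 0, inc: (stream << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(seed);
        rng.next_u32();
        rng
    }

    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // Advance the internal LCG state.
        self.state = old
            .wrapping_mul(6364136223846793005)
            .wrapping_add(self.inc);
        // XSH-RR output permutation.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }
}

fn main() {
    let mut a = Pcg32::new(42, 54);
    let mut b = a.clone(); // cloning forks the exact same sequence
    assert_eq!(a.next_u32(), b.next_u32());
    assert_eq!(a.next_u32(), b.next_u32());
}
```

Because the generator is just integer state, it can be stored inside the World and rolled back along with everything else.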

Parallel Processing (or the Lack Thereof)

Unfortunately, within the simulation itself, I had to give up on threading. Some ECS libraries have a complex system of execution-dependency trees: if system 1 only needs component A and system 2 only needs component B, both can run in parallel. As a solo developer on a small project, that kind of hassle wasn't worth the risk, so I designed a much simpler Entity-Component library.

It looks like this roughly:

for (entity_id, entity, gravity) in world.entities.iter_single::<Gravity>() {
    if !gravity.enabled.get() {
        continue;
    }

    let collision_body = entity.get::<CollisionBody>();

    let stepping_on = collision_body
        .map(|c| c.stepping_on.get().is_some())
        .unwrap_or(false);

    // if this entity is not stepping on anything, apply gravity
    if !stepping_on {
        let mut tmp = entity.speed.get();
        tmp.y -= gravity.fallaccel.get() * world.steps.playtime_step / F_UPS;
        tmp.y = std::cmp::max(tmp.y, -gravity.max_fallspeed.get());
        entity.speed.set(tmp);
    }
}

Random access to entities from an EntityId is possible, and random access to any component from an Entity is possible. The trade-off is that all components need interior mutability, but since nothing is parallel, we can wrap most of them in Cell and RefCell. At this point, some (if not most) will think of me as a filthy, unperformant swine, but I can assure you it's not that bad. Even after writing tens of thousands of LoC with this architecture, having a get and a set here and there is better than not having random access to components from anywhere in your code.

Back when I started, the state of ECS libraries was very poor. Nowadays it's much better, yet still nothing quite fits my needs, although hecs comes pretty close.

More details on this in another article, another day.

Network Messaging

This covers the category of "how the hell do I send data over the internet?". It's not really a breakthrough for anyone who has delved into it, but I thought I would include it just in case.

Reliable Ordered UDP

Again, I can do nothing but recommend this excellent article about reliable ordered messaging.

The base logic is simple: instead of a continuous stream like TCP, we'll build around messages, each with a beginning and an end. Because the typical MTU is around 1500 bytes, if you want to send messages larger than that (which you often will), you need some system to split a message into ~1500-byte fragments and tell the remote how to reassemble them.

Let's define a Fragment struct.

pub struct Fragment<T: AsRef<[u8]>> {
    pub sequence_id: u32,
    pub fragment_id: u8,
    pub fragment_total: u8,
    pub fragment_meta: FragmentMeta,
    pub data: T
}

The idea is simple: sequence_id identifies a message, and fragment_id tells which part of that message this fragment carries.

No matter the order in which we receive the fragments, we store them until the number received for a given sequence_id equals fragment_total. Then we reassemble the data from each fragment to recover the original message.
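That reassembly logic can be sketched in a few lines (hypothetical names, not the engine's actual code): buffer fragments per sequence_id, and hand back the full message once every part has arrived.

```rust
use std::collections::HashMap;

/// Buffers fragments until a message is complete.
#[derive(Default)]
pub struct Reassembler {
    /// sequence_id -> (fragment_id -> payload)
    pending: HashMap<u32, HashMap<u8, Vec<u8>>>,
}

impl Reassembler {
    /// Stores one fragment; returns the reassembled message once every
    /// one of `fragment_total` fragments for `sequence_id` has arrived.
    pub fn insert(
        &mut self,
        sequence_id: u32,
        fragment_id: u8,
        fragment_total: u8,
        data: Vec<u8>,
    ) -> Option<Vec<u8>> {
        let parts = self.pending.entry(sequence_id).or_default();
        parts.insert(fragment_id, data); // duplicates simply overwrite
        if parts.len() == fragment_total as usize {
            // Complete: concatenate payloads in fragment_id order.
            let mut parts = self.pending.remove(&sequence_id).unwrap();
            let mut message = Vec::new();
            for id in 0..fragment_total {
                message.extend(parts.remove(&id).unwrap());
            }
            Some(message)
        } else {
            None
        }
    }
}

fn main() {
    let mut r = Reassembler::default();
    // Fragments arrive out of order; the message appears once complete.
    assert_eq!(r.insert(7, 1, 3, b"world".to_vec()), None);
    assert_eq!(r.insert(7, 2, 3, b"!".to_vec()), None);
    assert_eq!(
        r.insert(7, 0, 3, b"hello ".to_vec()),
        Some(b"hello world!".to_vec())
    );
}
```

A real implementation would also evict stale sequence_ids after a timeout so a lost Forgettable fragment can't leak memory forever.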

Note that fragment_total is a u8, which means a message can be at most ~380 kB (255 fragments × ~1500 bytes). More than that and we would need some kind of super-fragmenting or streaming. Or you could just fall back to TCP for those messages specifically.

I don't recommend bumping fragment_id and fragment_total to u16. While the maximum message size would then approach 100 MB, realistically sending thousands of UDP packets at once would probably cause congestion somewhere, even on today's internet.

struct FragmentMeta {
    kind: FragmentKind,
    // other meta here
}

#[repr(u16)]
enum FragmentKind {
    Forgettable = 0,
    Key = 1,
}

FragmentMeta would be a simple struct with "meta" info; the most obvious field is the kind of packet it is:

Forgettable means that if we don't receive every fragment because of packet loss, just drop the message and forget it. Key means that we have to send Ack messages to ensure the full message arrives at some point.

But a fragment is not the only type of message we can send over UDP, so we'll need message types:

pub enum Packet<P: AsRef<[u8]>> {
    Fragment(Fragment<P>),
    Ack(u32, u8),
    Syn,
    SynAck,
    Heartbeat,
    End,
}

Ack would be to say "Yes, I have received the data of (sequence_id, fragment_id), you don't have to send it in the future". Syn and SynAck are akin to TCP to initiate and confirm connection.

Heartbeat is a necessary evil in UDP, because if you send no messages in a certain amount of time, some routers will assume the connection is over and block incoming messages. A simple message every few seconds ensures that the connection stays alive.
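That rule is simple enough to sketch (the 3-second interval here is an illustrative value, not necessarily the engine's): track how long it has been since the last outgoing packet, and emit a Heartbeat when that gap grows too large.

```rust
use std::time::Duration;

/// If nothing was sent for this long, send a Heartbeat so routers
/// and NATs keep the UDP mapping alive. Illustrative value.
const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(3);

/// Decide whether a Heartbeat packet is due, given the time elapsed
/// since the last packet of any kind was sent on this connection.
pub fn needs_heartbeat(since_last_send: Duration) -> bool {
    since_last_send >= HEARTBEAT_INTERVAL
}

fn main() {
    // Recently sent something: no heartbeat needed yet.
    assert!(!needs_heartbeat(Duration::from_millis(500)));
    // Quiet for 4 seconds: time to ping the remote.
    assert!(needs_heartbeat(Duration::from_secs(4)));
}
```

In the real send loop you would reset the timer on every outgoing packet, so heartbeats only fill genuine silences.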

Each packet would then be encoded, and for instance a Fragment would look like this:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              CRC32            |          Message Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Sequence ID          | Frag ID |FragTot|  Frag Meta  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                                                               :
:                            Payload                            :
:                                                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A CRC32 of bytes 4 through the end is included, and in Rust there are plenty of crates that do just that. Don't forget to verify it on receipt!
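For reference, here is what that checksum computes; a bitwise sketch of the standard IEEE CRC32 (in a real project you would use a crate such as crc32fast rather than this loop):

```rust
/// Bitwise CRC32 (IEEE polynomial, as used by zip and png).
/// In the packet above this would run over bytes 4..end.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all-ones when the low bit is set, else zero.
            let mask = (crc & 1).wrapping_neg();
            // 0xEDB88320 is the bit-reversed IEEE polynomial.
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // Standard check value for the IEEE CRC32.
    assert_eq!(crc32(b"123456789"), 0xCBF4_3926);
}
```

On receipt, recompute the CRC over the same byte range and drop the packet if it doesn't match the value in the header.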

This is of course a very high-level overview, and the reality is a bit more complex, but anyone with a little experience can build a library like this in a few days; Rust is that easy to work with.

One detail I'm leaving here because I don't know where else to put it: On Windows, UDP sockets have an annoying behavior where they can return ConnectionReset, which never happens on other platforms.

This is because of Windows' Virtual UDP circuit, which you can disable to make it more consistent with other platforms. I'm putting the code here because I don't remember seeing it anywhere else online, and since I always dev on Linux the only time I came to notice it was when it was in the hands of testers. That was a fun debug session.

use std::net::UdpSocket;
use std::os::windows::io::AsRawSocket;
use windows::Win32::Networking::WinSock::{WSAIoctl, SOCKET, SIO_UDP_CONNRESET, WSAGetLastError};
use windows::Win32::Foundation::BOOL;

/// Disables the Virtual UDP Circuit on Windows
///
/// Returns a `WSA_ERROR` if it failed.
pub fn disable_virtual_udp_circuit(udp_socket: &UdpSocket) -> Result<(), i32> {
    let socket = udp_socket.as_raw_socket();
    let mut bytes_returned = 0;
    let enable = BOOL::from(false);
    unsafe {
        let r = WSAIoctl(
            SOCKET(socket as usize),
            SIO_UDP_CONNRESET,
            Some(&enable as *const _ as *const core::ffi::c_void),
            std::mem::size_of_val(&enable) as u32,
            None,
            0,
            &mut bytes_returned,
            None,
            None
        );
        if r == -1 {
            let e = WSAGetLastError();
            Err(e.0)
        } else {
            Ok(())
        }
    }
}

The Godsent: serde and bincode

So now we roughly have an API that accepts bytes, transfers them over the network, and yields those exact same bytes on another computer. What can we do with that?

It will seem obvious to anyone already using Rust, but for those who don't, the combination of serde and bincode is terrific.

Let's define two structs:

#[derive(Debug, Serialize, Deserialize)]
pub enum ClientMessage {
    InGameMessage(ClientInGameMessage),
    Quit { err: Option<String> },
    Handshake { name: String }
}
#[derive(Debug, Serialize, Deserialize)]
pub enum ServerMessage {
    InGameMessage(ServerInGameMessage),
    Quit { reason: Result<(), String> },
    HandshakeResponse(Result<(), HandshakeError>),
}

Serializing and Deserializing becomes trivial:

// with bincode's "serde" feature enabled
let result: Vec<u8> = bincode::serde::encode_to_vec(
    &input_client_message,
    bincode::config::standard()
).unwrap();

// assume result is passed over the network...
let (decoded_client_message, _): (ClientMessage, usize) = bincode::serde::decode_from_slice(
    &result,
    bincode::config::standard()
).unwrap();

This may not seem like much, but something you would either write manually or spend days figuring out the exact templates for in C++ takes only a few minutes to code in Rust.

Rollback Logic

Let's explain what the rollback logic looks like in Rust:

struct Game {
    /// world displayed on screen
    played_world: World,
    /// the last valid world where we should rollback to if
    /// the inputs we predicted are not equal to the one we
    /// received from the server
    last_valid_world: World
}

struct World {
    tick: u32,
    // rest of the data here, not important in our case ...
}

Let's imagine two Worlds: one is the played world shown on screen, the other is the last valid world we know to be true. Let's assign them two colors, red and blue. This will improve clarity later, but note that blue is not always the played world and red is not always the last valid world; you'll see why shortly.

Now consider this scenario: we have received from the server the inputs for ticks 31 to 33, while we are playing tick 34. We have to begin the rollback process:

First, swap Played World and Last Valid World, so that our "Played" is back in time:

std::mem::swap(&mut self.last_valid_world, &mut self.played_world);

Then, use the true inputs we just received from the server:

/// advance the Played state with true inputs
fn true_advance(&mut self, true_inputs: &[WorldInput]) {
    for true_input in true_inputs {
        self.played_world.update(true_input);
    }
}

Once we don't have any more true inputs, dump the Played world into Last Valid, with a clone_from:

self.last_valid_world.clone_from(&self.played_world);

Lastly, restore the played position to where we were when we started the rollback process:

/// advance the Played state with predicted inputs
fn advance_to_played(&mut self, played_tick: WorldTick, predicted_inputs: &HashMap<WorldTick, WorldInput>) {
    while self.played_world.tick < played_tick {
        let predicted_input = predicted_inputs.get(&self.played_world.tick).unwrap();
        self.played_world.update(predicted_input);
    }
}

If you compare with the first diagram, you'll notice it's roughly the same state, except blue and red have swapped variables. With this method, every rollback swaps the two variables in memory. Swapping instead of cloning from last valid to played, then from played back into the new last valid, reduces the deep clones needed from two to one per rollback, which can be a game-changer on very populated worlds.
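The four steps above can be wired together in a self-contained toy (a hypothetical one-field World whose update just adds an i64 input; not the engine's types):

```rust
use std::collections::HashMap;

/// Toy world for demonstrating the swap/replay sequence.
#[derive(Clone, Debug, PartialEq)]
pub struct World {
    pub tick: u32,
    pub x: i64,
}

impl World {
    fn update(&mut self, input: i64) {
        self.x += input;
        self.tick += 1;
    }
}

/// One full rollback: swap, apply true inputs, snapshot, replay predictions.
/// Returns (last_valid_world, played_world).
pub fn rollback_demo() -> (World, World) {
    // Last valid world at tick 3; played world predicted up to tick 6
    // using input 1 for ticks 3, 4 and 5.
    let mut last_valid = World { tick: 3, x: 10 };
    let mut played = last_valid.clone();
    let mut predicted: HashMap<u32, i64> = HashMap::new();
    for t in 3..6 {
        predicted.insert(t, 1);
        played.update(1);
    }

    // The server reveals the true inputs for ticks 3 and 4 were 2, not 1.
    let true_inputs = [2i64, 2];

    // 1. Swap: "played" goes back in time to the last valid state.
    std::mem::swap(&mut last_valid, &mut played);
    // 2. Advance "played" with the true inputs.
    for input in true_inputs {
        played.update(input);
    }
    // 3. Snapshot: this corrected state is the new last valid world.
    last_valid.clone_from(&played);
    // 4. Replay the remaining predicted inputs to get back to tick 6.
    while played.tick < 6 {
        let input = predicted[&played.tick];
        played.update(input);
    }
    (last_valid, played)
}

fn main() {
    let (last_valid, played) = rollback_demo();
    // New last valid: tick 5, with both corrected inputs applied.
    assert_eq!(last_valid, World { tick: 5, x: 14 });
    // Played world is back on screen at tick 6, corrected.
    assert_eq!(played, World { tick: 6, x: 15 });
}
```

Note that only one deep clone happens (step 3), which is the whole point of the swap trick.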

Performance

Because of the diagram above, the world's update function will run not just once per frame, but up to max_rollback + 1 times per frame. Because of this, update time has to be as low as possible.

Other rollback engines mostly don't have much to simulate. Since most of them power peer-to-peer fighting games, two characters and a few projectiles are not very expensive to compute, even on slow engines.

But our case is different: with more than two players and complex interactions with the world, the update is not as cheap. The answer here is pretty simple: Rust is fast to begin with. I had nothing specific to do to optimize; the performance was there from the start.

Cloning

As per the diagram above, you might have noticed that we use clone_from to deeply clone a struct.

Rust is not the only language able to deeply clone a struct like that, but it's definitely the one that makes it easiest.

In C++ you have copy assignment, but it's not guaranteed to do the right thing for every class or struct you use, especially external ones, and when it doesn't, you have to overload the operator manually for every single struct involved. Still, it's possible.

In Rust we have #[derive(Clone)], but it does not derive clone_from. That's mostly fine, because the compiler optimises a lot under the hood, except when the struct owns allocations. The derivative crate lets you auto-generate the clone_from methods along with clone; use it for all structs that hold any kind of allocation in their child members. For simple structs that only contain numbers, booleans, etc., keeping Clone without implementing clone_from is perfectly fine.

That's a small difference in code, but it can have quite an impact: implementing clone_from on World and most of its children reduced clone time by roughly 30% to 40%, depending on how populated the world was.
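To illustrate why (a minimal sketch, not the engine's World): a hand-written clone_from lets Vec reuse the destination's existing buffer instead of allocating a fresh one on every clone.

```rust
/// A struct with a heap allocation where clone_from pays off.
#[derive(Debug, PartialEq)]
pub struct World {
    pub tick: u32,
    pub entities: Vec<u64>,
}

impl Clone for World {
    fn clone(&self) -> Self {
        World { tick: self.tick, entities: self.entities.clone() }
    }
    fn clone_from(&mut self, source: &Self) {
        self.tick = source.tick;
        // Vec::clone_from truncates/extends in place, reusing the
        // destination's allocation instead of making a new one.
        self.entities.clone_from(&source.entities);
    }
}

fn main() {
    let source = World { tick: 1, entities: vec![1, 2, 3] };
    let mut dest = World { tick: 0, entities: Vec::with_capacity(1024) };
    dest.clone_from(&source);
    assert_eq!(dest, source);
    // The destination's original capacity survived: no reallocation.
    assert!(dest.entities.capacity() >= 1024);
}
```

When the destination is the last frame's world, its buffers are almost always already the right size, so a rollback clone becomes mostly memcpy instead of allocator traffic.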

Conclusion

Individually, these points might not seem like much, but together they form a compelling argument: a solo dev like myself, with limited time, would probably not have been able to build such an engine so quickly if it weren't for Rust.

Visual Comparison

I'd like to show what this huge wall of text results in, and especially how the engine handles ping.

All of these comparisons use simulated ping via netns, with heavy packet reordering (~20%) and 0.1% packet loss.

20ms RTT

At 20 ms, rollback is barely noticeable: only 1~2 frames of rollback are required if you have one frame of input delay.

70ms RTT

At 70 ms, you can definitely start to notice some rollback, about 3 full frames plus a little input delay. That said, while noticeable, nobody has complained about lag at this range of ping.

140ms RTT

At 140 ms, you can see heavy rollback, this time about 6~7 frames. It's probably close to the limit before your players start noticing and maybe complaining about the delay, but it's still playable.

Not a bad result, given that ~150 ms is about the RTT between the west coast of North America and the west coast of Europe. A real-time multiplayer game being playable at that distance is still a small win.

The result

The project that started it all in 2018 is on the back burner but still planned for release. A smaller one was created this year with exactly this engine and the same multiplayer logic, and you can check it out when it releases later in 2025.

Until then, there are probably a few more articles that are going to be released here. I hope you enjoyed the read now, and I hope you'll enjoy the read then.
