Try three: Python + Rust

Published on May 11, 2024, 9:55 p.m.
Last changed on Nov. 7, 2024, 10:55 a.m.

A new language

Let's start with a few fundamental statements:

I am not a Rust expert, just a guy who wanted to try it out
The experiment has made enormous progress thanks to Rust
At the time of writing, I am very unlikely to switch from Rust to anything else

Let's delve somewhat into the details of the above, and more.

Why Rust?

Naturally, having spent some time on the C version, I did ponder C++ for a while. In a nutshell, I needed something that:

Was compiled and, as a consequence, (usually **cough**) faster than Python
Was at least partially compatible with my "desire" for object-oriented design
Was well-documented
Enjoyed an active, friendly community and, as a consequence, a rich ecosystem (modules/packages and all that)

It wasn't long before I read good things about Rust. I remember spending a very long train journey reading the official documentation and thinking: "Ok, this is good - and if all their so-called crates are documented with such quality, this is going to be a joy to use".

Now, even though the core Rust documentation is absolutely great, especially when starting out, I found that many crates' documentation lacks examples, explanations, etc. Maybe it's just me, though :-)

Anyway, Rust it was.

The porting and Rust expertise

I am a Rust "noobie". No illusion about it:

90% of the code is struct, impl and fn
Traits are hardly used because the objects are quite specialised (maybe that's the wrong reason!)

And the main culprit: I see Rust as a means to an end, a (very) useful tool to make the experiment 1) do what I want, 2) at an acceptable speed. If working on the Rust implementation makes me better at the language, great (and I'm sure it will). But unless I am confronted with new technical problems that force me to learn and implement new things, I pretty much stick to "cookie-cutter" solutions.

Structure

To summarise: the experiment's Rust code is very basic: crates, sub-crates, structs and functions. That's it. Nothing fancy, as dictated by the objects' very linear generation and lifetimes.

Something nice?

A bit of context: for obvious reasons, the experiment tries to be as realistic as performance allows when simulating stars, planets, moons, etc. One consequence is that the most granular entities in the code (at this time) are quite low-level:

Elements: from Hydrogen to Oxygen to Cerium and Dysprosium.
Compounds: from Oxygen and Methane to Olivine and Argentite

Both entity types manifest as structs (naturally)… and there are quite a lot of them:

Elements: 98
Compounds: 308 and counting

The natural consequence is huge files: for example, the compound.rs file contains around 13,000 lines. And I hate long files (who doesn't?): they are hard to read, a pain to maintain, error-prone and generally frustrating to update. But in my case, they were necessary.

So I became a heretic and did this: generated Rust code using Python.

I know. But this was the situation, and these were the requirements for speeding up and easing the work:

Hundreds of entities to manage
Each with fields to be filled in (duh) but sometimes also to be updated
Ability to keep track of which entities were done, which were still pending updates/additions, etc.
Ability to search, filter, etc.
Ability to track usage statistics (which Elements are used, how often, etc.) across the Compound population
And more

All the above is not a joy to do in the context of long files. Really.
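The usage-statistics requirement mostly boils down to counting. Here is a minimal Python sketch, assuming compounds are available as hypothetical records mapping element symbols to atom counts. In the actual setup the data lives in MySQL behind the Django admin, where a queryset aggregation would likely do the same job.

```python
from collections import Counter

# Hypothetical compound records: name -> {element symbol: atoms per formula unit}.
# Illustrative only; the real data sits in a database.
COMPOUNDS = {
    "Water": {"H": 2, "O": 1},
    "Methane": {"C": 1, "H": 4},
    "Olivine (forsterite)": {"Mg": 2, "Si": 1, "O": 4},
    "Argentite": {"Ag": 2, "S": 1},
}


def element_usage(compounds):
    """Return two Counters keyed by element symbol.

    used_in: in how many compounds each element appears.
    atoms:   total atoms of each element summed over all formulas.
    """
    used_in = Counter()
    atoms = Counter()
    for composition in compounds.values():
        used_in.update(composition.keys())  # each element counted once per compound
        atoms.update(composition)           # mapping update adds the atom counts
    return used_in, atoms


def unused_elements(all_symbols, compounds):
    """Elements no compound references yet, handy for tracking pending work."""
    used_in, _ = element_usage(compounds)
    return sorted(s for s in all_symbols if used_in[s] == 0)


if __name__ == "__main__":
    used_in, atoms = element_usage(COMPOUNDS)
    for symbol, n in used_in.most_common():
        print(f"{symbol}: {n} compound(s), {atoms[symbol]} atom(s) in total")
    print("Unused:", unused_elements(["H", "He", "C", "O", "Mg", "Si", "S", "Ag"], COMPOUNDS))
```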

How it works

Quite well, technically, but also philosophically: the compound/element Rust entities are pretty much a lazily initialised database (once_cell::sync::Lazy) used at runtime to determine the aggregated nature/state of their parent "containers". It turns out this aligns perfectly with storing the raw data in an actual database, in this case good old MySQL.
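The generation idea can be sketched in Python as follows. This is a minimal illustration, not the experiment's actual generator: the `Element` struct, its fields and the `ELEMENTS` static are hypothetical, and the rows stand in for data read from the database. The script emits a Rust module in which a `once_cell::sync::Lazy` static builds a lookup table on first access.

```python
# Hypothetical rows, standing in for what the database would return.
ELEMENT_ROWS = [
    {"symbol": "O", "name": "Oxygen", "atomic_number": 8, "atomic_mass": 15.999},
    {"symbol": "H", "name": "Hydrogen", "atomic_number": 1, "atomic_mass": 1.008},
]

HEADER = '''// Generated by a Python script: do not edit by hand.
use once_cell::sync::Lazy;
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct Element {
    pub name: &'static str,
    pub atomic_number: u32,
    pub atomic_mass: f64,
}

pub static ELEMENTS: Lazy<HashMap<&'static str, Element>> = Lazy::new(|| {
    let mut m = HashMap::new();
'''

FOOTER = '''    m
});
'''


def rust_str(value):
    """Render a Python str as a Rust string literal, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rust_f64(value):
    """repr() of a finite float always contains '.' or 'e', so Rust reads it as a float.

    inf/nan are not handled in this sketch.
    """
    return repr(float(value))


def render_entry(row):
    """One m.insert(...) line for a single element."""
    return (
        f"    m.insert({rust_str(row['symbol'])}, Element {{ "
        f"name: {rust_str(row['name'])}, "
        f"atomic_number: {int(row['atomic_number'])}, "
        f"atomic_mass: {rust_f64(row['atomic_mass'])} }});\n"
    )


def render_module(rows):
    """Full Rust source; rows are sorted so regenerated files diff cleanly."""
    ordered = sorted(rows, key=lambda r: r["atomic_number"])
    return HEADER + "".join(render_entry(r) for r in ordered) + FOOTER


if __name__ == "__main__":
    print(render_module(ELEMENT_ROWS))
```

Writing the returned string to a file (say, a hypothetical elements.rs) right before compiling is one way to slot this into a "re-generate the Rust files" step. Sorting the rows keeps the output deterministic, so diffs between regenerations stay small even when the underlying table is edited in arbitrary order.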

Now, working in the Django admin UI has its drawbacks. But it is good enough, and extensible enough, to handle my needs. In fact, working as described below has improved my quality of life (QoL) by an order of magnitude:

Update, create and delete entries as needed in the Django admin
Perform regular dumpdata backups
On demand, re-generate the Rust files from the data
Compile Rust

The "compile Rust" step is wrapped inside a Django management command that handles compilation on both Windows and Linux; the same command also uninstalls and reinstalls the resulting wheel.
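For illustration, here is a minimal sketch of such a compile-and-reinstall sequence, written as plain functions that a management command's `handle()` method could call. It is not the author's command: it assumes the wheel is built with maturin (`maturin build --release --out DIR`), and the package name and directory layout are hypothetical. Keeping each step as an argument list, and calling pip through the running interpreter, avoids most Windows/Linux differences.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical names: the real crate/package names are not given in the post.
PACKAGE = "galaxy_core"
CRATE_DIR = Path("galaxy_core")


def build_command(out_dir: Path) -> list[str]:
    """Build a release wheel into out_dir (assumes maturin is on PATH)."""
    return ["maturin", "build", "--release", "--out", str(out_dir)]


def reinstall_commands(package: str, wheel: Path) -> list[list[str]]:
    """Uninstall the previous build, then install the fresh wheel.

    `sys.executable -m pip` targets the current interpreter on both
    Windows and Linux, avoiding platform-specific pip paths.
    """
    pip = [sys.executable, "-m", "pip"]
    return [pip + ["uninstall", "-y", package], pip + ["install", str(wheel)]]


def newest_wheel(out_dir: Path, package: str) -> Path:
    """Pick the most recently built wheel for the package."""
    wheels = sorted(out_dir.glob(f"{package}-*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel for {package!r} in {out_dir}")
    return wheels[-1]


def rebuild_and_reinstall(crate_dir: Path = CRATE_DIR, package: str = PACKAGE) -> None:
    out_dir = (crate_dir / "dist").resolve()  # absolute, so cwd does not affect it
    # check=True raises CalledProcessError and stops at the first failing step.
    subprocess.run(build_command(out_dir), cwd=crate_dir, check=True)
    wheel = newest_wheel(out_dir, package)
    for cmd in reinstall_commands(package, wheel):
        subprocess.run(cmd, check=True)
```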

A typical view in the Django admin:

In this view, it is extremely easy to see what is done and what remains to be done, and to sort/filter by pretty much anything you like, given a few reasonably cleverly annotated querysets (Case, When and the like).

This worked so well for me that I applied it to several other entity types, which means step 3 above (re-generating the Rust files) actually consists of several Rust file generation sub-steps.

On multithreading

None is implemented at the moment. I will probably have to get round to it, but let's wait and see.

Edit June 8th 2024: some multithreading is gradually being integrated.

Versus pure Python

Remember the table from this post? Well, thanks to Rust, the numbers are now similar to the following:

Cube density   Number of cubes   Number of stars   Total time   Time per star (approx.)
10             100               1,000             5 ms         0.005 ms
20             100               2,000             10 ms        0.005 ms
10             1,000             10,000            50 ms        0.005 ms

…which represents a speedup of around 20×, even though the math has meanwhile become massively more complex and more flexible.
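The per-star column is simply total time divided by number of stars, and each row is also consistent with cube density being the number of stars per cube. A small Python check recomputing both from the table's own values (no new measurements involved):

```python
# Values copied from the table above:
# (cube density, number of cubes, number of stars, total time in ms)
TABLE = [
    (10, 100, 1_000, 5.0),
    (20, 100, 2_000, 10.0),
    (10, 1_000, 10_000, 50.0),
]


def per_star_ms(total_ms: float, stars: int) -> float:
    """Average generation time per star, in milliseconds."""
    return total_ms / stars


for density, cubes, stars, total_ms in TABLE:
    # In every row, stars == density * cubes (density read as stars per cube).
    assert density * cubes == stars
    print(f"{stars:>6} stars: {per_star_ms(total_ms, stars):.3f} ms per star")
```

A constant time per star across a tenfold change in star count suggests that, in this range, generation time grows roughly linearly with the number of stars.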

Now, this may feel disappointing compared with the speedup obtained with C; however, the C code at the time was doing only about 15% of the work the Rust code does. Hence the final performance gains are not really comparable, and the same holds true for the comparison with the initial pure Python version.
Functionally, the Rust version is now far ahead of anything that came before it. From now on, only this version will be covered, with perhaps a few nostalgia(?)-driven exceptions here and there :-)

In closing

And that concludes the "which language" series! As mentioned on the about page, further posts will come at a chaotic pace. As you may have guessed by now, the data explorer is in a constant state of change, both from a functional perspective and from a data generation one. In any case, yours truly hopes you will enjoy at least a little walk across the proposed procedural galaxies!