Rust compiler: code optimizations

Posted Feb 04, 2024  ‐  5 min read

Intro

In this article I want to explore how to make the Rust compiler optimize generated code. Most of these techniques are well known; if you are a seasoned Rust developer, you most likely know about them.

At the end of the article I present two less known optimizations. They are available only in the nightly version of the compiler, and they have a significant impact. Scroll down if you already know everything about optimizations in stable Rust and have not heard about the ones in nightly.

There are many ways a compiler can optimize code. Here, we will focus on optimizations related to the size of the compiled binary. Linus Torvalds noticed that optimizing for size may be significant for GUI applications.

Stable Rust

Let's create a new project and check its binary size for development and release profiles.

cargo new prs-opt-tst
[..]
cd prs-opt-tst/
cargo build
[..]
ls target/debug/prs-opt-tst
╭───┬──────────────────────────┬──────┬─────────┬────────────────╮
│ # │           name           │ type │  size   │    modified    │
├───┼──────────────────────────┼──────┼─────────┼────────────────┤
│ 0 │ target/debug/prs-opt-tst │ file │ 3.6 MiB │ 29 seconds ago │
╰───┴──────────────────────────┴──────┴─────────┴────────────────╯

cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬────────────────╮
│ # │            name            │ type │   size    │    modified    │
├───┼────────────────────────────┼──────┼───────────┼────────────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 410.9 KiB │ 24 seconds ago │
╰───┴────────────────────────────┴──────┴───────────┴────────────────╯

That is a huge difference.

Let's also have a look at compilation time. I'll use the hyperfine tool to measure it.

hyperfine --warmup 5 'cargo clean; cargo build'
Benchmark 1: cargo clean; cargo build
  Time (mean ± σ):     239.5 ms ±   3.0 ms    [User: 168.7 ms, System: 86.1 ms]
  Range (min … max):   235.8 ms … 246.4 ms    12 runs

hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
  Time (mean ± σ):     202.3 ms ±   1.7 ms    [User: 143.1 ms, System: 63.7 ms]
  Range (min … max):   200.1 ms … 205.7 ms    14 runs

That is counterintuitive: building the hello world program in the release profile, with all default optimizations, takes less time than building it in the development profile. But this is a super simple application; release builds of larger apps usually take more time.

What are the differences between the dev and release profiles? Let's have a look at the Profiles section of the Cargo Reference.

Here is the default dev profile.

[profile.dev]
opt-level = 0
debug = true
split-debuginfo = '...'  # Platform-specific.
strip = "none"
debug-assertions = true
overflow-checks = true
lto = false
panic = 'unwind'
incremental = true
codegen-units = 256
rpath = false

Here is the default release profile.

[profile.release]
opt-level = 3
debug = false
split-debuginfo = '...'  # Platform-specific.
strip = "none"
debug-assertions = false
overflow-checks = false
lto = false
panic = 'unwind'
incremental = false
codegen-units = 16
rpath = false

The two key differences are the "opt-level" and "debug" parameters. Of course, debugging symbols have the biggest impact on binary size.
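Two of the other differences, "debug-assertions" and "overflow-checks", change runtime behavior rather than size. As a minimal illustration of what "overflow-checks" toggles, here is a small Rust sketch using the explicit standard library methods that behave the same way in both profiles:

```rust
fn main() {
    // With overflow-checks = true (dev default), `i32::MAX + 1` panics at runtime.
    // With overflow-checks = false (release default), it silently wraps around.
    // The explicit methods below give each behavior regardless of the profile:
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN); // release-style wrapping
    assert_eq!(i32::MAX.checked_add(1), None);      // dev-style overflow detection
    println!("overflow semantics checked");
}
```

Relying on the profile default for overflow behavior is fragile; the explicit methods document intent and behave identically in dev and release builds.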

Let's enable debugging in the release profile.

[package]
name = "prs-opt-tst"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

[profile.release]
debug = true

cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬─────────┬──────────╮
│ # │            name            │ type │  size   │ modified │
├───┼────────────────────────────┼──────┼─────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 3.6 MiB │ now      │
╰───┴────────────────────────────┴──────┴─────────┴──────────╯

Now we can see the impact of the "debug" setting on binary size.

Strip

Let's try to save more space by stripping symbols from the binary. We will use the "strip" option to achieve this.

[profile.release]
strip = true

This setting removes debugging symbols from the binary file. You can read more about the strip option in "The Cargo Book".
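Besides `true` and `false`, the strip option also accepts string values (per the Cargo documentation), so you can strip only the debug info while keeping the symbol table:

```toml
[profile.release]
# strip = "none"       # keep everything (the default)
# strip = "debuginfo"  # remove debug info, keep the symbol table
strip = "symbols"      # remove both; equivalent to `strip = true`
```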

cargo build --release
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │            name            │ type │   size    │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 350.4 KiB │ now      │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯

We saved about 60 KiB. The difference will be more significant in larger apps.

How did it affect compilation time?

hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
  Time (mean ± σ):     210.8 ms ±  12.7 ms    [User: 144.0 ms, System: 71.3 ms]
  Range (min … max):   198.6 ms … 237.7 ms    14 runs

Not much. But of course it's a small program.

LTO

Let's have a look at another optimization: link-time optimization.

[profile.release]
lto = true
strip = true

This setting tries to perform optimizations across all crates. You can read more about the lto option in "The Cargo Book".
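`lto = true` requests "fat" LTO across all crates. Cargo also accepts other values; "thin" LTO is often a good compromise when fat LTO makes builds too slow:

```toml
[profile.release]
# lto = false   # the default: "thin local LTO" within each crate
# lto = "thin"  # ThinLTO: most of the benefit, much faster to compile
lto = "fat"     # same as `lto = true`: full cross-crate LTO
```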

How did it affect the size?

cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │            name            │ type │   size    │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 318.4 KiB │ now      │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯

Another 32 KiB. Not bad, taking into account that LTO works best for large applications with many dependencies.

How about compilation time?

hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
  Time (mean ± σ):      3.089 s ±  0.032 s    [User: 3.000 s, System: 0.094 s]
  Range (min … max):    3.052 s …  3.158 s    10 runs

LTO affects compilation time in a significant way.

opt-level

Let's check the last one. With "opt-level" we can choose a general compiler optimization level.

[profile.release]
lto = true
opt-level = "s"
strip = true

This setting makes the compiler optimize the generated code for size. You can read more about the opt-level option in "The Cargo Book".
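For reference, these are the opt-level values Cargo accepts; "z" optimizes for size even more aggressively than "s", additionally turning off loop vectorization:

```toml
[profile.release]
# opt-level = 0    # no optimizations (dev default)
# opt-level = 1    # basic optimizations
# opt-level = 2    # some optimizations
# opt-level = 3    # all optimizations (release default)
# opt-level = "s"  # optimize for binary size
opt-level = "z"    # optimize for size, also turn off loop vectorization
```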

cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │            name            │ type │   size    │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 310.4 KiB │ now      │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯

Ok, another 8 KiB.

How about compilation time?

hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
  Time (mean ± σ):      2.641 s ±  0.061 s    [User: 2.546 s, System: 0.099 s]
  Range (min … max):    2.594 s …  2.762 s    10 runs

The code compiles faster with this optimization.

Nightly Rust

Let's have a look at optimizations that are available in the nightly version of the compiler. To use them, we need to have the nightly toolchain installed. You can install it with the following commands.

rustup toolchain install nightly
rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu

build-std

Now let's try to use the new build-std feature from nightly. What does it do?

"Enable Cargo to compile the standard library itself as part of a crate graph compilation."
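If you don't want to repeat the flags on every invocation, the same unstable options can be sketched in `.cargo/config.toml` (assuming a nightly toolchain; the syntax may change between nightlies):

```toml
# .cargo/config.toml -- nightly-only, unstable syntax
[unstable]
build-std = ["std"]

[build]
target = "x86_64-unknown-linux-gnu"
```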

cargo +nightly build --release -Z build-std --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/prs-opt-tst
╭───┬─────────────────────────────────────────────────────┬──────┬───────────┬──────────╮
│ # │                        name                         │ type │   size    │ modified │
├───┼─────────────────────────────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/prs-opt-tst │ file │ 274.4 KiB │ now      │
╰───┴─────────────────────────────────────────────────────┴──────┴───────────┴──────────╯

We saved another 36 KiB.

But how does it affect the compilation time?

hyperfine --warmup 5 'cargo clean; cargo +nightly build --release -Z build-std --target x86_64-unknown-linux-gnu'
Benchmark 1: cargo clean; cargo +nightly build --release -Z build-std --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     19.676 s ±  0.184 s    [User: 29.346 s, System: 2.490 s]
  Range (min … max):   19.436 s … 20.025 s    10 runs

It affects compilation time in a significant way. But this is the price of compiling an optimized version of the standard library.

build-std-features

Now let's try the build-std-features option. What does it do?

"Configure features enabled for the standard library itself when building the standard library."
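Passing -Z build-std-features with no value builds the standard library with an empty feature list instead of its defaults; the large size drop below likely comes from code such as unwinding and backtrace support being compiled out. A sketch of the equivalent (unstable) config, which may change between nightlies:

```toml
# .cargo/config.toml -- nightly-only, unstable syntax
[unstable]
build-std = ["std"]
build-std-features = []  # build std with no default features
```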

cargo +nightly build --release -Z build-std -Z build-std-features --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/prs-opt-tst
╭───┬─────────────────────────────────────────────────────┬──────┬──────────┬──────────╮
│ # │                        name                         │ type │   size   │ modified │
├───┼─────────────────────────────────────────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/prs-opt-tst │ file │ 66.4 KiB │ now      │
╰───┴─────────────────────────────────────────────────────┴──────┴──────────┴──────────╯

That's another 208 KiB :)

Let's check compilation times.

hyperfine --warmup 5 'cargo clean; cargo +nightly build --release -Z build-std -Z build-std-features --target x86_64-unknown-linux-gnu'
Benchmark 1: cargo clean; cargo +nightly build --release -Z build-std -Z build-std-features --target x86_64-unknown-linux-gnu
  Time (mean ± σ):     15.485 s ±  0.180 s    [User: 21.076 s, System: 1.925 s]
  Range (min … max):   15.225 s … 15.835 s    10 runs

It builds a little faster.

Real world application

Let's try to check how these optimizations work on a regular application. Let's use Nushell.

I removed all the default optimizations that Nushell uses. Here are the results for the unoptimized version.

cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬────────────────╮
│ # │       name        │ type │   size   │    modified    │
├───┼───────────────────┼──────┼──────────┼────────────────┤
│ 0 │ target/release/nu │ file │ 44.0 MiB │ 13 seconds ago │
╰───┴───────────────────┴──────┴──────────┴────────────────╯

Let's check how it looks after enabling our aggressive optimizations.

cargo +nightly build --release -Z build-std -Z build-std-features --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/nu
╭───┬────────────────────────────────────────────┬──────┬──────────┬────────────────╮
│ # │                    name                    │ type │   size   │    modified    │
├───┼────────────────────────────────────────────┼──────┼──────────┼────────────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/nu │ file │ 21.8 MiB │ 17 seconds ago │
╰───┴────────────────────────────────────────────┴──────┴──────────┴────────────────╯

Conclusions

The results are good, without going down to the assembly level. The experiments show that it's possible to go:

  • from 3.6 MiB (dev profile) to 66.4 KiB in a hello world application,
  • from 450.9 MiB (dev profile) to 21.8 MiB in a regular application.

All of this is possible using Rust compiler optimization settings.

Are these optimizations worth it? As always, it depends; you need to measure how they impact your application. Rust is one of the greenest programming languages: it may use a lot less CPU and memory than other languages. We pay for this with slower compilation times, but in return the compiled program runs much faster.

So, is it worth it? If you can lower your server costs because of better code optimization, then of course it's worth it. If you care about lower electricity consumption on your users' machines, then of course it's worth it.

Remember that all these optimizations that slow down the compilation process apply only to release profile binaries. They don't slow you down in your day-to-day work.

Tools usage disclosure. I'm not a native English speaker. I use the following tools to improve my wording: Google Translate, Grammarly, and Hemingway Editor. These tools use artificial intelligence. I use them only to improve my writing, not to generate content.