Rust compiler: code optimizations (upd.)
Posted Feb 04, 2024. Updated Dec 16, 2024 ‐ 9 min read
Intro
In this article, I want to explore how to optimize the code generated by the Rust compiler. Most of these techniques are well known; if you are a seasoned Rust developer, you most likely know them already.
I'm also presenting two lesser-known optimizations that are available in the nightly version of the compiler and have a significant impact. If you already know everything about the optimizations in stable Rust but have not heard about the nightly ones, scroll down.
There are many ways in which a compiler can optimize code. Here, we will focus on optimizations that reduce the size of the compiled binary. Linus Torvalds has noted that optimizing for size may be significant for GUI applications.
Hello world application
Stable Rust
Let's create a new project and check its binary size for development and release profiles.
cargo new prs-opt-tst
[..]
cd prs-opt-tst/
cargo build
[..]
ls target/debug/prs-opt-tst
╭───┬──────────────────────────┬──────┬─────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼──────────────────────────┼──────┼─────────┼──────────┤
│ 0 │ target/debug/prs-opt-tst │ file │ 3.7 MiB │ now │
╰───┴──────────────────────────┴──────┴─────────┴──────────╯
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 409.8 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
That is a huge difference: the release binary is about nine times smaller.
Let's also have a look at compilation time. I'll use the hyperfine tool to measure it.
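The sizes in these tables come from Nushell's ls command. If you want the same figure from Rust code, for example in a CI check, here is a minimal sketch that reads a file's length with std::fs::metadata and formats it with 1024-based units. The human_size helper is hypothetical, and its output format is an assumption that only approximates Nushell's size column.

```rust
use std::env;
use std::fs;
use std::path::PathBuf;

/// Formats a byte count with binary (1024-based) units, roughly like the
/// size column of Nushell's `ls` (the exact format is an assumption).
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["KiB", "MiB", "GiB", "TiB"];
    if bytes < 1024 {
        return format!("{bytes} B");
    }
    let mut value = bytes as f64 / 1024.0;
    let mut unit = 0;
    // Move to the next larger unit while the value is still 1024 or more.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{value:.1} {}", UNITS[unit])
}

fn main() {
    // Binary to inspect, e.g. target/release/prs-opt-tst;
    // falls back to the currently running executable.
    let path: PathBuf = match env::args().nth(1) {
        Some(arg) => PathBuf::from(arg),
        None => env::current_exe().expect("cannot locate the current executable"),
    };
    let len = fs::metadata(&path).expect("cannot read file metadata").len();
    println!("{}: {}", path.display(), human_size(len));
}
```

For example, 419,635 bytes is formatted as 409.8 KiB, the size of the default release build above.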
hyperfine --warmup 5 'cargo clean; cargo build'
Benchmark 1: cargo clean; cargo build
Time (mean ± σ): 300.0 ms ± 3.2 ms [User: 187.5 ms, System: 133.6 ms]
Range (min … max): 292.8 ms … 304.2 ms 10 runs
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 256.4 ms ± 2.3 ms [User: 147.6 ms, System: 116.5 ms]
Range (min … max): 252.4 ms … 260.1 ms 11 runs
That is counterintuitive: building the Hello World program in the release profile, with all default optimizations, takes less time than building it in the development profile. But this is a very simple application; for larger apps, release builds usually take longer.
What are the differences between the dev and release profiles? Let's have a look at the Profiles section of the Cargo Reference.
Here is the default dev profile.
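hyperfine performs the --warmup runs first without measuring them, then reports the mean and standard deviation (σ) over the measured runs. As a rough, std-only sketch of the same idea, the code below times a closure several times with std::time::Instant and summarizes the results. It assumes the sample standard deviation (dividing by n - 1); hyperfine's exact statistics and its use of a shell for `cargo clean; cargo build` are not reproduced, and the --cargo flag is a hypothetical switch of this sketch.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Runs `f` `warmup` times without measuring, then `runs` times with timing.
fn time_runs<F: FnMut()>(warmup: usize, runs: usize, mut f: F) -> Vec<Duration> {
    for _ in 0..warmup {
        f();
    }
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect()
}

/// Mean and sample standard deviation (n - 1) of the run times, in milliseconds.
fn mean_and_stddev_ms(runs: &[Duration]) -> (f64, f64) {
    let ms: Vec<f64> = runs.iter().map(|d| d.as_secs_f64() * 1000.0).collect();
    let n = ms.len() as f64;
    let mean = ms.iter().sum::<f64>() / n;
    let variance = if ms.len() > 1 {
        ms.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0)
    } else {
        0.0
    };
    (mean, variance.sqrt())
}

/// Rough equivalent of `cargo clean; cargo build` (exit statuses are ignored, like `;`).
fn clean_and_build() {
    let _ = Command::new("cargo").arg("clean").status();
    let _ = Command::new("cargo").arg("build").status();
}

fn main() {
    // With `--cargo`, run from a project directory to time the dev build;
    // otherwise a small in-process workload is timed instead.
    let runs = if std::env::args().any(|a| a == "--cargo") {
        time_runs(5, 10, clean_and_build)
    } else {
        time_runs(5, 10, || {
            let v: Vec<u64> = (0..100_000).collect();
            std::hint::black_box(v.iter().sum::<u64>());
        })
    };
    let (mean, sd) = mean_and_stddev_ms(&runs);
    println!("Time (mean ± σ): {mean:.1} ms ± {sd:.1} ms ({} runs)", runs.len());
}
```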
[profile.dev]
opt-level = 0
debug = true
split-debuginfo = '...' # Platform-specific.
strip = "none"
debug-assertions = true
overflow-checks = true
lto = false
panic = 'unwind'
incremental = true
codegen-units = 256
rpath = false
Here is the default release profile.
[profile.release]
opt-level = 3
debug = false
split-debuginfo = '...' # Platform-specific.
strip = "none"
debug-assertions = false
overflow-checks = false
lto = false
panic = 'unwind'
incremental = false
codegen-units = 16
rpath = false
The two key differences are the "opt-level" and "debug" settings. Of these, debugging symbols have the biggest impact on binary size.
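The debug-assertions and overflow-checks rows also differ between the profiles, but they change runtime behavior rather than size. Here is a small illustrative sketch (the function names are hypothetical): debug_assert! is only evaluated when debug-assertions is on, and plain integer arithmetic panics on overflow only when overflow-checks is on, otherwise it wraps. The explicit checked_add and wrapping_add methods behave the same in both profiles.

```rust
/// Percentage of `part` in `whole` (`whole` must be non-zero).
/// The `debug_assert!` runs only with debug-assertions = true (the dev default).
fn percent(part: u32, whole: u32) -> u32 {
    debug_assert!(part <= whole, "part ({part}) exceeds whole ({whole})");
    // Widen to u64 so the multiplication itself cannot overflow.
    (u64::from(part) * 100 / u64::from(whole)) as u32
}

/// Sum that reports overflow explicitly, regardless of the overflow-checks setting.
fn checked_total(values: &[u8]) -> Option<u8> {
    values.iter().try_fold(0u8, |acc, &v| acc.checked_add(v))
}

/// Sum that wraps on overflow, which is what `+` does when overflow-checks = false.
fn wrapping_total(values: &[u8]) -> u8 {
    values.iter().fold(0u8, |acc, &v| acc.wrapping_add(v))
}

fn main() {
    println!("debug assertions enabled: {}", cfg!(debug_assertions));
    println!("25 of 200 = {}%", percent(25, 200));
    println!("checked: {:?}", checked_total(&[200, 100]));
    println!("wrapping: {}", wrapping_total(&[200, 100]));
}
```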
Let's enable debugging on the release profile.
[package]
name = "prs-opt-tst"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
[profile.release]
debug = true
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬─────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼─────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 3.7 MiB │ now │
╰───┴────────────────────────────┴──────┴─────────┴──────────╯
Now we can see the impact of the "debug" setting on binary size: the release binary is 3.7 MiB, the same as the dev build.
Strip
Let's save more space by stripping symbols from the binary. We will use the "strip" option for this.
[profile.release]
strip = true
This setting removes symbols (and debug information) from the binary. You can read more about the strip option in "The Cargo Book".
cargo build --release
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 342.5 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
We saved 67 KiB. The difference should be more significant in larger apps.
How does it affect compilation time?
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 261.7 ms ± 15.1 ms [User: 155.0 ms, System: 114.9 ms]
Range (min … max): 251.6 ms … 304.7 ms 11 runs
Not much, but of course, this is a small program.
LTO
Let's have a look at another optimization: link-time optimization (LTO).
[profile.release]
lto = true
strip = true
This setting attempts to perform optimizations across all crates in the dependency graph at link time. You can read more about the lto option in "The Cargo Book".
How does it affect size?
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 314.4 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
That's 28 KiB less. Not bad, considering that LTO works best for large applications with many dependencies.
How about compilation time?
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 2.864 s ± 0.036 s [User: 2.719 s, System: 0.154 s]
Range (min … max): 2.832 s … 2.940 s 10 runs
LTO increases compilation time significantly: from about 0.26 s to about 2.86 s, roughly eleven times longer.
opt-level
Let's check another setting. With "opt-level", we can choose the general optimization level of the compiler.
[profile.release]
lto = true
opt-level = "s"
strip = true
The "s" value tells the compiler to optimize the generated code for size. You can read more about the opt-level option in "The Cargo Book".
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 306.4 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
OK, another 8 KiB.
How about compilation time?
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 2.818 s ± 0.188 s [User: 2.652 s, System: 0.175 s]
Range (min … max): 2.653 s … 3.169 s 10 runs
The build is slightly faster with this setting, although the difference is within the measurement noise.
[profile.release]
lto = true
opt-level = "z"
strip = true
The "z" option may give better results than "s" in some situations; here, it makes no difference. Note that "z" also turns off loop vectorization, so the produced code may be slower.
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 306.4 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 2.675 s ± 0.022 s [User: 2.529 s, System: 0.155 s]
Range (min … max): 2.639 s … 2.707 s 10 runs
The build is slightly faster than with "s".
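To make the loop vectorization remark concrete: a simple element-wise loop like the one below is the kind of code that LLVM's loop vectorizer typically turns into SIMD instructions at opt-level = 3, while opt-level = "z" turns loop vectorization off. This is a sketch; whether a particular loop is vectorized depends on the target and compiler version, and you can check by inspecting the assembly, for example with cargo rustc --release -- --emit asm. The scale_add function is a hypothetical example.

```rust
/// Computes `out[i] = a[i] * k + b[i]` for every element.
/// Branch-free loops over slices like this are typical auto-vectorization candidates.
fn scale_add(a: &[f32], b: &[f32], k: f32, out: &mut [f32]) {
    // `zip` stops at the shortest slice, so reject mismatched lengths explicitly.
    assert_eq!(a.len(), out.len());
    assert_eq!(b.len(), out.len());
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x * k + y;
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0];
    let b = [0.5f32; 4];
    let mut out = [0.0f32; 4];
    scale_add(&a, &b, 2.0, &mut out);
    println!("{out:?}");
}
```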
panic
Let's check another setting. With "panic", we can choose the panic strategy.
[profile.release]
lto = true
opt-level = "s"
panic = "abort"
strip = true
With "abort", a panic terminates the process immediately instead of unwinding the stack. The default "unwind" strategy may produce significantly larger binaries in larger programs. You can read more about the panic option in "The Cargo Book".
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 302.4 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
OK, another 4 KiB.
How about compilation time?
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 2.818 s ± 0.188 s [User: 2.652 s, System: 0.175 s]
Range (min … max): 2.653 s … 3.169 s 10 runs
The build time is the same as in the opt-level = "s" test.
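The difference between the two strategies is visible from code. In the sketch below, std::panic::catch_unwind turns a panic into an Err under the default panic = "unwind". With panic = "abort" there is no unwinding: the same panic terminates the whole process and catch_unwind never returns, which is also why the compiler does not need to emit the cleanup code that unwinding relies on. cfg!(panic = "unwind") reports the strategy the code was compiled with; risky_double and run_guarded are hypothetical examples.

```rust
use std::panic;

/// Hypothetical function that panics on negative input.
fn risky_double(input: i32) -> i32 {
    if input < 0 {
        panic!("negative input: {input}");
    }
    input * 2
}

/// Converts a panic in `risky_double` into an `Err` carrying the panic message.
/// This only works with panic = "unwind"; with panic = "abort" the process
/// is terminated at the panic and this function never returns.
fn run_guarded(input: i32) -> Result<i32, String> {
    panic::catch_unwind(|| risky_double(input)).map_err(|payload| {
        // A panic payload is usually a `&'static str` or a formatted `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic payload".to_string()
        }
    })
}

fn main() {
    println!("compiled with unwinding: {}", cfg!(panic = "unwind"));
    println!("{:?}", run_guarded(21));
    // The default panic hook also prints the message to stderr.
    println!("{:?}", run_guarded(-1));
}
```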
codegen-units
Let's check the last one. With "codegen-units", we can choose how many "code generation units" a crate is split into.
[profile.release]
codegen-units = 1
lto = true
opt-level = "s"
panic = "abort"
strip = true
More units allow more of a crate to be processed in parallel, which may reduce compile time but may produce slower code. The default value of 16 may also produce a significantly larger binary in larger programs. You can read more about the codegen-units option in "The Cargo Book".
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼────────────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 302.4 KiB │ 38 seconds ago │
╰───┴────────────────────────────┴──────┴───────────┴────────────────╯
OK, in a simple Hello World program, there is no difference.
How about compilation time?
hyperfine --warmup 5 'cargo clean; cargo build --release'
Benchmark 1: cargo clean; cargo build --release
Time (mean ± σ): 2.683 s ± 0.030 s [User: 2.530 s, System: 0.162 s]
Range (min … max): 2.640 s … 2.722 s 10 runs
The build is also slightly faster with this setting.
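To summarize the stable Rust experiments on Hello World, here is a small worked calculation over the release binary sizes reported above (in KiB; the "z" step is omitted because it matched "s"). It computes what each step saved relative to the previous one, plus the total. The numbers are copied from the tables, not re-measured.

```rust
/// Release binary sizes (KiB) of Hello World after each step, from the tables above.
const STEPS: [(&str, f64); 6] = [
    ("default release", 409.8),
    ("strip = true", 342.5),
    ("lto = true", 314.4),
    ("opt-level = \"s\"", 306.4),
    ("panic = \"abort\"", 302.4),
    ("codegen-units = 1", 302.4),
];

/// Size saved by each step compared with the previous one.
fn step_savings(steps: &[(&str, f64)]) -> Vec<(String, f64)> {
    steps
        .windows(2)
        .map(|pair| (pair[1].0.to_string(), pair[0].1 - pair[1].1))
        .collect()
}

fn main() {
    for (step, saved) in step_savings(&STEPS) {
        println!("{step:<20} saved {saved:>5.1} KiB");
    }
    let (first, last) = (STEPS[0].1, STEPS[STEPS.len() - 1].1);
    let total = first - last;
    println!("total: {total:.1} KiB ({:.1}% of the default release build)", total / first * 100.0);
}
```

In total, the stable settings remove 107.4 KiB, about 26% of the default release binary.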
Nightly Rust
Let's have a look at optimizations that are available in the nightly version of the compiler. To use them, we need a nightly toolchain installed. You can install it with the following commands.
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu
build-std
Now let's try to use the new build-std feature from nightly. What does it do?
"Enable Cargo to compile the standard library itself as part of a crate graph compilation."
cargo +nightly build --release -Z build-std=std,panic_abort --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/prs-opt-tst
╭───┬─────────────────────────────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼─────────────────────────────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/prs-opt-tst │ file │ 232.6 KiB │ now │
╰───┴─────────────────────────────────────────────────────┴──────┴───────────┴──────────╯
We saved almost 70 KiB more.
But how does it affect compilation time?
hyperfine --warmup 5 'cargo clean; cargo +nightly build --release -Z build-std=std,panic_abort --target x86_64-unknown-linux-gnu'
Benchmark 1: cargo clean; cargo +nightly build --release -Z build-std=std,panic_abort --target x86_64-unknown-linux-gnu
Time (mean ± σ): 26.079 s ± 0.511 s [User: 36.577 s, System: 2.342 s]
Range (min … max): 25.492 s … 26.930 s 10 runs
Compilation time went up significantly, from about 2.7 s to about 26 s. This is the price of compiling an optimized version of the standard library.
build-std-features
Now let's try build-std-features option. What does it do?
"Configure features enabled for the standard library itself when building the standard library."
cargo +nightly build --release -Z build-std=std,panic_abort -Z build-std-features --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/prs-opt-tst
╭───┬─────────────────────────────────────────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼─────────────────────────────────────────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/prs-opt-tst │ file │ 35.2 KiB │ now │
╰───┴─────────────────────────────────────────────────────┴──────┴──────────┴──────────╯
That's another 197.4 KiB saved :)
Let's check compilation times.
hyperfine --warmup 5 'cargo clean; cargo +nightly build --release -Z build-std=std,panic_abort -Z build-std-features --target x86_64-unknown-linux-gnu'
Benchmark 1: cargo clean; cargo +nightly build --release -Z build-std=std,panic_abort -Z build-std-features --target x86_64-unknown-linux-gnu
Time (mean ± σ): 20.377 s ± 0.464 s [User: 26.036 s, System: 1.837 s]
Range (min … max): 19.681 s … 20.962 s 10 runs
It also builds faster: about 20.4 s instead of 26.1 s.
Nushell - real world application
Let's check how these optimizations work on a real-world application. We'll use Nushell.
This article is by no means a critique of Nushell's default build optimizations. Its developers chose their options for their own reasons. I like this shell, and I needed an open-source example of a real-world application.
Stable Rust
Let's enable debugging on the release profile.
[profile.release]
debug = true
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬───────────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼───────────┼────────────────┤
│ 0 │ target/release/nu │ file │ 580.6 MiB │ 37 minutes ago │
╰───┴───────────────────┴──────┴───────────┴────────────────╯
Here are the results for Nushell with its default optimizations disabled and debugging enabled.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼────────────────┤
│ 0 │ target/release/nu │ file │ 44.0 MiB │ 17 seconds ago │
╰───┴───────────────────┴──────┴──────────┴────────────────╯
Now, let's check the result with default Rust optimizations.
[profile.release]
Here are the results for Nushell without the optimizations configured for this shell by its developers.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 51.7 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
Now let's check the result with the optimizations configured by the Nushell developers.
[profile.release]
opt-level = "s" # Optimize for size
strip = "debuginfo"
lto = "thin"
Here are the results for Nushell with its default optimizations.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 37.7 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
As you can see, the optimizations applied by the Nushell developers have a significant impact on binary size. Let's check whether we can do better.
[profile.release]
opt-level = "s" # Optimize for size
strip = true
lto = "thin"
Here are the results when we switch to strip = true.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 29.3 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
That's 8.4 MiB saved in a real-world application.
Now let's try to change the "lto" optimization.
[profile.release]
lto = true
opt-level = "s" # Optimize for size
strip = true
Here are the results.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 24.1 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
That's another 5.2 MiB off the binary size. Let's explore further.
Now let's check the impact of opt-level = "z".
[profile.release]
lto = true
opt-level = "z"
strip = true
Here are the results.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 23.0 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
As you can see, in a real-world application the impact of this option is much larger than in Hello World: we saved 1.1 MiB. The question is whether you care more about binary size or performance. In my opinion, in most cases you should use opt-level = "s", unless you are targeting a really small binary.
Now let's check the impact of panic = "abort".
[profile.release]
lto = true
opt-level = "z"
panic = "abort"
strip = true
Here are the results.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────────┤
│ 0 │ target/release/nu │ file │ 21.5 MiB │ a minute ago │
╰───┴───────────────────┴──────┴──────────┴──────────────╯
That's 1.5 MiB less. And we haven't finished yet :)
The last option that we can try in stable Rust is codegen-units = 1. Note that this configuration also switches back to opt-level = "s".
[profile.release]
codegen-units = 1
lto = true
opt-level = "s"
panic = "abort"
strip = true
Here are the results.
cargo build --release
[..]
ls target/release/nu
╭───┬───────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼───────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/release/nu │ file │ 20.8 MiB │ now │
╰───┴───────────────────┴──────┴──────────┴──────────╯
That's another 0.7 MiB saved.
Nightly Rust
Let's see what happens when we enable the aggressive nightly optimizations.
build-std
cargo +nightly build --release -Z build-std=std,panic_abort --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/nu
╭───┬────────────────────────────────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/nu │ file │ 19.1 MiB │ now │
╰───┴────────────────────────────────────────────┴──────┴──────────┴──────────╯
Here we saved 1.7 MiB.
build-std-features
cargo +nightly build --release -Z build-std=std,panic_abort -Z build-std-features --target x86_64-unknown-linux-gnu
[..]
ls target/x86_64-unknown-linux-gnu/release/nu
╭───┬────────────────────────────────────────────┬──────┬──────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────────────────────┼──────┼──────────┼──────────┤
│ 0 │ target/x86_64-unknown-linux-gnu/release/nu │ file │ 19.0 MiB │ now │
╰───┴────────────────────────────────────────────┴──────┴──────────┴──────────╯
That's another 0.1 MiB saved.
Conclusions
The results look good, and we did not have to go down to the assembly level. The experiments show that it's possible to go down:
- from 3.7 MiB (dev profile) to 35.2 KiB in the Hello World application,
- from 580.6 MiB (dev profile) and 37.7 MiB (default Nushell optimizations) to 19.0 MiB in a real-world application.
All of this is possible using only Rust compiler optimization settings.
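Expressed as reduction factors, these numbers work out as follows. This small calculation uses only the sizes from the tables above, with 1 MiB = 1024 KiB.

```rust
const KIB: f64 = 1024.0;
const MIB: f64 = 1024.0 * KIB;

/// How many times smaller the binary became.
fn reduction_factor(before_bytes: f64, after_bytes: f64) -> f64 {
    before_bytes / after_bytes
}

fn main() {
    let cases = [
        ("Hello World, dev profile", 3.7 * MIB, 35.2 * KIB),
        ("Nushell, dev profile", 580.6 * MIB, 19.0 * MIB),
        ("Nushell, default optimizations", 37.7 * MIB, 19.0 * MIB),
    ];
    for (name, before, after) in cases {
        println!("{name:<32} {:>6.1}x smaller", reduction_factor(before, after));
    }
}
```

The Hello World binary ends up about 108 times smaller than the dev build, and Nushell about 31 times smaller than its dev build and about 2 times smaller than with its default optimizations.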
Are these optimizations worth it? As always, it depends: you need to measure how they affect your application. Rust is considered one of the greenest programming languages; it may use far less CPU and memory than other languages. We pay for this with slower compilation, but in the end, the resulting program can run much faster.
So, is it worth it? If better code optimization lets you lower your server costs, then of course it is. If you care about reducing your users' electricity consumption, then of course it is.
Remember that all these optimizations that slow down compilation apply only to release profile builds. They don't slow you down during day-to-day development.
Additional tips
Bevy's way
This tip is common knowledge among developers who use the Bevy engine, but it may not be widely known elsewhere, even though it is useful during development.
Here is what it looks like.
[profile.dev.package."*"]
opt-level = 3
This configuration instructs cargo to compile dependency crates with a higher opt-level. It is useful if you want to use more optimized dependencies during development. The idea is that your code is still compiled with default development optimizations that make it compile as fast as possible with the current Rust compiler, but dependency crates are much better optimized.
Unless you add a new dependency, change feature configuration, or update existing dependencies, your dependencies are not recompiled often during development. So it may be worth using a higher optimization level for them to get better performance during development. It's a situational optimization: it may work very well, or it may slow down compilation during development.
Development build without debugging symbols
When developing games in Bevy or web services in async Rust, you most likely don't use a debugger. It's much more convenient to use the tracing infrastructure or the info! macro.
You can easily disable debugging symbols with the following configuration.
[profile.dev]
debug = 0
Here are the benchmark results for building Hello World with debugging symbols.
hyperfine --warmup 5 'cargo clean; cargo build'
Benchmark 1: cargo clean; cargo build
Time (mean ± σ): 300.0 ms ± 3.2 ms [User: 187.5 ms, System: 133.6 ms]
Range (min … max): 292.8 ms … 304.2 ms 10 runs
Here are the benchmark results for building Hello World without debugging symbols.
hyperfine --warmup 5 'cargo clean; cargo build'
Benchmark 1: cargo clean; cargo build
Time (mean ± σ): 261.4 ms ± 3.2 ms [User: 151.4 ms, System: 130.4 ms]
Range (min … max): 254.8 ms … 266.7 ms 11 runs
Here the build is about 13% faster, and the difference is more significant for larger programs. You don't need to pay with your time for things that you don't need.
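With debug info turned off, logging takes over the debugger's role. The info! macro mentioned above comes from the log and tracing crates; as a dependency-free illustration of the idea only, here is a hypothetical stand-in built with macro_rules! that tags each message with its source location. It is not the API of either crate.

```rust
use std::fmt;

/// Builds one log line: level, source location, and the formatted message.
fn format_record(level: &str, file: &str, line: u32, message: fmt::Arguments) -> String {
    format!("{level} {file}:{line} {message}")
}

/// Hypothetical, minimal stand-in for `info!`: prints one record to stderr.
macro_rules! info {
    ($($arg:tt)*) => {
        eprintln!("{}", format_record("INFO", file!(), line!(), format_args!($($arg)*)))
    };
}

fn main() {
    let port = 8080;
    info!("server listening on port {}", port);
    info!("loaded {} assets", 42);
}
```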
Enabling only features that are needed
It's quite common to add a crate to the dependencies with all of its features enabled, for example:
[dependencies]
tokio = { version = "1.42.0", features = ["full"] }
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬──────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼──────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 486.5 KiB │ now │
╰───┴────────────────────────────┴──────┴───────────┴──────────╯
You can disable unneeded features with the default-features = false option and enable only the features that you need.
[dependencies]
tokio = { version = "1.42.0", default-features = false, features = [
"io-util",
"macros",
"net",
"rt-multi-thread",
] }
cargo build --release
[..]
ls target/release/prs-opt-tst
╭───┬────────────────────────────┬──────┬───────────┬────────────────╮
│ # │ name │ type │ size │ modified │
├───┼────────────────────────────┼──────┼───────────┼────────────────┤
│ 0 │ target/release/prs-opt-tst │ file │ 446.5 KiB │ 22 seconds ago │
╰───┴────────────────────────────┴──────┴───────────┴────────────────╯
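That's 40 KiB less. This works because Cargo features are implemented with conditional compilation: code gated with #[cfg(feature = "...")] is not compiled at all unless the feature is enabled. Here is a minimal sketch with a hypothetical json feature (in a real crate it would be declared under [features] in Cargo.toml); built without that feature, the json_support module never makes it into the binary.

```rust
// Compiled only when the hypothetical `json` feature is enabled,
// e.g. `cargo build --features json`; otherwise it is left out entirely.
#[cfg(feature = "json")]
mod json_support {
    pub fn format_name() -> &'static str {
        "json"
    }
}

/// Output formats available in this build.
fn supported_formats() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut formats = vec!["text"];
    #[cfg(feature = "json")]
    formats.push(json_support::format_name());
    formats
}

fn main() {
    println!("json feature enabled: {}", cfg!(feature = "json"));
    println!("supported formats: {:?}", supported_formats());
}
```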
My Bevy config
Below, you can find a configuration that I'm using in my Bevy project.
[dependencies]
bevy = { version = "0.15.0", default-features = false, features = [
"animation",
"bevy_color",
"bevy_gltf",
"bevy_pbr",
"bevy_scene",
"bevy_state",
"bevy_ui",
"bevy_winit",
"bevy_window",
"default_font",
"multi_threaded",
"png",
"tonemapping_luts",
"x11"
] }
rand = { version = "0.8.5", default-features = false, features = [
"std",
"std_rng"
] }
[profile.dev]
debug = 0
panic = "abort"
[profile.dev.package."*"]
opt-level = 3
[profile.release]
codegen-units = 1
lto = true
opt-level = "s"
panic = "abort"
strip = true
With this config, my development builds are about twice as fast as with the default configuration.