Compilers are complicated beasts. Our high-level source code goes through many transformations until it winds up becoming machine code that runs on real or virtual hardware. Assembly is the final destination before machine code, and it doesn't have to be menacing! Whether you intend to write assembly directly or not, knowing how your code translates to assembly can drastically improve your ability to analyze programs from the standpoint of performance.
Yes, we need numbers to guide us towards improvements, and yes, that means having benchmarks. Arguments over performance that don't include data are conjecture, but understanding assembly gives you a magnifying glass to help guide you in your optimization adventures. With some experience, we can learn how to look at assembly and determine things such as whether the assembly contains efficient instructions, whether chunks of code are replaced with constant values, and so on. Benchmarks and analyzing assembly can go hand in hand, but how do you even get at the assembly in the first place?
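As a small, hypothetical example of that last point: with optimizations enabled, the compiler can often evaluate a function like the one below entirely at compile time, so the emitted assembly just returns a constant instead of running a loop. This is an illustrative sketch, not something from a real project.
// Hypothetical example: with optimizations on, the optimizer can often
// fold this whole computation into the constant 45 at compile time, so
// the assembly just returns an immediate value rather than looping.
pub fn sum_to_ten() -> u32 {
    (0..10).sum()
}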
If you want to look at Rust's assembly in your project using just cargo, there are two ways. You can call:
$ cargo rustc --release -- --emit asm <ARGS>
--release is optional here. The primary argument that's needed is --emit asm. ARGS is the list of arguments you want to pass to rustc that might influence compilation. By default, rustc generates AT&T syntax, but you can change to Intel syntax if that's what you prefer by passing -C llvm-args=--x86-asm-syntax=intel. This may not matter to you if this is your first foray into analyzing assembly, but it can be fun to see as an experiment!
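Putting that together, a release build that emits Intel-syntax assembly might look like this (just a sketch of the flags discussed above):
$ cargo rustc --release -- --emit asm -C llvm-args=--x86-asm-syntax=intel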
If you want a good starting point for flags, try using:
-C target-cpu=native -C opt-level=3
These two codegen options instruct the compiler to emit code specifically for the processor it guesses you are running the compiler on, as well as to use all optimizations. You can also pass opt-level=z or opt-level=s if you want to optimize for binary size instead. As a note, fewer instructions doesn't necessarily mean efficient code: a short set of instructions may end up taking more cycles than a more verbose alternative.
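If you're going the cargo rustc route from earlier, these flags can ride along after the --. A minimal sketch:
$ cargo rustc --release -- --emit asm -C target-cpu=native -C opt-level=3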
If, instead, you want to call the standard cargo build, you can pass all these arguments with the RUSTFLAGS environment variable. For example:
$ RUSTFLAGS="--emit asm -C opt-level=3 -C target-cpu=native" cargo build --release
When the build finishes, the assembly will live in a file with the suffix .s under target/debug/deps/CRATE_NAME-HASH.s or target/release/deps/CRATE_NAME-HASH.s, depending on whether or not you built with the --release flag. If I run the above command on a crate with the name project, I'll get something like the following:
$ find . -name "*.s" -type f
./target/release/deps/project-1693e028130a9fa3.s
Keep in mind that there may be several of these outputs. If you are confused about which is the latest, you can try cargo clean and building fresh. By default, the names are going to look pretty weird in the output due to mangling! Mangling ensures that names for identifiers are unique across the process of compilation. You can try feeding the resulting assembly into rustfilt to get cleaner names:
$ find . -name "*.s" -type f | xargs cat | rustfilt
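For instance, a mangled symbol in the output might look roughly like the line below (the hash here is made up for illustration), and rustfilt turns it back into a readable path such as project::foo:
_ZN7project3foo17h0123456789abcdefE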
Ok, this is great if you have a project going, but maybe you have some transient code in the Rust playground and want to know what the assembly is there. You can emit assembly there, too! If you click on the ellipsis next to the Run button, you'll get a menu that has several options. Select ASM for assembly output in another tab. There isn't much control over compilation options with the Rust playground approach besides picking stable, beta, or nightly. A more fully-featured web tool for picking apart assembly is godbolt, which describes itself as a "compiler explorer" and provides a lot of features to aid you in exploration over the above bare-bones approaches. Advantages of using godbolt include:
- Viewing highlighted segments of our source code and where they line up with the assembly
- Access to a bevy of compilers from a wide variety of languages, even selecting which version of Rust you want to use
- Passing arbitrary flags to influence how the generated output is produced
- Diffing how the assembly changes between versions of the source code
- Looking up the documentation for instructions on the fly
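If you want something small to paste into godbolt to get started, a function like the following works well. This is just an illustrative sketch; keeping the function pub helps ensure it isn't stripped as dead code before any assembly is emitted.
// Illustrative function to paste into the compiler explorer. Marking it
// `pub` helps keep it from being removed as dead code, so its assembly
// shows up in the output pane.
pub fn double_all(values: &mut [u32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}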
You now know three ways to emit assembly, whether it's on your machine, the Rust playground, or godbolt! To the uninitiated, this can be overwhelming, but opening the hood can be liberating and allow us to start exploring the various instructions and how they all tie together.
To reiterate, you don't always have to look at assembly to guide performance optimization. Benchmarks are crucial for guiding us towards real-world results. Try to make it a habit to look at assembly when you're curious about what's going on under the hood. If you start optimizing, it can be interesting to compare how the assembly changes as you make high-level changes. If things seem to speed up, try to explore how the assembly itself has changed!
Update May 4 2020, 2:12PM
u/ibeforeyou on Reddit mentioned cargo-asm to help alleviate a lot of the pain of dumping out the raw assembly above with cargo. By default, it will produce Intel syntax, and it can even overlay the Rust code over the lines of assembly. The twist is that you need to give a path to the assembly you want to see dumped. If you want to see function foo of the crate crate_name, you could specify the path:
$ cargo asm --rust crate_name::foo
I did have to shuffle around the flags to get it to emit AT&T syntax for me; in the end, this ended up working:
$ cargo asm --rust --asm-style att crate_name::foo
Running cargo asm dumps all the available paths that you can list, which is pretty neat if you're confused about which path to put down. What I like about this is that you can jam it into a feedback loop using something like cargo-watch or entr. This way you can make changes to an individual function and watch how the benchmarks and assembly change without having to invoke commands manually!
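As a sketch of that kind of loop, assuming cargo-watch is installed and reusing the hypothetical crate_name::foo path from above, something like this re-runs cargo asm whenever a source file changes:
$ cargo watch -x "asm --rust crate_name::foo"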