I would like to temper the "rewrite in Rust" meme

I would like to temper the "rewrite in Rust" meme a little bit. While it is usually an excellent idea to rewrite C++ in Rust, a Simple Haskell compiler differs greatly from the usual C++ app. If you know how to write both fast Haskell and fast Rust, Rust is not obviously better for your compiler. Rust actually has several performance drawbacks, and you have to work significantly harder to get performance which merely matches fast Haskell, or work your butt off to get performance which exceeds it.

Rust ADTs are padded to uniform size: every value of an enum takes as much space as its largest variant. This makes mutation simpler and more flexible, but for AST processing we want a small footprint more often than mutation and array storage. There's no zero-cost workaround: you either bloat your AST or introduce more indirections, as the Rust compiler itself does, and the latter solution adds some ugly noise to your code.
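A minimal sketch of the padding trade-off, assuming a made-up AST (the `Fat` and `Slim` types and the `ast_sizes` helper are hypothetical, not taken from any real compiler). It only measures sizes with `std::mem::size_of`; the exact numbers depend on the target and on the compiler's layout choices, so treat them as indicative.

```rust
use std::mem::size_of;

// Hypothetical AST where one rarely used variant carries a big inline payload.
// Every value of `Fat` is as large as this biggest variant.
#[allow(dead_code)]
enum Fat {
    Var(u32),
    App(Box<Fat>, Box<Fat>),
    Record([u64; 8]),
}

// Same shape, but the big payload is moved behind a pointer.
// The enum shrinks, at the cost of one more indirection and extra `Box` noise.
#[allow(dead_code)]
enum Slim {
    Var(u32),
    App(Box<Slim>, Box<Slim>),
    Record(Box<[u64; 8]>),
}

/// Returns (size of `Fat`, size of `Slim`) in bytes for the current target.
pub fn ast_sizes() -> (usize, usize) {
    (size_of::<Fat>(), size_of::<Slim>())
}
```

Calling `ast_sizes()` shows that even a `Fat::Var` pays for the 64-byte `Record` payload, while `Slim` stays within a few machine words.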

Owned pointer (Box) allocation in Rust is slow for ASTs, so we have to use arenas. Arenas also add significant noise to the code, and they are not faster than default GHC heap allocation; in my experience they are actually slower. They get even more annoying when we actually have to GC.
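To make the "noise" concrete, here is a small sketch of one common arena style: a `Vec`-backed arena where children are indices rather than `Box` pointers. This is only one possible design (bump allocators handing out references are another), and `Arena`, `Expr` and `ExprId` are illustrative names.

```rust
// Children are referred to by index into the arena, not by owned pointer.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ExprId(u32);

#[derive(Debug)]
pub enum Expr {
    Lit(i64),
    Add(ExprId, ExprId),
    Mul(ExprId, ExprId),
}

/// All nodes live in one contiguous Vec and are freed together when it is dropped.
#[derive(Default)]
pub struct Arena {
    nodes: Vec<Expr>,
}

impl Arena {
    /// Appends a node and returns its index.
    pub fn alloc(&mut self, e: Expr) -> ExprId {
        let id = ExprId(self.nodes.len() as u32);
        self.nodes.push(e);
        id
    }

    /// Every traversal has to thread the arena through to follow an index.
    pub fn eval(&self, id: ExprId) -> i64 {
        match &self.nodes[id.0 as usize] {
            Expr::Lit(n) => *n,
            Expr::Add(a, b) => self.eval(*a) + self.eval(*b),
            Expr::Mul(a, b) => self.eval(*a) * self.eval(*b),
        }
    }

    pub fn len(&self) -> usize {
        self.nodes.len()
    }
}
```

Note that nothing here can free an individual node: the arena only grows until it is dropped as a whole, which is exactly what becomes awkward once some terms need to be collected earlier.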

When do we actually have to GC? While it is certainly possible to write a compiler which does not use a GC, this favors "dumb" and slow compilers which do passes by walking ASTs, copying terms and performing substitution/rewriting. It is better to do as much as possible with normalization-by-evaluation and/or abstract interpretation. Even for something as trivial as instantiating polymorphic types in System F, NbE outperforms the naive substitution which we might write in Rust. And NbE really needs GC; without that, we either deep copy ourselves to death or just leak space.
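For readers unfamiliar with NbE, the following is a rough sketch for the untyped lambda calculus with de Bruijn indices, not the System F type instantiation mentioned above. It uses `Rc` as a stand-in for a garbage collector, which is an assumption of this sketch: closures share environments freely, and something has to reclaim them. All names (`Term`, `Val`, `EnvNode`, `normalize`) are illustrative.

```rust
use std::rc::Rc;

// Terms use de Bruijn indices (0 = innermost binder).
#[derive(Debug, Clone, PartialEq)]
pub enum Term {
    Var(usize),
    Lam(Rc<Term>),
    App(Rc<Term>, Rc<Term>),
}

// Persistent, shared environment; Rc plays the role of the GC here.
type Env = Option<Rc<EnvNode>>;

pub struct EnvNode {
    head: Val,
    tail: Env,
}

// Semantic values: a closure captures its environment instead of substituting.
#[derive(Clone)]
pub enum Val {
    // A free variable (stored as a de Bruijn level) applied to arguments.
    Neutral(usize, Vec<Val>),
    Closure(Env, Rc<Term>),
}

fn lookup(env: &Env, mut ix: usize) -> Val {
    let mut cur = env;
    loop {
        let node = cur.as_ref().expect("unbound variable");
        if ix == 0 {
            return node.head.clone();
        }
        ix -= 1;
        cur = &node.tail;
    }
}

pub fn eval(env: &Env, t: &Term) -> Val {
    match t {
        Term::Var(ix) => lookup(env, *ix),
        Term::Lam(body) => Val::Closure(env.clone(), body.clone()),
        Term::App(f, a) => apply(eval(env, f), eval(env, a)),
    }
}

fn apply(f: Val, arg: Val) -> Val {
    match f {
        // Beta reduction: extend the captured environment, no term copying.
        Val::Closure(env, body) => {
            let extended = Some(Rc::new(EnvNode { head: arg, tail: env }));
            eval(&extended, &body)
        }
        Val::Neutral(lvl, mut args) => {
            args.push(arg);
            Val::Neutral(lvl, args)
        }
    }
}

// Reads a value back into a term; `depth` is the number of enclosing binders.
pub fn quote(depth: usize, v: Val) -> Term {
    match v {
        Val::Neutral(lvl, args) => {
            // Convert the de Bruijn level back into an index.
            let mut t = Term::Var(depth - lvl - 1);
            for a in args {
                t = Term::App(Rc::new(t), Rc::new(quote(depth, a)));
            }
            t
        }
        Val::Closure(env, body) => {
            // Go under the binder by applying the closure to a fresh variable.
            let fresh = Val::Neutral(depth, Vec::new());
            let opened = apply(Val::Closure(env, body), fresh);
            Term::Lam(Rc::new(quote(depth + 1, opened)))
        }
    }
}

/// Normalizes a closed term.
pub fn normalize(t: &Term) -> Term {
    quote(0, eval(&None, t))
}
```

The point of the sketch is the shape of `apply`: a beta step extends a shared environment rather than rewriting the body. Without `Rc` (or a real GC) each `env.clone()` would have to be a deep copy, or the environments would simply never be freed.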

In compilers, GC throughput is much more important than latency/pauses. GHC's GC is very good at throughput, and we can still speed it up in big ways if we throw free memory at it with +RTS -Ax (x being the allocation area size). In my benchmarks, the V8, JVM and .NET GCs all appear to be crappy compared to GHC's GC on small-allocation workloads.

Some things are better in Rust:

Mutable data structures. Hash tables are the most relevant ones, and they are far better in Rust. However, if we use interned strings, as we should in any implementation, the actual performance gap is probably not that great, as we do exactly one map lookup for each source identifier string, and after that we only do array indexing.
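A small sketch of the interning scheme described here, assuming a simple design with one hash map and one vector (`Interner` and `Symbol` are hypothetical names). The hash map is touched once per identifier string; everything afterwards works on the integer `Symbol`.

```rust
use std::collections::HashMap;

/// An interned identifier: just an index into `Interner::names`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    names: Vec<String>,
}

impl Interner {
    /// The only place where a hash lookup happens.
    pub fn intern(&mut self, name: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(name) {
            return sym;
        }
        let sym = Symbol(self.names.len() as u32);
        self.ids.insert(name.to_owned(), sym);
        self.names.push(name.to_owned());
        sym
    }

    /// Going back to the string is plain array indexing.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.names[sym.0 as usize]
    }
}
```

With this in place, later passes can keep per-identifier data in arrays indexed by `Symbol` instead of in hash tables.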

Zero-cost abstraction. In Rust, typeclasses (traits, when used as generic bounds) are completely monomorphized, as are all generic data structures. Hence, we can generally write more generic code in fast Rust than in fast Haskell. In Haskell, we sometimes have to duplicate data types and write intentionally monomorphic code.
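As an illustration of what monomorphization buys, here is a sketch with a made-up `Monoid` trait (not a standard library trait). The generic function is written once, and the compiler generates a separate specialized copy for each type it is used with, so there is no dictionary passing at runtime.

```rust
// Hypothetical typeclass-like trait for this example.
pub trait Monoid {
    fn empty() -> Self;
    fn combine(self, other: Self) -> Self;
}

impl Monoid for i64 {
    fn empty() -> Self {
        0
    }
    fn combine(self, other: Self) -> Self {
        self + other
    }
}

impl Monoid for String {
    fn empty() -> Self {
        String::new()
    }
    fn combine(mut self, other: Self) -> Self {
        self.push_str(&other);
        self
    }
}

// Static dispatch: one specialized copy of this function per `M` in use.
pub fn concat_all<M: Monoid>(items: Vec<M>) -> M {
    items.into_iter().fold(M::empty(), M::combine)
}
```

The rough Haskell equivalent relies on GHC specializing the class-polymorphic function, which is not guaranteed in general; that is why fast Haskell code is sometimes written monomorphically by hand.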

Some features which could have a large impact are missing from both Rust and Haskell:

Support for programming directly with packed ASTs.
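To show what "packed" means, here is a hand-rolled sketch in which the tree is stored as a flat byte buffer in preorder and evaluated directly, with no pointers at all. The encoding (one tag byte, literals as 8 little-endian bytes) and the names `LIT`, `ADD`, `MUL`, `lit` and `eval` are assumptions of this example. Doing this by hand is precisely the kind of work that language support would remove.

```rust
// Node tags for the hypothetical packed encoding.
pub const LIT: u8 = 0;
pub const ADD: u8 = 1;
pub const MUL: u8 = 2;

/// Appends a literal node: tag byte followed by 8 little-endian bytes.
pub fn lit(buf: &mut Vec<u8>, n: i64) {
    buf.push(LIT);
    buf.extend_from_slice(&n.to_le_bytes());
}

/// Evaluates the tree starting at `pos`.
/// Returns the value and the position just past the end of that subtree.
pub fn eval(buf: &[u8], pos: usize) -> (i64, usize) {
    match buf[pos] {
        LIT => {
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(&buf[pos + 1..pos + 9]);
            (i64::from_le_bytes(bytes), pos + 9)
        }
        ADD => {
            // Children follow the tag directly, left then right.
            let (l, next) = eval(buf, pos + 1);
            let (r, end) = eval(buf, next);
            (l + r, end)
        }
        MUL => {
            let (l, next) = eval(buf, pos + 1);
            let (r, end) = eval(buf, next);
            (l * r, end)
        }
        other => panic!("unknown tag {}", other),
    }
}
```

For example, `(2 + 3) * 4` is written as the tag `MUL`, then the tag `ADD`, then the three literals in order; the whole tree is 29 bytes in one allocation. The cost is that the type system no longer knows anything about the tree's shape, so well-formedness is the programmer's problem.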

Runtime code generation. Many optimization passes can sensibly factor through generated code: we go from AST to machine code, then run the machine code, which yields analysis output about the input AST. In principle this could be achieved in both Haskell and Rust (more easily in Rust), but it requires working our butt off to make it work. In JavaScript, this is much easier by simply using JIT eval (but of course JS has many other serious drawbacks).

I've been investigating typechecking performance on and off for a couple of years. I've considered using Rust for this purpose, but I decided against it for the above reasons. If you are willing to write your own RTS and GC, like the Lean devs, then Rust is an OK choice. Otherwise, if you're already an expert at writing high-performance Haskell, then it's much easier to get a very fast "production strength" compiler in Haskell.

23.05.2020

Kakadu · Accepted Answer

Where is this text from?

23.05.2020
