Language-level problems with Rust while learning, day 1

I started learning Rust recently, as one of the other maintainers of Drogon tried it and likes it. And C++ has its own pile of problems. I'm not saying I don't like C++ anymore, just that I'm trying to learn something new. In the process I found a few things I dislike about Rust, especially from the point of view of an HPC programmer.

Forced to use Arc to share atomic variables

One thing we do a lot in highly concurrent environments is use a shared atomic variable to communicate between threads. For example, in my search engine there's an atomic integer that counts the active connections, and each worker decides whether to spawn more crawlers based on it. In C++ it's easy:

// In Crawler. `atomic<size_t> activeConnections_` is a member variable
// `dispatchCraw` is called from multiple threads
void Crawler::dispatchCraw() {
    size_t activeConnections = activeConnections_.fetch_add(1, std::memory_order_acq_rel);
    if (activeConnections < maxConnections_) {
        // spawn async task
    }
    activeConnections_.fetch_sub(1, std::memory_order_acq_rel);
}

// To start crawling. Pseudo code
void start() {
    static Crawler crawler;
    for (int i = 0; i < numThreads_; i++) {
        thread t(&Crawler::dispatchCraw, &crawler);
        t.detach(); // a std::thread must be joined or detached before it is destroyed
    }
}

However, due to Rust's borrow checker, it's not possible to hand a plain reference to a local variable to threads created with thread::spawn, because the closure has to be 'static. The solution I was given is to wrap the crawler in an Arc, clone it and pass the clone to the other threads. Arc (atomically reference counted) is basically std::shared_ptr in C++. That wastes cycles which could be perfectly avoided by using a known, good static lifetime variable.

fn dispatch_craw(crawler: &Arc<Crawler>) {
    let active_connections = crawler.active_connections.fetch_add(1, Ordering::AcqRel);
    if active_connections < crawler.max_connections {
        // spawn async task
    }
    crawler.active_connections.fetch_sub(1, Ordering::AcqRel);
}

fn start() {
    // !!! Forced to create Arc for the crawler. Even though in C++ it's not necessary
    let crawler = Arc::new(Crawler::new());
    for _ in 0..crawler.num_threads {
        let crawler = crawler.clone(); // Atomic operation, not free
        thread::spawn(move || dispatch_craw(&crawler));
    }
}

This goes for every variable with multiple readers or writers, like a concurrent queue, a concurrent hash map, etc. It's really not ideal.
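One possible way around the Arc, sketched under the assumption that all worker threads can be joined before the crawler goes away: std::thread::scope (in the standard library since Rust 1.63) lets spawned threads borrow a plain reference. The `Crawler` fields, the `dispatched` counter and the `run` function below are hypothetical stand-ins for illustration, not the real crawler.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical stand-in for the crawler described in the text.
struct Crawler {
    active_connections: AtomicUsize,
    max_connections: usize,
    num_threads: usize,
    // Illustration only: counts how many tasks would have been spawned.
    dispatched: AtomicUsize,
}

// Takes a plain shared reference. Atomics only need `&self` to be modified.
fn dispatch_craw(crawler: &Crawler) {
    let active = crawler.active_connections.fetch_add(1, Ordering::AcqRel);
    if active < crawler.max_connections {
        // The async task is stubbed out here, we just count it.
        crawler.dispatched.fetch_add(1, Ordering::Relaxed);
    }
    crawler.active_connections.fetch_sub(1, Ordering::AcqRel);
}

fn run(crawler: &Crawler) {
    // Scoped threads may borrow `crawler` because the scope joins
    // every thread before it returns. No Arc and no clone involved.
    thread::scope(|s| {
        for _ in 0..crawler.num_threads {
            s.spawn(|| dispatch_craw(crawler));
        }
    });
}
```

The trade-off is that `run` blocks until every worker has finished, so this does not fit detached, fire-and-forget threads.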

Harder-than-necessary explicit parallelism control

In C++, OpenMP does a very good job of hiding the details of parallelism. The reduction clause gives each thread its own local copy of the variable and combines the copies at the end of the parallel region. The following program calculates Pi in parallel by numerically integrating 4/(1+x^2) from 0 to 1. It's more or less the "Hello World" of parallel programming, so it should be easy for any language to implement.

double sum = 0;
size_t steps = 1000000000;
double step = 1.0/(double) steps;
#pragma omp parallel for reduction(+:sum)
for (size_t i = 0; i < steps; i++) {
    double x = (i+0.5)*step;
    sum += 4.0 / (1.0+x*x);
}
double pi = step * sum; // scale the summed heights by the step width

It's still easy with TBB, which does not require any special compiler-level support. We just have to use a vector to store the local sums.

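size_t steps = 1000000000;
double step_size = 1.0/(double) steps;

// One slot per worker thread. The allocator aligns the start of the buffer,
// neighbouring slots can still end up on the same cache line.
vector<double, tbb::cache_aligned_allocator<double>> partial_sums(tbb::this_task_arena::max_concurrency());
tbb::parallel_for(size_t(0), steps, [&](size_t i) {
    thread_local auto& partial_sum = partial_sums[tbb::this_task_arena::current_thread_index()];
    double x = (i+0.5)*step_size;
    partial_sum += 4.0 / (1.0+x*x);
});
// The initial value has to be 0.0, an int 0 would make accumulate sum in int
double sum = std::accumulate(partial_sums.begin(), partial_sums.end(), 0.0);
double pi = sum * step_size;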

In Rust, the best I got when I asked my local Rust community is the following. It's clean, but they gave up on solving the false sharing problem.

use rayon::prelude::*;

let steps: u64 = 1_000_000_000;
let step = 1.0 / steps as f64;
let sum: f64 = (0..steps).into_par_iter().fold(|| 0.0, |acc, i| {
    let x = (i as f64 + 0.5) * step;
    acc + 4.0 / (1.0 + x * x)
}).sum();
let pi = sum * step;
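
For comparison, a minimal std-only sketch of the same reduction. Each thread accumulates into a local variable on its own stack and only hands back the final value, so during the loop no two threads write to neighbouring memory. The function name `parallel_pi`, the chunking scheme and the explicit thread count are assumptions made for illustration. This is not what the Rust community suggested, and `num_threads` is assumed to be greater than zero.

```rust
use std::thread;

/// Approximates pi by midpoint integration of 4 / (1 + x^2) over [0, 1].
/// Hypothetical helper, assumes `num_threads > 0`.
fn parallel_pi(steps: u64, num_threads: u64) -> f64 {
    let step = 1.0 / steps as f64;
    // Ceiling division so that every step is covered by some chunk.
    let chunk = (steps + num_threads - 1) / num_threads;

    let sum = thread::scope(|s| {
        let handles: Vec<_> = (0..num_threads)
            .map(|t| {
                s.spawn(move || {
                    let start = (t * chunk).min(steps);
                    let end = ((t + 1) * chunk).min(steps);
                    // Per-thread accumulator, lives on this thread's stack.
                    let mut local = 0.0_f64;
                    for i in start..end {
                        let x = (i as f64 + 0.5) * step;
                        local += 4.0 / (1.0 + x * x);
                    }
                    local
                })
            })
            .collect();
        // The reduction step: combine the per-thread partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<f64>()
    });

    sum * step
}
```

Calling `parallel_pi(1_000_000_000, 8)` would correspond to the step count used in the C++ versions, with 8 being an arbitrary thread count.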


Anyway, rant over. I feel Rust is doing a lot of correct stuff. But I don't like how it's slow when I'm asking it for the highest performance. Gonna keep learning it, hope to find solutions. (No, it'll not be unsafe)

Author's profile. Photo taken in VRChat by my friend Tast+
Martin Chang
Systems software, HPC, GPGPU and AI. I mostly write stupid C++ code. Sometimes does AI research. Chronic VRChat addict

I run TLGS, a major search engine on Gemini. Used by Buran by default.

  • marty1885 \at
  • Matrix:
  • Jami: a72b62ac04a958ca57739247aa1ed4fe0d11d2df