Using static (global) ArcSwap with thread_local Cache

thargy opened this issue 3 years ago

Hi,

I'm new to Rust, so forgive me if this is a dumb question. The examples in the Common Patterns and Cache docs both use scoped variables, whereas I'd like to use caching with the scenario mentioned in Motivation.

Namely:

static CURRENT_CONFIG: Lazy<ArcSwap<Config>> = Lazy::new(|| {ArcSwap::from_pointee(Config::default())});

My suspicion is that the best approach is to use thread_local! to hold a Cache per thread. However:

thread_local! {
    static THREAD_CACHE:Cache<Arc<arc_swap::ArcSwapAny<Config>>, Config> = Cache::new(Arc::clone(Lazy::force(&CURRENT_CONFIG)));
}

This fails to compile, even though the same code is accepted in a local scope:

    |
71  | / thread_local! {
72  | |     static THREAD_CACHE:Cache<Arc<arc_swap::ArcSwapAny<Config>>, Config> = Cache::new(Arc::clone(Lazy::force(&CURRENT_CONFIG)));
73  | | }
    | |_^ the trait `RefCnt` is not implemented for `Config`
    | 
   ::: C:\Users\tharg\.cargo\registry\src\github.com-1ecc6299db9ec823\arc-swap-1.3.0\src\lib.rs:293:26
    |
293 |   pub struct ArcSwapAny<T: RefCnt, S: Strategy<T> = DefaultStrategy> {
    |                            ------ required by this bound in `ArcSwapAny`
    |
    = note: this error originates in the macro `$crate::__thread_local_inner` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to 4 previous errors

For more information about this error, try `rustc --explain E0277`.
error: could not compile `configuration`

To learn more, run the command again with --verbose.

This is because Config expects an Arc<...>, not an ArcSwap<...>. All the other examples use an Arc<ArcSwap<...>>, but the static example uses Lazy<...> instead of Arc<...>.

Is this even the recommended approach? Will dereferencing a thread-local Cache be faster than accessing the shared CURRENT_CONFIG?

Craig Dean · Answer 1 · Tue Aug 17 2021 21:16:37 GMT+0800 (China Standard Time)

I found this code in your benchmarks, which led me to this approach:

thread_local! {
    static THREAD_CACHE:Cache<&'static arc_swap::ArcSwapAny<Arc<Config>>, Arc<Config>> =Cache::from(&CURRENT_CONFIG as &ArcSwap<Config>);
}

Is this right? If so, perhaps you could add an example to the Cache documentation?

Craig Dean · Answer 2 · Tue Aug 17 2021 21:55:19 GMT+0800 (China Standard Time)

OK, I've managed to at least get it 'working' without a Cache:

use arc_swap::{ArcSwap, Guard};
use once_cell::sync::Lazy;
use std::sync::Arc;
use unic_langid::{langid, LanguageIdentifier};

#[cfg(test)]
mod tests;

#[derive(Debug, Clone)]
struct Config {
    pub debug_mode: bool,
    pub language: LanguageIdentifier,
}

#[allow(dead_code)]
impl Config {
    /// Resets the config to its default state.
    pub fn reset() {
        CURRENT_CONFIG.store(Arc::new(Config::default()));
    }

    pub fn current() -> Guard<Arc<Config>> {
        //THREAD_CACHE.with(|c| c.load())
        CURRENT_CONFIG.load()
    }

    pub fn update(self) {
        CURRENT_CONFIG.store(Arc::from(self));
    }

    pub fn set_debug_mode(debug_mode: bool) {
        /*
        THREAD_CACHE.with(|c| {
            let existing_config = c.load().as_ref();
            if existing_config.debug_mode != debug_mode {
                let mut new_config = existing_config.clone();
                new_config.debug_mode = debug_mode;
                CURRENT_CONFIG.store(Arc::new(new_config));
            }
        });*/
        let c = CURRENT_CONFIG.load();
        if c.debug_mode == debug_mode {
            return;
        }
        let mut a = c.as_ref().clone();

        a.debug_mode = debug_mode;
        CURRENT_CONFIG.store(Arc::from(a));
    }

    pub fn set_language(language: &str) {
        let l: LanguageIdentifier = language.parse().expect("Could not set language!");
        let c = CURRENT_CONFIG.load();
        if c.language == l {
            return;
        }
        let mut a = c.as_ref().clone();

        a.language = l;
        CURRENT_CONFIG.store(Arc::from(a));
    }
}

impl Default for Config {
    fn default() -> Self {
        Config {
            debug_mode: false,
            language: langid!("en-US"),
        }
    }
}

static CURRENT_CONFIG: Lazy<ArcSwap<Config>> =
    Lazy::new(|| ArcSwap::from_pointee(Config::default()));
/*
thread_local! {
    static THREAD_CACHE:Cache<&'static arc_swap::ArcSwapAny<Arc<Config>>, Arc<Config>> =Cache::from(&CURRENT_CONFIG as &ArcSwap<Config>);
}*/

I would really appreciate any help on how to improve this.

Michal 'vorner' Vaner · Answer 3 · Wed Aug 18 2021 01:58:57 GMT+0800 (China Standard Time)

Hello

I think you got lost in the fact that Cache's second type parameter is not the ArcSwap, but the thing that's inside the ArcSwap (e.g. Arc<whatever>). Besides, Cache needs mutable access to load.

Would this fragment help you?

use std::sync::Arc;
use std::ops::Deref;
use std::cell::RefCell;

use arc_swap::ArcSwap;
use arc_swap::cache::Cache;
use once_cell::sync::Lazy;

#[derive(Debug, Default)]
struct Config;

static CURRENT_CONFIG: Lazy<ArcSwap<Config>> = Lazy::new(|| ArcSwap::from_pointee(Config::default()));

thread_local! {
    static CACHE: RefCell<Cache<&'static ArcSwap<Config>, Arc<Config>>> = RefCell::new(Cache::from(CURRENT_CONFIG.deref()));
}

fn main() {
    CACHE.with(|c| {
        println!("{:?}", c.borrow_mut().load());
    });
}

What's the thing you'd appreciate in the examples? Something like this, with explicit types, thread-local storage and therefore the RefCell?

As for the performance, I think it should be faster ‒ even the ArcSwap internals access some thread locals and do some (relatively) expensive atomic operations. The Cache avoids the latter in the optimistic case when nothing changed. But as with any performance-sensitive code, you'd better measure it to verify (the chances are that the difference could be so small it wouldn't be worth the added complexity).

Craig Dean · Answer 4 · Wed Aug 18 2021 03:00:37 GMT+0800 (China Standard Time)

Thank you for responding so quickly! I clearly have a lot to learn as I hadn't really come across RefCell yet 🤯! Your example at least compiles now.
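
For anyone else meeting RefCell here for the first time, a minimal std-only sketch of why it is needed: thread_local!'s with only hands the closure a shared reference, while Cache::load takes &mut self, so something has to provide the mutable access at run time. FakeCache below is a hypothetical stand-in for arc_swap's Cache, used only so that the snippet has no dependencies.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for `arc_swap::cache::Cache`: like the real one,
/// its `load` takes `&mut self` because it may update internal state.
struct FakeCache {
    loads: u32,
}

impl FakeCache {
    fn load(&mut self) -> u32 {
        self.loads += 1;
        self.loads
    }
}

thread_local! {
    // `with` only gives out `&RefCell<FakeCache>`; `borrow_mut` turns that
    // into a `&mut FakeCache`, with the borrow rules checked at run time.
    static CACHE: RefCell<FakeCache> = RefCell::new(FakeCache { loads: 0 });
}

/// Calls `load` twice on this thread's cache and returns the second result.
fn load_twice() -> u32 {
    CACHE.with(|c| {
        c.borrow_mut().load();
        c.borrow_mut().load()
    })
}

fn main() {
    println!("loads so far on this thread: {}", load_twice());
}
```

Each borrow_mut here is released at the end of its statement. Holding one across a nested call that borrows the same RefCell again would panic at run time rather than fail to compile.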

If I understand right then I can do my set_XXX methods like so:

    pub fn set_debug_mode(debug_mode: bool) {
        THREAD_CACHE.with(|c| {
            let mut b = c.borrow_mut();
            let config = b.load();
            if config.debug_mode == debug_mode {
                return;
            }
            let mut a = config.as_ref().clone();
    
            a.debug_mode = debug_mode;
            CURRENT_CONFIG.store(Arc::from(a));
        });
    }

Which I think grabs the cached value to check for changes. If it has changed it clones it as mutable, modifies the value and then stores the new value.

Similarly, the make_current method stays as is:

    pub fn make_current(self) {
        CURRENT_CONFIG.store(Arc::from(self));
    }

As there is no need to check the thread_local copy before updating the static shared value.

However, I'm not sure how best to implement current(), is this right?

    pub fn current() -> Arc<Config> {
        THREAD_CACHE.with(|c| c.borrow_mut().load().clone())
    }

Am I right in thinking that this doesn't actually 'clone' the Config inside the Arc<Config>, but just increments the reference count, so it should be fast?
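
On the cloning question, a minimal std-only sketch of what Arc::clone does: it copies the pointer and bumps the strong count, and the value behind it is not copied. The Config here is a one-field placeholder that deliberately does not implement Clone, and clone_shares_allocation is a name made up for this example.

```rust
use std::sync::Arc;

// Placeholder config; note that it does not derive Clone.
#[derive(Debug, Default)]
struct Config {
    debug_mode: bool,
}

/// Returns (do both Arcs point at the same allocation?, strong count).
/// (Hypothetical helper, named for this sketch only.)
fn clone_shares_allocation() -> (bool, usize) {
    let original = Arc::new(Config::default());
    // Only the Arc is cloned: a pointer copy plus an atomic increment
    // of the strong count. The Config itself is never duplicated.
    let copy = Arc::clone(&original);
    assert!(!copy.debug_mode);
    (Arc::ptr_eq(&original, &copy), Arc::strong_count(&original))
}

fn main() {
    let (same, count) = clone_shares_allocation();
    println!("same allocation: {}, strong count: {}", same, count);
}
```

So no Config is copied, but the increment is still an atomic read-modify-write on memory shared between threads, which makes it cheap rather than free.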

"What's the thing you'd appreciate in the examples? Something like this, with explicit types, thread-local storage and therefore the RefCell?"

Yes, assuming my code is now right, a complete example like this would really show how Cache can be used alongside a static for sharing a singleton globally (like a configuration struct):

use arc_swap::{ArcSwap, Cache};
use once_cell::sync::Lazy;
use std::{cell::RefCell, ops::Deref, sync::Arc};

#[cfg(test)]
mod tests;

#[derive(Default, Debug, Clone)]
pub struct Config {
    pub debug_mode: bool,
}

#[allow(dead_code)]
impl Config {
    /// Resets the config to its default state.
    ///
    /// # Examples
    ///
    /// ```Rust
    /// Config::reset();
    /// ```
    pub fn reset() {
        CURRENT_CONFIG.store(Arc::new(Config::default()));
    }

    /// Gets an immutable copy of the current state from thread local storage.
    ///
    /// # Examples
    ///
    /// ```Rust
    /// let config = Config::current();
    /// ```
    pub fn current() -> Arc<Config> {
        THREAD_CACHE.with(|c| c.borrow_mut().load().clone())
    }

    /// Sets the configuration as the current one.
    ///
    /// # Examples
    ///
    /// ```Rust
    /// Config{ debug_mode: true }.make_current();
    /// ```
    pub fn make_current(self) {
        CURRENT_CONFIG.store(Arc::from(self));
    }

    /// Updates the `debug_mode` of the current configuration.
    ///
    /// # Examples
    ///
    /// ```Rust
    /// Config::set_debug_mode(true);
    /// ```
    pub fn set_debug_mode(debug_mode: bool) {
        THREAD_CACHE.with(|c| {
            let mut b = c.borrow_mut();
            let config = b.load();
            if config.debug_mode == debug_mode {
                return;
            }
            let mut a = config.as_ref().clone();

            a.debug_mode = debug_mode;
            CURRENT_CONFIG.store(Arc::from(a));
        });
    }
}

/// Holds the shared current configuration
static CURRENT_CONFIG: Lazy<ArcSwap<Config>> =
    Lazy::new(|| ArcSwap::from_pointee(Config::default()));

thread_local! {
    /// Caches a copy of the current configuration per thread for speed.
    static THREAD_CACHE: RefCell<Cache<&'static ArcSwap<Config>, Arc<Config>>> = RefCell::new(Cache::from(CURRENT_CONFIG.deref()));
}

Craig Dean · Answer 5 · Wed Aug 18 2021 03:06:42 GMT+0800 (China Standard Time)

"As for the performance, I think it should be faster ‒ even the ArcSwap internals access some thread locals and do some (relatively) expensive atomic operations. The Cache avoids the latter in the optimistic case when nothing changed. But as with any performance-sensitive code, you'd better measure it to verify (the chances are that the difference could be so small it wouldn't be worth the added complexity)."

I've still to learn how best to do performance benchmarking; you have some examples in your codebase, but they look quite complicated! Once I've figured it out I will benchmark. For now, the above code hides the ArcSwap and thread_local! usage from consumers, so I should be able to change it without impacting usage elsewhere.

I've also written a localization module, which exposes a macro that calls Config::current().language every time you look up a string, so my hope is that current() is fast and cheap.

Michal 'vorner' Vaner · Answer 6 · Mon Aug 23 2021 02:37:52 GMT+0800 (China Standard Time)

Hello

A few notes:

- When setting things, you can probably skip thread-local storage altogether: storing is already expensive, so the cost of the load gets hidden in it. Besides, when adjusting the existing value, you may want to prefer rcu.
- For current(), cloning the Arc is probably about as expensive as, or more expensive than, a load without the cache (it is a read-write atomic operation on a shared memory location, so there will be contention and ping-pong between CPU cores). You'll want to avoid that: do the operations inside THREAD_CACHE.with and don't clone. I'm not sure you actually care about this low-level speed difference; if not, you can probably skip the thread-local caching altogether.
- The easiest benchmarking is usually running a longer computation end-to-end and measuring that. You could probably write a micro-benchmark around Config::current().language or so, but that leaves out the comparison with the rest of the program.