Module: Concurrency

Rust Concurrency: Shared State

Rust's ownership and borrowing system is a powerful tool for preventing data races in concurrent programs. However, sometimes you need to share data between threads. This is where things get more complex, and Rust provides several mechanisms to manage shared state safely.

The Problem: Data Races

A data race occurs when multiple threads access the same memory location concurrently, and at least one of them is writing, without any synchronization. This can lead to unpredictable and often difficult-to-debug behavior.

Rust's compiler prevents data races at compile time through its ownership and borrowing rules. However, those rules become restrictive when multiple threads need access to the same mutable data, which is exactly the situation the mechanisms below address.

Mechanisms for Shared State

Rust offers several ways to safely share state between threads:

  1. Mutex<T> (Mutual Exclusion Lock)
  2. RwLock<T> (Read-Write Lock)
  3. Atomic types (AtomicUsize, AtomicBool, etc.)
  4. Channels (Message Passing) - While not strictly shared state, they're a crucial concurrency primitive.

Let's explore each of these in detail.

1. Mutex<T>

  • Purpose: Provides exclusive access to data. Only one thread can hold the lock at a time.
  • How it works: A Mutex wraps a value T. To access the value, a thread must acquire the lock. Once acquired, no other thread can acquire the lock until the first thread releases it.
  • Safety: Guarantees that only one thread can modify the data at any given time, preventing data races.
  • Usage:
use std::sync::{Mutex, Arc};
use std::thread;

fn main() {
    // Wrap the shared data in a Mutex.
    let counter = Arc::new(Mutex::new(0)); // Arc for sharing across threads

    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter); // Clone the Arc, not the Mutex!
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap(); // Acquire the lock
            *num += 1; // Modify the data
            // Lock is automatically released when `num` goes out of scope
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap()); // Access the final result
}
  • Key Points:
    • Arc<Mutex<T>>: Arc (Atomic Reference Counting) is used to share ownership of the Mutex across multiple threads. Without Arc, the first thread::spawn would take ownership of the Mutex, and the program would not compile because later iterations (and the final println!) still need it.
    • lock().unwrap(): Acquires the lock. unwrap() handles potential poisoning (explained below).
    • Lock Guard: The lock() method returns a MutexGuard. This guard provides access to the underlying data. The lock is automatically released when the MutexGuard goes out of scope.
    • Poisoning: If a thread panics while holding the lock, the Mutex becomes poisoned. Subsequent attempts to lock the poisoned Mutex return an Err, signalling that the data may have been left in an inconsistent state. unwrap() will panic in this case, and expect() does the same with a custom message; to recover instead, handle the Err explicitly (PoisonError::into_inner() returns the guard regardless of poisoning).
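The recovery path mentioned above can be written out explicitly. A minimal sketch, assuming the data is still usable after the panicking thread's partial update (the helper name lock_or_recover is ours, not part of the standard library):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Lock the mutex, recovering the data even if a previous holder panicked.
fn lock_or_recover(data: &Arc<Mutex<Vec<i32>>>) -> Vec<i32> {
    match data.lock() {
        Ok(guard) => guard.clone(),
        // The PoisonError still contains the guard; we assume the data
        // inside is usable despite the poisoning.
        Err(poisoned) => poisoned.into_inner().clone(),
    }
}

fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));

    // Poison the mutex: this thread panics while holding the lock.
    let poisoner = Arc::clone(&data);
    let _ = thread::spawn(move || {
        let _guard = poisoner.lock().unwrap();
        panic!("panicked while holding the lock");
    })
    .join(); // Err(_) because the thread panicked; ignored here

    println!("Recovered data: {:?}", lock_or_recover(&data));
}
```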

2. RwLock<T>

  • Purpose: Allows multiple readers or a single writer.
  • How it works: RwLock allows multiple threads to read the data concurrently, but only one thread can write to the data at a time. This is useful when reads are much more frequent than writes.
  • Safety: Prevents data races by ensuring exclusive write access.
  • Usage:
use std::sync::{RwLock, Arc};
use std::thread;

fn main() {
    let data = Arc::new(RwLock::new(vec![1, 2, 3]));

    let mut handles = vec![];

    // Multiple readers
    for i in 0..5 {
        let data = Arc::clone(&data);
        let handle = thread::spawn(move || {
            let read_guard = data.read().unwrap();
            println!("Reader {}: {:?}", i, *read_guard);
        });
        handles.push(handle);
    }

    // Single writer. Note: the clone gets its own name so that `data`
    // itself is not moved into the closure and stays usable below.
    let writer_data = Arc::clone(&data);
    let handle = thread::spawn(move || {
        let mut write_guard = writer_data.write().unwrap();
        write_guard.push(4);
        println!("Writer: Added 4");
    });
    handles.push(handle);

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final data: {:?}", *data.read().unwrap());
}
  • Key Points:
    • read(): Acquires a read lock. Multiple threads can hold read locks simultaneously.
    • write(): Acquires a write lock. Only one thread can hold a write lock at a time, and no threads can hold read locks while a write lock is held.
    • Performance: RwLock can be more efficient than Mutex when reads are much more frequent than writes.
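The exclusion rules above can be observed directly, even on a single thread, with the non-blocking try_write variant, which returns Err instead of blocking when the lock is unavailable. A minimal sketch:

```rust
use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(5);

    // Many read guards can coexist.
    let r1 = lock.read().unwrap();
    let r2 = lock.read().unwrap();
    assert_eq!(*r1 + *r2, 10);

    // While any read guard is alive, a write lock cannot be acquired.
    assert!(lock.try_write().is_err());
    drop(r1);
    drop(r2);

    // With all readers gone, the write lock succeeds.
    *lock.write().unwrap() += 1;
    assert_eq!(*lock.read().unwrap(), 6);
    println!("Final value: {}", *lock.read().unwrap());
}
```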

3. Atomic Types

  • Purpose: Provides atomic operations on primitive types.
  • How it works: Atomic types guarantee that operations on the value are performed as a single, indivisible unit. This avoids data races without the need for explicit locking.
  • Safety: Guarantees atomicity, preventing data races for simple operations.
  • Usage:
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Arc shares ownership of the counter across threads; the atomic
    // operations themselves need no lock.
    let counter = Arc::new(AtomicUsize::new(0));

    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            for _ in 0..1000 {
                counter.fetch_add(1, Ordering::SeqCst); // Atomic increment
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", counter.load(Ordering::SeqCst));
}
  • Key Points:
    • AtomicUsize, AtomicI32, etc.: Rust provides atomic wrappers for various primitive types.
    • fetch_add(), load(), store(), compare_exchange(): Atomic operations.
    • Ordering: Specifies the memory ordering constraints. SeqCst (Sequential Consistency) is the strongest and most intuitive ordering, but it can be the slowest. Other orderings (e.g., Relaxed, Acquire, Release) offer different performance trade-offs. Choosing the correct ordering is crucial for performance and correctness. Understanding memory ordering is a complex topic.
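For updates more complex than a plain increment, compare_exchange is typically used in a retry loop: read the current value, compute the new one, and retry if another thread changed the value in between. A sketch of this pattern for an atomic running maximum (update_max is our illustrative helper, not a standard-library function):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Atomically raise `max` to `value` if `value` is larger, using a
// compare-exchange retry loop.
fn update_max(max: &AtomicUsize, value: usize) {
    let mut current = max.load(Ordering::Relaxed);
    while value > current {
        match max.compare_exchange(current, value, Ordering::SeqCst, Ordering::Relaxed) {
            // Our update won the race; done.
            Ok(_) => break,
            // Another thread changed the value first; retry against it.
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let max = AtomicUsize::new(0);
    for v in [3, 7, 2, 9, 4] {
        update_max(&max, v);
    }
    println!("max = {}", max.load(Ordering::SeqCst)); // max = 9
}
```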

4. Channels (Message Passing)

While not directly shared state, channels are a fundamental concurrency primitive that often avoids the need for shared state altogether.

  • Purpose: Allows threads to communicate by sending and receiving messages.
  • How it works: A channel has a sender and a receiver. The sender sends messages, and the receiver receives them.
  • Safety: Avoids data races by transferring ownership of data between threads.
  • Usage: (See separate documentation on channels for a more detailed example)
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        let val = String::from("hello");
        tx.send(val).unwrap();
    });

    let received = rx.recv().unwrap();
    println!("Got: {}", received);
}
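The "mpsc" in std::sync::mpsc stands for multiple producer, single consumer: the sender can be cloned so several threads can send into one channel. A minimal sketch (collect_ids is our illustrative helper):

```rust
use std::sync::mpsc;
use std::thread;

// Spawn `n` producer threads that each send their id into one channel,
// then collect everything on the receiving side.
fn collect_ids(n: i32) -> Vec<i32> {
    let (tx, rx) = mpsc::channel();

    for id in 0..n {
        let tx = tx.clone(); // each producer gets its own sender
        thread::spawn(move || {
            tx.send(id).unwrap();
        });
    }
    drop(tx); // drop the original sender so the channel can close

    // iter() blocks until every sender has been dropped.
    let mut received: Vec<i32> = rx.iter().collect();
    received.sort(); // arrival order is nondeterministic
    received
}

fn main() {
    println!("Got: {:?}", collect_ids(3)); // Got: [0, 1, 2]
}
```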

Choosing the Right Mechanism

  • Mutex: General-purpose locking for exclusive access. Use when you need to protect complex data structures from concurrent modification.
  • RwLock: Optimized for scenarios with many readers and few writers.
  • Atomic types: For simple, atomic operations on primitive types. Avoids locking overhead.
  • Channels: Often the best choice when you can avoid shared state altogether. Promotes a more robust and maintainable concurrent design.

Important Considerations

  • Deadlock: Occurs when two or more threads are blocked indefinitely, waiting for each other to release locks. Carefully design your locking strategy to avoid deadlocks.
  • Livelock: Similar to deadlock in effect, but the threads are not blocked; they stay busy, repeatedly retrying operations that keep failing (for example, two threads that each back off and retry in lockstep), so no progress is made.
  • Performance: Locking can introduce overhead. Minimize the time spent holding locks. Consider using atomic operations or channels when appropriate.
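One common deadlock-avoidance rule is to acquire multiple locks in a single, globally agreed order. A sketch (the helper run and the 2-thread setup are ours for illustration):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Two threads each need both locks. Because every thread locks `a`
// before `b`, neither can end up holding one lock while waiting on a
// thread that holds the other: no deadlock is possible.
fn run() -> i32 {
    let a = Arc::new(Mutex::new(1));
    let b = Arc::new(Mutex::new(2));

    let mut handles = vec![];
    for _ in 0..2 {
        let (a, b) = (Arc::clone(&a), Arc::clone(&b));
        handles.push(thread::spawn(move || {
            // Fixed order: always `a`, then `b`. If one thread locked
            // b-then-a instead, the two could deadlock.
            let ga = a.lock().unwrap();
            let mut gb = b.lock().unwrap();
            *gb += *ga;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }

    let result = *b.lock().unwrap();
    result
}

fn main() {
    println!("b = {}", run()); // b = 4 (2 + 1 + 1)
}
```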