Rust Programming: Strings

Rust has a strong emphasis on memory safety, and this extends to how it handles strings. Unlike some other languages, Rust has a few different types for working with strings, each with its own characteristics and use cases. This document will cover the core concepts.

1. String vs. &str

This is the most fundamental distinction to understand.

  • String: A growable, heap-allocated, owned string. Think of it as a vector of bytes that's guaranteed to be valid UTF-8. The String owns its data, and Rust frees the buffer automatically when the String goes out of scope. A String can be modified only through a mutable binding (let mut) or a mutable reference.
  • &str: A string slice. It's a reference to a sequence of UTF-8 encoded bytes. It doesn't own the data; it borrows it from somewhere else (like a String or a string literal). A &str is a read-only view: you cannot modify the underlying text through it.

Analogy:

Imagine a house.

  • String is like owning the house. You can renovate it, add rooms, etc.
  • &str is like renting a room in the house. You can look at it, but you can't change the house itself.

Example:

fn main() {
    // String (owned)
    let mut s = String::from("hello");
    s.push_str(", world!"); // Mutable - can modify
    println!("{}", s); // Output: hello, world!

    // &str (string slice - borrowed)
    let greeting: &str = "hello"; // String literal - creates a &str
    println!("{}", greeting); // Output: hello

    let part = &s[0..5]; // Create a &str slice from the String
    println!("{}", part); // Output: hello
}
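
The difference between borrowing and owning also shows up in function signatures. The sketch below uses two hypothetical helpers, first_word and add_exclamation (names invented for illustration), to show that a borrowed value stays usable afterwards while a moved String does not.

```rust
// Hypothetical helper: borrows a string slice and returns a sub-slice of it.
fn first_word(text: &str) -> &str {
    // split_whitespace() yields borrowed sub-slices; nothing is allocated.
    text.split_whitespace().next().unwrap_or("")
}

// Hypothetical helper: takes ownership of a String, grows it, and hands it back.
fn add_exclamation(mut text: String) -> String {
    text.push('!');
    text
}

fn main() {
    let owned = String::from("hello world");

    // Borrowing: `owned` is only lent out, so it can still be used afterwards.
    println!("{}", first_word(&owned)); // Output: hello
    println!("{}", owned); // Output: hello world

    // Moving: ownership passes into the function, so `owned` is no longer usable here.
    let louder = add_exclamation(owned);
    println!("{}", louder); // Output: hello world!
}
```

Uncommenting a use of owned after the call to add_exclamation would be rejected by the compiler, which is how Rust enforces the "one owner" rule at compile time.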

2. Creating Strings

  • String Literals: String literals (e.g., "hello") are of type &str.
  • String::from(): Creates a String from a string literal or another &str.
  • String::new(): Creates an empty String.
  • to_string() method: Any type that implements the Display trait (including &str, numbers, and char) has a to_string() method that converts it to a String.
  • String::with_capacity(): Creates an empty String with a pre-allocated capacity, measured in bytes. This can avoid repeated reallocations if you know the approximate size of the string beforehand.

fn main() {
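    let s1 = String::from("hello");
    let s2: String = "world".to_string();
    let s3 = String::new(); // Empty; allocates nothing yet
    let s4 = String::with_capacity(10); // Empty, with room for at least 10 bytes

    println!("{} {}", s1, s2); // Output: hello world
    println!("{} {}", s3.is_empty(), s4.capacity() >= 10); // Output: true true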
}
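
To make the capacity point concrete, here is a minimal sketch that computes the final size before building a string. The helper join_words is hypothetical and exists only for this illustration.

```rust
// Hypothetical helper: joins words with single spaces, reserving space up front.
fn join_words(words: &[&str]) -> String {
    // Bytes needed: every word plus one space between neighbouring words.
    let total: usize =
        words.iter().map(|w| w.len()).sum::<usize>() + words.len().saturating_sub(1);
    let mut out = String::with_capacity(total);
    for (i, word) in words.iter().enumerate() {
        if i > 0 {
            out.push(' ');
        }
        out.push_str(word);
    }
    out
}

fn main() {
    // Capacity is reserved space; the length stays 0 until something is pushed.
    let empty = String::with_capacity(10);
    println!("{}", empty.len()); // Output: 0
    println!("{}", empty.capacity() >= 10); // Output: true

    let sentence = join_words(&["pre", "allocated", "string"]);
    println!("{}", sentence); // Output: pre allocated string
}
```

Because the buffer is sized once, the loop should not need to reallocate while appending. Whether that matters in practice depends on how large and how frequent the strings are.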

3. String Manipulation

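The methods below fall into two groups. Those that change a String in place (push_str() through pop()) require a mutable String. The rest are defined on str, so they work on both String and &str, and they return a new value instead of modifying the original. All indices are byte offsets, not character positions.

  • push_str(): Appends a &str to a String.
  • push(): Appends a single character to a String.
  • insert_str(): Inserts a &str at a given byte index.
  • insert(): Inserts a character at a given byte index.
  • remove(): Removes and returns the character at a given byte index. Like insert() and insert_str(), it panics if the index is out of bounds or does not lie on a character boundary.
  • pop(): Removes and returns the last character of a String as an Option<char>; it returns None if the String is empty.
  • replace(): Returns a new String with all occurrences of a pattern replaced by another string.
  • trim(): Returns a &str slice with leading and trailing whitespace removed.
  • split(): Splits a string into an iterator of substrings based on a delimiter.
  • contains(): Checks if a string contains a substring.
  • starts_with()/ends_with(): Checks if a string starts or ends with a substring.
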
fn main() {
    let mut s = String::from("hello");

    s.push_str(", world!");
    println!("{}", s); // Output: hello, world!

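    s.insert_str(5, " beautiful"); // 5 is a byte index
    println!("{}", s); // Output: hello beautiful, world!

    // Byte index 5 now holds the space that was just inserted
    let removed_char = s.remove(5);
    println!("Removed char: {:?}", removed_char); // Output: Removed char: ' '
    println!("{}", s); // Output: hellobeautiful, world!

    s.insert(5, ' '); // Put the space back
    println!("{}", s); // Output: hello beautiful, world!

    // trim() returns a slice; there is no surrounding whitespace here, so nothing changes
    let trimmed = s.trim();
    println!("{}", trimmed); // Output: hello beautiful, world!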

    for part in s.split(", ") {
        println!("{}", part);
    }
    // Output:
    // hello beautiful
    // world!

    if s.contains("world") {
        println!("String contains 'world'");
    }
}
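
Several of the methods listed above return a new value or an Option rather than changing the string in place. The following sketch illustrates this with two hypothetical helpers, normalize and strip_trailing_newline, which are not part of the standard library.

```rust
// Hypothetical helper: cleans up a raw line without touching the original.
fn normalize(raw: &str) -> String {
    // trim() returns a borrowed sub-slice; replace() allocates a new String.
    raw.trim().replace("  ", " ")
}

// Hypothetical helper: drops one trailing newline, if there is one.
fn strip_trailing_newline(mut line: String) -> String {
    if line.ends_with('\n') {
        line.pop(); // the returned Option<char> is not needed here
    }
    line
}

fn main() {
    let mut s = String::from("hi!");
    // pop() returns Option<char>, so an empty String yields None instead of panicking.
    println!("{:?}", s.pop()); // Output: Some('!')
    s.clear();
    println!("{:?}", s.pop()); // Output: None

    let original = String::from("  hello  world  ");
    let cleaned = normalize(&original);
    println!("[{}]", cleaned); // Output: [hello world]
    println!("[{}]", original); // Output: [  hello  world  ]
    println!("{}", cleaned.starts_with("hello")); // Output: true

    println!("[{}]", strip_trailing_newline(String::from("line\n"))); // Output: [line]
}
```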

4. String Formatting

The format! macro builds a new String from a template and a list of arguments, using the same placeholder syntax as println!.

fn main() {
    let name = "Alice";
    let age = 30;

    let message = format!("My name is {} and I am {} years old.", name, age);
    println!("{}", message); // Output: My name is Alice and I am 30 years old.

    // Using named arguments
    let message2 = format!("{name} is {age} years old.", name = "Bob", age = 25);
    println!("{}", message2); // Output: Bob is 25 years old.
}
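
Placeholders can also carry width, alignment, and precision specifiers. The sketch below shows a few common ones; the helper price_row is hypothetical and only serves as an example.

```rust
// Hypothetical helper: formats one row of a price list.
fn price_row(item: &str, price: f64) -> String {
    // {:<8} left-aligns in 8 columns; {:>7.2} right-aligns in 7 columns with 2 decimals.
    format!("{:<8}|{:>7.2}", item, price)
}

fn main() {
    println!("{}", price_row("tea", 3.5));     // Output: tea     |   3.50
    println!("{}", price_row("coffee", 12.0)); // Output: coffee  |  12.00

    // {:05} pads an integer with leading zeros to a width of 5.
    let padded = format!("{:05}", 42);
    println!("{}", padded); // Output: 00042

    // {:?} uses the Debug representation, which quotes and escapes strings.
    let debug = format!("{:?}", "a\tb");
    println!("{}", debug); // Output: "a\tb"
}
```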

5. UTF-8 and Characters

Rust strings are UTF-8 encoded, meaning they can represent characters from any language. However, this also means that a single character may take anywhere from one to four bytes.

  • chars(): Returns an iterator over the characters (Unicode scalar values, type char) in a string. This is the correct way to iterate over characters, as it handles multi-byte characters correctly. Note that what a reader perceives as one character, such as a letter built from combining marks, can span several char values.
  • bytes(): Returns an iterator over the bytes in a string.
  • len(): Returns the number of bytes in a string, not the number of characters.
  • chars().count(): Returns the number of characters in a string. This walks the whole string, so it takes time proportional to its length.

fn main() {
    let s = "你好,世界!"; // Chinese characters

    // Four 3-byte Chinese characters plus the 1-byte ',' and '!' make 14 bytes
    println!("Length in bytes: {}", s.len()); // Output: Length in bytes: 14
    println!("Length in chars: {}", s.chars().count()); // Output: Length in chars: 6

    for c in s.chars() {
        println!("{}", c);
    }
    // Output:
    // 你
    // 好
    // ,
    // 世
    // 界
    // !
}
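
When you need both the character and its byte position, char_indices() pairs them up. The sketch below uses it in a hypothetical helper, first_chars, to cut a string after a given number of characters without splitting a multi-byte character.

```rust
// Hypothetical helper: returns the first `n` characters of `text` as a slice.
fn first_chars(text: &str, n: usize) -> &str {
    // char_indices() yields (byte_offset, char); entry `n` marks where to cut.
    match text.char_indices().nth(n) {
        Some((byte_offset, _)) => &text[..byte_offset],
        None => text, // fewer than `n` characters: return everything
    }
}

fn main() {
    // \u{e9} is 'é', which takes two bytes in UTF-8.
    let s = "h\u{e9}llo";

    println!("{}", s.len()); // Output: 6
    println!("{}", s.chars().count()); // Output: 5

    // Byte offsets jump from 1 to 3 because 'é' occupies bytes 1 and 2.
    let offsets: Vec<usize> = s.char_indices().map(|(i, _)| i).collect();
    println!("{:?}", offsets); // Output: [0, 1, 3, 4, 5]

    // bytes() exposes the raw UTF-8 encoding.
    let encoded: Vec<u8> = "\u{e9}".bytes().collect();
    println!("{:?}", encoded); // Output: [195, 169]

    println!("{}", first_chars(s, 2)); // Output: hé
}
```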

6. Converting Between String and &str

  • String to &str: Borrow the String with the & operator. A &String is converted to &str automatically through deref coercion; you can also call as_str() to be explicit.

    let s = String::from("hello");
    let slice: &str = &s; // Borrowing the String as a &str
    
  • &str to String: Use to_string() or String::from().

    let slice: &str = "world";
    let s: String = slice.to_string(); // or String::from(slice);
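
A few more conversion forms are worth knowing. The sketch below collects them in one place; byte_len and make_label are hypothetical helpers written for this example, and the Into<String> bound is one possible design, not the only one.

```rust
// Hypothetical helper: accepts any string slice.
fn byte_len(text: &str) -> usize {
    text.len()
}

// Hypothetical helper: accepts anything that converts into an owned String.
fn make_label<S: Into<String>>(name: S) -> String {
    let mut label: String = name.into();
    label.push(':');
    label
}

fn main() {
    let owned = String::from("hello");

    // Three equivalent ways to view a String as &str.
    println!("{}", byte_len(&owned)); // deref coercion
    println!("{}", byte_len(owned.as_str())); // explicit method
    println!("{}", byte_len(&owned[..])); // full-range slice

    // Three equivalent ways to turn a &str into a String.
    let a: String = "world".to_string();
    let b: String = "world".to_owned();
    let c: String = "world".into();
    println!("{}", a == b && b == c); // Output: true

    // A generic parameter lets callers pass either type.
    println!("{}", make_label("id")); // Output: id:
    println!("{}", make_label(owned)); // Output: hello:
}
```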
    

Best Practices

  • Prefer &str when possible: If a function only needs to read the text, take a &str parameter. It accepts string literals, slices, and borrowed Strings without extra allocations or ownership transfers.
  • Use String::with_capacity() when appropriate: If you know the approximate size of the string beforehand, pre-allocating capacity can improve performance.
  • Be mindful of UTF-8: Use chars() when you need to iterate over characters. Rust does not allow indexing a string with a single integer (s[0] does not compile), and slicing with a byte range such as &s[0..2] panics if either end falls inside a multi-byte character. Use get() when you want an Option instead of a panic.
  • Consider using string interning: For frequently used strings, consider using a string interning library to reduce memory usage.
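
The UTF-8 advice can be sketched as follows. The helper safe_prefix is hypothetical; it wraps the standard get() method, which returns None instead of panicking when a byte range is out of bounds or cuts through a character.

```rust
// Hypothetical helper: takes the first `end` bytes, or None if that is not a valid cut.
fn safe_prefix(text: &str, end: usize) -> Option<&str> {
    text.get(..end)
}

fn main() {
    let s = "你好"; // two characters, three bytes each

    println!("{:?}", safe_prefix(s, 3)); // Output: Some("你")
    println!("{:?}", safe_prefix(s, 1)); // Output: None
    println!("{:?}", safe_prefix(s, 7)); // Output: None

    // is_char_boundary() reports whether a byte offset is a safe place to cut.
    println!("{}", s.is_char_boundary(3)); // Output: true
    println!("{}", s.is_char_boundary(1)); // Output: false

    // By contrast, `&s[..1]` would panic at runtime, so it is left out here.
}
```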

This covers the essential aspects of working with strings in Rust. Remember to choose the appropriate string type (String or &str) based on your needs and to be mindful of UTF-8 encoding. The Rust documentation provides more detailed information and advanced features: https://doc.rust-lang.org/std/string/struct.String.html (String) and https://doc.rust-lang.org/std/primitive.str.html (str).