Something I see regularly when engineers adopt Rust, almost regardless of their
level of experience, is confusion between &str and String. To add to the
confusion, the two are used in different situations, a String can transparently
convert to a &str under the right conditions, and there is generally an optimal
choice to be made.
An important fact to notice is that &str is a reference, which means it does
not own the memory holding the string; it is merely a pointer to a string
stored elsewhere. This means it has a known size (the same as two usize values:
a pointer and a length) and can thus be stored on the stack instead of on the
heap, but this limits where it can be passed: it can only be alive as long as
its source is also alive.
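As a minimal sketch of the points above, the following checks the size of a &str and shows that a borrowed slice points into memory owned by something else. The helper name str_ref_size is made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: the size in bytes of a `&str` value itself
/// (not of the text it points to).
fn str_ref_size() -> usize {
    size_of::<&str>()
}

fn main() {
    // A &str is a pointer plus a length, so its size is known at compile time.
    assert_eq!(str_ref_size(), 2 * size_of::<usize>());

    let owner = String::from("hello");
    let view: &str = &owner;

    // The view does not copy the text; it points at the buffer owned by `owner`.
    assert_eq!(view.as_ptr(), owner.as_ptr());
    assert_eq!(view.len(), owner.len());

    // Dropping `owner` here and using `view` afterwards would be rejected
    // by the borrow checker, because `view` cannot outlive its source.
    println!("{view}");
}
```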
It is worth mentioning that most of this also holds for Vec vs. slice.
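To illustrate the parallel, here is a small sketch with a hypothetical function total that borrows a slice; it accepts a Vec, an array, or a sub-slice without taking ownership of any of them.

```rust
/// Hypothetical helper: sums a borrowed slice without owning it.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let owned: Vec<i32> = vec![1, 2, 3];

    // A &Vec<i32> coerces to &[i32], just like &String coerces to &str.
    assert_eq!(total(&owned), 6);

    // Arrays and sub-slices work as well.
    assert_eq!(total(&[4, 5]), 9);
    assert_eq!(total(&owned[1..]), 5);
}
```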
Function Arguments
When writing a function, the choice depends on whether you need an owned String
inside the function or whether a &str will do. In general, we want to be as
permissive as possible when accepting arguments, which means accepting a &str.
fn show(s: &str) {
    println!("{s}");
}
show("foo"); // a string literal is already a &'static str
show(&String::from("foo")); // a &String coerces to &str
This works because String has an implementation for Deref<Target = str>, which
means it can be dereferenced into a &str that is just a pointer to its
internal string slice. This in turn implies that the &str generated from
dereferencing is only alive as long as the String is alive, as discussed
above.
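The coercion can also be written out by hand. The sketch below uses a hypothetical function char_count to show the implicit form next to two explicit equivalents; all three borrow the same String.

```rust
/// Hypothetical helper that only needs to read the text.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let owned = String::from("hello");

    // Implicit: the compiler applies Deref to turn &String into &str.
    let implicit = char_count(&owned);

    // Explicit equivalents of the same conversion.
    let via_method = char_count(owned.as_str());
    let via_deref = char_count(&*owned);

    assert_eq!(implicit, 5);
    assert_eq!(via_method, implicit);
    assert_eq!(via_deref, implicit);
}
```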
Should the function require a String, for example to insert it into a struct,
the best solution is to accept an impl Into<String>.
struct Foo {
    s: String,
}
impl Foo {
    fn new(s: impl Into<String>) -> Self {
        Self {
            s: s.into(),
        }
    }
}
Foo::new("foo");
Foo::new(String::from("foo"));
This interface allows the caller to pass either type. If they pass a String,
the .into() is the identity conversion: the String is simply moved, with no new
allocation. In comparison, accepting a &str and constructing a String within
the function would be similarly permissive, but wasteful if the caller already
had a String.
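One way to see the difference is to compare buffer addresses. In this sketch, from_str_slice is a hypothetical alternative constructor added only for comparison; it is not part of the interface described above.

```rust
struct Foo {
    s: String,
}

impl Foo {
    fn new(s: impl Into<String>) -> Self {
        Self { s: s.into() }
    }

    /// Hypothetical alternative that always allocates a fresh String.
    fn from_str_slice(s: &str) -> Self {
        Self { s: s.to_owned() }
    }
}

fn main() {
    // Passing a String by value: the existing heap buffer is moved into Foo.
    let owned = String::from("foo");
    let original_ptr = owned.as_ptr();
    let moved = Foo::new(owned);
    assert_eq!(moved.s.as_ptr(), original_ptr);

    // Passing a &str: a new buffer is allocated while the caller's String
    // stays alive, so the two addresses differ.
    let kept = String::from("bar");
    let copied = Foo::from_str_slice(&kept);
    assert_ne!(copied.s.as_ptr(), kept.as_ptr());
    assert_eq!(copied.s, kept);
}
```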
Function Return Values
When returning a string from a function, the decision depends a bit more on
context. Return a &str in one of two situations:
- You are returning a static string that you don’t expect to be mutated later on, such as an error or log message.
fn foo() -> &'static str {
    "foo"
}
- You are extracting the returned string from one of the function arguments,
such as a substring matching some precondition. In this case it is usually
prudent not to pay the cost of allocating a new String if the caller might not
need it, and to leave it up to the caller instead.
fn start(s: &str) -> &str {
    // Panics if s is shorter than 5 bytes or if byte 5 is not a char boundary.
    &s[..5]
}
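The second case can be sketched a little more defensively. The variant below is an assumption for illustration, not the function from the text: it takes the prefix length as a parameter n, counts characters rather than bytes, and falls back to the whole input when it is too short. The caller then decides whether an owned copy is worth allocating.

```rust
/// Hypothetical variant of `start`: returns the first `n` characters of `s`,
/// borrowed from the input, or all of `s` if it has `n` characters or fewer.
fn start(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset of the character after the prefix.
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}

fn main() {
    let text = String::from("hello world");

    // No allocation: the prefix borrows from `text`.
    let prefix = start(&text, 5);
    assert_eq!(prefix, "hello");

    // The caller allocates only if an owned copy is really needed.
    let owned: String = prefix.to_owned();
    assert_eq!(owned, "hello");

    // Short and non-ASCII inputs do not panic.
    assert_eq!(start("hi", 5), "hi");
    assert_eq!(start("héllo wörld", 5), "héllo");
}
```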
Otherwise, return a String. The compiler will often force you to do so when
appropriate.
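A typical case where a String is the only option is a function that builds new text. The function greeting below is a hypothetical example: the result does not exist in any argument, so there is nothing to borrow from.

```rust
/// Hypothetical helper: builds a new string, so it must return an owned String.
fn greeting(name: &str) -> String {
    format!("Hello, {name}!")
}

// Returning a reference to the local value instead would not compile,
// because the String would be dropped at the end of the function:
//
// fn greeting(name: &str) -> &str {
//     let s = format!("Hello, {name}!");
//     &s
// }

fn main() {
    let message = greeting("Ferris");
    assert_eq!(message, "Hello, Ferris!");
    println!("{message}");
}
```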
Struct Fields
Struct fields mostly match the second rule of function return values: embed
a &str in a struct only if the struct will relate to the origin of the &str.
(Embedding static strings in a struct is possible, but I have yet to see a
use case for it.)
Again, thinking about finding substrings, one could imagine this struct:
struct SubstringMatch<'a> {
    start: usize,
    text: &'a str,
}
The compiler will force you to spell out the lifetimes, commonly <'a> by
convention, which in this case dictates that the entire SubstringMatch struct
can only be alive as long as the string data referenced by the text field is;
otherwise we would have a pointer to freed memory. This means SubstringMatch is
only useful in contexts where the original string is also alive, which is not
an unreasonable assumption. Incidentally, it also means the original string
cannot be mutated while a SubstringMatch is holding a reference to it.
In return we gain some efficiency by embedding a pointer into the original
string instead of constructing a new String every time, using up memory to
store duplicate strings and paying the performance penalty for extra
allocations.
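A short sketch of how such a struct might be produced and used follows. The function find_match and the derived traits are assumptions added for illustration; only the struct definition comes from the text above.

```rust
#[derive(Debug, PartialEq)]
struct SubstringMatch<'a> {
    start: usize,
    text: &'a str,
}

/// Hypothetical helper: finds the first occurrence of `needle` in `haystack`.
/// The result borrows from `haystack`, which the lifetime 'a expresses.
fn find_match<'a>(haystack: &'a str, needle: &str) -> Option<SubstringMatch<'a>> {
    let start = haystack.find(needle)?;
    Some(SubstringMatch {
        start,
        text: &haystack[start..start + needle.len()],
    })
}

fn main() {
    let source = String::from("hello world");
    let m = find_match(&source, "world").expect("needle is present");

    assert_eq!(m.start, 6);
    assert_eq!(m.text, "world");

    // Mutating `source` here, for example with source.push('!'), would be
    // rejected by the compiler as long as `m` is still used afterwards.
    println!("{m:?}");
}
```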