All of the String types

Started: January 14, 2026

Finished: March 10, 2026

Released: March 10, 2026

Last Revision: April 16, 2026

String types, can’t live with them, can’t live without them.

I wrote this because I couldn’t find a comprehensive list anywhere else.

Lots of languages have several string types depending on the use case. I come from Rust, where we have three main string types: String, &str, and Vec<char>. That is already a lot more string types than some people would want. So let’s go through every language I could find string types for.

Rust

Rust primarily has String and &str. So what do they do?

None of these types are null-terminated.

A char is a Unicode scalar value: any Unicode code point except the surrogates, stored as a 32-bit integer.
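As a small sketch of what that means in practice (the helper name `describe_char` is made up for this example), a char always occupies 4 bytes in memory, but the same value needs anywhere from 1 to 4 bytes once it is encoded as UTF-8:

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (code point value, bytes needed in UTF-8).
fn describe_char(c: char) -> (u32, usize) {
    (c as u32, c.len_utf8())
}

fn main() {
    // A char is always 4 bytes wide, whatever it holds.
    assert_eq!(size_of::<char>(), 4);

    for c in ['a', 'é', '€', '😀'] {
        let (value, utf8_len) = describe_char(c);
        println!("{}: U+{:04X}, {} byte(s) in UTF-8", c, value, utf8_len);
    }

    // Surrogates are code points but not scalar values, so they are not chars.
    assert!(char::from_u32(0xD800).is_none());
}
```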

String is essentially a growable buffer of UTF-8 encoded bytes, not an array of chars. It behaves like a Vec<u8> with string-specific associated functions. Under the hood String has one field called vec with the type Vec<u8>.
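A minimal sketch of that byte-buffer nature, using only standard library calls (the helper `byte_and_char_len` is a name invented for this example):

```rust
/// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let mut s = String::from("héllo");
    s.push('!'); // grows the underlying buffer, much like Vec::push

    // len() counts UTF-8 bytes, not chars: 'é' takes two bytes.
    let (bytes, chars) = byte_and_char_len(&s);
    println!("{} bytes, {} chars", bytes, chars); // 7 bytes, 6 chars

    // A String can be taken apart into its Vec<u8> and put back together.
    let raw: Vec<u8> = s.into_bytes();
    let back = String::from_utf8(raw).expect("bytes came from a String");
    println!("{}", back);

    // Arbitrary bytes are rejected when they are not valid UTF-8.
    assert!(String::from_utf8(vec![0xFF, 0xFE]).is_err());
}
```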

There is also str, an unsized type that is represented as a sequence of bytes. A str is always valid UTF-8. Unsized in Rust means the size of the type is not known at compile time. Because of that, you can never hold a str by value in a variable; it always has to live behind some kind of pointer. This is the problem that &str solves.

So what is &str? We don’t know the size of the str itself, but a reference to it is a pointer plus a length, and that pair has a fixed size.
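This can be checked directly. The sketch below (with a made-up helper called `parts`) shows that a &str is two machine words wide and exposes its pointer and length:

```rust
use std::mem::size_of;

/// Hypothetical helper: the two things a &str carries around.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    // A &str is two machine words: a pointer and a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let s: &str = "hello"; // string literals have type &'static str
    let (ptr, len) = parts(s);
    println!("pointer = {:?}, length = {}", ptr, len);
}
```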

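There is also &mut str. &mut str is a mutable, non-growable string slice. Its length in bytes cannot change, and its contents must stay valid UTF-8, so safe code mutates it through methods such as make_ascii_uppercase, while writing arbitrary individual bytes requires unsafe code.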
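Here is a short sketch of in-place mutation through &mut str. The function name `shout` is invented for the example; `make_ascii_uppercase` is the standard library method doing the work:

```rust
/// Hypothetical helper: upper-cases the ASCII letters of `s` in place.
fn shout(s: &mut str) {
    s.make_ascii_uppercase();
}

fn main() {
    let mut owned = String::from("héllo");
    let before = owned.len();

    shout(&mut owned); // &mut String coerces to &mut str

    println!("{}", owned); // HéLLO: the non-ASCII 'é' is left alone
    assert_eq!(owned.len(), before); // the byte length never changes
}
```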

String and &str have a special relationship as well. A &str is a string slice, which in Rust terms means it is a reference to all or part of some UTF-8 data, such as a String’s contents or a string literal.
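A minimal sketch of slicing, assuming nothing beyond the standard library (`first_word` is a hypothetical helper). Note that slice indices are byte offsets and must land on character boundaries:

```rust
/// Hypothetical helper: returns the first word of `s` as a slice into it.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned = String::from("hello world");

    let hello: &str = &owned[0..5]; // borrows the first five bytes
    println!("{} / {}", hello, first_word(&owned));

    // No copy was made: the slice points into the String's own buffer.
    assert_eq!(hello.as_ptr(), owned.as_ptr());

    // Cutting through the middle of a multi-byte char is not allowed.
    // Indexing would panic; get() returns None instead.
    assert!("é".get(0..1).is_none());
}
```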

There are also OsString and OsStr, which are intermediates between Rust strings and the OS’s string type.

OsString is to OsStr as String is to str. The representation differs per platform: on Windows it is WTF-8, and on Linux and other Unix-like systems it is plain bytes.
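A small sketch of the owned and borrowed pair in use. The helper `as_unicode` is a made-up name; `to_str` and `to_string_lossy` are the standard conversions back to Rust strings:

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: converts to &str only if the contents are valid Unicode.
fn as_unicode(s: &OsStr) -> Option<&str> {
    s.to_str()
}

fn main() {
    let mut owned = OsString::from("report");
    owned.push(".txt"); // OsString grows, much like String

    let borrowed: &OsStr = owned.as_os_str();
    println!("{:?}", as_unicode(borrowed)); // Some("report.txt")

    // When the contents might not be Unicode, a lossy conversion always works.
    println!("{}", borrowed.to_string_lossy());
}
```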

CString and CStr have the same relationship as String and str. They are stored as a sequence of bytes that ends in a zero byte, with no zero bytes anywhere else.
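The terminator and the no-interior-zero rule can both be seen in a short sketch (`c_bytes` is a hypothetical helper written for this example):

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: the bytes of `s` as C would see them, terminator included.
/// Returns None when `s` contains an interior zero byte.
fn c_bytes(s: &str) -> Option<Vec<u8>> {
    CString::new(s).ok().map(|c| c.as_bytes_with_nul().to_vec())
}

fn main() {
    let owned = CString::new("hi").expect("no interior zero byte");
    let borrowed: &CStr = owned.as_c_str();

    println!("{:?}", borrowed.to_bytes_with_nul()); // [104, 105, 0]
    println!("{:?}", c_bytes("a\0b")); // None
}
```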

C

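char * / char [] is the most common string type in C. It is non-growable, and it is effectively immutable when it points at a string literal, since modifying a literal is undefined behavior. You also don’t always own the memory associated with it. It’s null-terminated with the \0 byte. A char is exactly 1 byte in size by definition, and a byte is at least 8 bits.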

There are times when you want to own the memory and have it be growable and mutable. You can achieve this by allocating a char * buffer yourself (for example with malloc, growing it with realloc), which you then own and have to free.

There is const char *, which doesn’t allow the characters to be modified through that pointer.

There is also char [N], where N is the length of the array. It doesn’t need to be null-terminated, but can be.

There is wchar_t *, where each wchar_t is typically 2 bytes (Windows) or 4 bytes (Linux and macOS). It works the same way as char * and is typically used for UTF-16 or UTF-32.

There are also char16_t, char32_t, and char8_t, which hold UTF-16, UTF-32, and UTF-8 code units respectively (char8_t arrived in C23).

C++

Since C++ is largely a superset of C, it has many of the same features but more!

C++ has std::string, which is a basic_string<char>. So what is basic_string then?

basic_string is a sequence of character-like objects. It does not depend on the character type or on the operations on that type, which makes it extremely generic.

Under the hood basic_string<T> is also growable and each T is mutable. Internally basic_string stores its length as well.

Here is a table of C++ strings and their equivalent types:

Type	Definition
`std::string`	`std::basic_string<char>`
`std::wstring`	`std::basic_string<wchar_t>`
`std::u8string`	`std::basic_string<char8_t>`
`std::u16string`	`std::basic_string<char16_t>`
`std::u32string`	`std::basic_string<char32_t>`

Go

Go has string, []byte, and []rune.

None of these types are null-terminated.

string is a simple string type that in memory is a pointer to bytes with a length. It is immutable and non-growable, as it just exists somewhere in memory.

[]byte is a slice of bytes. It is mutable and can be grown with append. These bytes don’t account for Unicode encoding, which is where the rune comes in.

[]rune is a slice of runes, which is also growable and mutable. A rune represents a Unicode code point and is an alias for int32.

Zig

Zig technically doesn’t have a string type by name, but it does have []const u8, []u8, [:0]const u8, [:0]u8, [N]u8, [N:0]u8, and ArrayList(u8).

[]const u8 is an immutable, non-null-terminated, non-growable slice of bytes. This is roughly equivalent to Rust’s &str or Go’s string.

[]u8 is a mutable, non-growable, non-null-terminated slice of bytes. It is the closest match to Go’s []byte.

[:0]const u8 is an immutable, non-growable, null-terminated slice of bytes. This is more similar to C’s char [] but immutable.

[:0]u8 is a mutable, null-terminated, non-growable slice of bytes. This is equivalent to C’s char [].

[N]u8 is an array of size N that is non-growable, mutable, and non-null-terminated.

[N:0]u8 is null-terminated, and otherwise the same as [N]u8.

ArrayList(u8) is a growable, non-null-terminated, mutable array of bytes.

All of these types are used as strings in Zig, depending on the use case.

Zig strings typically aren’t Unicode-aware by default, and that is by design. We can get around this by using a u21, which is large enough to hold any Unicode code point. You can create an ArrayList(u21), which is a list of Unicode code points similar to Rust’s Vec<char> type.
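For comparison, the Rust side of that analogy can be sketched as follows. This is Rust rather than Zig, and `replace_at` is a hypothetical helper; the point is that a list of code points can be indexed and mutated per code point, at the cost of 4 bytes per element:

```rust
/// Hypothetical helper: replaces the code point at `index`, if there is one.
fn replace_at(s: &str, index: usize, new: char) -> String {
    let mut points: Vec<char> = s.chars().collect();
    if let Some(slot) = points.get_mut(index) {
        *slot = new;
    }
    points.into_iter().collect()
}

fn main() {
    // Index 1 is 'é' here, even though it spans two bytes in UTF-8.
    println!("{}", replace_at("héllo", 1, 'e')); // hello

    let points: Vec<char> = "héllo".chars().collect();
    println!("{} code points vs {} UTF-8 bytes", points.len(), "héllo".len());
}
```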

Java

String is an immutable, non-growable sequence of UTF-16 code units. Since Java 9 it is stored internally as a byte array.

StringBuilder is a mutable, growable sequence of UTF-16 code units. It is typically used for building strings.

There is also StringBuffer, which is like StringBuilder but is also thread-safe.

You can also use char, which is a UTF-16 code unit. It isn’t used much on its own, but you can use a char[] as a string type.
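To see why a 16-bit char is a code unit rather than a whole code point, here is a sketch in Rust (not Java; `utf16_units` is a made-up helper) that counts the UTF-16 code units a string needs. Characters outside the Basic Multilingual Plane take two units, a surrogate pair:

```rust
/// Hypothetical helper: number of 16-bit code units `s` needs in UTF-16.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    println!("{}", utf16_units("a")); // 1
    println!("{}", utf16_units("😀")); // 2

    // U+1F600 is split into a high and a low surrogate.
    let units: Vec<u16> = "😀".encode_utf16().collect();
    assert_eq!(units, [0xD83D, 0xDE00]);
}
```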

C#

In C# there are string and StringBuilder.

C# also has the char type which is 16 bits in width.

string is an immutable, non-growable array of UTF-16 encoded chars. It is also an alias for the System.String class.

Along with this there is StringBuilder, which is almost identical to Java’s StringBuilder. It’s a mutable, growable, non-null-terminated array of characters.

Python

Python string types come in the form of str and list[str].

str is an immutable, non-growable sequence of Unicode code points.

Counterintuitively, list[str] is a mutable, growable list of characters, where each character is itself a str of length one. You can modify this in place.

Swift

Swift has a non-traditional approach to strings. Its string types are String and NSString.

Swift Strings are composed of Characters, each of which is a grapheme cluster. This means that Characters are variable length depending on the character. One Character is one human-perceived character. e is one character, and é is also one character. String is a value type that can be mutated, and its storage is copied on write.
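For contrast, here is a sketch in Rust (not Swift), whose standard library counts scalar values rather than grapheme clusters. The helper `scalar_count` is invented for the example. An é written as e plus a combining accent is one Character in Swift but two chars in Rust:

```rust
/// Hypothetical helper: number of Unicode scalar values in `s`.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let precomposed = "\u{00E9}"; // é as a single code point
    let decomposed = "e\u{0301}"; // e followed by a combining acute accent

    // Both render as é, and Swift would report a count of 1 for each.
    println!("{} -> {} scalar(s)", precomposed, scalar_count(precomposed)); // 1
    println!("{} -> {} scalar(s)", decomposed, scalar_count(decomposed)); // 2
}
```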

Along with Characters there are Unicode.Scalars, which are Unicode scalar values stored in 32 bits, similar to Rust’s char.

There are also UInt16 and UInt8, which are used as UTF-16 and UTF-8 code units.

Swift also has NSString, which is an Objective-C string type that is UTF-16 based and is a reference type. There is a mutable variant of this called NSMutableString.

Pascal

Pascal has String, ShortString, AnsiString, UnicodeString, UTF8String, UTF16String, and WideString.

String can be either a ShortString or an AnsiString, depending on what {$H} is. Unless you’re familiar with Pascal you probably don’t know what {$H} is. {$H} is a compiler directive that controls AnsiStrings. With {$H+}, String is AnsiString; with {$H-}, it is ShortString.

So what is ShortString? ShortStrings have a maximum of 255 characters. The internal structure is an array of characters but the first byte will be the length of the string. Each character is one byte.
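That length-prefixed layout can be sketched in Rust. This is only an illustration of the layout described above, not how any Pascal compiler implements it, and the type `ShortStr` and its methods are invented for the example:

```rust
/// Hypothetical type mimicking the ShortString layout:
/// byte 0 holds the length, bytes 1..=255 hold the characters.
struct ShortStr {
    data: [u8; 256],
}

impl ShortStr {
    /// Returns None when `text` does not fit in 255 one-byte characters.
    fn new(text: &[u8]) -> Option<ShortStr> {
        if text.len() > 255 {
            return None;
        }
        let mut data = [0u8; 256];
        data[0] = text.len() as u8; // the length fits in one byte
        data[1..1 + text.len()].copy_from_slice(text);
        Some(ShortStr { data })
    }

    /// Reading the length is a single byte load, with no scan for a terminator.
    fn len(&self) -> usize {
        self.data[0] as usize
    }

    fn as_bytes(&self) -> &[u8] {
        &self.data[1..1 + self.len()]
    }
}

fn main() {
    let s = ShortStr::new(b"hello").expect("fits in 255 bytes");
    println!("{} bytes: {:?}", s.len(), s.as_bytes());

    // One byte of length means 255 is the hard limit.
    assert!(ShortStr::new(&[b'x'; 256]).is_none());
}
```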

AnsiStrings have no length limit, are reference counted, and are always null-terminated. Internally there are 8 bytes dedicated to the reference count and 8 bytes dedicated to the length of the string. AnsiStrings are allocated on the heap as well. Each character is also one byte.

UnicodeString has the same structure as AnsiString, but instead of normal characters, which are one byte each, it uses WideChars, which are 2 bytes in size.

In older versions of Free Pascal (before FPC 2.7.1) UTF8String was an alias for AnsiString. In more recent versions (FPC 2.7.1 and above) UTF8String is defined as UTF8String = type AnsiString(CP_UTF8). Each character can now be 1 to 4 bytes in length.

UTF16String is an alias for WideString. What is WideString then?

WideString has a layout of 8 bytes dedicated to the length, followed by a sequence of characters. The sequence is null-terminated. Each character is a UTF-16 code unit, making it 2 bytes in size.

Haskell

I do not think I can do it justice so I suggest reading: https://hasufell.github.io/posts/2024-05-07-ultimate-string-guide.html
