What is “small string optimization”?
A standard C++ string stores its characters on the heap, but only once it grows beyond an implementation-dependent size. For std::string that threshold is (or was) 15 characters for MSVC and GCC, and 22 for Clang's libc++ on 64-bit targets (23 bytes if you count the terminating '\0'). That is: the C++ string stays “small” as long as you have not asked for more than that many characters. While it fits, the text lives in a buffer inside the string object itself and the string does not touch the heap.
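To see where a given standard library draws that line, one can check whether the character buffer sits inside the string object itself. The following is only a sketch: `is_stored_inline` is a hypothetical helper invented for this illustration, and it assumes a mainstream implementation (MSVC, libstdc++, libc++) that keeps the small buffer inside the object.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of the standard library.
// Returns true when the character buffer lies inside the bytes of the
// string object itself, i.e. the string is using its small buffer.
inline bool is_stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end   = object_begin + sizeof(std::string);
    const auto buffer       = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}
```

On those implementations a five-character string should report true and a 100-character string false; growing a test string one character at a time shows the exact threshold for your toolchain.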
Heap memory allocations and de-allocations take a lot of time compared to most standard C runtime calls.
Thus, if you avoid them, your program will usually run faster and may consume less memory.
In the case of strings (plural: there are several predefined string types in C++), you do this by always making strings of a certain “smallish” predefined size, so that the majority of string usage in your program does not keep going back to the heap, yet still operates on usable strings.
So, in essence, you always want to create a string of a certain usable size/capacity before it is used. And yes, 15 is very small. So, basically, each time you need a string you have to explicitly reserve a larger one and then use it. That is tedious, error-prone and easy to forget or skip.
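The manual "reserve first, then use" routine can be sketched as follows. This is an illustration under assumptions, not a benchmark: `appends_without_reallocation` is a hypothetical helper that reserves storage once and then reports whether the buffer ever had to move while characters were appended.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only.
// Reserves `reserved` characters, appends `count` characters one by one,
// and returns false as soon as the string moves to a different buffer.
inline bool appends_without_reallocation(std::size_t reserved, std::size_t count) {
    std::string text;
    text.reserve(reserved);                 // one up-front request for storage
    const char* const original_buffer = text.data();
    for (std::size_t i = 0; i < count; ++i) {
        text.push_back('x');
        if (text.data() != original_buffer) {
            return false;                   // the string had to reallocate
        }
    }
    return true;
}
```

With 255 characters reserved, appending 200 characters stays in the same buffer. Without the reservation, the same 200 appends outgrow the small buffer and force at least one move to the heap.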
For you, I have prepared a string utility function that encapsulates making a string of a predefined size. This is how one would use it:
```cpp
/* (c) 2018 by dbj.org */
auto optimized_small_string = dbj::str::optimal<char>();
/*
   the size of the above string is 255 and its capacity is at least 255
   it is also initialized with 255 ends of strings aka '\0'
*/
```
If you or your team always use this to create strings, it is very likely the resulting programs will be faster and will take less memory. The predefined size in there is 255. It probably took many meetings to arrive at it, but 255 is still an arbitrary size.
For any program, you should try different sizes and measure the results. Of course, any fundamental character type can be used too. A few examples:
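```cpp
// note: in debug builds, sizes above dbj::str::optimal_string_size (255)
// trip the assert inside optimal()
auto os2 = dbj::str::optimal<wchar_t>( 1024 );
auto os3 = dbj::str::optimal<char16_t>( 512 , u'=');
auto os4 = dbj::str::optimal<char32_t>( 128 , U'+' );
```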
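Since the advice is to measure, here is one rough way to compare candidate sizes. It is a minimal sketch: `time_construction` is a hypothetical helper, and plain `std::string(size, '\0')` stands in for `dbj::str::optimal<char>(size)` so the block is self-contained. Real measurements should use your actual workload and an optimized build.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only.
// Times `iterations` constructions of a zero-filled string of `size`
// characters; std::string(size, '\0') stands in for optimal<char>(size).
inline std::chrono::nanoseconds time_construction(std::size_t size,
                                                  std::size_t iterations) {
    volatile std::size_t sink = 0;   // keeps the loop from being optimized away
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        std::string text(size, '\0');
        if (!text.empty()) {
            text.front() = static_cast<char>('a' + i % 26);  // touch the buffer
        }
        sink = sink + text.size();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Calling it with, say, 15, 255 and 1024 and the same iteration count gives a first impression of how the construction cost changes with size on your machine. The numbers vary from run to run, so repeat the measurement several times.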
Please try this little utility. You can sometimes achieve dramatic gains in speed and memory consumption.
And this is the code for your little library of utilities.
```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace dbj::str {

// probably the 0xFF has surfaced as a winner after
// many weeks of glorious battles
constexpr inline std::size_t optimal_string_size = 0xFF;

/*
(c) dbj@dbj.org, https://dbj.org/license_dbj
Make a string optimized for small sizes
*/
template <
    typename CT,
    typename string_type = std::basic_string< CT >,
    typename char_type = typename string_type::value_type,
    typename size_type = typename string_type::size_type
>
inline string_type optimal
(
    size_type OPT_SIZE = optimal_string_size,
    char_type init_char_ = static_cast<char_type>(0)
)
{
    // in DEBUG builds stop the rebellious behaviour;
    // <= (not <) so that the default size itself passes
    assert( OPT_SIZE <= optimal_string_size );
    return string_type( OPT_SIZE, init_char_ );
}

} // dbj::str
```
How does this work? The code uses the std::basic_string constructor that takes a count and a character: it allocates storage for the requested size up front and fills it with that character. It does not rely on the built-in small string optimization at all.
Caveat Emptor
Of course, that is not the actual small string optimization tamed. The whole std::basic_string<> machinery stays inside, ready to start managing the internal dynamic storage as soon as you step out of bounds. And that is where it is back to slow.
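One way to watch that machinery at work is to count allocations. The sketch below is an assumption-laden illustration: `counting_allocator`, `counted_string`, `allocation_count` and the two helper functions are hypothetical names made up here, and the counts assume a release-mode mainstream standard library (some debug modes allocate extra bookkeeping).

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Global tally of heap requests made through counting_allocator.
inline std::size_t& allocation_count() {
    static std::size_t count = 0;
    return count;
}

// Minimal allocator that forwards to std::allocator and counts allocations.
template <typename T>
struct counting_allocator {
    using value_type = T;
    counting_allocator() = default;
    template <typename U>
    counting_allocator(const counting_allocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        ++allocation_count();
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) noexcept { return false; }

using counted_string =
    std::basic_string<char, std::char_traits<char>, counting_allocator<char>>;

// Allocations needed just to build a zero-filled string of `size` characters.
inline std::size_t allocations_to_construct(std::size_t size) {
    const std::size_t before = allocation_count();
    counted_string text(size, '\0');
    return allocation_count() - before;
}

// Allocations triggered when a pre-sized string is pushed one character
// past its current capacity.
inline std::size_t allocations_when_outgrown(std::size_t initial_size) {
    counted_string text(initial_size, '\0');
    const std::size_t before = allocation_count();
    text.append(text.capacity() - text.size() + 1, 'x');
    return allocation_count() - before;
}
```

Building the 255-character string costs at least one allocation, and stepping past its capacity costs at least one more. Everything done within the pre-sized bounds happens without going back to the allocator.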