P0618R0: “…The entire header <codecvt> (which does not contain the class codecvt!) is deprecated, as are the utilities wstring_convert and wbuffer_convert. These features are hard to use correctly, and there are doubts about whether they are even specified correctly. Users should use dedicated text-processing libraries instead…”
(In a rush? For the comprehensive “all in one” solution, please go here.)
Therefore: as of C++17, the <codecvt> header is officially deprecated, and so are wstring_convert and wbuffer_convert. Deprecated means still shipped, but on its way out. And there is this highly suspicious “text-processing libraries” advice. Panic? Please don’t. My advice? “Stay cool, calm and collected, and all things will fall into place”. Read on.
Enter Standard C++
Let us assume you want to transform from, let’s say, a wide string type to std::string. Here is the standard C++ (17 and beyond) solution:
    /*
    Transform any std string or string view
    into any of the 4 std string types,
    (c) 2018-2022 by dbj at dbj dot org
    https://dbj.org/license_dbj or CC BY SA 4.0
    */
    #include <iterator> // std::begin, std::end
    #include <string>
    #include <string_view>

    template<typename T, typename F>
    inline T transform_to ( F str ) noexcept
    {
        // note: F has to have
        // the empty() method
        if (str.empty())
            return {};
        // note: F must be able to work
        // with std begin and end
        return { std::begin(str), std::end(str) };
        // also the above line requires, T has a constructor
        // that will take begin and end values of type F.
    }
Just one function. Almost simple. I could have optimized it by checking if the type to be transformed is the same as the target type, but I will bravely speculate no sane programmer will transform from std::string to std::string.
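For the curious, the same-type shortcut mentioned above could look roughly like the sketch below. It is not part of the original solution: the names transform_to_shortcut and shortcut_demo are made up for illustration, and the sketch assumes C++17 for if constexpr.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical variant of transform_to(), for illustration only.
// When source and target are the same type the argument is handed
// back as is, and no per-character copy is made.
template <typename T, typename F>
inline T transform_to_shortcut(F str)
{
    if constexpr (std::is_same_v<T, F>) {
        return str; // same type: nothing to transform
    } else {
        if (str.empty()) return {};
        return { std::begin(str), std::end(str) };
    }
}

// Hypothetical usage check; aborts if an expectation fails.
inline void shortcut_demo()
{
    // std::string to std::string takes the shortcut branch
    assert(transform_to_shortcut<std::string>(std::string("abc")) == "abc");
    // narrow view to wide string takes the copying branch
    assert(transform_to_shortcut<std::wstring>(std::string_view("abc")) == L"abc");
}
```

Whether the extra branch is worth it is a judgment call; for the same-type case it saves one range copy and nothing else.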
NOTE 1: This is the standard C++ way. For WIN32 aficionados this is apparently not exactly the way. They would need to use something like the C++/WinRT to_string.
NOTE 2: In case you see nothing wrong with this approach: it is indeed standard. And somewhat controversial, at the same time. This code is doing nothing but casting the chars from one std char type to another. And this works for the first 128 code points (the ASCII range, 0 to 127), that is, for English-speaking users and developers. But not for the others. For a good introductory text please see here.
[Update 2021-09-05] Here is the link to the whys and hows of Unicode, with a focus on Windows. I, at last, managed to find the time.
NOTE 3: In case you have thoughts like: “How do I transform utf8 to utf16 ..”, or in case you are looking for a true text internationalization and localization solution for your project, please start from here.
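To make the limitation from NOTE 2 concrete, here is a small sketch. The helper names narrow_by_cast and narrow_by_cast_demo are hypothetical; the helper does the same element-by-element cast as transform_to(), fixed to the wide-to-narrow direction. It assumes a mainstream compiler where casting an out-of-range value to char keeps the low 8 bits (guaranteed only since C++20).

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: element-wise cast, wide to narrow.
inline std::string narrow_by_cast(std::wstring_view wide)
{
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide)
        out.push_back(static_cast<char>(wc)); // no encoding step, just a cast
    return out;
}

// Hypothetical usage check; aborts if an expectation fails.
inline void narrow_by_cast_demo()
{
    // ASCII survives: every code unit is below 128
    assert(narrow_by_cast(L"Hello") == "Hello");

    // U+0161 (s with caron) is the two bytes 0xC5 0xA1 in UTF-8.
    // The cast yields a single char instead, so the result is not
    // a valid encoding of the original character.
    const std::string mangled = narrow_by_cast(L"\u0161");
    assert(mangled.size() == 1);
    assert(mangled != "\xC5\xA1");
}
```

So the cast is fine for plain ASCII text and silently lossy for everything else, which is exactly why the notes above point to proper encoding conversion.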
Usage
    using namespace std;
    using namespace std::string_view_literals;
    using namespace std::string_literals;

    // from wide char string view to std::string
    string ss_ = transform_to<string>(L"WIDE CHAR STRING VIEW"sv);

    // from narrow char string view to wide char string
    wstring ws_ = transform_to<wstring>("CHAR STRING VIEW"sv);

    // from std string to wide std string
    wstring ws_2 = transform_to<wstring>("CHAR STRING"s);
Yes, I am using C++ string literals. They are brilliant inventions. The transform_to() function argument will take anything that is a std string or std string view; the first template argument names the target string type.
    template<typename T, typename F>
    inline T transform_to ( F str ) noexcept;
The F type has to be a std string or std string_view. All 5 std string types, and their matching views, will work. A little reminder of who they are follows:
    using string = basic_string<char, char_traits<char>, allocator<char>>;
    using wstring = basic_string<wchar_t, char_traits<wchar_t>, allocator<wchar_t>>;
    using u16string = basic_string<char16_t, char_traits<char16_t>, allocator<char16_t>>;
    using u32string = basic_string<char32_t, char_traits<char32_t>, allocator<char32_t>>;
    //
    // char8_t is a very deep rabbit hole
    // https://stackoverflow.com/questions/57402464/is-c20-char8-t-the-same-as-our-old-char
    // my advice is not to use it
    using u8string = basic_string<char8_t, char_traits<char8_t>, allocator<char8_t>>;
I could add type mismatch traps, in this code here. I decided not to. In case anybody uses this with the wrong types she will be greeted with very long compiler errors. Thus the mistake will be obvious.
Also, I could have done a lot more template jockeying in here, using std::enable_if and such. Again, I have decided that is counter-intuitive for the majority of readers/users and achieves little. Illegal usage will simply be stopped by the compiler.
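For readers who do want such a trap, here is one way it might look. This is a sketch, not something the solution above needs: the traits is_std_string and is_std_string_view, and the functions transform_to_checked and checked_demo, are names invented for this example. It uses static_assert rather than std::enable_if, which keeps the error message to one readable line.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical traits: true only for std::basic_string and
// std::basic_string_view specializations respectively.
template <typename> struct is_std_string : std::false_type {};
template <typename C, typename Tr, typename A>
struct is_std_string<std::basic_string<C, Tr, A>> : std::true_type {};

template <typename> struct is_std_string_view : std::false_type {};
template <typename C, typename Tr>
struct is_std_string_view<std::basic_string_view<C, Tr>> : std::true_type {};

// Hypothetical guarded variant of transform_to().
template <typename T, typename F>
inline T transform_to_checked(F str)
{
    static_assert(is_std_string<T>::value,
                  "target type must be a std string");
    static_assert(is_std_string<F>::value || is_std_string_view<F>::value,
                  "source type must be a std string or std string view");
    if (str.empty()) return {};
    return { std::begin(str), std::end(str) };
}

// Hypothetical usage check; aborts if an expectation fails.
inline void checked_demo()
{
    assert(transform_to_checked<std::wstring>(std::string("abc")) == L"abc");
    assert(transform_to_checked<std::string>(std::wstring_view(L"abc")) == "abc");
    // transform_to_checked<std::string>(42); // would stop at the static_assert
}
```

The price is two extra traits for a mistake the compiler would reject anyway, only less politely.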
Let’s deal with the natives
Back to the subject. The above solution will not work for native string literals. Try it. What should we do? We could stop people trying to compile those pesky native string literals. How? By deleting the overload that has a pointer argument:
    // stop the use with pointers
    template<typename T, typename F>
    T transform_to( F * str) noexcept = delete;

    // the following is now a compiler error
    // required overload is explicitly deleted
    auto s_ = transform_to<string>(L"WIDE NATIVE LITERAL");

    char chararr[]{"narrow char array"};
    // also does not compile, since chararr
    // automatically decays to char *
    auto s2_ = transform_to<wstring>(chararr);
Great! I could bar any native literals usage and “force” users into standard C++ and standard C++ string and view literals only.
But that is not very beginner-friendly. Also, I like to provide comfortable APIs. So here is the overload that takes care of native string literals.
    template<typename T, typename F, size_t N>
    T transform_to( const F (&str)[N]) noexcept
    {
        // N counts the terminating zero too, so an array of
        // one element is an empty string: nothing to transform
        if constexpr (N <= 1) {
            return {};
        }
        else {
            // else transform and return, leaving out the
            // terminating zero (assumes str is zero terminated)
            return { std::begin(str), std::end(str) - 1 };
        }
    }
In standard C/C++, a native string literal is compiled into a char array, with the terminating zero included. So far we have two functions. One solution. If you are fond of “advanced”: for a version that consists of one function and does all of this, plus any other standard character sequence type, please jump here.
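One detail worth keeping in mind with native literals: the array includes the terminating zero, so copying the full std::begin / std::end range copies the terminator as well. The sketch below makes the difference visible. All names in it (array_extent, copy_whole_array, copy_without_terminator, literal_demo) are hypothetical, and the second helper simply assumes the last array element is the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Number of elements in a native array, terminator included.
template <typename C, std::size_t N>
constexpr std::size_t array_extent(const C (&)[N]) noexcept { return N; }

// Hypothetical helper: copies the whole array range, terminator and all.
template <typename T, typename C, std::size_t N>
inline T copy_whole_array(const C (&arr)[N])
{
    return T(std::begin(arr), std::end(arr));
}

// Hypothetical helper: leaves out the last element, which is
// assumed (not checked) to be the terminating zero.
template <typename T, typename C, std::size_t N>
inline T copy_without_terminator(const C (&arr)[N])
{
    return T(std::begin(arr), std::end(arr) - 1);
}

// Hypothetical usage check; aborts if an expectation fails.
inline void literal_demo()
{
    assert(array_extent("abc") == 4);                                // 'a' 'b' 'c' '\0'
    assert(copy_whole_array<std::string>("abc").size() == 4);        // trailing '\0' kept
    assert(copy_without_terminator<std::string>("abc").size() == 3); // plain "abc"
    assert(copy_without_terminator<std::wstring>("abc") == L"abc");
}
```

A string with a stray trailing zero prints the same on most terminals, but it compares unequal to the string without one, so this is an easy bug to miss.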
Testing
Now let us imagine the solution sketched here is all implemented. Here is one (almost) comprehensive test “suite”:
    // the transform_to() overloads from above
    // are assumed to be defined before this point
    #include <iostream>
    #include <string>
    #include <string_view>

    #define ST(x) #x
    // std::cout can not print wide, u16 or u32 strings,
    // so the result is transformed to std::string for display
    #define TT(x) \
        std::cout << "\n" << ST(x) << "\n\t-->" << transform_to<std::string>(x)

    using namespace std;
    using namespace std::string_view_literals;
    using namespace std::string_literals;

    template<typename T>
    void transformation_test()
    {
        TT(transform_to<T>(L"WIDE CHAR STRING VIEW"sv));
        TT(transform_to<T>(L"WIDE CHAR STRING"s));
        TT(transform_to<T>("CHAR STRING VIEW"sv));
        TT(transform_to<T>("CHAR STRING"s));
        char carr[]{ "CHAR ARRAY" };
        TT(transform_to<T>(carr));
        TT(transform_to<T>("NATIVE LITERAL"));
        TT(transform_to<T>(L"WIDE NATIVE LITERAL"));
    }

    // using the above
    int main()
    {
        // transform to all 4 std strings
        // ignoring the 5th trouble maker
        transformation_test<string>();
        transformation_test<wstring>();
        transformation_test<u16string>();
        transformation_test<u32string>();
        return 0;
    }
To be 100% comprehensive there are more tests one can imagine here. I am sure if you have been reading until this point, you will understand they might be redundant.
And that is it. No codecvt in sight. Enjoy the standard C++.
CAVEAT EMPTOR: I am consciously avoiding char8_t. My reasons are explained here.