C++: codecvt deprecated. Panic?

So little C++ so much good!

P0618R0: “…The entire header <codecvt> (which does not contain the class codecvt!) is deprecated, as are the utilities wstring_convert and wbuffer_convert. These features are hard to use correctly, and there are doubts about whether they are even specified correctly. Users should use dedicated text-processing libraries instead…”

(update: for the comprehensive “all in one” solution please head here)

Therefore, as of C++17, the <codecvt> header is officially deprecated. Not removed yet, but clearly on its way out. And there is this highly suspicious “text-processing libraries” advice. Panic? Please don’t.

Does this apply to you? My advice: “Stay cool, calm and collected, and all things will fall into place”. Read on.

Enter Standard C++

Let us assume you want to transform from, let’s say, a wide string type to std::string. Here is the standard C++ (17 and beyond) solution:

Just one function. Almost simple. I could have optimized it by checking whether the type to be transformed is the same as the target type, but I will speculate that no sane programmer will transform from std::string to std::string.
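A minimal sketch of such a function follows. It assumes only what this text states: the name transform_to, a target std string type T and a source std string or string_view type F. The loop with an explicit static_cast is an assumption made for illustration and is not necessarily the exact dbj listing.

```cpp
#include <string>
#include <string_view>

// T: target std string type (std::string, std::wstring, ...).
// F: source std string or std string_view type.
template <typename T, typename F>
inline T transform_to(F str)
{
    T result;
    result.reserve(str.size());
    // Each code unit is cast to the target character type, nothing more.
    for (auto ch : str)
        result.push_back(static_cast<typename T::value_type>(ch));
    return result;
}
```

The target type is always spelled out and the source type is deduced, as in transform_to<std::string>(some_wide_string).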

NOTE1: This is a standard C++ way that does not cover the full variety of Unicode glyphs. WIN32 aficionados can use something like the C++/WinRT to_string and to_hstring. But, as all of you already know, those only convert between wchar_t-based strings and UTF-8. Alas, that is not a portable solution. See NOTE3.

NOTE2: In case you see nothing wrong with this approach: it is indeed standard. And somewhat controversial, at the same time. This code does nothing but cast each character from one C++ std character type to another. That works for the 128 ASCII characters (codes 0 to 127), that is, for English-speaking users and developers. But not for the others. For a good introductory text please see here.
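To see the limitation concretely, here is a small sketch that applies the same per-character cast to a non-ASCII character. The helper name cast_narrow is hypothetical and exists only for this illustration.

```cpp
#include <string>

// Hypothetical helper: the same per-character cast, wide to narrow only.
inline std::string cast_narrow(const std::wstring& wide)
{
    std::string out;
    for (wchar_t ch : wide)
        out.push_back(static_cast<char>(ch)); // one char out per wchar_t in
    return out;
}

// ASCII survives the cast:
//   cast_narrow(L"abc") == "abc"
// Cyrillic capital ZHE (U+0416) does not. Its UTF-8 form is the two bytes
// 0xD0 0x96, but the cast always yields exactly one byte per input character:
//   cast_narrow(L"\u0416").size() == 1
```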

NOTE3: In case you have thoughts like “How do I transform UTF-8 to UTF-16?”, the standard header <cuchar> is exactly what you need. Alas, it is still not fully implemented as C++20 specifies. Not in any of the three major compilers.
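For a taste of <cuchar>, here is a hedged sketch built on std::mbrtoc32. The helper name multibyte_to_utf32 is hypothetical. Two assumptions matter: the input is decoded using the current C locale (so it is read as UTF-8 only after a UTF-8 locale has been set, and locale names are platform specific), and the implementation encodes char32_t as UTF-32. In the default "C" locale only plain ASCII is safe. std::mbrtoc16 is the UTF-16 sibling; it works the same way, except that a surrogate pair is delivered over two calls.

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <string>
#include <string_view>

// Hypothetical helper: decode a multibyte string (in the current C locale's
// encoding) into char32_t code points. Stops at the first invalid or
// truncated sequence.
inline std::u32string multibyte_to_utf32(std::string_view in)
{
    std::u32string out;
    std::mbstate_t state{};
    const char* next = in.data();
    std::size_t left = in.size();
    while (left > 0) {
        char32_t code_point{};
        const std::size_t rc = std::mbrtoc32(&code_point, next, left, &state);
        // The error codes (size_t)-1, -2 and -3 are all larger than `left`.
        if (rc > left)
            break;
        out.push_back(code_point);
        // rc == 0 means a null character was decoded from one input byte.
        const std::size_t used = (rc == 0) ? 1 : rc;
        next += used;
        left -= used;
    }
    return out;
}
```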

As a remedy, I have found a very high-quality source and turned it into a single header C lib. Please see here. I am using it here.

NOTE4: In case you are looking for a true text internationalization and localization solution for your project, please start from here.

But now back to the tiny dbj solution that covers 95% of use cases. Or is it more?

Usage

Yes, I am using C++ string literals. They are a brilliant invention.

The transform_to() argument will take anything that is a std string or a std string_view.
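A usage sketch with the standard s and sv literal suffixes might look like this. The body of transform_to() is an assumed shape, repeated so the snippet stands alone; usage_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Assumed shape of transform_to(): cast every code unit to the target type.
template <typename T, typename F>
inline T transform_to(F str)
{
    T out;
    for (auto ch : str)
        out.push_back(static_cast<typename T::value_type>(ch));
    return out;
}

inline void usage_sketch()
{
    using namespace std::string_literals;      // "..."s  makes a std string
    using namespace std::string_view_literals; // "..."sv makes a std string_view

    const std::wstring   wide   = transform_to<std::wstring>("narrow"s);
    const std::string    narrow = transform_to<std::string>(L"wide view"sv);
    const std::u16string u16    = transform_to<std::u16string>(U"utf-32"s);

    assert(wide == L"narrow");
    assert(narrow == "wide view");
    assert(u16 == u"utf-32");
}
```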

The F type has to be a std string or a std string_view. The T type has to be a std string. All five std string or view types will work (four before C++20, which adds the char8_t flavour). And what are those std types? A little explicit reminder follows:
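As a reminder in compilable form: the std string and view types are just aliases of std::basic_string and std::basic_string_view, one per character type. The char8_t pair is guarded because it exists only from C++20 on.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// The std string types are aliases of std::basic_string<CharT> ...
static_assert(std::is_same_v<std::string,    std::basic_string<char>>);
static_assert(std::is_same_v<std::wstring,   std::basic_string<wchar_t>>);
static_assert(std::is_same_v<std::u16string, std::basic_string<char16_t>>);
static_assert(std::is_same_v<std::u32string, std::basic_string<char32_t>>);

// ... and the views are aliases of std::basic_string_view<CharT>.
static_assert(std::is_same_v<std::string_view,    std::basic_string_view<char>>);
static_assert(std::is_same_v<std::wstring_view,   std::basic_string_view<wchar_t>>);
static_assert(std::is_same_v<std::u16string_view, std::basic_string_view<char16_t>>);
static_assert(std::is_same_v<std::u32string_view, std::basic_string_view<char32_t>>);

#if defined(__cpp_lib_char8_t) // C++20 only: the fifth flavour
static_assert(std::is_same_v<std::u8string,      std::basic_string<char8_t>>);
static_assert(std::is_same_v<std::u8string_view, std::basic_string_view<char8_t>>);
#endif
```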

To “enforce” this rule, I could add type mismatch traps to this code. I decided not to. In case anybody uses this with the wrong types, she will be greeted with very long compiler errors.

Also, I could have done a lot more template jockeying in here, using std::enable_if and such. Again, I have decided that is counter-intuitive for the majority of readers/users and achieves little. Illegal usage will simply be stopped by the compiler.

When C++20 compilers become officially available, I might add some simple requires clauses.

Let’s deal with the natives

Back to the subject. The above solution will not work for native string literals. Try it.

What should we do? We could stop people trying to compile those pesky native string literals. How? By deleting the overload that has a pointer argument:
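A sketch of that idea, with an assumed body for transform_to() repeated so the snippet stands alone. A native literal decays to a pointer to a const character, which selects the deleted, more specialized overload.

```cpp
#include <string>
#include <string_view>

// Assumed shape of transform_to(): cast every code unit to the target type.
template <typename T, typename F>
inline T transform_to(F str)
{
    T out;
    for (auto ch : str)
        out.push_back(static_cast<typename T::value_type>(ch));
    return out;
}

// Pointer arguments are rejected at compile time.
template <typename T, typename F>
T transform_to(const F*) = delete;

// This line would no longer compile (use of a deleted function):
//   transform_to<std::string>(L"native literal");
```

Calls with std strings and views are unaffected, because the pointer overload is not viable for them.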

Great! I could bar any native literals usage and “force” users into standard C++ and standard C++ string and view literals only. But that is not very beginner-friendly. Also, I like to provide comfortable APIs. So here is the overload that takes care of native string literals.

In standard C/C++, a native string literal is compiled into a char array. So far we have two functions. One solution.
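One way to write the two functions is sketched below. The parameter names and the trick of forwarding through a std::basic_string_view are assumptions for illustration. The array overload assumes a null-terminated literal, so it drops the last element.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Function one: std strings and views (assumed shape).
template <typename T, typename F>
inline T transform_to(F str)
{
    T out;
    for (auto ch : str)
        out.push_back(static_cast<typename T::value_type>(ch));
    return out;
}

// Function two: native literals. A literal such as L"abc" is an array of
// N characters including the terminator, so only the first N - 1 are viewed.
template <typename T, typename C, std::size_t N>
inline T transform_to(const C (&literal)[N])
{
    return transform_to<T>(std::basic_string_view<C>(literal, N - 1));
}
```

For an array argument the second overload is the more specialized one, so transform_to<std::string>(L"native") picks it without any extra help.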

For an advanced version, which consists of one function and does all of this plus any other standard character sequence type, please jump here.

Testing

Now let us imagine the solution sketched here is all implemented. Here is one (almost) comprehensive test “suite”:

To be 100% comprehensive, there are more tests one can imagine here. I am sure that if you have been reading until this point, you will understand they might be redundant.
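A sketch of such a suite, using plain assert. Both functions are repeated in their assumed shape so the snippet stands alone; check_all_targets and run_tests are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Assumed shape of the two functions under test.
template <typename T, typename F>
inline T transform_to(F str)
{
    T out;
    for (auto ch : str)
        out.push_back(static_cast<typename T::value_type>(ch));
    return out;
}

template <typename T, typename C, std::size_t N>
inline T transform_to(const C (&literal)[N])
{
    return transform_to<T>(std::basic_string_view<C>(literal, N - 1));
}

// Transform one source into every C++17 std string type and compare.
template <typename F>
inline void check_all_targets(const F& from)
{
    assert(transform_to<std::string>(from)    == "Hello");
    assert(transform_to<std::wstring>(from)   == L"Hello");
    assert(transform_to<std::u16string>(from) == u"Hello");
    assert(transform_to<std::u32string>(from) == U"Hello");
}

inline void run_tests()
{
    using namespace std::string_literals;
    using namespace std::string_view_literals;

    // std strings
    check_all_targets("Hello"s);
    check_all_targets(L"Hello"s);
    check_all_targets(u"Hello"s);
    check_all_targets(U"Hello"s);
    // std string views
    check_all_targets("Hello"sv);
    check_all_targets(L"Hello"sv);
    check_all_targets(u"Hello"sv);
    check_all_targets(U"Hello"sv);
    // native literals
    check_all_targets("Hello");
    check_all_targets(L"Hello");
    check_all_targets(u"Hello");
    check_all_targets(U"Hello");
}
```

Only ASCII text is used on purpose: as NOTE2 explains, that is the range for which the per-character cast is meaningful.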

And that is it. No codecvt required. Enjoy the standard C++.
