C++ : codecvt deprecated. Panic?

So little C++ so much good!

 

P0618R0: “…The entire header <codecvt> (which does not contain the class codecvt!) is deprecated, as are the utilities wstring_convert and wbuffer_convert. These features are hard to use correctly, and there are doubts about whether they are even specified correctly. Users should use dedicated text-processing libraries instead…”

(In a rush? For the comprehensive “all in one” solution, please go here.)

Therefore: as of C++17 the <codecvt> header is officially deprecated, together with wstring_convert and wbuffer_convert. Deprecated is not yet removed, but the direction is clear. And there is this highly suspicious “text-processing libraries” advice. Panic? Please don’t. My advice? “Stay cool, calm and collected, and all things will fall into place.” Read on.

Enter Standard C++

Let us assume you want to transform from, say, std::wstring to std::string. Here is the standard C++ (17 and beyond) solution:
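A minimal sketch of such a function, assuming the name transform_to() that the Usage section refers to. The parameter name, the empty-input check and the iterator-range construction are assumptions made for illustration, not necessarily the original listing.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// T is the target std string type, F the source std string or view type.
// Sketch only: every character is converted (cast), nothing is transcoded.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    // The iterator-range constructor of T converts each source char
    // to T::value_type, one by one.
    return T(std::begin(str), std::end(str));
}
```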

Just one function. Almost simple. I could have optimized it by checking if the type to be transformed is the same as the target type, but I will bravely speculate no sane programmer will transform from std::string to std::string.

NOTE 1: This is the standard C++ way. For WIN32 aficionados this is apparently not exactly the way. They would need to use something like the C++/WinRT to_string.

NOTE 2: In case you see nothing wrong with this approach: it is indeed standard. And somewhat controversial at the same time. This code does nothing but cast the chars from one std char type to another. That works for the first 128 code points (ASCII, 0 to 127), that is, for English-speaking users and developers. But not for the others. For a good introductory text please see here.
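To make the limitation concrete, here is a small sketch, assuming a transform_to() that simply copies through an iterator range (repeated here so the example compiles on its own). The helper narrow_zhe() is hypothetical. The Cyrillic letter U+0416 needs two bytes in UTF-8 (0xD0 0x96), but the cast produces a single char, which on mainstream compilers holds only the low byte 0x16.

```cpp
#include <iterator>
#include <string>

// Assumed shape of the function discussed in this post.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    return T(std::begin(str), std::end(str));
}

// Hypothetical demo helper: CYRILLIC CAPITAL LETTER ZHE is U+0416.
inline std::string narrow_zhe()
{
    const std::wstring wide = L"\u0416";      // one wchar_t holding 0x0416
    return transform_to<std::string>(wide);  // one char, not valid UTF-8
}
```

The result is one char long, so it cannot possibly be the two-byte UTF-8 encoding of the letter.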

[Update 2021-09-05] Here is the link to the whys and hows of Unicode, with a focus on Windows. I, at last, managed to find the time.

NOTE 3: In case you have thoughts like “How do I transform UTF-8 to UTF-16?”, or in case you are looking for a true text internationalization and localization solution for your project, please start from here.

Usage
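A usage sketch, assuming a transform_to() that takes the target type as its first template argument (a minimal version is repeated here so the example compiles on its own). The helper usage_demo() is hypothetical and exists only to hold the calls.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// Assumed shape of the function discussed in this post.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    return T(std::begin(str), std::end(str));
}

// Hypothetical helper: returns true when every transformation
// yields the expected characters.
inline bool usage_demo()
{
    using namespace std::string_literals;       // "..."s  makes a std string
    using namespace std::string_view_literals;  // "..."sv makes a std view

    const std::string    narrow = transform_to<std::string>(L"wide string"s);
    const std::wstring   wide   = transform_to<std::wstring>("narrow view"sv);
    const std::u16string u16    = transform_to<std::u16string>(U"u32 string"s);

    return narrow == "wide string"
        && wide   == L"narrow view"
        && u16    == u"u32 string";
}
```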

Yes, I am using C++ string literals. They are brilliant inventions. The first template argument of transform_to() will take anything that is a std string or a std string view.

The F type has to be a std string or a std string_view. All 5 std string or view types will work. A little reminder of which types these are follows:
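As a sketch, the reminder can be written as compile-time checks. The four char types below are available in C++17; the fifth pair, std::u8string and std::u8string_view (char8_t), arrives with C++20 and is left as a comment so the block compiles as C++17.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// std strings: aliases of std::basic_string<CharT>
static_assert(std::is_same_v<std::string,    std::basic_string<char>>);
static_assert(std::is_same_v<std::wstring,   std::basic_string<wchar_t>>);
static_assert(std::is_same_v<std::u16string, std::basic_string<char16_t>>);
static_assert(std::is_same_v<std::u32string, std::basic_string<char32_t>>);
// C++20 only: std::u8string is std::basic_string<char8_t>

// std string views: aliases of std::basic_string_view<CharT>
static_assert(std::is_same_v<std::string_view,    std::basic_string_view<char>>);
static_assert(std::is_same_v<std::wstring_view,   std::basic_string_view<wchar_t>>);
static_assert(std::is_same_v<std::u16string_view, std::basic_string_view<char16_t>>);
static_assert(std::is_same_v<std::u32string_view, std::basic_string_view<char32_t>>);
// C++20 only: std::u8string_view is std::basic_string_view<char8_t>
```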

I could add type mismatch traps, in this code here. I decided not to. In case anybody uses this with the wrong types she will be greeted with very long compiler errors. Thus the mistake will be obvious.

Also, I could have done a lot more template jockeying in here. Using std::enable_if and such. Again I have decided that is counter-intuitive for the majority of readers/users and achieves little. Illegal usage will be simply stopped by a compiler.

Let’s deal with the natives

Back to the subject. The above solution will not work for native string literals. Try it. What should we do? We could stop people trying to compile those pesky native string literals. How? By deleting the overload that has a pointer argument:
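A sketch of that idea, assuming the transform_to() shape used in this post (repeated here so the example compiles on its own). A native literal passed by value decays to a pointer, so the pointer overload is the more specialized match, and because it is deleted the call does not compile. The helper still_works() is hypothetical.

```cpp
#include <iterator>
#include <string>
#include <string_view>

// Assumed shape of the function discussed in this post.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    return T(std::begin(str), std::end(str));
}

// Any pointer argument, which is what a native literal decays to,
// selects this more specialized overload, and it is deleted.
template <typename T, typename F>
T transform_to(F*) = delete;

// Hypothetical helper: std strings and views are unaffected.
inline std::string still_works()
{
    return transform_to<std::string>(std::wstring(L"ok"));
}

// Uncommenting the next line should stop the build
// with a "use of deleted function" kind of error:
// inline std::string barred() { return transform_to<std::string>(L"native"); }
```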

Great! I could bar any native literals usage and “force” users into standard C++ and standard C++ string and view literals only.

But that is not very beginner-friendly.  Also, I like to provide comfortable APIs. So here is the overload that takes care of native string literals.
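A sketch of such an overload, under the assumption that it takes the literal by reference to an array; the template parameter names C and N are illustrative. It replaces the deleted pointer overload rather than living next to it, and it assumes the array ends with a terminating zero, as every string literal does.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Assumed shape of the function discussed in this post.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    return T(std::begin(str), std::end(str));
}

// Overload for native string literals: a literal is an array of N chars,
// where N includes the terminating zero, so the last element is skipped.
template <typename T, typename C, std::size_t N>
T transform_to(const C (&str)[N])
{
    return T(str, str + (N - 1));
}
```

For a literal argument the array overload is the more specialized of the two templates, so it is preferred; std strings and views still go to the first function.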

In standard C/C++, a native string literal is compiled into a char array. So far we have two functions. One solution. In case you are fond of “advanced”: for a version that consists of one function and does all of this, plus any other standard character sequence type, please jump here.

Testing

Now let us imagine the solution sketched here is all implemented. Here is one (almost) comprehensive test “suite”:
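A sketch of such a suite, assuming the two-function solution described in this post (both functions are repeated so the example compiles on its own). The function test_transform_to() is a hypothetical name, and plain assert() stands in for whatever testing framework you prefer. Only ASCII text is used, since that is all this casting approach can handle.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>

// Assumed shape of the two functions discussed in this post.
template <typename T, typename F>
T transform_to(F str)
{
    if (str.empty()) return {};
    return T(std::begin(str), std::end(str));
}

template <typename T, typename C, std::size_t N>
T transform_to(const C (&str)[N])
{
    return T(str, str + (N - 1)); // skip the terminating zero
}

// Hypothetical test function.
inline void test_transform_to()
{
    using namespace std::string_literals;
    using namespace std::string_view_literals;

    // std strings
    assert(transform_to<std::wstring>("narrow"s) == L"narrow");
    assert(transform_to<std::string>(L"wide"s) == "wide");
    assert(transform_to<std::u16string>(U"u32"s) == u"u32");
    assert(transform_to<std::u32string>(u"u16"s) == U"u16");

    // std string views
    assert(transform_to<std::string>(L"wide view"sv) == "wide view");
    assert(transform_to<std::wstring>(u"u16 view"sv) == L"u16 view");

    // native string literals
    assert(transform_to<std::string>(L"native wide") == "native wide");
    assert(transform_to<std::wstring>("native narrow") == L"native narrow");
    assert(transform_to<std::u32string>(u"native u16") == U"native u16");

    // empty input
    assert(transform_to<std::string>(L""s).empty());
    assert(transform_to<std::string>(L"").empty());
}
```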

To be 100% comprehensive there are more tests one can imagine here. I am sure if you have been reading until this point, you will understand they might be redundant.

And that is it. No codecvt in sight. Enjoy the standard C++.

CAVEAT EMPTOR: I am consciously avoiding char8_t. My reasons are explained here.