C++ : codecvt deprecated. Panic?

So little C++ so much good!

P0618R0: “…The entire header <codecvt> (which does not contain the class codecvt!) is deprecated, as are the utilities wstring_convert and wbuffer_convert. These features are hard to use correctly, and there are doubts about whether they are even specified correctly. Users should use dedicated text-processing libraries instead…”

(update: for the comprehensive “all in one” solution please head here )

Therefore: as of C++17 the <codecvt> header is, officially, deprecated. It is not removed yet, but deprecation is the first step towards removal. And there is this highly suspicious “text-processing libraries” advice. Panic? Please don’t.

Does this apply to you? My advice: “Stay cool, calm and collected and all things will fall into place”. Read on.

Enter Standard C++

Let us assume you want to transform from let’s say wide string type to std::string type. Here is the standard C++ (17 and beyond) solution:

/* 
Transform any std string or string view
into any of the std string types,
Apache 2.0 (c) 2018 by DBJ.ORG
*/
#include <iterator> // std::begin, std::end
#include <string>
#include <string_view>

template<typename T, typename F>
inline T 
  transform_to ( F str ) noexcept
{
// note: F has to have 
// the empty() method
if (str.empty())
    return {};
// note: F must be able to work 
// with std begin and end
   return { std::begin(str), std::end(str) };
// also the above line requires that T has a constructor
// taking the begin and end iterators of F.
}

Just one function. Almost simple. I could have optimized it by checking if type to be transformed is the same as target type, but I will speculate no sane programmer will transform from std::string to std::string.
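For the curious, the same-type shortcut mentioned above could look roughly like the sketch below. The name transform_to_checked is made up for this illustration and is not part of the dbj code; it needs C++17 for if constexpr.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>

// Sketch only: transform_to_checked is a hypothetical name,
// it is not part of the original solution.
template <typename T, typename F>
inline T transform_to_checked(F str) noexcept
{
    if constexpr (std::is_same_v<T, F>) {
        // Source and target are the same type: hand the argument back.
        return str;
    } else {
        if (str.empty())
            return {};
        // Element-wise copy, each character converted to T's character type.
        return T(std::begin(str), std::end(str));
    }
}

inline void same_type_demo()
{
    using namespace std::string_literals;
    using namespace std::string_view_literals;
    // std::string to std::string takes the shortcut branch
    assert(transform_to_checked<std::string>("same"s) == "same");
    // a string view to std::wstring takes the copying branch
    assert(transform_to_checked<std::wstring>("narrow"sv) == L"narrow");
}
```

Note that a string view to a string of the same character type still takes the copying branch, since the two types differ.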

NOTE1: This is a standard C++ way, but it does not cover the full variety of UTF-8 encoded glyphs. WIN32 aficionados can use something like the cppWINRT to_string and to_hstring. But as all of you already know, these only convert wchar_t based strings to and from UTF-8. Alas, that is not a portable solution. See NOTE3.

NOTE2: In case you see nothing wrong with this approach: it is indeed standard. And somewhat controversial, at the same time. This code is doing nothing but casting the chars from one C++ std char type to another. And this works for the first 128 chars (the ASCII range), for the English language speaking users and developers, that is. But not for the others. For a good introductory text please see here.

NOTE3: In case you have thoughts like: “How do I transform utf8 to utf16 ..”, the standard header <cuchar> is exactly what you need. Alas, it is still not fully implemented as C++20 specified. Not in any of the three major compilers.
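To make NOTE2 tangible, here is a small sketch of where the element-wise cast stops being a conversion. It repeats transform_to so the snippet stands alone; the function ascii_only_demo is a name invented for this illustration.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>

// Same shape as transform_to from this article,
// repeated so that this block compiles on its own.
template <typename T, typename F>
inline T transform_to(F str) noexcept
{
    if (str.empty())
        return {};
    return T(std::begin(str), std::end(str));
}

// Hypothetical demo function, for illustration only.
inline void ascii_only_demo()
{
    using namespace std::string_view_literals;

    // Plain ASCII survives the element-wise copy.
    assert(transform_to<std::wstring>("abc"sv) == L"abc");

    // U+00E9 (e with acute accent) encoded as UTF-8
    // is the two bytes 0xC3 0xA9.
    const auto utf8 = "\xC3\xA9"sv;
    const std::wstring wide = transform_to<std::wstring>(utf8);

    // Each byte became its own wide character: two code units
    // instead of one, and the result is not U+00E9.
    assert(wide.size() == 2);
    assert(wide != L"\u00E9");
}
```

Nothing was decoded here; the two UTF-8 bytes were simply widened one by one, which is why a real encoding conversion is needed outside the ASCII range.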

As a remedy, I have found a very high-quality source and turned it into a single header C lib. Please see here. I am using it here.

NOTE4: In case you are looking for a true text internationalization and localization solution for your project, please start from here.

But now back to the tiny dbj solution that covers 95% of use cases. Or is it more?

Usage

using namespace std;
using namespace std::string_view_literals;
using namespace std::string_literals;

string ss_ 
    = transform_to<string>(L"WIDE CHAR STRING VIEW"sv) ;

wstring ws_ 
    = transform_to<wstring>("CHAR STRING VIEW"sv) ;

wstring ws_2 
    = transform_to<wstring>("CHAR STRING"s) ;

Yes, I am using C++ string literals. They are a brilliant invention.

The transform_to() argument will take anything that is a std string or a std string view.

The F type has to be a std string or a std string_view. The T type has to be a std string. All five std string types, and their views, will work. And what are those std types? A little explicit reminder follows:

using string 
  = basic_string<char, char_traits<char>, allocator<char>>;

using wstring 
  = basic_string<wchar_t, char_traits<wchar_t>, allocator<wchar_t>>;

using u16string 
  = basic_string<char16_t, char_traits<char16_t>, allocator<char16_t>>;

using u32string 
   = basic_string<char32_t, char_traits<char32_t>, allocator<char32_t>>;

// C++20 and beyond
using u8string 
   = basic_string<char8_t, char_traits<char8_t>, allocator<char8_t>>;

To “enforce” this rule, I could add type mismatch traps, in this code here. I decided not to. In case anybody uses this with wrong types she will be greeted with very long compiler errors.

Also, I could have done a lot more template jockeying in here. Using std::enable_if and such. Again I have decided that is counter-intuitive for the majority of readers/users and achieves little. Illegal usage will be simply stopped by a compiler.

When C++20 compilers become officially available, I might add some simple requires clauses.
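For readers who do want the trap, one possible C++17 shape is sketched below, using static_assert rather than enable_if. The traits is_std_string and is_std_string_view and the function name transform_to_guarded are all invented for this illustration; they are not part of the dbj code.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical trait: true only for std::basic_string specializations.
template <typename T>
struct is_std_string : std::false_type {};
template <typename C, typename Tr, typename A>
struct is_std_string<std::basic_string<C, Tr, A>> : std::true_type {};

// Hypothetical trait: true only for std::basic_string_view specializations.
template <typename T>
struct is_std_string_view : std::false_type {};
template <typename C, typename Tr>
struct is_std_string_view<std::basic_string_view<C, Tr>> : std::true_type {};

// Sketch only: the same body as transform_to, plus two readable traps.
template <typename T, typename F>
inline T transform_to_guarded(F str) noexcept
{
    static_assert(is_std_string<T>::value,
                  "T must be a std string type");
    static_assert(is_std_string<F>::value || is_std_string_view<F>::value,
                  "F must be a std string or std string view type");
    if (str.empty())
        return {};
    return T(std::begin(str), std::end(str));
}

inline void guarded_demo()
{
    using namespace std::string_literals;
    using namespace std::string_view_literals;
    assert(transform_to_guarded<std::u16string>("abc"sv) == u"abc");
    assert(transform_to_guarded<std::string>(L"wide"s) == "wide");
    // transform_to_guarded<int>("abc"sv);   // stopped by the first trap
    // transform_to_guarded<std::string>(42); // stopped by the second trap
}
```

The gain is a one-line error message in place of a page of template diagnostics; the cost is two traits to maintain.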

Let’s deal with the natives

Back to the subject. The above solution will not work for native string literals: a literal passed by value decays to a pointer, and a pointer has no empty() method. Try it.

What should we do? We could stop people trying to compile those pesky native string literals. How? By deleting the overload that has a pointer argument:

// stop the use with pointers
template<typename T, typename F> 
T transform_to( F * str) noexcept = delete;

// the following is now a compiler error
// required overload is explicitly deleted
auto s_ = transform_to<string>(L"WIDE NATIVE LITERAL");

char chararr[]{"narrow char array"};
// also does not compile, since chararr
// automatically decays to char *
auto ws_ = transform_to<wstring>(chararr);

Great! I could bar any use of native literals and “force” users into standard C++ string and view literals only. But that is not very beginner-friendly. Also, I like to provide comfortable APIs. So here is the overload that takes care of native string literals.

template<typename T, typename F, size_t N>
T transform_to( const F (&str)[N]) noexcept
{
// the array of a native literal includes the
// terminating zero; leave it out of the result
const size_t len = (str[N - 1] == F{}) ? N - 1 : N;
// transform and return
// an empty range gives an empty T
return { str, str + len };
}

In standard C/C++, a native string literal is compiled into a char array, terminating zero included. So far we have two functions. One solution.
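Since the array of a literal carries its terminating zero, an array overload has to decide whether to copy it. Here is a minimal, self-contained sketch of that decision; from_array is a name invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Sketch only: from_array is a hypothetical name.
// Copies the array, leaving out one trailing terminator if present.
template <typename T, typename F, std::size_t N>
inline T from_array(const F (&str)[N]) noexcept
{
    // A literal such as "abc" has the type const char[4]:
    // three characters plus the terminating zero.
    const std::size_t len = (str[N - 1] == F{}) ? N - 1 : N;
    return T(str, str + len);
}

inline void array_demo()
{
    // The terminator is counted in the array size.
    static_assert(std::size("abc") == 4, "three chars plus terminator");

    // The terminator does not end up in the result.
    assert(from_array<std::wstring>("abc") == L"abc");
    assert(from_array<std::wstring>("abc").size() == 3);

    // An array without a terminator is copied whole.
    const char no_terminator[3] = {'a', 'b', 'c'};
    assert(from_array<std::string>(no_terminator).size() == 3);
}
```

Copying with std::begin and std::end alone would take all N elements, so the result for "abc" would have the length 4, with an embedded zero at the end.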

For an advanced version which consists of one function and does all of this, plus any other standard character sequence type please jump here.

Testing

Now let us imagine the solution sketched here is all implemented. Here is one (almost) comprehensive test “suite”:

#include <iostream>

#define ST(x) #x
// std::cout can print only std::string, thus
// the result is transformed back for printing
#define TT(x) \
 std::cout << "\n" << ST(x) << "\n\t-->" \
           << transform_to<std::string>(x)

using namespace std; 
using namespace std::string_view_literals; 
using namespace std::string_literals;

template<typename T>
void transformation_test() {
TT(transform_to<T>(L"WIDE CHAR STRING VIEW"sv));
TT(transform_to<T>(L"WIDE CHAR STRING"s));
TT(transform_to<T>("CHAR STRING VIEW"sv));
TT(transform_to<T>("CHAR STRING"s));
char carr[]{ "CHAR ARRAY" };
TT(transform_to<T>(carr));
TT(transform_to<T>("NATIVE LITERAL"));
TT(transform_to<T>(L"WIDE NATIVE LITERAL"));
}
// using the above
int main() 
{

// transform to all 5 std strings
transformation_test<string>();
transformation_test<wstring>();
transformation_test<u16string>();
transformation_test<u32string>();
// C++20
transformation_test<u8string>();
return 0;
}

To be 100% comprehensive there are more tests one can imagine here. I am sure if you have been reading until this point, you will understand they might be redundant.

And that is it. No codecvt required. Enjoy the standard C++.
