C++ Windows Unicode Console Output

Update 2022 12 13

This is me thinking. Windows is a UTF-16 OS; why try to make it a UTF-8 OS? Just leave it alone and let it be a wide-char OS.

And all is dandy in the winland of sugar and candy …

Update 2018 12 07

The key problem, as explained here, is now acknowledged by the Visual Studio / MSVC team. Please follow here.

Update 2018 12 05

Many thanks to the many of you visiting this post every day. As a “reward”, I have made a small console app to help you set up your Windows console to show UTF-8 text properly.

DBJ UTF8 Console set up aid

This little program is, of course, free, and you can download it (zipped) from here, straight away.

For the developer in you: if you want to see how exactly UTF-8 is to be done for a Windows console program, please visit the GitHub repository here.

Enjoy.

2018JAN10

Note: I have this in the form of a modern C++ lib. It is just not yet presentable. Stay tuned.

Note: below are (some kind of) research notes in reverse chronological order, newest on top. So what is below might be invalidated by what came later, that is, by what is above it.

Note: The choice of console font is what decides the result here. Yet another unknown in the Windows console equation, but alas a necessary one. Yes, my lib to be released allows for console font change too.

Note: After decades of substandard console experience, users of Windows REDSTONE 4 will at last experience a much improved console: ANSI color escape codes and the rest. We shall see and test again.

By default, the Windows console is not Unicode capable.

If you are surprised by this and you are a beginner to mid-level C++ developer, I suggest you read this post.

2017-04-16

Today I bumped into THIS article. It seems the author is a very important person where Unicode programming is concerned. Of course, I have tried that praised solution, which in a nutshell is the same as my original gist, and it is of course crashing the UCRT.

As far as I remember (I have not re-checked), both printf and wprintf have to behave exactly the same. But I assume that MSDN assumption dates from before UCRT times, as does the above article.

So the original solution (without printf()) presented in the article in 2008 does not work today (2017) in my Windows console on my Windows 10 x64 machine. It works partially, that is. The Cyrillic part displays OK. The Asian part of the Unicode string does not display.

The solution below does display this OK, just by using the 1252 Windows code page, and by not using FILE * based stdio.h, but the stdout handle and the Console API.

ps: copy and paste this into Notepad and the whole string will be revealed.

2017-04-10

One very short, simple and important post to read before entering this one is here. It explains the vagaries of the Win Console, WIN32 and in particular the C++ I/O streams library versus these issues.

Among other things, you will no longer wonder whether, and when, you should use sync_with_stdio().

Unfortunately, on my WIN10 PRO x64 machine, this does not work. No crash, but not much output either.

In case you want to fork a project, here it is.

2017-04-02

WARNING:

The gist originally presented below is a “hack” and should not be applied to the application environment on the global level. Instead, it can and should be used inside functions, in a local manner. Neither the “C” API nor the C++ stdlib can work normally once this is executed. If this (as initially presented) is applied, probably no ANSI console output will work at all.

EOF WARNING

Please note that we are discussing Windows Console issues here. The C++ stdlib, I/O and otherwise, is not designed with that as a requirement. The MSVC people have implemented the stdlib with this in mind, but as of today it certainly has issues.

Do not use C++ stdlib mixed with low level Windows console output.

Good old locale() does help somewhat but again MSVC stdlib is not helping here.

For me, the above is a warning that things are not completely stable yet in the C++ Winland. (Trivia or not: I have noticed on GitHub that Kenny Kerr is using printf() almost always, or his C++ lib based on it.) Also, switching the locale from “C” to your system-wide default locale will very likely not help in outputting the Unicode extended charset.

Also, if the low-level console “hack” is used to switch to UTF-16, any output other than wide will provoke an unhandled exception in ucrtbased.dll (the debug build of the Universal CRT). And no catching in your code will help.

The above crashes because wprintf() is not used. Replace printf() with wprintf() and all is dandy. But matching the console output mode with the stdio functions actually used requires a lot of discipline and imposes a lot of rigidity on other developers.
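A minimal sketch of how the mode switch can be kept local to a scope, assuming the MSVC CRT functions _setmode and _fileno with the _O_U16TEXT flag. The names scoped_u16_stdout and say_hello are made up for illustration, and the sample string is arbitrary. The #ifdef is there only so the sketch compiles off Windows, where it does nothing.

```cpp
#include <cstdio>
#include <cwchar>

#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#endif

// Switches stdout to UTF-16 text mode for the lifetime of the object and
// restores the previous mode on scope exit. While it is alive, only wide
// functions (wprintf, fputws, ...) should touch stdout.
class scoped_u16_stdout final {
    int previous_mode_ = -1;  // -1 means "nothing to restore"
public:
    scoped_u16_stdout() {
#ifdef _WIN32
        std::fflush(stdout);  // flush pending narrow output before switching
        previous_mode_ = _setmode(_fileno(stdout), _O_U16TEXT);
#endif
    }
    ~scoped_u16_stdout() {
#ifdef _WIN32
        if (previous_mode_ != -1) {
            std::fflush(stdout);  // flush wide output before switching back
            _setmode(_fileno(stdout), previous_mode_);
        }
#endif
    }
    // True only if the mode was actually switched.
    bool active() const { return previous_mode_ != -1; }

    scoped_u16_stdout(const scoped_u16_stdout&) = delete;
    scoped_u16_stdout& operator=(const scoped_u16_stdout&) = delete;
};

// Hypothetical usage: wide output only, inside one function.
inline void say_hello() {
    scoped_u16_stdout wide_mode;
    // Arbitrary sample text: a Cyrillic word and three CJK characters,
    // written as escapes so the source file encoding does not matter.
    std::wprintf(L"%ls\n", L"\u043A\u043E\u0448\u043A\u0430 \u65E5\u672C\u56FD");
    // std::printf("boom\n");  // narrow output here is what upsets the UCRT
}
```

The guard does not remove the discipline problem; it only narrows it to the body of one function.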

Please do not forget we are talking “Universal CRT” here. As we are all moving inevitably towards WIN10, there is no point developing for the past. UCRT is inevitable. Sadly, the printf() version crashes inside ucrtbased.dll.

Thus please use the gist provided below, BUT never as presented. That is: do not use it on the application/global level. It is actually best to forget about it. Fine. Ok. And where do we go from here?

Scoped Solution

The scope is Windows Console Unicode output.

And the solution seems to be not to use the C++ stdlib, or even stdio, but the high-level Console API.

It is as simple as that. No C++ stdlib I/O in this case, I am afraid. Just use code page Windows-1252, output your Unicode “squiggly bits”, and then, upon exiting the scope, revert to whatever code page was in use before.
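The scoped approach can be sketched roughly like this, assuming the Win32 calls GetConsoleOutputCP, SetConsoleOutputCP, GetStdHandle and WriteConsoleW. The names scoped_code_page, write_wide and print_unicode are hypothetical. The non-Windows branch holds stand-ins (not real APIs) that only remember the last value set, so the sketch compiles anywhere.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#else
// Stand-ins, NOT real APIs: they only remember the last "code page" set.
using UINT = unsigned int;
inline UINT& fake_code_page() { static UINT cp = 437; return cp; }
inline UINT GetConsoleOutputCP() { return fake_code_page(); }
inline int SetConsoleOutputCP(UINT cp) { fake_code_page() = cp; return 1; }
#endif

// Sets the console output code page for the lifetime of the object and
// puts the previous one back on scope exit.
class scoped_code_page final {
    UINT previous_;  // 0 means the query failed, so nothing is restored
public:
    explicit scoped_code_page(UINT code_page)
        : previous_(GetConsoleOutputCP()) {
        SetConsoleOutputCP(code_page);
    }
    ~scoped_code_page() {
        if (previous_ != 0) SetConsoleOutputCP(previous_);
    }
    scoped_code_page(const scoped_code_page&) = delete;
    scoped_code_page& operator=(const scoped_code_page&) = delete;
};

// Writes UTF-16 text straight to the console handle, bypassing stdio and
// iostreams. Returns false when stdout is not a console (for example when
// it is redirected to a file), and always false off Windows.
inline bool write_wide(const std::wstring& text) {
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    if (out == nullptr || out == INVALID_HANDLE_VALUE) return false;
    DWORD written = 0;
    return WriteConsoleW(out, text.data(),
                         static_cast<DWORD>(text.size()),
                         &written, nullptr) != 0;
#else
    (void)text;
    return false;
#endif
}

// Ties the two together: switch, write, revert.
inline bool print_unicode(const std::wstring& text) {
    scoped_code_page guard(1252);  // the code page used in this post
    return write_wide(text);
}  // previous code page is restored here
```

Whether the glyphs actually show up still depends on the console font, as noted at the top of this post.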

Enjoy. Or not…

Original Text

Out-of-the-box console output of Unicode strings will not produce what is expected in out-of-the-box C++ console applications (or C).

For example:

The above will not display those Unicode strings on a Win console as you see them in your code. For this to work, you can simply add my header to your Win32 console project and just include it. This is an API with no API, thanks to C++. No calling of anything is necessary.

What this clever little piece of C++ does is the somewhat low-level initialization of the Windows console. The code for that is actually rather simple, if you know where to dive in the MSDN ocean. And thanks to C++, this Windows Unicode console initialization happens “automagically”, just by including dbj_win_unicode_console_config.h (found if you click on the link).

I might be so bold as to write that the actual automatic initializer mechanism might not look that simple. But I might just be able to prove the opposite, if you look carefully into it.

It is a single static instance of a struct UINT16CONSOLEINIT { }; with an instance counter inside it. Here is the counter. Yes, it is a function, with the actual “insider” counter_.

In the constructor, we increment the counter. But before that, we check if it is zero. When counter_ is zero, we do the console initialization and then increment the counter. When this object goes out of scope, inside the destructor, we decrement this counter of instances. If that was the last instance, the counter is zero again. This very likely happens because your application is exiting, so the single static instance goes out of scope and its destructor gets called. At that moment we perform the clean-up of the console, as prescribed by MSDN.

And all of this is tucked away inside an anonymous namespace, making it unreachable from outside code.

The code is perhaps not comprehensively documented. Please do not be shy to ask if anything is not clear.
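The mechanism described above can be sketched roughly as follows. This is not the header itself, only an illustration of the counting idea: console_init and console_cleanup are hypothetical stand-ins that merely count calls, where the real code would do the console set-up and clean-up.

```cpp
#include <cassert>

namespace {

// Hypothetical stand-ins for the real console set-up and clean-up.
int init_calls = 0;
int cleanup_calls = 0;
void console_init()    { ++init_calls; }
void console_cleanup() { ++cleanup_calls; }

// The counter is a function wrapping a function-local static, so it is
// guaranteed to exist the first time any constructor asks for it.
int& counter() {
    static int counter_ = 0;
    return counter_;
}

struct UINT16CONSOLEINIT final {
    UINT16CONSOLEINIT() {
        // First instance: counter is zero, so initialize, then increment.
        if (counter()++ == 0) console_init();
    }
    ~UINT16CONSOLEINIT() {
        assert(counter() > 0);
        // Last instance: counter drops back to zero, so clean up.
        if (--counter() == 0) console_cleanup();
    }
    UINT16CONSOLEINIT(const UINT16CONSOLEINIT&) = delete;
    UINT16CONSOLEINIT& operator=(const UINT16CONSOLEINIT&) = delete;
};

// The single static instance: constructed before main() starts and
// destroyed after main() returns.
const UINT16CONSOLEINIT uint16_console_init_{};

}  // namespace
```

Creating further instances later only bumps the counter; the set-up runs once, and the clean-up runs when the last instance is gone.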

(c) Andy Singer