Now where was I? I spent so much time spelunking around MSDN and Microsoft Docs that I almost forgot what I was trying to solve: how to code ordinal string comparisons optimally, while inside Windows legacy code.
Why not MSDN?
Currently MSDN is a legacy documentation system in itself. Almost every MSDN entry points, at its top, to “Microsoft Docs”. Perhaps it is high time to finalize the URL redirects from MSDN to “Microsoft Docs”.
[leftblock]The key problem with MSDN or Microsoft Docs is the absence of a temporal primary key. In English: the material is not organized “by time”. There is no way of knowing when something was made, or where it is in use, just by looking into either MSDN or Microsoft Docs (“MD” in further text). The very fact that there are two on-line entry points into the Microsoft documentation is confusing enough.[/leftblock]
From the user's point of view there is this “big issue”: there is more than one solution for each technical or development question one might innocently dare to ask.
The best course of action might therefore be to start from the Wikipedia article on the particular subject. That is where one (almost always) finds the history and the other facts necessary to understand the context, before diving into the ocean of both Microsoft on-line documentation systems.
[leftblock]That multitude of (seeming) solutions might be somehow justified, but there is almost no authoritative entry (on any subject) which advises on the temporal ordering, usability or deprecation of a particular entry. There are almost no contextual, overarching articles. Actually there are deprecation warnings, but they are half true: they never tell you the context of the deprecation. For example, to which version of Windows the deprecation applies.[/leftblock]
The general assumption (it seems) is that one who visits the MD will always develop brand new solutions for the latest Microsoft OS. If one has to maintain legacy code for any Microsoft OS: tough luck. The treacherous MSDN legacy swamp is waiting. Enter at your own peril.
Easy solution? Perhaps re-branding MSDN into “Legacy Technologies”, where all the legacy entries are clearly marked as such. If you do that, just make it obvious, please.
Example for today: C++ string comparisons
Simple question: which Windows string comparison API should I use to improve some Windows legacy (WINFILE) code, in order to resolve warning C6400?
Experience is a curse here. The more one (the MSDN or MD user) knows at this point, the more one is in danger of not finding a straight answer in less than a day. For example, I know about (and follow) the WIN32 vs UCRT dichotomy. No article I know of tells me if and when I should look into WIN32, or into UCRT. There is no straight answer. And yes, there is also the STL source: the C code in the core of the modern MSVC C++ standard library.
For example, if I look into any of them, can I use my MSDN/MD findings to support e.g. Windows XP builds? At best the information is in there, but hidden in remarks and footnotes scattered around the “swamp”.
[leftblock]Very quickly “the seasoned user/developer” remembers all the dark quicksand of unmarked spots in the Windows legacy technologies swamp. Like the (very) unfortunate Microsoft Unicode implementation (not using UTF-8), “MBCS strings”, TCHAR … and all the other “jewels”. All of it good material for casting the undisputed crown of the bad documentation Olympics, which MSDN firmly holds.[/leftblock]
Let me reiterate this too: in the year 2018 one has the modern Microsoft C++ standard library implementation, too. To enjoy or to worry about, depending on how experienced one is. Again I drifted away ranting. Back to the subject.
So, what is the “normal” string comparison?
Comparing two strings by taking into account the language context. That is, the language in which the two strings are written.
There is yet another sizeable on-line documentation minefield related to this on Windows. It is much more complex than ordinal string comparisons.
We shall devote a future post to this subject. A hint is here, but yes, with no pointers from MSDN to MD. Now onto the main subject, at last.
What is ordinal string comparison?
Comparing two strings byte by byte. Not linguistically (by the rules of the current language installed on your desktop) but as two byte arrays.
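As a minimal sketch of that definition, in portable C++: the helper name `ordinal_less` is hypothetical (it is not a Windows or standard library function). An ordinal comparison looks only at the numeric values of the bytes, so in ASCII every upper-case letter sorts before every lower-case one, whatever a dictionary might say.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: true if `a` sorts before `b` when both are
// treated as plain byte arrays (no locale, no linguistic rules).
inline bool ordinal_less(const std::string& a, const std::string& b)
{
    const std::size_t common = a.size() < b.size() ? a.size() : b.size();
    const int ans = std::memcmp(a.data(), b.data(), common);
    // equal common prefix: the shorter string sorts first
    return ans != 0 ? ans < 0 : a.size() < b.size();
}

// 'Z' is 0x5A and 'a' is 0x61, so "Zebra" sorts before "apple" ordinally,
// although a dictionary would put "apple" first.
inline void ordinal_demo()
{
    assert(ordinal_less("Zebra", "apple"));
    assert(!ordinal_less("apple", "Zebra"));
    assert(ordinal_less("app", "apple"));
}
```

A linguistic comparison would be free to order those same strings differently, and that is the whole difference between the two kinds.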
First a very simple riddle.
std::string s1{ "APPLE" }, s2{"PEAR"};
bool is_this_ordinal_comparison = (s1 == s2) ;
Question: You are happily inside Visual Studio on some Windows machine. Is the above code a locale sensitive or an ordinal string comparison? [The answer]
The brave bystander might answer: Ha, I shall just use lstrcmp() or lstrcmpi(). Next she will innocently look up the MSDN entry and face the reality of what I am talking about here. Hint: the gate to the Hades of MSDN is in the “Remarks” section.
Next she will dutifully apply Visual Studio Code Analysis, just to find a lot of C6400 warnings, made by very bad children using lstrcmpi().
Next, the brave bystander says: Eh? Is this really necessary today? Why do we not just use the std::basic_string<T> comparison operators and leave it to the std:: implementers? (Hint: they do not use lstrcmp.)
Perhaps in an ideal world, where we might have an OS (and the dev tools and technologies with it, aka “SDK”) which somehow is never in a legacy state. But in reality we are developing on the Windows OS, which has many legacy sinkholes.
So. You are (as we said) inside your Visual Studio IDE on some Windows machine, maintaining legacy code that is supposed to run on Windows XP, for example. What advice might you find in MSDN or MD (Microsoft Docs)?
Here at this point I will stop this essay. Instead of me instructing you how (not) to use the on-line documentation on the subject, please just remember to read, from time to time, this article on the subject of string comparisons. And the discussion below it, please. I have. And now I know simply to ignore the on-line documentation amassed in the years after that. Instead I have developed …
My (current) solution for ordinal string comparisons
After almost two (three?) days of foolishly fighting with the legacy beasts in the depths of the MSDN swamp, I finally decided to “simply” follow how the Microsoft std::string comparison is implemented. The logic is this: the implementers of this library know what they are doing, which API to use, and how.
[leftblock]Just by looking into MSDN, it was impossible to decipher which two of the many available functions one should use to compare strings as byte arrays and get it right for “all” Windows legacy code. It is two because in Windows, unlike Linux, one has to separate Unicode from ASCII string processing. So there is always one function for the char version and a second one for the wchar_t version.[/leftblock]
So, after some spelunking in the debugger and the std:: space, I have found that the current Windows C++ standard library eventually goes all the way down into one STL C source function, available in the xstrcoll.c file, as delivered by the STL part of the Windows SDK. xwcscoll.c is where the wchar_t version resides.
If the SDK is installed on your machine in the standard location, both are to be found here:
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.13.26128\crt\src\stl
In there the C++ standard library simply uses memcmp() if there is no locale available, or (if there is a locale) uses the locale friendly string comparison through the STL internal __crtCompareStringA or __crtCompareStringW functions. Very simple and refreshingly not over-engineered standard C code.
So by looking in there, I managed to develop easy, simple and obvious ordinal string comparisons. In C.
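The decision described above can be sketched in portable C++. To be clear, this is not the code from xstrcoll.c: `compare_like_strcoll` and its `have_locale` flag are hypothetical names, and `std::strcoll` stands in for the internal `__crtCompareStringA` call, an assumption made only to keep the sketch compilable outside of Windows.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical sketch of the dispatch: no locale -> plain memcmp,
// locale -> a locale-aware comparison (std::strcoll as a stand-in).
inline int compare_like_strcoll(const std::string& a, const std::string& b,
                                bool have_locale)
{
    if (!have_locale) {
        const std::size_t common = a.size() < b.size() ? a.size() : b.size();
        const int ans = std::memcmp(a.data(), b.data(), common);
        if (ans != 0 || a.size() == b.size()) return ans;
        return a.size() < b.size() ? -1 : +1; // shorter string sorts first
    }
    // Uses the collation rules of the current C locale (see setlocale).
    // It stops at the first '\0', which is good enough for a sketch.
    return std::strcoll(a.c_str(), b.c_str());
}

inline void dispatch_demo()
{
    assert(compare_like_strcoll("APPLE", "PEAR", false) < 0);
    assert(compare_like_strcoll("PEARS", "PEAR", false) > 0);
}
```

A program that never calls setlocale() runs in the "C" locale, where both branches order plain ASCII strings the same way. The branches only start to disagree once a real locale is selected.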
And here is the proverbial on-line code (of the solution core) to prove it and show it. Everything else is an “obfuscation”, one might say.
[leftblock]You might ask: is this not common sense? Is this not obvious? Well, if one asks MSDN or Microsoft Docs for advice, apparently it is not simple and obvious at all. If you want to use some “standard” Windows SDK string comparison function then, by looking into MSDN/MD, you will never know if and when it will perform as you expect it to. There are simply too many self-inflicted unknowns in the world of MSDN/MD driven development.[/leftblock]
Why and how to use the code below? First try and analyse your code, or some legacy Windows code, that is using lstrcmp() or lstrcmpi(). And then (in case you have them) try and remove all the C6400 warnings in there. This is why I have developed the code. I am offering you the core of my solution.
```c
#include <string.h> /* memcmp  */
#include <wchar.h>  /* wmemcmp */

/* Ordinal comparison of the char ranges [_string1, _end1) and
   [_string2, _end2). Returns <0, 0 or >0, in the manner of memcmp. */
int dbj_ordinal_compareA(
    const char *_string1, const char *_end1,
    const char *_string2, const char *_end2)
{
    // no of elements
    int n1 = (int)(_end1 - _string1);
    int n2 = (int)(_end2 - _string2);

    // compare the common prefix; if it is equal the shorter string sorts first
    int ans = memcmp(_string1, _string2, n1 < n2 ? n1 : n2);
    int ret = (ans != 0 || n1 == n2 ? ans : n1 < n2 ? -1 : +1);
    return ret;
}

/* The same logic for wchar_t ranges; wmemcmp counts wide characters, not bytes. */
int dbj_ordinal_compareW(
    const wchar_t *_string1, const wchar_t *_end1,
    const wchar_t *_string2, const wchar_t *_end2)
{
    // no of elements
    int n1 = (int)(_end1 - _string1);
    int n2 = (int)(_end2 - _string2);

    int ans = wmemcmp(_string1, _string2, n1 < n2 ? n1 : n2);
    int ret = (ans != 0 || n1 == n2 ? ans : n1 < n2 ? -1 : +1);
    return ret;
}
```
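One possible way to call the function above is sketched here. The wrapper `ordinal_equal` is a hypothetical name, introduced only for illustration. The body of `dbj_ordinal_compareA` is repeated from the listing, so that the sketch compiles on its own as C++.

```cpp
#include <cassert>
#include <cstring>

// Repeated from the article so that this sketch is self-contained.
int dbj_ordinal_compareA(const char* _string1, const char* _end1,
                         const char* _string2, const char* _end2)
{
    int n1 = (int)(_end1 - _string1);
    int n2 = (int)(_end2 - _string2);
    int ans = std::memcmp(_string1, _string2, n1 < n2 ? n1 : n2);
    return (ans != 0 || n1 == n2 ? ans : n1 < n2 ? -1 : +1);
}

// Hypothetical convenience wrapper for zero-terminated strings: the
// [begin, end) ranges are derived with strlen before the ordinal compare.
inline bool ordinal_equal(const char* a, const char* b)
{
    return 0 == dbj_ordinal_compareA(a, a + std::strlen(a),
                                     b, b + std::strlen(b));
}

inline void usage_demo()
{
    assert(ordinal_equal("gif", "gif"));
    assert(!ordinal_equal("gif", "GIF")); // case sensitive, by design
}
```

Mind that this comparison is case sensitive. Where the legacy code relies on lstrcmpi() ignoring case, the call cannot be swapped blindly; the case folding has to be decided on separately.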
And yet again, the next question might be: why not just use the C++ std::string compare, which delivers modern C++ and ordinal string comparison?
The answer is (as already explained above): that requires the whole of the C++ std:: library. In code maintenance situations that is, more than a few times, not a feasible solution. Other than that, my C code above (if used) is also much easier to follow through a debugger than the C++ std:: library, with its necessary meanderings through a (very) long call stack.
Somehow it seems to me this code is also easier to maintain.[ps2id id=’answer_1′ target=”/]
Thank you for watching.
Answer: This is an ordinal string comparison. Now look into std::basic_string compare here, which is not locale aware either. The next step is to understand locale aware string comparisons. std::collate::compare is locale aware. It is not used by either of them. I challenge you to find this kind of straight answer anywhere in MSDN or Microsoft Docs.
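For the curious, the two kinds of comparison can be put side by side using nothing but the C++ standard library. This is a hedged sketch: `sign_of`, `ordinal_sign` and `collate_sign` are hypothetical helper names, while `std::string::compare`, `std::use_facet` and `std::collate<char>::compare` are the standard facilities mentioned in the answer.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: reduce any comparison result to -1, 0 or +1.
inline int sign_of(int v) { return (v > 0) - (v < 0); }

// Ordinal: std::string::compare is specified in terms of
// std::char_traits<char>::compare, which knows nothing about locales.
inline int ordinal_sign(const std::string& a, const std::string& b)
{
    return sign_of(a.compare(b));
}

// Locale aware: the collate facet of an explicitly chosen locale.
inline int collate_sign(const std::string& a, const std::string& b,
                        const std::locale& loc)
{
    const std::collate<char>& coll = std::use_facet<std::collate<char>>(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size());
}

inline void answer_demo()
{
    std::string s1{"APPLE"}, s2{"PEAR"};
    assert(!(s1 == s2));                // the riddle: a plain ordinal test
    assert(ordinal_sign(s1, s2) == -1); // 'A' (0x41) < 'P' (0x50)
    assert(collate_sign(s1, s2, std::locale::classic()) == -1);
}
```

In the classic "C" locale both functions agree. With a named locale (the names are platform specific, so none is hard-coded here) `collate_sign` may order strings differently, while `ordinal_sign` never changes.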