We posted this report a few days ago as a reply to an older thread, "Combining diacritics positioned incorrectly in Word 2016". It received some views but no actual response. Because we believe these are serious, long-standing bugs in urgent need of attention, we have posted it again, this time as a new thread. We also tried posting through Word's "Feedback" option from within the program itself, but those postings have a very low word limit, so all we could do was ask Microsoft to reply with an address to which we could send the full report. We have yet to receive any response. If someone can tell us a better, more direct way than this Community forum to get this report to the relevant teams at Microsoft, we would be extremely grateful.
In Microsoft Word for Windows 2016 there are three ways to insert Unicode characters that cannot be typed on the standard built-in keyboard (ie non-ASCII characters). We have found significant flaws in two of these three methods, and they appear to be bugs.
___________________________________________________________
(A) INSERT SYMBOL - generally works fine. No problem.
___________________________________________________________
(B) DIRECT UNICODE INPUT USING THE XXXX[Alt+X] METHOD - we have discovered serious, consistent, easily reproduced flaws in this method. This input method is documented in Word Help and is referred to as "Shortcut key" at the bottom of the Insert/Symbol dialog box. It is limited to Word (it doesn't work in Publisher or PowerPoint).
There are indications of the problem in a number of internet forum posts, eg:
https://www.howtogeek.com/239321/how-to-manually-create-compound-characters-in-word/
(see the paragraph near the bottom which starts with "There is a situation where this second method doesn’t work...")
https://qualityandinnovation.com/2014/11/22/typing-x-bar-y-bar-p-hat-q-hat-and-all-that/
(see the very last post, in its last paragraph, beginning "I had mixed results simply using the “0305 Alt-x” shortcut")
We have exhaustively analyzed this problem and believe we have fully identified the issue:
SUMMARY OF THE PROBLEM:
If the last character typed before the 4-character hexadecimal Unicode value is 0-9, a-f or A-F (in other words, a character that COULD be part of a valid hexadecimal value), then Alt+X incorrectly reads back 5 characters instead of 4, and in most fonts the result is an undefined character. If the font happened to contain a character at that 5-digit code point, that character would be displayed instead. Furthermore, Word RETAINS the 5-character sequence and will regenerate it as a 5-character Unicode value (instead of the starting letter plus the 4-character value) if you press Alt+X again. If the starting letter was a lowercase a-f, Alt+X regenerates it as uppercase, because it has read that letter as part of a 5-character value. There is also one letter outside the a-f range that causes the input method to fail: typing the Unicode value after x or X and pressing Alt+X produces no result, and the value is not converted.
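The behaviour described above can be modelled as a greedy backwards scan over hex digits. The Python sketch below is a hypothetical model written only to match the observations in this report (hex digits only, at most 5 of them); it is not Word's actual code, the function name is illustrative, and it does not reproduce the separate x/X failure.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F

def greedy_alt_x(text: str, max_digits: int = 5) -> str:
    """Hypothetical model of the observed Alt+X behaviour (not Word's code):
    scan backwards from the end of `text`, consuming up to `max_digits` hex
    digits, and replace them with the character at that code point."""
    start = len(text)
    while start > 0 and len(text) - start < max_digits and text[start - 1] in HEX_DIGITS:
        start -= 1
    digits = text[start:]
    if not digits:
        return text  # no hex digits before the caret: nothing to convert
    return text[:start] + chr(int(digits, 16))

for typed in ("g0300", "a0300", "90300"):
    result = greedy_alt_x(typed)
    print(typed, "->", " ".join(f"U+{ord(c):04X}" for c in result))
# g0300 -> U+0067 U+0300   (correct: g + combining grave)
# a0300 -> U+A0300         ("a" swallowed into a 5-digit value)
# 90300 -> U+90300
```

In this model "g" survives only because it is not a hex digit, which is exactly the pattern seen in the test below.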
REPRODUCIBLE TEST:
In any font that contains Combining Diacritics (eg Calibri, Arial etc etc):
Type a0300 then Alt+X. You'll get the "undefined character" box (which may contain a question mark) because the Alt+X converter has included the "a" in its back-reading of the Unicode value. The font does not contain a character at code point A0300, hence the undefined character box.
(Note that typing a0300 Alt+X SHOULD result in a lowercase a with a Combining Grave Accent (U+0300) on top.)
Press Alt+X again and it's converted back to the "Unicode" value A0300 (with an uppercase A), confirming that the original "a" has been wrongly interpreted as part of the Unicode value.
Typing uppercase A followed by 0300 then Alt+X will fail similarly.
So will typing the number 9 followed by 0300 then Alt+X.
All indications are that the above test will fail in exactly the same way whenever the first character is a-f, A-F or 0-9, no matter what 4-character Unicode value is entered afterwards (not just 0300!).
BUT if you type g0300 then Alt+X, you'll get the correct output, ie a letter g with grave on top.
... and so on, right through to z EXCEPT for x and X.
(Type x0300 or X0300 then Alt+X and NOTHING happens - the unicode 0300 remains unconverted. Same result with all unicodes typed after x or X).
WHEN WOULD THIS BE A PROBLEM?
1. Trying to input a Unicode letter (or a diacritic on top of a letter) WITHIN a word, eg:
The German preposition "für" ("for") can be typed in two ways - one will fail and one will work:
(a) f00FC[Alt+X]r (typing the precomposed ü, U+00FC) will FAIL - the "f" is read by the Alt+X converter as part of a 5-character Unicode value.
(b) fu0308[Alt+X]r will SUCCEED - Alt+X reads back only 4 characters because "u" could not form part of a unicode.
2. 5-character Unicode values (code points above U+FFFF) have existed for many years, which adds a further ambiguity to this already error-prone process of having Alt+X read backwards: how does it know whether to read back 4 characters or 5? The end boundary of the entered value is unambiguous - it is the point at which Alt+X is pressed - but its beginning is currently ambiguous. Always reading back exactly 4 characters would solve the issue above, BUT it would rule out the input of 5-character values. BOTH the beginning AND the end boundaries of the entered value need to be unambiguous, as they are with the Alt(hold)+XXXX(release) method or the Mac's Unicode Hex Input keyboard.
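The trade-off in point 2 can be illustrated with two hypothetical converters in Python: one that always reads back exactly 4 hex digits, and one that is told explicitly where the value starts and ends. Neither is taken from Word; the function names are illustrative only.

```python
import string

def read_back_fixed(text: str, width: int = 4) -> str:
    """Hypothetical alternative: always consume exactly `width` trailing hex digits."""
    digits = text[-width:]
    if len(digits) < width or any(c not in string.hexdigits for c in digits):
        return text  # not enough hex digits: leave the text unchanged
    return text[:-width] + chr(int(digits, 16))

def convert_delimited(before: str, digits: str) -> str:
    """Both boundaries explicit: the caller states exactly which digits form the
    value, as when the digits are entered while a key is held down."""
    return before + chr(int(digits, 16))

# Fixed width fixes the a0300 case...
assert read_back_fixed("a0300") == "a\u0300"
# ...but cannot express a 5-digit code point such as U+1F600:
assert read_back_fixed("1F600") == "1\uf600"  # "1" + U+F600, not U+1F600
# With explicit boundaries, both work:
assert convert_delimited("a", "0300") == "a\u0300"
assert convert_delimited("", "1F600") == "\U0001F600"
```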
___________________________________________________________
(C) DIRECT UNICODE INPUT FROM CUSTOM SOFTWARE KEYBOARD
This issue was discovered while using a custom keyboard for inputting combining diacritics, in order to create a large library of Sanskrit transliteration characters easily. The issue appears to be related to a misinterpretation of certain Unicode keyboard inputs by Word's Unicode keyboard processing engine.
The same issue occurs in Word, Publisher and PowerPoint.
We have a custom keyboard in which 11 diacritics from the Unicode Combining Diacritical Marks block are encoded with their correct code points: 0300, 0301, 0303, 0304, 0306, 0307, 030D, 030E, 0310, 0323, 0331. They are accessed using the AltGr key in Caps Lock mode (equivalent to Shift+AltGr mode).
The custom keyboard input method works perfectly for ALL 11 diacritics after ALL letters a-z, A-Z in Microsoft's own Notepad and WordPad, and also in CorelDRAW X9. The output in all situations, including entering the diacritics on their own (ie with no preceding letter), is 100% correct and robust.
Our saved Notepad and WordPad documents can be opened in Word, and all 11 diacritics display correctly over all letters in all combinations. Also, raw text copied from the Notepad and WordPad documents can be pasted directly into Word, and again all diacritics display correctly on all letters.
In Mac Word the corresponding custom keyboard input method works fine for ALL 11 diacritics after ALL letters. A Mac Word document containing all the letter+diacritic combinations opens and displays perfectly in Windows Word.
In Windows Word, ALL 11 diacritics can be typed successfully after all vowels (a e i o u) and also after the consonant/proto-vowel y.
However, 4 particular diacritics, 0300, 0301, 0303 and 0323, CANNOT BE SUCCESSFULLY TYPED BY KEYBOARD AFTER A CONSONANT (except y) OR ON THEIR OWN. The input fails and no character is displayed. The same result occurs in many fonts, including Calibri and Arial.
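All 11 marks are ordinary nonspacing combining marks (Unicode general category Mn), so their basic character properties do not obviously single out the failing four. One property that does separate them is that 0300, 0301, 0303 and 0323 (together with 0309, which the test below suggests also checking) are the only marks in this list that exist in the Windows-1258 (Vietnamese) code page. Whether this is connected to Word's behaviour has not been established; the Python sketch below only reports category and code-page membership, using Python's built-in `cp1258` codec.

```python
import unicodedata

# The 11 diacritics on the custom keyboard, plus 0309 (hookabovecomb).
DIACRITICS = [0x0300, 0x0301, 0x0303, 0x0304, 0x0306, 0x0307,
              0x030D, 0x030E, 0x0310, 0x0323, 0x0331, 0x0309]

def in_cp1258(cp: int) -> bool:
    """True if the code point can be encoded in Windows-1258 (Vietnamese)."""
    try:
        chr(cp).encode("cp1258")
        return True
    except UnicodeEncodeError:
        return False

for cp in DIACRITICS:
    ch = chr(cp)
    print(f"U+{cp:04X}  {unicodedata.category(ch)}  "
          f"cp1258={in_cp1258(cp)!s:5}  {unicodedata.name(ch)}")
```

The same function can be run over the whole U+0300 to U+036F range to list any other combining marks that share this property, which may help answer how many other characters are affected.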
It was initially suspected that the issue might be related to glyph naming, because these 4 diacritics happen to be the only ones of our 11 that have legacy glyph names from the Adobe Glyph List (v1.7) (gravecomb, acutecomb, tildecomb, dotbelowcomb) rather than generic uniXXXX names. These non-uniXXXX names are quite standard across the majority of fonts. To test this theory, a custom font was created in which those 4 glyphs were given generic uniXXXX names instead of the recommended Adobe Glyph List names, ie they were renamed uni0300, uni0301, uni0303 and uni0323 respectively. This made NO DIFFERENCE to the behaviour in Word; however, it seems too much of a coincidence for this glyph-name anomaly not to be related to the issue in some subtle way.
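Under the Adobe Glyph List conventions, a legacy name and its uniXXXX equivalent identify the same code point, and when text is typed or pasted, applications normally locate glyphs through the font's cmap table (code point to glyph) rather than by glyph name. That is consistent with the renaming test making no difference. The minimal Python sketch below resolves both naming styles for the glyphs in question; the table and function name are illustrative, and the parser is deliberately simplified (it ignores ligature names, uXXXXX names and suffixes such as ".sc").

```python
import re
from typing import Optional

# Legacy (non-uniXXXX) Adobe Glyph List names for the marks discussed here.
AGL_LEGACY_NAMES = {
    "gravecomb": 0x0300,
    "acutecomb": 0x0301,
    "tildecomb": 0x0303,
    "hookabovecomb": 0x0309,
    "dotbelowcomb": 0x0323,
}

# AGL-style "uniXXXX" name: exactly four uppercase hex digits, one code point.
UNI_NAME = re.compile(r"uni([0-9A-F]{4})")

def glyph_name_to_codepoint(name: str) -> Optional[int]:
    """Resolve a glyph name to a code point using the two conventions above."""
    if name in AGL_LEGACY_NAMES:
        return AGL_LEGACY_NAMES[name]
    match = UNI_NAME.fullmatch(name)
    return int(match.group(1), 16) if match else None

for legacy, generic in [("gravecomb", "uni0300"), ("acutecomb", "uni0301"),
                        ("tildecomb", "uni0303"), ("dotbelowcomb", "uni0323")]:
    same = glyph_name_to_codepoint(legacy) == glyph_name_to_codepoint(generic)
    print(f"{legacy:13} {generic}  same code point: {same}")
```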
It is not known how many other characters this problem occurs with.
We can think of no valid typographical or linguistic reason for this behaviour and, in any case, we know it works fine when done from within the Insert Symbol window. From our testing in various versions of Word, this issue appears to have been part of Microsoft Word's code at least as far back as Word 2003, possibly earlier.
REPRODUCIBLE TEST:
1. Set up a custom keyboard to input the 11 combining diacritics listed above: 0300, 0301, 0303, 0304, 0306, 0307, 030D, 030E, 0310, 0323, 0331. To investigate the possible relevance of glyph names further, we suggest also including 0309 (hookabovecomb), which is the only other combining diacritic with a non-uniXXXX format name.
2. Using any font that contains combining diacritics (eg Calibri, Arial etc), type any vowel a, e, i, o, u, or y, and after it type each of the 11 diacritics in turn. You will see that they ALL display.
3. Now, with NO preceding letter, type the 11 diacritics on their own - only 0300, 0301, 0303 and 0323 fail to display. (Also check 0309 if included.)
4. Now type any consonant (except y) and after it type each of the 11 diacritics in turn - only 0300, 0301, 0303 and 0323 fail to display. (Also check 0309 if included.)
5. Repeat the process in Notepad and WordPad: there will be no display problems. Copy the Notepad and WordPad text into Windows Word: no display problems. Open the saved Notepad and WordPad documents in Windows Word: no display problems.
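The reference text for step 5 can also be generated rather than typed, giving a file of every letter+diacritic combination to open in Notepad or paste into Word for comparison with keyboard input. This is a minimal Python sketch; the output file name is hypothetical, and the file only provides known-good text: it does not exercise the keyboard input path that fails in Word.

```python
import string
from pathlib import Path

DIACRITICS = [0x0300, 0x0301, 0x0303, 0x0304, 0x0306, 0x0307,
              0x030D, 0x030E, 0x0310, 0x0323, 0x0331]

def build_lines() -> list[str]:
    """One line per base letter a-z, A-Z: the letter followed by each diacritic in turn."""
    return [" ".join(base + chr(cp) for cp in DIACRITICS)
            for base in string.ascii_lowercase + string.ascii_uppercase]

if __name__ == "__main__":
    out = Path("diacritic_test.txt")  # hypothetical file name
    # utf-8-sig writes a byte-order mark, which helps Windows tools detect UTF-8.
    out.write_text("\n".join(build_lines()) + "\n", encoding="utf-8-sig")
    print(f"Wrote {out.resolve()}")
```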
___________________________________________________________
We regard both issue (C) and the preceding issue (B) as extremely serious and urgent, and we would be surprised if Microsoft doesn't feel the same way. These presumed bugs are affecting the functionality of the fonts and keyboard in a project we have been involved with for 6 years, which is awaiting release pending the resolution of these issues. We can't imagine that Microsoft would want these bugs to persist any longer now that they have been brought to its attention. We are prepared to cooperate fully with Microsoft (testing, feedback etc) to reach a speedy resolution, and we look forward to hearing from you urgently on both issues.
**********************************************************
Kevin Brown's G R A P H I T Y ! Est.1979
DIGITAL TYPE SPECIALIST * GRAPHIC DESIGN
Member: The Unicode Consortium
Member: Australian Graphic Design Association
www.australianschoolfonts.com.au
**********************************************************