[Tickets] [Orxonox] #379: International characters in paths
Orxonox
trac at orxonox.net
Thu May 12 18:51:31 CEST 2011
#379: International characters in paths
------------------------------+---------------------------------------------
Reporter: rgrieder | Owner: nobody
Type: defect | Status: new
Priority: low | Milestone: Version 0.1 Codename: Arcturus
Component: GeneralFramework | Version: 0.0.4
Severity: normal | Keywords: unicode utf western 1252 codepage cegui
------------------------------+---------------------------------------------
Changes (by rgrieder):
* keywords: unicode => unicode utf western 1252 codepage cegui
* severity: critical => normal
Old description:
> When starting Orxonox in a directory like 'ásdf' on Windows 7, the CEGUI
> logger will not accept the logging file, leading to an exception. [[br]]
> We need to investigate whether this is a just a communication problem
> between Orxonox and CEGUI or whether we have serious issues with
> international characters in paths.
New description:
When starting Orxonox in a directory like 'ásdf' on Windows 7, the CEGUI
logger will not accept the logging file, leading to an exception. [[br]]
We need to investigate whether this is just a communication problem
between Orxonox and CEGUI or whether we have serious issues with
international characters in paths.
'''EDIT''' [[br]]
It turns out that it was mostly a problem in CEGUI::DefaultLogger.
However, that's not all, so I have to make a little detour (for Windows
only!):
On Windows, narrow (char) strings are encoded using the Microsoft codepage
currently in use, which can differ from one system to the next. The
single-byte codepages are simply 7-bit ASCII extended by another 128
characters to support whatever is needed. On systems in the US and Western
Europe, codepage 1252 is the standard.
CEGUI, on the other hand, uses UTF-32 (4 bytes per character) for its
strings and converts them to UTF-8 when calling c_str(). That is of course
different from the 1252 Western codepage used by Windows, so whatever we
get from CEGUI might not be directly usable with the Windows API. [[br]]
That's why for all the Windows API functions that take strings, there is
a second function with a 'W' suffix that accepts wchar_t. On UNIX that
type is usually 4 bytes, but Microsoft decided to go for 2 bytes and
UTF-16 encoding.
[[br]]
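To illustrate the 2-byte wchar_t point, here is a minimal sketch of how UTF-32 code points map to UTF-16 units, which is the form the 'W' functions expect. encodeUtf16 is a hypothetical helper (not part of Orxonox, CEGUI or the Windows API), std::u16string merely stands in for a Windows wchar_t string, and C++11 is assumed. No validation of the input code points is done.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode Unicode code points (UTF-32) as UTF-16.
// Code points above U+FFFF do not fit into one 16-bit unit and are
// split into a surrogate pair.
std::u16string encodeUtf16(const std::vector<std::uint32_t>& codePoints)
{
    std::u16string out;
    for (std::uint32_t cp : codePoints)
    {
        if (cp < 0x10000)
        {
            out += static_cast<char16_t>(cp);
        }
        else
        {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (cp >> 10));   // high surrogate
            out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF)); // low surrogate
        }
    }
    return out;
}
```

A character like 'á' (U+00E1) stays a single 16-bit unit, so for typical Western paths UTF-16 looks like "2 bytes per character", but that does not hold in general.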
That's exactly where the bug occurred: CEGUI converted the file name to
UTF-8 and fed that to ofstream::open, which in turn interpreted it as a
codepage 1252 character sequence. [[br]]
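The effect on the 'á' from 'ásdf' can be reproduced without CEGUI. The sketch below uses two hypothetical helpers, encodeUtf8 and readAsSingleByte; the latter only mimics a single-byte interpretation and is not what ofstream::open does internally. It is only accurate for bytes outside 0x80 to 0x9F, where codepage 1252 matches Latin-1.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode one Unicode code point as UTF-8.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: treat every byte as its own character, the way
// a single-byte codepage does.
std::vector<std::uint32_t> readAsSingleByte(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    for (unsigned char b : bytes)
        out.push_back(b);
    return out;
}
```

For 'á' (U+00E1), encodeUtf8 yields the two bytes 0xC3 0xA1. Read as single-byte characters these become 'Ã' and '¡', so the path no longer names the directory we started in.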
[[br]]
There is one more subtle detail left: How does CEGUI::String convert from
1252 to UTF-32 when assigning our std::string to it? Simple: according to
the documentation, the characters are interpreted as unencoded 8-bit
values. So a simple cast from 8 bit to 32 bit values is done. [[br]]
And how on earth could that ever be correct (it actually was)? It turns
out that codepage 1252 is identical to Unicode (and thus UTF-32) for the
first 256 characters, except for the range 0x80 to 0x9F.
[[br]]
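A sketch of what such a cast amounts to, and where it stops being correct. widenBytes is a hypothetical stand-in written for this ticket, not CEGUI's actual code.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the "simple cast": every 8-bit value is
// widened to a 32-bit value without any decoding.
std::vector<std::uint32_t> widenBytes(const std::string& s)
{
    std::vector<std::uint32_t> out;
    out.reserve(s.size());
    for (char c : s)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<unsigned char>(c));
    }
    return out;
}
```

The 1252 byte 0xE1 ('á') becomes U+00E1, which is correct. The 1252 byte 0x80 (the Euro sign, U+20AC in Unicode) becomes U+0080, which is a control character, so the cast is only right outside 0x80 to 0x9F.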
=== TODO ===
Not every user will have the 1252 codepage and therefore a lot of things
can go wrong. We somehow have to deal with this. [[br]]
On the other hand, the CEGUI problem that this ticket was issued for is
just a bug and not general behaviour. CEGUI 0.6.2 might still have the
issue, though. But since this only concerns Windows, where we use CEGUI
0.7.5, we're safe. [[br]]
The other TODO is making a correct conversion from UTF-8 (the standard
Linux encoding, if I'm not wrong) to CEGUI::String, because the current
assignment is just a simple cast and not a decoding.
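
A minimal sketch of what a real decoding step could look like, assuming the resulting code points are then handed to CEGUI::String in some way not shown here. decodeUtf8 is a hypothetical helper. It replaces malformed input with U+FFFD and, to stay short, does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
std::vector<std::uint32_t> decodeUtf8(const std::string& in)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else
        {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }

        // All continuation bytes must exist and look like 10xxxxxx.
        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (c & 0x3F);
        }

        if (ok)
        {
            out.push_back(cp);
            i += extra + 1;
        }
        else
        {
            out.push_back(0xFFFD);
            ++i;
        }
    }
    return out;
}
```

With this, the UTF-8 bytes 0xC3 0xA1 followed by "sdf" decode to four code points starting with U+00E1, whereas the plain cast would produce five.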
--
Ticket URL: <www.orxonox.net/ticket/379#comment:2>
Orxonox <http://www.orxonox.net>
Orxonox Open Source game