TMemIniFile detection of unicode?  
News Group: embarcadero.public.delphi.language.delphi.general

I am working with XE5 on a language handling system that needs to be
updated to Unicode. The language strings are stored in files in INI
file format, and in the old application we used TIniFile to read the
strings. On a suggestion from Remy Lebeau I have switched to
TMemIniFile instead, and it seems to work: Chinese characters in UTF-8
files show up as they should.

But I have a question on TMemIniFile and Unicode:
If we do not use the optional TEncoding argument to
TMemIniFile.Create() will it nevertheless detect the file format, for
example by looking at the BOM if it exists or scanning the actual
data?

Right now we have mixed file formats: ASCII, ASCII with localized
characters like ÅÄÖ, UTF-8 and Unicode.
So I cannot really force Create to any specific encoding, instead I
hope it can see this by itself.

In the end I will have the files re-translated to some common encoding
(UTF-8).

But what would happen if someone translated a file to a new language
and saved in Unicode when TMemIniFile expects UTF-8?
Will the read fail?

Our idea is that distributors could make translations for their
country by themselves and here lies the danger of different Unicode
encodings.....

Date Posted: 17-Jan-2015, at 7:54 AM EST
From: Bo Berglund
 
Re: TMemIniFile detection of unicode?  
> Bo Berglund wrote:
> On Sat, 17 Jan 2015 10:48:23 -0800, Remy Lebeau (TeamB)
>  wrote:
> 
> 
> >> But what would happen if someone translated a file to a new language
> >> and saved in Unicode when TMemIniFile expects UTF-8? Will the read fail?
> >
> >Yes, it will.  The bit patterns used by UTF-8 and UTF-16 are very different 
> >from each other.  They represent the same range of Unicode codepoints, but 
> >in very different ways.
> 
> Then I will create the "official" language files as UTF-8 complete
> with the BOM and then direct anyone who translates to a different
> language that they MUST start with the English file. This will
> probably force UTF-8 with BOM on them. If they save in a different
> encoding anyway, they will have broken the system to their own
> disadvantage...
> 
> Thanks for the information on how TMemIniFile operates!
> 
> PS: I can now see the Chinese texts already in my own English Win7 PC!
> Also on menus. Can't read them though... DS

Bo,

Provide an app (or a part of your main app) for editing the INI
configuration, even if it's just a memo control (a TMemo or TRichEdit,
say) showing the INI file. That way you control the saving and therefore
the encoding.

If you're really paranoid (and I've done this for an app where the INI
settings were very "interesting"), checksum the name/value pairs (you
don't need all of them) and give a warning if the INI has been edited
externally (i.e. the checksum is incorrect). It doesn't need to be really
clever; a simple XOR/shift style checksum will do.
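
The XOR/shift idea could be sketched roughly as follows. This is a minimal illustration in Python rather than Delphi. The function name xor_shift_checksum, the 32-bit width and the "name=value" serialization are all assumptions made for the example, not details of the original app.

```python
def xor_shift_checksum(pairs):
    """Rotate-left-by-one and XOR over the UTF-8 bytes of each name=value pair.

    `pairs` is an iterable of (name, value) strings. The 32-bit width and the
    "name=value\\n" serialization are illustrative assumptions.
    """
    acc = 0
    for name, value in pairs:
        for byte in f"{name}={value}\n".encode("utf-8"):
            # Rotate the 32-bit accumulator left by one bit, then mix in the byte.
            acc = ((acc << 1) | (acc >> 31)) & 0xFFFFFFFF
            acc ^= byte
    return acc


if __name__ == "__main__":
    shipped = [("File", "Arkiv"), ("Edit", "Redigera")]
    stored = xor_shift_checksum(shipped)  # saved by the app next to the INI data

    edited = [("File", "Arkiw"), ("Edit", "Redigera")]  # changed externally
    if xor_shift_checksum(edited) != stored:
        print("Warning: language file was edited outside the application")
```

A checksum like this only catches accidental or casual edits. It is not tamper-proof, since anyone who knows the scheme can recompute it.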

--
Linden
"Mango" was Cool but "Wasabi" was Hotter but remember it's all in the "source"

Date Posted: 17-Jan-2015, at 10:37 PM EST
From: Linden ROTH
 
Re: TMemIniFile detection of unicode?  
On Sat, 17 Jan 2015 10:48:23 -0800, Remy Lebeau (TeamB)
 wrote:


>> But what would happen if someone translated a file to a new language
>> and saved in Unicode when TMemIniFile expects UTF-8? Will the read fail?
>
>Yes, it will.  The bit patterns used by UTF-8 and UTF-16 are very different 
>from each other.  They represent the same range of Unicode codepoints, but 
>in very different ways.

Then I will create the "official" language files as UTF-8 complete
with the BOM and then direct anyone who translates to a different
language that they MUST start with the English file. This will
probably force UTF-8 with BOM on them. If they save in a different
encoding anyway, they will have broken the system to their own
disadvantage...

Thanks for the information on how TMemIniFile operates!

PS: I can now see the Chinese texts already in my own English Win7 PC!
Also on menus. Can't read them though... DS

Date Posted: 17-Jan-2015, at 2:27 PM EST
From: Bo Berglund
 
Re: TMemIniFile detection of unicode?  
Bo wrote:

> But I have a question on TMemIniFile and Unicode:
> If we do not use the optional TEncoding argument to
> TMemIniFile.Create() will it nevertheless detect the file format,
> for example by looking at the BOM if it exists or scanning
> the actual data?

If you do not specify a TEncoding in the constructor, TMemIniFile will do 
the following:

1. when loading an .ini file, it will look for a BOM, and if none is found 
then TEncoding.Default will be used.

2. when saving an .ini file, if a TEncoding has not been assigned (either 
explicitly or through file detection), TEncoding.Default will be used.  Otherwise, 
a BOM will be written if a UTF-8/16 TEncoding is used.

TEncoding.Default represents the default codepage of the local PC at the 
time it is used, so it can represent different encodings on different PCs, 
or even different encodings on the same PC if the user changes their locale 
settings.
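
The load-time rule described above (use the BOM if there is one, otherwise fall back to the machine's default ANSI code page) can be mimicked in a few lines. The sketch below is Python, not the Delphi RTL source. pick_encoding is a hypothetical name, and locale.getpreferredencoding merely stands in for TEncoding.Default.

```python
import codecs
import locale

# BOMs for UTF-8 and UTF-16, each paired with a Python codec name.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def pick_encoding(data, default=None):
    """Return (codec_name, bom_length) for the raw bytes of an .ini file."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # No BOM: fall back to the locale default, which varies between machines.
    if default is None:
        default = locale.getpreferredencoding(False)
    return default, 0


if __name__ == "__main__":
    raw = codecs.BOM_UTF8 + "[Menu]\nFile=\u6587\u4ef6\n".encode("utf-8")
    name, skip = pick_encoding(raw)
    print(name, ascii(raw[skip:].decode(name)))
```

The fallback branch is where the trouble lies: the same BOM-less file can be decoded differently on two PCs, because the default is taken from the machine and not from the file.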

> Right now we have mixed file formats: ASCII, ASCII with localized
> characters like ÅÄÖ, UTF-8 and Unicode.

There is no such thing as "ASCII with localized characters", that is known 
as ANSI instead, which itself is a misnomer as it is not a single encoding 
but rather encompasses all non-ASCII non-UTF encodings.

> So I cannot really force Create to any specific encoding

You will have to detect the encoding manually before loading the file.
TEncoding has a GetBufferEncoding() method (which TMemIniFile uses), but
it only looks for BOMs and nothing else.  If your files do not have BOMs
then you have to analyze the file data yourself; TMemIniFile will not do
it for you.  ASCII and UTF encodings are easy to detect, as they have
well-defined bit patterns.  But ANSI encodings are going to be difficult,
as they are locale-dependent.
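
For files without a BOM, the analysis mentioned above might look something like this sketch. Pure ASCII and valid UTF-8 can be recognized from the bytes alone, and anything else is treated as some unknown ANSI code page. Python is used for illustration only, classify_bomless is a hypothetical helper, and UTF-16 without a BOM is deliberately left out.

```python
def classify_bomless(data):
    """Classify BOM-less bytes as 'ascii', 'utf-8' or 'ansi' (code page unknown)."""
    if all(byte < 0x80 for byte in data):
        return "ascii"  # every byte is 7-bit, so any ASCII-compatible codec works
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8"  # all multi-byte sequences are well-formed UTF-8
    except UnicodeDecodeError:
        # High bytes that are not valid UTF-8: some ANSI code page, but the
        # bytes alone do not say which one.
        return "ansi"


if __name__ == "__main__":
    print(classify_bomless(b"File=File"))
    print(classify_bomless("File=\u00d6ppna".encode("utf-8")))
    print(classify_bomless("File=\u00d6ppna".encode("cp1252")))
```

The "utf-8" verdict is a strong hint and not a proof, since a short ANSI file can happen to form valid UTF-8 sequences. The "ansi" verdict still leaves the actual code page to be guessed.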

For example, 'Å' is encoded as byte $C5 in many ISO-8859-X encodings and 
some Windows-12XX encodings, but does not exist in other ISO-8859-X and Windows-12XX 
encodings.  A better example is the Euro '€', which is encoded as byte $88 
in Windows-1251, but as $80 in Windows-1252, and as $A4 in ISO-8859-7, but 
does not exist in many other ISO-8859-X encodings.
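
The byte values quoted above are easy to check. The snippet below uses Python's built-in codecs purely as a lookup table (it is not Delphi code). It also shows one byte decoding to different characters depending on which code page is assumed.

```python
EURO = "\u20ac"  # the Euro sign

# Euro sign in three single-byte code pages.
for codec in ("cp1251", "cp1252", "iso8859_7"):
    print(codec, EURO.encode(codec).hex())

# The same byte means different things under different code pages.
print(ascii(b"\xc5".decode("cp1252")))     # Latin capital A with ring above
print(ascii(b"\xc5".decode("iso8859_7")))  # Greek capital epsilon
```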

So detecting an ANSI encoding correctly is both difficult and important,
which is why it is best not to rely on ANSI at all.  Use a UTF instead.
But if you must load an ANSI-encoded file, TEncoding.Default is usually
your best bet, provided the .ini file has not been moved between PCs that
have different locales, and the user has not changed the PC's locale
between creating the .ini file and loading it on that same PC.

> But what would happen if someone translated a file to a new language
> and saved in Unicode when TMemIniFile expects UTF-8? Will the read fail?

Yes, it will.  The bit patterns used by UTF-8 and UTF-16 are very different 
from each other.  They represent the same range of Unicode codepoints, but 
in very different ways.
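
To make the difference concrete, here is a small Python illustration (not Delphi code, and the sample text is made up). It encodes the same string both ways and then tries to read the UTF-16 bytes as UTF-8. How TMemIniFile itself reacts to such a mismatch is not shown here. The sketch only demonstrates that a strict UTF-8 decoder rejects the bytes.

```python
import codecs

text = "File=\u6587\u4ef6"  # "File=" followed by two Chinese characters

as_utf8 = codecs.BOM_UTF8 + text.encode("utf-8")
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Same text, completely different byte sequences.
print(as_utf8.hex())
print(as_utf16.hex())


def reads_as_utf8(data):
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(reads_as_utf8(as_utf8))   # True
print(reads_as_utf8(as_utf16))  # False: byte 0xFF can never appear in UTF-8
```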

-- 
Remy Lebeau (TeamB)

Date Posted: 17-Jan-2015, at 10:48 AM EST
From: Remy Lebeau (TeamB)