I am migrating an old BDS2006 application suite to XE5-XE7.
One reason is to make it possible to display unicode texts in the GUI.
We have used a language switching system where the user can switch
language and then all of the new texts are read from a language file
in the selected language. The language files are read as ini files
using the TInifile class and contain text strings for each component
on each form where the form class name is the section.
For European languages the files are plain ASCII text files, but for
Asian languages they are unicode.
Now I hoped that if I migrate to XE5 and select a language like
Japanese or Chinese the texts from the Unicode language files would
show up in the GUI. But for some reason I only get unintelligible
characters on the menus after switching to say Chinese.
Some other items on the GUI do translate and show up as Chinese
characters, but most do not.
I have tested various Unicode encodings but it all looks the same.
My development PC is Windows7 x64 Professional fully updated.
With the pre-Unicode applications I was able to display the Chinese
texts on Windows XP (US version) after setting the Windows system
locale to Chinese, but this setting is no longer available on Win7.
What could be the reason for this problem?
Are menu texts handled differently from other components?
On Wed, 7 Jan 2015 00:56:07 -0800, Bo Berglund
wrote:
Closing this thread with a final report:
I have now a fully working XE5 (Unicode) application where the
language files are all UTF-8 and whatever language is switched to it
shows up on the menus with the correct characters (including Chinese,
Trad and Simpl).
What I did was:
- Convert the application to work on XE5
- Modify the language file handling system to use TMemIniFile
- Supplement the ReadString function with a removal of quotes in code
It now works like a charm!
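For the record, the quote-removal step could be sketched like this (a minimal sketch only: the TLanguageManager.ReadText name is hypothetical, and FIniFile is assumed to be the TCustomIniFile field discussed earlier in the thread):
{code}
// Hypothetical wrapper: read via TMemIniFile, then strip surrounding
// quotes the way the PrivateProfile API would, preserving any
// whitespace stored inside the quotes.
function TLanguageManager.ReadText(const Section, Ident: string): string;
begin
  Result := FIniFile.ReadString(Section, Ident, '');
  if (Length(Result) >= 2) and CharInSet(Result[1], ['"', '''']) and
     (Result[Length(Result)] = Result[1]) then
    Result := AnsiDequotedStr(Result, Result[1])
  else
    Result := Trim(Result);
end;
{code}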
Thanks to all contributors in this thread!
Bo wrote:
> The Windows API PrivateProfile calls always remove these quotes
> on reading via the TIniFile class.
That is documented behavior in the API spec:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms724353.aspx
{quote}
If the string associated with lpKeyName is enclosed in single or double quotation
marks, the marks are discarded when the GetPrivateProfileString function
retrieves the string.
{quote}
> To get rid of this I had to revert back to TIniFile
Or, you could simply remove the quotes yourself after TMemIniFile.ReadString()
exits, before using the string value where needed. Look at the RTL's AnsiExtractQuotedStr()
and AnsiDequotedStr() functions. For example:
{code}
S := FIniFile.ReadString(...);
if StartsText(#39, S) or StartsText(#34, S) then
S := AnsiDequotedStr(S, S[1])
else
S := Trim(S);
{code}
--
Remy Lebeau (TeamB)
On Thu, 8 Jan 2015 12:07:20 -0800, Bo Berglund
wrote:
>On Wed, 7 Jan 2015 16:00:32 -0800, Remy Lebeau (TeamB)
> wrote:
>
>It really does not matter in this case because the language files are
>not written by the application, only read.
>The developer will use a special debug function to create a master
>language file in English and this is the only time a file is written.
>So it is perfectly OK.
Now I have tested TMemIniFile for *reading* the language strings into
the Delphi 2007 version of the application. (I cannot really go to the
unicode version of this application yet because I am working on
finding all strings that need translation; a smaller application has
been converted to XE5, though).
Anyway, I found a big (killer) problem in the implementation:
Our language strings are usually written in this format in the
language file:
[]
.Caption="some text"
The double quotes are used in all texts so that we can store both
leading and trailing whitespace like this:
.Caption=" some text "
Now, after switching from TIniFile to TMemIniFile as described the
result is that the display everywhere now contains the quotes!
So all menu items are named like "File", "Edit" etc with the quotes
displayed.
The Windows API PrivateProfile calls always remove these quotes on
reading via the TIniFile class. It also removes leading and trailing
whitespace so we cannot start or end language texts with whitespace
unless we use the quotes.
To get rid of this I had to revert back to TIniFile and this brings
back the problem with unicode compliance in the XE5 version of this
app after migration...
Question:
Has this compatibility problem surfaced before and been solved in
Delphi versions between D2007 and XE5?
On Wed, 7 Jan 2015 16:00:32 -0800, Remy Lebeau (TeamB)
wrote:
>Bo wrote:
>
>> What happens to comments and such in the ini file on disk when
>> the UpdateFile call is made?
>
>TMemIniFile does not support comments. It parses comments when loading the
>file but they are discarded, and there is no way to specify new comments
>when saving the file.
>
Thanks for the clarification!
It really does not matter in this case because the language files are
not written by the application, only read.
The developer will use a special debug function to create a master
language file in English and this is the only time a file is written.
So it is perfectly OK.
I just wanted to know for other potential uses of TMemIniFile.
Bo wrote:
> I looked up TMemIniFile and it seems to inherit from TCustomIniFile.
> So I modified the Field type thus:
>
> FIniFile: TCustomIniFile;
>
> Then where it is initialized:
> FIniFile := TMemIniFile.Create(FLanguageManager.CurrentLanguageFile);
Keep in mind that TMemIniFile's constructor has an optional Encoding parameter
in D2009 and later, eg:
{code}
FIniFile := TMemIniFile.Create(FLanguageManager.CurrentLanguageFile, TEncoding.UTF8);
{code}
> Ran a syntax check and it was all OK.
Yes, by design. By switching to the TCustomIniFile base class, you can use
any INI implementation class you want (for instance, there is also a TRegistryIniFile
class in the System.Win.Registry.pas unit which reads/writes data using the
Registry instead of an .ini file).
> You mention that the encoding is set for the file somewhere, but
> I cannot see that in the D2007 help.
That feature does not exist in D2007. It was added in D2009, when Delphi
switched its String type to Unicode.
> Was this not added until after D2009, maybe?
Yes.
> I would like to make the code work across the unicode border and
> TMemIniFile exists already before so this is probably a good change
> notwithstanding.
In D2007 and earlier, TMemIniFile reads/writes AnsiString values as-is, just
like TIniFile does. So, for instance, if the file is encoded as UTF-8, TMemIniFile.ReadString()
(and TIniFile.ReadString()) will return a UTF-8 encoded AnsiString.
That is not the case in D2009 and later. Strings are always UTF-16 encoded,
so the file data has to be encoded/decoded accordingly.
> However, the encoding stuff (once found) probably has to be depending
> on compiler version, I guess?
If you are trying to write code that can be compiled in multiple Delphi versions,
then yes, you will have to use {$IFDEF} statements to wrap newer functionality
as needed. For example (assuming a UTF-8 encoded file):
{code}
{$IF CompilerVersion >= 24.0}
{$LEGACYIFEND ON} // prior to XE4, $IF must use $IFEND instead of $ENDIF
{$IFEND}
FIniFile := TMemIniFile.Create(FLanguageManager.CurrentLanguageFile
{$IF RTLVersion >= 20.0}
, TEncoding.UTF8
{$IFEND}
);
....
{$IF RTLVersion >= 20.0}
s := FIniFile.ReadString(...); // reads UTF-8 from file, auto-decodes to UTF-16
{$ELSE}
s := FIniFile.ReadString(...); // reads UTF-8 from file
s := Utf8ToAnsi(s); // manual decode from UTF-8 to local ANSI
w := UTF8Decode(s); // manual decode from UTF-8 to UTF-16 WideString
{$IFEND}
{code}
--
Remy Lebeau (TeamB)
Bo wrote:
> What happens to comments and such in the ini file on disk when
> the UpdateFile call is made?
TMemIniFile does not support comments. It parses comments when loading the
file but they are discarded, and there is no way to specify new comments
when saving the file.
TIniFile, using Microsoft's PrivateProfile API, actually parses an existing
file when saving, thus preserving existing content including comments. TMemIniFile
does not do that. It creates a new file and writes its current memory content
to that file.
> Is all of that stuff removed
Yes
> is it actually read into memory when the TMemInifile is created
It is read, but it is not saved in memory.
> so it can be later written together with the other data?
No.
--
Remy Lebeau (TeamB)
On Wed, 7 Jan 2015 15:27:24 -0800, Bo Berglund
wrote:
Further about the TMemIniFile.UpdateFile method.
From the help in D2007:
-----------------------
{quote}Call UpdateFile to copy INI file data stored in memory to the
copy of the INI file on disk UpdateFile overwrites all data in the
disk copy of the INI file with the INI file data stored in
memory.{quote}
So my question now is:
What happens to comments and such in the ini file on disk when the
UpdateFile call is made?
Is all of that stuff removed or is it actually read into memory when
the TMemInifile is created so it can be later written together with
the other data?
On Wed, 7 Jan 2015 14:04:42 -0800, Remy Lebeau (TeamB)
wrote:
> a. use TMemIniFile to load the file using that charset/codepage.
I looked up TMemIniFile and it seems to inherit from TCustomIniFile.
So I modified the Field type thus:
FIniFile: TCustomIniFile;
Then where it is initialized:
FIniFile := TMemIniFile.Create(FLanguageManager.CurrentLanguageFile);
Ran a syntax check and it was all OK.
But now my question:
You mention that the encoding is set for the file somewhere, but I
cannot see that in the D2007 help.
Was this not added until after D2009, maybe?
I would like to make the code work across the unicode border and
TMemIniFile exists already before so this is probably a good change
notwithstanding.
However, the encoding stuff (once found) probably has to be depending
on compiler version, I guess?
Bo wrote:
> I have a little problem here, it seems...
It seems you have files in different encodings. You have to know the encoding
of a file in order to read it, especially in a Unicode environment like D2009+.
So you need to either:
1. normalize your files to a single encoding (prefer UTF-8), and then make
sure your code reads the files using that encoding (TIniFile does not support
that, but TMemIniFile does).
2. detect which encoding is being used by a file (which is error prone if
you detect wrong), then read the file using that encoding. One way would
be to put the charset/codepage value in the .ini file itself so you can read
it first, then either:
a. use TMemIniFile to load the file using that charset/codepage, or:
b. skip any T...IniFile class altogether and just use the Ansi PrivateProfile
API functions directly, and then manually convert the bytes to Unicode using
TEncoding or UnicodeFromLocaleChars().
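The second sub-option (calling the Ansi PrivateProfile API directly) might look something like this in D2009+ (a sketch only: the section/key names, IniFileName variable, and the 936 codepage value are just example assumptions; the codepage would be read from the file beforehand as described above):
{code}
// Read the raw ANSI bytes via the PrivateProfile API, then decode
// them with TEncoding using a known codepage.
var
  Buf: array[0..1023] of AnsiChar;
  Len: DWORD;
  Bytes: TBytes;
  Enc: TEncoding;
  S: string;
begin
  Len := GetPrivateProfileStringA('TMainForm', 'Caption', '',
    Buf, Length(Buf), PAnsiChar(AnsiString(IniFileName)));
  SetLength(Bytes, Len);
  if Len > 0 then
    Move(Buf, Bytes[0], Len);
  Enc := TEncoding.GetEncoding(936); // 936 = GBK, example value only
  try
    S := Enc.GetString(Bytes);
  finally
    Enc.Free;
  end;
end;
{code}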
> Some of the files are just plain ASCII files (at least they look like
> that when inspecting them). A few are unicode, but with different
> encodings.
Meaning what? Some are UTF-8 and some are UTF-16? Unicode is a character
set, not an encoding. UTFs are encodings of Unicode.
> For the unicode tests I have tried to convert the ASCII files to
> unicode using UltraEdit's functions. It might or might not have
> worked, I don't know especially in case of the Chinese files...
Chinese cannot be stored as ASCII.
> I was not aware of TMemIniFile before...
> It seems like it does no longer use Windows API functions for reading
> and writing but treats the file properly by itself, right?
Correct. TMemIniFile is a manual implementation of the INI format, and so
it has more flexibility than Microsoft's API.
> If I have to specifically tell TMemIniFile what encoding is in use,
> then how can I know beforehand?
You have to detect it beforehand. There are three options:
1. pick an encoding and stick with it for everything. This is not going
to be compatible with existing files, so you will have to convert them.
2. store the encoding somewhere that you can read it later. Also not likely
to be compatible with existing files.
3. if you need to read existing files, you will have to analyze their raw
bytes to detect the encoding used. Not 100% reliable for non-UTFs.
> What happens if the encoding on read is set to UTF-8 but the file itself
> is just a plain ASCII text file?
ASCII is a subset of UTF-8, so that is perfectly fine. However, it is the
non-ASCII non-UTF encodings you have to worry about.
> Conversion of the file encodings is something I am not very good at.
> What would be the best practice to do such a re-encoding?
Well, first you have to know what the original encoding is. Without that,
the rest is useless. But if you can figure out the original encoding correctly,
converting it is easy. Plenty of conversion tools are available. Or just convert
the files in code using the TStreamReader and TStreamWriter classes.
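A conversion using those classes could be sketched like this (assuming D2009+, and assuming the source files are Windows-1252; substitute whatever codepage the files actually use):
{code}
// Re-encode a file from a known ANSI codepage to UTF-8 line by line.
procedure ReEncodeToUtf8(const SrcFile, DstFile: string);
var
  Reader: TStreamReader;
  Writer: TStreamWriter;
begin
  Reader := TStreamReader.Create(SrcFile, TEncoding.GetEncoding(1252));
  try
    Writer := TStreamWriter.Create(DstFile, False, TEncoding.UTF8);
    try
      while not Reader.EndOfStream do
        Writer.WriteLine(Reader.ReadLine);
    finally
      Writer.Free;
    end;
  finally
    Reader.Free;
  end;
end;
{code}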
> I tried to convert ASCII => UTF-8 on my Swedish language file
Swedish is not ASCII-compatible, either. I think you are confusing ASCII
(characters 0-127) with ANSI (characters 0-255, where 128-255 are charset/language
specific).
> and then looked at the binary of the file. No BOM up front! Should there
> not be a BOM if it is UTF-8???
Usually, no. In fact, the Unicode standard specifically states that UTF-8
encoded files should NOT use a BOM. Many apps (especially legacy apps) that
read text files do not handle a UTF-8 BOM.
> If not, then how can one know if it is in UTF-8???
If there is no BOM, you have to analyze the raw bytes. UTF-8 is very easy
to detect, as it uses a very distinct bit pattern (by design).
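A simple heuristic check for that bit pattern could be sketched like this (D2009+; it validates lead/continuation byte structure only, so it is not a full UTF-8 validator, but plain ASCII passes trivially and most ANSI text fails quickly):
{code}
// Scan the raw bytes and verify that every multi-byte lead byte is
// followed by the right number of 10xxxxxx continuation bytes.
function LooksLikeUtf8(const Bytes: TBytes): Boolean;
var
  I, Follow: Integer;
begin
  Result := True;
  I := 0;
  while I < Length(Bytes) do
  begin
    if Bytes[I] < $80 then
      Follow := 0                          // plain ASCII byte
    else if (Bytes[I] and $E0) = $C0 then
      Follow := 1                          // 2-byte sequence
    else if (Bytes[I] and $F0) = $E0 then
      Follow := 2                          // 3-byte sequence
    else if (Bytes[I] and $F8) = $F0 then
      Follow := 3                          // 4-byte sequence
    else
      Exit(False);                         // stray continuation or invalid lead
    Inc(I);
    while Follow > 0 do
    begin
      if (I >= Length(Bytes)) or ((Bytes[I] and $C0) <> $80) then
        Exit(False);
      Inc(I);
      Dec(Follow);
    end;
  end;
end;
{code}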
--
Remy Lebeau (TeamB)