TStringlist with special characters  
News Group: embarcadero.public.delphi.oodesign

I am using Delphi 7 and have a routine which takes a csv file with a series of records and imports them. This is done by loading it into a TStringList with MyStringList.LoadFromFile(csvfile) and then getting each line with line = MyStringList[i].

This has always worked fine, but I have now discovered that special characters are not picked up correctly. For example, Rue François Coppée comes out as Rue FranÃ§ois CoppÃ©e - the accented French characters are the problem.

Is there a simple way to solve this?

Thank you.

Date Posted: 15-Dec-2014, at 6:58 AM EST
From: Tim Stainton
 
Re: TStringlist with special characters  
Just wrote:

> I didn't know that. Thanks. :)

Yeah, I found that out the hard way in BCB6, where UTF8Encode() and UTF8Decode() 
use a manual UTF-8 implementation that does not handle multi-byte sequences 
correctly and thus can corrupt data that uses higher codepoints.  That implementation 
still exists in D7 with some small differences (maybe they were trying 
to fix it?). Eventually they ditched their implementation and switched 
to Microsoft's MultiByteToWideChar() and WideCharToMultiByte() functions 
instead (and later added iconv support for cross-platform).
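
For anyone staying on D7, the Win32 route mentioned above can be sketched roughly as follows. This is only an illustration, not the RTL's code: Utf8ToWide and Utf8CodePage are made-up names, and the sketch assumes a Windows target with the Windows unit in the uses clause.

{code}
const
  Utf8CodePage = 65001;  // numeric value of the Win32 CP_UTF8 code page

// Hypothetical helper: decode UTF-8 bytes to a WideString via the Win32 API
function Utf8ToWide(const S: AnsiString): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call: ask how many UTF-16 code units the result needs
  Len := MultiByteToWideChar(Utf8CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len <= 0 then
    Exit;  // conversion failed, return an empty string
  SetLength(Result, Len);
  // Second call: convert into the allocated buffer
  MultiByteToWideChar(Utf8CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len);
end;
{code}

It would be called as Utf8ToWide(MyStringList[i]) in place of UTF8Decode(MyStringList[i]).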
 
--
Remy Lebeau (TeamB)

Date Posted: 16-Dec-2014, at 10:08 AM EST
From: Remy Lebeau (TeamB)
 
Re: TStringlist with special characters  
On Mon, 15 Dec 2014 11:37:55 -0800, Remy Lebeau wrote:
> 
>  Also, UTF8Decode() is broken in D7, it doesn't implement UTF-8 correctly 
> (that was fixed in a later version).

I didn't know that. Thanks. :)

Date Posted: 16-Dec-2014, at 12:49 AM EST
From: Just JJ
 
Re: TStringlist with special characters [Edit]  
> Just JJ wrote:
> On Mon, 15 Dec 2014 06:58:52 -0800, Tim Stainton wrote:
> > I am using Delphi 7 and have a routine which takes a csv file with a
> > series of records and imports them. This is done by loading it into a
> > TStringList with MyStringList.LoadFromFile(csvfile) and then getting each
> > line with line = MyStringList[i]. 
> > 
> > This has always worked fine but I have now discovered that special
> > characters are not picked up correctly. For example, Rue François Coppée
> > comes out as Rue FranÃ§ois CoppÃ©e - the accented French characters are
> > the problem. 
> > 
> > Is there a simple way to solve this?
> > 
> > Thank you.
> 
> The text is UTF-8 encoded, so use UTF8Decode(). e.g.:
> 
> line:= UTF8Decode(MyStringList[i])
> 
> i.e. UTF8Decode() will decode the UTF-8 encoded text into Unicode.
> 
> If the "line" variable type is String, the Unicode text will be converted to
> ANSI using the current application's active code page.
> 
> If you still see incorrect characters, it means that some Unicode characters
> don't have an equivalent ANSI character in the current active code page. In
> this case, you'll need to use WideString for the "line" variable type and
> never treat/mix/compare it with String values. Otherwise, the Unicode string
> will be converted to ANSI and you'll lose some characters in the process.
> Keep in mind that the Delphi 7 VCL is ANSI only.
> 
> For Delphi 7, there's the Tnt Unicode Controls component to make an (almost
> fully) Unicode VCL application. But, if you can, I'd suggest using Delphi
> 2009 or newer for a dedicated Unicode application.

Thanks for this. It now works fine, at least with the language characters that I have tried it with. I will stick to D7 for now but will have a look at the Tnt Unicode Controls component as well.

I also note Remy's comments. Thanks.

Edited by: Tim Stainton on Dec 15, 2014 8:02 PM

Date Posted: 15-Dec-2014, at 8:04 PM EST
From: Tim Stainton
 
Re: TStringlist with special characters  
Just wrote:

> The text is UTF-8 encoded, so use UTF8Decode(). e.g.:
> 
> line:= UTF8Decode(MyStringList[i])
> 
> i.e. UTF8Decode() will decode the UTF-8 encoded text into Unicode.

Just keep in mind that UTF8Decode() returns a WideString, which is not as 
efficient as AnsiString (or UnicodeString in D2009+).  The rest of his code 
is using AnsiString since it is D7, so the overhead of converting individual 
lines from AnsiString to WideString and back to AnsiString might not be desirable. 
Also, UTF8Decode() is broken in D7: it doesn't implement UTF-8 correctly 
(that was fixed in a later version).

--
Remy Lebeau (TeamB)

Date Posted: 15-Dec-2014, at 11:37 AM EST
From: Remy Lebeau (TeamB)
 
Re: TStringlist with special characters  
On Mon, 15 Dec 2014 06:58:52 -0800, Tim Stainton wrote:
> I am using Delphi 7 and have a routine which takes a csv file with a
> series of records and imports them. This is done by loading it into a
> TStringList with MyStringList.LoadFromFile(csvfile) and then getting each
> line with line = MyStringList[i]. 
> 
> This has always worked fine but I have now discovered that special
> characters are not picked up correctly. For example, Rue François Coppée
> comes out as Rue FranÃ§ois CoppÃ©e - the accented French characters are
> the problem. 
> 
> Is there a simple way to solve this?
> 
> Thank you.

The text is UTF-8 encoded, so use UTF8Decode(). e.g.:

line:= UTF8Decode(MyStringList[i])

i.e. UTF8Decode() will decode the UTF-8 encoded text into Unicode.

If the "line" variable type is String, the Unicode text will be converted to
ANSI using the current application's active code page.

If you still see incorrect characters, it means that some Unicode characters
don't have an equivalent ANSI character in the current active code page. In
this case, you'll need to use WideString for the "line" variable type and
never treat/mix/compare it with String values. Otherwise, the Unicode string
will be converted to ANSI and you'll lose some characters in the process.
Keep in mind that the Delphi 7 VCL is ANSI only.

For Delphi 7, there's the Tnt Unicode Controls component to make an (almost
fully) Unicode VCL application. But, if you can, I'd suggest using Delphi
2009 or newer for a dedicated Unicode application.
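
A minimal sketch of the WideString variant described above, assuming MyStringList has already been loaded with LoadFromFile() and that the code consuming each line accepts WideString (both are assumptions, not something shown in the original question):

{code}
var
  i: Integer;
  line: WideString;  // stays Unicode; a String here would force an ANSI conversion
begin
  for i := 0 to MyStringList.Count - 1 do
  begin
    // Each list item still holds raw UTF-8 bytes, so decode it per line
    line := UTF8Decode(MyStringList[i]);
    // Pass line only to routines that take WideString from here on
  end;
end;
{code}

Assigning the result to a String variable instead would compile, but the implicit WideString-to-ANSI conversion is where characters outside the active code page get lost.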

Date Posted: 15-Dec-2014, at 10:39 AM EST
From: Just JJ
 
Re: TStringlist with special characters  
Gerrit wrote:

> Relatively simple, yes. It seems the csv uses UTF-8 encoding. You'll
> need to read that as a UTF-8 string into memory and then convert that to
> an Ansi string using a (the default) code page.  Note that this may be a
> lossy conversion: any UTF-8 encoded character that does not have a mapping
> in the ansi code page is lost. function Utf8ToAnsi() is what you need here.
>
> When reading the file, you may also need to skip the UTF8 BOM if there's
> one.

For example:

{code}
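// Needs Classes (TMemoryStream) and SysUtils (CompareMem) in the uses clause.
const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  utf8: UTF8String;
  ms: TMemoryStream;
  ptr: PAnsiChar;
  len: Integer;
begin
  ms := TMemoryStream.Create;
  try
    // Read the raw bytes of the file without any conversion
    ms.LoadFromFile(csvfile);
    ms.Position := 0;
    ptr := PAnsiChar(ms.Memory);
    len := ms.Size;
    // Skip the UTF-8 BOM if the file starts with one
    if (len >= 3) and CompareMem(ptr, @Utf8Bom[0], 3) then
    begin
      Inc(ptr, 3);
      Dec(len, 3);
    end;
    // Copy the remaining bytes into a UTF8String
    SetString(utf8, ptr, len);
  finally
    ms.Free;
  end;
  // Convert UTF-8 to ANSI using the default code page
  MyStringList.Text := Utf8ToAnsi(utf8);
end;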
{code}

And of course, if you ever do upgrade to D2009 or later, TStringList.LoadFromFile() 
can handle all of those details for you via the TEncoding class:

{code}
MyStringList.LoadFromFile(csvfile, TEncoding.UTF8);
{code}

--
Remy Lebeau (TeamB)

Date Posted: 15-Dec-2014, at 10:17 AM EST
From: Remy Lebeau (TeamB)
 
Re: TStringlist with special characters  
Hi,


> I am using Delphi 7 and have a routine which takes a csv file with a series of records and imports them. This is done by loading it into a TStringList with MyStringList.LoadFromFile(csvfile) and then getting each line with line = MyStringList[i].
> This has always worked fine but I have now discovered that special characters are not picked up correctly. For example, Rue François Coppée comes out as Rue FranÃ§ois CoppÃ©e - the accented French characters are the problem.
> Is there a simple way to solve this?

Relatively simple, yes. It seems the csv uses UTF-8 encoding. You'll need to
read that as a UTF-8 string into memory, and then convert that to an Ansi string using a (the default)
code page. Note that this may be a lossy conversion: any UTF-8 encoded character
that does not have a mapping in the ansi code page is lost.
function Utf8ToAnsi() is what you need here.

When reading the file, you may also need to skip the UTF8 BOM if there's one.

Similarly, when you save the file, you may need to encode the ansi string to UTF-8 (and add a BOM).

I won't be posting the complete details, but if you're not able to google for the ansi to utf8 
and back routines, drop me an email
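
For the save direction described above, a minimal sketch might look like this. SaveAsUtf8 is a hypothetical helper name, and the sketch assumes D7 with Classes and SysUtils in the uses clause and a list that holds ANSI text:

{code}
// Hypothetical helper: write an ANSI string list as UTF-8 with a BOM
procedure SaveAsUtf8(List: TStrings; const FileName: string);
const
  Utf8Bom: array[0..2] of Byte = ($EF, $BB, $BF);
var
  utf8: UTF8String;
  fs: TFileStream;
begin
  // Encode the ANSI text (default code page) as UTF-8
  utf8 := AnsiToUtf8(List.Text);
  fs := TFileStream.Create(FileName, fmCreate);
  try
    // Write the BOM first, then the encoded bytes
    fs.WriteBuffer(Utf8Bom, SizeOf(Utf8Bom));
    if utf8 <> '' then
      fs.WriteBuffer(utf8[1], Length(utf8));
  finally
    fs.Free;
  end;
end;
{code}

Whether to write the BOM at all depends on what the consumer of the file expects, so treat that part as optional.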




Gerrit Beuze
ModelMaker Tools

Date Posted: 15-Dec-2014, at 9:11 AM EST
From: Gerrit Beuze
 
Re: TStringlist with special characters  
> Tim Stainton wrote:
> 
> Is there a simple way to solve this?

I think you should move to D2009 or higher to solve that.

Date Posted: 15-Dec-2014, at 8:57 AM EST
From: Rich