TIdMessage unsupported charsets (GB2312, ISO-2022-JP, some o  
News Group: embarcadero.public.cppbuilder.internet.socket

Any chance of supporting GB2312 encoding? It is a widespread encoding covering 99.75% of the Chinese characters in everyday use, and it is still widely used in China, yet it is unsupported by Indy?

Additionally, decoding an ISO-2022-JP message fails with the exception EIdException, message 'Invalid codepage (50220)'.

Obviously it translates ISO-2022-JP into codepage 50220 (as listed on this page - http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx ), but the decoding then fails with the above exception.
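For illustration, a minimal C++ sketch of the kind of charset-name-to-codepage lookup being described. The function `charsetToCodePage` and its table are hypothetical (this is not Indy's code); the codepage numbers are the ones listed on the Microsoft Code Page Identifiers page linked above. A successful lookup only yields a number: whether Windows can actually convert using that codepage is a separate question, and that is where an 'Invalid codepage' error would come from.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical lookup: MIME charset name -> Windows codepage identifier.
// Returns std::nullopt for charsets not in this (deliberately small) table.
std::optional<unsigned> charsetToCodePage(std::string name) {
    // Charset names are case-insensitive, so normalize to lowercase first.
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    static const std::unordered_map<std::string, unsigned> table = {
        {"gb2312", 936},          // Chinese Simplified (GB2312)
        {"iso-2022-jp", 50220},   // Japanese (JIS)
        {"koi8-r", 20866},        // Russian (KOI8-R)
        {"koi8-u", 21866},        // Ukrainian (KOI8-U)
        {"iso-8859-8", 28598},    // Hebrew (ISO-Visual)
        {"iso-8859-8-i", 38598},  // Hebrew (ISO-Logical)
        {"windows-1255", 1255},   // Hebrew (Windows)
    };

    auto it = table.find(name);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```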

There are also issues with certain Hebrew-encoded messages, as well as with Russian (KOI8-R) and Ukrainian (KOI8-U) encodings.

Is there a way to support such encodings to be properly translated to Unicode like the rest of the pack?

I can provide the test messages on which the decoder fails, if needed.

The above was tested with the latest Indy (5231).

Date Posted: 26-Dec-2014, at 1:24 PM EST
From: John May
 
Re: TIdMessage unsupported charsets (GB2312, ISO-2022-JP, so  
John wrote:

> Is it possible to call some system (Windows) API to do the translation
> (similar to "iconv") if Indy is running on Windows (e.g. MultiByteToWideChar)?
> Or maybe use lib-iconv?

Indy already does exactly that.  As I said earlier, the error you were seeing 
('Invalid codepage') was because Indy *did* attempt to translate the data using 
the Win32 API, but failed because the relevant codepage was not installed. 
Indy can translate charset data using any codepage that is installed in Windows, 
and it provides a few hooks if you need to support any uninstalled charsets/codepages.

> Or, does D2009+ have this native (eg. provide AnsiString with specific
> encoding and then copy it to UnicodeString leaving the Delphi/C++ to
> do the conversion themselves)?

The RTL's native AnsiString charset handling is based on installed codepages 
in Windows, and on iconv on POSIX systems.
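A minimal sketch of the POSIX side, assuming a C library whose iconv supports the requested charset (glibc and musl typically both provide KOI8-R). `decodeToUtf8` is a hypothetical helper, not an RTL or Indy function. When `iconv_open` has no converter for a charset it fails (with `errno` set to `EINVAL`), which is roughly the POSIX counterpart of the Windows 'Invalid codepage' failure.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode `bytes` encoded in `charset` into UTF-8 via POSIX iconv.
std::string decodeToUtf8(const std::string& bytes, const char* charset) {
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1) {
        // No converter available for this charset on this system.
        throw std::runtime_error(std::string("unsupported charset: ") + charset);
    }

    std::string out;
    char buf[256];
    char* in = const_cast<char*>(bytes.data());
    size_t inLeft = bytes.size();

    while (inLeft > 0) {
        char* outPtr = buf;
        size_t outLeft = sizeof buf;
        size_t rc = iconv(cd, &in, &inLeft, &outPtr, &outLeft);
        out.append(buf, static_cast<size_t>(outPtr - buf));
        // E2BIG only means the output buffer filled up; loop and continue.
        if (rc == (size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("invalid or incomplete input sequence");
        }
    }

    // Flush any pending shift state (relevant for stateful charsets like ISO-2022-JP).
    char* outPtr = buf;
    size_t outLeft = sizeof buf;
    iconv(cd, nullptr, nullptr, &outPtr, &outLeft);
    out.append(buf, static_cast<size_t>(outPtr - buf));

    iconv_close(cd);
    return out;
}
```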

> Don't they already provide the built-in conversion of the specific encodings
> into UnicodeString

Yes.  But do keep in mind that starting with the mobile compilers, all of 
that functionality is *disabled*, because AnsiString (and AnsiString(N), which 
includes RawByteString and UTF8String) is being phased out.  Who is to say 
that they won't eventually phase it out of the desktop compilers as well?

> and have already built-in conversion tables?

No, they do not.

--
Remy Lebeau (TeamB)

Date Posted: 26-Dec-2014, at 7:05 PM EST
From: Remy Lebeau (TeamB)
 
Re: TIdMessage unsupported charsets (GB2312, ISO-2022-JP, so  
> Indy does not implement any charsets natively other than ASCII and UTFs. 
> Even Indy's native ISO-2022-JP implementation is incomplete.  Indy primarily 
> relies on the OS to provide charset handling for 99% of Indy's charset work.

I am not really asking for native decoding or built-in decoding tables. Those are probably not needed.

Is it possible to call some system (Windows) API to do the translation (similar to "iconv") if Indy is running on Windows (e.g. MultiByteToWideChar)? Or maybe use lib-iconv?

Or, does D2009+ handle this natively (e.g. provide an AnsiString with a specific encoding and then copy it to a UnicodeString, leaving Delphi/C++ to do the conversion itself)? Don't they already provide built-in conversion of specific encodings into UnicodeString, with built-in conversion tables?

Date Posted: 26-Dec-2014, at 5:15 PM EST
From: John May
 
Re: TIdMessage unsupported charsets (GB2312, ISO-2022-JP, so  
John wrote:

> Any chance of supporting GB2312 encoding?

Natively in Indy itself?  Hmm, maybe...

> It is widespread encoding supporting  99.75% Chinese characters, still
> widely used in China but yet, unsupported by Indy?

Indy does not implement any charsets natively other than ASCII and UTFs. 
Even Indy's native ISO-2022-JP implementation is incomplete.  Indy primarily 
relies on the OS to provide charset handling for 99% of Indy's charset work.

> Additionally, ISO-2022-JP message fails with the exception:
> EIdException with message 'Invalid codepage (50220)'.
>
> Obviously it translates ISO-2022-JP into 50220 codepage... but the decoding
> fails with above exception.

That error means codepage 50220 is not installed/available.  Have you tried 
installing it?

> There are also some issues with certain Hebrew encoding messages. As
> well as Russian (KOI8-R) and Ukrainian (KOI8-U) encodings.

Indy depends on OS codepages to handle all of those charsets.  That said, those 
particular charsets are much easier to implement manually than GB2312 and 
ISO-2022-JP are, so native support for them might be added sooner rather than later.
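To show why single-byte charsets like KOI8-R/KOI8-U are comparatively easy, here is a minimal table-driven C++ decoder. It is a hypothetical sketch and deliberately partial: it handles ASCII, the 64 Cyrillic letters at 0xC0-0xFF, ё/Ё at 0xA3/0xB3, and the eight Ukrainian letters KOI8-U adds (per RFC 2319); every other high byte (KOI8's box-drawing and symbol characters) becomes U+FFFD. Each byte maps to exactly one code point with no state, unlike ISO-2022-JP's escape sequences or GB2312's two-byte characters.

```cpp
#include <string>

enum class Koi8Variant { R, U };

// Offsets from U+0430 (lowercase) / U+0410 (uppercase) for the 32 KOI8
// letter positions, in KOI8 order:
// ю а б ц д е ф г х и й к л м н о п я р с т у ж в ь ы з ш э щ ч ъ
static const unsigned char kKoi8LetterOffset[32] = {
    30, 0, 1, 22, 4, 5, 20, 3, 21, 8, 9, 10, 11, 12, 13, 14,
    15, 31, 16, 17, 18, 19, 6, 2, 28, 27, 7, 24, 29, 25, 23, 26};

// Hypothetical partial decoder: KOI8-R / KOI8-U bytes -> Unicode code points.
std::u32string decodeKoi8(const std::string& bytes, Koi8Variant variant) {
    std::u32string out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        const unsigned char b = static_cast<unsigned char>(c);
        char32_t cp = U'\uFFFD';  // default for bytes this sketch does not map
        if (b < 0x80) {
            cp = b;  // the ASCII half is identical
        } else if (b >= 0xE0) {
            cp = 0x0410 + kKoi8LetterOffset[b - 0xE0];  // uppercase letters
        } else if (b >= 0xC0) {
            cp = 0x0430 + kKoi8LetterOffset[b - 0xC0];  // lowercase letters
        } else if (b == 0xA3) {
            cp = 0x0451;  // ё
        } else if (b == 0xB3) {
            cp = 0x0401;  // Ё
        } else if (variant == Koi8Variant::U) {
            switch (b) {  // letters KOI8-U adds on top of KOI8-R (RFC 2319)
                case 0xA4: cp = 0x0454; break;  // є
                case 0xA6: cp = 0x0456; break;  // і
                case 0xA7: cp = 0x0457; break;  // ї
                case 0xAD: cp = 0x0491; break;  // ґ
                case 0xB4: cp = 0x0404; break;  // Є
                case 0xB6: cp = 0x0406; break;  // І
                case 0xB7: cp = 0x0407; break;  // Ї
                case 0xBD: cp = 0x0490; break;  // Ґ
                default: break;
            }
        }
        out.push_back(cp);
    }
    return out;
}
```

A full implementation would replace the U+FFFD fallback with the complete 128-entry table for the 0x80-0xFF range.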

> Is there a way to support such encodings to be properly translated to
> Unicode like the rest of the pack?

You can provide a handler to the global IdGlobalProtocol.GIdEncodingNeeded 
callback, and have it return a custom IIdTextEncoding implementation for 
any charset you implement yourself.
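The general shape of such a hook can be sketched in C++. Everything here (`TextDecoder`, `EncodingNeededFn`, `g_encodingNeeded`, `findDecoder`, `installCustomEncodings`) is hypothetical and only mirrors the idea; Indy's real `GIdEncodingNeeded` and `IIdTextEncoding` are Delphi declarations with their own signatures. The library tries its built-in charsets first and only then asks the user-supplied callback, returning null if nobody can handle the charset.

```cpp
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a text-encoding interface (compare IIdTextEncoding):
// decodes bytes in one charset into Unicode code points.
struct TextDecoder {
    virtual ~TextDecoder() = default;
    virtual std::u32string decode(const std::string& bytes) const = 0;
};

// Hypothetical global hook, loosely modelled on GIdEncodingNeeded: asked for a
// decoder whenever the library has no built-in support for a charset.
using EncodingNeededFn =
    std::function<std::shared_ptr<TextDecoder>(const std::string& charset)>;
EncodingNeededFn g_encodingNeeded;

// Built-in decoder: US-ASCII; bytes above 0x7F become U+FFFD.
struct AsciiDecoder : TextDecoder {
    std::u32string decode(const std::string& bytes) const override {
        std::u32string out;
        for (char c : bytes) {
            unsigned char b = static_cast<unsigned char>(c);
            out.push_back(b < 0x80 ? char32_t(b) : U'\uFFFD');
        }
        return out;
    }
};

// Library-side lookup: built-in charsets first, then the user hook.
std::shared_ptr<TextDecoder> findDecoder(const std::string& charset) {
    if (charset == "us-ascii") return std::make_shared<AsciiDecoder>();
    if (g_encodingNeeded) return g_encodingNeeded(charset);
    return nullptr;  // nobody can handle this charset
}

// User-side: a custom ISO-8859-1 decoder (each byte is its own code point).
struct Latin1Decoder : TextDecoder {
    std::u32string decode(const std::string& bytes) const override {
        std::u32string out;
        for (char c : bytes) out.push_back(static_cast<unsigned char>(c));
        return out;
    }
};

// User-side: register the hook once at startup.
void installCustomEncodings() {
    g_encodingNeeded = [](const std::string& charset) -> std::shared_ptr<TextDecoder> {
        if (charset == "iso-8859-1") return std::make_shared<Latin1Decoder>();
        return nullptr;  // still unsupported
    };
}
```

Keeping the hook as a single global callback means application code can add charsets without the library knowing about them in advance, at the cost of one process-wide registration point.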

--
Remy Lebeau (TeamB)

Date Posted: 26-Dec-2014, at 2:04 PM EST
From: Remy Lebeau (TeamB)