a message causing Indy message decoder to enter infinite lo
News Group: embarcadero.public.cppbuilder.internet.socket

I have a message that is obviously malformed and cannot be decoded properly (although Outlook decodes even the SUBJECT, FROM and TO lines OK), but in Indy it causes an infinite loop. I'm OK with the message not being decoded well because it is malformed (even though the subject and from lines could probably be decoded well enough), but it should not send the decoder into an infinite loop. I have Indy 5231, which is not the latest, but this is possibly a problem that has not been addressed yet. Here goes:

{code}
Subject: abc, =?UTF-8?B?TmVl?= =?UTF-8?B?ZCBDYQ==?= =?UTF-8?B?c2ggUXU=?= =?UTF-8?B?aWNrIEdl?= =?UTF-8?B?dCB1cHRvIFVT?= =?UTF-8?B?RDEw?= =?UTF-8?B?MDAgTm8=?= =?UTF-8?B?dw==?=
From: =?UTF-8?B?TmU=?= =?UTF-8?B?eHRQ?= =?UTF-8?B?YXlk?= =?UTF-8?B?YXlBZHY=?= =?UTF-8?B?YW5j?= =?UTF-8?B?ZQ==?= 
To: "549b69b9a719d"abc@def.com.
Content-Type: text/html; charset= CP1026

msg content
{code}
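
For reference, the Subject and From headers above are made of RFC 2047 encoded-words (Base64, UTF-8). Below is a minimal sketch of decoding them by hand. It is written in Python rather than with Indy, the helper name `decode_b_words` is made up for this illustration, and it handles only the `B` encoding:

```python
import base64
import re

# One RFC 2047 encoded-word that uses the "B" (Base64) encoding.
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def decode_b_words(value: str) -> str:
    """Decode Base64 encoded-words in a header value (hypothetical helper)."""
    # Whitespace between two adjacent encoded-words is not part of the text.
    value = re.sub(r"(\?=)\s+(=\?)", r"\1\2", value)
    return ENCODED_WORD.sub(
        lambda m: base64.b64decode(m.group(2)).decode(m.group(1)), value
    )


subject = ("abc, =?UTF-8?B?TmVl?= =?UTF-8?B?ZCBDYQ==?= =?UTF-8?B?c2ggUXU=?= "
           "=?UTF-8?B?aWNrIEdl?= =?UTF-8?B?dCB1cHRvIFVT?= =?UTF-8?B?RDEw?= "
           "=?UTF-8?B?MDAgTm8=?= =?UTF-8?B?dw==?=")
sender = ("=?UTF-8?B?TmU=?= =?UTF-8?B?eHRQ?= =?UTF-8?B?YXlk?= "
          "=?UTF-8?B?YXlBZHY=?= =?UTF-8?B?YW5j?= =?UTF-8?B?ZQ==?=")

print(decode_b_words(subject))
print(decode_b_words(sender))
```

The two headers come out as "abc, Need Cash Quick Get upto USD1000 Now" and "NextPaydayAdvance", so the header lines themselves are decodable.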

I have tried some online decoders, which also decode the subject and from lines fine. As for the TO: line, Outlook decodes it as
{code}
"549b69b9a719d" as sender name and
"abc@def.com." as email address
{code}

And Outlook Express decodes the same as:
{code}
"549b69b9a719d" as sender name and
"abc@def.com" as email address (no last dot)
{code}

What Indy does I cannot say, because it loops forever.

*An update:*

After testing this more, I found that Indy does eventually manage to decode the message. However, it takes a huge amount of time to do so (a few minutes), and during that time the memory usage of the process *increases*! Why exactly I cannot tell; the two programs above do it instantly. After it finally decodes the message (in about 5 or 10 minutes on my PC), memory usage goes back to what it was before, so I do not think it is a memory leak. The subject, from and to lines look OK. Sometimes it may end in an "out of memory" error, though, if left for a long time.

Date Posted: 1-Jan-2015, at 8:07 AM EST
From: John May
 
Re: a message causing Indy message decoder to enter infinit
> due to the bugs I mentioned above.  I have to figure out how to get around 
> that, without breaking Indy in the process.

OK, thank you for your work on this. Please read your private messages.

Date Posted: 5-Jan-2015, at 10:57 AM EST
From: John May
 
Re: a message causing Indy message decoder to enter infinit
John wrote:

> ReadLn() is supposed to read a line right?

It reads until the specified terminator has been read, where LF is the default 
terminator, thus ReadLn() effectively reads a line by default, yes.

> It looks for $0a as line end.

That is the intent.  However, ReadLn() encodes the specified terminator to 
a byte array using the specified byte encoding so it matches the rest of 
the data belonging to the line being read.  ReadLn() searches the raw data 
of the IOHandler.InputBuffer looking for the encoded terminator.  ReadLn() 
does not give any other special considerations to the encoded terminator, 
so it does not enforce the terminator being in a specific encoding, such 
as LF always being encoded as $0A.  In the majority of charsets you are likely 
to encounter in the wild, a LF will always encode to byte $0A as most charsets 
are ASCII compatible for characters #00-#127.  But it turns out that cp1026 
(and probably other EBCDIC-based charsets) does not do that.  Which is why 
ReadLn() is getting stuck.
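
The search described above can be sketched outside of Indy. This is a rough Python model of the behaviour, not Indy's actual code; `read_line` is a hypothetical helper, and returning `None` stands in for "keep waiting for more data":

```python
def read_line(buffer: bytes, encoding: str, terminator: str = "\n"):
    """Model of a ReadLn-style search (hypothetical, not Indy's code).

    The terminator is encoded with the same encoding as the line data and
    then searched for in the raw buffer. Returns (line, rest), or None when
    the encoded terminator is not in the buffer.
    """
    encoded_terminator = terminator.encode(encoding)
    index = buffer.find(encoded_terminator)
    if index < 0:
        return None  # a blocking reader would wait here for more bytes
    line = buffer[:index].decode(encoding)
    rest = buffer[index + len(encoded_terminator):]
    return line.rstrip("\r"), rest  # drop the CR of a CRLF pair


raw = b"msg content\r\n"

print(read_line(raw, "ascii"))   # LF encodes to byte 0x0A, which is present
print(read_line(raw, "cp1026"))  # LF encodes to a different byte: no match
```

With an ASCII-compatible encoding the line is found; with cp1026 the encoded terminator never appears in the buffer, which is the stuck case.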

> How about adding that it gives up and delivers a single line if it reaches
> end of file?

ReadLn() has no concept of files or streams, so it can't detect end-of-file.
That is not its job anyway.  That is the job of lower-level code to handle.
That being said, in the case of loading a file/stream into a TIdMessage,
Indy actually does detect end-of-file and stop reading.  The problem with
that in this particular situation is that cp1026 exposed some bugs I found
in the internal parsing:

1) TIdIOHandlerStreamMsg.Readable() returns False instead of True when EOF 
is reached, causing ReadLn() to return with ReadLnTimedOut=True instead of 
detecting a "disconnect" condition so the caller knows that no more data 
can be read.

2) TIdMessageClient (which TIdMessage uses internally) does not handle the 
case where ReadLn() "times out", so it just keeps reading expecting more 
data, thus gets stuck in a timeout loop.  A lot of code (and not just Indy 
code, but end user code as well) that uses ReadLn() tends not to be timeout-aware, 
as that requires checking the TIdIOHandler.ReadLnTimedOut property when ReadLn() 
exits.  There is a feature request in Indy's issue trackers to add a new 
ReadLnTimeoutAction property to TIdIOHandler so ReadLn() can raise an exception 
when a timeout occurs, which is more in line with how Indy normally handles 
errors.

3) There is also the issue that TIdMessageClient is expecting to read the 
end-of-email terminator, which TIdIOHandlerStreamMsg synthesizes when EOF 
is reached, but cp1026 is messing up that terminator before TIdMessageClient 
sees it, so TIdMessageClient does not know that EOF has been reached and 
keeps reading.  That goes back to the timeout bug above.
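
To illustrate what "timeout-aware" means in point 2, here is a small Python model. `FakeIOHandler`, `read_ln_timed_out` and `read_until_dot` are hypothetical names invented for this sketch; they only mimic the idea of checking a timed-out flag after each read and do not reproduce Indy's classes:

```python
class FakeIOHandler:
    """Hypothetical stand-in for a line reader; not Indy's TIdIOHandler."""

    def __init__(self, data: bytes):
        self._buffer = data
        self.read_ln_timed_out = False

    def read_ln(self) -> str:
        index = self._buffer.find(b"\n")
        if index < 0:
            # No terminator available: report a timeout and an empty line.
            self.read_ln_timed_out = True
            return ""
        self.read_ln_timed_out = False
        line = self._buffer[:index]
        self._buffer = self._buffer[index + 1:]
        return line.decode("ascii").rstrip("\r")


def read_until_dot(io: FakeIOHandler, max_timeouts: int = 3) -> list:
    """Read lines up to a lone '.', giving up after repeated timeouts."""
    lines = []
    timeouts = 0
    while True:
        line = io.read_ln()
        if io.read_ln_timed_out:
            timeouts += 1
            if timeouts >= max_timeouts:
                raise TimeoutError("terminator never arrived")
            continue
        timeouts = 0
        if line == ".":
            return lines
        lines.append(line)


print(read_until_dot(FakeIOHandler(b"a\r\nb\r\n.\r\n")))
```

A loop that ignores the flag would treat every timed-out empty line as data and never exit when the terminator is missing; the version above raises instead.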

> The message would still be decoded incorrectly but it would not get into
> infinite loop, which is not a preferable situation for any software.

Actually, I think it would still get stuck, and I explained why earlier.
It is not just the line break detection at fault, there are other factors
at play.

> The message does not decoded properly in email clients as well, that is
> not the problem but they do not freeze like Indy.

They are likely not affected by cp1026 messing up their line break and EOF 
detection logic.

> They probably look until the end of file if nothing is found they give
> up and deliver what they can. So why not looking for end of file (or file
> buffer) and if ReadLn() finds it, it stops there?

That is not the problem.  The problem is that Indy *is* already detecting 
EOF and stops reading, but the email parser doesn't know that EOF was reached 
due to the bugs I mentioned above.  I have to figure out how to get around 
that, without breaking Indy in the process.

--
Remy Lebeau (TeamB)

Date Posted: 5-Jan-2015, at 9:54 AM EST
From: Remy Lebeau (TeamB)
 
Re: a message causing Indy message decoder to enter infinit
> Remy Lebeau (TeamB) wrote:
> #$0A when converting bytes to characters).  Thus, ReadLn() gets stuck in 
> an endless loop waiting for byte $25, which it will never see.

ReadLn() is supposed to read a line, right? It looks for $0a as the line end. How about making it give up and deliver a single line if it reaches end of file? The message would still be decoded incorrectly, but it would not get into an infinite loop, which is not a desirable situation for any software. The message is not decoded properly in email clients either; that is not the problem, but they do not freeze like Indy. They probably look until the end of file, and if nothing is found they give up and deliver what they can. So why not look for end of file (or file buffer), and if ReadLn() finds it, stop there? In other words, have 2 terminators for ReadLn: $0a or end of file, whichever comes first.

Date Posted: 3-Jan-2015, at 6:14 AM EST
From: John May
 
Re: a message causing Indy message decoder to enter infinit
Remy wrote:

> The main problem is that Indy is using cp1026 to process syntax
> elements that should not be processed using the email's charset.
> Changing that would require a rewrite of Indy's parser, and that is
> not going to happen in Indy 10.  Maybe this will be addressed in Indy
> 11.

I have opened tickets in Indy's issue trackers for this problem.

--
Remy Lebeau (TeamB)

Date Posted: 2-Jan-2015, at 3:13 PM EST
From: Remy Lebeau (TeamB)
 
Re: a message causing Indy message decoder to enter infinit
John wrote:

> Content-Type: text/html; charset= CP1026

Charset cp1026 is the cause of your problem.  As soon as I remove that, the 
message decodes just fine.

After Indy has read the headers and attempts to read the 'msg content' line, 
it calls TIdIOHandler.ReadLn() with the ATerminator parameter set to character 
#10 (LF).  ReadLn() is being passed an IIdTextEncoding that represents codepage 
1026, which is converting the LF into byte $25 instead of the expected byte 
$0A (it also converts byte $0A into character #$8E instead of the expected 
#$0A when converting bytes to characters).  Thus, ReadLn() gets stuck in 
an endless loop waiting for byte $25, which it will never see.

Even if I were to change Indy to force ReadLn() to look for $0A instead of 
$25 when ATerminator=LF, TIdMessage would still not decode the email correctly 
when using cp1026.  When the email terminator '.' is read (Indy synthesizes 
it because your email is missing it), cp1026 converts it to character #6 
instead of '.', so it does not match the terminator that TIdMessage is expecting, 
which will cause more blockage issues.
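
The three conversions mentioned above can be checked independently of Indy. The snippet below uses Python only because its standard library ships a cp1026 codec; it assumes that codec's table agrees with the codepage conversion Indy used on Windows:

```python
CHARSET = "cp1026"

# Character -> byte: what the LF terminator becomes when encoded.
lf_as_bytes = "\n".encode(CHARSET)

# Byte -> character: what the bytes actually present in the email become.
lf_byte_as_char = b"\x0a".decode(CHARSET)   # the real line-feed byte
dot_byte_as_char = b".".decode(CHARSET)     # the '.' terminator byte ($2E)

print("LF encodes to byte  $%02X" % lf_as_bytes[0])
print("byte $0A decodes to #$%02X" % ord(lf_byte_as_char))
print("byte $2E decodes to #$%02X" % ord(dot_byte_as_char))
```

If the tables agree, this reports $25, #$8E and #$06, the same values as described above: neither the line terminator nor the '.' terminator survives a round trip through cp1026.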

The main problem is that Indy is using cp1026 to process syntax elements 
that should not be processed using the email's charset.  Changing that would 
require a rewrite of Indy's parser, and that is not going to happen in Indy 
10.  Maybe this will be addressed in Indy 11.
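
As an illustration of that separation (and not a description of how Indy 10 or a future Indy works), here is a Python sketch in which line breaks and header syntax are handled as ASCII bytes and the declared charset is applied only to the body. `split_message` and `decode_body` are hypothetical helpers, and header folding is ignored for brevity:

```python
def split_message(raw: bytes):
    """Split headers from body using ASCII byte values only (hypothetical)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        key = name.decode("ascii", "replace").strip().lower()
        headers[key] = value.decode("ascii", "replace").strip()
    return headers, body


def decode_body(body: bytes, charset: str) -> str:
    """Apply the email's charset to the payload only, with a fallback."""
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        return body.decode("latin-1")


raw = (b"To: abc@def.com\r\n"
       b"Content-Type: text/html; charset= CP1026\r\n"
       b"\r\n"
       b"msg content\r\n")

headers, body = split_message(raw)
text = decode_body(body, "cp1026")
print(headers["content-type"])
print(repr(text))
```

The body text is still garbage, because the message is not really EBCDIC, but the parser finds the header/body boundary and terminates, since no syntax byte ever passes through cp1026.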

--
Remy Lebeau (TeamB)

Date Posted: 2-Jan-2015, at 2:02 PM EST
From: Remy Lebeau (TeamB)