A malformed message causing issue with the decoder
News Group: embarcadero.public.cppbuilder.internet.socket

I have another malformed message that causes an issue with the Indy decoder:

{code}
--KEJ3581-rC-4502470-254=_!
Content-Type: text/html; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

\CjxodG1sPgo8Y2VudGVyPgo8Ym9keT4KPCEtLSAyaW5kaXQgLSBpbiBnbHVtYSAtIHNhIGltcHJ1
bXV0IGNoaXB1bCB1bnVpIGNvbnNpbGllciAtLT48IS0tICwsQSwgbWkgc2UgcGFyZSBjYSBtaS1h
{code}

If I remove the "\" at the beginning of the MIME part, the message decodes correctly. "\" is not in the list of valid characters for base64.

Outlook Express, Outlook and other programs decode the message correctly, most likely because they ignore the invalid characters, whereas Indy likely takes them into account instead of ignoring them.

I think the problem is that people who send these messages test them in Outlook, see that they work, and conclude that they are well formed. The Indy decoder (unlike the encoder, which should be strict) is not tolerant enough of such malformed data and could be adjusted to be more compatible. I understand that Indy follows the standards, and that is fine for the encoder, but the decoder should have a certain degree of tolerance for malformed messages. I have seen evidence of others doing this: the MIME decoder in Ararat Synapse has the following lines:

{code}
//was 7bit before, but this is more compatible with RFC-ignorant outlook
Encoding := '8BIT'; 
{code}

This suggests that they default to 8bit encoding because it decodes more messages properly (the Indy default is a problem in this respect too). Many people test the messages generated by their web or PHP code against Outlook as a "reference" for being well formed, and because Outlook's own decoder tolerates such messages, decoders that respect the standards end up needing tweaks to be compatible with them.

Is it possible to strip all invalid data from the input (the backslash "\" character in this case) so that the base64 input becomes valid, as Outlook and other programs do, and so make the decoder more compatible? In other words, if the base64 decoder encounters an invalid character, could it just skip ahead as if that character were not there? That should make the decoder happy and should not break existing Indy behavior.

Date Posted: 19-Jan-2015, at 8:00 AM EST
From: John May
 
Re: A malformed message causing issue with the decoder
> Remy Lebeau (TeamB) wrote:
> Without seeing the complete email, I can only guess, but I suspect the '\' 
> is actually not part of the base64 data itself, but is part of a higher level 
> transfer encoding that was not decoded during reading.  With the '\' present, 
> the base64 data shown would be 153 characters, which is not an even multiple 
> of 4 that base64 requires, and it also makes the first line be 77 characters 
> long.  Both of those are against standard base64 practice.  With the '\' 
> removed, the first line is 76 characters long, and the base64 data is 152 
> characters.  Both of which are within standard base64 ranges.

Thanks for looking into this. I have sent you the full message by PM in case you need it. There is nothing extra in it; the "\" just appears out of place and should not be there.

Date Posted: 19-Jan-2015, at 3:01 PM EST
From: John May
 
Re: A malformed message causing issue with the decoder
John wrote:

{code}
\CjxodG1sPgo8Y2VudGVyPgo8Ym9keT4KPCEtLSAyaW5kaXQgLSBpbiBnbHVtYSAtIHNhIGltcHJ1
bXV0IGNoaXB1bCB1bnVpIGNvbnNpbGllciAtLT48IS0tICwsQSwgbWkgc2UgcGFyZSBjYSBtaS1h
{code}

Without seeing the complete email, I can only guess, but I suspect the '\' 
is actually not part of the base64 data itself, but is part of a higher level 
transfer encoding that was not decoded during reading.  With the '\' present, 
the base64 data shown would be 153 characters, which is not an even multiple 
of 4 that base64 requires, and it also makes the first line be 77 characters 
long.  Both of those are against standard base64 practice.  With the '\' 
removed, the first line is 76 characters long, and the base64 data is 152 
characters.  Both of which are within standard base64 ranges.
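
The arithmetic above can be checked mechanically. The following is a minimal sketch in standard C++, not Indy code: `Base64Shape`, `MeasureBase64` and `LooksLikeStrictBase64` are hypothetical names introduced only for illustration. It counts every character except CR and LF, tracks the longest line, and applies the two checks mentioned above (total length a multiple of 4, and lines of at most 76 characters, the limit RFC 2045 sets for base64 lines).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type: summary of how a base64 body is laid out.
struct Base64Shape {
    std::size_t totalChars = 0;   // every character except CR and LF
    std::size_t longestLine = 0;  // length of the longest line
};

// Walks the body once, treating CR and LF as line separators.
Base64Shape MeasureBase64(const std::string& body) {
    Base64Shape shape;
    std::size_t current = 0;
    for (char c : body) {
        if (c == '\r' || c == '\n') {
            current = 0;  // a new line starts after the separator
            continue;
        }
        ++current;
        ++shape.totalChars;
        shape.longestLine = std::max(shape.longestLine, current);
    }
    return shape;
}

// True when the length is a multiple of 4 and no line exceeds 76 characters.
bool LooksLikeStrictBase64(const Base64Shape& shape) {
    return shape.totalChars % 4 == 0 && shape.longestLine <= 76;
}
```

Applied to the two lines quoted above, this should report 153 characters and a 77-character first line with the "\" present, and 152 and 76 with it removed. The check only looks at shape; it does not test whether each character belongs to the base64 alphabet.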

> Outlook Express, Outlook and other programs correctly decode the
> message, most likely because they ignore the invalid characters and
> Indy likely takes these invalid characters into account instead of
> ignoring them.

Yes, it does.  So I re-read RFC 2045, which defines MIME's base64 algorithm. 
It specifically says that illegal characters not defined in the base64 alphabet 
must be ignored during decoding.  So be it, but that is easier said than done, 
since Indy's base64 decoder shares common logic with the 00E, UUE, and XXE 
decoders, and I would need to make sure not to break them while fixing the 
base64 decoder.
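
To illustrate the RFC 2045 behavior described above, here is a minimal sketch of a lenient decoder in standard C++. It is not Indy's implementation and does not reflect Indy's shared decoder logic; `Base64Value` and `DecodeBase64Lenient` are hypothetical names. It assumes the whole input is available as one string, treats the first "=" as the end of the data, and silently skips any character outside the base64 alphabet, including line breaks and a stray "\".

```cpp
#include <cassert>
#include <string>

// Maps a base64 alphabet character to its 6-bit value, or -1 if the
// character is not part of the alphabet.
int Base64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decodes base64 while ignoring characters outside the alphabet.
std::string DecodeBase64Lenient(const std::string& input) {
    std::string out;
    unsigned int buffer = 0;  // holds bits not yet emitted as a byte
    int bits = 0;             // number of valid bits in buffer
    for (char c : input) {
        if (c == '=') break;   // padding: treated here as end of data
        const int value = Base64Value(c);
        if (value < 0) continue;  // illegal character: skip it
        buffer = (buffer << 6) | static_cast<unsigned int>(value);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFFu));
            buffer &= (1u << bits) - 1u;  // keep only the leftover bits
        }
    }
    return out;
}
```

With this approach the stray character never reaches the bit buffer, so the remaining input decodes exactly as if the character had not been there.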

> Is it possible to strip out all non-valid data from the input (backslash "\"
> character in this case) to make the base64 input valid like Outlook and
> other programs do at the moment and make the decoder more compatible?

Indy uses streaming decoding, so the data may not be available up front when 
decoding begins, and thus cannot be stripped out ahead of time.  But I will 
look into having the base64 decoder ignore illegal data that is encountered 
during its decoding.
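
One way to picture skipping illegal data during streaming decoding is a decoder that keeps its bit buffer between calls, so nothing has to be stripped up front. The sketch below is an assumption-laden illustration in standard C++, not Indy's design: the class `StreamingBase64Decoder` and its `Feed` method are hypothetical names, and padding is simply treated as the end of the data.

```cpp
#include <cassert>
#include <string>

// Hypothetical streaming decoder: accepts input in arbitrary chunks and
// ignores characters outside the base64 alphabet as they arrive.
class StreamingBase64Decoder {
public:
    // Decodes one chunk and returns the bytes completed by it.
    std::string Feed(const std::string& chunk) {
        static const std::string kAlphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789+/";
        std::string out;
        for (char c : chunk) {
            if (done_) break;               // data ended at earlier padding
            if (c == '=') { done_ = true; break; }
            const std::string::size_type pos = kAlphabet.find(c);
            if (pos == std::string::npos) continue;  // illegal: skip
            buffer_ = (buffer_ << 6) | static_cast<unsigned int>(pos);
            bits_ += 6;
            if (bits_ >= 8) {
                bits_ -= 8;
                out.push_back(static_cast<char>((buffer_ >> bits_) & 0xFFu));
                buffer_ &= (1u << bits_) - 1u;  // keep leftover bits only
            }
        }
        return out;
    }

private:
    unsigned int buffer_ = 0;  // bits carried over between chunks
    int bits_ = 0;             // number of valid bits in buffer_
    bool done_ = false;        // set once padding has been seen
};
```

Because the leftover bits live in the object rather than in a local variable, a chunk boundary may fall anywhere, even in the middle of a 4-character group, and an illegal character in any chunk is dropped without affecting the output.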

-- 
Remy Lebeau (TeamB)

Date Posted: 19-Jan-2015, at 2:23 PM EST
From: Remy Lebeau (TeamB)