Folder names in THttpServer
12 answers - 515 bytes -

Hello,
I have received reports, and have seen it myself, that there is a problem with
non-ASCII characters in the names of folders containing files. Here is the report I got:
The characters 80C@ turns out to be 80C?/A>. ATV
Q@Qjqsi{ turns out to be ATV Q?/A>.
R.@ turns out to be R.?/A>. And @m turns out to be ?/A>
It doesn't seem to be limited to one specific character.
Does anyone have any idea why this happens? Is there an RFC that explains this
behavior and its cure?
Best Regards,
SubZero
No.1 | | 324 bytes |
| 
>The characters 80C@ turns out to be 80C?/A>. ATV
I have no experience with Turkish, Russian and other character sets.
What I can say is that I have no problem with the accented characters used in
French.
Maybe it is a Unicode or double-byte character set issue?
BTW: FTP is defined as an 8-bit ASCII protocol.
No.2 | | 832 bytes |
| 
Hello,
Thank you for your reply. The standard Turkish and Russian character sets are
not Unicode. They are 8-bit. In Turkish, the first 7 bits (0-127) are the same
as ANSI, but the characters in the range 128-255 are different.
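To see which byte values actually differ between two 8-bit code pages, a minimal Python sketch follows. It assumes that cp1252 and cp1254 stand for the Western European and Turkish Windows code pages (the thread names neither), and the helper name differing_bytes is made up for illustration.

```python
def differing_bytes(codec_a, codec_b):
    """Map each byte value to the (char_a, char_b) pair where the two codecs disagree."""
    diffs = {}
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns unassigned byte values into U+FFFD instead of raising.
        char_a = raw.decode(codec_a, errors="replace")
        char_b = raw.decode(codec_b, errors="replace")
        if char_a != char_b:
            diffs[value] = (char_a, char_b)
    return diffs


# Assumption: cp1252 = Western European, cp1254 = Turkish Windows code page.
diffs = differing_bytes("cp1252", "cp1254")
for value, (western, turkish) in sorted(diffs.items()):
    # The !a conversion prints escapes, so the output is safe on any console.
    print(f"0x{value:02X}: cp1252={western!a} cp1254={turkish!a}")
```

Under that assumption only a handful of values differ, all of them at 128 or above (0xF0 and 0xFE among them), while byte 0xE7, the c with cedilla, is the same in both code pages.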
Best Regards,
SZ
Message
From: "Francois PIETTE" <francois.piette (AT) overbyte (DOT) be>
To: "ICS support mailing" <twsocket (AT) elists (DOT) org>
Sent: Saturday, June 04, 2005 11:09 AM
Subject: Re: [twsocket] Folder names in THttpServer
The characters 80C@ turns out to be 80C?/A>. ATV
I have no experience with Turkish, Russian and other character sets.
What I can say is that I have no problem with the accented characters used in
French.
Maybe it is a Unicode or double-byte character set issue?
BTW: FTP is defined as an 8-bit ASCII protocol.
No.3 | | 439 bytes |
| 
>Turkish or Russian standard characters are not unicode. They are 8-bit. In
>Turkish the first 7-bits (0-127) are the same as ANSI but characters
>(128-255) are different.
So it is the same as French. And it works very well with French text. If it
doesn't with Turkish or Russian characters, then there is something I don't
understand. You should single-step through the component code and try to
find out what.
No.4 | | 1083 bytes |
| 
I traced it to on command and it reads the input character by character;
it gets some characters wrong, such as "". Could you try with these characters?
I know they look corrupted in your email client, but they are indeed valid
Turkish characters, and when you put them in a folder name you will see that
the ICS server really does corrupt them!
Best Regards,
SZ
Message
From: "Francois PIETTE" <francois.piette (AT) overbyte (DOT) be>
To: "ICS support mailing" <twsocket (AT) elists (DOT) org>
Sent: Saturday, June 04, 2005 12:28 PM
Subject: Re: [twsocket] Folder names in THttpServer
>Turkish or Russian standard characters are not unicode. They are 8-bit. In
>Turkish the first 7-bits (0-127) are the same as ANSI but characters
>(128-255) are different.
So it is the same as French. And it works very well with French text. If it
doesn't with Turkish or Russian characters, then there is something I don't
understand. You should single-step through the component code and try to
find out what.
No.5 | | 1290 bytes |
| 
>I traced it to on command and it reads the input character by character
>wrong on some characters such as "".
Why is it wrong? In a previous message you told me Turkish used 8-bit
characters. Now I understand it uses double-byte characters.
>Could you try with these characters? I know they look corrupted in your
>email client but indeed they are valid
What you show me is actually TWO characters. I see a lower case c with a
cedilla (like in my French first name "François") and a lower case letter g. I
have no idea how to try with these characters. If I try, they would be
interpreted as "" that is perfectly correct in French and works very well.
>Turkish characters and when you put them in a folder name, you will see
>that ICS server will make them really corrupt!
Please write a program (a very short BCB console mode program is OK, better
in Delphi of course) that creates such a directory, with a file name containing
Turkish characters. Do a screen dump on your Turkish screen so that I can
compare it with mine (do a partial screen dump so that it is not too large,
just enough to see what you are talking about). Put that screen dump on a server so
that everybody can download it, along with your short program.
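A test along the lines requested above could look like the following sketch. It is written in Python rather than BCB or Delphi, purely as an illustration; the folder name, the helper name make_test_folder and the use of a temporary directory are all assumptions, not something taken from this thread.

```python
import os
import tempfile


def make_test_folder(parent, folder_name, file_name="test.txt"):
    """Create parent/folder_name/file_name and return the folder's full path."""
    folder = os.path.join(parent, folder_name)
    os.mkdir(folder)
    with open(os.path.join(folder, file_name), "w", encoding="ascii") as handle:
        handle.write("dummy content\n")
    return folder


with tempfile.TemporaryDirectory() as parent:
    # Hypothetical name: c with cedilla (U+00E7) followed by g with breve (U+011F).
    name = "\u00e7\u011f_klasor"
    make_test_folder(parent, name)
    listed = os.listdir(parent)
    # ascii() keeps the output readable on consoles that cannot show the letters.
    print("created:", ascii(name))
    print("listed: ", ascii(listed[0]))
    print("round trip ok:", listed == [name])
```

Listing the same folder through the server and comparing the result with the name printed here would show at which step the name changes.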
No.6 | | 1716 bytes |
| 
No, Turkish does not use 16-bit characters. I sent you the two distinct problematic
characters!
I will try to build a test program.
Best Regards,
SZ
Message
From: "Francois PIETTE" <francois.piette (AT) overbyte (DOT) be>
To: "ICS support mailing" <twsocket (AT) elists (DOT) org>
Sent: Sunday, June 05, 2005 12:09 PM
Subject: Re: [twsocket] Folder names in THttpServer
>I traced it to on command and it reads the input character by character
>wrong on some characters such as "".
Why is it wrong? In a previous message you told me Turkish used 8-bit
characters. Now I understand it uses double-byte characters.
>Could you try with these characters? I know they look corrupted in your
>email client but indeed they are valid
What you show me is actually TWO characters. I see a lower case c with a
cedilla (like in my French first name "François") and a lower case letter g. I
have no idea how to try with these characters. If I try, they would be
interpreted as "" that is perfectly correct in French and works very well.
>Turkish characters and when you put them in a folder name, you will see
>that ICS server will make them really corrupt!
Please write a program (a very short BCB console mode program is OK, better
in Delphi of course) that creates such a directory, with a file name containing
Turkish characters. Do a screen dump on your Turkish screen so that I can
compare it with mine (do a partial screen dump so that it is not too large,
just enough to see what you are talking about). Put that screen dump on a server so
that everybody can download it, along with your short program.
No.7 | | 3886 bytes |
| 
I am missing the earlier mails of this thread (they were accidentally deleted), but
I am picking up on this one (I hope I'm not missing the point entirely).
Make sure you are not confusing Unicode with MBCS and SBCS.
Unicode, as Windows uses it (UTF-16), is built on 16-bit units.
MBCS (Multi Byte Character Set) as opposed to SBCS (Single Byte CS) can
contain, for certain characters, two (or more) bytes, as opposed to only one
byte for most other characters in that same character set.
I'm not sure if Turkish can have MB characters ?
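The SBCS/MBCS distinction can be illustrated by counting bytes per character. This is a minimal Python sketch, assuming cp1254 stands for the Turkish Windows code page and cp932 (Japanese) for a typical multi-byte one; neither is named in this thread, and byte_lengths is just an illustrative helper.

```python
def byte_lengths(text, codec):
    """Return the number of bytes each character of text occupies in codec."""
    return [len(ch.encode(codec)) for ch in text]


turkish = "\u00e7\u011f\u0131\u015f"   # c-cedilla, g-breve, dotless i, s-cedilla
japanese = "a\u3042"                   # Latin 'a' followed by hiragana 'a'

print(byte_lengths(turkish, "cp1254"))     # single-byte code page
print(byte_lengths(japanese, "cp932"))     # multi-byte code page
print(byte_lengths(turkish, "utf-16-le"))  # 16-bit code units
```

Under that assumption every one of these Turkish letters fits in a single byte, which suggests Turkish is an SBCS case; mixed one- and two-byte characters only show up with code pages such as cp932.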
If you're clear about that, also know that the Borland VCL is limited in its
ability to properly show non-Latin characters on Latin-set systems.
This is because the VCL is completely MBCS inside; if it were Unicode we
wouldn't have all these problems.
Win2K and XP, for example, are Unicode based and convert MBCS text to Unicode
via a system-defined code page before displaying it anywhere.
If, for instance, your system's code page is set to "UK English" (as an
example) and you try to use foreign characters in an MBCS application (like
the ones Borland builds), then you will have problems seeing the correct
characters!
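What such a code page mismatch does to a name can be sketched in a few lines of Python. This is only an illustration: the folder name is invented, and cp1254 and cp1252 are assumed to stand in for a Turkish and a Western European ("UK English") system code page.

```python
# Hypothetical folder name containing s-cedilla (U+015F), u-umlaut and o-umlaut.
name = "M\u00fc\u015fteri B\u00f6lgesi"

# Bytes as a Turkish-configured (cp1254) MBCS application would store the name.
turkish_bytes = name.encode("cp1254")

# The same bytes interpreted through a Western European code page (cp1252).
misread = turkish_bytes.decode("cp1252")

# Going the other way: a letter missing from cp1252 collapses to '?'.
lossy = name.encode("cp1252", errors="replace").decode("cp1252")

print(ascii(misread))
print(ascii(lossy))
```

Letters shared by both code pages survive, while the s-cedilla turns into a different letter in one direction and into a question mark in the other.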
To work around that you have to tell XP that you want to use a different
code page (and then reboot your system).
To do this (XP):
My Computer / Control panel / Date, Time, Language and regional settings /
Regional and Language / (third tab) Advanced
Language for non-unicode programs
See what language is specified there. If it is not Turkish, try setting it to
Turkish and see if this fixes things (*)
(*) Without really knowing what the real issue is ;-))
As I said, I hope I'm not entirely beside the point.
Best Regards,
Peter
Peter Van Hove
CD and DVD Data recovery
Peter (AT) Smart-Projects (DOT) net
www.Smart-Projects.net
www.IsoBuster.com
Message
From: "Fastream Technologies" <gates (AT) fastream (DOT) com>
To: "ICS support mailing" <twsocket (AT) elists (DOT) org>
Sent: Sunday, June 05, 2005 4:46 PM
Subject: Re: [twsocket] Folder names in THttpServer
No, Turkish does not use 16-bit characters. I sent you the two distinct problematic
characters!
I will try to build a test program.
Best Regards,
SZ
No.8 | | 246 bytes |
| 
>No Turkish is not 16 bit characters. I sent you the two distinct
>problematic characters!
Those characters, as received, are used in French and work perfectly well.
So there is something different between Turkish and French characters.
No.9 | | 1364 bytes |
| 
Hello,
I just recalled that a long time ago I had a similar problem with a
French customer. He gave me text in a file that I had to transmit to
vehicles. At a certain moment he changed to new machines, and lots
of parts of the text were corrupted in my logs, but not in the customer's
logs (he did the back-office program, and I did the gateway/mobile part).
After viewing the text files with a hex editor, it turned out that at the
beginning some characters were 2 bytes, and the same in some other parts of
the file. However, most characters were 1 byte.
I sent someone else to him, and I recall it was fixed by changing
something in the regional settings. It was certainly not Unicode, because
my understanding of Unicode is that each character is 2 bytes.
I will try to trace back what the engineer I sent did. Not easy: he is
the kind of person who writes nothing down and trusts his volatile memory
:)
Rgds, Wilfried
http://www.mestdagh.biz
Sunday, June 5, 2005, 17:37, Francois PIETTE wrote:
>No Turkish is not 16 bit characters. I sent you the two distinct
>problematic characters!
Those characters, as received, are used in French and work perfectly well.
So there is something different between Turkish and French characters.
No.10 | | 2555 bytes |
| 
Hello Peter,
This confirms the reply I sent an hour or so ago. One program (written in VB)
wrote to files, and another program (written in Delphi) was reading
corrupted text. The text was always corrupted in the first bytes, and
here and there as well. Most of it was OK. And the logs of the VB program
showed nice text. It was all in French.
I recall that some change in the regional settings fixed it, as you also
mention.
I googled a little on MBCS, and it seems to behave differently on Win9x and
NT systems. What I have found matches what you say: if the regional settings
are correct, then the text should be translated with the right code page.
Rgds, Wilfried
http://www.mestdagh.biz
Sunday, June 5, 2005, 17:31, Peter Van Hove wrote:
I am missing the earlier mails of this thread (they were accidentally deleted), but
I am picking up on this one (I hope I'm not missing the point entirely).
Make sure you are not confusing Unicode with MBCS and SBCS.
Unicode, as Windows uses it (UTF-16), is built on 16-bit units.
MBCS (Multi Byte Character Set) as opposed to SBCS (Single Byte CS) can
contain, for certain characters, two (or more) bytes, as opposed to only one
byte for most other characters in that same character set.
I'm not sure if Turkish can have MB characters ?
If you're clear about that, also know that the Borland VCL is limited in its
ability to properly show non-Latin characters on Latin-set systems.
This is because the VCL is completely MBCS inside; if it were Unicode we
wouldn't have all these problems.
Win2K and XP, for example, are Unicode based and convert MBCS text to Unicode
via a system-defined code page before displaying it anywhere.
If, for instance, your system's code page is set to "UK English" (as an
example) and you try to use foreign characters in an MBCS application (like
the ones Borland builds), then you will have problems seeing the correct
characters!
To work around that you have to tell XP that you want to use a different
code page (and then reboot your system).
To do this (XP):
My Computer / Control panel / Date, Time, Language and regional settings /
Regional and Language / (third tab) Advanced
Language for non-unicode programs
See what language is specified there. If it is not Turkish, try setting it to
Turkish and see if this fixes things (*)
(*) Without really knowing what the real issue is ;-))
As I said, I hope I'm not entirely beside the point.
Best Regards,
Peter
No.11 | | 4381 bytes |
| 
Hello Peter,
Now I still see the folder name corrupted in DOS FTP, but I can get into the
folder using the corrupted folder name as shown. However, FileZilla works fine, so
there should be no problem!
Thanks a LOT! :-))
SZ
No.12 | | 4163 bytes |
| 
Hi,
Good to hear that things are working (more or less) now.
Again, I'm not sure what the original question was, but if you created a
folder with the old code page it may indeed still contain garbled characters
(because the conversion never happened properly). If so, create the folder
again, this time with correct settings.
FYI, for those interested :
As for MBCS and the limitations it causes, it is frustrating that Borland
has never addressed this properly.
For an application that is Unicode internally, such as mine, it is sad to see Borland
convert everything to MBCS and hand that to Windows (because the VCL wraps
the MBCS APIs), after which Windows needs to convert it to Unicode again.
For those interested I once put this to Team B in the newsgroups :
#00a00b5180de3cb2
If you want your Borland app to display things properly for all languages on
all systems you need to resort to third party Unicode components over the
standard VCL components. What's stupid about this is that you need to
completely replace all VCL objects and I personally haven't found the "moral
strength" to start doing that in my apps :-)
E.g. :
(This is NOT tested by me, but mentioned here just to complete the
information on this topic)
Hope this helps.
Best Regards,
Peter
Message
From: "Fastream Technologies" <gates (AT) fastream (DOT) com>
To: "ICS support mailing" <twsocket (AT) elists (DOT) org>
Sent: Sunday, June 05, 2005 7:46 PM
Subject: Re: [twsocket] Folder names in THttpServer
Hello Peter,
Now I still see the folder name corrupted in DOS FTP, but I can get into the
folder using the corrupted folder name as shown. However, FileZilla works fine, so
there should be no problem!
Thanks a LOT! :-))
SZ