Ongoing encoding issues
7 answers

Hi all, I posted a question a couple of days ago regarding a web app I have
wherein users are able to indicate prices and concessions via a text field,
and the resulting encoding issues I have experienced, the main one being
seeing the pound sign as an odd symbol if viewing the results in a browser
with the encoding set to Latin-1.
My question is, how do I overcome this? If I set my browser encoding to
Latin-1 and enter the data I get that odd symbol; if I set it to UTF-8 I get
clean data. Is there a way to sniff out what encoding the browser is using
and then clean the data in any way?
I am googling for help also but you guys have been so helpful in the past I
thought I'd try you also.
No.1
Dave Goodchild wrote:
> Hi all, I posted a question a couple of days ago regarding a web app I have
> wherein users are able to indicate prices and concessions via a text field,
> and the resulting encoding issues I have experienced, the main one being
> seeing the pound sign as an odd symbol if viewing the results in a browser
> with the encoding set to Latin-1.
> My question is, how do I overcome this? If I set my browser encoding to
> Latin-1 and enter the data I get that odd symbol; if I set it to UTF-8 I get
> clean data. Is there a way to sniff out what encoding the browser is using
> and then clean the data in any way?
Check out phpinfo(); there is stuff in there telling you what client
encoding was [probably] used.
That said, you should probably opt to output everything as UTF-8. By default,
all decent browsers return data in the same encoding as the page was given to
them in. This requires you to have PHP send the correct header (don't
bother with all that META tag crap). Doing the following will automatically
cause the appropriate header to be sent:
ini_set('default_charset', 'UTF-8');
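A minimal sketch of that setup, assuming the page is served by PHP and the price field is an ordinary POST form. The helper names `use_utf8_output` and `render_price_form` and the field name `price` are made up for illustration. The `accept-charset` attribute is an extra hint to the browser and goes slightly beyond the advice above.

```php
<?php
// Ask PHP to append "charset=UTF-8" to the Content-Type header it sends.
// This has to run before any output is written.
function use_utf8_output(): void
{
    ini_set('default_charset', 'UTF-8');
}

// Hypothetical form: accept-charset asks the browser to submit the
// field values as UTF-8, matching the encoding the page is served in.
function render_price_form(): string
{
    return '<form method="post" action="" accept-charset="UTF-8">'
         . '<input type="text" name="price">'
         . '<input type="submit" value="Save">'
         . '</form>';
}

use_utf8_output();
```

Echoing the result of `render_price_form()` after the call above should give a page whose header and form agree on UTF-8.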
> I am googling for help also but you guys have been so helpful in the past I
> thought I'd try you also.
No.2
Hi Dave.
I don't think you are able to detect your users' character encoding
with PHP only (at least not rock-solid). Just some days ago, there
was a discussion about that issue (at least concerning Safari) on the
Apple web dev mailing list.
Have a look at:
It could be possible to send some information about character encoding
along with the user-submitted POST data to your PHP script as well.
Depending on that encoding, do some string replacement on your input data.
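One rough way to do the cleanup step is to check whether the submitted bytes are valid UTF-8 and, if not, treat them as Latin-1. This is only a sketch: the function name `normalize_to_utf8` is hypothetical, it assumes the mbstring extension is loaded, and the Latin-1 fallback is a guess rather than something the browser tells you.

```php
<?php
// Heuristic only: bytes that are not valid UTF-8 are ASSUMED to be
// Latin-1 (ISO-8859-1). The browser does not tell us this.
function normalize_to_utf8(string $input): string
{
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

$fromUtf8Browser   = "\xC2\xA3" . "5.00"; // pound sign as UTF-8 bytes
$fromLatin1Browser = "\xA3" . "5.00";     // pound sign as a Latin-1 byte

// Both submissions end up as the same UTF-8 string.
var_dump(normalize_to_utf8($fromUtf8Browser) === normalize_to_utf8($fromLatin1Browser));
```

The weak spot is input that is Latin-1 but happens to also be valid UTF-8; that case passes through unchanged, so serving the page as UTF-8 in the first place is still the safer route.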
Have you provided a valid "charset" encoding in your HTML?
Maybe you could give us a link to a test page?
//frank
On 26 Jan 2007, at 10:33, Dave Goodchild wrote:
> Hi all, I posted a question a couple of days ago regarding a web app I have
> wherein users are able to indicate prices and concessions via a text field,
> and the resulting encoding issues I have experienced, the main one being
> seeing the pound sign as an odd symbol if viewing the results in a browser
> with the encoding set to Latin-1.
> My question is, how do I overcome this? If I set my browser encoding to
> Latin-1 and enter the data I get that odd symbol; if I set it to UTF-8 I get
> clean data. Is there a way to sniff out what encoding the browser is using
> and then clean the data in any way?
> I am googling for help also but you guys have been so helpful in the past I
> thought I'd try you also.
No.3
# frank.arensmeier (AT) nikehydraulics (DOT) se / 2007-01-26 14:29:52 +0100:
> I don't think you are able to detect your users' character encoding
> with PHP only (at least not rock-solid). Just some days ago, there
> was a discussion about that issue (at least concerning Safari) on the
> Apple web dev mailing list.
> Have a look at:
That thread is about a different problem.
> It could be possible to send some information about character encoding
> along with the user-submitted POST data to your PHP script as well.
Yeah, it's called the Content-Type entity header, and it's an important
part of HTTP. Have you heard about HTTP?
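For what it's worth, reading a charset out of that header could look roughly like this. The function name `charset_from_content_type` is made up for the example. As far as I know, many browsers leave the charset parameter off ordinary form submissions, so expect `null` a lot of the time.

```php
<?php
// Extract the charset parameter from a Content-Type header value,
// or return null when the header carries none.
function charset_from_content_type(string $contentType): ?string
{
    if (preg_match('/;\s*charset\s*=\s*"?([^\s;"]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// In a web request PHP exposes the request header as $_SERVER['CONTENT_TYPE'];
// outside a web request it is simply absent, hence the fallback.
$header  = $_SERVER['CONTENT_TYPE'] ?? '';
$charset = charset_from_content_type($header);
```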
No.4
# buddhamagnet (AT) gmail (DOT) com / 2007-01-26 09:33:13 +0000:
> Hi all, I posted a question a couple of days ago regarding a web app I have
> wherein users are able to indicate prices and concessions via a text field,
> and the resulting encoding issues I have experienced, the main one being
> seeing the pound sign as an odd symbol if viewing the results in a browser
> with the encoding set to Latin-1.
> My question is, how do I overcome this? If I set my browser encoding to
> Latin-1 and enter the data I get that odd symbol; if I set it to UTF-8 I get
> clean data. Is there a way to sniff out what encoding the browser is using
> and then clean the data in any way?
> I am googling for help also but you guys have been so helpful in the past I
> thought I'd try you also.
Your PostgreSQL database uses some encoding, your PHP script runs under
some locale (incl. character encoding), and the browser sent the text in
some encoding. PostgreSQL assumes the input data is in the charset the
database uses (unless you have client_encoding set in postgresql.conf, or
PGCLIENTENCODING (IIRC) in the environment, or have set client_encoding
using the SET command).
It's important that you correctly identify the encoding of the inserted data
to PostgreSQL, or convert it to the encoding it expects beforehand. You
can use the iconv or recode functions in PHP; I'd probably also check whether
there's an Apache input filter for character encoding conversions.
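A small sketch of the iconv route, assuming you already know (or have decided) which encoding the input arrived in and which one the database expects. The wrapper name `to_db_encoding` is hypothetical; `iconv()` itself is the standard PHP function.

```php
<?php
// Convert text from the encoding the browser sent to the encoding the
// database expects. Both encoding names are supplied by the caller.
function to_db_encoding(string $text, string $from, string $to): string
{
    // @ silences the notice iconv raises on bad input; the result is checked instead.
    $converted = @iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert input from $from to $to");
    }
    return $converted;
}

// A UTF-8 pound sign (two bytes, C2 A3) becomes the single Latin-1 byte A3.
$forLatin1Db = to_db_encoding("\xC2\xA3", 'UTF-8', 'ISO-8859-1');
var_dump(bin2hex($forLatin1Db));
```

How iconv treats characters that do not exist in the target encoding varies between iconv implementations, so the failure branch is worth testing on your own platform.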
No.5
# neuhauser (AT) sigpipe (DOT) cz / 2007-01-26 21:09:34 +0000:
> # buddhamagnet (AT) gmail (DOT) com / 2007-01-26 09:33:13 +0000:
> > Hi all, I posted a question a couple of days ago regarding a web app I have
> > wherein users are able to indicate prices and concessions via a text field,
> > and the resulting encoding issues I have experienced, the main one being
> > seeing the pound sign as an odd symbol if viewing the results in a browser
> > with the encoding set to Latin-1.
>
> Your PostgreSQL database uses some encoding,
Dave pointed out to me that he's using MySQL. That means the
configuration mechanisms for the database will be different, but the
principal issue remains the same.
> your PHP script runs under some locale (incl. character encoding), and
> the browser sent the text in some encoding. PostgreSQL assumes the
> input data is in the charset the database uses (unless you have
> client_encoding set in postgresql.conf, or PGCLIENTENCODING (IIRC) in
> the environment, or have set client_encoding using the SET command).
> It's important that you correctly identify the encoding of the inserted data
> to PostgreSQL, or convert it to the encoding it expects beforehand. You
> can use the iconv or recode functions in PHP; I'd probably also check whether
> there's an Apache input filter for character encoding conversions.
No.6
On Fri, January 26, 2007 3:33 am, Dave Goodchild wrote:
> Hi all, I posted a question a couple of days ago regarding a web app I have
> wherein users are able to indicate prices and concessions via a text field,
> and the resulting encoding issues I have experienced, the main one being
> seeing the pound sign as an odd symbol if viewing the results in a browser
> with the encoding set to Latin-1.
> My question is, how do I overcome this? If I set my browser encoding to
> Latin-1 and enter the data I get that odd symbol; if I set it to UTF-8 I get
> clean data. Is there a way to sniff out what encoding the browser is using
> and then clean the data in any way?
> I am googling for help also but you guys have been so helpful in the past I
> thought I'd try you also.
Send the charset in your headers *AND* set it in a META tag.
Firefox trusts the headers.
IE trusts the META tag.
If the user insists on viewing your UTF-8 document with Latin-1 after
that, then they probably had to work pretty hard at it, and you should
just leave them alone with the bed they have made.
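Doing both might look something like the sketch below. The function names `send_utf8_header` and `render_page` and the sample content are invented for the example; `header()` and `htmlspecialchars()` are the standard PHP functions.

```php
<?php
// Build a page that declares UTF-8 in a META tag; the matching HTTP
// header is sent separately by send_utf8_header().
function render_page(string $title, string $bodyHtml): string
{
    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . "\n"
         . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
         . "</head>\n<body>\n" . $bodyHtml . "\n</body>\n</html>\n";
}

function send_utf8_header(): void
{
    // header() only works before any output has been written.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }
}

send_utf8_header();
echo render_page('Prices', '<p>Concession: &pound;5.00</p>');
```

The point is only that the header and the META tag name the same charset, so a browser that trusts either one reaches the same answer.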
No.7
On Fri, January 26, 2007 7:24 am, Jochem Maas wrote:
> Dave Goodchild wrote:
> that said you should probably opt to output everything as UTF-8 - all decent
> browsers will return data in the same encoding as the page was given to them
> in by default - this requires you to have php send the correct header (don't
> bother with all that META tag crap), doing the following will automatically
> cause the appropriate header to be sent:
*Do* bother with the META tag crap.
MS IE will ignore the headers and attempt to "guess" the charset
otherwise, based on some funky algorithm they've made up to compare
the bytes in the HTML to what they "expect" for any given charset, and
you'll get weird and confusing cases when the same "page" won't be
UTF-8 just because the data within it suddenly tips over their
count-point of what "should" be in a UTF-8 or Latin-1 document.
Don't blame me -- I'm just reporting the behaviour. Talk to Bill.