Discussion:
7.3.2 incorrectly counts characters for unicode varchar field
Matthew Cooper
2003-09-12 19:43:41 UTC
Permalink
============================================================================
POSTGRESQL BUG REPORT TEMPLATE
============================================================================

Your name : Matthew Cooper
Your email address : matty (at) cloverworxs (dot) com

System Configuration
---------------------
Architecture (example: Intel Pentium) : Intel Pentium
Operating System (example: Linux 2.0.26 ELF) : Redhat 8.0 / 9.0
PostgreSQL version (example: PostgreSQL-7.2.2): PostgreSQL-7.2.2 / 7.3.2
Compiler used (example: gcc 2.95.2) : none

Please enter a FULL description of your problem:
------------------------------------------------
I have a database with UNICODE encoding set. In it is a table with a
varchar(10) column. If I insert 10 western characters into it, it is OK. If
I insert 10 chinese characters it says:
postgresql value too long for type character varying 10
when using 7.3.2. If I use 7.2.2 it works fine.
Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
createdb -E UNICODE mydb
Then in psql...
create table mgc (c1 varchar(10));
insert into mgc values('0123456789');
This all works fine.
Now I put the following command into a file (say my.sql) which is UTF-8
encoded and the literal is 10 chinese characters.
(I don't know if once emailed this command will be readable so you may have
to re-create the command by pasting 10 chinese characters into your
favourite UTF-8 compatible editor.)
insert into mgc values ('åˆ†é’Ÿç»ƒä¹ åˆ†é’Ÿç»ƒä¹ ç»ƒä¹ ');
I then run psql -f my.sql and get the error for 7.3.2 but it works for
7.2.2.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
I am guessing it is incorrectly counting the bytes and not the characters.
Presumably a workaround is to double the length of the field.


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to ***@postgresql.org
Tom Lane
2003-09-13 22:51:23 UTC
Permalink
insert into mgc values ('åˆ†é’Şç»ƒÀ¹ åˆ†é’Şç»ƒÀ¹ 练À¹ ');
I don't think this string is correctly unicode-encoded. Anyway "length"
claims it is 30 characters.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Tom Lane
2003-09-15 15:10:19 UTC
Permalink
Attached is the UTF-8 encoded sql file in case it got messed up in the mail
transfer.
Ah, no doubt it did.

This works fine for me, using either 7.3.4 or CVS tip. Are you sure
that the system knows your client-side encoding is supposed to be UTF8?

uc=# show client_encoding ;
client_encoding
-----------------
UNICODE
(1 row)

uc=# create table mgc(f1 varchar(10));
CREATE TABLE
uc=# \i mgc.sql
INSERT 328444 1
uc=# select * from mgc;
f1
----------------------
͈†Õ’ŸÓ£ƒð¿ Íˆ†Õ’ŸÓ£ƒð¿ Ó£ƒð¿ 
(1 row)

uc=# select length(f1) from mgc;
length
--------
10
(1 row)


regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
Matthew Cooper
2003-09-15 14:50:49 UTC
Permalink
Attached is the UTF-8 encoded sql file in case it got messed up in the mail
transfer.

And here it is pasted in directly from the window that was displaying
chinese characters.

insert into mgc values ('分钟练习分钟练习练习');


Looking at the UTF-8 documentation, 10 chinese characters could be any
number of bytes, each character being say 2 or 3 characters.

Matty.
----- Original Message -----
From: "Tom Lane" <***@sss.pgh.pa.us>
To: "Matthew Cooper" <***@cloverworxs.com>
Cc: <pgsql-***@postgresql.org>
Sent: Saturday, September 13, 2003 5:51 PM
Subject: Re: [BUGS] 7.3.2 incorrectly counts characters for unicode varchar
field
Post by Tom Lane
insert into mgc values ('Ã¥Ë?â? éâ?TŞç»Æ'À¹ Ã¥Ë?â?
éâ?TŞç»Æ'À¹ ç»Æ'À¹ ');
Post by Tom Lane
I don't think this string is correctly unicode-encoded. Anyway "length"
claims it is 30 characters.
regards, tom lane
Matthew Cooper
2003-09-15 17:27:01 UTC
Permalink
Doh! It looks like its time to eat humble pie. It turns out that the guy
here who has 7.3.4 and helped me to reproduce the problem did not follow our
own installation instructions (that he recently re-worded!) as follows:

"createdb -E UNICODE -U DB_USER -P DB_PASSWORD DB_NAME"

and did not set the encoding. I, like a good boy, did on my 7.2
installation. The guys I am trying to debug the problem for are in another
location and are using 7.3.4 too. Hence I narrowed it down to a version
problem. I am asking them to check the encoding on their database too and
will post back with huge apologies and thanks for your time when they
inevitably confirm that the encoding is SQL_ANSI.

Thanks, Matty.
----- Original Message -----
From: "Matthew Cooper" <***@cloverworxs.com>
To: "Tom Lane" <***@sss.pgh.pa.us>
Cc: <pgsql-***@postgresql.org>
Sent: Monday, September 15, 2003 9:50 AM
Subject: Re: [BUGS] 7.3.2 incorrectly counts characters for unicode varchar
field
Attached is the UTF-8 encoded sql file in case it got messed up in the
mail
transfer.
And here it is pasted in directly from the window that was displaying
chinese characters.
insert into mgc values ('分钟练习分钟练习练习');
Looking at the UTF-8 documentation, 10 chinese characters could be any
number of bytes, each character being say 2 or 3 characters.
Matty.
----- Original Message -----
Sent: Saturday, September 13, 2003 5:51 PM
Subject: Re: [BUGS] 7.3.2 incorrectly counts characters for unicode
varchar
field
Post by Tom Lane
insert into mgc values ('Ã¥Ë?â? éâ?TŸç»Æ'ä¹ Ã¥Ë?â?
éâ?TŸç»Æ'ä¹ ç»Æ'ä¹ ');
Post by Tom Lane
I don't think this string is correctly unicode-encoded. Anyway "length"
claims it is 30 characters.
regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Loading...