UTF-8 to UCS-2 data storage issue for Simplified Chinese

General discussion on Zend Core for IBM System i

Post by hirabhullar on Thu Jul 19, 2012 12:32 am

Hi all,

I am facing a strange issue. My web page accepts input and displays Simplified Chinese characters correctly, but the data stored in the iSeries database shows up as junk characters. Here are some details about the issue.

The front end works correctly:
- The web page accepts and displays Simplified Chinese data.
- The web page is encoded in UTF-8. HTML header:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh">
<head> <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta name="language" content="zh" />
- The web page also displays the correct hex codes for the Simplified Chinese characters:
Store Name = 我的测试测试测试
Hex(Store Name) = E68891E79A84E6B58BE8AF95E6B58BE8AF95E6B58BE8AF95
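To double-check the front end independently of PHP and the database, the hex string above can be reproduced by encoding the same text as UTF-8. A minimal sketch in Python (used here only for illustration; it is not part of the application):

```python
# Confirm that the hex shown on the page is the UTF-8 encoding of the text.
store_name = "我的测试测试测试"
utf8_hex = store_name.encode("utf-8").hex().upper()
print(utf8_hex)  # E68891E79A84E6B58BE8AF95E6B58BE8AF95E6B58BE8AF95

# Each of these CJK characters takes 3 bytes in UTF-8: 8 characters -> 24 bytes.
print(len(store_name), len(store_name.encode("utf-8")))  # 8 24
```

If this matches what the page shows, the browser-to-PHP leg is delivering valid UTF-8 and the problem lies further down the chain.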


The problem is with the raw data as it is stored in the database:
- The iSeries field is defined as UCS-2 (CCSID 13488):
STORE_NAME FOR T2STORNM VARGRAPHIC (512) ALLOCATE (60)
CCSID 13488 NOT NULL DEFAULT
- When PHP saves the UTF-8 string 我的测试测试测试 into the UCS-2 field, the stored data is corrupted. The stored hex value is: 00E60088009100E7009A008400E600B5008B00E800AF009500E600B5008B00E800AF009500E600B5008B00E800AF0095
- When I insert the same data using iSeries Navigator (i-nav), it displays correctly in a query (我的测试测试测试). The stored hex value is 621176846D4B8BD56D4B8BD56D4B8BD5.
- To confirm that the UCS-2 field itself is not the issue, I created a similar table with two fields (UTF-8 and UCS-2):
STORE_NAME FOR T2STORNM VARCHAR (512) ALLOCATE (60)
CCSID 1208 NOT NULL DEFAULT,
STORE_NAME_16 FOR T2MGRNM1 VARGRAPHIC (512) ALLOCATE (60)
CCSID 13488 NOT NULL DEFAULT
- When I insert data using i-nav, it displays correctly in both fields, and the hex values in the UTF-8 field match the hex values shown on the PHP page.
UTF-8 from i-nav
我的测试测试测试 = Hex E68891E79A84E6B58BE8AF95E6B58BE8AF95E6B58BE8AF95
UCS-2 from i-nav
我的测试测试测试 = Hex 621176846D4B8BD56D4B8BD56D4B8BD5
UTF-8 from PHP Application
我的测试测试测试 = Hex C3A6C288C291C3A7C29AC284C3A6C2B5C28BC3A8C2AFC295C3A6C2B5C28BC3A8C2AFC295C3A6C2B5C28BC3A8C2AFC295
UCS-2 from PHP Application
我的测试测试测试 = Hex 00E60088009100E7009A008400E600B5008B00E800AF009500E600B5008B00E800AF009500E600B5008B00E800AF0095
- Also, when I copy data from the UCS-2 field to the UTF-8 field, the system automatically converts it to UTF-8 (the hex values are correct), and copying in the other direction, from UTF-8 to UCS-2, works fine as well.
- I found these references online: http://graphemica.com/%E6%88%91 and http://www.ansell-uebersetzungen.com/gbuni.html. According to them, i-nav is inserting the correct hex codes for Simplified Chinese:
GB code   Unicode   UTF-8      Simplified Chinese character
CED2      6211      E6 88 91   我
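The four hex dumps and the lookup table above can be cross-checked outside the database. The Python sketch below decodes each stored value with Python's standard `utf-8`, `utf-16-be` and `gb2312` codecs. It assumes the UCS-2 values are shown as big-endian 16-bit code units, which is what the i-nav hex (6211 for 我) suggests. The i-nav values decode to the expected text, while the two PHP values decode to the same wrong string.

```python
# Cross-check the hex dumps from this post with Python's standard codecs.
# Assumption: UCS-2 (CCSID 13488) values are shown as big-endian 16-bit units,
# so they can be decoded with the "utf-16-be" codec (no surrogates involved).
EXPECTED = "我的测试测试测试"

DUMPS = [
    ("UTF-8 from i-nav", "E68891E79A84E6B58BE8AF95E6B58BE8AF95E6B58BE8AF95", "utf-8"),
    ("UCS-2 from i-nav", "621176846D4B8BD56D4B8BD56D4B8BD5", "utf-16-be"),
    ("UTF-8 from PHP",
     "C3A6C288C291C3A7C29AC284C3A6C2B5C28BC3A8C2AFC295"
     "C3A6C2B5C28BC3A8C2AFC295C3A6C2B5C28BC3A8C2AFC295", "utf-8"),
    ("UCS-2 from PHP",
     "00E60088009100E7009A008400E600B5008B00E800AF0095"
     "00E600B5008B00E800AF009500E600B5008B00E800AF0095", "utf-16-be"),
]

# Decode every dump back into a Python string.
decoded = {label: bytes.fromhex(hex_str).decode(codec) for label, hex_str, codec in DUMPS}

for label, text in decoded.items():
    status = "OK" if text == EXPECTED else "CORRUPTED"
    print(f"{label:<18} {status:<10} {text!r}")

# Both PHP values decode to the same wrong string, so the two fields agree with
# each other; this suggests the damage happened before the column conversion.
print(decoded["UTF-8 from PHP"] == decoded["UCS-2 from PHP"])  # True

# Reproduce the GB code / Unicode / UTF-8 row for 我 from the table.
ch = "我"
print(ch.encode("gb2312").hex().upper(), f"{ord(ch):04X}", ch.encode("utf-8").hex(" ").upper())
```

Note that `bytes.hex()` with a separator argument needs Python 3.8 or later.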

As I understand it, the system should automatically convert the data from UTF-8 to UCS-2, but that is not happening. The UTF-8 bytes are saved into the UCS-2 field without proper conversion, each byte ending up as a separate 16-bit character. For example, E6 88 91 should be stored as 6211, but it is stored as 00E6 0088 0091. Any information would be a great help!
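One explanation that fits every hex dump in this post (an interpretation, not a confirmed diagnosis) is that the UTF-8 bytes reach DB2 tagged as a single-byte ISO-8859-1 string (CCSID 819 on IBM i), so each byte is converted to its own character: E6 becomes U+00E6, 88 becomes U+0088, and so on. ISO-8859-1 maps bytes 80 to 9F straight to U+0080 to U+009F, which matches values such as 0088 and 0091 here; Windows-1252 would not. The Python sketch below reproduces both PHP dumps under that assumption and shows how already-damaged values could be repaired by reversing the step. The helper names `simulate_misread` and `repair` are hypothetical. The lasting fix is presumably to make sure the string reaches DB2 tagged as UTF-8 (CCSID 1208), which depends on connection and job CCSID settings not shown in this post.

```python
# Hypothetical helpers illustrating the suspected misconversion.
# Assumption: the UTF-8 bytes were interpreted as ISO-8859-1 (CCSID 819).

def simulate_misread(text: str) -> str:
    """Return the string DB2 would see if UTF-8 bytes were read as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(stored: str) -> str:
    """Undo the misread: map each character back to one byte, then decode as UTF-8.

    Only meaningful for values damaged this way. Correctly stored Chinese text
    raises UnicodeEncodeError (its characters lie above U+00FF); pure ASCII
    values pass through unchanged.
    """
    return stored.encode("latin-1").decode("utf-8")


original = "我的测试测试测试"
garbled = simulate_misread(original)

# What the UCS-2 (CCSID 13488) and UTF-8 (CCSID 1208) fields would then hold.
ucs2_hex = garbled.encode("utf-16-be").hex().upper()
utf8_hex = garbled.encode("utf-8").hex().upper()
print(ucs2_hex[:12])  # 00E600880091  (我 stored as three characters)
print(utf8_hex[:12])  # C3A6C288C291

print(repair(garbled) == original)  # True
```

A one-off repair like this only makes sense for rows that all went through the same PHP path; rows entered through i-nav are already correct and should not be passed through it.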
hirabhullar