[Bug 57008] Writing _x0427_ to a string cell changes the string to some strange UTF-8 character


Bugzilla from bugzilla@apache.org

--- Comment #18 from Greg Woolsey <[hidden email]> ---
I always go back to the standards doc when I find myself going around in
circles. Here's what it says about escaped strings:

bstr (Basic String)
This element defines a binary basic string variant type, which can store any
valid Unicode character. Unicode characters that cannot be directly represented
in XML as defined by the XML 1.0 specification, shall be escaped using the
Unicode numerical character representation escape character format _xHHHH_,
where H represents a hexadecimal character in the character's value. [Example:
The Unicode character 8 is not permitted in an XML 1.0 document, so it shall be
escaped as _x0008_. end example] To store the literal form of an escape
sequence, the initial underscore shall itself be escaped (i.e. stored as
_x005F_). [Example: The string literal _x0008_ would be stored as
_x005F_x0008_. end example]

The possible values for this element are defined by the W3C XML Schema string
datatype.
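
To make the quoted rules concrete, here is a minimal Java sketch of the escape and unescape steps as I read them. It is not POI code: the class name BstrEscapeSketch and its methods are hypothetical, and the "permitted in XML 1.0" test is assumed to be the Char production from the XML 1.0 specification.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: a sketch of the _xHHHH_ rules quoted above.
final class BstrEscapeSketch {
    // One escape sequence: underscore, 'x', four hex digits, underscore.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Assumption: "permitted in XML 1.0" means the Char production of that spec.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Write side: escape forbidden characters and protect literal escape sequences.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        Matcher m = ESCAPE.matcher(s);
        int i = 0;
        while (i < s.length()) {
            // A literal _xHHHH_ in the input gets its leading underscore escaped,
            // so a later unescape returns the literal text instead of decoding it.
            if (s.charAt(i) == '_' && m.region(i, s.length()).lookingAt()) {
                out.append("_x005F_");
                i++;
                continue;
            }
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Only code points below 0x10000 can be invalid, so four digits suffice.
                out.append(String.format("_x%04X_", cp));
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Read side: decode every escape sequence in one left-to-right pass.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder(s.length());
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }
}
```

Under these assumptions the literal text _x0427_ from the bug title would be stored as _x005F_x0427_ and read back unchanged, while the control character 8 would be stored as _x0008_.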

I think POI should assume it needs to escape Unicode when setting CT* class
value strings, and unescape when reading them.  I don't think POI should be
attempting to unescape them at any other time than when reading a string value
from a CT* class.
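
A small sketch of why the unescape should happen exactly once, at the point where the value is read from the CT* class. The class and method names below are hypothetical stand-ins, not POI APIs; readTwice only models a second, stray unescape somewhere further along the call chain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not POI code.
final class UnescapeOnceSketch {
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Decodes every _xHHHH_ sequence in a single left-to-right pass.
    static String unescape(String stored) {
        Matcher m = ESCAPE.matcher(stored);
        StringBuilder out = new StringBuilder(stored.length());
        int last = 0;
        while (m.find()) {
            out.append(stored, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        return out.append(stored, last, stored.length()).toString();
    }

    // Stand-in for reading a string value from a CT* class: decode once.
    static String readOnce(String stored) {
        return unescape(stored);
    }

    // Models an extra unescape applied to a value that was already decoded.
    static String readTwice(String stored) {
        return unescape(unescape(stored));
    }
}
```

With the stored form _x005F_x0427_, readOnce returns the literal text _x0427_ that the user wrote, while readTwice decodes it a second time into the single character U+0427, which matches the symptom in the bug title.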
