[Bug 57008] Writing _x0427_ to a string cell changes the string to some strange UTF-8 character

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=57008

--- Comment #13 from Matthias Gerth <[hidden email]> ---
I've written an escape function as a workaround.
https://stackoverflow.com/questions/48222502/xssfcell-in-apache-poi-encodes-certain-character-sequences-as-unicode-character
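The linked answer has the actual workaround. Below is a minimal Java sketch of what such an escape function can look like. The class and method names are hypothetical, not POI API. It assumes the escape syntax is an underscore, `x`, four hex digits and a closing underscore, and that a literal underscore is written as `_x005F_`, which matches step 2 of the example further down. Every underscore is checked against the original text, so overlapping candidates such as `_x0041_x0042_` are both escaped.

```java
// Hypothetical helper sketching the escape-function workaround; not POI API.
final class OoxmlEscape {
    private OoxmlEscape() {}

    // ASCII hex digits only (0-9, a-f, A-F).
    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // True if s has a token "_xHHHH_" (underscore, 'x', four hex digits,
    // underscore) starting at index i.
    static boolean startsEscapeToken(String s, int i) {
        if (i + 7 > s.length() || s.charAt(i) != '_' || s.charAt(i + 1) != 'x'
                || s.charAt(i + 6) != '_') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (!isHex(s.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    // Replaces every underscore that begins an "_xHHHH_" token with "_x005F_"
    // (the encoded form of '_' itself), so a decoder yields a literal
    // underscore instead of the character U+HHHH.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (startsEscapeToken(s, i)) {
                out.append("_x005F_");
            } else {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("_x24B8_")); // prints _x005F_x24B8_
    }
}
```

Text without any `_xHHHH_` token is returned unchanged, so the function can be applied to all user input before it is passed to the cell.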

So my use case is this: I need to store a string containing "_x24B8_" in an
Excel file. This is user input, and I cannot prevent it. The setValue
function on XSSFCell takes one parameter of type java.lang.String, and a Java
String does not use Microsoft's _xHHHH_ escape encoding to represent Unicode
characters.

So this happens:
1. String value = "_x24B8_";
2. String valueEscaped = escape(value); // "_x005F_x24B8_"
3. cell.setValue(valueEscaped); // the cell value is now "_x24B8_"
4. Once the file is written, it changes back to "_x005F_x24B8_" in the file.
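Steps 2 and 3, and the original bug, can be illustrated with a small decoder sketch. It assumes, as the comment above implies, that setting the value runs something equivalent to XSSFRichTextString.utfDecode(), i.e. every `_xHHHH_` token is replaced by the character U+HHHH. The class OoxmlDecode is a hypothetical re-implementation for illustration, not POI's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical re-implementation of the decoding step; assumed to behave
// like XSSFRichTextString.utfDecode(), but this is not POI code.
final class OoxmlDecode {
    private OoxmlDecode() {}

    // "_x" followed by exactly four ASCII hex digits and a closing "_".
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    // Replaces each "_xHHHH_" token with the UTF-16 code unit U+HHHH,
    // scanning left to right; matched tokens never overlap.
    static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder(s.length());
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Unescaped user input silently becomes U+24B8 (CIRCLED LATIN CAPITAL LETTER C).
        System.out.println(decode("_x24B8_").equals("\u24B8")); // prints true
        // The escaped form from step 2 decodes back to the literal text (step 3).
        System.out.println(decode("_x005F_x24B8_"));            // prints _x24B8_
    }
}
```

Because "_x005F_" is matched first and consumes its trailing underscore, the remaining "x24B8_" is left alone, which is why the escaped value survives the decode as plain text.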

I think setValue should not call XSSFRichTextString.utfDecode(); that would
prevent this back-and-forth encoding.
Alternatively, we could make XSSFRichTextString.utfDecode() public for people
who do use this type of encoding. I would prefer that this Microsoft encoding
be handled entirely within the library, since it is specific to the Office
file format.
