UTF-8 Encoding

UTF-8 Encoding

John Brainard
I’m using JXLS to generate a report in Excel and am having a hard time with non-ASCII text, such as the following:

𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)

The above is rendered to the sharedStrings.xml file as:

<sst count="1" uniqueCount="1" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><si><t>?? = ???? + ??, ???? + ???? = ??, and ?? - ??₁ = ??(?? - ??₁)</t></si></sst>

I believe I’ve narrowed it down to org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst. My testing shows that it’s storing the string correctly internally, but when writing to the sharedStrings.xml, the text isn’t being handled correctly. I’m not sure if this is something I’m doing wrong, or if this is a bug somewhere in POI or XmlBeans. I don’t believe the issue is in the JXLS library as I’ve isolated the issue to the code below:

        // Sample text: mathematical italic letters, which Java stores as surrogate pairs
        String text = "𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)";
        SharedStringsTable table = new SharedStringsTable();
        CTRst st = CTRst.Factory.newInstance();
        st.setT(text);
        table.addEntry(st);

        // Serialize the table and decode the bytes as UTF-8
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        table.writeTo(baos);
        String output = baos.toString("UTF-8");

        // This assertion passes
        Assert.assertEquals(st.getT(), text);

        // This assertion fails
        Assert.assertEquals(output, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                        "<sst count=\"1\" uniqueCount=\"1\" xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><si><t>𝑦 = 𝑚𝑥 + 𝑏, 𝐴𝑥 + 𝐵𝑦 = 𝐶, and 𝑦 - 𝑦₁ = 𝑚(𝑥 - 𝑥₁)</t></si></sst>");
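
One plausible reading of the `??` pairs in the failing output: every mangled character lies outside the Basic Multilingual Plane, so Java stores it as two UTF-16 code units (a surrogate pair), while the BMP subscript ₁ survives. A serializer that encodes each code unit on its own would produce exactly this pattern. The sketch below reproduces the symptom with the JDK alone. `SurrogateDemo` and `encodePerChar` are names made up for illustration and are not taken from POI or XmlBeans, and whether XmlBeans fails in precisely this way is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SurrogateDemo {
    // Hypothetical helper: encodes each UTF-16 code unit separately.
    // A lone surrogate cannot be encoded, so the JDK substitutes '?'.
    static String encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            byte[] b = String.valueOf(s.charAt(i)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+1D466 (mathematical italic y) written as its surrogate pair,
        // followed by " - ", the same letter again, and U+2081 (subscript one)
        String text = "\uD835\uDC66 - \uD835\uDC66\u2081";
        System.out.println(text.length());                          // 8 UTF-16 code units
        System.out.println(text.codePointCount(0, text.length()));  // 6 code points
        System.out.println(encodePerChar(text));                    // each surrogate half becomes '?'
    }
}
```

The last line yields `?? - ??` followed by the intact subscript, which matches the shape of the broken sharedStrings.xml content.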


Here’s another snippet which reproduces the issue I’m having when creating an xlsx workbook:

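        // TEXT is assumed to hold the sample string above; FILE_NAME is the output path
        XSSFWorkbook workbook = new XSSFWorkbook();
        XSSFSheet sheet = workbook.createSheet();

        Row row = sheet.createRow(0);
        Cell cell = row.createCell(0);
        cell.setCellValue(TEXT);

        // try-with-resources closes the stream even if write() throws
        try (FileOutputStream outputStream = new FileOutputStream(FILE_NAME)) {
            workbook.write(outputStream);
        }
        workbook.close();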
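
An .xlsx file is a ZIP archive, so the written shared-strings part can be inspected without POI. This is a minimal sketch, assuming the part lives at `xl/sharedStrings.xml` (the usual location, but worth confirming against your own file). `SharedStringsDump` and `readEntry` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class SharedStringsDump {
    // Returns the named ZIP entry decoded as UTF-8, or null if it is absent.
    static String readEntry(InputStream zipData, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.getName().equals(entryName)) {
                    continue;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zip.read(chunk, 0, chunk.length)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the generated .xlsx (an assumption of this sketch)
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readEntry(in, "xl/sharedStrings.xml"));
        }
    }
}
```

If the printed XML shows runs of `?` where the math letters should be, the damage happened at write time rather than in whatever viewer opened the file.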


I’m assuming it’s something I’m doing wrong, but I have been unable to find a solution. I created a GitHub repo with the above code in the hope that it helps in finding one.

https://github.com/JohnBrainard/poi-utf8-debugging

Thank you for your help!

John


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: UTF-8 Encoding

Dominik Stadler
Hi,

You may be hitting a known bug in XMLBeans, the library that POI currently
uses for serializing the XML data. See
https://bz.apache.org/bugzilla/show_bug.cgi?id=54084 and
https://bz.apache.org/bugzilla/show_bug.cgi?id=59268 for extensive
discussion of this issue. You may be able to use a beta version of a newer
XMLBeans from
https://github.com/pjfanning/xmlbeans/releases/tag/2.6.2. We would be
interested to hear whether this also resolves your problem.

Thanks... Dominik.
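
For comparison, a serializer avoids this class of problem if it walks the string by code point rather than by `char`. The following only illustrates that idea with a hand-written UTF-8 encoder. It is not the XMLBeans patch, and `CodePointEncoder` and `toUtf8` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

final class CodePointEncoder {
    // Encodes to UTF-8 one code point at a time, so a surrogate pair is
    // always combined into a single 4-byte sequence.
    static byte[] toUtf8(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = '?'; // unpaired surrogate: substitute, as the JDK encoder does
            }
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+1D466 (mathematical italic y) as a surrogate pair
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8("\uD835\uDC66")) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim()); // prints: F0 9D 91 A6
    }
}
```

In application code, `String.getBytes(StandardCharsets.UTF_8)` on the whole string already does this correctly. The manual loop is shown only to make the code-point handling visible.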

On Fri, Sep 8, 2017 at 5:50 PM, John Brainard <[hidden email]>
wrote:

> [...]

Re: UTF-8 Encoding

John Brainard
Thank you Dominik.

Using 'com.github.pjfanning:xmlbeans:2.6.2' fixes the issue.


On 9/8/17, 11:17 AM, "Dominik Stadler" <[hidden email]> wrote:

    [...]

