[Bug 64108] New: unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 64108] New: unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

            Bug ID: 64108
           Summary: unsafe pipe character ("|") in Relationship target
                    attribute is not being encoded into a '%7C'
           Product: POI
           Version: 4.1.1-FINAL
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: OPC
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 36989
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36989&action=edit
This is a possible patch against REL_4_1_1 to encode unsafe pipe characters and
a change to unit tests to show that it works.

The pipe character ("|") in a Relationship target attribute is not being
encoded into a "%7C" string. The pipe character is not one of the characters
considered "valid" by this RFC:

https://tools.ietf.org/html/rfc3986#section-2

In the 4.1.1 version, a comparison is made to see if "|" (0x7C) is whitespace,
or is greater than or equal to 0x80. Since that evaluates "false" for a "|"
character, it is deemed "safe" (in PackagingURIHelper) instead of "unsafe".
Currently determining it as "safe" prevents it from being converted to "%7C",
which would be the correct thing to do.

The original way we found this was because we use the Tika AutoDetectParser to
extract text from various document types (MS-Word docx, or xslx). Tika uses POI
OPC to parse those documents in Open Office format, and although it logs the
original POI exception, it doesn't fail right away. Instead, it internally
replaces the original Relationship element target with the URL
"http://invalid.uri" and fails with an exception later in the processing
because that fake url is "absolute". As a result, there's no way to report to
the user the actual Relationship in the document that contained the target with
the "|".

For what it's worth, this is the original Relationship element that caused
parsing to fail:

        <Relationship Id="rId13"
Target="#ctl||rId16||cmdAddAction||_x0000_i1029"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"/>

Scanning the "Target" attribute failed at index 4 in that attribute's value,
due to the "|" being in that place in the fragment after the "#".

The only way I found the original exception from POI OPC code was to run our
code with a debugger attached to it.

Granted, the failure to include the root cause of the exception is an issue
with Tika, and probably should be addressed by that group, but the POI OPC code
should be considering the pipe character "unsafe" since it is not one of the
characters that is specified as valid in RPC3986.

Running the unit test changes in the attached patch (it is against the
REL_4_1_1 tag), and removing the change to the code, I got this exception:

java.net.URISyntaxException: Illegal character in fragment at index 4:
#ctl||rId16||cmdAddAction||_x0000_i1029
        at java.base/java.net.URI$Parser.fail(URI.java:2915)
        at java.base/java.net.URI$Parser.checkChars(URI.java:3086)
        at java.base/java.net.URI$Parser.parse(URI.java:3130)
        at java.base/java.net.URI.<init>(URI.java:600)
        at
org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:724)
        at
org.apache.poi.openxml4j.opc.TestPackagingURIHelper.testCreateURIFromString(TestPackagingURIHelper.java:128)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at junit.framework.TestCase.runTest(TestCase.java:176)
        at junit.framework.TestCase.runBare(TestCase.java:141)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:252)
        at junit.framework.TestSuite.run(TestSuite.java:247)
        at
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:86)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:106)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
        at
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
        at
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:66)
        at
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
        at
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
        at com.sun.proxy.$Proxy5.processTestClass(Unknown Source)
        at
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:117)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
        at
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
        at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:155)
        at
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:137)
        at
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
        at
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
        at
org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
        at
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at
org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
        at java.base/java.lang.Thread.run(Thread.java:834)

With the code changes to the class, the altered test passes.

The attached patch changes the method in PackagingURIHelper from this:

    private static boolean isUnsafe(int ch) {
        return ch >= 0x80 || Character.isWhitespace(ch);
    }

To this:

    private static boolean isUnsafe(int ch) {
        return ch >= 0x80 || ch == 0x7C || Character.isWhitespace(ch);
    }

Then, the pipe characters in the URI will be deemed "unsafe" and subsequently
converted properly to their hexadecimal '%7C' encodings.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64108] unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

PJ Fanning <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from PJ Fanning <[hidden email]> ---
applied with https://svn.apache.org/repos/asf/poi/trunk@1873384 - thanks for
the patch

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64108] unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

--- Comment #2 from Richard Costine <[hidden email]> ---
I'm not sure if adding this could potentially break anybody who is already
expecting the "|" there to actually cause a failure.

I suppose that we could have the code look at a System property - something
like:

-Dorg.apache.poi.openxml4.opc.safePipeInURI=true|false

when true it would work like before, and when false it would encode into a
"%7C".

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64108] unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

--- Comment #3 from Richard Costine <[hidden email]> ---
Something like this code would do it:


    private static boolean isUnsafe(int ch) {
        boolean safePipeInUri = true; // assume we will fail like before.
        try {
            // set this System property to false to make it not fail
            safePipeInUri =
Boolean.parseBoolean(System.getProperty("org.apache.poi.openxml4.opc.safePipeInURI",
"false"));
        }
        catch (Throwable t) { } // defaults to true if the property is not
readable

        // safe pipe in URI, means that a "|" will fail like before
        return safePipeInUri ? (ch >= 0x80 || Character.isWhitespace(ch))
                : (ch >= 0x80 || ch == 0x7C || Character.isWhitespace(ch));
    }

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64108] unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

--- Comment #4 from PJ Fanning <[hidden email]> ---
We don't typically support system properties.

You've caught us in the middle of a release. If your patch is not the best
behaviour for this code and only suits your use case, then we need to remove
it. The RFC seems to suggest that we should the existing code had a bug but if
this not the case, then I will revert the code.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 64108] unsafe pipe character ("|") in Relationship target attribute is not being encoded into a '%7C'

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=64108

--- Comment #5 from Richard Costine <[hidden email]> ---
I think that the original patch I provided should be the correct behavior,
since the RFC indicates that the pipe character is not considered "valid" in a
url, and would normally be encoded with a "%7C".

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]