[Bug 62904] New: Simple Excel Array Formula Fails in POI

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[Bug 62904] New: Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

            Bug ID: 62904
           Summary: Simple Excel Array Formula Fails in POI
           Product: POI
           Version: 4.0.0-FINAL
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: [hidden email]
          Reporter: [hidden email]
  Target Milestone: ---

Created attachment 36259
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36259&action=edit
Test Excel File

In our project we use POI to "calculate" an Excel workbook that is managed by a
business analyst.

In the last revision we had very strange results and I tracked it down to POI
not really supporting array formulas.
I made a test excel and project where it fails for this simple formula

    { =MIN(IF(A1:A6>=C1;A1:A6)) }

The strange thing is that it works sometimes, but fails for other values.
If this is not supported I would be OK with a hard failure as soon as the
formula is evaluated.
But here there are no exceptions at all, but the formula just gives wrong
results for some values.

See the example project including an automated test at
https://github.com/eekboom/poi-minif

----

I also attached the test excel file to this issue.

Here's a groovy spock test that shows the problem:

    import org.apache.poi.ss.usermodel.Cell
    import org.apache.poi.ss.usermodel.FormulaEvaluator
    import org.apache.poi.ss.usermodel.Row
    import org.apache.poi.ss.usermodel.Sheet
    import org.apache.poi.ss.usermodel.Workbook
    import org.apache.poi.ss.usermodel.WorkbookFactory
    import spock.lang.Specification
    import spock.lang.Unroll


    class PoiSpec extends Specification {

        @Unroll
        def 'test array formula: #value -> #expectedValue'() {
            given:
            Workbook workbook =
WorkbookFactory.create(Thread.currentThread().getContextClassLoader().getResourceAsStream("workbook.xlsx"))
            Sheet sheet = workbook.getSheet("Sheet1")
            FormulaEvaluator formulaEvaluator =
workbook.getCreationHelper().createFormulaEvaluator();
            Row inputRow = sheet.getRow(0)
            Cell inputCell = inputRow.getCell(2)

            Row outputRow = sheet.getRow(2)
            Cell outputCell = outputRow.getCell(2)

            when:
            inputCell.setCellValue(value)
            formulaEvaluator.notifyUpdateCell(inputCell)

            then:
            formulaEvaluator.evaluate(outputCell).getNumberValue() ==
expectedValue

            where:
            value || expectedValue
            3     || 5
            5     || 5
            6     || 10
            27    || 30

        }
    }

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

--- Comment #1 from [hidden email] <[hidden email]> ---
Hello Steven,
thanks for the report.

I encountered the same problem yesterday.

Reasonably enough yet unfortunately, the IF construct is being treated as "lazy
if", and the first argument is evaluated as a boolean scalar, never an array,
and determines the following evaluation flow. If you are interested, I can post
a dirty and almost untested 'works for me' patch.

Right now I am overloaded with work but hopefully I'll address the issue
properly during the holidays.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

--- Comment #2 from Yegor Kozlov <[hidden email]> ---
Created attachment 36356
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=36356&action=edit
patch to fix evaluation of IF with array arguments

With the attached patch POI evaluates it right.

Summary of changes:
1. IF needs to implement ArrayFunction
2. moved RelationalOperationEval#evaluateArray into ArrayFunction. The logic to
transpose and evaluate matrix arguments is common for all array functions and
IFFunc is an example.
3. changed WorkbookEvaluator to evaluate and put the first argument on the
stack in array mode. As Gallon noted, the "lazy if" logic  evaluates the first
arg as a boolean scalar which further results in a wrong result.

Unit tests are coming soon

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

Yegor Kozlov <[hidden email]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Yegor Kozlov <[hidden email]> ---
I checked in the fix in r1850646. The fix *improves* support for evaluation of
array formulas, but full coverage is a long way ahead.  

What is included in the fix:

1. support for array arguments in the IF function.
In a array context IF is rarely used by its own, it is usually nested in
another function like {=MIN(IF(logical_test, values))} as in your example.
Tests with MIN and MAX all pass. For SUM I have at least one failing test,  but
it can be related to another issue.

2. Support for array arguments in logical IS*** functions and unary operators.
This makes it possible to evaluate formulas like {=SUM( --ISERROR(E6:E11)) }.
Previously IS*** and unary operators evaluated to a scalar and the result was
wrong.

Please open a new ticket or re-open this one if you find more issues with
evaluation of array formulas.

Regards,
Yegor

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

--- Comment #4 from [hidden email] <[hidden email]> ---
@Yegor could you please look into this issue
https://bz.apache.org/bugzilla/show_bug.cgi?id=63054 ?

The topic seems very close, and the divine vision of the code may still be
fresh in your cache.

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

--- Comment #5 from Yegor Kozlov <[hidden email]> ---
Thanks for the pointer. This commit fixed bug #63054 as well.
The exact change that fixed it was in
TwoOperandNumericOperation#evaluateArray()

We had duplicate pieces of code that evaluated array arguments:
1. ArrayEval.evaluate(srcRowIndex, srcColumnIndex, rag1, arg2)
2. RelationalOperationEval.evaluateArray(args, srcRowIndex, srcColumnIndex)

both iterate over the input arrays and evaluate (i,j)-th element from the first
range with (i,j)-th element from the second range.  
The logic in RelationalOperationEval was more mature and I refactored it into a
base class. ArrayEval is obsolete and can be deleted.

The test cases are in TwoOperandNumericFunctionTestCaseData.xls

Regards,
Yegor

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

[Bug 62904] Simple Excel Array Formula Fails in POI

Bugzilla from bugzilla@apache.org
In reply to this post by Bugzilla from bugzilla@apache.org
https://bz.apache.org/bugzilla/show_bug.cgi?id=62904

--- Comment #6 from Yegor Kozlov <[hidden email]> ---
I checked in a follow-up to my previous commit.
Now all tests pass including the commented ones, that failed in my previous
commit.

Excel has an optimization when evaluating the IF formula: depending on whether
the logical condition is true/false, the PTGs for the 2nd or 3rd argument can
be skipped altogether without evaluation. I had a hard time figuring out how it
works in array mode until I realized that the "optimized if" logic is not
applicable in array context: in array mode all arguments should be evaluated
and put onto the stack and the 'skip' instructions should be ignored. Once I
realized this and changed the code all failing tests started to pass.

Regards,
Yegor

--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]