How to create oleObject1.bin for Word (.docx)


nilez
Hi all,

I am new to POI.

I would like to insert objects into a Microsoft Word (.docx) document using Java.

In the editor, I can insert an object (from several file types) via the
menu: *Insert > Object...*
<http://apache-poi.1045710.n5.nabble.com/file/t340505/0.jpg>

The following is the structure of *\word\embeddings\* after I inserted
/file.docx, file.odt, and file.pdf/:
*\word\embeddings\*
  - Microsoft_Word_Document1.docx
  - oleObject1.bin
  - oleObject2.bin
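
For anyone who wants to reproduce the listing above without unzipping by hand: a .docx is an ordinary ZIP archive, so the parts under word/embeddings/ can be listed with the JDK alone. This is only a rough sketch. The class name EmbeddingLister and the method listEmbeddings are made up for illustration and are not part of POI.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not a POI class): lists the embedded parts of a .docx.
public class EmbeddingLister {

    // Returns the names of all non-directory ZIP entries under word/embeddings/.
    static List<String> listEmbeddings(Path docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // ZIP entry names always use forward slashes, also on Windows.
                if (!entry.isDirectory() && entry.getName().startsWith("word/embeddings/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EmbeddingLister <file.docx>");
            return;
        }
        for (String name : listEmbeddings(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

This only lists the parts inside the ZIP. The streams inside each oleObjectN.bin (CompObj, Ole, ObjInfo, ...) live one level deeper, in the OLE2 container, and need an OLE2 reader such as POIFS to enumerate.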

<http://apache-poi.1045710.n5.nabble.com/file/t340505/3.jpg>

Structure of oleObject1.bin
*\word\embeddings\oleObject1.bin\*
  - [1]CompObj
  - [1]Ole
  - package_stream
  - [3]ObjInfo
  - properties_stream

<http://apache-poi.1045710.n5.nabble.com/file/t340505/4.jpg>

Structure of oleObject2.bin
*\word\embeddings\oleObject2.bin\*
  - [3]EPRINT
  - [1]Ole
  - [3]ObjInfo
  - [1]CompObj
  - CONTENTS

<http://apache-poi.1045710.n5.nabble.com/file/t340505/5.jpg>


I found an example that creates oleObject1.bin in TestEmbed.addHtml() at this
link:
<https://stackoverflow.com/questions/40382369/embed-files-into-xssf-sheets-in-excel-using-apache-poi>

Based on it, I wrote the code below to create a sample oleObject1.bin:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.HashSet;
import java.util.Set;

import org.apache.poi.hpsf.ClassID;
import org.apache.poi.poifs.filesystem.Ole10Native;
import org.apache.poi.poifs.filesystem.Ole10NativeException;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.util.IOUtils;

public class TestCreateOLE {
	private static final Set<String> MICROSOFT_MEDIATYPE = new HashSet<>();

	static {
		MICROSOFT_MEDIATYPE.add("application/msword");
		MICROSOFT_MEDIATYPE.add("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
		// ... (further media types omitted)
	}

	public void createObject(File inputFile, File outputFolder) {
		String mediaType = getFileType(inputFile);
		if (MICROSOFT_MEDIATYPE.contains(mediaType)) {
			// TODO copy file
		} else {
			// create oleObject
			createOLE(inputFile, outputFolder);
		}
	}

	private void createOLE(File inputFile, File outputFolder) {
		try (FileInputStream input = new FileInputStream(inputFile);
		      FileOutputStream output = new FileOutputStream(new File(outputFolder, "oleObject1.bin"));
		      POIFSFileSystem poifs = new POIFSFileSystem()) {

			String inputFileName = inputFile.getName();
			Ole10Native ole10 = new Ole10Native(inputFileName, inputFileName, inputFileName,
					IOUtils.toByteArray(input));

			ByteArrayOutputStream bos = new ByteArrayOutputStream(500);
			ole10.writeOut(bos);

			poifs.getRoot().createDocument(Ole10Native.OLE10_NATIVE, new ByteArrayInputStream(bos.toByteArray()));
			poifs.getRoot().setStorageClsid(ClassID.OLE10_PACKAGE);
			poifs.writeFilesystem(output);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	public String getFileType(File file) {
		try {
			return Files.probeContentType(file.toPath());
		} catch (IOException e) {
			e.printStackTrace();
		}
		return null;
	}
}

When I try to create an OLE object from an .odt file:

public static void main(String[] args) throws IOException, Ole10NativeException {
	TestCreateOLE ole = new TestCreateOLE();
	File inputFile = new File("C:\\Users\\test\\source\\file.odt");
	File outputFolder = new File("C:\\Users\\test\\target");
	ole.createObject(inputFile, outputFolder);
}

The output is oleObject1.bin, but its structure does not match the
oleObject1.bin created by the editor:
*\word\embeddings\oleObject1.bin\*
  - [1]Ole10Native

However, the structure of /oleObject1.bin\[1]Ole10Native/ (from my code)
matches /oleObject1.bin\package_stream/ (from the editor's file).

I think I am misunderstanding OLE objects and the POI APIs.

My questions are:
1. How are files of different types stored in Word? It seems the embedded
object for each file type has a different format and structure.
- file.docx ---> embedded.docx
- file.odt, file.pdf ---> oleObject1.bin (stored as binary, but with a
different structure)


2. How is the structure of oleObject1.bin decided for each file type?
For example, file.odt and file.pdf are stored in the same format but
contain different sub-files and sub-directories. Where can I find the
specification for creating an OLE object from each file type?
- oleObject1.bin from file.odt
  - [1]CompObj
  - [1]Ole
  - package_stream
  - [3]ObjInfo
  - properties_stream

- oleObject2.bin from file.pdf
  - [3]EPRINT
  - [1]Ole
  - [3]ObjInfo
  - [1]CompObj
  - CONTENTS

3. Can I create the complete oleObject1.bin (including its sub-files and
sub-directories) using the POI library?
If so, how?
How do I create [1]CompObj, [1]Ole, [3]ObjInfo, etc.?

Thanks in advance



-----
Best regards,
Nilez
--
Sent from: http://apache-poi.1045710.n5.nabble.com/POI-User-f2280730.html

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

Re: How to create oleObject1.bin for Word (.docx)

kiwiwings
Hi Nilez,

coincidentally, I posted about a similar topic (an XSSF OLE object inside XSLF)
yesterday evening [1].
It will take some time to try it myself, but based on my experience from [1]
I can tell you the following:

@1) Embedded objects aren't always stored in the same way, as you've already
noticed.
The SO entry is about a packager object [2], which is represented as
Ole10Native. It can contain any file type; think of it as an archive
inside an Office document.
What you seem to want is to embed OLE objects and show their contents.
AFAIK every OLE container (PDF application, ...) can define its own
structured OLE document, and therefore the oleObjectX.bin files differ.
An embedded object doesn't even have to be stored as an oleObject; it can also
be an OOXML zip. I guess it depends on how you inserted the objects
(drag & drop / insert object / ...) into the parent document ...
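
To tell the two storage forms from @1 apart, it should be enough to look at the first bytes of the embedded part: an OLE2 compound file starts with the signature D0 CF 11 E0 A1 B1 1A E1, while an OOXML package is a ZIP and starts with "PK" 03 04. The following is a minimal JDK-only sketch. The class EmbeddingSniffer and its methods are hypothetical names for illustration, not POI API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a POI class): classifies an embedded part by its magic bytes.
public class EmbeddingSniffer {

    // Signature of an OLE2 compound file (what oleObjectN.bin normally is).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Local file header signature of a ZIP archive (what an embedded .docx is).
    private static final byte[] ZIP_MAGIC = {'P', 'K', 3, 4};

    // Classifies the leading bytes as "OLE2", "ZIP" or "UNKNOWN".
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(head, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while (filled < head.length && (n = in.read(head, filled, head.length - filled)) > 0) {
                filled += n;
            }
        }
        byte[] actual = new byte[filled];
        System.arraycopy(head, 0, actual, 0, filled);
        return sniff(actual);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EmbeddingSniffer <embedded-part>");
            return;
        }
        System.out.println(sniffFile(Paths.get(args[0])));
    }
}
```

Run against parts extracted from word/embeddings/, this should report OLE2 for the oleObjectN.bin files and ZIP for the embedded .docx. It says nothing about which streams an OLE2 file contains; for that you still have to open it with POIFS.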

@2) ... trial and error

@3) It depends ... in [1] there's an example method,
EmbedXSSFinXSLF.fillOleData,
which creates an Excel oleObject. There I use a POIFSFileSystem with only a
"Package" entry, which holds the XSSFWorkbook.
To find out which POIFS entries you need, simply take a working embedded
document and strip its entries down. Usually you can ignore "CompObj", which
is only a clipboard representation.

In any case, you will always have the problem of providing the preview image
... apart from embedding PPT/X, POI doesn't offer a preview mechanism.

Best wishes,
Andi


[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=61797
[2] https://stackoverflow.com/questions/40382369


