CREATING WORD DOCUMENT IN OFFICE OPEN XML
FORMAT USING JAVA.
AUTHOR
SANJAY KUMAR MADHVA/ KULKARNI D.V./ SRINIDHI H. S/PUJARI Y.
SONATA SOFTWARE LIMITED
RV ROAD BANGALORE INDIA.
INTRODUCTION TO OFFICE OPEN XML
  SCOPE OF THE ARTICLE
  WORDPROCESSINGML
  PACKAGE
  PARTS
  ITEM
  CONTENT TYPE
  CONTENT-TYPE ITEM
  RELATIONSHIP
  PACKAGE RELATIONSHIP
  PACKAGE-RELATIONSHIP ITEM
  RELATIONSHIP MARKUP
JAVA WORDPROCESSINGML IMPLEMENTATION
  JAVA WORDPROCESSINGML FOLDER CREATION
  JAVA WORDPROCESSINGML FILE CREATION
    Create [Content_Types].xml
    Create or copy image1.jpg
    Create .rels
    Create document.xml.rels
    Create document.xml
  JAVA PACKAGING CLASS IMPLEMENTATION
    Importing Classes
    Create an OpenXMLZipFile Class
    Create a CreateZipFile Method
    Create an UnZipFile Method
    Creating WordprocessingML package
Introduction to Office Open XML
The introduction of the Open XML file formats standard from Ecma gives developers the
option of creating and editing an Open XML document using any development tool on any
platform, as long as the result conforms to the standardized file format. Open document
formats such as WordprocessingML improve interoperability by enabling standards-based
XML 1.0 tools to create, read and write files that conform to the standardized file
format. The Office Open XML formats can be used by a wide set of tools and platforms in
order to foster interoperability across office productivity applications and with
line-of-business systems.
This article is based on the Office Open XML standard being developed by the Ecma TC45
technical committee, a family of XML schemas collectively called Open XML. This
standard defines the XML vocabularies consumed and produced by applications such as
the “Office 2007” versions of Microsoft Word, Microsoft Excel, and Microsoft
PowerPoint. The standard also describes the packaging of documents that conform to
these schemas.
Scope of the article
This article describes the packaging mechanism and the minimum set of files required to
create an Office Open XML Word document (referred to as WordprocessingML) using JAVA.
This document, although created with no Microsoft APIs or software, can be consumed or
viewed by Word 2007. (It may also be consumed by Word 2000, Word XP, or Word
2003, using the free add-in for Open XML support that will be released by Microsoft
when Office 2007 is released.)
Assumption:
All the required files, such as the XML files and images, are created manually under the
directory used for packaging.
Understanding Office Open XML
Before creating a WordprocessingML document, let us understand how such a document is
structured according to the Open XML packaging specification. The following sections
cover the most commonly used terms.
WordprocessingML
A WordprocessingML document (Office Open XML document) is represented as a series
of related parts that are stored in a container called a package. Information about the
relationships between a package and its parts is stored in the package’s
package-relationship item. Information about the relationships between two parts is
stored in the part-relationship item of the source part. A package is an ordinary Zip
archive whose items correspond directly to those related parts.
Package
Package – A Zip archive that contains all the relationship items and parts of an Office
Open XML document, such that those parts are reachable via the set of relationships
defined in the relationship items.
A package acts as a container for a collection of components, which are composed,
processed, and persisted according to a set of rules. There are two kinds of
components: parts and relationship items. A package is implemented as a ZIP archive,
with each component in the package corresponding to an item in the archive. A Zip
archive is a ZIP file as defined in the ZIP file format specification, but excluding all
elements of that specification related to encryption or decryption. A package
provides a convenient way to distribute a document with all of its component pieces,
such as images, fonts and data.
The purpose of a package is to combine all of the pieces of a document into a single file.
A package holding a WordprocessingML document with a picture might contain a number
of parts: an XML markup part representing the document, a part containing page header
information, a part containing footnotes, and a part representing the picture in JPEG form.
Note: The XML markup must be valid according to the Office Open XML schemas.
Note: All XML content of the components defined in this Standard must be encoded
using either UTF-8 or UTF-16.
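Since a package is an ordinary Zip archive, its items can be enumerated with the standard java.util.zip classes. The following is only a minimal sketch: the class PackageItems and its methods are hypothetical names introduced for illustration, and the package is built in memory with placeholder content rather than real parts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration; not part of the article's OpenXMLZipFile class.
final class PackageItems {

    // Builds a tiny Zip archive in memory containing one item per given name.
    static byte[] buildPackage(String... itemNames) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                for (String name : itemNames) {
                    zip.putNextEntry(new ZipEntry(name));
                    // Placeholder content only; a real package stores the part data here.
                    zip.write("<placeholder/>".getBytes(StandardCharsets.UTF_8));
                    zip.closeEntry();
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the item names of the archive, in the order they are stored.
    static List<String> listItems(byte[] packageBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(packageBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] pkg = buildPackage("[Content_Types].xml", "_rels/.rels", "document.xml", "image1.jpg");
        listItems(pkg).forEach(System.out::println);
    }
}
```

Each printed name corresponds to one component of the package, which is the one-to-one mapping between components and Zip items described above.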
Parts
Part – A package component that has associated common properties. A part
corresponds to an item in a package.
A WordprocessingML document contains a part for the body of the text; it might also
contain a part for an image referenced by that text, and parts that define document
characteristics, styles and fonts.
Parts can have relationships to each other, as well as to the package itself. These
relationships are defined using XML in one or more relationship items. Each part has a
content type and is unambiguously addressed using well-defined naming guidelines.
Content-type information is recorded in the content-type item.
Each part has a part name. Part names refer to parts within a package, typically as part
of a URI reference. Like file names in a file system and URIs, part names are
hierarchical. A part name consists of segments, each representing a level in the hierarchy.
For example, the part name “/hello/world/document.xml” contains three segments:
“hello”, “world”, and “document.xml”. Segments form a tree structure. This is similar to
a file system, where all of the non-leaf nodes in the tree are folders and the leaf
nodes are files, which contain the actual content. The non-leaf segments of a part name
serve a similar function to folders: they organize the parts of the package.
Example:
    <Override PartName="/hello/world/document.xml"
        ContentType="application/vnd.ms-word.main+xml" />
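The way a part name breaks down into segments can be sketched in a few lines of JAVA. The class PartNames and its method are hypothetical, and the validation is deliberately simplified: it only requires a leading “/” and rejects empty segments, which is less than the full naming rules of the packaging specification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; simplified part-name handling.
final class PartNames {

    // Splits a part name such as "/hello/world/document.xml" into its segments.
    static List<String> segments(String partName) {
        if (partName == null || !partName.startsWith("/")) {
            throw new IllegalArgumentException("A part name must start with '/': " + partName);
        }
        List<String> result = new ArrayList<>();
        // The limit -1 keeps trailing empty strings, so "/a/" is rejected below.
        for (String segment : partName.substring(1).split("/", -1)) {
            if (segment.isEmpty()) {
                throw new IllegalArgumentException("Empty segment in part name: " + partName);
            }
            result.add(segment);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the three segments of the example part name, one per line.
        segments("/hello/world/document.xml").forEach(System.out::println);
    }
}
```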
Item
Item – In the context of a package, “item” is a synonym for “Zip item”.
Content type
A content type is the description of the type of content stored in a part. A content type
defines a media type, a subtype, and an optional set of parameters. Content types are
recorded in a mandatory item that must be named [Content_Types].xml.
Content-type item
A content-type item is an XML representation of the mappings from part names to content
types, stored as an item in a package. A content-type item is not itself a part. This
item is mandatory and looks as follows:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="http://schemas.microsoft.com/package/2005/06/content-types">
<Default Extension="xml" ContentType="application/xml" />
<Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
<Default Extension="jpg" ContentType="image/jpeg" />
<Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>
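One way to read the content-type item is as a lookup table: an Override entry for the exact part name takes precedence, otherwise the Default entry for the file extension applies. The sketch below illustrates that lookup under simplifying assumptions. The class ContentTypeTable is a hypothetical name, the entries are filled in by hand instead of being parsed from [Content_Types].xml, and part names are compared exactly.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup table for illustration; entries mirror the XML shown above.
final class ContentTypeTable {
    private final Map<String, String> defaults = new HashMap<>();
    private final Map<String, String> overrides = new HashMap<>();

    void addDefault(String extension, String contentType) {
        defaults.put(extension.toLowerCase(Locale.ROOT), contentType);
    }

    void addOverride(String partName, String contentType) {
        overrides.put(partName, contentType);
    }

    // Returns the Override for the part name if present, otherwise the Default
    // for its extension, otherwise null.
    String resolve(String partName) {
        String override = overrides.get(partName);
        if (override != null) {
            return override;
        }
        int dot = partName.lastIndexOf('.');
        if (dot < 0 || dot < partName.lastIndexOf('/')) {
            return null; // the last segment has no extension
        }
        return defaults.get(partName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    // Builds the table that corresponds to the content-type item in this article.
    static ContentTypeTable articleExample() {
        ContentTypeTable table = new ContentTypeTable();
        table.addDefault("xml", "application/xml");
        table.addDefault("rels", "application/vnd.ms-package.relationships+xml");
        table.addDefault("jpg", "image/jpeg");
        table.addOverride("/document.xml", "application/vnd.ms-word.main+xml");
        return table;
    }

    public static void main(String[] args) {
        ContentTypeTable table = articleExample();
        System.out.println(table.resolve("/document.xml"));
        System.out.println(table.resolve("/image1.jpg"));
        System.out.println(table.resolve("/_rels/.rels"));
    }
}
```

Note that “/document.xml” would resolve to application/xml through its extension alone; the Override entry is what marks it as the main document part.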
Relationship
Parts often contain references to other parts in a package and to resources outside of the
package. However, in general, these references are represented inside the referring part
in ways that are specific to the content type of the part; that is, in
arbitrary markup or an application-specific encoding. This effectively hides the internal
and external linkages between parts from consumers that do not understand the content
type of the parts containing such references.
The package uses relationships as a higher-level mechanism to describe references from
parts to other internal or external resources. A relationship represents the kind of
connection between a source and a target resource. If the source is a part, the
relationship is referred to as a part relationship. If the source is the package itself, the
relationship is referred to as a package relationship. Relationships make the connections
directly discoverable without looking at the content of the parts, so they are independent
of content-specific schemas and quick to resolve.
Package relationship
A relationship whose target is a part and whose source is the package as a whole.
Package-relationship item
An XML representation of one or more package relationships, stored as an item in a
package. A package-relationship item is not itself a part.
Relationship markup
Relationships are represented using one or more Relationship elements nested in a single
Relationships element. These elements are defined in the relationships namespace.
Every Relationship element must have an Id attribute, the value of which must be unique
within the relationship item. The Id type is xsd:ID and must conform to the naming
restrictions for that type.
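The uniqueness rule for the Id attribute can be checked with the XML parser that ships with JAVA. This is only a sketch under stated assumptions: the class RelationshipIds is a hypothetical name, the markup is passed in as a string, and the check covers only missing or duplicate Id values, not the full xsd:ID naming restrictions.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical checker for illustration; not a full relationship-item validator.
final class RelationshipIds {

    // Returns true if every Relationship element has a non-empty Id that occurs only once.
    static boolean hasUniqueIds(String relationshipsXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(relationshipsXml)));
            // "*" matches any namespace; the local name must be exactly "Relationship".
            NodeList relationships = doc.getElementsByTagNameNS("*", "Relationship");
            Set<String> seen = new HashSet<>();
            for (int i = 0; i < relationships.getLength(); i++) {
                String id = ((Element) relationships.item(i)).getAttribute("Id");
                if (id.isEmpty() || !seen.add(id)) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse relationship markup", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Relationships xmlns=\"http://schemas.microsoft.com/package/2005/06/relationships\">"
                + "<Relationship Id=\"rId1\" Type=\"t\" Target=\"document.xml\" />"
                + "</Relationships>";
        System.out.println(hasUniqueIds(xml));
    }
}
```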
This concludes the commonly used terms in creating an Office Open XML
document. What follows next is how to create an Office Open XML Word
document, referred to as WordprocessingML, using JAVA.
JAVA WordprocessingML implementation
In this article we create a WordprocessingML document that contains the body text
“This document was created using JAVA….” and an embedded image. This is achieved by
creating a hierarchy of folders and files, which are then packaged together as described
in the steps below.
JAVA WordprocessingML Folder creation
- Create a directory, for example “c:\WordprocessingML\”, which will contain all the
  files required for packaging, such as [Content_Types].xml, image1.jpg and
  document.xml.
- Under “c:\WordprocessingML” create a folder named “_rels”.
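The article assumes that this folder layout is created manually. If it should be created from code instead, a sketch along the following lines would do; the class PackageFolders and its method are hypothetical names, and the root directory is passed in so that any location can be used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; creates the folder layout used for packaging.
final class PackageFolders {

    // Creates the root directory and its "_rels" subfolder (if missing)
    // and returns the path of the "_rels" folder.
    static Path createLayout(Path root) {
        try {
            // createDirectories also creates missing parent directories.
            return Files.createDirectories(root.resolve("_rels"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The default directory name is an assumption; pass a path as the first argument to override it.
        Path root = Paths.get(args.length > 0 ? args[0] : "WordprocessingML");
        System.out.println("Created " + createLayout(root));
    }
}
```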
JAVA WordprocessingML File creation
Create [Content_Types].xml
In the directory “c:\WordprocessingML\” create an XML file named “[Content_Types].xml”,
which will contain the WordprocessingML content types.
The file content is displayed below.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Types xmlns="http://schemas.microsoft.com/package/2005/06/content-types">
<Default Extension="xml" ContentType="application/xml" />
<Default Extension="rels" ContentType="application/vnd.ms-package.relationships+xml" />
<Default Extension="jpg" ContentType="image/jpeg" />
<Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
</Types>
In the Override element, the PartName attribute names the part that holds the XML
representation of the Word document, and the ContentType attribute indicates that this
part is the main document in XML format.
<Override PartName="/document.xml" ContentType="application/vnd.ms-word.main+xml" />
Images used in the document are declared as shown below, where the Extension attribute
gives the file extension of the image and the ContentType attribute contains “image/<type>”.
<Default Extension="jpg" ContentType="image/jpeg" />
Create or copy image1.jpg
Copy or create an image file named “image1.jpg” in JPEG format under
“c:\WordprocessingML”. This is the image that will be embedded in the document.
Create .rels
Create an XML file named “.rels” under “c:\WordprocessingML\_rels\” with the content below.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
<Relationship Id="rId1"
Type="http://schemas.microsoft.com/office/2006/relationships/officeDocument"
Target="document.xml" />
</Relationships>
Create document.xml.rels
Create an XML file named “document.xml.rels” under “c:\WordprocessingML\_rels\” with the
content below.
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
<Relationship Id="rId1"
Type="http://schemas.microsoft.com/office/2006/relationships/image"
Target="image1.jpg" />
</Relationships>
Create document.xml
Create an XML file named “document.xml” under “c:\WordprocessingML\” with the content
below. This XML contains the text of the document as well as its structure, such as
paragraphs and runs. In the example below, the <w:p> element represents a paragraph and
the <w:r> element a run of text within it. The second paragraph contains the <w:pict>
element for the image to be embedded in the document.
<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2005/10/wordml"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:r="http://schemas.microsoft.com/office/2005/11/relationships">
  <w:body>
    <w:p>
      <w:r>
        <w:t>This document was created using JAVA….</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape>
            <v:imagedata r:id="rId1" />
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:wordDocument>
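If the body text is not fixed, document.xml can be generated from a string instead of being typed by hand. The sketch below assumes the same element structure and namespaces as the file above; the class DocumentXml and its methods are hypothetical names, and the escaping covers only the characters that would break the markup.

```java
// Hypothetical generator for illustration; mirrors the structure of the document.xml shown above.
final class DocumentXml {

    // Escapes the characters that are not allowed verbatim in XML text or attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    // Builds document.xml with one text paragraph and one paragraph holding the image
    // that is referenced through the given relationship Id.
    static String build(String bodyText, String imageRelId) {
        return "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\" ?>\n"
            + "<w:wordDocument xmlns:w=\"http://schemas.microsoft.com/office/word/2005/10/wordml\"\n"
            + "    xmlns:v=\"urn:schemas-microsoft-com:vml\"\n"
            + "    xmlns:r=\"http://schemas.microsoft.com/office/2005/11/relationships\">\n"
            + "  <w:body>\n"
            + "    <w:p><w:r><w:t>" + escape(bodyText) + "</w:t></w:r></w:p>\n"
            + "    <w:p><w:r><w:pict><v:shape><v:imagedata r:id=\"" + escape(imageRelId)
            + "\" /></v:shape></w:pict></w:r></w:p>\n"
            + "  </w:body>\n"
            + "</w:wordDocument>\n";
    }

    public static void main(String[] args) {
        System.out.print(build("This document was created using JAVA.", "rId1"));
    }
}
```

The relationship Id passed in must match the Id used in document.xml.rels, which is “rId1” in this article.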
JAVA Packaging Class Implementation
Importing Classes
The following classes need to be imported to create the packaging class. All of them are
built-in classes of JAVA.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
Create an OpenXMLZipFile Class
The OpenXMLZipFile class contains all the methods required for packaging.
public class OpenXMLZipFile
{
    // The CreateZipFile method takes zipFileName and ToCompressFiles as arguments,
    // goes through the array ToCompressFiles and packs each file into zipFileName.
    // The sections below explain the method implementation.
    ….
}
Create a CreateZipFile Method
- Create a ZipOutputStream.
- Set the compression level to Deflater.BEST_COMPRESSION.
- Loop through the list of files to be zipped and for each file do the following:
  o Get the file name and create a ZipEntry from it.
  o Add the ZipEntry to the ZipOutputStream.
  o Write the contents of the file to the ZipOutputStream.
public static void CreateZipFile(String zipFileName, String[] ToCompressFiles)
{
    try
    {
        String[] fileNames = ToCompressFiles;
        FileInputStream inStream;
        // zipFileName is the archive to create, e.g. "C:\\ZipExample1.zip"
        FileOutputStream outStream = new FileOutputStream(zipFileName);
        ZipOutputStream zipOStream = new ZipOutputStream(outStream);
        zipOStream.setLevel(Deflater.BEST_COMPRESSION);
        for (int loop = 0; loop < fileNames.length; loop++)
        {
            inStream = new FileInputStream(fileNames[loop]);
            // The path string is used unchanged as the name of the Zip item,
            // so it should be given relative to the package root.
            zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
            int i = 0;
            // Copy the file into the archive until the end of the file is reached
            while ((i = inStream.read()) != -1)
            {
                zipOStream.write(i);
            }
            zipOStream.closeEntry();
            inStream.close();
        }
        zipOStream.flush();
        zipOStream.close();
    }
    catch (IllegalArgumentException iae)
    {
        iae.printStackTrace();
    }
    catch (FileNotFoundException fe)
    {
        System.out.println("File not found====" + fe);
    }
    catch (IOException ioe)
    {
        System.out.println("IOException====" + ioe);
        ioe.printStackTrace();
    }
}
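Because the name of each Zip item decides the part name inside the package, it can help to derive the item names from the packaging directory rather than from the path strings themselves. The following sketch is an alternative to the method above, not the article's implementation: the class PackageZipper and its methods are hypothetical, it uses the java.nio.file API, and it assumes the target archive lies outside the directory being packaged.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical alternative for illustration; packages a whole directory tree.
final class PackageZipper {

    // Item name of a file: its path relative to the package root, with forward slashes.
    static String entryName(Path root, Path file) {
        String separator = root.getFileSystem().getSeparator();
        return root.relativize(file).toString().replace(separator, "/");
    }

    // Writes every regular file below root into zipFile (assumed to lie outside root).
    static void zipDirectory(Path root, Path zipFile) {
        try (Stream<Path> walk = Files.walk(root);
             OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.setLevel(Deflater.BEST_COMPRESSION);
            List<Path> files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            for (Path file : files) {
                zip.putNextEntry(new ZipEntry(entryName(root, file)));
                Files.copy(file, zip); // copies the file content into the current entry
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both paths are assumptions for the example; pass your own as arguments.
        Path root = Paths.get(args.length > 0 ? args[0] : "WordprocessingML");
        Path target = Paths.get(args.length > 1 ? args[1] : "myFirstDocumentUsingJava.docx");
        zipDirectory(root, target);
    }
}
```

With this approach the file “_rels\.rels” below the packaging directory is stored as the item “_rels/.rels”, independent of the drive or folder the directory lives in.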
Create an UnZipFile Method
- Read the zip file.
- Loop through the entries in the zip file and for each entry do the following:
  o Create a File object. The file name is derived from the ZipEntry.
  o Create an OutputStream using the File object.
  o Read the contents of the ZipEntry into an InputStream.
  o Write the contents of the InputStream into the OutputStream.
public static void UnZipFile(String zipFileName, String ToExtractFile)
{
    String inputFileName = zipFileName;   // e.g. "C:\\PPT.zip"
    String desFileName = ToExtractFile;   // e.g. "C:\\TEST\\"
    try
    {
        File sourceZipFile = new File(inputFileName);
        File destDirectory = new File(desFileName);
        // Open the ZIP file for reading
        ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
        // Get the entries ("enum" is a reserved word in JAVA 5 and later,
        // so the variable is named "entries")
        Enumeration<? extends ZipEntry> entries = zipFile.entries();
        while (entries.hasMoreElements())
        {
            ZipEntry zipEntry = entries.nextElement();
            String currName = zipEntry.getName();
            File destFile = new File(destDirectory, currName);
            // grab file's parent directory structure
            File destinationParent = destFile.getParentFile();
            // create the parent directory structure if needed
            destinationParent.mkdirs();
            if (!zipEntry.isDirectory())
            {
                BufferedInputStream is =
                    new BufferedInputStream(zipFile.getInputStream(zipEntry));
                int currentByte;
                // write the current file to disk
                FileOutputStream fos = new FileOutputStream(destFile);
                BufferedOutputStream dest = new BufferedOutputStream(fos);
                // read and write until last byte is encountered
                while ((currentByte = is.read()) != -1)
                {
                    dest.write(currentByte);
                }
                dest.flush();
                dest.close();
                is.close();
            }
        }
        // release the archive once all entries have been extracted
        zipFile.close();
    }
    catch (IOException ioe)
    {
        System.out.println("IOException occurred=====" + ioe);
        ioe.printStackTrace();
    }
}
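The method above builds the destination file directly from the name stored in the ZipEntry. When the archive comes from an untrusted source, an entry name such as “../evil.txt” would end up outside the destination directory. A check along the following lines can be applied before a file is written; it is an optional hardening sketch, and the class SafeExtract and its method are hypothetical names, not part of the article's class.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical guard for illustration; decides whether an entry may be extracted.
final class SafeExtract {

    // Returns true if the entry name resolves to a location inside the destination directory.
    static boolean staysInside(Path destDir, String entryName) {
        Path base = destDir.toAbsolutePath().normalize();
        // normalize() removes "." and ".." segments so the real target becomes visible
        Path target = base.resolve(entryName).normalize();
        return target.startsWith(base);
    }

    public static void main(String[] args) {
        Path dest = Paths.get("extracted"); // example directory name, an assumption
        System.out.println(staysInside(dest, "_rels/.rels"));
        System.out.println(staysInside(dest, "../evil.txt"));
    }
}
```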
After implementation, the complete OpenXMLZipFile class looks as follows.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class OpenXMLZipFile
{
    // CreateZipFile takes zipFileName and ToCompressFiles as arguments,
    // goes through the array ToCompressFiles and packs each file into zipFileName.
    public static void CreateZipFile(String zipFileName, String[] ToCompressFiles)
    {
        try
        {
            String[] fileNames = ToCompressFiles;
            FileInputStream inStream;
            // zipFileName is the archive to create, e.g. "C:\\ZipExample1.zip"
            FileOutputStream outStream = new FileOutputStream(zipFileName);
            ZipOutputStream zipOStream = new ZipOutputStream(outStream);
            zipOStream.setLevel(Deflater.BEST_COMPRESSION);
            for (int loop = 0; loop < fileNames.length; loop++)
            {
                inStream = new FileInputStream(fileNames[loop]);
                // The path string is used unchanged as the name of the Zip item,
                // so it should be given relative to the package root.
                zipOStream.putNextEntry(new ZipEntry(fileNames[loop]));
                int i = 0;
                while ((i = inStream.read()) != -1)
                {
                    zipOStream.write(i);
                }
                zipOStream.closeEntry();
                inStream.close();
            }
            zipOStream.flush();
            zipOStream.close();
        }
        catch (IllegalArgumentException iae)
        {
            iae.printStackTrace();
        }
        catch (FileNotFoundException fe)
        {
            System.out.println("File not found====" + fe);
        }
        catch (IOException ioe)
        {
            System.out.println("IOException====" + ioe);
            ioe.printStackTrace();
        }
    }

    // UnZipFile extracts every entry of zipFileName below the directory ToExtractFile.
    public static void UnZipFile(String zipFileName, String ToExtractFile)
    {
        String inputFileName = zipFileName;   // e.g. "C:\\PPT.zip"
        String desFileName = ToExtractFile;   // e.g. "C:\\TEST\\"
        try
        {
            File sourceZipFile = new File(inputFileName);
            File destDirectory = new File(desFileName);
            // Open the ZIP file for reading
            ZipFile zipFile = new ZipFile(sourceZipFile, ZipFile.OPEN_READ);
            // Get the entries ("enum" is a reserved word in JAVA 5 and later)
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements())
            {
                ZipEntry zipEntry = entries.nextElement();
                String currName = zipEntry.getName();
                File destFile = new File(destDirectory, currName);
                // grab file's parent directory structure
                File destinationParent = destFile.getParentFile();
                // create the parent directory structure if needed
                destinationParent.mkdirs();
                if (!zipEntry.isDirectory())
                {
                    BufferedInputStream is =
                        new BufferedInputStream(zipFile.getInputStream(zipEntry));
                    int currentByte;
                    // write the current file to disk
                    FileOutputStream fos = new FileOutputStream(destFile);
                    BufferedOutputStream dest = new BufferedOutputStream(fos);
                    // read and write until last byte is encountered
                    while ((currentByte = is.read()) != -1)
                    {
                        dest.write(currentByte);
                    }
                    dest.flush();
                    dest.close();
                    is.close();
                }
            }
            // release the archive once all entries have been extracted
            zipFile.close();
        }
        catch (IOException ioe)
        {
            System.out.println("IOException occurred=====" + ioe);
            ioe.printStackTrace();
        }
    }
}
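Creating WordprocessingML package
To create a WordprocessingML package, perform the following steps.
1. Create an instance of the class OpenXMLZipFile.
    OpenXMLZipFile myWordprocessingML = new OpenXMLZipFile();
2. Create the variables that hold the package file name and the list of files to package.
CreateZipFile uses each string in ToCompressFiles unchanged as the name of the Zip item,
and a consumer expects [Content_Types].xml at the root of the archive. The paths are
therefore written relative to the package root with forward slashes, which assumes that
the program is run with “c:\WordprocessingML” as its working directory. The list also
includes the package-relationship item “_rels/.rels” created earlier.
    String zipFileName = "c:\\myFirstDocumentUsingJava.docx";
    String[] ToCompressFiles = new String[5];
    ToCompressFiles[0] = "[Content_Types].xml";
    ToCompressFiles[1] = "_rels/.rels";
    ToCompressFiles[2] = "_rels/document.xml.rels";
    ToCompressFiles[3] = "document.xml";
    ToCompressFiles[4] = "image1.jpg";
3. Call the method CreateZipFile. Because the method is static, it can be called on the
class itself.
    OpenXMLZipFile.CreateZipFile(zipFileName, ToCompressFiles);
The output of the above method is the file “myFirstDocumentUsingJava.docx”. This
document fully conforms to the Open XML standard, and can be accessed using Office
2007 (or the 2000/XP/2003 versions of Office with the free Open XML add-in installed).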