Saturday, January 2, 2010

Invoking the Word COM object from a PHP script

Use the Word.Application COM object in PHP to convert a .doc/.docx file with SaveAs() into Word, HTML, TXT, UTF-8, Unicode or RTF format.


  • OK. It's old technology, but it's important to understand how it is used with newer technologies such as PHP. So here are the basics and a walk-through of COM and DCOM.
  • The first question that comes to mind: why do we need a COM object to invoke the Word application? Can't PHP do it all by itself?
  • No. Microsoft doesn't provide a way to drive these applications without COM. Because they are built on COM technology, they must be invoked through COM (to the best of my knowledge; I haven't come across any alternative technique). COM is language independent.

COM (Component Object Model) is used to communicate with objects (e.g. Word, Excel or a faxing application). COM sets the rules for how objects are invoked and how messages are passed between them, which makes it easier for objects written in different languages to communicate with each other.

OBJECT: a combination of properties (attributes, data) and methods (functions, code). Examples of properties: Top, Left, Width; examples of methods and event handlers: onChange, onExit, onClick. C++ programmers can picture it as an object diagram.

Diagram of object


COMPONENT: a reusable piece of executable code that other applications can use with minimal effort; it can be an .EXE, .DLL or .OCX file.

INTERFACE – a set of functions grouped together under one name. COM lays an interface out in memory as a vtable, a table of pointers to its (virtual) functions, so any language that can call functions through pointers can be used to write components. This is what gives COM its programming-language independence.

E.g. in PHP: $word->Documents->Open('C:/test1.doc');

Diagram of interface, component and object.
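PHP has no vtables of its own, but the idea behind INTERFACE can be sketched with an indexed array of closures: the client agrees only on slot positions and calls through them, much as a COM client calls through the function pointers in a vtable. This is an analogy, not how PHP's COM extension works internally; the constants SLOT_GET_NAME and SLOT_SAVE_AS and the functions make_document_vtable() and call_slot() are hypothetical names.

```php
<?php
// Conceptual sketch only: an indexed array of closures stands in for a vtable.
// The slot numbers are the "interface contract"; all names here are hypothetical.
const SLOT_GET_NAME = 0;
const SLOT_SAVE_AS  = 1;

// A "component" hands out a table of callables bound to its own data
function make_document_vtable(string $name): array
{
    return [
        SLOT_GET_NAME => fn (): string => $name,
        SLOT_SAVE_AS  => fn (string $path): string => "saved $name as $path",
    ];
}

// The client knows only slot positions, never the implementation behind them
function call_slot(array $vtable, int $slot, ...$args)
{
    return $vtable[$slot](...$args);
}
```

Any other "component" that returns a table with the same slot layout can be swapped in without changing call_slot(), which is the language-independence point in miniature.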

  • To develop an application (a website counts too) that uses Word to solve a problem, you interact with the Microsoft Office Word objects. The two main classes are Application and Document.
  • The Application object represents the entire application; each Document object represents a single Word document. This distinction matters when declaring or invoking Word functions.
  • Once Word is running, we can open any file format that Word supports and SaveAs() it into another format.
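A minimal sketch of the Application/Document split, assuming $word is a Word.Application object created with new COM(): the Application's Documents collection opens the file and returns a Document, and SaveAs()/Close() then act on that one document. The helper name convert_with_word() is hypothetical; the format argument is Word's numeric WdSaveFormat value (2 is plain text).

```php
<?php
// Sketch: one conversion using Word's two main classes.
// $word: a Word.Application COM object (Application class).
// convert_with_word() is a hypothetical helper, not part of any API.
function convert_with_word(object $word, string $src, string $dst, int $format): void
{
    // Application level: the Documents collection holds every open document
    $doc = $word->Documents->Open($src);
    try {
        // Document level: save this one document in the requested WdSaveFormat
        $doc->SaveAs($dst, $format);
    } finally {
        // Close without saving changes to the source document
        $doc->Close(false);
    }
}

// Usage on Windows with Word installed:
// $word = new COM("Word.Application");
// convert_with_word($word, 'C:/original_doc_file.doc', 'C:/converted_text_file.txt', 2);
// $word->Quit();
```

The finally block closes the document even if SaveAs() throws a com_exception, so a failed conversion does not leave the file open in a background Word instance.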

COM Type Libraries…

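A type library is essentially machine-readable documentation of the collections, objects, methods (with their parameters) and properties (with their data types) that a component exposes to COM client applications. In PHP, loading a type library also registers its constants (such as Word's wdFormatText), so a script can use names instead of magic numbers.

E.g. com_load_typelib('Word.Application');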

Distributed COM…

COM can also be used as a distributed component: one that is not bound to a single process on a single machine and can be called by a process on a remote machine (any other computer connected to yours). This is DCOM (Distributed COM).

Two types of processes occupy memory in Windows: OS processes and user (other) processes. Every user process has its own separate process space in which it keeps its variables, while the OS provides a region shared by all processes, which is where shared code lives.

Server – a piece of code that implements a component object (reusable code, e.g. an EXE or DLL).

In-process – the server code executes in the same process space as the client (as a DLL).

Out-of-process or remote – the server code runs in another process on the same machine, or in a process on a remote machine (typically as an EXE).

Local server – the out-of-process case on a single machine: the server runs in its own process space, separate from the client's (Word works this way, as WINWORD.EXE).

DCOM uses a surrogate process: an in-process server (DLL) cannot run on a remote machine by itself, so DCOM loads it into a surrogate host process there (by default dllhost.exe).
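The server locations above correspond to the CLSCTX activation flags Windows uses when it starts a COM server: CLSCTX_INPROC_SERVER (0x1), CLSCTX_LOCAL_SERVER (0x4) and CLSCTX_REMOTE_SERVER (0x10). This sketch maps the cases onto those values; the DEMO_ constant names and server_context() are hypothetical, defined locally so the code runs without the COM extension and does not clash with the CLSCTX_* constants that extension registers.

```php
<?php
// Sketch: which CLSCTX flag describes where a COM server runs.
// DEMO_* names and server_context() are hypothetical; values mirror Windows' CLSCTX.
const DEMO_CLSCTX_INPROC_SERVER = 0x1;  // DLL loaded into the client's process
const DEMO_CLSCTX_LOCAL_SERVER  = 0x4;  // EXE in its own process, same machine
const DEMO_CLSCTX_REMOTE_SERVER = 0x10; // server process on another machine (DCOM)

function server_context(bool $isDll, bool $sameMachine): int
{
    if (!$sameMachine) {
        // A remote DLL cannot run on its own; DCOM hosts it in a surrogate process
        return DEMO_CLSCTX_REMOTE_SERVER;
    }
    return $isDll ? DEMO_CLSCTX_INPROC_SERVER : DEMO_CLSCTX_LOCAL_SERVER;
}
```

For Word.Application on the same machine, server_context(false, true) gives 0x4: WINWORD.EXE is a local, out-of-process server.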

GUID: Every COM component is registered in the Windows registry under a 16-byte (128-bit) number that is unique to that component worldwide. Whenever a COM component or library is called, it is looked up by its GUID in the registry.

E.g. a type library GUID followed by its version number: {00000200-0000-0010-8000-00AA006D2EA4},2,0.

COM looks up the supplied GUID in the registry. If it is registered, the server is loaded into memory, the server runs the requested component, and the client uses the component's services.
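The lookup just described can be sketched with a plain array standing in for the registry: parse the "{GUID},major,minor" reference, check that the GUID really is 16 bytes of hex, and return the registered server or nothing. The array $fake_registry, its server path and the functions parse_typelib_ref() and resolve_typelib() are hypothetical; real COM reads HKEY_CLASSES_ROOT and loads the server instead of returning a path.

```php
<?php
// Sketch of a GUID lookup; a PHP array stands in for the Windows registry.
// All names and the server path below are hypothetical.
function parse_typelib_ref(string $ref): ?array
{
    // "{GUID},major,minor": 32 hex digits grouped 8-4-4-4-12, then the version
    $pattern = '/^\{([0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12})\},(\d+),(\d+)$/i';
    if (preg_match($pattern, $ref, $m) !== 1) {
        return null;
    }
    return [
        'guid'  => strtoupper($m[1]),
        'major' => (int) $m[2],
        'minor' => (int) $m[3],
        // 32 hex digits decode to 16 bytes
        'bytes' => strlen(hex2bin(str_replace('-', '', $m[1]))),
    ];
}

// Hypothetical registry: GUID => server that implements the library
$fake_registry = [
    '00000200-0000-0010-8000-00AA006D2EA4' => 'C:/example/server.dll',
];

// Returns the server when the GUID is registered, null otherwise
function resolve_typelib(string $ref, array $registry): ?string
{
    $parsed = parse_typelib_ref($ref);
    if ($parsed === null) {
        return null;
    }
    return $registry[$parsed['guid']] ?? null;
}
```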

Marshalling – Case I

If another process, P2, requests a reference to the same COM component, then instead of creating the component again in P2, P2 is given a proxy object for it. Although P2 is actually interacting with an object in another process, it works as if it had its own copy of the object.

Marshalling – Case II

If the component is registered but the server that runs it is located on a remote computer, the process described in Case I is repeated. The only difference is that the calls go to a remote computer, so the overhead is greater.
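The proxy idea in both cases can be sketched in plain PHP: each client holds a proxy that packs every call into a message, and a stub next to the real object unpacks it, makes the call and packs the result back. This is a conceptual analogy, not real COM marshalling; serialize() stands in for the wire format, and RealCounter, CallStub and CallProxy are hypothetical names.

```php
<?php
// Conceptual sketch of marshalling (not real COM). All class names are hypothetical.
class RealCounter
{
    private int $count = 0;

    public function add(int $n): int
    {
        $this->count += $n;
        return $this->count;
    }
}

// Lives with the real object (the server side) and dispatches messages to it
class CallStub
{
    public function __construct(private object $target) {}

    public function dispatch(string $message): string
    {
        [$method, $args] = unserialize($message, ['allowed_classes' => false]);
        return serialize($this->target->$method(...$args));
    }
}

// Lives on the client side and looks like the real object to the caller
class CallProxy
{
    public int $messages = 0;

    public function __construct(private CallStub $stub) {}

    public function __call(string $method, array $args): mixed
    {
        $this->messages++;
        // In COM/DCOM this message would cross a process or network boundary
        $reply = $this->stub->dispatch(serialize([$method, $args]));
        return unserialize($reply, ['allowed_classes' => false]);
    }
}
```

Two proxies built on the same stub share one underlying object: after $p1->add(2) and $p2->add(3), the second call returns 5, just as P1 and P2 in Case I share one component.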

Creating a COM Object

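// Start the Word application; new COM() throws com_exception on failure
try {
    $word = new COM("Word.Application");
} catch (com_exception $e) {
    die("Unable to instantiate Word: " . $e->getMessage());
}
// Close the document that has the focus, without saving (a document must be open)
$word->ActiveDocument->Close(false);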

The document that has the focus is called the active document and is represented by the ActiveDocument property of the Application object.

With COM, the object executes on the client machine; with DCOM, it runs on the selected server.
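In PHP the difference shows up in the COM constructor: its second argument is a server name, and PHP only acts as a DCOM client when com.allow_dcom = true is set in php.ini. The sketch below wraps both cases; open_word() is a hypothetical helper, and the remote machine is assumed to already have DCOM launch permissions configured for Word.

```php
<?php
// Sketch: start Word locally (COM) or on a named server (DCOM).
// open_word() is hypothetical; it assumes the com_dotnet extension on Windows.
function open_word(?string $server = null): object
{
    if (!class_exists('COM')) {
        throw new RuntimeException('The COM extension (com_dotnet, Windows only) is not loaded');
    }
    if ($server === null) {
        // COM: Word runs on this (client) machine
        return new COM('Word.Application');
    }
    if (!ini_get('com.allow_dcom')) {
        throw new RuntimeException('DCOM is disabled; set com.allow_dcom = true in php.ini');
    }
    // DCOM: Word runs on the selected server
    return new COM('Word.Application', $server);
}
```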

Here is code that uses a COM object in PHP to invoke the Word application as part of your solution: the input is a .doc file, and the SaveAs() method converts it into one of the other formats mentioned (TXT, HTML, Unicode, RTF).

First I'll give you the code: test it, run it, execute it. Then we'll look into how it works.

<?php
// This script opens "original_doc_file.doc" and saves it as "converted_text_file.txt"

// Load Word's type library so its constants (e.g. wdFormatText) are registered
com_load_typelib('Word.Application');

// Create an instance of Word; new COM() throws com_exception on failure
try {
    $word = new COM("Word.Application");
} catch (com_exception $e) {
    die("Unable to instantiate Word: " . $e->getMessage());
}

// Place your Word file in the C drive; Open() returns the opened Document object
$doc = $word->Documents->Open('C:/original_doc_file.doc');

// Path of the text file to create
$new_text_file = "C:/converted_text_file.txt";

// WdSaveFormat values: Word document = 0; Word template = 1; Text file = 2; Text file with line breaks = 3;
// Text with DOS encoding = 4; Text file with line breaks, DOS encoding = 5; RTF file = 6; Unicode text file = 7; HTML = 8
$doc->SaveAs($new_text_file, 2);

// Close the document without saving changes to the original file, then quit Word
$doc->Close(false);
$word->Quit();
$word = null;
?>
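Hard-coding 2 works for one target, but converting to several formats is easier if the WdSaveFormat value is chosen from the output file's extension. This is a sketch, assuming you only need the formats listed here; the values come from Word's WdSaveFormat enumeration, while the function name wd_save_format() and the choice of extension keys are illustrative.

```php
<?php
// Sketch: choose Word's WdSaveFormat value from the target file extension.
// wd_save_format() and the extension keys are illustrative choices.
function wd_save_format(string $targetPath): int
{
    $formats = [
        'doc'  => 0,  // wdFormatDocument
        'dot'  => 1,  // wdFormatTemplate
        'txt'  => 2,  // wdFormatText
        'rtf'  => 6,  // wdFormatRTF
        'htm'  => 8,  // wdFormatHTML
        'html' => 8,  // wdFormatHTML
        'docx' => 12, // wdFormatXMLDocument (Word 2007 and later)
    ];
    $ext = strtolower(pathinfo($targetPath, PATHINFO_EXTENSION));
    if (!isset($formats[$ext])) {
        throw new InvalidArgumentException("No WdSaveFormat mapped for .$ext");
    }
    return $formats[$ext];
}
```

In the conversion script you would pass wd_save_format($new_text_file) as the second argument of SaveAs(). Unicode text (7) and the DOS text variants (4, 5) also produce .txt files, so pass those values explicitly when you need them.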