During development I needed to convert Word documents to HTML. Word is installed on my local machine, and there are already many ways described on the Internet to do this conversion. In this small example I would like to introduce a method that does not require adding a reference to any COM component. It uses the System.Reflection namespace to create an instance of the Word application through late binding and invokes its “save as HTML” functionality to do the conversion. The following methods are used:
Type.GetTypeFromProgID Method: Gets the type associated with the specified program identifier (ProgID).
Activator.CreateInstance Method: Creates an instance of the specified type using the constructor that best matches the specified parameters.
Type.InvokeMember Method: When overridden in a derived class, invokes the specified member, using the specified binding constraints and matching the specified argument list, modifiers and culture.
The complete source code of the example is in the following snippet:
using System;
using System.Reflection;

class Program
{
    static void Main(string[] args)
    {
        ConvertWord2Html(@"E:\temp.doc", @"E:\temp.html");
        Console.WriteLine("Convert done");
        Console.ReadLine();
    }

    static void ConvertWord2Html(string strSource, string strDestination)
    {
        // Constant for WORD-TO-HTML exporting format (wdFormatHTML)
        const int WORD_HTML_FORMAT = 8;

        // Load COM-Metadata of Word application from registry
        Type tWordApplication = Type.GetTypeFromProgID("Word.Application");

        // Create new instance of Word
        object objWord = Activator.CreateInstance(tWordApplication);

        try
        {
            // Get the Documents collection
            object objDocuments = tWordApplication.InvokeMember("Documents",
                BindingFlags.IgnoreCase | BindingFlags.GetProperty | BindingFlags.Public,
                null, objWord, new object[0]);

            // Get COM-Metadata of Word Documents
            Type tWordDocuments = objDocuments.GetType();

            // Load source
            object objDocument = tWordDocuments.InvokeMember("Open",
                BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding,
                null, objDocuments, new object[1] { strSource });

            // Get COM-Metadata of Word Document
            Type tWordDocument = objDocument.GetType();

            // Create HTML
            tWordDocument.InvokeMember("SaveAs",
                BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding,
                null, objDocument, new object[2] { strDestination, WORD_HTML_FORMAT });
        }
        finally
        {
            // Close Word, even if opening or saving failed
            tWordApplication.InvokeMember("Quit",
                BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding,
                null, objWord, new object[0]);
        }
    }
}
Although this method is very convenient, it does not let us monitor how much longer the conversion will run, so we cannot notify the user or allow him to cancel if it takes too long, for example with a very large Word document.
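One way to at least bound the waiting time is to run the conversion on a worker thread and wait for it with a timeout. The following is only a minimal sketch under that assumption; the class ConversionTimeout and the method RunWithTimeout are hypothetical names, not part of the example above. It does not stop Word when the timeout expires: the conversion keeps running in the background, and the caller has to decide what to do with it.

```csharp
using System;
using System.Threading;

static class ConversionTimeout
{
    // Hypothetical helper: runs 'work' on a background thread and waits
    // up to 'timeout'. Returns true if the work finished in time.
    public static bool RunWithTimeout(Action work, TimeSpan timeout)
    {
        Exception error = null;

        var worker = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { error = ex; } // remember the failure for the caller
        });
        worker.IsBackground = true; // do not keep the process alive for a hung conversion
        worker.Start();

        // Join returns false if the thread is still running after the timeout.
        bool finished = worker.Join(timeout);

        if (finished && error != null)
            throw new InvalidOperationException("Conversion failed.", error);

        return finished;
    }
}
```

It could be called as ConversionTimeout.RunWithTimeout(() => ConvertWord2Html(source, destination), TimeSpan.FromMinutes(2)); a return value of false then means the conversion was still running after two minutes, so the user can be notified.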
Thx,
I need exactly the opposite, html to office doc…
ssd
As far as I know, for converting a Word file to an HTML file, Spire.Doc may be a good choice.
http://www.e-iceblue.com/Introduce/word-for-net-introduce.html
This is a great idea. It’s easy and safe. The best thing is that it is very clear.
Hi,
Do we need MS Word installed on our machine to run this program?
Regards,
Kalyan
@Kalyan: You still need Word on the local machine. This method only lets you late-bind to an instance of the Word application during the conversion, instead of referencing the COM component at compile time.
Hi, this is very helpful! Just to clarify, what if I wanted to use the same approach in a web-based application? Do I need MS Word installed on the server? Thanks…
@Melo: You need Word on your server. Regards.
Oh, thanks! Actually, we don’t have Word on the server. Is there another way of displaying a Word file as HTML in an iframe? Thanks!
@Melo: From version 2007 on (.docx files) you can use OpenXML to read a Word file, but for the 2003 format you need Word installed on the server.
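To illustrate the OpenXML route, here is a rough sketch that assumes the Open XML SDK (the DocumentFormat.OpenXml NuGet package) is referenced. The class DocxTextToHtml is a hypothetical name. It only copies the plain text of each paragraph into a <p> element and ignores formatting, images and table layout, so it is a starting point rather than a full converter.

```csharp
using System.IO;
using System.Net;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxTextToHtml
{
    // Hypothetical helper: writes the text of each paragraph of a .docx file
    // as a <p> element. Word does not need to be installed for this.
    public static void Convert(string docxPath, string htmlPath)
    {
        var html = new StringBuilder("<html><body>\n");

        // Open the package read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            foreach (Paragraph paragraph in body.Descendants<Paragraph>())
            {
                // InnerText is the concatenated text of the paragraph's runs.
                html.Append("<p>")
                    .Append(WebUtility.HtmlEncode(paragraph.InnerText))
                    .Append("</p>\n");
            }
        }

        html.Append("</body></html>\n");
        File.WriteAllText(htmlPath, html.ToString(), Encoding.UTF8);
    }
}
```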
I think the only problem about this method is that it still needs Word to be installed on the machine this code is running on. Using third party libraries, you won’t need to install Word, or any kind of redistributable package.
@Saeed Neamati: Yes, you’re right.
Hi,
I’ve a problem with this code.
The first time, I ran this code in a unit test and everything was OK.
Then I used this function in a Windows Service, and it throws an exception:
‘Object reference not set to an instance of an object’ at
// Get COM-Metadata of Word Documents
Type tWordDocuments = objDocuments.GetType();
// Load source
object objDocument = tWordDocuments.InvokeMember("Open", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocuments, new object[1] { strSource });
// Get COM-Metadata of Word Documents
Type tWordDocument = objDocument.GetType(); // ====> the exception is thrown at this line
Could you please give me some guidance? Office is installed on my machine, and the code runs very well in unit-test mode.
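A note on the exception above: a NullReferenceException at objDocument.GetType() means that the “Open” call returned null, so Word did not hand back a document. A commonly reported cause, which is an assumption here and not verified for this case, is that Word behaves differently under a service account than in an interactive user session. The following sketch does not fix that, but it turns the failure into a clearer error. The helper OpenOrThrow is a hypothetical name; the other names are taken from the snippet in the post.

```csharp
using System;
using System.Reflection;

static class WordLateBinding
{
    // Hypothetical helper: opens a document through late binding and fails
    // with a descriptive message instead of a NullReferenceException.
    public static object OpenOrThrow(object objDocuments, string strSource)
    {
        object objDocument = objDocuments.GetType().InvokeMember(
            "Open",
            BindingFlags.IgnoreCase | BindingFlags.InvokeMethod |
            BindingFlags.Public | BindingFlags.OptionalParamBinding,
            null, objDocuments, new object[] { strSource });

        if (objDocument == null)
        {
            throw new InvalidOperationException(
                "Word returned no document for '" + strSource + "'. " +
                "Check that the account running this code can start Word and read the file.");
        }

        return objDocument;
    }
}
```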