C# – Convert Word to HTML

During development I needed to convert Word documents to HTML. Word is installed on my local machine, and many ways to do this conversion are already described on the Internet. In this small example I would like to introduce a method that does not require adding a reference to any COM component. It uses the System.Reflection namespace to create a late-bound instance of the Word application, open the document, and save it in HTML format. The following methods are used:
Type.GetTypeFromProgID Method: Gets the type associated with the specified program identifier (ProgID).
Activator.CreateInstance Method: Creates an instance of the specified type using the constructor that best matches the specified parameters.
Type.InvokeMember Method: When overridden in a derived class, invokes the specified member, using the specified binding constraints and matching the specified argument list, modifiers and culture.
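
As a small illustration of the first of these calls, the sketch below checks whether a COM server is registered before trying to automate it. Type.GetTypeFromProgID returns null when the ProgID cannot be resolved, so a null check is enough. The names ComProbe and IsComServerRegistered are made up for this example, and the PlatformNotSupportedException handling assumes the code might also run on a non-Windows runtime where COM is unavailable.

```csharp
using System;

static class ComProbe
{
    // Hypothetical helper: true if the ProgID resolves to a registered COM type.
    public static bool IsComServerRegistered(string progId)
    {
        try
        {
            // Returns null (rather than throwing) when the ProgID is not registered.
            return Type.GetTypeFromProgID(progId) != null;
        }
        catch (PlatformNotSupportedException)
        {
            // Assumption: on non-Windows runtimes COM lookup is not available at all.
            return false;
        }
    }
}
```

A caller could run ComProbe.IsComServerRegistered("Word.Application") first and show a friendly message instead of failing later with a null reference.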

The complete source code of the example is in the following snippet:

        // Requires: using System; using System.Reflection;

        static void Main(string[] args)
        {
            ConvertWord2Html(@"E:\temp.doc", @"E:\temp.html");
            Console.WriteLine("Conversion done");
            Console.ReadLine();
        }

        static void ConvertWord2Html(string strSource, string strDestination)
        {
            // WdSaveFormat value for HTML export (wdFormatHTML)
            const int WORD_HTML_FORMAT = 8;

            // Load COM metadata of the Word application from the registry
            Type tWordApplication = Type.GetTypeFromProgID("Word.Application");
            if (tWordApplication == null)
                throw new InvalidOperationException("ProgID Word.Application is not registered. Is Word installed?");

            // Create a new instance of Word
            object objWord = Activator.CreateInstance(tWordApplication);

            try
            {
                // Get the Documents collection
                object objDocuments = tWordApplication.InvokeMember("Documents", BindingFlags.IgnoreCase | BindingFlags.GetProperty | BindingFlags.Public, null, objWord, new object[0]);

                // Get COM metadata of the Documents collection
                Type tWordDocuments = objDocuments.GetType();

                // Load the source document
                object objDocument = tWordDocuments.InvokeMember("Open", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocuments, new object[1] { strSource });

                // Get COM metadata of the opened document
                Type tWordDocument = objDocument.GetType();

                // Save the document as HTML
                tWordDocument.InvokeMember("SaveAs", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocument, new object[2] { strDestination, WORD_HTML_FORMAT });
            }
            finally
            {
                // Close Word even if the conversion failed, so no WINWORD.EXE process is left running
                tWordApplication.InvokeMember("Quit", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objWord, new object[0]);
            }
        }

Although this method is very convenient, it does not let us monitor how long the conversion has been running, so we cannot notify the user or let them cancel when it takes too long, for example with a very large Word document.
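
One possible way to at least bound the waiting time is to run the conversion on a background thread and give up after a timeout. The sketch below is only an illustration under that assumption: TimedRunner and RunWithTimeout are hypothetical names, and a timeout here does not stop Word itself, so the WINWORD.EXE process would still have to be dealt with separately.

```csharp
using System;
using System.Threading;

static class TimedRunner
{
    // Hypothetical helper: runs 'work' on a background thread and waits up to 'timeout'.
    // Returns true if the work finished in time, false if it is still running.
    public static bool RunWithTimeout(Action work, TimeSpan timeout)
    {
        Exception failure = null;
        var worker = new Thread(() =>
        {
            try { work(); }
            catch (Exception ex) { failure = ex; } // kept so it can be rethrown on the caller's thread
        });
        worker.IsBackground = true; // a hung conversion must not keep the process alive
        worker.Start();

        bool finished = worker.Join(timeout);
        if (finished && failure != null)
            throw new InvalidOperationException("The background work failed.", failure);
        return finished;
    }
}
```

With the method from the snippet above, a call could look like TimedRunner.RunWithTimeout(() => ConvertWord2Html(@"E:\temp.doc", @"E:\temp.html"), TimeSpan.FromMinutes(2)), and a false result could be used to notify the user.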

13 thoughts on “C# – Convert Word to HTML”

  1. @Kalyan: You still need Word on the local machine. This method only lets you late-bind to an instance of the Word application during the conversion.

  2. Hi, this is very helpful! Just to clarify, what if I wanted to use the same approach in a web-based application? Do I need MS Word installed on the server? Thanks…

  3. Oh, thanks! Actually, we don’t have Word on the server. Is there another way of displaying a Word file as HTML in an iframe? Thanks!

  4. @Melo: From version 2007 on you can use OpenXML to read a Word file, but for 2003 you need Word installed on the server.

  5. I think the only problem about this method is that it still needs Word to be installed on the machine this code is running on. Using third party libraries, you won’t need to install Word, or any kind of redistributable package.

  6. Hi,
    I have a problem with this code.
    The first time, I ran this code from a unit test and everything was OK.
    Then I used this function in a Windows Service, and it throws an exception:
    ‘Object reference not set to an instance of an object’ at
    // Get COM-Metadata of Word Documents
    Type tWordDocuments = objDocuments.GetType();
    // Load source
    object objDocument = tWordDocuments.InvokeMember("Open", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocuments, new object[1] { strSource });
    // Get COM-Metadata of Word Documents
    Type tWordDocument = objDocument.GetType(); ====> the exception is thrown at this line.

    Could you please give me some guidance? Office is installed on my machine, and the code runs fine from the unit test.
