C# – Convert Word to HTML

During development I needed to convert a Word document to HTML. Word is installed on my local machine, and there are already many ways described on the Internet to do this conversion. In this small example I would like to introduce a method that does not require adding a reference to any COM component. It uses the System.Reflection namespace to create an instance of the Word application through late binding and then calls SaveAs with the HTML format. The following methods are used:
Type.GetTypeFromProgID Method: Gets the type associated with the specified program identifier (ProgID).
Activator.CreateInstance Method: Creates an instance of the specified type using the constructor that best matches the specified parameters.
Type.InvokeMember Method: When overridden in a derived class, invokes the specified member, using the specified binding constraints and matching the specified argument list, modifiers and culture.
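
To get a feeling for these three calls without having Word installed, the same late-binding pattern can be tried on an ordinary .NET type. The sketch below is only an illustration: the class LateBindingSketch and the method BuildText are hypothetical names, and Type.GetType stands in for Type.GetTypeFromProgID because the target here is a managed type rather than a COM server.

```csharp
using System;
using System.Reflection;

static class LateBindingSketch
{
    public static string BuildText()
    {
        // Step 1: look up the type by name at run time
        // (for Word this would be Type.GetTypeFromProgID("Word.Application"))
        Type tBuilder = Type.GetType("System.Text.StringBuilder");

        // Step 2: create an instance without compile-time knowledge of the type
        object objBuilder = Activator.CreateInstance(tBuilder);

        // Step 3: call a method by name; managed types need BindingFlags.Instance
        BindingFlags callFlags = BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance;
        tBuilder.InvokeMember("Append", callFlags, null, objBuilder, new object[] { "Word" });
        tBuilder.InvokeMember("Append", callFlags, null, objBuilder, new object[] { " to HTML" });

        // Read a property by name, the same way the Word example reads "Documents"
        BindingFlags propertyFlags = BindingFlags.GetProperty | BindingFlags.Public | BindingFlags.Instance;
        int length = (int)tBuilder.InvokeMember("Length", propertyFlags, null, objBuilder, new object[0]);

        return objBuilder.ToString() + " (" + length + " characters)";
    }
}
```

Calling LateBindingSketch.BuildText() returns "Word to HTML (12 characters)". The Word example follows the same three steps, only against a COM object.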

The complete source code of the Word-to-HTML example is in the following snippet:

using System;
using System.Reflection;

class Program
{
    static void Main(string[] args)
    {
        ConvertWord2Html(@"E:\temp.doc", @"E:\temp.html");
        Console.WriteLine("Convert done");
        Console.ReadLine();
    }

    static void ConvertWord2Html(string strSource, string strDestination)
    {
        // Constant for the Word-to-HTML export format (WdSaveFormat.wdFormatHTML)
        const int WORD_HTML_FORMAT = 8;

        // Load COM metadata of the Word application from the registry
        Type tWordApplication = Type.GetTypeFromProgID("Word.Application");

        // Create a new instance of Word
        object objWord = Activator.CreateInstance(tWordApplication);

        try
        {
            // Get the Documents collection of the application
            object objDocuments = tWordApplication.InvokeMember("Documents", BindingFlags.IgnoreCase | BindingFlags.GetProperty | BindingFlags.Public, null, objWord, new object[0]);

            // Get COM metadata of the Documents collection
            Type tWordDocuments = objDocuments.GetType();

            // Load the source document
            object objDocument = tWordDocuments.InvokeMember("Open", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocuments, new object[1] { strSource });

            // Get COM metadata of the opened document
            Type tWordDocument = objDocument.GetType();

            // Create the HTML file
            tWordDocument.InvokeMember("SaveAs", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objDocument, new object[2] { strDestination, WORD_HTML_FORMAT });
        }
        finally
        {
            // Close Word even if the conversion failed, so no hidden process is left running
            tWordApplication.InvokeMember("Quit", BindingFlags.IgnoreCase | BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.OptionalParamBinding, null, objWord, new object[0]);
        }
    }
}

Although this method is very convenient, it does not let us monitor how much longer the conversion will run, so we cannot notify the user or let them cancel if it takes too long, for example with a very large Word document.
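
What we can do is at least detect that the conversion is taking too long. The following is a minimal sketch, not part of the original example: ConversionTimeout and RunWithTimeout are hypothetical names, and it assumes .NET 4.5 or later for Task.Run. It only reports whether the work finished in time; it does not report progress and it does not stop Word.

```csharp
using System;
using System.Threading.Tasks;

static class ConversionTimeout
{
    // Runs 'work' on a thread-pool thread and waits at most 'timeout' for it.
    // Returns true if the work finished in time, false if it is still running.
    // A false result does NOT cancel the work; it keeps running in the background.
    public static bool RunWithTimeout(Action work, TimeSpan timeout)
    {
        Task task = Task.Run(work);
        return task.Wait(timeout);
    }
}
```

With the method from the example it could be used like this: bool finished = ConversionTimeout.RunWithTimeout(() => ConvertWord2Html(@"E:\temp.doc", @"E:\temp.html"), TimeSpan.FromSeconds(30)); If finished is false, the user can be informed, but really aborting the conversion would still mean ending the Word process, which this sketch does not attempt.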

C# – Insert many tables/images successively into Word/current Word document

Today I read a thread on mycsharp.de asking how one can insert many tables successively into Word. I thought it would be very simple with a bit of Google searching, but when I began to write a demo myself I found out that it is not as easy as I thought. Some concepts were still unclear to me, for example Range, or methods with a lot of arguments passed with the ref keyword. So I lost a lot of time understanding how the classes work.
To work with a Word document, all we need to do is add a reference to the assembly Microsoft.Office.Interop.Word. You can choose version 11 or 12, whichever you have. The snippets below assume a using directive for the Microsoft.Office.Interop.Word namespace. To add tables to Word, let’s follow these steps.

1. We need to create a Word application instance

Application objWord = new Application();

2. Create a Word document

object objMissing = System.Reflection.Missing.Value;
Document objDocument = objWord.Documents.Add(ref objMissing, ref objMissing, ref objMissing, ref objMissing);

Did you notice how many times objMissing appears in the call to Documents.Add? Writing such code is really tedious, but it quickly creates a new blank document.

3. Add tables into document

object oEndOfDoc = "\\endofdoc";
Range objRange;
Table objTable;
Paragraph objParagraph;
object objRangePara;
for (int nIndex = 0; nIndex < 3; nIndex++)
{
	objRange = objDocument.Bookmarks.get_Item(ref oEndOfDoc).Range;
	objTable = objDocument.Tables.Add(objRange, 5, 2, ref objMissing, ref objMissing);                                     
	objRangePara = objDocument.Bookmarks.get_Item(ref oEndOfDoc).Range;
	objParagraph = objDocument.Content.Paragraphs.Add(ref objRangePara);                  
	objParagraph.Range.Text = Environment.NewLine;   
}

We look up the predefined bookmark \endofdoc to get a range at the end of the document. Then we add a table with 5 rows and 2 columns at that range. What is important is that we must insert a blank line between the tables. If we don’t, all tables are merged into one. I really lost a lot of time at this point: at first I did not add any blank line, so the tables were always automatically merged into one. I thought that when we add a new table, the document would start it on a new line or in a new paragraph, but it doesn’t.

4. Save our file

object objFileName = @"E:\temp.doc";
objDocument.SaveAs(ref objFileName, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing, ref objMissing);
((Microsoft.Office.Interop.Word._Application)objWord).Quit(ref objMissing, ref objMissing, ref objMissing);

The complete source code of the example can be downloaded here: “Insert table into word document”
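
With C# 4.0 or later, optional parameters of COM interop methods can simply be left out, which removes most of the objMissing noise from the four steps. The following is a minimal sketch of the same steps under that assumption. It is not the downloadable source; the class name InsertTablesSketch is hypothetical, and it needs Word installed plus a reference to Microsoft.Office.Interop.Word. The Word alias is used so that Word.Range cannot be confused with System.Range on newer frameworks.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

class InsertTablesSketch
{
    static void Main()
    {
        // Step 1: create a Word application instance
        Word.Application objWord = new Word.Application();

        // Step 2: create a blank document; optional parameters are omitted
        Word.Document objDocument = objWord.Documents.Add();

        // Step 3: add three tables, each followed by a blank paragraph
        object oEndOfDoc = "\\endofdoc";
        for (int nIndex = 0; nIndex < 3; nIndex++)
        {
            Word.Range objRange = objDocument.Bookmarks.get_Item(ref oEndOfDoc).Range;
            objDocument.Tables.Add(objRange, 5, 2);

            // Without this blank paragraph the tables merge into one
            object objRangePara = objDocument.Bookmarks.get_Item(ref oEndOfDoc).Range;
            Word.Paragraph objParagraph = objDocument.Content.Paragraphs.Add(ref objRangePara);
            objParagraph.Range.Text = Environment.NewLine;
        }

        // Step 4: save the file and close Word
        object objFileName = @"E:\temp.doc";
        objDocument.SaveAs(ref objFileName);
        ((Word._Application)objWord).Quit();
    }
}
```

The cast to _Application before Quit is kept from the original listing; it avoids the ambiguity between the Quit method and the Quit event of the Application interface.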

UPDATE 06.08.2012
To insert images into the currently open Word document, you can use the listing below

private static void InsertImages()
{
	Application wordApp;
	Document doc = null;
	try
	{
		wordApp = (Application)System.Runtime.InteropServices.Marshal.GetActiveObject("Word.Application");
	}
	catch (Exception)
	{
		// No running Word instance could be obtained, so start a new one
		wordApp = new Application();
		wordApp.Documents.Add();
		wordApp.Visible = true;
	}

	for (int index = 0; index < wordApp.Windows.Count; index++)
	{
		object a = index + 1;
		Window winWord = wordApp.Windows.get_Item(ref a);
		if (winWord.Active)
		{
			doc = wordApp.ActiveDocument;
			break;
		}
	}

	List<string> fileNames = new List<string>()
				{
					@"D:\Temp\Files\20120619-1216_37_73.jpg",
					@"D:\Temp\Files\20120619-1201_37_65.jpg",
					@"D:\Temp\Files\20120619-1200_37_64.jpg",
					@"D:\Temp\Files\20120619-1215_37_72.jpg",
					@"D:\Temp\Files\20120427-1430_22_2.jpg",
				};
	if (doc == null)
	{
		// No active window was found, so there is no document to insert into
		return;
	}
	foreach (string filename in fileNames)
	{
		doc.InlineShapes.AddPicture(filename, Type.Missing, Type.Missing, Type.Missing);
	}

	//wordApp.Quit();
}

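We call Marshal.GetActiveObject to get a running instance of the Word application (if none is running, we start a new one) and loop through its windows. Only one of them is active, and the loop stops as soon as it is found. After locating the active window, we take the current document and insert the images through Document.InlineShapes.AddPicture(). Because no Range argument is passed, Word chooses the insert position itself; to insert at the cursor explicitly, pass wordApp.Selection.Range as the last argument.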