Tool,C# – Docx to Html library

A “big” project often grows out of a small requirement; that happens all the time in programming. All I really need are thumbnails of MS Office 2007/2010 files, but there is no reasonable way to get them without installing MS Office (or other 3rd party components) on the server. A thumbnail is generated and stored inside an Office file only if the user deliberately sets the “Save thumbnail” option. If that option is not set, there is no thumbnail to extract. Many suggestions can be found on the Internet, such as using OpenOffice, using SharePoint Word Automation Services, converting to PDF and then taking the thumbnail, or using a 3rd party component…

However, I thought: if I can convert the Office files to HTML, I can take a snapshot of the HTML. That would be great because I would need nothing but the .NET Framework. The file format of MS Office 2007/2010 is Office Open XML. It is an “almost” open standard, and I can use the Open XML SDK 2.0 for Microsoft Office to read the files and render their content as HTML. Moreover, the conversion does not have to be perfect, because I only need the thumbnail. So if you have the same requirement as I do, try this library to convert DOCX to HTML. This version is only a first draft; use it at your own risk. Exceptions may occur.
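To give a rough idea of what reading a DOCX file with the Open XML SDK and emitting HTML can look like, here is a minimal sketch. It is not the code of this library: the class `DocxTextToHtmlSketch` and its methods are hypothetical names made up for illustration. It assumes the Open XML SDK assembly (`DocumentFormat.OpenXml`) is referenced, and it only copies the plain text of top-level paragraphs, ignoring formatting, tables and images.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper for illustration only; not part of the OfficeToHtml library.
public static class DocxTextToHtmlSketch
{
    // Opens the DOCX read-only and turns each top-level paragraph into a <p> element.
    public static string Convert(string docxPath)
    {
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Elements<Paragraph>() returns direct children only,
            // so paragraphs nested inside tables are skipped here.
            IEnumerable<string> paragraphs =
                body.Elements<Paragraph>().Select(p => p.InnerText);

            return BuildHtml(paragraphs);
        }
    }

    // Wraps each paragraph text in <p>...</p> and HTML-encodes it.
    public static string BuildHtml(IEnumerable<string> paragraphs)
    {
        StringBuilder html = new StringBuilder("<html><body>");
        foreach (string text in paragraphs)
        {
            html.Append("<p>").Append(WebUtility.HtmlEncode(text)).Append("</p>");
        }
        html.Append("</body></html>");
        return html.ToString();
    }
}
```

A real converter also has to handle runs, styles, tables, lists and images, which is where most of the work lies. For a thumbnail, though, even a rough rendering like this may already be enough.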

1. Prerequisites

Microsoft .NET Framework 3.5 Service Pack 1
Microsoft .NET Framework 4.0

2. Details

– Version: 1.0.0.0
– Supported OS: All Windows
– How to use:

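using System.Diagnostics;                            // Process
using System.IO;                                     // FileInfo
using Microsoft.VisualStudio.TestTools.UnitTesting;  // [TestMethod]
// Also import the library's namespace for DocxToHtml, HtmlConvertResult and ResultType.

[TestMethod]
public void TestMethodCheckExceptionsSingle()
{
	// Input document; replace this with the path to your own .docx file.
	FileInfo fi = new FileInfo(@"C:\Users\nguy\Dropbox\ToolTuCode\Application\OfficeToHtml\TestData\docx\20130214\Haddad_Thesis.docx");

	// Convert the document; the second argument is the output location.
	DocxToHtml docxToHtml = new DocxToHtml();
	HtmlConvertResult result = docxToHtml.ToHtml(fi.FullName, @"C:\temp\Output");

	// On success, open the generated HTML file with the default application.
	if (result.ResultType == ResultType.OK)
		Process.Start(result.OutputFileName);
}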

– Download link: http://hintdesk.com/Web/Tool/OfficeToHtml.zip
– History:
+ [1.0.0.0] : Beta Version – Only DOCX is supported

3. Source code

3.1 Original source code

http://hintdesk.com/Web/Source/OfficeToHtml.zip

3.2 Similar open source projects

https://github.com/mwilliamson/dotnet-mammoth

9 thoughts on “Tool,C# – Docx to Html library”

  1. Does this only work with Microsoft .NET Framework 4.0? Is there any way to use it in a Visual Studio 2008 project?

    Thanks

  2. @Diego: Yes of course you can use it in Visual Studio 2008. However you have to set your project to run on .NET Framework 4 in Project properties section.

  3. 1. The source code has no code for supporting images.
    2. Many of my docx files are not being converted to HTML. Please help me out, I am in dire need.
