A “big” project often grows out of a small requirement. That happens all the time in programming. All I really need are thumbnails of MS Office 2007/2010 files, but there is no reasonable way to get them without installing MS Office (or another third-party component) on the server. A thumbnail is generated and stored in an Office file only if the user deliberately sets the “Save thumbnail” option. If they don’t, there is no thumbnail to read. Many suggestions can be found on the Internet, such as using OpenOffice, using SharePoint Word Automation Services, converting to PDF and then taking the thumbnail, or using a third-party component…
However, I thought: if I can convert the Office files to HTML, I can take a snapshot of the HTML. That would be great because I wouldn’t need anything but the .NET Framework. The file format of MS Office 2007/2010 is Office Open XML. It’s an “almost” open standard, and I can use the Open XML SDK 2.0 for Microsoft Office to read the files and render the content as HTML. Moreover, I don’t need to render the content perfectly, because I only need the thumbnail. So if you have the same requirement as I do, try this library to convert DOCX to HTML. This version is only a first draft, so use it at your own risk. Exceptions may occur.
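To make the idea concrete, here is a minimal sketch of the DOCX-to-HTML step. It is not the code of this library: instead of the Open XML SDK it reads the package directly with ZipArchive (which needs .NET 4.5 or later), it assumes the main part lives at word/document.xml, and it keeps only paragraph text. The names DocxTextSketch, BodyToHtml and FileToHtml are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of the OfficeToHtml library.
public static class DocxTextSketch
{
    // Main WordprocessingML namespace used by word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Turns each w:p paragraph into a <p> element, keeping only its text.
    public static string BodyToHtml(XDocument documentXml)
    {
        var html = new StringBuilder("<html><body>");
        foreach (XElement paragraph in documentXml.Descendants(W + "p"))
        {
            // Concatenate the w:t text nodes; styles, images and tables are ignored,
            // and nested content such as text boxes is not handled specially.
            string text = string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
            html.Append("<p>").Append(WebUtility.HtmlEncode(text)).Append("</p>");
        }
        return html.Append("</body></html>").ToString();
    }

    // Opens the .docx as a ZIP package and converts its main document part.
    public static string FileToHtml(string docxPath)
    {
        // FileShare.ReadWrite lets other processes keep reading or writing the file.
        using (var stream = new FileStream(docxPath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            // Assumption: the main part is at the conventional path word/document.xml.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a DOCX file?");
            using (Stream part = entry.Open())
                return BodyToHtml(XDocument.Load(part));
        }
    }
}
```

Calling DocxTextSketch.FileToHtml(path) returns an HTML string that could then be rendered for a snapshot. A thumbnail that resembles the real document would still need styles, images and tables to be handled.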
1. Prerequisites
Microsoft .NET Framework 3.5 Service Pack 1
Microsoft .NET Framework 4.0
2. Details
– Version: 1.0.0.0
– Supported OS: All Windows
– How to use:
[TestMethod]
public void TestMethodCheckExceptionsSingle()
{
    FileInfo fi = new FileInfo(@"C:\Users\nguy\Dropbox\ToolTuCode\Application\OfficeToHtml\TestData\docx\20130214\Haddad_Thesis.docx");
    DocxToHtml docxToHtml = new DocxToHtml();
    HtmlConvertResult result = docxToHtml.ToHtml(fi.FullName, @"C:\temp\Output");
    if (result.ResultType == ResultType.OK)
        Process.Start(result.OutputFileName);
}
– Download link: http://hintdesk.com/Web/Tool/OfficeToHtml.zip
– History:
+ [1.0.0.0] : Beta Version – Only DOCX is supported
3. Source code
3.1 Original source code
http://hintdesk.com/Web/Source/OfficeToHtml.zip
3.2 Similar open source projects
https://github.com/mwilliamson/dotnet-mammoth
Would you release the source code? Your class locks files, and I would like to fix that.
@Johan: The link to download the source code has been added to the post.
Does this only work with Microsoft .NET Framework 4.0? Is there any way to use it in a Visual Studio 2008 project?
Thanks
@Diego: Yes of course you can use it in Visual Studio 2008. However you have to set your project to run on .NET Framework 4 in Project properties section.
Do you know of any good library to convert HTML to DOCX while keeping the CSS of the HTML?
@binod tamang: No I don’t.
1. The source code has no support for images.
2. Many of my DOCX files are not being converted to HTML. Please help me out, I need this badly.
@amit: I don’t support this library officially. You have the source code, so you can extend it as you want.
I have also used this tool successfully:
http://www.gekoproject.com/component/k2/18-docx2html
It maps Word styles to CSS styles.
It also handles summary, bulleted lists, numbered lists, images, external links, …
It’s open source and fully customizable. They also gave me good support.
Hi,
This does not support DOCX content controls.
The download link does not work.