How to use or convert a Java library (PDFBox) in a .NET application?

Search for “most popular programming languages 2013” in any search engine and you’ll find that Java is always in the top 3 of the list. That’s easy to understand: Java is an old programming language, it is open source, it supports many operating systems (such as Linux and Windows), and it runs on many devices under the slogan “Write once, run anywhere”. There is no question about the position of Java, C, or C++, but I do wonder how Objective-C can rank higher than C#. Is a language for about one billion devices really more popular than a language for many billions of devices around the world? It’s really weird. However, that is not the topic of today’s post. Back to Java: given its popularity, there are of course many great open source libraries written in Java. They are wonderful libraries that any programmer would like to use in their products because of their numerous features and stability.

For example, there are two open source Java libraries that I like very much. The first is MPXJ (http://mpxj.sourceforge.net/), which reads many project file formats, that is, files from tools designed to assist a project manager in developing a plan, assigning resources to tasks, tracking progress, managing the budget, and analyzing workloads. The supported formats are Microsoft Project Exchange (MPX), Microsoft Project (MPP, MPT), Microsoft Project Data Interchange (MSPDI XML), Microsoft Project Database (MPD), Planner (XML), Primavera (PM XML, XER, and database), Asta Powerproject (PP, MDB), and the Standard Data Exchange Format (SDEF).

The second one is PDFBox (http://pdfbox.apache.org/), an open source Java tool for working with PDF documents. It allows the creation of new PDF documents, the manipulation of existing documents, and the extraction of content from documents. Both are great products and both are free, which means we can also use them in our commercial projects. However, they are written in Java. How can we use them in a .NET application? Can we convert a Java library to a .dll that can be used in .NET?

On the MPXJ website, you’ll see that they provide a converted version for .NET development. You only need to reference the converted .dll files and copy the other required components to the application folder, and you can read any project file format supported by MPXJ. That’s nice. PDFBox, unfortunately, doesn’t provide a converted version for .NET. Its website only gives some hints for converting the project (http://pdfbox.apache.org/building.html), and that’s all. Therefore, in this post I would like to show, step by step, how to convert PDFBox (or any Java library) to a .dll that can be used in a .NET application.

1. Download
– Apache Ant: http://ant.apache.org/bindownload.cgi or the latest version at the time of writing, http://artfiles.org/apache.org//ant/binaries/apache-ant-1.9.1-bin.zip
– IKVM: http://sourceforge.net/projects/ikvm/files/ or the latest version at the time of writing, http://sourceforge.net/projects/ikvm/files/latest/download?source=files
– PDFBox source: http://pdfbox.apache.org/downloads.html or the latest version at the time of writing, http://www.apache.org/dyn/closer.cgi/pdfbox/1.8.2/pdfbox-1.8.2-src.zip
– Java SDK (JDK): http://www.oracle.com/technetwork/java/javase/downloads/index.html or the latest version at the time of writing, http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html

2. Extract the Apache Ant, IKVM, and PDFBox zip files to a folder

3. Install the Java SDK (JDK)

4. Check that the JAVA_HOME environment variable is set

[Screenshot: JAVA_HOME environment variable]
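The check in step 4 can also be done from code. The following is a minimal sketch, not part of the original setup: the class name `JavaHomeCheck` and the helper `LooksLikeJdk` are hypothetical, and treating "a `javac` executable exists under `bin`" as proof of a JDK is my own assumption about what a correctly set JAVA_HOME means.

```csharp
using System;
using System.IO;

public static class JavaHomeCheck
{
    // Returns true when the folder looks like a JDK install, i.e. it
    // contains the Java compiler under "bin" (assumption: a plain JRE
    // has no javac, so this separates a JDK from a JRE).
    public static bool LooksLikeJdk(string javaHome)
    {
        if (string.IsNullOrWhiteSpace(javaHome) || !Directory.Exists(javaHome))
        {
            return false;
        }

        string bin = Path.Combine(javaHome, "bin");
        return File.Exists(Path.Combine(bin, "javac.exe"))  // Windows
            || File.Exists(Path.Combine(bin, "javac"));     // Linux/macOS
    }

    public static void Main()
    {
        string javaHome = Environment.GetEnvironmentVariable("JAVA_HOME");
        if (LooksLikeJdk(javaHome))
        {
            Console.WriteLine("JAVA_HOME points to a JDK: " + javaHome);
        }
        else
        {
            Console.WriteLine("JAVA_HOME is missing or does not point to a JDK: "
                + (javaHome ?? "(not set)"));
        }
    }
}
```

If the second message appears, set JAVA_HOME to the JDK install folder and open a new console before running Ant, so that the new value is picked up.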

5. Go to the extracted PDFBox folder <pdfbox-extract-folder>/<pdfbox-version>/pdfbox, for example, in my case, D:\JavaToNet\pdfbox-1.8.2-src\pdfbox-1.8.2\pdfbox

6. Open the build.xml file, search for '<property name="ikvm.dir"' and replace the value '.' with the path of the IKVM folder. For example, in my case:

[Screenshot: IKVM directory]

7. Open a ‘cmd’ console, move to the <apache ant>/bin folder, and run Ant on the PDFBox build.xml file, as shown in the following screenshots:

[Screenshot: Ant build of PDFBox]

[Screenshot: Ant build of PDFBox finished successfully]

8. When the conversion finishes, go to <pdfbox-extract-folder>/<pdfbox-version>/pdfbox/bin (this folder is created after a successful build). There you’ll find the converted version of PDFBox (pdfbox-1.8.2.dll) for .NET applications.

[Screenshot: pdfbox .dll]

9. Create a sample project and add a reference to the newly generated .dll

[Screenshot: Visual Studio, add reference to pdfbox .dll]

You’ll also need additional .dll files from IKVM. Just compile your code; the required .dll files will show up in the error list.

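// Namespaces come from the IKVM-converted pdfbox-1.8.2.dll.
using System;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

private static void Main(string[] args)
{
	// Load the PDF and extract its text with the converted PDFBox classes.
	PDDocument document = PDDocument.load("AdventureWorksLT.pdf");
	PDFTextStripper stripper = new PDFTextStripper();
	Console.WriteLine(stripper.getText(document));
	document.close(); // release the underlying file
	Console.ReadLine();
}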
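Instead of waiting for compiler errors to reveal which IKVM assemblies are needed, the references recorded in the converted .dll can be listed directly. This is a small sketch using standard .NET reflection. The class name `ListReferences` and the default file name are assumptions for illustration, and the output covers only direct references, so further IKVM assemblies may still be needed at run time.

```csharp
using System;
using System.Reflection;

public static class ListReferences
{
    // Returns the simple names of all assemblies that the given
    // assembly references in its manifest (direct references only).
    public static string[] ReferencedNames(Assembly assembly)
    {
        AssemblyName[] references = assembly.GetReferencedAssemblies();
        string[] names = new string[references.Length];
        for (int i = 0; i < references.Length; i++)
        {
            names[i] = references[i].Name;
        }
        return names;
    }

    public static void Main(string[] args)
    {
        // Path to the converted library; adjust it to wherever the
        // Ant build wrote the file (assumed name from step 8).
        string path = args.Length > 0 ? args[0] : "pdfbox-1.8.2.dll";

        Assembly converted = Assembly.LoadFrom(path);
        foreach (string name in ReferencedNames(converted))
        {
            Console.WriteLine(name);
        }
    }
}
```

Any name in the output that starts with IKVM corresponds to a .dll from the IKVM folder that should be copied next to the application or added as a reference.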

So, after a few simple steps, you have PDFBox working in your .NET application. Maybe you’ll ask yourself why the PDFBox team doesn’t release a .NET version themselves, since it seems to be pretty simple. Unfortunately, although most PDFBox features work in the .NET version, the main feature that I want to use, “PDF to Image Conversion”, doesn’t work at all. If you try to convert a PDF to an image with the converted PDFBox, you will get a wrong result: the text is rendered incorrectly and comes out much smaller. Maybe that’s the reason why PDFBox doesn’t release an official .NET version: some features are broken in the converted .dll file.
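The post does not show the conversion code that produced the wrong result. For readers who want to reproduce the problem, the following is a sketch of how a page is typically rendered with the PDFBox 1.8.x API when called from C#. The class name `PdfToImageSketch` and the output file name are hypothetical, and the project is assumed to reference the converted pdfbox .dll plus the IKVM assemblies that provide the java.awt and javax.imageio classes.

```csharp
using System;
using org.apache.pdfbox.pdmodel;

public static class PdfToImageSketch
{
    public static void Main(string[] args)
    {
        PDDocument document = PDDocument.load("AdventureWorksLT.pdf");
        try
        {
            // PDFBox 1.8.x exposes the pages as an untyped java.util.List.
            java.util.List pages = document.getDocumentCatalog().getAllPages();
            PDPage firstPage = (PDPage)pages.get(0);

            // Render the first page to a java.awt.image.BufferedImage.
            java.awt.image.BufferedImage image = firstPage.convertToImage();

            // Save the rendered page as a PNG through the Java imaging API.
            javax.imageio.ImageIO.write(image, "png", new java.io.File("page1.png"));
        }
        finally
        {
            document.close();
        }
    }
}
```

Comparing page1.png with the same page rendered by PDFBox under a real JVM is a simple way to see whether the text problem described above affects a given document.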

Even though there are limitations, we can still use the .NET version for many other features. More importantly, we’ve learnt how to convert PDFBox, or any Java library, to a .dll. From now on, we have more open source libraries to use for building great features in our products.

UPDATE 22.11.2013
Here is the converted .NET version of PDFBox 1.8.2, if you want to try it.

Mirror 1: http://www.mediafire.com/download/y59d0llgla1zbpl/pdfbox-1.8.2.dll
Mirror 2: https://app.box.com/s/idliacomvtjhq6x6k2z0
Mirror 3: https://mega.co.nz/#!msgmlRYD!OSUaKGGHsM4ISabuyOXXwE4qUiscXh7VKbtOdNAqncA

UPDATE 01.02.2014
The commons-logging.dll, which may also be required:

Mirror 1: https://www.mediafire.com/?xr2j7zq5u34kjjw
Mirror 2: https://app.box.com/s/76lcjwzw8zv9luzefd0e
Mirror 3: https://mega.co.nz/#!TlRFELBY!j0Y6xLGnkDoUtP_y3ByZyy75jdv6t4ZiYhB-8ktBLRk
