Project Oxford – Computer Vision API – OCR

I have introduced some OCR candidates in previous posts, such as C# – OCR library candidates and C# – An example of OCR web service. Last year Microsoft published a new machine learning project called Project Oxford. It provides APIs for computer vision, face, emotion, video, speech, speaker recognition and more. The project is really interesting, and you can learn more about it on its homepage. In this short post, I would like to show how to call the OCR service. OCR is just one part of the Computer Vision APIs.

1. Register

To use the OCR service, you have to sign up for Project Oxford, but registration is only available to accounts with an Azure subscription. So sign up with an account that has an Azure subscription and turn on the Computer Vision API with the following steps.
1. Go to https://manage.windowsazure.com. Log in with your Azure account.
2. In the lower-left corner, click New.
3. Choose Marketplace.
4. Select Computer Vision APIs.
5. You can use the Free plan for testing. After the service is created, click Manage at the bottom to get the access keys.
6. On the subscriptions page, write down the secondary key. We'll need it in the code later.

2. OCR API

Once we have the API key, we can follow the Computer Vision APIs documentation to call the OCR service and extract text from an image. The OCR service is a normal REST service, so we can use HttpClient to post an image to it and get the result back, as follows.

using System;
using System.Net.Http;
using Newtonsoft.Json.Linq;

internal class Program
{
	// PostAsJsonAsync and ReadAsAsync are extension methods from the
	// Microsoft.AspNet.WebApi.Client NuGet package.
	private static void Main(string[] args)
	{
		// Key from https://manage.windowsazure.com, replace with your own.
		const string apiKey = "c593b208ab7b4404b48e8620564519f3";
		const string baseAddress = "https://api.projectoxford.ai/vision/v1/";
		const string DefaultLanguage = "unk"; // "unk" lets the service detect the language
		const bool DefaultDetectOrientation = true;
		const string APIKeyHeader = "Ocp-Apim-Subscription-Key";
		var testImageUri =
			new Uri("https://oxfordportal.blob.core.windows.net/vision/OpticalCharacterRecognition/1.jpg");
		// The request body only carries the URL of the image to analyze.
		var requestBody = new JObject
		{
			["Url"] = testImageUri.ToString()
		};

		var ocrUri = $"ocr?language={DefaultLanguage}&detectOrientation={DefaultDetectOrientation}";
		var httpClient = new HttpClient();
		httpClient.BaseAddress = new Uri(baseAddress);
		httpClient.DefaultRequestHeaders.Add(APIKeyHeader, apiKey);
		var response = httpClient.PostAsJsonAsync(ocrUri, requestBody).Result;

		if (response.IsSuccessStatusCode)
		{
			var responseJson = response.Content.ReadAsAsync<JObject>().Result;
			Console.WriteLine(responseJson);
		}
		else
		{
			// Read the error body so the exception carries the details sent by the service.
			var errorDetails = response.Content.ReadAsStringAsync().Result;
			throw new Exception(
				$"Failed call: {testImageUri} failed to OCR - code {response.StatusCode} - details\n{errorDetails}");
		}
		Console.ReadLine();
	}
}

The result is returned in JSON format.

The Computer Vision APIs return not only the text (in lines) but also the position of the bounding box around that text. Remember to replace apiKey with your real key. The header options and the arguments are all documented on the API reference site.
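To turn that JSON into plain text lines, something like the following might work. It is only a minimal sketch: it assumes the response is shaped as regions, then lines, then words, each word having a "text" property, and that shape should be checked against the API reference. The class OcrResultReader, the method ExtractLines and the sample JSON are made up for illustration and are not part of the service or of the original source code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class OcrResultReader
{
	// Joins the words of every recognized line into one string, in document order.
	// Assumed response shape: regions -> lines -> words -> text.
	public static List<string> ExtractLines(JObject ocrResult)
	{
		var result = new List<string>();
		var regions = ocrResult["regions"] as JArray ?? new JArray();
		foreach (var region in regions)
		{
			var lines = region["lines"] as JArray ?? new JArray();
			foreach (var line in lines)
			{
				var words = (line["words"] as JArray ?? new JArray())
					.Select(word => (string)word["text"]);
				result.Add(string.Join(" ", words));
			}
		}
		return result;
	}

	public static void Main()
	{
		// Hand-written sample in the assumed shape, not real service output.
		const string sample = @"{
			""language"": ""en"",
			""regions"": [ { ""boundingBox"": ""10,10,200,50"", ""lines"": [
				{ ""boundingBox"": ""10,10,200,20"", ""words"": [
					{ ""boundingBox"": ""10,10,90,20"", ""text"": ""Hello"" },
					{ ""boundingBox"": ""110,10,100,20"", ""text"": ""world"" } ] } ] } ]
		}";
		foreach (var text in ExtractLines(JObject.Parse(sample)))
			Console.WriteLine(text);
	}
}
```

In the program above, the same helper could be called with responseJson instead of the sample. Missing "regions", "lines" or "words" properties are treated as empty, so an image without text simply yields an empty list.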

The APIs also provide other functions such as image analysis and thumbnail generation. You can read more on the documentation site.

3. Source code

Source code: https://bitbucket.org/hintdesk/dotnet-project-oxford-computer-vision-api-ocr
