[Tutorial] OCR on Google Glass

Please note that HP IDOL OnDemand is now HPE Haven OnDemand. The API endpoints have changed to Haven OnDemand. Please see the API documentation for more details.

 

---

 

In this Google Glass tutorial we'll demonstrate how to create a Google Glass app that runs Image-to-Text/Optical Character Recognition (OCR) on pictures taken with the Glass camera.

 

Note: This Google Glass tutorial uses the CameraPreview example from Jared Burrows' GitHub repository OpenQuartz as a stub. The OpenQuartz repo is a great source of starter examples for all things Glass, particularly anything involving the camera or OpenCV.

 

Setup

 

This tutorial comes with stub code which you should use to get set up quickly.

 

This stub code is available on GitHub.

 

To get started, clone the repo somewhere on your computer.

 

git clone https://github.com/HP-IDOL-OnDemand/iod-glass-ocr

 

The repo contains 3 versions of the example:

  1. Camera: a camera sample app that lets you take pictures using Google Glass.
  2. CameraWithVoiceCommand: the same camera sample app, but with the voice command launch already added to the code.
  3. IODCameraOCR: a camera + OCR sample app, the final output of this tutorial. It takes photos and displays the OCR result using IDOL OnDemand. Set your API key in the MainActivity class (see the sketch below) and you'll be ready to go.

Get started using version 1 or 2 and create a new project in Eclipse by importing from Existing Android Code.
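
For reference, example 3 keeps the key in the MainActivity class. A minimal sketch of what that might look like (the field name here is hypothetical; check the repo for the actual one):

public class MainActivity extends Activity {
    // Hypothetical field name: replace the value with your own IDOL OnDemand API key
    private static final String APIKEY = "<your api key>";
    // ... rest of the activity
}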

 

Notes

 

Since we'll be calling out over the internet, we've added the following line to our AndroidManifest.xml file. Make sure it is present in any project that calls IDOL OnDemand or other internet endpoints.

 

<uses-permission android:name="android.permission.INTERNET"/>

 

Since sending multipart POST calls natively from Android can be a pain, I'll be using Unirest.io for the REST calls for simplicity. Like many HTTP libraries on Android, Unirest.io has some dependency conflicts that break a regular import, but I have generated a conflict-free jar that can be found in the lib folder of each of the projects, saving you time and effort. The jar was created by following the instructions provided here, should you wish to build it yourself.

 

All set up! Let’s get coding.

 

The stub example

 

The Camera and CameraWithVoiceCommand examples from the GitHub repository are what we'll be using to get started. When run on Google Glass, they display a live view of what the camera is seeing and let us tap once to take a picture.
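
For context, this is roughly how a single tap can trigger a capture on Glass. This is a sketch only, assuming the activity holds an open Camera instance named mCamera and a JPEG PictureCallback named mPictureCallback; the actual OpenQuartz code may be organised differently:

@Override
public boolean onKeyDown(int keyCode, KeyEvent event) {
    // On Glass, a tap on the touchpad arrives as a DPAD_CENTER key event
    if (keyCode == KeyEvent.KEYCODE_DPAD_CENTER) {
        mCamera.takePicture(null, null, mPictureCallback);
        return true;
    }
    return super.onKeyDown(keyCode, event);
}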

 

All we need to do now is extract the text from that picture! But first…

 

Adding a Voice command to Start our App

 

If you started with example app number 2 as referenced above, you can skip this section.

 

Right now the application starts when you press Run in Eclipse, assuming Google Glass is connected and running. What we want to do, however, is make our application launchable with a voice command. To do that, go into the AndroidManifest.xml file and replace the following code:

 

<intent-filter>
<action android:name="android.intent.action.MAIN"/>
<category android:name="android.intent.category.LAUNCHER"/>
</intent-filter>

 With this one:

  

<intent-filter>
<action android:name="com.google.android.glass.action.VOICE_TRIGGER"/>
</intent-filter>
<meta-data android:name="com.google.android.glass.VoiceTrigger" android:resource="@xml/mycameralaunch"/>

 

That meta-data references the voice command that we'll use to launch the app. We set the command by creating a "res/xml/mycameralaunch.xml" file and filling it with the following:

 

<?xml version="1.0" encoding="utf-8"?>
<trigger keyword="@string/glass_voice_trigger">
<constraints network="true" />
</trigger>

Finally, add the following to the /res/values/strings.xml file. This lets us start our app by saying "Ok Glass, My Camera Project", or anything else you may want to use.

 

<string name="glass_voice_trigger">My Camera Project</string>

 

Adding OCR using IDOL OnDemand

 

We have a project that takes a picture when we tap the screen. What we now want is to run OCR on that image once it is ready. For that we'll look at the processPictureWhenReady method. The method is first called when the photo is taken; it creates a FileObserver to watch the photo file until it has been closed and is therefore ready for us to use. When that happens it calls itself again and enters the following if statement.

 

if (pictureFile.exists()) {
    // This is where we'll want to do our thing
}
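
For reference, here is roughly what the full method looks like. This is a sketch based on the standard GDK FileObserver pattern, so names and details may differ slightly from the repo's version:

private void processPictureWhenReady(final String picturePath) {
    final File pictureFile = new File(picturePath);

    if (pictureFile.exists()) {
        // The photo is ready: this is where we'll want to do our thing
    } else {
        // The file isn't closed yet: watch its directory until it is,
        // then call ourselves again on the UI thread.
        final File parentDirectory = pictureFile.getParentFile();
        FileObserver observer = new FileObserver(parentDirectory.getPath(),
                FileObserver.CLOSE_WRITE | FileObserver.MOVED_TO) {
            private boolean isFileWritten;

            @Override
            public void onEvent(int event, String path) {
                if (!isFileWritten) {
                    File affectedFile = new File(parentDirectory, path);
                    isFileWritten = affectedFile.equals(pictureFile);
                    if (isFileWritten) {
                        stopWatching();
                        runOnUiThread(new Runnable() {
                            @Override
                            public void run() {
                                processPictureWhenReady(picturePath);
                            }
                        });
                    }
                }
            }
        };
        observer.startWatching();
    }
}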

That pictureFile is the photo we've just taken, so what we want to do now is start an asynchronous task that calls the OCR API from idolondemand.com. On Android, such tasks are done using AsyncTask. In our MainActivity class we can define the following.

 

private static class IODOCRTask extends AsyncTask<File,Void,String> {
        private MainActivity activity;

        protected IODOCRTask(MainActivity activity) {
            this.activity = activity;
        }
}

We want to pass it our current activity so that we can then use it to display the result. In our processPictureWhenReady we can now prepare our OCR call.

 

if (pictureFile.exists()) {
    Toast.makeText(getApplicationContext(), "FILE IS WRITTEN", Toast.LENGTH_SHORT).show();
    new IODOCRTask(this).execute(new File(pictureFile.toString()));
}

 

As you can see, we write a small message to the screen to say the file has been written, then start our task, passing it a new File object pointing at the picture we've just taken.

 

However, due to the way the OCR API works, sending the raw image may not give the best results; it is often better to resize the image first, which we can do by changing the above code to pass a resized image instead.

 

// picturePath is the path of the photo we just took, i.e. pictureFile.toString()
File dir = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DCIM);
Bitmap b = BitmapFactory.decodeFile(picturePath);
Bitmap out = Bitmap.createScaledBitmap(b, 640, 960, false);

File file = new File(dir, "resize.png");
FileOutputStream fOut;
try {
    fOut = new FileOutputStream(file);
    out.compress(Bitmap.CompressFormat.PNG, 100, fOut);
    fOut.flush();
    fOut.close();
    b.recycle();
    out.recycle();

    new IODOCRTask(this).execute(file);

} catch (Exception e) { // TODO
    e.printStackTrace();
}

Now we just need to handle that file and make the call. For this we'll implement AsyncTask's doInBackground method by adding the following to our IODOCRTask.

 

@Override
protected String doInBackground(File... params) {
    File file = params[0];
    String result = "";
    ...
    // Our IDOL OnDemand code here

    return result;
}

We now have the file, and we want our result to be the OCR output. I mentioned earlier that I packaged the Unirest library for making HTTP calls, and we'll be using it here.

 

try {
    HttpResponse<JsonNode> response = Unirest.post("http://api.idolondemand.com/1/api/sync/ocrdocument/v1")
            .field("file", file)
            .field("mode", "scene_photo")
            .field("apikey", "<yourapikey>")
            .asJson();

    JSONObject textblock = (JSONObject) response.getBody().getObject().getJSONArray("text_block").get(0);
    result = textblock.getString("text");
} catch (Exception e) {
    // Keeping the error handling simple
    e.printStackTrace();
}

Using Unirest is amazingly simple: we can chain the file, mode and apikey parameters before calling the asJson() method to get back a parsed object. We then return the text result from our OCR call, should there be one. Handling it is simple too: we just override the onPostExecute() method in our IODOCRTask.

 

@Override
protected void onPostExecute(String result) {
    activity.toastResult(result);
}

It takes the output of the OCR call and tells the activity to display it in a Toast message. We do this by adding the following method to our MainActivity class.

 

public void toastResult(String result){
    Toast.makeText(getApplicationContext(), result,
            Toast.LENGTH_LONG).show();
}

 

We now have a working Google Glass app that displays the OCR result when a picture is taken!

 

For any questions, don't hesitate to send me a direct message or submit issues in the repo.

 

GitHub Source: Available here
