Firebase ML Kit 1: Text Recognition

This marks the beginning of another mini-series: Firebase ML Kit! It’s machine learning for beginners and experts alike, though the only chapter really aimed at experts will be the very last one.

For this first entry, we’ll learn how to use Text Recognition within our app. If you’re interested in ML Kit as a whole and this is your first time reading about it, check out the full introduction here.

Text Recognition is as simple as it gets. You pass in an image, Firebase processes it, and you get back the text it detected.

Note that since Firebase ML Kit’s cloud-based APIs are currently not production-ready, we’ll only be covering Text Recognition handled on the device. (If you didn’t know: yes, you can choose to run text recognition in the cloud instead, or at least you’ll be able to in good time.)

Add the Dependency and Metadata

implementation 'com.google.firebase:firebase-ml-vision:16.0.0'

As with any other Firebase service, we’ll start by adding this dependency to the app-level build.gradle. It’s the same one used for all the ML Kit vision features.

<application ...>
  ...
  <meta-data
      android:name="com.google.firebase.ml.vision.DEPENDENCIES"
      android:value="text" />
  <!-- To use multiple models: android:value="text,model2,model3" -->
</application>

Although this is optional, it’s highly recommended to add this to your AndroidManifest.xml as well. Doing so will have the machine learning model downloaded automatically when your app is installed from the Play Store. Otherwise, the model will be downloaded the first time you make an ML request, and you won’t get any results from ML operations until that download has finished.

Create the FirebaseVisionImage

FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

This object will prepare the image for ML Kit processing. You can make a FirebaseVisionImage from a bitmap, media.Image, ByteBuffer, byte array, or a file on the device.

From Bitmap

The simplest way to do it. The above code will work as long as your image is upright.

From media.Image

Use this when the image comes from the device’s camera, such as when taking a photo. You’ll need to work out the angle by which the image must be rotated to be turned upright. That depends on the device’s orientation while taking the photo, combined with the camera sensor’s default orientation (which is 90 on most devices, but can be different on others).

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

// Place this in an Activity: getWindowManager() and getSystemService() are Activity methods.
private int getRotationCompensation(String cameraId) throws CameraAccessException {
    // Get the device's current rotation relative to its natural orientation,
    // then look up the matching base angle.
    int deviceRotation = getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // Factor in the orientation of the camera sensor itself (90 on most devices).
    CameraManager cameraManager = (CameraManager) getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            // Should not happen; fall back to no rotation.
            // LOG_TAG is a String constant defined elsewhere in your class.
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(LOG_TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}

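// cameraId and mediaImage come from your own camera code.
// getRotationCompensation throws a checked exception, so declare it here or catch it.
private void someOtherMethod() throws CameraAccessException {
    int rotation = getRotationCompensation(cameraId);
    FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
}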

It’s a long method to make all those calculations, but it’s pretty copy-pastable. Then you can pass in the mediaImage and the rotation to generate your FirebaseVisionImage.
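To see what the arithmetic inside getRotationCompensation actually works out to, here is a minimal plain-Java sketch of just the formula, with no Android classes involved. It assumes the Surface.ROTATION_0 to Surface.ROTATION_270 constants are the integers 0 to 3 (as they are on Android); the class name RotationSketch and the method rotationCompensation are made up for illustration and are not part of any library.

```java
// Illustrative sketch only: mirrors the ORIENTATIONS table and the formula
// from getRotationCompensation, without any Android dependencies.
final class RotationSketch {

    // Index = display rotation (0..3, standing in for Surface.ROTATION_0..ROTATION_270),
    // value = the base angle from the ORIENTATIONS table in the article.
    private static final int[] ORIENTATIONS = {90, 0, 270, 180};

    // Returns the clockwise angle, in degrees, to hand to ML Kit.
    // sensorOrientation is what CameraCharacteristics.SENSOR_ORIENTATION would report.
    static int rotationCompensation(int deviceRotation, int sensorOrientation) {
        if (deviceRotation < 0 || deviceRotation >= ORIENTATIONS.length) {
            throw new IllegalArgumentException("Unknown display rotation: " + deviceRotation);
        }
        return (ORIENTATIONS[deviceRotation] + sensorOrientation + 270) % 360;
    }
}
```

With the common sensor orientation of 90, a device held in its natural orientation (rotation 0) gives (90 + 90 + 270) % 360 = 90, while a device at display rotation 1 (ROTATION_90) gives (0 + 90 + 270) % 360 = 0.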

From ByteBuffer

FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();

FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);

You’ll need the rotation method from the media.Image section above here as well, on top of having to build a FirebaseVisionImageMetadata object that describes your image (its width, height, format, and rotation).
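The width and height you put in the metadata have to describe the buffer you actually pass in. As a rough sanity check, here is a small plain-Java sketch that computes how many bytes an NV21 frame should contain, assuming even dimensions and the usual NV21 layout of 12 bits per pixel (a full-resolution Y plane followed by a half-size interleaved VU plane). Nv21Sketch, expectedNv21Size and looksLikeNv21 are hypothetical helper names, not part of Firebase or Android.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only: checks a buffer's size against the NV21 layout.
final class Nv21Sketch {

    // NV21 stores width * height luma bytes plus half as many chroma bytes.
    // Assumes width and height are even, as camera preview sizes normally are.
    static int expectedNv21Size(int width, int height) {
        return width * height * 3 / 2;
    }

    // True if the bytes remaining in the buffer match an NV21 frame of this size.
    static boolean looksLikeNv21(ByteBuffer buffer, int width, int height) {
        return buffer.remaining() == expectedNv21Size(width, height);
    }
}
```

For the 1280 x 720 frame used in the metadata above, that comes to 1,382,400 bytes. A mismatch usually means the width, height, or format in the metadata does not describe the buffer.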

From File

FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);

Simple to present here in one line, but fromFilePath throws an IOException, so you’ll be wrapping this in a try-catch block.

Instantiate a FirebaseVisionTextDetector

FirebaseVisionTextDetector detector = FirebaseVision.getInstance()
        .getVisionTextDetector();

The actual text recognition method belongs to this object.

Call detectInImage

detector.detectInImage(image)
        .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
            @Override
            public void onSuccess(FirebaseVisionText firebaseVisionText) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });

Call detectInImage on the detector, pass in the image, and add success and failure listeners. The success listener receives your text in a FirebaseVisionText object. The code above says it all, really.

Extract the text from blocks of recognised text

for (FirebaseVisionText.Block block: firebaseVisionText.getBlocks()) {
    String blockText = block.getText();

    for (FirebaseVisionText.Line line: block.getLines()) {
        String lineText = line.getText();

        for (FirebaseVisionText.Element element: line.getElements()) {
            String elementText = element.getText();
        }
    }
}

In your onSuccess method, you’ll have access to a FirebaseVisionText object. This contains blocks of text, which contain lines of text, which in turn contain elements (words). Iterate through them and choose how you want to extract your text.
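If all you want is a flat list of lines or words, the nested loops can be wrapped in small helpers. The sketch below shows the idea in plain Java so it can run outside Android. The Block, Line and Element classes here are hypothetical stand-ins that only mimic the getLines(), getElements() and getText() calls used above; in your app you would loop over the real FirebaseVisionText types instead.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: stand-in types mimicking the block > line > element hierarchy.
final class TextHierarchySketch {

    // Stand-in for FirebaseVisionText.Element (a single word).
    static final class Element {
        private final String text;
        Element(String text) { this.text = text; }
        String getText() { return text; }
    }

    // Stand-in for FirebaseVisionText.Line (a row of words).
    static final class Line {
        private final List<Element> elements = new ArrayList<>();
        Line(String... words) {
            for (String word : words) {
                elements.add(new Element(word));
            }
        }
        List<Element> getElements() { return elements; }
        String getText() {
            List<String> words = new ArrayList<>();
            for (Element element : elements) {
                words.add(element.getText());
            }
            return String.join(" ", words);
        }
    }

    // Stand-in for FirebaseVisionText.Block (a paragraph-like group of lines).
    static final class Block {
        private final List<Line> lines = new ArrayList<>();
        Block(Line... lines) {
            for (Line line : lines) {
                this.lines.add(line);
            }
        }
        List<Line> getLines() { return lines; }
    }

    // Flattens every line of every block into one list, in reading order.
    static List<String> collectLines(List<Block> blocks) {
        List<String> result = new ArrayList<>();
        for (Block block : blocks) {
            for (Line line : block.getLines()) {
                result.add(line.getText());
            }
        }
        return result;
    }

    // Flattens every word of every line of every block into one list.
    static List<String> collectWords(List<Block> blocks) {
        List<String> result = new ArrayList<>();
        for (Block block : blocks) {
            for (Line line : block.getLines()) {
                for (Element element : line.getElements()) {
                    result.add(element.getText());
                }
            }
        }
        return result;
    }
}
```

Which level you collect at depends on the use case: lines tend to suit things like receipts or signs, while individual words are handier when you are searching for a specific token.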

Conclusion

I love how easy this is to use. It gets weird when you have to do all that rotation work with media.Image and ByteBuffer, but even that’s just a copy-paste job.

This is only the first part of the ML Kit series so expect the tutorials on the other Firebase ML features to come in the following weeks!