Firebase ML Kit 4: Image Labelling

Ever tried out Google Lens and saw the app just tell you everything the camera sees?

That’s exactly what Image Labelling is. You pass in an image and in return you get a list of FirebaseVisionLabel objects, each of which contains the label, the confidence (how sure ML Kit is of its correctness), and the entity ID, which references the entity’s ID in Google’s Knowledge Graph.

If this is your first time seeing Firebase ML Kit, I recommend you check out my introduction to it to get a nice little overview of the whole thing.

The process of running this in your app’s code is fairly simple, and it’s very similar to the other ML Kit operations.

Add the Dependencies and Metadata

implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
implementation 'com.google.firebase:firebase-ml-vision-image-label-model:15.0.0'

Add these dependencies to your app-level build.gradle file.

<application ...>
  ...
  <meta-data
      android:name="com.google.firebase.ml.vision.DEPENDENCIES"
      android:value="label" />
  <!-- To use multiple models: android:value="label,model2,model3" -->
</application>

Then add this meta-data tag to your AndroidManifest.xml file. This ensures the ML Kit model is downloaded along with your app; otherwise, the model is downloaded the first time the operation runs, which tends to slow it down.

Setting the Confidence Level

FirebaseVisionLabelDetectorOptions options =
        new FirebaseVisionLabelDetectorOptions.Builder()
                .setConfidenceThreshold(0.8f)
                .build();

By using a FirebaseVisionLabelDetectorOptions you can set the minimum confidence level needed for an entity to be returned. By default, this value is 0.5.

Get the FirebaseVisionImage

The first step to most ML Kit operations is to get a FirebaseVisionImage, which you can get from a Bitmap, a media.Image, a ByteBuffer, a byte[], or a file on the device.

From Bitmap

FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

Your image must be upright for this to work. This would normally be the simplest way to get a FirebaseVisionImage.
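
For example, here’s a minimal sketch that decodes a Bitmap from a drawable resource and wraps it (R.drawable.photo is a hypothetical resource, and this assumes you’re inside an Activity):

// Decode a bundled image into a Bitmap, then wrap it for ML Kit
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.photo);
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);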

From media.Image

Such as when taking a photo with your device’s camera. You’ll need to work out the angle by which the image must be rotated to be upright, based on the device’s orientation when the photo was taken, combined with the device’s default camera sensor orientation (90 degrees on most devices, though it can differ on others).

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

private int getRotationCompensation(String cameraId) throws CameraAccessException {
    // Get the device's current rotation relative to its native orientation,
    // then look up the compensation angle for it.
    int deviceRotation = getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // Combine that with the camera sensor's own orientation.
    CameraManager cameraManager = (CameraManager) getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(LOG_TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}

private void someOtherMethod() throws CameraAccessException {
    int rotation = getRotationCompensation(cameraId);
    FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
}

It’s a long method to make all those calculations, but it’s pretty copy-pastable. Then you can pass in the mediaImage and the rotation to generate your FirebaseVisionImage.

From ByteBuffer

FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();
 
FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);

You’ll need the rotation method from the media.Image section above, then use this approach to build the FirebaseVisionImage along with your image’s metadata.

From File

FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);

Simple to present here in one line, but you’ll need to wrap this in a try-catch block since fromFilePath throws an IOException.
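
As a rough sketch, where uri is a hypothetical content URI pointing at an image on the device:

FirebaseVisionImage image;
try {
    image = FirebaseVisionImage.fromFilePath(context, uri);
} catch (IOException e) {
    // Couldn't read the image at that URI
    e.printStackTrace();
    return;
}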

Instantiate a FirebaseVisionLabelDetector

FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
        .getVisionLabelDetector(options);

The detector contains the detectInImage method, which does the main work of the process. Pass in your options to set the confidence threshold, or leave the parameters empty if you’re fine with the default value.
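
For instance, if the default 0.5 threshold works for you, you can skip the options entirely:

FirebaseVisionLabelDetector detector = FirebaseVision.getInstance()
        .getVisionLabelDetector();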

Call detectInImage

Task<List<FirebaseVisionLabel>> result =
        detector.detectInImage(image)
                .addOnSuccessListener(
                        new OnSuccessListener<List<FirebaseVisionLabel>>() {
                            @Override
                            public void onSuccess(List<FirebaseVisionLabel> labels) {
                                // Task completed successfully
                                // ...
                            }
                        })
                .addOnFailureListener(
                        new OnFailureListener() {
                            @Override
                            public void onFailure(@NonNull Exception e) {
                                // Task failed with an exception
                                // ...
                            }
                        });

Use the detector, call detectInImage, add success and failure listeners, and in the success listener you have access to the list of FirebaseVisionLabels from which you can get your information. The code above says it all, really.

Get the Information from each label

for (FirebaseVisionLabel label: labels) {
    String text = label.getLabel();
    String entityId = label.getEntityId();
    float confidence = label.getConfidence();
}

Each FirebaseVisionLabel in the list represents a different entity detected in the image. These are the three pieces of information you can get from each.
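
As a quick example, you could dump each label to Logcat inside that loop (LOG_TAG is assumed to be defined somewhere in your class, as in the rotation snippet earlier):

for (FirebaseVisionLabel label : labels) {
    // e.g. "Dog (/m/0bt9lr): 0.95"
    Log.d(LOG_TAG, label.getLabel() + " (" + label.getEntityId() + "): " + label.getConfidence());
}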

Conclusion

I’ll be honest, this is the one part of ML Kit where I feel like it could offer a little more. If we could get the coordinates of each entity (like we do in every other ML Kit operation), we could do some pretty sick stuff with that.

If you want to learn more about ML Kit operations like Text Recognition, Face Detection, and Barcode Scanning, check out the rest of my ML Kit series.