Firebase ML Kit 5: Landmark Recognition

When an app can recognise well-known landmarks, it can add a whole new level of experience and immersion, and this doesn’t only apply to tourism-related apps. Say, for example, you have a books app and you stumble across a library. You could take a picture of it, let ML Kit recognise the place, and then tell the user whether a certain book can be found in that library.

One thing to note, though, is that this particular service is only available as a Cloud Vision API, not as a regular on-device API. That means, at the very least, you’ll need to upgrade to the Blaze plan to use this feature.

This is the 5th post in the ML Kit series. If this is your first time hearing about ML Kit, check out the introduction here.

Add the Dependency

implementation 'com.google.firebase:firebase-ml-vision:16.0.0'

Add this dependency to your app-level build.gradle file. Unlike the other ML Kit features, you won’t need to add any metadata to your AndroidManifest.xml file.

Configuring the Landmark Detector (Optional)

FirebaseVisionCloudDetectorOptions options =
    new FirebaseVisionCloudDetectorOptions.Builder()
        .setModelType(FirebaseVisionCloudDetectorOptions.LATEST_MODEL)
        .setMaxResults(15)
        .build();

There are two settings you can change: the model type, which is STABLE_MODEL by default, and the maximum number of results, which is 10 by default.

Create the FirebaseVisionImage

The first step in most ML Kit operations is to get a FirebaseVisionImage, which you can create from a Bitmap, a media.Image, a ByteBuffer, a byte[], or a file on the device.

From Bitmap

FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

Your image must be upright for this to work. This would normally be the simplest way to get a FirebaseVisionImage.
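If your bitmap might not be upright, you can rotate it first. Here’s a minimal sketch, assuming you already know the rotation angle in degrees (for example, from the image’s EXIF data):

// Rotate a bitmap by the given angle before passing it to ML Kit
private Bitmap rotateBitmap(Bitmap source, float degrees) {
    Matrix matrix = new Matrix();
    matrix.postRotate(degrees);
    return Bitmap.createBitmap(source, 0, 0, source.getWidth(), source.getHeight(), matrix, true);
}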

From media.Image

This is what you get, for example, when taking a photo with the device’s camera. You’ll need to work out the angle by which the image must be rotated to be upright, based on the device’s orientation while taking the photo and the default orientation of the camera sensor (which is 90 degrees on most devices, but can differ on others).

private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

private int getRotationCompensation(String cameraId) throws CameraAccessException {
    // Get the device's current rotation relative to its native orientation
    int deviceRotation = getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    // Combine it with the orientation of the camera sensor
    CameraManager cameraManager = (CameraManager) getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(LOG_TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}

private void someOtherMethod() throws CameraAccessException {
    int rotation = getRotationCompensation(cameraId);
    FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
}

It’s a long method to make all those calculations, but it’s pretty copy-pastable. Then you can pass the mediaImage and the rotation to generate your FirebaseVisionImage.

From ByteBuffer

FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();
 
FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);

You’ll need the getRotationCompensation method from the media.Image section above to get the rotation, then use this method to build the FirebaseVisionImage from the buffer together with your image’s metadata.
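As a rough idea of where the buffer might come from: with the legacy android.hardware.Camera API, the preview callback hands you each frame as an NV21 byte[], which you can wrap in a ByteBuffer. A sketch, assuming camera and the metadata object from above are already set up:

camera.setPreviewCallback(new Camera.PreviewCallback() {
    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        // Preview frames arrive in NV21 format, matching the metadata above
        ByteBuffer buffer = ByteBuffer.wrap(data);
        FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
        // Pass 'image' to the detector here
    }
});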

From File

FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);

Simple to present here in one line, but since fromFilePath throws an IOException, you’ll be wrapping this in a try-catch block, as shown below.
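For example:

FirebaseVisionImage image;
try {
    image = FirebaseVisionImage.fromFilePath(context, uri);
} catch (IOException e) {
    // The file couldn't be opened or decoded
    e.printStackTrace();
    return;
}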

Instantiate FirebaseVisionCloudLandmarkDetector

FirebaseVisionCloudLandmarkDetector detector = FirebaseVision.getInstance()
        .getVisionCloudLandmarkDetector(options);

Leave the parameters empty if you haven’t configured any options.
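Without any options, it looks like this:

FirebaseVisionCloudLandmarkDetector detector = FirebaseVision.getInstance()
        .getVisionCloudLandmarkDetector();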

Call detectInImage

Task<List<FirebaseVisionCloudLandmark>> result = detector.detectInImage(image)
        .addOnSuccessListener(new OnSuccessListener<List<FirebaseVisionCloudLandmark>>() {
            @Override
            public void onSuccess(List<FirebaseVisionCloudLandmark> firebaseVisionCloudLandmarks) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });

Use the detector, call detectInImage, and add success and failure listeners. In onSuccess you have access to a list of FirebaseVisionCloudLandmark objects, from which you can get information about each landmark. The code above says it all really.

Parsing Landmark Information

for (FirebaseVisionCloudLandmark landmark: firebaseVisionCloudLandmarks) {

    Rect bounds = landmark.getBoundingBox();
    String landmarkName = landmark.getLandmark();
    String entityId = landmark.getEntityId();
    float confidence = landmark.getConfidence();

    // Multiple locations are possible, e.g., the location of the depicted
    // landmark and the location the picture was taken.
    for (FirebaseVisionLatLng loc: landmark.getLocations()) {
        double latitude = loc.getLatitude();
        double longitude = loc.getLongitude();
    }
}

In the onSuccess method, you have access to a list of landmarks. Loop through them, and from each one you can get the bounding box, confidence level, entity ID, landmark name, and locations (of which there may be more than one).
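As one way to put those fields to use, you could pick the highest-confidence result to show the user. Here’s a minimal sketch, where displayLandmark is a hypothetical method of your own:

FirebaseVisionCloudLandmark best = null;
for (FirebaseVisionCloudLandmark landmark : firebaseVisionCloudLandmarks) {
    // Keep the landmark with the highest confidence score
    if (best == null || landmark.getConfidence() > best.getConfidence()) {
        best = landmark;
    }
}
if (best != null) {
    // displayLandmark is a hypothetical method in your own UI code
    displayLandmark(best.getLandmark(), best.getConfidence());
}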

Conclusion

I’m disappointed myself that this feature currently works only on the Blaze plan. I do hope they’ll make an on-device version of this service so that we can use it on the Spark plan as well.

If you want to learn more about ML Kit operations like Text Recognition, Face Detection, Barcode Scanning, and Image Labelling, check out the rest of my ML Kit series.