Here’s the second part of the ML Kit series, and it’s going to be Face Detection! You pass in an image and you get the coordinates of each face’s eyes, ears, etc., and can recognise facial expressions like people’s sweet smiles!
Or you can pass in a video and then track and manipulate people’s faces in real-time (in the video of course, real life will have to wait). We won’t dive too much into the video part of it yet (I’ll try to get this updated ASAP to include it), so we’ll be focusing on image face detection for now.
If this is the first time you’ve heard about the Firebase ML Kit, check out its introduction here.
Add the Dependencies and Metadata
implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
As with any other Firebase Service, we’ll start by importing this dependency which is the same one used for all the ML Kit features.
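For context, here’s roughly where that line sits (a minimal sketch of an app-level build.gradle; it assumes you’ve already connected the app to Firebase and dropped in your google-services.json):

dependencies {
    // Same dependency for every ML Kit feature
    implementation 'com.google.firebase:firebase-ml-vision:16.0.0'
}

// At the bottom of the same file, so the google-services plugin can pick up your config
apply plugin: 'com.google.gms.google-services'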
<application ...>
    ...
    <meta-data
        android:name="com.google.firebase.ml.vision.DEPENDENCIES"
        android:value="face" />
    <!-- To use multiple models: android:value="face,model2,model3" -->
</application>
Although this is optional, it’s highly recommended to add this to your AndroidManifest.xml as well. Doing so will have the machine learning model downloaded along with your app from the Play Store. Otherwise, the model will only be downloaded when you make your first ML request, and no ML operations can return results until that download finishes.
Configuring Face Detection Settings
There are a few settings you might want to configure based on your app’s needs. I’m just going to rip this table straight off of the official docs.
Setting | Options | Description
---|---|---
Detection mode | FAST_MODE (default) / ACCURATE_MODE | Favor speed or accuracy when detecting faces.
Detect landmarks | NO_LANDMARKS (default) / ALL_LANDMARKS | Whether or not to attempt to identify facial “landmarks”: eyes, ears, nose, cheeks, mouth.
Classify faces | NO_CLASSIFICATIONS (default) / ALL_CLASSIFICATIONS | Whether or not to classify faces into categories such as “smiling” and “eyes open”.
Minimum face size | float (default: 0.1f) | The minimum size, relative to the image, of faces to detect.
Enable face tracking | false (default) / true | Whether or not to assign faces an ID, which can be used to track faces across images.
And here’s an example, also ripped straight off of the official docs.
FirebaseVisionFaceDetectorOptions options =
        new FirebaseVisionFaceDetectorOptions.Builder()
                .setModeType(FirebaseVisionFaceDetectorOptions.ACCURATE_MODE)
                .setLandmarkType(FirebaseVisionFaceDetectorOptions.ALL_LANDMARKS)
                .setClassificationType(FirebaseVisionFaceDetectorOptions.ALL_CLASSIFICATIONS)
                .setMinFaceSize(0.15f)
                .setTrackingEnabled(true)
                .build();
Create the FirebaseVisionImage
Here’s where my interpretation of the article starts differing from the official docs, although this first step is identical to the one in Text Recognition, so if you’ve already read how to create a FirebaseVisionImage there, this step will be EXACTLY the same.
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);
This object will prepare the image for ML Kit processing. You can make a FirebaseVisionImage from a bitmap, media.Image, ByteBuffer, byte array, or a file on the device.
From Bitmap
The simplest way to do it. The above code will work as long as your image is upright.
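If your bitmap isn’t upright, you can rotate it yourself before handing it over. Here’s a quick sketch; rotateBitmap is a hypothetical helper and rotationDegrees is whatever angle you’ve worked out for your image, not something ML Kit gives you:

// Hypothetical helper: rotates a bitmap so it's upright before detection
private Bitmap rotateBitmap(Bitmap source, float rotationDegrees) {
    Matrix matrix = new Matrix();
    matrix.postRotate(rotationDegrees);
    return Bitmap.createBitmap(source, 0, 0, source.getWidth(), source.getHeight(), matrix, true);
}

// Usage: e.g. an image that needs a 90-degree turn
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(rotateBitmap(bitmap, 90f));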
From media.Image
This is for cases like taking a photo with your device’s camera. You’ll need to work out the angle by which the image must be rotated to be upright, based on the device’s orientation when the photo was taken and the device’s default camera sensor orientation (90 on most devices, but it can differ on others).
private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
static {
    ORIENTATIONS.append(Surface.ROTATION_0, 90);
    ORIENTATIONS.append(Surface.ROTATION_90, 0);
    ORIENTATIONS.append(Surface.ROTATION_180, 270);
    ORIENTATIONS.append(Surface.ROTATION_270, 180);
}

private int getRotationCompensation(String cameraId) throws CameraAccessException {
    int deviceRotation = getWindowManager().getDefaultDisplay().getRotation();
    int rotationCompensation = ORIENTATIONS.get(deviceRotation);

    CameraManager cameraManager = (CameraManager) getSystemService(CAMERA_SERVICE);
    int sensorOrientation = cameraManager
            .getCameraCharacteristics(cameraId)
            .get(CameraCharacteristics.SENSOR_ORIENTATION);
    rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

    // Return the corresponding FirebaseVisionImageMetadata rotation value.
    int result;
    switch (rotationCompensation) {
        case 0:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            break;
        case 90:
            result = FirebaseVisionImageMetadata.ROTATION_90;
            break;
        case 180:
            result = FirebaseVisionImageMetadata.ROTATION_180;
            break;
        case 270:
            result = FirebaseVisionImageMetadata.ROTATION_270;
            break;
        default:
            result = FirebaseVisionImageMetadata.ROTATION_0;
            Log.e(LOG_TAG, "Bad rotation value: " + rotationCompensation);
    }
    return result;
}

private void someOtherMethod() throws CameraAccessException {
    int rotation = getRotationCompensation(cameraId);
    FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
}
It’s a long method to make all those calculations, but it’s pretty copy-pastable. Then you can pass the mediaImage and the rotation in to generate your FirebaseVisionImage.
From ByteBuffer
FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
        .setWidth(1280)
        .setHeight(720)
        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
        .setRotation(rotation)
        .build();

FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
You’ll need the above (from media.Image) rotation method as well, on top of having to build the FirebaseVisionImage with the metadata of your image.
From File
FirebaseVisionImage image = FirebaseVisionImage.fromFilePath(context, uri);
Simple to present here in one line, but you’ll be wrapping this in a try-catch block.
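In full it looks something like this (just a sketch of the try-catch; what you do when the file can’t be read is up to you):

FirebaseVisionImage image;
try {
    image = FirebaseVisionImage.fromFilePath(context, uri);
} catch (IOException e) {
    // The file couldn't be opened or read, e.g. a bad uri
    e.printStackTrace();
    return;
}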
Instantiate a FirebaseVisionFaceDetector
FirebaseVisionFaceDetector detector = FirebaseVision.getInstance()
        .getVisionFaceDetector(options);
The actual face detection method belongs to this object.
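If the default settings are all you need, you can skip the options object, and since the detector holds on to model resources it doesn’t hurt to close it once you’re done with it. A small sketch:

// Detector with the default settings (fast mode, no landmarks, no classification)
FirebaseVisionFaceDetector detector = FirebaseVision.getInstance().getVisionFaceDetector();

// ... run your detections ...

// Release the underlying resources when you no longer need the detector
try {
    detector.close();
} catch (IOException e) {
    e.printStackTrace();
}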
Call detectInImage
detector.detectInImage(image)
        .addOnSuccessListener(new OnSuccessListener<List<FirebaseVisionFace>>() {
            @Override
            public void onSuccess(List<FirebaseVisionFace> firebaseVisionFaces) {
                // Task completed successfully
                // ...
            }
        })
        .addOnFailureListener(new OnFailureListener() {
            @Override
            public void onFailure(@NonNull Exception e) {
                // Task failed with an exception
                // ...
            }
        });
Use the detector: call detectInImage, pass in the image, and add success and failure listeners. The success listener gives you a list of FirebaseVisionFaces. The code above says it all, really.
What you can do with each FirebaseVisionFace
Here’s where the fun begins… All the following code assumes you loop through the FirebaseVisionFaces and are currently handling an object called face, as sketched below.
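In other words, inside your onSuccess you’d end up with something like this (just the skeleton of the loop):

@Override
public void onSuccess(List<FirebaseVisionFace> firebaseVisionFaces) {
    for (FirebaseVisionFace face : firebaseVisionFaces) {
        // Everything in the snippets below goes in here
    }
}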
Get Face coordinates and rotation
Rect bounds = face.getBoundingBox();
float rotY = face.getHeadEulerAngleY(); // Head is rotated to the right rotY degrees
float rotZ = face.getHeadEulerAngleZ(); // Head is tilted sideways rotZ degrees
Get Facial Landmark Positions (Requires Landmark Detection enabled)
FirebaseVisionFaceLandmark leftEar = face.getLandmark(FirebaseVisionFaceLandmark.LEFT_EAR);
if (leftEar != null) {
    FirebaseVisionPoint leftEarPos = leftEar.getPosition();
}
Identify Facial Expressions (Requires Face Classification enabled)
if (face.getSmilingProbability() != FirebaseVisionFace.UNCOMPUTED_PROBABILITY) {
    float smileProb = face.getSmilingProbability();
}
Get Face Tracking ID (Requires Face Tracking enabled)
if (face.getTrackingId() != FirebaseVisionFace.INVALID_ID) {
    int id = face.getTrackingId();
}