Using the Camera2 API to Get Depth Images

Introduction: This post uses the Camera2 API to capture depth images and process them. It makes a good primer and reference for anyone learning about depth images. The code for the article is at https://github.com/plluke/tof

Working with the 3D Camera on the Samsung S10 5G

Original article: https://medium.com/swlh/working-with-the-3d-camera-on-the-samsung-s10-5g-4782336783c

“Say, that’s a nice new Samsung S10 5G device your user has got there,” he said with the least amount of subtlety he could muster. “It would be a shame if a cloud-based video conferencing service that also provided a great Android app didn’t use the 3D camera to blur the user’s background so that it provides more privacy,” he threatened. I personally think we should all cower before imaginary techno-mobsters so let’s dive right in.

Background (Pun Intended)

The concept of a “Privacy Mode” or background blur is pretty well understood. The visual effect is similar to bokeh but the business value is privacy, preventing leaking information, and overall visual ambience. Something like this:

[Image: example of a background-blur / privacy-mode effect]

The key is generating a mask that separates the areas you want to blur from the areas you don't. Intuitively, if you knew the distance of each pixel in the image you could generate this mask, but distance isn't the only approach. You could also use a trained neural network to distinguish foreground and background without any distance information at all. But that's a different post.

We’re here to play around with the 3D Camera (hereafter ToF camera) on the Samsung S10 5G. Why? Because it’s there and it’s useful to evaluate all the tools at your disposal. The example app/code I used for this post is available on GitHub.

What is Time of Flight?

[Image: Time-of-Flight principle illustration]

Time-of-Flight technology refers to measuring the distance to a point by tracking the time it takes for a beam of light to travel to that point. The speed of light is constant, so once you have the time, you also have the distance. A Time-of-Flight camera is a system that can track distance over a sensor area using the Time-of-Flight principle. There are different ways of figuring out the elapsed time (the S10 5G uses phase-shift detection on an infrared carrier wave, 940 nm IIRC), but the fundamental theory remains the same. There are pros and cons to this approach versus other popular approaches (e.g. Structured Light, as used in Apple's True Depth Camera), but for our purposes it's just another source of distance data.
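To put numbers on it: with a directly measured round-trip time t, the distance is simply

    d = c · t / 2

and with phase-shift detection the sensor instead measures the phase offset Δφ of a modulated signal, recovering the distance as

    d = c · Δφ / (4π · f_mod)

where f_mod is the modulation frequency (my notation, not the post's; note that the 940 nm figure above is the carrier wavelength, not the modulation rate). The division by two, or the 4π rather than 2π, accounts for the light making a round trip.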

The ToF Camera

The front-facing ToF sensor on the Samsung S10 5G is a Sony IMX316. It outputs frames in the DEPTH16 image format with a resolution of 240x180. It has a 75° field of view, which roughly matches the S10 5G’s front-facing camera’s field of view of 80°.

Watch out: The S10 5G (and the Note10+ 5G as well) exposes two front-facing RGB cameras through the Camera2 API. Both are derived from the same sensor; the 6.5MP camera is just a crop of the 10MP one. If you want to actually implement the mask yourself, make sure to use frames from the 10MP camera.
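Here's a minimal sketch of how you might pick it out (my own addition, not from the original post): select whichever front-facing camera offers the largest JPEG output. Verify the reported sizes on your own device.

private String findLargestFrontCamera(CameraManager cameraManager) throws CameraAccessException {
    String bestCamera = null;
    long bestPixels = 0;
    for (String id : cameraManager.getCameraIdList()) {
        CameraCharacteristics chars = cameraManager.getCameraCharacteristics(id);
        Integer facing = chars.get(CameraCharacteristics.LENS_FACING);
        if (facing == null || facing != CameraMetadata.LENS_FACING_FRONT) continue;
        StreamConfigurationMap map = chars.get(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP);
        if (map == null) continue;
        // The 10MP camera should report a larger maximum output than the 6.5MP crop
        for (Size size : map.getOutputSizes(ImageFormat.JPEG)) {
            long pixels = (long) size.getWidth() * size.getHeight();
            if (pixels > bestPixels) {
                bestPixels = pixels;
                bestCamera = id;
            }
        }
    }
    return bestCamera;
}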

You can find the ToF camera through CameraCharacteristics. Here’s an example:

private String findTofCamera(CameraManager cameraManager) throws CameraAccessException {
    for (String camera : cameraManager.getCameraIdList()) {
        CameraCharacteristics chars = cameraManager.getCameraCharacteristics(camera);
        int[] capabilities = chars.get(CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES);
        Integer facing = chars.get(CameraCharacteristics.LENS_FACING);
        boolean facingFront = facing != null && facing == CameraMetadata.LENS_FACING_FRONT;
        boolean depthCapable = false;
        if (capabilities != null) {
            for (int capability : capabilities) {
                depthCapable = depthCapable
                        || capability == CameraMetadata.REQUEST_AVAILABLE_CAPABILITIES_DEPTH_OUTPUT;
            }
        }
        if (depthCapable && facingFront) {
            // Note that the sensor size is much larger than the available capture size
            SizeF sensorSize = chars.get(CameraCharacteristics.SENSOR_INFO_PHYSICAL_SIZE);
            Log.i(TAG, "Sensor size: " + sensorSize);

            // Since the sensor size doesn't actually match the capture size, and because
            // it reports an extremely wide aspect ratio, this FoV is bogus
            float[] focalLengths = chars.get(CameraCharacteristics.LENS_INFO_AVAILABLE_FOCAL_LENGTHS);
            if (focalLengths != null && focalLengths.length > 0) {
                double fov = 2 * Math.atan(sensorSize.getWidth() / (2 * focalLengths[0]));
                Log.i(TAG, "Calculated FoV (radians): " + fov);
            }
            return camera;
        }
    }
    return null;
}

Once you have the camera ID, you can open the camera like any other. Since DEPTH16 is not a great format for a direct preview, we'll want to attach an ImageReader to a preview session and read frames from it directly.
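A minimal sketch of that setup might look like the following (camera is the opened CameraDevice, handler is a Handler for your camera thread, and onDepthImage is a placeholder for your own processing; the 240x180 size matches the sensor's DEPTH16 output):

private void startDepthStream(final CameraDevice camera, final Handler handler)
        throws CameraAccessException {
    ImageReader depthReader = ImageReader.newInstance(240, 180, ImageFormat.DEPTH16, 2);
    depthReader.setOnImageAvailableListener(reader -> {
        Image image = reader.acquireLatestImage();
        if (image != null) {
            onDepthImage(image); // placeholder: process the DEPTH16 frame
            image.close();
        }
    }, handler);

    final CaptureRequest.Builder request = camera.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW);
    request.addTarget(depthReader.getSurface());
    camera.createCaptureSession(Collections.singletonList(depthReader.getSurface()),
            new CameraCaptureSession.StateCallback() {
                @Override
                public void onConfigured(CameraCaptureSession session) {
                    try {
                        session.setRepeatingRequest(request.build(), null, handler);
                    } catch (CameraAccessException e) {
                        Log.e(TAG, "Failed to start depth stream", e);
                    }
                }

                @Override
                public void onConfigureFailed(CameraCaptureSession session) {
                    Log.e(TAG, "Depth session configuration failed");
                }
            }, handler);
}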

Extracting Range Information

Once you have an image in DEPTH16 format, each pixel gives you both a range (distance) and a confidence measure. The DEPTH16 documentation tells you exactly what to do, but here is an example of generating an int[] mask from an Image.

private int[] getDepthMask(Image image) {
    // DEPTH16 frames have a single plane of 16-bit samples
    ShortBuffer shortDepthBuffer = image.getPlanes()[0].getBuffer().asShortBuffer();
    int[] mask = new int[WIDTH * HEIGHT]; // WIDTH = 240, HEIGHT = 180
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            int index = y * WIDTH + x;
            short depthSample = shortDepthBuffer.get(index);
            mask[index] = extractRange(depthSample, 0.1f);
        }
    }
    return mask;
}

private int extractRange(short sample, float confidenceFilter) {
    // Per the DEPTH16 format: bits 0-12 are the range in millimeters,
    // bits 13-15 are the confidence (0 means 100%, otherwise (N - 1) / 7)
    int depthRange = sample & 0x1FFF;
    int depthConfidence = (sample >> 13) & 0x7;
    float depthPercentage = depthConfidence == 0 ? 1.f : (depthConfidence - 1) / 7.f;
    return depthPercentage > confidenceFilter ? depthRange : 0;
}

You can filter for higher confidence levels, but for the privacy blur feature I found it better to let all confidence values through (except 0) and then do a bit of signal processing afterwards. Setting the confidence minimum higher reduces overall noise somewhat but removes too much useful information.
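The actual processing lives in the example app, but purely as an illustration of the kind of cleanup that helps (this is not the post's method), here's a naive hole fill that replaces zeroed-out pixels with the average of their nonzero neighbors:

private int[] fillHoles(int[] mask) {
    int[] out = mask.clone();
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            int index = y * WIDTH + x;
            if (mask[index] != 0) continue;
            // Average the nonzero 8-connected neighbors, if any
            int sum = 0, count = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= WIDTH || ny < 0 || ny >= HEIGHT) continue;
                    int neighbor = mask[ny * WIDTH + nx];
                    if (neighbor != 0) { sum += neighbor; count++; }
                }
            }
            if (count > 0) out[index] = sum / count;
        }
    }
    return out;
}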

Visualizing Range Information

I have a bug in my brain where I can’t easily visualize an int[] to save my life. I need that #tradlife ARGB. So let’s convert the mask to something that looks good!

The approach here is to simply normalize the range to values between 0 and 255 and then assign that to the green channel of an ARGB pixel. Since I only really care about a section of the foreground, I'm going to clamp the ranges to an arbitrary min/max value and then scale everything else down. (In a real implementation, a FaceDetection routine would be useful as a way to home in on an area of the mask to establish your min/max.) Here's an example:

private int normalizeRange(int range) {
    // Clamp to the arbitrary min/max window first...
    float normalized = Math.max(RANGE_MIN, Math.min(RANGE_MAX, (float) range));
    // ...then scale the clamped value to 0-255
    normalized = (normalized - RANGE_MIN) / (RANGE_MAX - RANGE_MIN) * 255;
    return (int) normalized;
}

Once normalized, simply create a bitmap and loop through and assign the colors:

private Bitmap convertToRGBBitmap(int[] mask) {
    // ARGB_8888 (ARGB_4444 is deprecated and may be backed by ARGB_8888 anyway)
    Bitmap bitmap = Bitmap.createBitmap(WIDTH, HEIGHT, Bitmap.Config.ARGB_8888);
    for (int y = 0; y < HEIGHT; y++) {
        for (int x = 0; x < WIDTH; x++) {
            int index = y * WIDTH + x;
            // Put the normalized range into the green channel
            bitmap.setPixel(x, y, Color.argb(255, 0, mask[index], 0));
        }
    }
    return bitmap;
}
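As a design note, Bitmap.setPixel makes a method call per pixel. At 240x180 that's tolerable, but building the color array first and handing it to Bitmap.createBitmap in one call is the more idiomatic (and faster) route:

private Bitmap convertToRGBBitmapBulk(int[] mask) {
    int[] colors = new int[WIDTH * HEIGHT];
    for (int i = 0; i < colors.length; i++) {
        colors[i] = Color.argb(255, 0, mask[i], 0);
    }
    return Bitmap.createBitmap(colors, WIDTH, HEIGHT, Bitmap.Config.ARGB_8888);
}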

Once you have the bitmap, you can render it onto a TextureView:

Canvas canvas = textureView.lockCanvas();
if (canvas != null) { // lockCanvas() returns null if the surface isn't available
    canvas.drawBitmap(bitmap, transform, null);
    textureView.unlockCanvasAndPost(canvas);
}

The frame comes out in landscape orientation, so make sure to rotate it to fit the view with an appropriate Matrix (see the example app for details; a rough sketch of the transform follows). Once you've done all that, you get a preview.
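For reference, the transform might look something like this (my own sketch, assuming a 90° rotation; the actual transform in the example app, and the rotation direction on your device, may differ):

// Rotate the 240x180 landscape frame 90 degrees about the origin,
// shift it back into view, then scale the rotated frame to fill the TextureView
Matrix transform = new Matrix();
transform.postRotate(90);
transform.postTranslate(HEIGHT, 0); // after the rotation the frame sits at negative x
transform.postScale((float) textureView.getWidth() / HEIGHT,
        (float) textureView.getHeight() / WIDTH);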

[Image: the resulting depth preview]

Nice!
