The beauty of computer vision is that it allows us to perform a wealth of useful inferential tasks with wide applicability: gesture recognition, face detection, object classification, and many others. However, most of these tasks require the full image as the input. This can pose privacy problems. Suppose you have a smart mirror in your bathroom, and you would like it to be able to perform face recognition to customize itself depending on whether it is you or your roommate in the bathroom. You probably would not want it to have a camera!
My goal is to design sensors that can perform some computer vision tasks but not others. For example, can we build a sensor that can perform face recognition but can never reconstruct a recognizable image? Can we build a sensor that can detect whether there is a face in the room but can never tell whose face it is? Such sensors would provide the benefits of computer vision without compromising privacy. Ultimately, this would enable secure systems for IoT, surveillance, smart homes, and other applications.
FlatCam is a lensless imaging system: the optical lens of a camera is replaced with a thin mask of apertures placed directly atop the image sensor, and an image reconstruction algorithm recovers the scene from the sensor measurements. Such a design allows for very thin (< 1 mm) and very cheap (≈ $5 in mass production) cameras. This is especially valuable in settings such as the Internet of Things, where one wants to attach cameras to many objects to perform inference tasks such as identifying faces or recognizing gestures. The low cost makes wide deployment affordable, and the thin form factor allows easy installation in a variety of surfaces and objects. What remains is to develop algorithms that can handle the non-idealities (e.g., noise) of lensless systems and still perform inference tasks successfully.
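To make the "mask plus reconstruction algorithm" idea concrete, here is a minimal sketch in Python. It assumes a separable measurement model, Y = Φ_L X Φ_Rᵀ + noise, and recovers the scene with Tikhonov-regularized least squares. The random ±1 matrices `phi_l` and `phi_r` are hypothetical stand-ins for calibrated mask transfer matrices, and the function names, sizes, noise level, and regularization weight are all illustrative assumptions rather than details of the actual system.

```python
import numpy as np


def simulate_measurement(scene, phi_l, phi_r, noise_std, rng):
    """Assumed separable lensless model: Y = phi_l @ X @ phi_r.T + noise."""
    clean = phi_l @ scene @ phi_r.T
    return clean + noise_std * rng.standard_normal(clean.shape)


def reconstruct(measurement, phi_l, phi_r, reg):
    """Tikhonov-regularized least squares, solved in closed form via two SVDs.

    Minimizes ||phi_l @ X @ phi_r.T - Y||_F^2 + reg * ||X||_F^2 over X.
    """
    u_l, s_l, vt_l = np.linalg.svd(phi_l, full_matrices=False)
    u_r, s_r, vt_r = np.linalg.svd(phi_r, full_matrices=False)
    # Rotate the measurement into the singular-vector bases.
    y_rot = u_l.T @ measurement @ u_r
    # Each coefficient is scaled by one product of singular values.
    sigma = np.outer(s_l, s_r)
    x_rot = (sigma * y_rot) / (sigma**2 + reg)
    # Rotate back to pixel space.
    return vt_l.T @ x_rot @ vt_r


# Toy example: a 16x16 scene observed through 32x32 sensor measurements.
rng = np.random.default_rng(0)
scene = rng.random((16, 16))
phi_l = rng.choice([-1.0, 1.0], size=(32, 16))  # hypothetical mask matrix
phi_r = rng.choice([-1.0, 1.0], size=(32, 16))  # hypothetical mask matrix

measurement = simulate_measurement(scene, phi_l, phi_r, noise_std=0.01, rng=rng)
estimate = reconstruct(measurement, phi_l, phi_r, reg=1e-3)

rel_err = np.linalg.norm(estimate - scene) / np.linalg.norm(scene)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The raw `measurement` array looks nothing like the scene, because every sensor pixel mixes light from the whole field of view. A recognizable image only appears after the inversion step. The regularization weight trades noise amplification against bias, which is one place where the non-idealities of a lensless system show up.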
We have taken a first step towards this goal by experimentally demonstrating reasonable accuracy on face detection and face verification with FlatCam using deep learning techniques. In particular, using a single convolutional neural network, we obtain 93% accuracy on the standard LFW test protocol for face verification with FlatCam images (compared to 97.6% on lens-based images with the same method). Another CNN, trained end to end, shows roughly an 11% decrease in face detection accuracy on the common FDDB test protocol compared to lens-based images.
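For readers unfamiliar with face verification, the task is to decide whether two images show the same person, and accuracy is the fraction of labeled pairs decided correctly. The sketch below shows one common way to score such pairs. It assumes each image has already been mapped to an embedding vector and that pairs are compared by cosine similarity against a threshold. The text above does not say how the network's outputs are turned into decisions, so this rule, the function names, and the toy data are all illustrative assumptions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n_pairs, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def verification_accuracy(emb_a, emb_b, same_person, threshold):
    """Fraction of pairs whose same/different prediction matches the label."""
    predicted_same = cosine_similarity(emb_a, emb_b) >= threshold
    return float(np.mean(predicted_same == same_person))


# Toy embeddings for three image pairs (hypothetical values, not real data).
emb_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
emb_b = np.array([[2.0, 0.0], [1.0, 0.0], [1.0, 0.9]])
same_person = np.array([True, False, True])

accuracy = verification_accuracy(emb_a, emb_b, same_person, threshold=0.5)
print(f"verification accuracy: {accuracy:.2f}")
```

In practice the threshold would be chosen on held-out pairs rather than fixed by hand.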
As cameras evolve towards becoming information gatherers rather than purely artistic devices, lensless systems such as the FlatCam allow ubiquitous inference with a low price tag and an ultra-thin package.
See the original FlatCam paper here.