To Know (and How to Know)
In a world of rushing people and blurred faces (and not just the metaphorical kind), the Inter IIT Tech Meet 10.0 panel on Bosch's Age and Gender Estimation turned many heads. The project is an early entrant in the growing, much-needed field of video evidence for criminal investigations, and it offers an intelligent way to make retrieving that evidence more efficient in both manpower and time. The panel was highly educational and exciting. In an interview after the panel, Mr. Samar Pratap Singh, a third-year electrical engineering undergraduate at IIT Kharagpur who was an associate manager for the Tech Meet, offered even deeper insight into the project.
At its core, Bosch's Age and Gender Estimation does essentially what the name suggests: it is an algorithm that works out the age and gender of the people in a given video. The process involves extracting images from the video, passing them through an image upsampler to produce higher-resolution versions, cropping the faces, and running them through a classifier that estimates each person's age and gender. "These days, you can see a similar feature in many newer smartphones," Samar explained, giving the example of Apple's face unlock. Such technology can not only recognise a particular face but is also being used to predict the user's age and other attributes. "What Bosch approached this project with is a goal to estimate these attributes in videos obtained from surveillance cams or government drones, where the difficulty is exponentially greater due to the poor quality of images." That difficulty is precisely why such an algorithm would be useful, why Bosch is tackling the problem head-on, and ultimately why the team chose this project in the first place.
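The stages described above (frames in, upsampling, face cropping, age and gender estimation) can be sketched as a small Python pipeline. This is only an illustration, not the teams' or Bosch's code: every name here (`run_pipeline`, `Estimate`, and the three stand-in stages) is hypothetical, the nearest-neighbour upsampler is a crude placeholder for a real super-resolution model, and the face detector and classifier are dummies that show where real models would plug in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Toy image type for illustration: a grayscale image as rows of pixel values.
Image = List[List[int]]


@dataclass
class Estimate:
    """One age/gender estimate for one face found in one frame."""
    frame_index: int
    age: int
    gender: str


def run_pipeline(
    frames: Iterable[Image],
    upsample: Callable[[Image], Image],
    detect_faces: Callable[[Image], List[Image]],
    classify: Callable[[Image], Tuple[int, str]],
) -> List[Estimate]:
    """Chain the stages: upsample each frame, crop faces, classify each face."""
    results: List[Estimate] = []
    for index, frame in enumerate(frames):
        sharp = upsample(frame)
        for face in detect_faces(sharp):
            age, gender = classify(face)
            results.append(Estimate(index, age, gender))
    return results


def nearest_neighbour_2x(image: Image) -> Image:
    """Placeholder upsampler: doubles width and height by repeating pixels.

    A real system would call a super-resolution network here instead.
    """
    out: Image = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def whole_frame_as_face(image: Image) -> List[Image]:
    """Placeholder detector: pretends the whole frame is a single face."""
    return [image]


def dummy_classifier(face: Image) -> Tuple[int, str]:
    """Placeholder classifier: returns a fixed, meaningless answer."""
    return (0, "unknown")


if __name__ == "__main__":
    toy_frames = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
    for estimate in run_pipeline(
        toy_frames, nearest_neighbour_2x, whole_frame_as_face, dummy_classifier
    ):
        print(estimate)
```

Passing the stages in as functions keeps the pipeline itself unchanged when a stand-in is swapped for a real model.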
To perform any image processing, the first step was to obtain images from the videos. Any video is made up of many frames, so the first task was to separate them out by freezing the video at intervals. The low resolution of the images in these videos posed a significant problem, so tackling it was the next element of the solution. The images collected by freezing frames of the video were passed through the image upsampler, ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks). This network reconstructs higher-resolution images from lower-resolution ones and is widely used for image enhancement. The process up to this point seems straightforward enough, but the complexity comes with making it actually work. "Developing a pipeline was the real challenge. You can't just hand a person a video, ask them to take screenshots from videos, write a code themselves to upload certain images, match the screenshots with them, etc. There needs to be a smooth process and a feasible tradeoff between speed and accuracy - hence, a pipeline," Samar summarised, emphasising that the right balance varies with the application. Bosch, for example, focuses on accuracy - that is, sampling more frames (for example, 15 FPS) at the cost of longer processing time, a tradeoff that was possible because the algorithm does not yet need to produce real-time output. He added that the teams in the competition focused on accuracy for the same reason and, overall, did a fantastic job. They generated their datasets from videos of their friends and of sports matches, and the results were incredible.
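The speed-versus-accuracy tradeoff comes down largely to how many frames are pulled from each second of video. As a rough sketch, the hypothetical helper below (`sample_frame_indices` is not from the competition code) picks which frame numbers to keep for a chosen sampling rate; the 30 FPS source rate in the usage example is an assumption, while 15 FPS is the figure quoted above.

```python
from typing import List


def sample_frame_indices(
    total_frames: int, source_fps: float, target_fps: float
) -> List[int]:
    """Return the indices of the frames to keep when sampling at target_fps.

    A higher target_fps keeps more frames (more work, more chances to catch
    a usable face); a lower one keeps fewer (faster, but may miss faces).
    """
    if total_frames < 0:
        raise ValueError("total_frames must not be negative")
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")

    # We cannot sample more often than the source provides frames,
    # so the step between kept frames is never less than one frame.
    step = max(source_fps / target_fps, 1.0)

    indices: List[int] = []
    count = 0
    while True:
        index = int(count * step)
        if index >= total_frames:
            break
        indices.append(index)
        count += 1
    return indices


if __name__ == "__main__":
    one_second = 30  # frames in one second of an assumed 30 FPS source
    accurate = sample_frame_indices(one_second, source_fps=30, target_fps=15)
    fast = sample_frame_indices(one_second, source_fps=30, target_fps=5)
    print(len(accurate), "frames to process at 15 FPS")
    print(len(fast), "frames to process at 5 FPS")
```

Every kept frame must still pass through upsampling and classification, so tripling the sampling rate roughly triples the processing time, which is acceptable only when real-time output is not required.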
As for setbacks, the greatest was acquiring a dataset large enough to test and run the algorithm. From the competition's perspective, it would have been more helpful had the company provided one itself, because surveillance footage that can be used with permission is hard to find online. Even when such footage is found, the people in it aren't necessarily documented alongside it. A comprehensive dataset would also have helped the teams model the algorithm better, because videos obtained from different settings may call for different adjustments. For these reasons, the final video that the participants used came not from a CCTV camera but from a much clearer one. Thus, Samar concluded, "If this algorithm is to be scaled to greater heights, and even be used for national security purposes eventually, a viable dataset would greatly help."
Lastly, we talked about the project's varied applications and its future. Bosch's Age and Gender Estimation has the potential to bring about a lot of change. Currently, manually drawing evidence even from ordinary security camera footage is a big task, and often the faces aren't clearly visible. At a larger scale, the algorithm could even have military-grade benefits. "It's a work in progress. But with the growing demand for machine learning and various fields of image processing, the only way to go is up," Samar said.
Several of India's best machine learning enthusiasts and scientists came to this Inter IIT Tech Meet and participated in competitions such as this one. At the end of the interview, Samar reiterated that it was amazing to see the brightest minds of the country work in teams on something as fascinating as this. Various companies were also there to witness the events and panels, and there was even talk of offering internships based on competition performance. Overall, the Tech Meet organised by IIT Kharagpur was a hugely successful affair, full of incredible projects and people. And like the future of Bosch's algorithm, the only way it's going is up.