To a person who has low vision or is visually impaired, what difference do video descriptions make? This was the question that inspired Mutsuki Ishii to pursue her research on the automatic generation of video descriptions (sometimes called audio descriptions).
Born and raised in Nagaoka (Niigata prefecture), 21-year-old Mutsuki Ishii joined the Information and Management System Engineering Department at the Nagaoka University of Technology. Armed with her interest in programming and machine learning, she is now working on research that aims to close gaps in video accessibility through the automatic generation of video descriptions designed for the visually impaired and others who cannot see videos adequately.
The Nagaoka Review (NR) spoke with Mutsuki Ishii (MI) to understand what makes her research and approach different, and to hear about her future goals and aspirations.
MI: I am currently enrolled in the Information and Management System Engineering program at NUT, where I am researching the automatic generation of video descriptions. Essentially, we want to add natural-language sentences that describe the contents of any given video.
In our research, our goal is to be able to analyze specific videos using video data that is available to anyone. This is still in its infancy, and right now our team is diligently working on improving the accuracy of our model. I began studying this topic in April 2021 with my laboratory members, and next month (December 2021) we will be organizing a workshop on it.
How ABC and SBS are making popular television programs more accessible to blind audiences through video descriptions. Image Source: ABC
MI: As we all know, streaming services have been on the rise and will continue to grow as big players such as Netflix, Amazon, HBO, and Disney (to name a few) continue to roll out new and exclusive content, unique programming, local content, etc. Even individuals produce their own videos through platforms such as YouTube.
The current challenge is the availability of video descriptions that describe what viewers see on their screen. Millions of videos are created every day, but video descriptions are still catching up. They are not widely available, and there are concerns about their accuracy. We believe that automatically generated video descriptions will be beneficial to people with hearing or visual impairments.
MI: I believe that there’s so much we can do in this area given the growth of the streaming industry. More and more users get connected to the Internet by the day. And more and more users are consuming video content. I want to contribute to this research so everyone can have equal access.
On a personal level, I have always wanted to study programming and machine learning. We have seen machine learning applied to static images. But when it comes to video, we can say that it is still in its infancy. Adding descriptions to videos can be quite complicated because there are many elements that must be considered, such as the video frames, events, actions, objects, etc.
MI: There’s a book called “Do Androids Dream of Electric Sheep?” and it made me interested in AI and robots. I also learned about the Turing Test for the first time from that book.
MI: As we all know, NUT is one of the top universities when it comes to developing students with technical expertise in electronics and communications technology. But what I like about the university is that it also focuses on the application of ICT to create innovative management and social systems. And to make sure that students have the expertise, knowledge, and skills, the university provides a variety of internships, exchange programs, and practical training.
When I found out that NUT even had a five-week language training course in Australia, it made me want to challenge myself. I believe that, beyond developing a range of skills and acquiring new knowledge, overseas exposure can be life-changing and can enhance my intercultural skills, confidence, and curiosity. I have lived in Nagaoka since I was born. Nagaoka is home, a familiar city, but sometimes I also long to live in a foreign country.
MI: Onsen! I always miss a hot bath whenever I travel abroad. It is a unique aspect of Japanese culture and heritage.
MI: As a matter of fact, there is. This month, I am going to participate in an online internship at Bucharest Robots, a robotics company that sells and rents humanoid service robots in Romania. They also have capabilities in robotic process automation (RPA) and other task automation software. Right now, I am leaning towards RPA, as I would like to develop automated document processing for hospitals. I know that people who work in hospitals are swamped with corona-related paperwork, and I would like to contribute to the future work of hospitals.
This is actually my first time participating in an internship, and I am looking forward to having conversations with local people in Romania. It would be interesting to experience an actual work environment and develop systems with my own ideas. I also hope to improve my communication and speaking skills in English through this experience.
I plan to go to graduate school. But of course, I want to complete my present research first, and after that, I want to integrate my work into a company's products, as long as it is suitable for high-speed processing of large amounts of video data.
Be sure to come to Japan! Then, experience Japan's unique culture. It will probably change your perspective of the world, help you understand other people’s way of thinking, and make you love your own culture much more.
Playing billiards and traveling abroad.