2001: A Space Odyssey, Back to the Future, A.I. If you watched these movies when they first came out, you may have thought the technologies they imagined impossible to build, especially by 2020. Now some of those predictions have come true and even surpassed what we thought possible. The power of artificial intelligence can be a little frightening, but embracing it has real benefits, especially for people with hearing loss.
Audio AI: Looking to listen at the cocktail party
Back in August 2018, Google presented a new audio-visual speech separation model called "Looking to Listen at the Cocktail Party." They hoped to replicate the natural human ability to focus on one conversation and drown out other voices, as we do at a cocktail party. With this model, a user watching a video of two people speaking simultaneously can select the face of the speaker they want to hear, and the speakers will broadcast that one voice, not both.
The method Google uses to take in mixed audio, separate it by speaker, and play back one voice over the other could in theory be built into hearing aids. Hearing loss often makes it harder to tell who is speaking, and focusing on one voice in a crowd can be challenging, so integrating this technology into hearing aids could improve speech comprehension. It could also benefit speech recognition and enhancement, as well as closed captioning for videos with multiple speakers.
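At its core, this kind of separation is often done by predicting a time-frequency mask for each speaker and multiplying it against the mixed audio's spectrogram. Below is a minimal NumPy sketch of just that masking step, using toy data in place of a trained network (the "ideal" masks here are computed from the known sources, which a real system would not have):

```python
import numpy as np

def separate_with_masks(mixture_spec, masks):
    """Apply per-speaker masks (values in [0, 1]) to a mixed spectrogram.

    mixture_spec: (freq, time) magnitude spectrogram of the mixed audio.
    Returns one estimated spectrogram per speaker.
    """
    return [mask * mixture_spec for mask in masks]

# Toy example: two "speakers" occupying different frequency bands.
rng = np.random.default_rng(0)
low = np.zeros((4, 5)); low[:2] = rng.random((2, 5))    # speaker A: low freqs
high = np.zeros((4, 5)); high[2:] = rng.random((2, 5))  # speaker B: high freqs
mixture = low + high

# Ideal ratio masks -- the quantity a separation network learns to predict.
eps = 1e-8
mask_a = low / (mixture + eps)
mask_b = high / (mixture + eps)

est_a, est_b = separate_with_masks(mixture, [mask_a, mask_b])
```

In the real system, the masks would come from a neural network that looks at both the audio and the selected speaker's face; only the final multiply-and-reconstruct step is shown here.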
Lip-reading artificial intelligence
Imagine teaching technology to understand not just the human voice but mouth movements as well. Researchers are now building artificial intelligence that watches videos and determines the words being said purely by lip-reading.
To accomplish this, researchers sifted through 140,000 hours of YouTube videos to pull out 27,000 words and the accompanying footage of mouths saying them. Clips with poor sound or video quality, too much background noise, or no speech were removed, so the A.I. could focus on learning what words look like when spoken.
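That cleanup step is essentially a filter over clip metadata. A minimal sketch, with entirely hypothetical field names and thresholds (the article doesn't describe the actual criteria in detail):

```python
# Hypothetical clip metadata; field names and values are illustrative,
# not taken from the actual dataset.
clips = [
    {"id": 1, "snr_db": 25, "has_speech": True,  "blurry": False},
    {"id": 2, "snr_db": 5,  "has_speech": True,  "blurry": False},  # too noisy
    {"id": 3, "snr_db": 30, "has_speech": False, "blurry": False},  # no speech
    {"id": 4, "snr_db": 28, "has_speech": True,  "blurry": True},   # poor video
]

def keep(clip, min_snr_db=15):
    """Keep only clean clips: clear audio, speech present, sharp video."""
    return clip["snr_db"] >= min_snr_db and clip["has_speech"] and not clip["blurry"]

clean = [c["id"] for c in clips if keep(c)]
print(clean)  # only clip 1 survives
```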
Researchers then taught the model which video accompanies which words, and reversed the process so it could predict words from video alone. They were even able to track mouth movements closely enough to reconstruct the waveform of a voice, so a synthesized version sounds similar to the original. This is helpful when a video's sound suddenly cuts out: the A.I. can fill in the gaps of a fuzzy conversation without your ear even noticing a shift in speaker.
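Stripped to its essence, predicting a word from video is a matching problem: turn the mouth movements in a clip into a feature vector, then find the word whose learned representation is closest. This toy sketch uses nearest-neighbor matching against made-up templates; the vectors, words, and distance choice are all illustrative, not the actual model:

```python
import numpy as np

# Hypothetical "learned" templates: one mouth-movement feature vector per
# word, as might be produced during training. Values are made up.
templates = {
    "hello": np.array([0.9, 0.1, 0.4]),
    "world": np.array([0.2, 0.8, 0.5]),
}

def predict_word(mouth_features):
    """Return the word whose template is nearest to the observed features."""
    return min(templates,
               key=lambda w: np.linalg.norm(templates[w] - mouth_features))

observed = np.array([0.85, 0.15, 0.35])  # features from a new video clip
print(predict_word(observed))            # prints "hello"
```

The real system replaces both the hand-made templates and the distance lookup with a trained neural network, but the input-to-word mapping it learns plays the same role.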
Previous systems attempting the same task correctly identified 33% of words; this new A.I. matches 59%. The odds are improving! Once perfected, the technology could act as a translator or even dub over silent footage.
What’s next for the future of hearing and technology? We’ll have to wait and see.