Not being able to communicate with loved ones is an unfortunate reality for some, such as recent stroke patients or people who cannot speak. Vocal imaging research performed by Professor Ahmed Sabbir Arif’s lab is contributing to a new form of communication that could help these populations connect with others, even if they can’t audibly express themselves.
Arif runs the Human Computer Interaction Group, where he and fourth-year doctoral student Laxmi Pandey recently developed a deep neural network framework that translates real-time magnetic resonance imaging (MRI) of vocal tract shaping into text. The lab used a data set of MRI images that show the vocal movements people make when speaking.
“The MRI images are of the vocal cords and surrounding areas where there are a lot of muscles that are used to produce speech,” Arif said. “The idea is that by processing these images, we are able to understand those who are unable to produce speech.”
The human-computer interaction field is a collaborative one, with researchers and universities openly sharing their findings. Because of this, Arif was able to process thousands of MRI images from the University of Southern California on the campus’s cluster, or supercomputer. Arif’s lab trained the system to recognize words and sentences from the articulatory patterns shown in the images, reaching an accuracy rate of 60 percent.
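The article doesn’t describe the model’s internals, but a common recipe for mapping a sequence of vocal tract images to text is a convolutional encoder over each frame feeding a recurrent layer trained with CTC loss, which aligns variable-length frame sequences to variable-length character sequences. The sketch below is purely illustrative and is not the lab’s actual model; the 64x64 frame size, layer sizes, and 28-character alphabet are placeholder assumptions.

```python
# Minimal sketch (not the published system): CNN frame encoder + GRU + CTC
# for turning rtMRI clips of the vocal tract into character sequences.
import torch
import torch.nn as nn

class VocalTractToText(nn.Module):
    def __init__(self, num_chars=28):  # e.g., 26 letters + space + CTC blank
        super().__init__()
        # Encode each 64x64 grayscale MRI frame into a 512-dim feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> 32 * 4 * 4 = 512
        )
        # Model articulatory dynamics across frames.
        self.rnn = nn.GRU(512, 128, batch_first=True, bidirectional=True)
        self.head = nn.Linear(256, num_chars)

    def forward(self, frames):             # frames: (batch, time, 1, 64, 64)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.reshape(b * t, 1, 64, 64)).reshape(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out).log_softmax(-1)  # per-frame character log-probs

# Toy usage: 2 clips of 50 frames each, with dummy 10-character targets.
model = VocalTractToText()
log_probs = model(torch.randn(2, 50, 1, 64, 64))
loss = nn.CTCLoss(blank=0)(
    log_probs.transpose(0, 1),              # CTC expects (time, batch, chars)
    torch.randint(1, 28, (2, 10)),          # dummy target character indices
    torch.full((2,), 50), torch.full((2,), 10),
)
print(log_probs.shape, loss.item())
```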
“We found incredible results considering this is the first time someone has done something like this,” Arif said of the study.
Pandey also studied how emotion and gender can affect the articulation of speech, finding surprising results.
“One of the findings that really grabbed my attention was that each sub-region of the vocal tract is affected by both emotion and gender,” Pandey said. “Most of the regions showed more distortion for high arousal emotions (anger and happiness) than for the low arousal emotion (sadness). Overall, for all emotions, female speakers had more noticeable changes in all regions.”
Pandey presented the findings of the Human Computer Interaction Group’s study at SIGGRAPH, the premier conference for computer graphics and interactive techniques, earlier this month.
The overarching mission of Arif’s lab is to make computer systems accessible to everyone by developing intuitive, effective, and enjoyable input and interaction techniques. Much of his work centers on improving how people interact with computers.
“We receive a lot of emails from people who have conditions and that inspires us,” Arif said. “People desire a way to communicate with other human beings.”
Arif’s recent work includes creating lip-reading software that enables people to send secure messages, as well as helping blind people send text messages more accurately. Moving forward, he hopes his vocal imaging research will help other scientists in the human-computer interaction field by laying the groundwork for further advances in the technology.
“When the technology becomes more affordable and less invasive, it could be used to input text and communicate with various computer systems. It can also enable users to interact with public displays and kiosks without contact, which is of particular interest given the global spread of infectious diseases, such as the current COVID-19 situation,” Pandey said. “Most importantly, it could enable people with speech disorders, muteness, blindness and motor impairments to communicate with other individuals, and to input text and interact with various computer systems, increasing their access to these technologies.”
An example of the speech and emotion recognition can be found here.