Abstract
This study proposes a secure and effective lips-reading system that can accurately detect lip movements, even when face masks are worn. The system uses radio frequency (RF) sensing and ultra-wideband (UWB) radar technology, which overcome the challenges faced by traditional vision-based systems. By leveraging deep learning models, the system interprets lip and mouth movements and achieves an overall accuracy of 90% across the mask-on and mask-off scenarios. The study used a trusted dataset from the University of Glasgow (UoG), consisting of spectrograms of lip motions recorded while distinct participants articulated five vowels, plus a voiceless class. The deep learning model Residual Neural Network (ResNet50) was used to evaluate the dataset and achieved a detection accuracy of 87% in the mask-on scenario, a 14% improvement over prior published work. The findings of this study contribute to the development of a robust lips-reading framework that can enhance communication accessibility in applications such as hearing aids, voice-controlled systems, biometrics, and more.
Original language | English |
---|---|
Pages (from-to) | (In-Press) |
Number of pages | 8 |
Journal | IEEE Sensors Journal |
Volume | (In-Press) |
Early online date | 31 Aug 2023 |
Publication status | E-pub ahead of print - 31 Aug 2023 |
Bibliographical note
© 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.
This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.
Funder
This work is supported in part by EPSRC grant EP/W037076/1.
Keywords
- ResNet50
- InceptionV3
- VGG16
- RF sensing
- UWB radar
- lips-reading
- speech recognition
ASJC Scopus subject areas
- Instrumentation
- Electrical and Electronic Engineering