CochleaNet Supplementary Material: Audio-Visual Speech Enhancement

Supplementary Material


Comparison with Audio-Visual Speech Enhancement Methods
Comparison with Audio-Only Speech Enhancement Methods
Example video clips from ASPIRE Corpus

Comparison with Audio-Visual Speech Enhancement Methods

Comparison with Gabbay, Aviv, et al. "Seeing through noise: Visually driven speaker separation and enhancement." 2018 IEEE ICASSP, and Ephrat, Ariel, et al. "Vid2speech: speech reconstruction from silent video." 2017 IEEE ICASSP
Source: https://www.youtube.com/watch?v=qmsyj7vAzoI

Comparison with Ephrat, Ariel, et al. "Looking to listen at the cocktail party: a speaker-independent audio-visual model for speech separation." ACM Transactions on Graphics (TOG) 37.4 (2018): 112, and Hou, Jen-Cheng, et al. "Audio-visual speech enhancement using multimodal deep convolutional neural networks." IEEE Transactions on Emerging Topics in Computational Intelligence
Source: https://youtu.be/rVQVAPiJWKU

Comparison with Audio-Only Speech Enhancement Methods

Comparison with spectral subtraction, linear minimum mean-square error (LMMSE) estimation, and SEGAN on the real noisy ASPIRE Corpus

Comparison with Pascual, Santiago, et al. "SEGAN: Speech Enhancement Generative Adversarial Network." Proc. Interspeech 2017, LogMMSE, and the Wiener filter
Source: http://veu.talp.cat/seganp/

Example video clips from ASPIRE Corpus