Supplementary Material

  1. Comparison with Audio-Visual Speech Enhancement Methods
  2. Comparison with Audio-Only Speech Enhancement Methods
  3. Example video clips from ASPIRE Corpus



Comparison with Audio-Visual Speech Enhancement Methods



Comparison with Gabbay, Aviv, et al. "Seeing through noise: Visually driven speaker separation and enhancement." 2018 IEEE ICASSP, and Ephrat, Ariel, et al. "Vid2speech: speech reconstruction from silent video." 2017 IEEE ICASSP
Source: https://www.youtube.com/watch?v=qmsyj7vAzoI



Comparison with Ephrat, Ariel, et al. "Looking to listen at the cocktail party: a speaker-independent audio-visual model for speech separation." ACM Transactions on Graphics (TOG) 37.4 (2018): 112, and Hou, Jen-Cheng, et al. "Audio-visual speech enhancement using multimodal deep convolutional neural networks." IEEE Transactions on Emerging Topics in Computational Intelligence
Source: https://youtu.be/rVQVAPiJWKU



Comparison with Audio-Only Speech Enhancement Methods



Comparison with spectral subtraction, linear minimum mean-squared error, and SEGAN on real noisy data from the ASPIRE Corpus



Comparison with Pascual et al. "SEGAN: Speech Enhancement Generative Adversarial Network." Proc. Interspeech 2017, LogMMSE, and Wiener filter
Source: http://veu.talp.cat/seganp/



Samples from ASPIRE Corpus