silence-of-the-lambs
This page is generated via scout. For now it just shows the project README.
What if we do silence detection before sending data to our ASR?
Download the audio files:
cd ./audios
aria2c --input-file urls
Old ASR predictions are in ./data/pred.old.csv. Metrics are:
eevee asr ./data/true.csv ./data/pred.old.csv
:: stanza not found
Metric         Value     Support
WER            0.791417  2787
Utterance FPR  1.000000  1120
Utterance FNR  0.000000  1667
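The supports above (1120 + 1667 = 2787) suggest that the utterance-level rates split the data by whether the reference transcript is empty. A minimal sketch of that reading, assuming an empty reference means silence; the function name and inputs are hypothetical, and eevee's actual definition may differ:

```python
def utterance_error_rates(truths, preds):
    """Return (FPR, FNR) over paired reference/predicted transcripts.

    Assumption: an utterance is "negative" when its reference transcript
    is empty (silence) and "positive" otherwise.
    """
    false_pos = false_neg = negatives = positives = 0
    for truth, pred in zip(truths, preds):
        truth_empty = not truth.strip()
        pred_empty = not pred.strip()
        if truth_empty:
            negatives += 1
            false_pos += not pred_empty  # ASR produced text on silence
        else:
            positives += 1
            false_neg += pred_empty      # ASR dropped a real utterance
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


truths = ["", "", "hello there", "book a table"]
preds = ["uh", "", "hello there", ""]
print(utterance_error_rates(truths, preds))
```

Under this reading, an ASR that always emits some text gets FPR 1.0 and FNR 0.0, which matches the old predictions.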
New ASR predictions are in ./data/pred.csv. Metrics are:
eevee asr ./data/true.csv ./data/pred.csv
:: stanza not found
Metric         Value     Support
WER            0.525268  2787
Utterance FPR  0.298214  1120
Utterance FNR  0.108578  1667
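For reference, WER is usually computed as word-level edit distance divided by the number of reference words. The sketch below uses corpus-level aggregation, which is an assumption; eevee may aggregate differently (for example per utterance), and the function names are hypothetical:

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(truths, preds):
    """Total word errors divided by total reference words."""
    errors = sum(word_edit_distance(t.split(), p.split())
                 for t, p in zip(truths, preds))
    ref_words = sum(len(t.split()) for t in truths)
    # WER is undefined with no reference words; 0.0 is a placeholder choice.
    return errors / ref_words if ref_words else 0.0


print(corpus_wer(["book a table", ""], ["book table", "hello"]))
```

In the example, the text predicted for the silent utterance counts as an insertion error, which is why blanking predictions on detected silence can lower WER.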
Now let's patch the predictions with silence detection. First label silence in the audio files, then patch the old predictions:
python ./scripts/silence.py label-silence ./audios --output-csv=./data/pred.silence.csv
python ./scripts/silence.py patch-predictions ./data/pred.old.csv ./data/pred.silence.csv --output-csv=./data/pred.old.patched.csv
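The contents of scripts/silence.py are not shown here. A minimal sketch of what the patching step could look like, assuming it blanks the transcript of every utterance flagged as silent; the function name and the dict shapes are hypothetical stand-ins for the CSV files:

```python
def patch_predictions(predictions, silence_labels):
    """Blank out the transcript of every utterance flagged as silent.

    predictions:    dict of utterance id -> predicted transcript
    silence_labels: dict of utterance id -> True if detected as silence
    Both shapes are hypothetical; the real CSV columns are not shown here.
    """
    return {
        # Utterances with no silence label are left unchanged.
        utt_id: "" if silence_labels.get(utt_id, False) else text
        for utt_id, text in predictions.items()
    }


predictions = {"a": "uh", "b": "hello there", "c": "yes"}
silence_labels = {"a": True, "b": False}
print(patch_predictions(predictions, silence_labels))
```

If patching works this way, it can only turn non-empty predictions into empty ones, so utterance FPR can only go down and utterance FNR can only go up.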
Metrics for the old predictions after patching:
eevee asr ./data/true.csv ./data/pred.old.patched.csv
:: stanza not found
Metric         Value     Support
WER            0.703150  2787
Utterance FPR  0.820536  1120
Utterance FNR  0.030594  1667
Patch the new predictions the same way:
python ./scripts/silence.py patch-predictions ./data/pred.csv ./data/pred.silence.csv --output-csv=./data/pred.patched.csv
Metrics for the new predictions after patching:
eevee asr ./data/true.csv ./data/pred.patched.csv
:: stanza not found
Metric         Value     Support
WER            0.522398  2787
Utterance FPR  0.293750  1120
Utterance FNR  0.112178  1667
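Not much gain on the new predictions: WER only moves from 0.525268 to 0.522398 and utterance FPR from 0.298214 to 0.293750, while utterance FNR rises from 0.108578 to 0.112178. The old predictions improve more (WER 0.791417 to 0.703150), but they are still worse than the unpatched new ones. We should try changing the silence detection method. Also, many silences in the true labels were actually non-English utterances, so ¯\_(ツ)_/¯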
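As one candidate to compare against, here is a minimal sketch of a frame-energy silence detector. This README does not say which method scripts/silence.py currently uses, so this may or may not differ from it. The function names, the assumption of 16-bit PCM samples, and all threshold values are illustrative, not tuned:

```python
import math


def frame_rms(samples, frame_size):
    """Yield the RMS amplitude of each full, non-overlapping frame."""
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        yield math.sqrt(sum(s * s for s in frame) / frame_size)


def is_silent(samples, frame_size=160, rms_threshold=500.0, max_active_ratio=0.05):
    """Label a clip silent when few frames exceed the RMS threshold.

    samples is a sequence of integer PCM samples (assumed 16-bit).
    All three parameters are illustrative guesses, not tuned values.
    """
    energies = list(frame_rms(samples, frame_size))
    if not energies:
        return True  # clip shorter than one frame: treat as silence
    active = sum(e > rms_threshold for e in energies)
    return active / len(energies) <= max_active_ratio


quiet_clip = [10, -10] * 800      # low-amplitude noise
loud_clip = [3000, -3000] * 800   # sustained high-amplitude signal
print(is_silent(quiet_clip), is_silent(loud_clip))
```

The threshold and active-frame ratio would need tuning against ./data/true.csv, and an energy threshold alone cannot separate non-English speech from silence.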