silence-of-the-lambs




What if we run silence detection before sending audio to our ASR?

Download the audio files:

cd ./audios
aria2c --input-file urls

The old ASR predictions are in ./data/pred.old.csv. Their metrics:

eevee asr ./data/true.csv ./data/pred.old.csv
:: stanza not found
                  Value  Support
Metric                          
WER            0.791417     2787
Utterance FPR  1.000000     1120
Utterance FNR  0.000000     1667
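
For reading the two utterance-level rows, here is a minimal sketch of how such rates could be computed. It assumes (this is not confirmed from eevee itself) that an empty transcript means silence, that FPR is the share of truly silent utterances that got a non-empty prediction, and that FNR is the share of truly spoken utterances that got an empty prediction. The supports fit that reading: 1120 + 1667 = 2787. The function name `utterance_rates` is hypothetical.

```python
def utterance_rates(true_texts, pred_texts):
    """Return (fpr, fnr), treating an empty transcript as silence.

    Assumed definitions, not taken from eevee:
      FPR = non-empty predictions / truly silent utterances
      FNR = empty predictions / truly spoken utterances
    """
    false_pos = false_neg = n_silent = n_speech = 0
    for true, pred in zip(true_texts, pred_texts):
        true_empty = not true.strip()
        pred_empty = not pred.strip()
        if true_empty:
            n_silent += 1
            false_pos += not pred_empty
        else:
            n_speech += 1
            false_neg += pred_empty
    fpr = false_pos / n_silent if n_silent else 0.0
    fnr = false_neg / n_speech if n_speech else 0.0
    return fpr, fnr


# Toy example: two silent and two spoken utterances.
fpr, fnr = utterance_rates(["", "", "hi", "yes no"], ["hello", "", "", "yes"])
```

Under these assumed definitions, an FPR of 1.0 together with an FNR of 0.0 would mean the old ASR never returned an empty transcript.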

The new ASR predictions are in ./data/pred.csv. Their metrics:

eevee asr ./data/true.csv ./data/pred.csv
:: stanza not found
                  Value  Support
Metric                          
WER            0.525268     2787
Utterance FPR  0.298214     1120
Utterance FNR  0.108578     1667

Now let's patch the predictions using silence detection. First label silence in the audio files, then patch the old predictions:

python ./scripts/silence.py label-silence ./audios --output-csv=./data/pred.silence.csv
python ./scripts/silence.py patch-predictions ./data/pred.old.csv ./data/pred.silence.csv --output-csv=./data/pred.old.patched.csv
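
The patching step can be pictured roughly as follows. This is a sketch under the assumption that patching means blanking the ASR transcript for every utterance the silence detector flagged; the actual logic in ./scripts/silence.py may differ. The function name, the dict-based inputs, and the utterance ids are all hypothetical.

```python
def patch_predictions(predictions, silence_labels):
    """Blank out the transcript of every utterance flagged as silent.

    predictions:    dict of utterance id -> predicted transcript
    silence_labels: dict of utterance id -> True if detected as silence
    Utterances without a silence label are left unchanged.
    """
    return {
        utt_id: "" if silence_labels.get(utt_id, False) else text
        for utt_id, text in predictions.items()
    }


# Hypothetical ids and transcripts, for illustration only.
patched = patch_predictions(
    {"a": "hello", "b": "noise", "c": "yes"},
    {"a": False, "b": True},
)
```

A patch of this kind can only turn non-empty predictions into empty ones, so it can lower the utterance FPR but can also raise the FNR when the detector flags real speech as silence.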

Old predictions after patching:

eevee asr ./data/true.csv ./data/pred.old.patched.csv
:: stanza not found
                  Value  Support
Metric                          
WER            0.703150     2787
Utterance FPR  0.820536     1120
Utterance FNR  0.030594     1667

Patch the new predictions the same way:

python ./scripts/silence.py patch-predictions ./data/pred.csv ./data/pred.silence.csv --output-csv=./data/pred.patched.csv

New predictions after patching:

eevee asr ./data/true.csv ./data/pred.patched.csv
:: stanza not found
                  Value  Support
Metric                          
WER            0.522398     2787
Utterance FPR  0.293750     1120
Utterance FNR  0.112178     1667

Not much gain for the new predictions: WER only moves from 0.525 to 0.522. We should try a different silence detection method. Also, many utterances marked as silence in the true labels were actually non-English utterances, so ¯\_(ツ)_/¯
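
To put numbers on the gain, the metrics reported above for the new predictions can be compared before and after patching. The values below are copied from the eevee output; converting rate changes into utterance counts assumes each rate is a plain ratio over its support, which is an assumption about eevee rather than something confirmed here.

```python
# Metrics for the new predictions, copied from the eevee output above.
before = {"WER": 0.525268, "Utterance FPR": 0.298214, "Utterance FNR": 0.108578}
after = {"WER": 0.522398, "Utterance FPR": 0.293750, "Utterance FNR": 0.112178}
support = {"Utterance FPR": 1120, "Utterance FNR": 1667}


def metric_deltas(before, after):
    """Return after - before for every metric, rounded to 6 decimals."""
    return {name: round(after[name] - before[name], 6) for name in before}


deltas = metric_deltas(before, after)

# Assumed: rate * support gives the number of affected utterances.
count_changes = {name: round(deltas[name] * n) for name, n in support.items()}

for name, delta in deltas.items():
    print(f"{name}: {delta:+.6f}")
```

Under that assumption, patching removes about 5 false positives and adds about 6 false negatives, which fits the small WER change.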