Attention Model

Recently I have run some experiments with newer models, such as an attention model, for spoofing detection (I am still working on the BTAS challenge). It is hard to improve the result because the error rate on the development set is already quite small. However, the attention model should still be useful for speaker verification.

Current attention model for spoofing detection:

The supervector is $sv_i=\mathrm{concat}(O_i,C_i)$, where $O_i$ is the representation of the current frame and $C_i$ is the context vector (usually given by a recurrent model).

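For each utterance $s$ we define the weight vector $w_s$ as
$w_{s,r}=\frac{\exp(W\,sv_{s,r}\,v)}{\sum_i \exp(W\,sv_{s,i}\,v)}$
where $sv_{s,r}$ is the supervector of frame $r$ in utterance $s$, so the weights are a softmax over the frames of the utterance. The overall representation of $s$ is $\sum_r w_{s,r}\,sv_{s,r}$.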
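
The weighting and pooling above can be sketched in a few lines of plain Python. This is only an illustration under some assumptions of mine. The formula does not fix the shapes of $W$ and $v$, so here I assume $W$ is a $k \times d$ matrix, $v$ is a length-$k$ vector, and the score of a frame is the dot product of $v$ with $W\,sv_{s,r}$. The function names `supervector` and `attention_pool` and the toy numbers are hypothetical, not taken from the actual experiments.

```python
import math


def supervector(o, c):
    """Concatenate the frame representation o and the context vector c."""
    return list(o) + list(c)


def attention_pool(svs, W, v):
    """Attention-weighted sum of the supervectors of one utterance.

    svs: list of T supervectors, each of length d
    W:   k x d matrix given as a list of rows (assumed shape)
    v:   vector of length k (assumed shape)
    Assumed score for each frame: dot(v, W @ sv).
    """
    scores = []
    for sv in svs:
        # h = W @ sv, a vector of length k
        h = [sum(w_ij * x_j for w_ij, x_j in zip(row, sv)) for row in W]
        # scalar score = dot(v, h)
        scores.append(sum(v_i * h_i for v_i, h_i in zip(v, h)))

    # Softmax over frames; subtracting the max avoids overflow in exp
    # and does not change the resulting weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]

    # Utterance representation: sum_r w_r * sv_r
    d = len(svs[0])
    pooled = [sum(w * sv[j] for w, sv in zip(weights, svs)) for j in range(d)]
    return weights, pooled


# Toy utterance with 3 frames: 2-dim frame features, 1-dim context vectors.
O = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[0.5], [0.5], [-0.5]]
svs = [supervector(o, c) for o, c in zip(O, C)]

W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
v = [1.0, 1.0]

weights, pooled = attention_pool(svs, W, v)
print(weights)
print(pooled)
```

With these toy parameters the third frame has the largest score, so it receives the largest weight. If $W$ is all zeros, every frame gets the same score and the pooled vector reduces to the plain average of the supervectors.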