Speaker Diarization

Ms. Apoorva Iyer; Ms. Deepika Kini; Mrs. Shanthi Therese

doi:https://doi.org/10.14445/22312803/IJCTT-V67I9P110

Research Article | Open Access

Volume 67 | Issue 9 | Year 2019 | Article ID: IJCTT-V67I9P110

Citation:

Ms. Apoorva Iyer, Ms. Deepika Kini, Mrs. Shanthi Therese, "Speaker Diarization," International Journal of Computer Trends and Technology (IJCTT), vol. 67, no. 9, pp. 50-54, 2019. Crossref, https://doi.org/10.14445/22312803/IJCTT-V67I9P110

Abstract

Speaker Diarization is the task of determining ‘who spoke when?’ in an audio recording. Both unsupervised and supervised approaches are used to detect where the speaker changes along the time axis. This paper primarily describes the implementation of Speaker Diarization using Neural Networks, a supervised method. First, a summary of the clustering algorithms is given. Then three approaches based on neural networks are described: Speaker Diarization using Artificial Neural Networks, using Recurrent Neural Networks, and using Adaptive Long Short-Term Memory (Multiple LSTMs). Finally, the accuracy of each approach is calculated and the results are compared.
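To make ‘who spoke when?’ and the accuracy comparison concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a diarization system (clustering or a neural network) has already assigned one speaker label per audio frame. The function names `frames_to_segments` and `frame_accuracy`, the 10 ms frame shift, and the accuracy definition are illustrative assumptions. Accuracy here is the fraction of frames labelled correctly after the best one-to-one mapping between hypothesis and reference labels. The mapping step is needed because clustering produces arbitrary label names. This is a simplification of the diarization error rate and may differ from the metric used in the paper.

```python
from itertools import groupby, permutations


def frames_to_segments(labels, frame_shift=0.01):
    """Collapse per-frame speaker labels into (start_s, end_s, speaker) segments.

    frame_shift is the assumed time step between frames in seconds (10 ms is a
    common MFCC hop size; it is an assumption, not a value from the paper).
    """
    segments = []
    index = 0
    for speaker, run in groupby(labels):
        length = sum(1 for _ in run)  # consecutive frames assigned to this speaker
        start = round(index * frame_shift, 3)
        end = round((index + length) * frame_shift, 3)
        segments.append((start, end, speaker))
        index += length
    return segments


def frame_accuracy(reference, hypothesis):
    """Fraction of frames whose hypothesis label matches the reference label
    under the best one-to-one mapping of hypothesis labels to reference labels.
    (Illustrative metric, assumed here; not necessarily the paper's.)
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    if not reference:
        raise ValueError("at least one frame is required")
    ref_speakers = sorted(set(reference))
    hyp_speakers = sorted(set(hypothesis))
    # Pad with None so surplus hypothesis speakers can stay unmapped;
    # their frames then always count as errors.
    candidates = ref_speakers + [None] * max(0, len(hyp_speakers) - len(ref_speakers))
    best = 0
    # Brute force over every assignment; practical only for a few speakers.
    for assignment in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, assignment))
        correct = sum(1 for r, h in zip(reference, hypothesis) if mapping[h] == r)
        best = max(best, correct)
    return best / len(reference)


reference = ["A", "A", "A", "B", "B", "A"]   # ground-truth speaker per frame
hypothesis = [1, 1, 2, 2, 2, 1]              # cluster IDs with arbitrary names
print(frames_to_segments(hypothesis))        # who spoke when, in seconds
print(frame_accuracy(reference, hypothesis))  # 5 of 6 frames correct after mapping 1->A, 2->B
```

Trying every permutation is only feasible for a handful of speakers; with many speakers the same optimal mapping is usually found with the Hungarian algorithm instead.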

Keywords

Artificial Neural Network, Recurrent Neural Networks, LSTM, MFCC
