(Arabic) Exploring How Neural Networks Understand Speaker Identity | Gasser Elbanna

How do machines recognize who is speaking? In this talk, Gasser Elbanna examines how speech-based neural network models represent speaker identity. Humans recognize voices intuitively, but capturing the same ability in artificial systems remains a major challenge, especially given the variability both across speakers and within a single speaker's voice.

This study examines self-supervised models (SSMs), including generative, predictive, and contrastive models, alongside traditional supervised models and handcrafted acoustic features. By analyzing how these models' representations respond to changes in acoustic, phonemic, prosodic, and linguistic features, the team offers insights into model interpretability and into parallels with human voice perception.
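For readers unfamiliar with the "contrastive" family of self-supervised models, here is a minimal, generic sketch of an InfoNCE-style contrastive loss. It is an illustration under stated assumptions, not the training objective of any specific model analyzed in the talk: the function names `info_nce_loss` and `_normalise`, the `temperature` default of 0.1, and the toy vectors are all hypothetical. Each anchor embedding is scored against every candidate, and the loss is low when the matching candidate (same row index) scores highest.

```python
import math
from typing import List

Vector = List[float]


def _normalise(v: Vector) -> Vector:
    # Hypothetical helper: scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (illustrative sketch, not from the talk).

    Row i of `positives` is the positive for row i of `anchors`;
    every other row of `positives` serves as a negative.
    """
    a = [_normalise(v) for v in anchors]
    p = [_normalise(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Similarity of anchor i to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the correct candidate at index i.
        total += log_sum_exp - logits[i]
    return total / len(a)


eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
aligned = info_nce_loss(eye, eye)              # each anchor matches its own positive
misaligned = info_nce_loss(eye, eye[::-1])     # positives shuffled, so matches are wrong
```

With perfectly aligned pairs the loss is close to zero (about 1.4e-4 here), while reversing the positives pushes it to roughly 10, which is the behaviour a contrastive objective rewards and penalizes, respectively.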

Whether you're interested in machine learning, neuroscience, or speech processing, this talk sheds light on the frontiers of understanding speaker identity through deep learning.

00:00 Speaker’s career intro
02:50 Talk outline
04:06 Speech representation
11:22 Learning paradigms
30:03 Speaker identity perception
55:40 Takeaways