{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# BAYa class Assignment 2025\n", "\n", "In this assignment, your task will be to implement and analyze inference for the Probabilistic linear discriminant analysis (PLDA) model. This model was described in the corresponding [slides from BAYa class](http://www.fit.vutbr.cz/study/courses/BAYa/public/slides/2-Graphical%20Models.pdf). You will accomplish this task by completing this Jupyter Notebook, which already comes with a code generating the training data and some plotting functions for presenting the results. If you do not have any experience with Jupyter Notebook, the easiest way to start is to install Anaconda3, run Jupyter Notebook, and open this notebook downloaded from [BAYa_Assignment2025_PLDA.ipynb](http://www.fit.vutbr.cz/study/courses/BAYa/public/notebooks/BAYa_Assignment2025_PLDA.ipynb). You can also find some inspiration and pieces of code to reuse in the other [Jupyter Notebooks provided for this class](http://www.fit.vutbr.cz/study/courses/BAYa/public/notebooks).\n", "\n", "The Notebook is organized as follows:\n", "1. First comes a cell with a code of functions that will be later used for presenting the results and the learned models. You can skip this cell first as the use of the functions will be demonstrated later.\n", "2. Next comes a code that \"handcrafts\" some parameters of the PLDA model and implements the generative process assumed by the PLDA model. The code generates some artificial training data that you will use for PLDA model training. Please carefully read this code and the comments around it.\n", "3. Through this notebook, there are cells with instructions to fill in your implementations around the PLDA model. There are also fields with other tasks to accomplish and questions to answer. \n", "\n", "**Do not edit the code in the following cell for generating and presenting the training data!**\n", " $$\n", "\\newcommand{\\E}{\\mathbb{E}}\n", "\\newcommand{\\aalpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\newcommand{\\NN}{\\mathbf{N}}\n", "\\newcommand{\\ppi}{\\boldsymbol{\\pi}}\n", "\\newcommand{\\mmu}{\\boldsymbol{\\mu}}\n", "\\newcommand{\\SSigma}{\\boldsymbol{\\Sigma}}\n", "\\newcommand{\\llambda}{\\boldsymbol{\\lambda}}\n", "\\newcommand{\\diff}{\\mathop{}\\!\\mathrm{d}}\n", "\\newcommand{\\zz}{\\mathbf{z}}\n", "\\newcommand{\\ZZ}{\\mathbf{Z}}\n", "\\newcommand{\\XX}{\\mathbf{X}}\n", "\\newcommand{\\xx}{\\mathbf{x}}\n", "\\newcommand{\\YY}{\\mathbf{Y}}\n", "\\newcommand{\\NormalGamma}{\\mathcal{NG}}\n", "\\newcommand{\\Tr}{Tr}\n", "$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Run this code! 
"# But there is no need to pay much attention to this cell on a first pass through the notebook.\n", "\n", "#%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as sps\n", "\n", "\n", "def rand_gauss(n, mu, cov):\n", "    \"\"\"\n", "    Sample n data points from a multivariate Gaussian distribution with mean mu and covariance cov.\n", "    \"\"\"\n", "    return np.atleast_2d(sps.multivariate_normal.rvs(mu, cov, n))\n", "\n", "def logpdf_gauss(x, mu, cov):\n", "    \"\"\"\n", "    Evaluate the log probability density function of a multivariate Gaussian with mean mu and covariance cov.\n", "    \"\"\"\n", "    return sps.multivariate_normal.logpdf(x, mu, cov)\n", "\n", "def gellipse(mu, cov, n=100, *args, **kwargs):\n", "    \"\"\"\n", "    Contour plot of a 2D multivariate Gaussian distribution.\n", "\n", "    gellipse(mu, cov, n) plots the two-standard-deviation ellipse given by mean vector MU and\n", "    covariance matrix COV. The ellipse is plotted using N (default 100) points.\n", "    Additional parameters can specify various line types and properties.\n", "    See the description of matplotlib.pyplot.plot for more details.\n", "    \"\"\"\n", "    if mu.shape != (2,) or cov.shape != (2, 2):\n", "        raise RuntimeError('mu must be a two element vector and cov must be 2 x 2 matrix')\n", "\n", "    # Eigendecomposition of 4*cov, so that the plotted contour lies at 2 standard deviations\n", "    d, v = np.linalg.eigh(4 * cov)\n", "    d = np.diag(d)\n", "    t = np.linspace(0, 2 * np.pi, n)\n", "    x = v @ np.sign(d) @ np.sqrt(np.abs(d)) @ np.array([np.cos(t), np.sin(t)]) + mu[:, np.newaxis]\n", "    return plt.plot(x[0], x[1], *args, **kwargs)\n", "\n", "def probit(a):\n", "    # Inverse cumulative distribution function of the standard normal distribution\n", "    from scipy.special import erfinv\n", "    return np.sqrt(2.0) * erfinv(2.0 * a - 1.0)\n", "\n", "def plot_det(tar, non, label=\"\",\n", "             axis=[0.2, 40, 0.2, 80],\n", "             xticks=[0.2, 0.5, 1, 2, 5, 10, 20, 35, 50, 65, 80],\n", "             yticks=[0.2, 0.5, 1, 2, 5, 10, 20, 35, 50, 65, 80],\n", "             **kwargs):\n", "    \"\"\"\n", "    Plot a DET curve (miss vs. false-alarm probabilities on probit-warped axes)\n", "    for target scores tar and non-target scores non.\n", "    \"\"\"\n", "    tar = np.array(tar)\n", "    non = np.array(non)\n", "    ntrue = len(tar)\n", "    nfalse = len(non)\n", "    ntotal = ntrue + nfalse\n", "\n", "    Pmiss = np.zeros(ntotal + 1, np.float32)  # 1 more for the boundaries\n", "    Pfa = np.zeros_like(Pmiss)\n", "\n", "    # Pool all scores, labeling non-target trials with 0 and target trials with 1\n", "    scores = np.zeros((ntotal, 2), np.float32)\n", "    scores[0:nfalse, 0] = non\n", "    scores[0:nfalse, 1] = 0\n", "    scores[nfalse:ntotal, 0] = tar\n", "    scores[nfalse:ntotal, 1] = 1\n", "    scores = scores[scores[:, 0].argsort()]\n", "\n", "    # Counts of target scores below and non-target scores above each threshold\n", "    sumtrue = np.cumsum(scores[:, 1])\n", "    sumfalse = nfalse - (np.arange(1, ntotal + 1) - sumtrue)\n", "\n", "    Pmiss[0] = 0.0  # with the threshold below all scores, nothing is missed ...\n", "    Pfa[0] = 1.0    # ... and all non-target trials are false alarms\n", "    Pmiss[1:] = sumtrue / ntrue\n", "    Pfa[1:] = sumfalse / nfalse\n", "\n", "    # Equal error rate: the operating point where Pfa and Pmiss are (nearly) equal\n", "    idxeer = np.argmin(np.abs(Pfa - Pmiss))\n", "    EER = 0.5 * (Pfa[idxeer] + Pmiss[idxeer]) * 100\n", "\n", "    plt.plot(probit(Pfa), probit(Pmiss), label=label + ' EER=%.2f%%' % EER, **kwargs)\n", "    plt.xticks(probit(np.array(xticks) / 100), xticks)\n", "    plt.yticks(probit(np.array(yticks) / 100), yticks)\n", "    plt.axis(probit(np.array(axis) / 100))\n", "\n", "    plt.xlabel(\"FA [%]\", fontsize=12)\n", "    plt.ylabel(\"Miss [%]\", fontsize=12)\n", "    plt.grid(True)\n", "    plt.legend(loc='upper left', bbox_to_anchor=(1, 1))" ] },
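{ "cell_type": "markdown", "metadata": {}, "source": [ "The following cell is a small usage sketch of the helper functions above; it is *not* part of the assignment, and all data in it are synthetic values chosen only for illustration. It samples points from a handcrafted 2D Gaussian and overlays the corresponding `gellipse` contour, and then plots a DET curve with `plot_det` for synthetic target and non-target scores." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Usage sketch for the helper functions above (synthetic data only)\n", "np.random.seed(0)\n", "\n", "# gellipse: two-standard-deviation contour of a handcrafted 2D Gaussian\n", "mu_demo = np.array([0.0, 0.0])\n", "cov_demo = np.array([[2.0, 0.5], [0.5, 1.0]])\n", "x_demo = rand_gauss(500, mu_demo, cov_demo)\n", "plt.plot(x_demo[:, 0], x_demo[:, 1], '.', alpha=0.3)\n", "gellipse(mu_demo, cov_demo, color='r')\n", "plt.show()\n", "\n", "# plot_det: DET curve for synthetic verification scores, where target-trial\n", "# scores are shifted up relative to non-target-trial scores\n", "tar_demo = np.random.randn(1000) + 2.0\n", "non_demo = np.random.randn(1000)\n", "plot_det(tar_demo, non_demo, label='synthetic scores')\n", "plt.show()" ] },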
{ "cell_type": "markdown", "metadata": {}, "source": [ "## PLDA generative process\n", "\n", "A PLDA model is often used to model speaker embeddings in the speaker verification context.\n", "Such embeddings are obtained by means of a neural network (e.g., a ResNet or TDNN) trained for speaker classification.\n", "The neural network transforms variable-length input speech utterances into fixed-length, relatively low-dimensional (e.g., 512- or 1024-dimensional) vector representations (typically, the embeddings are the output of a hidden layer of the neural network).\n", "\n", "The PLDA model assumes the following two-step generative process for the embeddings (our observations):\n", "\n", "1.\n", "\\begin{equation}\n", "{\\mathbf{z}_s} \\sim \\mathcal{N}(\\mathbf{z}_s;\\boldsymbol{\\mu},\\boldsymbol{\\Sigma}_{ac}) \\quad \\text{for } s=1, \\dots, S\n", "\\end{equation}\n", "\n", "where $\\mathbf{z}_s$ is the continuous latent random variable representing the speaker-specific mean for speaker $s$, $\\boldsymbol{\\mu}$ is the global speaker mean, and $\\boldsymbol{\\Sigma}_{ac}$ is the across-class (across-speaker) covariance matrix.\n", "\n", "\n", "2.\n", "\\begin{equation}\n", "{\\mathbf{x}_{sn}} \\sim \\mathcal{N}(\\mathbf{x}_{sn};\\mathbf{z}_{s},\\boldsymbol{\\Sigma}_{wc}) \\quad \\text{for } n=1, \\dots, N_s\n", "\\end{equation}\n", "\n", "where $\\mathbf{x}_{sn}$ is the continuous random variable representing observations specific to speaker $s$ (per-speaker embeddings), $N_s$ is the number of observations (embeddings) for speaker $s$, $\\mathbf{z}_s$ is the mean for speaker $s$, and $\\boldsymbol{\\Sigma}_{wc}$ is the within-class (within-speaker) covariance matrix, which is shared among (the same for) all speakers.\n", "\n", "\n", "Therefore, we assume that $S$ speaker means were generated from the Gaussian distribution $\\mathcal{N}(\\mathbf{z}_s;\\boldsymbol{\\mu},\\boldsymbol{\\Sigma}_{ac})$, and then $N_s$ embeddings were generated for each such speaker from the Gaussian distribution $\\mathcal{N}(\\mathbf{x}_{sn};\\mathbf{z}_{s},\\boldsymbol{\\Sigma}_{wc})$. This process can also be visualized in the Bayesian network shown below.\n", "\n", "Obviously, this assumption is something we make up when defining our model, as the embeddings were actually generated by the neural network, not by such a PLDA model." ] },
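{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick illustration of this two-step process, the following cell is a minimal sampling sketch; it is *not* part of the assignment, and all parameters in it ($\\boldsymbol{\\mu}$, $\\boldsymbol{\\Sigma}_{ac}$, $\\boldsymbol{\\Sigma}_{wc}$, the numbers of speakers and embeddings) are hypothetical handcrafted 2D values chosen only for plotting. The actual training data for the assignment are generated by the dedicated cell further below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch of the PLDA generative process\n", "# (hypothetical 2D parameters, not the assignment's training data)\n", "mu_demo = np.array([0.0, 0.0])                      # global speaker mean\n", "Sigma_ac_demo = np.array([[3.0, 1.0], [1.0, 2.0]])  # across-class covariance\n", "Sigma_wc_demo = np.array([[0.4, 0.0], [0.0, 0.4]])  # within-class covariance\n", "S_demo, N_demo = 5, 30                              # speakers, embeddings per speaker\n", "\n", "# Step 1: sample a speaker-specific mean z_s for each of the S speakers\n", "z_demo = rand_gauss(S_demo, mu_demo, Sigma_ac_demo)\n", "\n", "# Step 2: sample N_s embeddings x_sn around each speaker mean z_s\n", "for z_s in z_demo:\n", "    x_s = rand_gauss(N_demo, z_s, Sigma_wc_demo)\n", "    plt.plot(x_s[:, 0], x_s[:, 1], '.')\n", "    gellipse(z_s, Sigma_wc_demo, color='gray')            # within-speaker spread\n", "gellipse(mu_demo, Sigma_ac_demo, color='k', linewidth=2)  # across-speaker spread\n", "plt.show()" ] },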
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "*(Figure: Bayesian network of the PLDA generative process.)*\n", "