# Mercer's Theorem

Suppose we are given data points $(X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)$, and we want to predict $Y$ from the design matrix $X$.
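As a minimal sketch of this setup, the following fits a linear predictor by ordinary least squares on a small synthetic dataset (the data, weights, and noise level are made up for illustration, not from the post):

```python
import numpy as np

# Hypothetical synthetic data: n points, d features per point.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))            # design matrix, one row per data point
w_true = np.array([1.0, -2.0, 0.5])    # assumed "true" weights (illustrative)
Y = X @ w_true + 0.01 * rng.normal(size=n)

# Ordinary least squares: w_hat = argmin_w ||X w - Y||^2
w_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

With low noise, `w_hat` recovers the weights used to generate the data almost exactly.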

Postdoctoral Research Fellow, Massachusetts General Hospital, Harvard Medical School. Ph.D. in Computer Science, Yale University.

Here we will discuss basic concepts of regression on sparse data.

We discussed the EM algorithm for Gaussian mixture models in the previous post.

Ph.D. students studying machine learning on the human brain must run large-scale experiments on brain-imaging datasets.

How can we model data with a probability distribution? For example, if the data looks roughly circular and symmetric, we may model it with a Gaussian distribution.
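Concretely, fitting a Gaussian to such data comes down to estimating its mean and covariance; the maximum-likelihood estimates are just the sample mean and sample covariance. A small sketch on toy 2-D data (the parameters are invented for illustration):

```python
import numpy as np

# Hypothetical 2-D data drawn from a known Gaussian, so we can check the fit.
rng = np.random.default_rng(2)
true_mean = np.array([1.0, -1.0])
true_cov = np.array([[1.0, 0.3], [0.3, 2.0]])
data = rng.multivariate_normal(true_mean, true_cov, size=2000)

# Fit a Gaussian: sample mean and sample covariance.
mean_hat = data.mean(axis=0)
cov_hat = np.cov(data, rowvar=False)
```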

Bayesian inference: what can we do with the posterior distribution $p(\theta \mid x)$ that we cannot do with a point estimate? For example, we can summarize the posterior by the MAP estimate $\theta_{\text{MAP}} = \arg \max_{\theta} p(\theta \mid x)$.
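As a worked example of MAP estimation, consider a coin with heads probability $\theta$ and a Beta(2, 2) prior; the posterior after observing `heads` and `tails` is Beta(2 + heads, 2 + tails), whose mode gives $\theta_{\text{MAP}}$ in closed form (the counts below are hypothetical):

```python
import numpy as np

# Hypothetical coin-flip data.
heads, tails = 7, 3
a, b = 2 + heads, 2 + tails          # posterior is Beta(a, b)
theta_map = (a - 1) / (a + b - 2)    # mode of Beta(a, b), i.e. the MAP estimate

# Cross-check numerically by maximizing the log-posterior on a grid.
grid = np.linspace(1e-6, 1 - 1e-6, 10001)
log_post = (a - 1) * np.log(grid) + (b - 1) * np.log(1 - grid)
theta_grid = grid[np.argmax(log_post)]
```

Both routes agree, which is a useful sanity check before moving to posteriors with no closed-form mode.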

Empirical Risk Minimization: given $(x_1,y_1), \ldots, (x_n,y_n) \in \mathcal{X} \times \mathcal{Y}$, we want to learn a function $f$ such that $f(x) \approx y$.
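A minimal ERM sketch with squared loss: restrict $f$ to a linear hypothesis class and minimize the average loss over the sample by gradient descent (the data, step size, and iteration count are illustrative assumptions):

```python
import numpy as np

# Toy noiseless data generated by a linear function.
rng = np.random.default_rng(3)
x = rng.normal(size=(100, 2))
y = x @ np.array([2.0, -1.0])

# Minimize the empirical risk R(w) = (1/n) * sum_i (x_i . w - y_i)^2.
w = np.zeros(2)
for _ in range(500):
    grad = 2 * x.T @ (x @ w - y) / len(y)   # gradient of the empirical risk
    w -= 0.1 * grad

risk = np.mean((x @ w - y) ** 2)            # final empirical risk
```

Because the data is noiseless and the class contains the true function, the empirical risk is driven essentially to zero.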

Maximum Likelihood Estimation: suppose we have a probabilistic model that generates random data $x \in \mathcal{X}$ drawn from a distribution $P_{\theta}(x)$ parametrized by $\theta$.
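For a concrete instance, take the exponential model $P_{\theta}(x) = \theta e^{-\theta x}$: the log-likelihood is $n \log \theta - \theta \sum_i x_i$, maximized at $\hat{\theta} = 1 / \bar{x}$. A short sketch on made-up data:

```python
import numpy as np

# Toy data from an exponential with true theta = 1 / scale = 0.5.
rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=5000)

theta_hat = 1.0 / x.mean()   # closed-form MLE for the exponential rate

def loglik(theta):
    # Log-likelihood of the sample under P_theta(x) = theta * exp(-theta * x).
    return len(x) * np.log(theta) - theta * x.sum()
```

By construction `theta_hat` maximizes `loglik`, so nearby parameter values score strictly lower.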

Let’s estimate how many ice creams Insomnia Cookies in New Haven will sell this fall.

Optimal transport seeks a mapping that minimizes the total cost of transporting mass from $\alpha$ to $\beta$, as first formulated by Monge (1781).
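A small sketch of the one special case that is easy to solve exactly: in one dimension, with equal point masses and squared-distance cost, the Monge-optimal map simply matches the sorted points of $\alpha$ to the sorted points of $\beta$ (the sample points below are made up):

```python
import numpy as np

# Toy source and target point clouds with equal, uniform masses.
rng = np.random.default_rng(5)
alpha = rng.normal(0, 1, size=8)   # source points
beta = rng.normal(5, 1, size=8)    # target points

# Match by rank: the i-th smallest source point goes to the
# i-th smallest target point (optimal for convex costs in 1-D).
rank = np.argsort(np.argsort(alpha))        # rank of each source point
transport = np.sort(beta)[rank]             # destination of each source point
cost = np.sum((alpha - transport) ** 2)     # total squared-distance cost
```

The resulting map is monotone, and its cost is no worse than any other pairing, e.g. the naive index-by-index pairing.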

© iid.yale.edu | Yale University 2023