Optimal Transport and Brain Imaging


Brain imaging studies often use different atlases, and that creates a hidden reproducibility problem: models trained in one atlas do not transfer cleanly to another without expensive reprocessing. This post explains a practical workaround based on optimal transport.

The storyline is simple: why atlas mismatch blocks downstream analysis, how we cast atlas translation as a transport problem, and what our MICCAI 2021 method enables when raw data cannot be rerun through a full pipeline.

Functional connectomes from fMRI are a good example of this challenge. Resting-state and task scans are short, preprocessing is costly, and atlas choices differ across groups. At MICCAI 2021, Amin Karbasi, Dustin Scheinost, and I proposed a data-driven transport method that estimates time series in a missing atlas without rerunning full preprocessing.


The main idea is simple. Optimal transport compares probability distributions that may have different supports and possibly different geometries. Its classical formulation is a shipping problem: suppose there are $m$ sources $x_1, \ldots, x_m$ for a commodity, with $a(x_i)$ units of supply at $x_i$, and $n$ sinks $y_1, \ldots, y_n$, with demand $b(y_j)$ at $y_j$. If $c(x_i, y_j)$ is the unit cost of shipment from $x_i$ to $y_j$, find a flow that satisfies demand from supplies and minimizes the total flow cost.
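
Written out, the shipping problem above is a linear program. The following is a sketch of the standard balanced form, assuming total supply equals total demand; the flow variable $f_{ij}$ (units shipped from $x_i$ to $y_j$) and the symbol $c(x_i, y_j)$ for the unit shipping cost are notation introduced here for illustration.

$$
\begin{aligned}
\min_{f} \quad & \sum_{i=1}^{m} \sum_{j=1}^{n} c(x_i, y_j)\, f_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{n} f_{ij} = a(x_i), \quad i = 1, \ldots, m, \\
& \sum_{i=1}^{m} f_{ij} = b(y_j), \quad j = 1, \ldots, n, \\
& f_{ij} \ge 0 .
\end{aligned}
$$

When the supplies and demands are each normalized to sum to one, they become probability distributions and the optimal $f$ is a transport plan between them.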

Similarly, we define a transportation problem between $\mu$, the activity levels of $n$ brain regions in a source atlas, and $\nu$, the activity levels of $m$ regions in a target atlas. Once we have an optimal mapping between $\mu$ and $\nu$ for a given cost matrix $M$, moving values from one atlas to the other reduces to applying that mapping. We therefore use training data available in both atlases to learn the mapping, then reuse it as a universal plan to estimate $\nu$ for individuals whose target-atlas values are unavailable. Repeating this for every time frame yields the final time series. With this approach, there is no need to rerun all preprocessing steps to obtain target-atlas time series. The method is particularly useful when raw data are unavailable, for example because of privacy or storage limits.
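
To make the learn-then-apply loop concrete, here is a minimal NumPy sketch on synthetic data. It is not the implementation from the paper, and several choices are assumptions made for illustration: the plan is computed with entropic-regularized Sinkhorn iterations rather than an exact solver, the cost matrix $M$ is taken to be one minus the correlation between source and target regions, each frame is shifted and normalized into a histogram, and per-frame plans are simply averaged. The function names (`sinkhorn_plan`, `to_histogram`, `learn_plan`, `apply_plan`) are hypothetical.

```python
import numpy as np


def sinkhorn_plan(mu, nu, M, reg=0.1, n_iter=200):
    """Entropic-regularized transport plan between histograms mu (n,) and nu (m,)."""
    K = np.exp(-M / reg)              # Gibbs kernel, shape (n, m)
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)            # rescale so column sums move toward nu
        u = mu / (K @ v)              # rescale so row sums equal mu
    return u[:, None] * K * v[None, :]


def to_histogram(x):
    """Shift a frame to be strictly positive and normalize it to sum to 1 (assumption)."""
    x = x - x.min() + 1e-6
    return x / x.sum()


def learn_plan(src_train, tgt_train, M, reg=0.1):
    """Average the per-frame plans. src_train is (T, n), tgt_train is (T, m)."""
    plans = [
        sinkhorn_plan(to_histogram(s), to_histogram(t), M, reg)
        for s, t in zip(src_train, tgt_train)
    ]
    return np.mean(plans, axis=0)


def apply_plan(plan, src):
    """Map source frames (T, n) to target frames (T, m) with the row-normalized plan."""
    weights = plan / plan.sum(axis=1, keepdims=True)  # how each source region splits over targets
    return src @ weights


# Synthetic stand-in for paired training data: n source regions, m target regions.
rng = np.random.default_rng(0)
n, m, T = 4, 3, 50
mixing = rng.random((n, m))
src_train = rng.random((T, n))
tgt_train = src_train @ mixing

# Assumed cost: 1 - correlation between each source and each target region.
corr = np.corrcoef(src_train.T, tgt_train.T)[:n, n:]
M = 1.0 - corr

plan = learn_plan(src_train, tgt_train, M)

# New individual with source-atlas frames only: estimate target-atlas frames.
src_new = rng.random((5, n))
tgt_est = apply_plan(plan, src_new)
print(tgt_est.shape)
```

Because each row of the normalized plan sums to one, the total activity in every estimated frame matches the total in the corresponding source frame; the plan only redistributes it across target regions. In practice an established solver such as the POT library would likely replace the hand-written Sinkhorn loop.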