Javid Dadashkarimi

I'm a Machine Learning Scientist specializing in tools for clinical applications. I offer advisory expertise on challenges in medical image analysis and deep learning, with a focus on overcoming data scarcity and leveraging synthetic data solutions.

click this art to regenerate it
I whipped up this little piece of art that changes every time you click on it. You’ll stumble upon mesmerizing patterns reminiscent of stained glass, ancient doors, and Catholic church windows.

With a passion for AI-driven solutions in medical imaging, I focus on developing cutting-edge tools that elevate diagnostic precision and advance patient care. My research has been featured in leading journals and conferences, including Nature, Molecular Psychiatry, MICCAI, and COLING. In addition to medical imaging, I bring expertise in natural language processing, particularly in machine translation and information retrieval, where I also hold a patent. You can explore my personal projects on GitHub. I am currently available for freelance opportunities in prototyping, consulting, and technical advisory roles. For collaboration inquiries, feel free to reach out – let’s connect!

Also, here are my updated CV, LinkedIn profile, academic X page, Google Scholar page, and the most recent anonymous visitors!

Acronyms Explained

DL Deep Learning
NLP Natural Language Processing
MRI Magnetic Resonance Imaging
ECG Electrocardiography
AF Atrial Fibrillation
CLIR Cross-Lingual Information Retrieval
YINS Yale Institute for Network Science

Selected Projects

I build a variety of small, innovative projects that aim to solve specific problems or explore new ideas. Some of my projects are:

Title Institution Date Topic Info
Cascade U-Nets MGH/Harvard 2024/10/27 DL for Segmentation

This project implements a hybrid 3D, multi-scale model for fetal brain extraction from MRI scans using a cascade of U-Net architectures. Using a Breadth-Fine Search (BFS) and Deep-Focused Sliding Window (DFS) approach, our framework achieves precise segmentation results in full-uterus stack-of-slices scans, even with limited training data and annotations.

Our method employs a cascade of four models—A, B, C, and D—trained to focus on 3D patches of decreasing sizes from a synthesized set of training images augmented with random geometric shapes. At inference, we pool sliding-window predictions across multiple sizes to detect and refine the region of interest (ROI) that most likely contains the fetal brain. Subsequently, we derive an accurate brain mask through voxel-wise majority voting.

Pipeline Overview

In the BFS stage, Models A and D generate probability maps \( P_A \) and \( P_D \) across the full input volume \( I \), producing a coarse brain mask from the largest connected components across maps. We then perform a bounding box fit on this mask to extract the 3D ROI \( R \subset I \).
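As a rough illustration of this BFS step, the sketch below fuses the two probability maps, keeps the largest connected component, and fits a bounding box around it. The exact fusion rule is an assumption (a plain average is used here), so read it as a minimal sketch rather than our implementation:

```python
import numpy as np
from scipy import ndimage

def roi_from_probability_maps(p_a, p_d, threshold=0.5):
    """Fuse two probability maps into a coarse mask, keep the largest
    connected component, and return its bounding-box slices (the ROI)."""
    coarse = (p_a + p_d) / 2.0 > threshold          # fused coarse mask
    labeled, n = ndimage.label(coarse)              # connected components
    if n == 0:
        return None
    sizes = ndimage.sum(coarse, labeled, range(1, n + 1))
    largest = labeled == (np.argmax(sizes) + 1)     # keep the largest one
    return ndimage.find_objects(largest.astype(int))[0]
```

The returned slices can then be used to crop the ROI \( R \) out of the full volume \( I \) before the DFS stage.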

BFS-DFS Pipeline

Figure 1: BFS-DFS Pipeline for fetal brain segmentation.

In the DFS stage, Models B, C, and D progressively refine \( R \) through sliding-window passes, each stage narrowing the ROI and updating a binary mask \( S_{final} \) via majority voting across maps \( S_B, S_C, \) and \( S_D \), leading to an optimized fetal brain mask with minimized false positives.
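The voxel-wise vote itself is straightforward; a minimal sketch, assuming binary masks of equal shape:

```python
import numpy as np

def majority_vote(masks):
    """Voxel-wise majority vote over a list of binary masks,
    e.g. S_B, S_C, and S_D from the DFS stage."""
    stacked = np.stack(masks).astype(np.int32)
    # A voxel is foreground when more than half of the masks agree.
    return (2 * stacked.sum(axis=0) > len(masks)).astype(np.uint8)
```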

Label Maps

Training involves generating synthetic shapes around the brain label maps, which represent human-annotated segmentations of the fetal brain. To create synthetic images from these label maps, we used the parameters below, each listed with its standard deviation:

Parameter Table

Table 1: Model parameters and sliding window configurations.

Here are two examples illustrating this augmentation, where labels 1 to 7 correspond to brain structures and the rest belong to the background:

Synthetic Shapes Example

Figure 2: Example of synthetic shapes surrounding the brain label map to simulate diverse training conditions.
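A stripped-down sketch of the idea behind this kind of synthesis: sample a random intensity per label so that appearance is decoupled from anatomy. The intensity range and noise level here are placeholders, and the real pipeline also adds the geometric shapes, smoothing, and other augmentations described above:

```python
import numpy as np

def synthesize_image(label_map, noise_std=5.0, seed=None):
    """Sample a random mean intensity per label, then corrupt the result
    with Gaussian noise (a simplified label-map-to-image generator)."""
    rng = np.random.default_rng(seed)
    image = np.zeros(label_map.shape, dtype=np.float32)
    for lab in np.unique(label_map):
        # Every voxel of this label gets the same random mean intensity.
        image[label_map == lab] = rng.uniform(0.0, 255.0)
    image += rng.normal(0.0, noise_std, size=label_map.shape)
    return image
```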

Training Data

Here are two examples of training data derived from the label maps: one for Model A, which is appropriate for the largest patches in the image, and another for Model D, which targets the smallest patches.
Synthetic Shapes Example

Figure 3: Example of synthetic images derived from the label maps for model \( A \) and model \( D \).

Key Advantages:

  • False Positive Reduction: BFS and DFS steps progressively refine the region of interest, minimizing false positives.
  • Adaptability to Sparse Data: Trained on synthetic data, the model performs well even with limited annotations.
  • Precision Across Variability: Multi-scale sliding windows capture diverse head orientations and anatomy for accurate segmentation.
  • High Accuracy: Outperforms existing methods, improving Dice scores by up to 5% on second-trimester and EPI scans.
  • Reliable Localization: The initial localization process ensures accurate brain segmentation, allowing for finer navigation within the womb and reducing the risk of segmenting non-relevant areas.

This repository provides a robust tool for medical imaging problems, particularly body-organ segmentation, and for addressing challenges posed by limited training data and sparse annotations.


ECG2AF Broad Institute 2024/10/29 AI for AF Prediction

ECG2AF Model Web Application

This application demonstrates our ECG2AF tool, a clinical AI model designed to predict the risk of developing atrial fibrillation (AF) from ECG data.

The online demo is temporarily offline. To run the app locally, place app.py, ecg_model.py, and the required folders inside the app/ directory, and make sure Dockerfile and requirements.txt are in the parent directory.

I developed this project using editors like vim and VS Code, with Copilot for some function autocompletion. The app was (and will again be) deployed on an EC2 instance (Linux t2.medium) with an attached 32 GB gp3 volume for additional storage. Additionally, we’ve set the Nginx file upload limit to 500 MB to accommodate larger files.

Background

AI is widely used in clinical applications to improve risk stratification and intervention. This project focuses on cardiovascular disease, aiming to predict AF risk based on ECG data.

Objective

Our objective is to allow users to:

  • Upload an ECG file (.hd5 format)
  • Process the uploaded ECG using a pre-trained ECG2AF model
  • Display the prediction results, including four output values

How It Works

  1. Upload your file(s) in .hd5 format by dragging and dropping them into the upload area. Upload ECG File
  2. Process with ECG2AF: The model processes the file in real-time.
  3. View Predictions: The app displays four predictions for each ECG file, including AF risk and demographic estimations. You’ll also see charts for better interpretation. Example Results

Prediction Outputs

  • AF Risk: Estimated risk for developing atrial fibrillation.
  • Sex Prediction (Male/Female): Probability-based estimate of biological sex.
  • Age Prediction: Estimated age based on ECG data.
  • AF In Read (Yes/No): Classification for AF presence in the read.

Scalability Considerations

  • Batch Processing: Process multiple ECG files simultaneously.
  • Database and Caching: Store previously uploaded files separately for each user.
  • Cloud Deployment and Load Balancing: AWS’s Elastic Load Balancer (ELB) is used for additional scaling.
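As one way to realize the batch-processing point above, uploads can be fanned out over a thread pool. `predict_stub` below is a hypothetical stand-in for the real ECG2AF model call, so treat this as a sketch of the concurrency pattern only:

```python
from concurrent.futures import ThreadPoolExecutor

def predict_stub(path):
    # Hypothetical stand-in for real ECG2AF inference on one .hd5 file.
    return {"file": path, "af_risk": 0.0, "sex": "F",
            "age": 0.0, "af_in_read": False}

def process_batch(paths, max_workers=4):
    """Run model inference for several uploaded files concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict_stub, paths))
```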

Visualization and Results

The charts in the application are created using Chart.js. Each chart presents predictions related to the uploaded .hd5 files, including AF risk, sex prediction, and age prediction values.

Example Results

Adding Nginx to Flask with Docker

Nginx acts as a reverse proxy for our Flask app in Docker, handling HTTP requests on port 80 and forwarding them to Flask on port 5000.
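A minimal server block for this setup might look like the following. The ports and the 500 MB limit come from the deployment notes above; everything else is a generic sketch, not our exact configuration:

```nginx
server {
    listen 80;
    client_max_body_size 500M;             # raised upload limit for large .hd5 files

    location / {
        proxy_pass http://127.0.0.1:5000;  # forward requests to the Flask app
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```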

Benefits of Using Nginx

  • Security: Nginx hides the Flask server from direct internet exposure.
  • Load Balancing: Manages traffic efficiently and scales easily.
  • Static File Handling: Serves static files faster, reducing Flask's load.
  • Production-Ready: Handles multiple users reliably, unlike Flask's dev server.

If you have any questions, feel free to email me.


SEQ2SQL YINS/YALE 2024/10/29 Sequence to SQL

What’s the Scoop?

This project is all about turning natural language questions into SQL commands. We’ve got a model that uses LSTM and GRU units to make sense of your questions and turn them into SQL. Think of it as chatting with your database!

Datasets We Played With

We trained on some cool datasets:

  • ATIS: Questions about flights.
  • GEO880: Geographic info with 880 examples.
  • Overnight: Covers 11 fun domains like sports and restaurants.
  • Toy: A small dataset with 600 examples.
  • WikiSQL: A solid 1,000 examples for generating SQL queries.

Query Examples

Check out some killer examples:

What’s the description of a CH-47D Chinook?
SELECT col1 FROM table_1_10006830_1 WHERE col0='CH-47D Chinook';
What’s the max gross weight of the Robinson R-22?
SELECT col2 FROM table_1_10006830_1 WHERE col0='Robinson R-22';
What school did player number 6 come from?
SELECT col5 FROM table_1_10015132_1 WHERE col1='6';
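Just to make the mapping concrete, here is a toy rule-based sketch, not the trained LSTM/GRU model, that reproduces the first two examples above. The column dictionary is a hypothetical lookup standing in for what the network learns from data:

```python
import re

# Hypothetical attribute-to-column lookup for one table from the examples.
COLUMNS = {"table_1_10006830_1": {"description": "col1",
                                  "max gross weight": "col2"}}

def question_to_sql(question, table):
    """Map a "What's the X of a/the Y?" question to a SQL query."""
    m = re.match(r"What's the (.+?) of (?:a|the) (.+)\?", question)
    if m is None:
        return None
    attribute, entity = m.groups()
    return f"SELECT {COLUMNS[table][attribute]} FROM {table} WHERE col0='{entity}';"
```

The real model, of course, generalizes far beyond such templates; this only illustrates the input/output contract.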

How It Works

So, how does it all come together? The model learns to associate pairs of natural language questions with their corresponding SQL commands, determining the best SQL query based on the input question. I accomplished this during my first year of grad school at Yale! I’m thrilled to see that my repository has received 93 stars and 23 forks on GitHub so far.

If you have any questions, feel free to email me.


EM4QT University of Tehran 2016/10/29 CLIR

Dictionary-based Query Translation in CLIR

So, what’s the deal with this project? It’s all about improving the query language model by using top-ranked documents in a cross-lingual information retrieval (CLIR) setting. We use information derived from both source- and target-language documents, retrieved based on user queries.

EM4QT

Query Translation Idea

The main idea behind EM4QT is to grab relevant documents in both languages. When someone asks a question in their native language, we kick things off by retrieving a set of documents that match that query. This gives us our initial batch of source language documents.

At the same time, we whip up an initial guess of translation candidates using a bilingual dictionary.

Once we have both sets of documents, we fine-tune our translation candidates based on the top retrieved documents. This back-and-forth process makes sure we consider the most relevant and contextually appropriate translations, boosting the accuracy of our cross-lingual information retrieval.

Expectation-Maximization Algorithm

The EM algorithm is all about upping the translation quality by leveraging those top-retrieved documents in both languages. It boils down to two main steps:

  • E-Step: We estimate the translation confidence probabilities using the documents we’ve retrieved.
  • M-Step: Next, we maximize the likelihood of these probabilities, refining the translation model as we go.
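To make the two steps concrete, here is a schematic IBM-Model-1-style EM sketch over word-level (source, target) pairs. It is a simplification of EM4QT, which works with top-ranked retrieved documents rather than clean aligned pairs, so treat it as an illustration of the E/M mechanics only:

```python
from collections import defaultdict

def em_translation(pairs, iterations=10):
    """Estimate translation probabilities t(target | source) from
    (source_words, target_words) pairs via expectation-maximization."""
    src_vocab = {s for src, _ in pairs for s in src}
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform translation distribution.
    t = {(s, w): 1.0 / len(tgt_vocab) for s in src_vocab for w in tgt_vocab}
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)
        for src, tgt in pairs:
            for s in src:
                # E-step: translation confidence of each candidate w for s.
                z = sum(t[(s, w)] for w in tgt)
                for w in tgt:
                    c = t[(s, w)] / z
                    count[(s, w)] += c
                    total[s] += c
        # M-step: re-normalize counts into a probability distribution.
        for (s, w), c in count.items():
            t[(s, w)] = c / total[s]
    return t
```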

This iterative refinement continues until convergence, resulting in a more accurate translation distribution that enhances the overall quality of our cross-lingual retrieval. The full details of the algorithm are published in the IP&M journal. If you have any questions, feel free to email me.


Education

Year Degree Field Institution Location
2023 PhD Computer Science Yale University USA
2015 MEng Software Engineering University of Tehran Iran

Research Interests

I am broadly interested in medical image analysis, machine learning, predictive modeling, data-driven methods, and precision medicine, with a focus on:

- Brain Segmentation
- Fetal Brain Extraction
- Functional Connectomics and Brain-behavior Associations
- Optimal Transport and Data-driven Methods for Connectomics Data
- High-Performance Computing

Honors and Scholarships

Year Award Details
2022 Best Paper Award Graphs in Biomedical Imaging, MICCAI
2022 MICCAI Student Award Early Acceptance, Singapore
2021 Brain Initiative Trainee Award Flash Talk
2020 Best Poster Award Connectomics for NeuroImaging, MICCAI
2012 Graduate School Fellowship Ranked 5th in Undergraduate Studies, University of Tehran
2008 Ranked 331st/220,000 Top 0.2% in Iran's Nationwide University Entrance Exam