Fetal Ultrasound Anomaly Detection Research

A Baseline Experiment Report on Reconstruction-Based Anomaly Detection

Our First Baseline Experiment: Research Overview

This report presents the findings of a first baseline experiment on deep generative models for anomaly detection in prenatal ultrasound imaging. The objective is to assess how well Autoencoder (AE) and Variational Autoencoder (VAE) architectures learn the distribution of normal fetal anatomy and then identify deviations that may indicate anomalies. This approach suits settings where anomalous examples are scarce and highly heterogeneous, as is the case for congenital heart defects: a model trained only on normal images can support screening by flagging any image that falls outside the learned manifold of normality.

Models Explored

Autoencoder (AE): A neural network trained without labels to reconstruct its input, learning a compressed representation along the way. In anomaly detection, a reconstruction error well above a threshold derived from errors on normal data serves as the primary anomaly indicator.
Variational Autoencoder (VAE): A probabilistic generative model that learns a latent space representation of the input data's underlying distribution. Anomalies are identified based on their low likelihood under the learned normal distribution, often manifested as high reconstruction error or a low probability density in the latent space.
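
The reconstruction-error scoring described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: the function names, the use of mean squared error, and the 95th-percentile default are assumptions made for the example, and images are represented as flat lists of pixel values to keep it dependency-free.

```python
import math

def reconstruction_error(image, reconstruction):
    """Mean squared error between two equally sized flat pixel lists."""
    if not image or len(image) != len(reconstruction):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(image, reconstruction)) / len(image)

def quantile_threshold(normal_errors, q=0.95):
    """Nearest-rank q-quantile of errors measured on held-out normal images.

    The choice of q is an assumption for illustration; the report does not
    state how its threshold was set.
    """
    if not normal_errors or not 0.0 < q <= 1.0:
        raise ValueError("need at least one error and 0 < q <= 1")
    ordered = sorted(normal_errors)
    rank = math.ceil(q * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def is_anomalous(error, threshold):
    """Flag an image whose reconstruction error exceeds the threshold."""
    return error > threshold
```

In use, the threshold would be computed once from validation images known to be normal, and each new image would be passed through the trained model before calling `reconstruction_error` and `is_anomalous`.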

Experiment Parameters

Parameter	Value
Image Size	(128, 128)
Batch Size	32
Learning Rate	0.001
Epochs	100

Model Performance: A Closer Look

Metric	Value
Training Loss	0.0044
Validation Loss	0.0047
PR AUC	0.9528
ROC AUC	0.7050

[Figure: Reconstruction Error Distribution]

[Figure: Precision-Recall Curve (AP = 0.95)]

Discussion & Insights

Key Finding: Autoencoder Shows Strong Promise

In this initial baseline experiment, the standard Autoencoder (AE) outperformed the Variational Autoencoder (VAE) at fetal anomaly detection. The AE achieved a Precision-Recall AUC of 0.95, compared with 0.88 for the VAE, indicating that the images it ranks as most anomalous are predominantly true anomalies. Keeping false alarms low while still catching anomalies is a crucial characteristic for a potential clinical screening tool, and it aligns with our mission to improve early detection for affected children.

Understanding Performance with Imbalanced Data

The VAE's weaker performance, especially its ROC AUC of 0.58 (close to the 0.5 expected from random ranking), highlights the challenge of working with imbalanced datasets, where anomalies are far less common than normal cases. Its PR AUC of 0.88 is still good, but the AE's simpler reconstruction task proved more robust for this specific problem. For anomaly detection, where missing a true anomaly is costly, PR AUC often provides a more informative assessment than ROC AUC because it focuses on the anomaly class. It should be read against its chance level, which equals the proportion of anomalies in the evaluation set.
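
To make the two metrics concrete, here is a small dependency-free sketch of how ROC AUC and average precision (a common estimate of PR AUC) can be computed from anomaly scores. The function names and the toy data are illustrative assumptions, not the evaluation code behind the numbers reported here; labels use 1 for anomalous and 0 for normal.

```python
def roc_auc(labels, scores):
    """Probability that a random anomaly outscores a random normal image.

    Ties between an anomaly and a normal image count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at each rank where an anomaly appears.

    Images are ranked from highest to lowest score; tied scores keep
    their input order, which is a simplification.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("need at least one anomaly")
    return sum(precisions) / len(precisions)

# Toy example: three anomalies, two normal images.
toy_labels = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_roc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

A scorer with no discriminative power has an expected ROC AUC of 0.5 regardless of class balance, whereas its expected average precision is roughly the fraction of anomalies in the evaluation set, which is why the two metrics can look very different on the same model.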

Confirming the Anomaly Detection Principle

Both models supported the core hypothesis of reconstruction-based anomaly detection: images containing anomalies tended to produce higher reconstruction errors than normal images. The visualizations of the reconstruction error distributions show a visible, though partly overlapping, separation between normal and anomalous data. This principle is what allows us to flag potential issues for further expert review.

Our Path Forward: Next Steps in Research

Optimize Model Parameters

We will run a systematic hyperparameter search for both the AE and the VAE, focusing especially on the VAE's latent dimension and its beta value (the weight on the KL-divergence term of its loss).
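
As a rough illustration of what the beta value controls, the sketch below writes out the standard beta-VAE objective for a diagonal Gaussian encoder with a standard normal prior, plus a simple grid over latent dimension and beta. The report does not specify the exact loss or search space used, so the function names and the grid values are assumptions for the example.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I).

    Closed form, summed over latent dimensions:
    -0.5 * sum(1 + log_var - mu^2 - exp(log_var))
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

def beta_vae_loss(recon_error, mu, log_var, beta=1.0):
    """Reconstruction error plus beta times the KL term.

    beta = 1.0 recovers the standard VAE objective; larger values
    push the latent code harder toward the prior.
    """
    return recon_error + beta * kl_to_standard_normal(mu, log_var)

def hyperparameter_grid(latent_dims, betas):
    """All (latent_dim, beta) pairs to evaluate in a grid search."""
    return [(d, b) for d in latent_dims for b in betas]

# Hypothetical search space; these values are placeholders.
grid = hyperparameter_grid(latent_dims=[16, 32, 64], betas=[0.5, 1.0, 4.0])
```

Each pair in `grid` would correspond to one training run, with the resulting models compared on the validation metrics already used in this report.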

Explore Advanced Architectures

Our research will investigate more sophisticated generative models, such as AnoGAN, which uses adversarial training to learn a representation of "normal" data.

Deep Dive into Misclassifications

We will perform in-depth visual analysis of instances where our models misclassified images (false positives and false negatives) to gain crucial insights for model refinement.

Enhance Data Robustness

Refining our data augmentation strategies should improve model robustness to natural variations in ultrasound images without compromising the learning of true normality.

Validate with New Datasets

We plan to replicate this experiment using the FOCUS dataset, a collection of specific cardiac views, to assess how well our approach generalizes to different types of fetal heart ultrasound data.

Develop Clinical Guidance

A critical next step involves working towards establishing clinically relevant anomaly score thresholds, balancing the trade-offs between accurately detecting anomalies and minimizing unnecessary follow-ups.
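
One way to explore this trade-off is to fix a minimum acceptable recall (the share of anomalies that must be caught) and look up the strictest threshold that still achieves it, then inspect the resulting precision. The sketch below is a hypothetical illustration of that procedure; the function name, the convention of flagging scores at or above the threshold, and the toy data are assumptions, and any clinically meaningful target would need to come from domain experts.

```python
def threshold_for_recall(labels, scores, target_recall):
    """Highest threshold whose recall is at least target_recall.

    An image is flagged when its score is >= the threshold.
    Labels use 1 for anomalous and 0 for normal.
    Returns (threshold, precision, recall).
    """
    total_pos = sum(1 for y in labels if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one anomaly")
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    # Try candidate thresholds from strictest to most permissive.
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        recall = tp / total_pos
        if recall >= target_recall:
            precision = tp / (tp + fp)  # tp + fp >= 1 because t is a score
            return t, precision, recall
    # Unreachable: the lowest candidate flags every image, giving recall 1.0.
    raise RuntimeError("no threshold reached the target recall")

# Toy example: three anomalies, two normal images.
toy_labels = [1, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
strict = threshold_for_recall(toy_labels, toy_scores, target_recall=0.6)
full = threshold_for_recall(toy_labels, toy_scores, target_recall=1.0)
```

In the toy data, demanding that every anomaly be caught lowers the threshold enough to admit one normal image, which is the kind of extra follow-up a clinical threshold has to weigh against missed cases.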