Fetus Ultrasound Anomaly Detection Research
A Baseline Experiment Report on Reconstruction-Based Anomaly Detection
Our First Baseline Experiment: Research Overview
This report presents a first baseline experiment applying deep generative models to anomaly detection in prenatal ultrasound imaging. The objective is to assess how well Autoencoder (AE) and Variational Autoencoder (VAE) architectures learn the distribution of normal fetal anatomy and identify deviations that may indicate anomalies. This approach suits domains where anomalous data is scarce and highly heterogeneous, such as congenital heart defects: a model of normal anatomy can support screening by flagging images that fall outside the learned manifold of normality.
Models Explored
- Autoencoder (AE): A neural network trained for unsupervised feature learning by reconstructing its input. In anomaly detection, a significantly higher reconstruction error for an input image, relative to a learned threshold from normal data, serves as the primary anomaly indicator.
- Variational Autoencoder (VAE): A probabilistic generative model that learns a latent space representation of the input data's underlying distribution. Anomalies are identified based on their low likelihood under the learned normal distribution, often manifested as high reconstruction error or a low probability density in the latent space.
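The reconstruction-error criterion described for both models can be sketched as follows. This is a minimal illustration, not the experiment's code: the function names, the 95th-percentile threshold rule, and the noisy stand-in "reconstructions" are all assumptions made for the example. A trained AE or VAE would supply the real reconstructions.

```python
import numpy as np


def reconstruction_errors(images: np.ndarray, reconstructions: np.ndarray) -> np.ndarray:
    """Per-image mean squared error between inputs and their reconstructions."""
    if images.shape != reconstructions.shape:
        raise ValueError("images and reconstructions must have the same shape")
    diff = images.astype(np.float64) - reconstructions.astype(np.float64)
    # Average over every axis except the batch axis (axis 0).
    return (diff ** 2).reshape(len(images), -1).mean(axis=1)


def fit_threshold(normal_errors: np.ndarray, percentile: float = 95.0) -> float:
    """Threshold taken as a percentile of errors on held-out normal images.

    The percentile rule is an assumption for this sketch, not the report's method.
    """
    return float(np.percentile(normal_errors, percentile))


def flag_anomalies(errors: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean mask: True where the reconstruction error exceeds the threshold."""
    return errors > threshold


# Synthetic stand-ins: "normal" images are reconstructed with small noise,
# "unusual" images with larger noise. Real values would come from a trained model.
rng = np.random.default_rng(0)
normal = rng.random((100, 128, 128))
normal_recon = normal + rng.normal(0.0, 0.05, normal.shape)
unusual = rng.random((20, 128, 128))
unusual_recon = unusual + rng.normal(0.0, 0.20, unusual.shape)

normal_errors = reconstruction_errors(normal, normal_recon)
threshold = fit_threshold(normal_errors, percentile=95.0)
flags = flag_anomalies(reconstruction_errors(unusual, unusual_recon), threshold)
normal_flag_rate = float(flag_anomalies(normal_errors, threshold).mean())
```

By construction, a percentile threshold fitted on normal data flags roughly (100 - percentile)% of normal images, so the choice of percentile directly sets the false-alarm rate on normal cases.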
Experiment Parameters
Parameter | Value |
---|---|
Image Size | (128, 128) |
Batch Size | 32 |
Learning Rate | 0.001 |
Epochs | 100 |
Model Performance: A Closer Look
Metric | Value |
---|---|
Training Loss | 0.0044 |
Validation Loss | 0.0047 |
PR AUC | 0.9528 |
ROC AUC | 0.7050 |
Figure: Reconstruction Error Distribution
Figure: Precision-Recall Curve (AP = 0.95)
Discussion & Insights
Key Finding: Autoencoder Shows Strong Promise
In this initial baseline experiment, the standard Autoencoder (AE) outperformed the Variational Autoencoder (VAE) for fetal anomaly detection, with a Precision-Recall AUC of 0.95 against the VAE's 0.88. A PR AUC of 0.95 means precision stays high across most recall levels, which is the behaviour a clinical screening tool needs, although the value also depends on the share of anomalous images in the evaluation set. The ROC AUC of 0.705 reported above is more modest, so the AE's ranking of anomalous over normal images still leaves room for improvement. These results align with our mission to improve early detection for affected children.
Understanding Performance with Imbalanced Data
The VAE's ROC AUC of 0.58 is close to the 0.5 expected from random scoring, so its anomaly scores barely rank anomalous images above normal ones. Its PR AUC of 0.88 should be read with the class balance in mind: a random scorer's PR AUC is approximately the fraction of anomalous images in the evaluation set, so a PR AUC is only meaningful relative to that baseline. On both metrics, the AE's simpler reconstruction task was more robust for this specific problem. In screening settings where anomalies are far less common than normal cases, PR AUC is often the more informative measure because it concentrates on the anomalous class, but it should be reported together with the class prevalence.
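To make the relationship between the two metrics and class balance concrete, the sketch below computes ROC AUC and average precision (a common estimate of PR AUC) from scratch and applies them to a scorer that assigns purely random scores. The labels and scores are synthetic, and the function names and prevalence values are assumptions chosen for illustration, not figures from this experiment.

```python
import random


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive (anomaly) occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits


def random_scorer_metrics(prevalence, n=2000, seed=0):
    """ROC AUC and AP of uninformative random scores at a given anomaly prevalence."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    scores = [rng.random() for _ in range(n)]
    return roc_auc(labels, scores), average_precision(labels, scores)


# Hypothetical prevalences: anomalies as the majority vs. a rare class.
roc_common, ap_common = random_scorer_metrics(prevalence=0.85)
roc_rare, ap_rare = random_scorer_metrics(prevalence=0.05)
print(f"prevalence 0.85: ROC AUC={roc_common:.2f}, AP={ap_common:.2f}")
print(f"prevalence 0.05: ROC AUC={roc_rare:.2f}, AP={ap_rare:.2f}")
```

In both settings the random scorer's ROC AUC stays near 0.5, while its average precision tracks the prevalence. This is why a PR AUC is best judged against the anomaly fraction of the evaluation set rather than against a fixed scale.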
Confirming the Anomaly Detection Principle
Both models supported the core hypothesis of reconstruction-based anomaly detection: images containing anomalies tended to produce higher reconstruction errors than normal images. The visualizations of the reconstruction error distributions show a separation between normal and anomalous data, though the two distributions overlap. This principle is what allows us to flag potential issues for further expert review.
Our Path Forward: Next Steps in Research
- We will run a systematic hyperparameter search for both the AE and the VAE, focusing on the VAE's latent dimension and beta value.
- We will investigate more sophisticated generative models, such as AnoGAN, which uses adversarial training to learn a more precise representation of "normal" data.
- We will visually analyze images that our models misclassified (false positives and false negatives) to guide model refinement.
- We will refine our data augmentation strategies to improve robustness to natural variation in ultrasound images without compromising the learning of true normality.
- We plan to replicate this experiment on the FOCUS dataset, a collection of specific cardiac views, to assess how well our approach generalizes to other types of fetal heart ultrasound data.
- We will work towards clinically relevant anomaly score thresholds that balance accurate anomaly detection against unnecessary follow-ups.
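One way to approach the threshold question is to fix a target sensitivity and read off the resulting specificity and number of flagged images. The sketch below assumes a labeled validation set and the rule "flag when score >= threshold". The function names and the ten scores are made up for illustration and are not results from this experiment.

```python
import math


def threshold_for_sensitivity(labels, scores, target_sensitivity):
    """Highest threshold at which at least the target share of anomalies is flagged."""
    anomaly_scores = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not anomaly_scores:
        raise ValueError("at least one anomalous example is required")
    # Number of anomalies that must score at or above the threshold.
    needed = max(1, math.ceil(target_sensitivity * len(anomaly_scores)))
    return anomaly_scores[needed - 1]


def operating_point(labels, scores, threshold):
    """Sensitivity, specificity, and flagged count under the rule score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "flagged": tp + fp,
    }


# Made-up reconstruction errors: six normal images (label 0), four anomalous (label 1).
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.0040, 0.0042, 0.0045, 0.0047, 0.0051, 0.0060,
          0.0049, 0.0058, 0.0066, 0.0080]

for target in (0.75, 1.0):
    t = threshold_for_sensitivity(labels, scores, target)
    print(target, t, operating_point(labels, scores, t))
```

Raising the target sensitivity lowers the threshold, which flags more normal images for follow-up. Tabulating this trade-off across candidate targets is one way to present the choice to clinical collaborators.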