Experiment 2 Final Report: Iterative Improvements in Fetal Ultrasound Anomaly Detection

Objective & Methodology

The objective of this experiment was to systematically improve upon the baseline Autoencoder (AE) model for anomaly detection in fetal ultrasound images by introducing and evaluating a series of advanced features and techniques.

We conducted a series of controlled experiments, where each experiment introduced a single change to the model or training process. This allowed us to isolate the impact of each change and make data-driven decisions about which improvements to incorporate into the final model. We used MLflow to track all experiments, parameters, and metrics.

Experiment Results
Experiment   Model                   ROC AUC   PR AUC
Baseline     Autoencoder             0.705     0.953
Exp 2.1      U-Net                   0.700     0.952
Exp 2.2      Attention U-Net         0.744     0.958
Exp 2.3      Advanced Augmentation   0.722     0.954
Exp 2.4      Curriculum Learning     0.730     0.956

Detailed Experiment Breakdowns

Experiment 1: Baseline Autoencoder
The initial baseline was established using a simple Autoencoder architecture. This experiment served as the starting point for all subsequent improvements.
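The idea can be illustrated with a toy linear autoencoder trained by gradient descent on reconstruction error; the data, layer sizes, and learning rate below are synthetic placeholders, not the actual baseline configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for flattened ultrasound patches; shapes and
# sizes are illustrative, not the real training configuration.
X = rng.standard_normal((256, 64))           # 256 samples, 64 features
W_enc = rng.standard_normal((64, 16)) * 0.1  # encoder: 64 -> 16
W_dec = rng.standard_normal((16, 64)) * 0.1  # decoder: 16 -> 64
lr = 1e-2

def forward(X):
    Z = X @ W_enc          # latent code (linear encoder for brevity)
    X_hat = Z @ W_dec      # reconstruction
    return Z, X_hat

_, X_hat = forward(X)
mse_before = float(((X_hat - X) ** 2).mean())

for _ in range(200):
    Z, X_hat = forward(X)
    err = X_hat - X                          # reconstruction residual
    # Gradients of the mean squared reconstruction error.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

_, X_hat = forward(X)
mse_after = float(((X_hat - X) ** 2).mean())
```

The trained model reconstructs normal data with low error, which is what makes reconstruction error usable as an anomaly score downstream.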

[Figure: Autoencoder Error Distribution]
[Figure: Autoencoder PR Curve]
[Figure: Autoencoder ROC Curve]

Experiment 2.1: U-Net Baseline
The second experiment established a new baseline using a U-Net architecture. The U-Net, with its encoder-decoder structure and skip connections, is well suited to image segmentation and reconstruction tasks. The model was trained on a dataset of normal fetal ultrasound images to learn a robust representation of healthy anatomy; anomalies were then identified by measuring the reconstruction error between the model's output and the input image.
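The scoring step can be sketched as follows, with synthetic arrays standing in for real model reconstructions; the 95th-percentile threshold is an illustrative choice, not the operating point used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in reconstructions: in practice these come from the trained
# U-Net; here synthetic arrays illustrate the scoring step only.
images = rng.random((100, 32, 32))
recon = images + rng.normal(0.0, 0.02, images.shape)        # small residuals
anomalous = rng.random((10, 32, 32))
anom_recon = anomalous + rng.normal(0.0, 0.15, anomalous.shape)  # large residuals

def anomaly_score(x, x_hat):
    # Per-image mean squared reconstruction error.
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

normal_scores = anomaly_score(images, recon)
anom_scores = anomaly_score(anomalous, anom_recon)

# Flag images whose error exceeds the 95th percentile of normal errors.
threshold = np.percentile(normal_scores, 95)
flags = anom_scores > threshold
```

Sweeping this threshold over all scores is what produces the ROC and PR curves reported for each experiment.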

[Figure: U-Net Error Distribution]
[Figure: U-Net PR Curve]
[Figure: U-Net ROC Curve]

Experiment 2.2: Attention U-Net
To improve upon the U-Net baseline, attention mechanisms were integrated into the model architecture. The Attention U-Net allows the model to focus on the more salient regions of the input image, potentially leading to more accurate reconstructions and better anomaly detection.
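The additive attention gate used in Attention U-Nets can be sketched in NumPy. The per-pixel dense projections below play the role of the 1x1 convolutions in the real architecture; all weights and channel counts are illustrative, and in practice the gate weights are learned during training:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, W_x, W_g, psi):
    """Additive attention gate (Attention U-Net style).

    x : skip-connection features, shape (H, W, C)
    g : gating signal from the decoder path, shape (H, W, C)
    Computes a per-pixel coefficient in (0, 1) and rescales the skip
    features with it before they are passed to the decoder.
    """
    # Project both inputs, combine additively, apply ReLU, then
    # collapse to a single attention channel.
    q = np.maximum(x @ W_x + g @ W_g, 0.0)   # ReLU(W_x x + W_g g)
    alpha = sigmoid(q @ psi)                 # (H, W, 1) attention map
    return x * alpha, alpha

# Toy feature maps; channel counts are illustrative.
C, C_int = 8, 4
x = rng.standard_normal((16, 16, C))
g = rng.standard_normal((16, 16, C))
W_x = rng.standard_normal((C, C_int)) * 0.1
W_g = rng.standard_normal((C, C_int)) * 0.1
psi = rng.standard_normal((C_int, 1)) * 0.1

gated, alpha = attention_gate(x, g, W_x, W_g, psi)
```

The attention map `alpha` suppresses skip features in regions the gating signal deems irrelevant, which is the mechanism credited for the improved reconstructions in this experiment.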

[Figure: Attention U-Net Error Distribution]
[Figure: Attention U-Net PR Curve]
[Figure: Attention U-Net ROC Curve]

Experiment 2.3: Advanced Augmentation
This experiment investigated the impact of a more comprehensive data augmentation strategy. In addition to the basic augmentations used in the previous experiments, this phase introduced affine transformations, elastic deformations, and Gaussian noise. The goal was to create a more diverse training set and improve the model's ability to generalize to unseen data.
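A simplified sketch of such a pipeline in NumPy; the real training used proper affine warps and elastic deformations from an augmentation library, whereas the stand-ins below only approximate the affine and noise components (elastic deformation is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(3)

def augment(img, rng):
    """Illustrative stand-ins for the added augmentations; a real
    pipeline would use an augmentation library with true affine
    warps and elastic deformations."""
    # Simplified affine transform: random horizontal/vertical shift.
    dy, dx = rng.integers(-4, 5, size=2)
    img = np.roll(img, (dy, dx), axis=(0, 1))
    # Random horizontal flip.
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # Additive Gaussian noise, clipped back to the valid intensity range.
    img = img + rng.normal(0.0, 0.05, img.shape)
    return np.clip(img, 0.0, 1.0)

image = rng.random((64, 64))
augmented = augment(image, rng)
```

Applying a fresh random augmentation each epoch effectively enlarges the training set without collecting new scans.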

[Figure: Advanced Augmentation Error Distribution]
[Figure: Advanced Augmentation PR Curve]
[Figure: Advanced Augmentation ROC Curve]

Experiment 2.4: Curriculum Learning
The final experiment in this series explored curriculum learning, a technique in which the model is trained on progressively more difficult examples. The difficulty of each image was determined by the reconstruction error of the baseline Autoencoder from Experiment 1: by starting with easier examples and gradually introducing more complex ones, the model can learn more effectively.
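One way to sketch such a schedule, with synthetic difficulty scores standing in for the baseline reconstruction errors; the starting pool fraction and growth rate are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in difficulty scores: in the experiment these were the
# per-image reconstruction errors of the baseline Autoencoder.
baseline_errors = rng.random(1000)
order = np.argsort(baseline_errors)   # easiest (lowest error) first

def curriculum_batches(order, n_epochs, batch_size=32):
    """Yield (epoch, batch_indices) pairs, growing the pool of visible
    samples linearly from the easiest 25% to the full dataset."""
    n = len(order)
    for epoch in range(n_epochs):
        frac = 0.25 + 0.75 * epoch / max(n_epochs - 1, 1)
        pool = order[: int(frac * n)]
        # Shuffle within the current pool so batches stay varied.
        shuffled = np.random.default_rng(epoch).permutation(pool)
        for start in range(0, len(shuffled), batch_size):
            yield epoch, shuffled[start:start + batch_size]

first_epoch = [idx for e, idx in curriculum_batches(order, 4) if e == 0]
last_epoch = [idx for e, idx in curriculum_batches(order, 4) if e == 3]
```

The early epochs see only the easiest quarter of the data; by the final epoch every sample is in play.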

[Figure: Curriculum Learning Error Distribution]
[Figure: Curriculum Learning PR Curve]
[Figure: Curriculum Learning ROC Curve]

Analysis & Conclusion

Analysis

The results of our experiments show that the most significant improvement came from the introduction of the Attention U-Net in Experiment 2.2, which achieved an ROC AUC of 0.744, a notable increase over the baseline AE's 0.705. This suggests the attention mechanism helped the model focus on the relevant features in the ultrasound images.

The other experiments did not yield the expected gains. The standard U-Net in Experiment 2.1 performed on par with the baseline AE (0.700 vs. 0.705 ROC AUC), while advanced augmentation (Experiment 2.3) and curriculum learning (Experiment 2.4) improved on the baseline (0.722 and 0.730, respectively) but still fell short of the Attention U-Net's 0.744.

Conclusion

Based on the results of this experiment series, the Attention U-Net from Experiment 2.2 is the best performing model. It provides a clear improvement over the baseline and is our strongest result on this task to date.

Future Work

For future work, we recommend further investigation into the following areas:

  • Hyperparameter Tuning: A more thorough hyperparameter search for the Attention U-Net could lead to further performance gains.
  • Alternative Architectures: Exploring other advanced architectures, such as Vision Transformers (ViTs), could also be a promising direction.
  • Ensemble Methods: Combining the predictions of multiple models could lead to a more robust and accurate anomaly detection system.