HydraLA-Net: Multi-Lesion Segmentation for Diabetic Retinopathy Detection
Introduction
Diabetic retinopathy (DR) is one of the leading causes of preventable blindness worldwide, affecting millions of people with diabetes. Early detection and treatment can prevent up to 95% of vision loss cases, but manual screening is time-consuming and requires expert ophthalmologists.
In this project, I developed HydraLA-Net, a deep learning system that automatically segments four types of retinal lesions in fundus images: Hard Exudates (EX), Hemorrhages (HE), Microaneurysms (MA), and Soft Exudates (SE). The system is built on the LA-Net (Lesion-Aware Network) architecture with several novel enhancements, particularly a "Hydra" segmentation head that improves per-class performance.

The Challenge: Tiny Lesions in Low-Contrast Images
Microaneurysms are particularly challenging to detect - they're often just a few pixels wide and appear as subtle red dots on the retina. The dataset suffers from severe class imbalance: microaneurysms comprise only a tiny fraction of pixels compared to the background.

Traditional approaches struggle with this imbalance, often missing small lesions entirely or generating too many false positives.
Part 1: Data Preprocessing - Why CLAHE and Stratified Splitting Matter
Dataset Aggregation
I combined three publicly available datasets to improve model generalization:
- IDRiD (81 images): High-quality Indian dataset with binary masks
- DDR (757 images): Large-scale dataset with .tif format masks
- TJDR (561 images): Color-coded annotations requiring parsing
Total: 1,399 images split into:
- Training: 827 samples (59%)
- Validation: 207 samples (15%)
- Test: 365 samples (26%)
Why Stratified Splitting?
Here's where the data pipeline gets interesting. Instead of random splitting, I used stratified splitting to ensure balanced class representation across splits.
Random splitting would randomly assign images to train/val/test sets, which could result in:
- Some classes being over-represented in training but under-represented in validation
- Rare lesions (like microaneurysms) potentially missing from certain splits
- Inconsistent difficulty across splits
Stratified splitting ensures each split maintains the same class distribution as the original dataset. This is crucial for medical imaging where certain lesion types are already rare.
Here's a snippet from my data pipeline:
from sklearn.model_selection import train_test_split

# First split: separate out test set (26%)
train_val_df, test_df = train_test_split(
    combined_df,
    test_size=0.26,
    random_state=42,
    stratify=combined_df['dataset']  # Stratify by dataset source
)

# Second split: separate train and validation (59% / 15% of total)
train_df, val_df = train_test_split(
    train_val_df,
    test_size=0.2,  # 20% of 74% = 15% of total
    random_state=42,
    stratify=train_val_df['dataset']
)
This ensures each dataset (IDRiD, DDR, TJDR) is proportionally represented in all splits, preventing the model from overfitting to characteristics of a single data source.
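As a quick sanity check, stratification keeps each source's share nearly identical across splits. A minimal sketch using only the published per-dataset counts (81/757/561); `combined_df` itself isn't reproduced here:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Rebuild the source-label column from the published counts
df = pd.DataFrame({"dataset": ["IDRiD"] * 81 + ["DDR"] * 757 + ["TJDR"] * 561})

train_val, test = train_test_split(
    df, test_size=0.26, random_state=42, stratify=df["dataset"]
)

overall = df["dataset"].value_counts(normalize=True)
in_test = test["dataset"].value_counts(normalize=True)

# Each source's share in the test split matches its overall share to within ~1%
print((overall - in_test).abs().max())
```

A random (non-stratified) split of the same frame can drift by several percentage points per source, which matters most for the small IDRiD subset.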
CLAHE: Enhancing Contrast for Better Lesion Visibility
Fundus images often have low contrast, making tiny lesions difficult to detect. I implemented CLAHE (Contrast Limited Adaptive Histogram Equalization) to enhance local contrast without over-amplifying noise.
Why CLAHE instead of standard histogram equalization?
Standard histogram equalization operates globally on the entire image, which can:
- Over-amplify noise in already bright/dark regions
- Create unnatural-looking artifacts
- Reduce diagnostic quality
CLAHE operates on small tiles (8×8 pixels in my case) and limits contrast enhancement via a clip limit (1.5), preventing noise amplification while still revealing subtle lesions.
I tested three CLAHE modes:
1. LAB Mode: Apply CLAHE to the L (lightness) channel in LAB color space
2. Green Channel Mode (selected for final model): Apply CLAHE only to the green channel
3. CASP Mode: Channel-Aware Selective Preprocessing with different processing per channel

Why did I choose Green Channel CLAHE?
The green channel provides the best contrast for blood vessels and microaneurysms in fundus photography - it's standard practice in retinal imaging. Red channels are often overexposed, and blue channels have poor SNR.
Here's the implementation from dataset_definition.py:
def apply_clahe(
    image_rgb, clip_limit = 2.5, tile_grid_size = (8, 8), mode = "casp"
):
    clahe = cv2.createCLAHE(
        clipLimit = clip_limit,
        tileGridSize = tile_grid_size
    )
    if mode == "lab":
        lab = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2LAB)
        l, a, b = cv2.split(lab)
        l = clahe.apply(l)
        lab = cv2.merge((l, a, b))
        return cv2.cvtColor(lab, cv2.COLOR_LAB2RGB)
    elif mode == "green":
        out = image_rgb.copy()
        out[:, :, 1] = clahe.apply(out[:, :, 1])
        return out
    # CASP branch omitted for brevity
Part 2: Model Architecture - Building HydraLA-Net
Base Architecture: LA-Net (Lesion-Aware Network)
I started with LA-Net, which uses:
- ResNet-50 backbone (pretrained on ImageNet) for feature extraction
- Feature Pyramid Blocks (FPB) to create multi-scale representations
- Lesion-Aware Modules (LAM) with asymmetric convolutions to capture elongated lesion shapes
- Feature Fusion Blocks (FFB) to combine encoder and decoder features
Innovation: The Hydra Segmentation Head
The key innovation in HydraLA-Net is the Hydra head - a multi-branch segmentation head where each lesion type gets its own dedicated branch.
Why separate branches?
Traditional multi-class segmentation uses a single convolutional layer to produce all class predictions simultaneously. This creates competition between classes and can hurt performance when classes have vastly different characteristics (e.g., large hemorrhages vs. tiny microaneurysms).
The Hydra head has 4 independent branches (hydralanet.py:14-33):
class HydraSegHead(nn.Module):
    def __init__(self, in_ch = 256, hidden_ratio = 0.25):
        super().__init__()
        hidden = max(1, int(in_ch * hidden_ratio))
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Conv2d(in_ch, hidden, kernel_size = 1, bias = True),
                nn.GELU(),
                nn.Conv2d(hidden, 1, kernel_size = 1, bias = True),
            )
            for name in ["EX", "HE", "MA", "SE"]
        })
        for head in self.heads.values():
            nn.init.constant_(head[-1].bias, -2.0)

    def forward(self, x):
        logits = [head(x) for head in self.heads.values()]
        return torch.cat(logits, dim = 1)
Key design choices:
- Hidden ratio of 0.25: Reduces parameters (256→64→1) while maintaining expressiveness
- GELU activation: Smoother gradients than ReLU, better for small lesions
- Bias initialization to -2.0: With sigmoid activation, this starts predictions near 0.12, reducing false positives early in training
Each branch learns specialized features for its lesion type, avoiding the compromises of a shared prediction layer.
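To see the bias initialization in action, here's a minimal sketch of a single branch. Note the final weights are zeroed here purely to isolate the bias term for the demo; the real head does not do this:

```python
import torch
import torch.nn as nn

# One Hydra branch: 256 -> 64 -> 1, as in HydraSegHead
branch = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1, bias=True),
    nn.GELU(),
    nn.Conv2d(64, 1, kernel_size=1, bias=True),
)
nn.init.constant_(branch[-1].bias, -2.0)
nn.init.zeros_(branch[-1].weight)  # isolate the bias effect (demo only)

with torch.no_grad():
    logits = branch(torch.randn(1, 256, 8, 8))
    prob = torch.sigmoid(logits).mean().item()

print(round(prob, 3))  # → 0.119, i.e. sigmoid(-2.0)
```

In the real (randomly initialized) branch the output hovers around this value rather than sitting exactly on it, which is enough to keep early predictions conservative.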
Part 3: Handling Class Imbalance with Dual Loss
The biggest challenge in this project was severe class imbalance. Microaneurysms might occupy less than 0.01% of image pixels, while the background dominates.
I implemented a Dual Loss combining Focal Tversky Loss and Binary Cross-Entropy:
Why Focal Tversky Loss?
Tversky Index is a generalization of Dice/F1 that allows asymmetric weighting of false positives and false negatives:
Tversky = TP / (TP + α×FP + β×FN)
- α (FP penalty): How much we penalize false positives
- β (FN penalty): How much we penalize false negatives
For medical imaging, false negatives (missed lesions) are more dangerous than false positives, so β > α.
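To make the asymmetry concrete, here's a toy computation (hypothetical pixel counts, not from the dataset): two predictions with the same number of errors, one all false positives, one all false negatives:

```python
def tversky(tp, fp, fn, alpha, beta):
    # Tversky index: TP / (TP + alpha*FP + beta*FN)
    return tp / (tp + alpha * fp + beta * fn)

# Same lesion (8 true-positive pixels), same error budget (4 pixels)
noisy  = tversky(tp=8, fp=4, fn=0, alpha=0.2, beta=0.8)  # extra false positives
missed = tversky(tp=8, fp=0, fn=4, alpha=0.2, beta=0.8)  # missed lesion pixels

print(round(noisy, 3), round(missed, 3))  # → 0.909 0.714
```

Since the loss is 1 − Tversky, the prediction that misses lesion pixels is penalized far harder than the one that over-predicts, which is exactly the behavior wanted for microaneurysms.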
Here's the actual implementation from loss_functions.py:16-92:
class FocalTverskyLoss(nn.Module):
    """
    Multi-Label Focal Tversky Loss for Semantic Segmentation
    alpha: controls penalty on false positives
    beta: controls penalty on false negatives
    gamma: focal parameter
    smooth: used to prevent division by 0
    class_weights: explicit scalar reweighting per class
    """
    def __init__(
        self, alpha = 0.3, beta = 0.7, gamma = 1.3, smooth = 1e-6, class_weights = None,
    ):
        super().__init__()
        self.register_buffer("alpha", torch.as_tensor(alpha, dtype = torch.float32))
        self.register_buffer("beta", torch.as_tensor(beta, dtype = torch.float32))
        self.register_buffer("gamma", torch.as_tensor(gamma, dtype = torch.float32))
        if class_weights is None:
            class_weights = torch.ones(4)
        self.register_buffer(
            "class_weights",
            torch.as_tensor(class_weights, dtype = torch.float32)
        )
        self.register_buffer(
            "smooth",
            torch.tensor(smooth, dtype = torch.float32)
        )

    def forward(self, probs, targets):
        # ... safety checks omitted for brevity ...
        B, C = probs.shape[:2]
        # Flatten spatial dims
        probs_flat = probs.view(B, C, -1)
        targets_flat = targets.view(B, C, -1)
        # TP / FP / FN (soft counts over all pixels)
        TP = (probs_flat * targets_flat).sum(dim = 2)
        FP = (probs_flat * (1 - targets_flat)).sum(dim = 2)
        FN = ((1 - probs_flat) * targets_flat).sum(dim = 2)
        # Tversky index per (batch, class); alpha/beta broadcast over classes
        tversky = (TP + self.smooth) / (
            TP + self.alpha * FP + self.beta * FN + self.smooth
        )
        # Focal term, then class-weighted mean
        loss_per_class = (1 - tversky) ** self.gamma
        weighted_loss = (loss_per_class.mean(dim = 0) * self.class_weights).mean()
        return weighted_loss
Note: For microaneurysms (MA), I used β=0.8, α=0.2 - heavily penalizing missed microaneurysms while being more tolerant of false positives.
The Dual Loss: Best of Both Worlds
I combined Focal Tversky with Binary Cross-Entropy (loss_functions.py:154-200):
class DualLoss(nn.Module):
    """
    Combined Loss Function (Mixed Focal Tversky and BCE)
    Final Loss: L = w_ft * FocalTverskyLoss + w_bce * BCELossMultiLabel
    """
    def __init__(
        self, class_weights = None, w_ft = 0.5, w_bce = 0.5, alpha = 0.3, beta = 0.7,
        gamma = 1.3, smooth = 1e-6, pos_weight = None
    ):
        super().__init__()
        self.w_ft = w_ft
        self.w_bce = w_bce
        self.ft = FocalTverskyLoss(
            alpha = alpha,
            beta = beta,
            gamma = gamma,
            smooth = smooth,
            class_weights = class_weights,
        )
        self.bce = BCELossMultiLabel(
            class_weights = class_weights,
            pos_weight = pos_weight,
        )

    def forward(self, probs, targets):
        loss_ft = self.ft(probs, targets)
        loss_bce = self.bce(probs, targets)
        return self.w_ft * loss_ft + self.w_bce * loss_bce
Why combine them?
- Focal Tversky: Handles class imbalance and focuses on hard examples
- BCE: Provides stable pixel-wise gradients and prevents collapse
I used weights of 0.8 (Focal Tversky) and 0.2 (BCE), plus class weights of [1.0, 1.0, 5.0, 1.0], giving 5× importance to microaneurysm predictions.
Part 4: Training Strategy and Results
Training Configuration
From train.py:36-104:
# Global Constants
IMG_SIZE = 1024
DEFAULT_EPOCHS = 125
LEARNING_RATE = 1e-5
BATCH_SIZE = 2
NUM_WORKERS = 8
W_FTL = 0.8
W_BCE = 0.2
TVERSKY_ALPHA = [0.4, 0.4, 0.2, 0.4]
TVERSKY_BETA = [0.6, 0.6, 0.8, 0.6]
TVERSKY_GAMMA = 2
CLAHE_CLIP = 1.5
CLAHE_MODE = 'green'
CLASS_WEIGHTS = [1.0, 1.0, 5.0, 1.0]
THRESHOLDS = [0.35, 0.35, 0.35, 0.35]

# Model initialization
model = HydraLANet()
model = model.to(device)

# Optimizer
optimizer = torch.optim.AdamW(
    params = model.parameters(),
    lr = LEARNING_RATE,
    weight_decay = 1e-4
)

# Loss Function
loss_function = DualLoss(
    class_weights = CLASS_WEIGHTS,
    w_ft = W_FTL,
    w_bce = W_BCE,
    alpha = TVERSKY_ALPHA,
    beta = TVERSKY_BETA,
    gamma = TVERSKY_GAMMA,
    smooth = 1e-6
).to(device)
Key training decisions:
- Small batch size (2): Required due to GPU memory with 1024×1024 images. I used frozen BatchNorm layers to avoid instability from batch statistics.
- AdamW with low learning rate (1e-5): Fine-tuning a pretrained ResNet-50 requires careful learning rates to avoid catastrophic forgetting.
- High resolution (1024×1024): Microaneurysms can be just 3-5 pixels wide. Lower resolutions (512×512) caused significant performance drops.
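The frozen-BatchNorm trick might look like this. A sketch only: the project's exact implementation isn't shown, and `freeze_batchnorm` is a name I'm introducing for illustration:

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> None:
    """Put every BatchNorm layer in eval mode and stop updating its parameters,
    so tiny batches (here, batch size 2) don't corrupt the running statistics."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                      # use the frozen running mean/var
            for p in m.parameters():
                p.requires_grad = False   # freeze the affine scale/shift

# Toy usage on a small conv block
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
net.train()
freeze_batchnorm(net)
```

One caveat: a later call to `model.train()` flips BatchNorm back into training mode, so the freeze has to be re-applied at the start of each epoch.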
Data Augmentation
From dataset_definition.py:131-144:
if self.transform_type == "train":
    transforms.extend([
        A.HorizontalFlip(p = 0.5),
        A.VerticalFlip(p = 0.5),
        A.Affine(
            translate_percent = 0.1,
            scale = (0.88, 1.25),
            rotate = (-15, 15),
            border_mode = cv2.BORDER_CONSTANT,
            fill = 0,
            fill_mask = 0,
            p = 0.75,
        ),
    ])
Medical images can have any orientation, so rotations and flips are valid augmentations. I avoided color augmentations since CLAHE preprocessing already handles contrast.
Evaluation Metrics
I tracked three metrics per class:
- IoU (Intersection over Union): Segmentation accuracy
- F1 Score: Harmonic mean of precision and recall (primary metric)
- Recall: Critical for medical applications - we don't want to miss lesions
Results are calculated per class and then averaged, giving equal importance to each lesion type despite class imbalance.
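A minimal sketch of those per-class metrics on binary masks (`per_class_metrics` is my own helper for illustration, not the project's evaluation code):

```python
import numpy as np

def per_class_metrics(pred, target, eps=1e-7):
    """pred, target: boolean arrays of shape (C, H, W), one channel per lesion."""
    tp = (pred & target).sum(axis=(1, 2))
    fp = (pred & ~target).sum(axis=(1, 2))
    fn = (~pred & target).sum(axis=(1, 2))
    iou = tp / (tp + fp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    recall = tp / (tp + fn + eps)
    return iou, f1, recall

# Toy example: 2 classes on a 4x4 image
pred = np.zeros((2, 4, 4), dtype=bool)
target = np.zeros((2, 4, 4), dtype=bool)
pred[0, :2, :2] = True      # predicts a 4-pixel lesion
target[0, :2, :3] = True    # ground truth is 6 pixels

iou, f1, recall = per_class_metrics(pred, target)
# Class 0: IoU = 4/6, F1 = 8/10, recall = 4/6
print(iou[0], f1[0], recall[0])
```

Averaging these per-class scores (rather than pooling pixels across classes) is what gives rare microaneurysms equal weight in the final numbers.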
Part 5: Deployment with Streamlit
I deployed the model as an interactive web app using Streamlit on Hugging Face Spaces.
Key preprocessing function from app.py:15-38:
def apply_green_clahe(image_rgb, clip_limit=1.5, tile_grid_size=(8, 8)):
    """Apply CLAHE to green channel only for contrast enhancement"""
    clahe = cv2.createCLAHE(
        clipLimit = clip_limit,
        tileGridSize = tile_grid_size
    )
    out = image_rgb.copy()
    out[:, :, 1] = clahe.apply(image_rgb[:, :, 1])
    return out

def preprocess_image(image_rgb):
    """Preprocess image with CLAHE and ImageNet normalization"""
    image_rgb = apply_green_clahe(image_rgb)
    image = image_rgb.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    image = (image - mean) / std
    image_tensor = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0)
    return image_tensor, image_rgb
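Downstream of `preprocess_image`, inference reduces to a forward pass plus per-class thresholding. A hedged sketch, assuming the model returns one raw logit map per lesion class (if it already applies a sigmoid internally, drop that step):

```python
import torch

THRESHOLDS = torch.tensor([0.35, 0.35, 0.35, 0.35])  # per-class, as in train.py

def logits_to_masks(logits, thresholds=THRESHOLDS):
    """logits: (1, 4, H, W) -> uint8 binary masks of the same shape."""
    probs = torch.sigmoid(logits)
    return (probs > thresholds.view(1, -1, 1, 1)).to(torch.uint8)

# Fake logits stand in for a real model forward pass
fake_logits = torch.full((1, 4, 8, 8), -2.0)   # sigmoid(-2.0) ≈ 0.12 < 0.35
masks = logits_to_masks(fake_logits)
print(masks.sum().item())  # → 0: nothing crosses the threshold
```

Exposing `thresholds` as a parameter is what makes the app's sensitivity/specificity slider a one-line change.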
The app allows users to:
- Upload their own fundus images
- Adjust detection threshold for sensitivity vs. specificity trade-offs
- Toggle CLAHE preprocessing
- View individual masks and coverage statistics
Conclusions and Key Takeaways
This project demonstrates several important principles for medical image segmentation:
1. Preprocessing Matters
CLAHE preprocessing on the green channel significantly improved lesion visibility without introducing artifacts. Domain-specific preprocessing (using medical imaging knowledge) outperformed generic approaches.
2. Architecture Innovations Can Be Simple
The Hydra head required minimal code changes but provided measurable improvements by giving each class its own decision pathway.
3. Loss Functions Are Critical for Imbalance
The combination of Focal Tversky Loss (with class-specific α/β parameters) and class weighting was essential for learning to detect rare microaneurysms.
4. Multi-Dataset Training Improves Generalization
Combining three datasets with different annotation formats and imaging protocols forced the model to learn robust features rather than dataset-specific artifacts.
5. Resolution Cannot Be Compromised
For tiny lesions like microaneurysms, high-resolution training (1024×1024) was non-negotiable. The performance drop at 512×512 was significant.
Impact
Automated diabetic retinopathy screening can help address the shortage of ophthalmologists in underserved areas, enabling earlier detection and treatment. By detecting multiple lesion types simultaneously, systems like HydraLA-Net can provide more comprehensive assessments than single-lesion detectors.
Try It Yourself
- Live Demo: Hugging Face Spaces
- Paper: Progressive Optimization of HydraLA-Net for Microaneurysm Segmentation
- Code: GitHub Repository
Built with PyTorch, Streamlit, and a lot of coffee
Tags: #DeepLearning #MedicalAI #ComputerVision #DiabeticRetinopathy #SemanticSegmentation #PyTorch