
Technical depth / education

Master's thesis: breast cancer from histological images

A research prototype (not deployed) that lets a patient, family member, or X-ray tech upload a histology image and get an early read on whether a lump looks malignant or benign. I built it as my MSc thesis because waiting for an appointment for a scan or a second opinion can take too long in Europe.

Employer / client: Dublin City University
Duration: MSc capstone, DCU
Project type: Master's thesis

How it works


Bar chart comparing validation accuracy of CNN, VGG, ResNet, CNN-to-VGG hybrid, and VGG-to-CNN hybrid models. CNN scores 0.83, VGG 0.70, ResNet 0.69, CNN+VGG hybrid 0.80, VGG+CNN hybrid 0.77.

Validation accuracy across all five models. The custom CNN was the strongest single architecture, and the CNN-to-VGG hybrid was the stronger of the two stacks.

Figure 7, Final Report

Bar chart comparing validation loss across the same five models. CNN 0.39, VGG 0.67, ResNet 0.62, CNN+VGG 0.51, VGG+CNN 0.51.

Validation loss across the same five models. The hybrids reduced loss versus standalone VGG and ResNet, but CNN held the lowest loss overall.

Figure 8, Final Report

Process flow

How I work through the steps

  1. Dataset (before)

    Kaggle histopathology images, binary benign vs malignant labels.

    Public dataset
  2. Preprocessing (control)

    Resize, normalise, augment (rotation, scaling, flipping), histogram equalisation.

    BA / Researcher
  3. Model training (control)

    Custom CNN, transfer-learned VGG16 and ResNet50, two CNN/VGG hybrid stacks.

    BA / Researcher
  4. Validation (handoff)

    Stratified k-fold cross-validation, statistical significance tests across architectures.

    BA / Researcher
  5. Output (after)

    Benign vs malignant read with accuracy, sensitivity, specificity, and overfitting reporting.

    Patient / clinician
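
As a rough illustration of the output step, accuracy, sensitivity, and specificity can all be derived from the four confusion-matrix counts. This is a minimal sketch and not the thesis code: the names `ConfusionCounts`, `count_outcomes`, and `summarise` are hypothetical, the labels are made up, and malignant is assumed to be the positive class.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary task where 'malignant' is the positive class."""
    tp: int  # malignant predicted malignant
    fp: int  # benign predicted malignant
    tn: int  # benign predicted benign
    fn: int  # malignant predicted benign


def count_outcomes(y_true, y_pred, positive=1):
    """Tally the confusion-matrix cells from paired label lists."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def summarise(c):
    """Return accuracy, sensitivity (recall on malignant), and specificity (recall on benign)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total if total else 0.0,
        "sensitivity": c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0,
        "specificity": c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0,
    }


# Made-up labels purely for illustration: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
report = summarise(count_outcomes(y_true, y_pred))
```

Reporting sensitivity and specificity alongside accuracy matters here because a missed malignant case (a false negative) and a false alarm carry very different costs, and accuracy alone hides that split.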

How I built it

  • Pulled a labelled histopathology image set from Kaggle, split 80/20 into train and test, then layered on resize, normalisation, rotation, scaling, flipping, and histogram-equalisation preprocessing to broaden the training distribution.
  • Built a custom CNN from scratch (convolutional + ReLU + pooling + fully-connected layers, tuned filter and kernel sizes for cell-level texture).
  • Fine-tuned pretrained VGG16 and ResNet50 with transfer learning, customising the heads for binary benign-vs-malignant classification.
  • Trained two hybrid stacks: CNN to VGG (CNN feature extraction feeding VGG depth) and VGG to CNN (the reverse order), to test whether mixing the architectures beat the best single model.
  • Used stratified k-fold cross-validation so each fold served once as the validation set and otherwise as training data, and ran statistical-significance tests across architectures so the comparison wasn't just "this number was bigger".
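
The stratified k-fold idea from the last bullet can be sketched in plain Python. This is an illustration under stated assumptions rather than the thesis implementation: the helper names `stratified_folds` and `train_val_splits`, the choice of `k=5`, and the 60/40 label mix are all hypothetical. In practice a library routine such as scikit-learn's `StratifiedKFold` would usually do this job.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that keep roughly the full set's class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share of that class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_val_splits(labels, k=5, seed=0):
    """Yield (train_indices, val_indices); each fold is the validation set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        val = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        yield train, val


# Hypothetical label list: 60 benign (0) and 40 malignant (1) images.
labels = [0] * 60 + [1] * 40
splits = list(train_val_splits(labels, k=5))
```

With these made-up labels every validation fold holds 12 benign and 8 malignant samples, so each fold sees the same 60/40 mix as the whole set. When class sizes do not divide evenly by `k`, this simple round-robin gives the leftover samples to the lowest-numbered folds.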

Measured results

What I measured

83% accuracy

CNN (best single model)

Validation accuracy, validation loss 0.35.

80% accuracy

CNN to VGG hybrid

Validation accuracy, validation loss 0.30. Best hybrid.

77% accuracy

VGG to CNN hybrid

Validation accuracy, validation loss 0.32.

70% accuracy

VGG16 (transfer learning)

Validation accuracy, validation loss 0.45. Volatile loss curves.

69% accuracy

ResNet50 (transfer learning)

Validation accuracy, validation loss 0.50. Irregular training behaviour.

Findings

  • CNN came out on top at 83% validation accuracy with 0.35 validation loss, the best single model in the study.
  • CNN-to-VGG hybrid hit 80% accuracy at 0.30 loss, the second strongest and the best of the hybrids.
  • VGG-to-CNN hybrid landed at 77% accuracy with 0.32 loss.
  • Standalone VGG16 and ResNet50 came in at 70% and 69% respectively, with more volatile loss curves and clear overfitting signals.
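
Whether a gap of a few accuracy points between two architectures is meaningful depends on how much the scores vary from fold to fold. The source does not say which significance test was used, so the following is only one possible approach: an exact paired sign-flip permutation test on per-fold scores. The function name `paired_permutation_p_value` and the per-fold accuracies are made up for illustration.

```python
from itertools import product


def paired_permutation_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on per-fold score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis the two models are interchangeable within a fold,
    # so each difference is equally likely to carry either sign.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point noise in the comparison.
        if permuted >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up per-fold validation accuracies for two hypothetical models.
model_a = [0.84, 0.82, 0.85, 0.81, 0.83]
model_b = [0.80, 0.79, 0.82, 0.80, 0.79]
p_value = paired_permutation_p_value(model_a, model_b)
```

With five folds there are only 32 sign patterns, so the smallest two-sided p-value this test can return is 2/32 (about 0.06). That makes it a conservative check, and because the training sets of different folds overlap, the per-fold scores are not fully independent either.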

Tools I used

  • Python
  • CNN
  • VGG16
  • ResNet50
  • Transfer learning
  • Stratified k-fold CV