Implementation of CapsNet for MHC Binding Prediction

Abstract

This report presents an implementation of Capsule Networks (CapsNet) for predicting MHC-peptide binding. It covers dataset analysis, an explanation of the method, the experimental setup, and an evaluation of the results. The advantages of CapsNet are explored through several experiments. The work is a reimplementation of the CapsNet-MHC paper.

1. Introduction

Capsule Networks (CapsNet) have been proposed as an alternative to traditional Convolutional Neural Networks (CNNs) to better capture spatial hierarchies. This project aims to implement CapsNet for MHC-peptide binding prediction, evaluating its performance against conventional deep learning models.

2. Dataset Analysis

2.1 Dataset Overview

The dataset utilized is the NetMHC dataset. It comprises over 3.6 million peptide-allele pairs labeled for MHC class I binding. The distribution of alleles is heavily imbalanced.

Figure 1: Allele frequency distribution in the training set

Figure 2: Hierarchical distribution of alleles

2.2 Challenges

Extreme class imbalance (94.6% non-binding, 5.4% binding)
Imbalanced allele frequencies and the presence of unseen alleles in the test set
Need for robust embeddings (e.g., using BLOSUM)

3. Method Explanation

3.1 Capsule Networks

Capsule Networks represent local features as vectors instead of scalars. The norm of each vector indicates the probability that the feature is present, while its direction captures spatial relationships. Each input capsule \( u_i \) is transformed by a matrix \( W_{ij} \), producing the prediction vector \( \hat{u}_{j|i} = W_{ij} u_i \). The total input to output capsule \( j \) is the weighted sum

\( s_j = \sum_i c_{ij} \hat{u}_{j|i} \),

where \( c_{ij} \) are coupling coefficients. The final capsule output is obtained by squashing \( s_j \) so that its norm lies below 1:

\( v_j = \frac{||s_j||^2}{1 + ||s_j||^2} \cdot \frac{s_j}{||s_j||} \)
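
The formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the code used in this project: the report does not say how the coefficients \( c_{ij} \) are obtained, so the routing-by-agreement loop, the three iterations, the function names `squash` and `route`, and all tensor shapes are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash nonlinearity: v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps avoids division by zero for all-zero vectors
    return scale * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Routing-by-agreement (assumed) over prediction vectors.

    u_hat has shape (n_in, n_out, dim) with u_hat[i, j] = W_ij @ u_i.
    Returns the output capsules v with shape (n_out, dim).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iters):
        # c_ij: softmax of the logits over the output capsules j
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # s_j = sum_i c_ij * u_hat_{j|i}
        s = np.einsum("ij,ijd->jd", c, u_hat)
        v = squash(s)
        # raise the logit where a prediction agrees with the output
        b = b + np.einsum("ijd,jd->ij", u_hat, v)
    return v

rng = np.random.default_rng(0)
u = rng.normal(size=(6, 8))             # 6 input capsules of dimension 8
W = rng.normal(size=(6, 2, 16, 8))      # W_ij maps dimension 8 to 16
u_hat = np.einsum("ijkd,id->ijk", W, u)  # u_hat_{j|i} = W_ij u_i
v = route(u_hat)
print(v.shape, np.linalg.norm(v, axis=-1))
```

Because of the squash step, every output norm stays strictly below 1, which is what allows it to be read as a probability.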

Figure 3: Network architecture from reference [Kalemati 2023]

3.2 Implementation Details

Sigmoid output for binary classification
Binary Cross Entropy (BCE) loss used
Alleles represented using amino acid sequences
Input features encoded using BLOSUM matrices
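
A BLOSUM encoding of this kind might look like the sketch below, in which each residue is replaced by its row of the substitution matrix. The function name `encode_sequence`, the zero-padding to `max_len`, and the three-letter toy alphabet are assumptions made for illustration. The scores are placeholders, and a real pipeline would load the full 20-residue BLOSUM45, BLOSUM62, or BLOSUM80 table instead.

```python
def encode_sequence(seq, matrix, alphabet, max_len):
    """Encode a sequence as a max_len x len(alphabet) grid of scores.

    Each residue becomes its row of the substitution matrix; positions
    past the end of the sequence are zero-padded.
    """
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    width = len(alphabet)
    rows = [[matrix[(aa, other)] for other in alphabet] for aa in seq]
    rows += [[0] * width for _ in range(max_len - len(seq))]
    return rows

# Toy alphabet and illustrative scores only (hypothetical stand-in
# for a real BLOSUM table).
TOY_ALPHABET = "ACD"
TOY_MATRIX = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "D"): -2,
    ("C", "A"): 0, ("C", "C"): 9, ("C", "D"): -3,
    ("D", "A"): -2, ("D", "C"): -3, ("D", "D"): 6,
}

encoded = encode_sequence("ACD", TOY_MATRIX, TOY_ALPHABET, max_len=4)
for row in encoded:
    print(row)
```

The same function can encode both the peptide and the allele sequence, each with its own `max_len`.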

4. Experiment Description

BLOSUM45, 62, and 80 embeddings compared
Trained for 10 epochs with the Adam optimizer at a learning rate of \(10^{-6}\). Although this learning rate may appear small, the batch size used is rather large.
Used weighted BCE to compensate for the imbalance
Used vast.ai for training due to computational cost
Batch size: 512 (train), 1024 (validation)
Retrained on the whole training set with BLOSUM62 for 30 epochs
Evaluated with AUC PR rather than AUC ROC: AUC ROC measures how well the model separates the two classes overall, whereas correctly identifying the positive class matters more here
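
The weighted BCE listed above can be sketched in plain Python as follows. The report does not state the weight that was used, so the negative-to-positive ratio taken from the class shares in Section 2.2 (94.6 / 5.4) is an assumption, as are the function name `weighted_bce` and the clipping constant `eps`.

```python
import math

def weighted_bce(probs, labels, pos_weight, eps=1e-7):
    """Mean binary cross entropy with the positive term scaled by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

# Assumed weight: ratio of non-binding to binding pairs (about 17.5).
pos_weight = 94.6 / 5.4

probs = [0.9, 0.2, 0.1, 0.4]   # sigmoid outputs of a hypothetical model
labels = [1, 0, 0, 1]
print(weighted_bce(probs, labels, pos_weight))
```

With this weight, a missed binder costs roughly 17.5 times as much as a misclassified non-binder, which offsets the rarity of the positive class.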

5. Result Analysis

Figure 4: Training and validation loss, AUC Precision-Recall, and Matthews correlation coefficient curves for the BLOSUM62 embedding

Figure 5: The final precision-recall curve

Table 1: AUC-ROC values on the validation set
BLOSUM Matrix	AUC-ROC
BLOSUM45	0.8407
BLOSUM62	0.8375
BLOSUM80	0.8407

Table 2: Test set performance metrics for the final model trained for 30 epochs.
Metric	Value
AUC PR	0.370
MCC	0.333
Accuracy	0.798
F1-Score	0.306

Table 3: Validation set performance metrics for the final model trained for 30 epochs.
Metric	Value
AUC PR	0.374
MCC	0.339
Accuracy	0.805
F1-Score	0.314
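
For reference, the metrics reported in Tables 2 and 3 can be computed as in the sketch below. It is an illustration on made-up labels and scores, not the evaluation code of this project. The 0.5 decision threshold and the step-wise average-precision estimate of AUC PR are assumptions, and tied scores are not treated specially.

```python
import math

def confusion(labels, preds):
    """Return (tp, fp, tn, fn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def average_precision(labels, scores):
    """Mean of the precision at the rank of each positive example."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Made-up example data.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

tp, fp, tn, fn = confusion(labels, preds)
accuracy = (tp + tn) / len(labels)
print(average_precision(labels, scores), mcc(tp, fp, tn, fn), accuracy, f1(tp, fp, fn))
```

Accuracy is the least informative of these under heavy imbalance: if the evaluation data were 94.6% non-binding, a model that always predicted non-binding would reach an accuracy of 0.946 with an MCC and F1-score of zero.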

5.1 Discussion

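CapsNet-MHC coped reasonably well with the imbalanced data: its test AUC PR of 0.370 is well above the roughly 5.4% share of binding pairs in the dataset, which is approximately what a random classifier would score if the test set has a similar share. An alternative NLP-based approach using fastText-like embeddings was also tested but proved computationally prohibitive because of the very large number of training pairs.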

6. Conclusion

Capsule Networks effectively capture peptide-allele interactions. While the results are promising, further tuning and training on all folds are necessary to realize the model's full potential.

7. Further Work

Future efforts should focus on hyperparameter tuning, reconsidering the attention approach, and a deeper analysis of capsule outputs for explainability.