MSA (evolutionary information)
        ↓
┌──────────────────────────────┐
│ ROW ATTENTION                │  "Which residues interact?"
│ • Find local patterns        │
│ • Secondary structure        │
│ • Functional motifs          │
└──────────────────────────────┘
        ↓
┌──────────────────────────────┐
│ COLUMN ATTENTION             │  "How does each position evolve?"
│ • Conservation               │
│ • Functional importance      │
│ • Allowed mutations          │
└──────────────────────────────┘
        ↓
┌──────────────────────────────┐
│ OUTER PRODUCT MEAN           │  "Which pairs co-evolve?"
│ • Contacts                   │
│ • Correlated mutations       │
│ • MSA → Pair transfer        │
└──────────────────────────────┘
        ↓
Pair (pairwise features)
        ↓
┌──────────────────────────────┐
│ TRIANGLE ATTENTION           │  "Is this geometrically valid?"
│ • 3D consistency             │
│ • Triangle inequality        │
│ • Error correction           │
└──────────────────────────────┘
        ↓
Refined Pair → Structure Prediction

Example 1: Enzyme Active Site
Catalytic triad (serine protease):
His 57, Asp 102, Ser 195
ROW ATTENTION:
"These three residues are spatially close"
COLUMN ATTENTION:
His 57: 100% conserved (critical!)
Asp 102: 100% conserved (critical!)
Ser 195: 100% conserved (critical!)
OUTER PRODUCT:
His-Asp: Strong correlation (must maintain charge)
His-Ser: Strong correlation (hydrogen bond)
Asp-Ser: Strong correlation (catalytic mechanism)
TRIANGLE ATTENTION:
His-Asp: 10 Å
His-Ser: 8 Å
Asp-Ser: 12 Å
→ Geometrically consistent! ✓

Example 2: Disulfide Bond
Cys 20 - Cys 80 (disulfide bond)
ROW ATTENTION:
"Cys 20 and Cys 80 are in the same sequence"
COLUMN ATTENTION:
Cys 20: Highly conserved (structural)
Cys 80: Highly conserved (structural)
OUTER PRODUCT:
Cys 20 - Cys 80: VERY strong correlation
"If one mutates, the other must too!"
TRIANGLE ATTENTION:
Cys 20 - Cys 80: ~2 Å (S-S disulfide bond length)
Encourages: "These two residues must sit at bonding distance!"

Example 3: Alpha-helix
Helix: residues 10-20
ROW ATTENTION:
"Residues 10-20 form a local pattern"
"i, i+3, i+4 spacing"
COLUMN ATTENTION:
Position 10: Hydrophobic (buried)
Position 14: Hydrophobic (buried)
Position 18: Hydrophobic (buried)
OUTER PRODUCT:
10-14: Close (helix turn)
14-18: Close (helix turn)
TRIANGLE ATTENTION:
10-14: 5.4 Å (helix pitch)
14-18: 5.4 Å (helix pitch)
10-18: 10.8 Å (consistent!)

Why all four?
- Evolution (column) + Structure (row) + Contacts (outer product) + Geometry (triangle)
- Each provides unique information
- Combining them = accurate structure prediction!
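The "geometrically consistent" verdicts in the three examples boil down to the triangle inequality: no pairwise distance may exceed the sum of the other two. The sketch below is a hand-written check of that rule on the distances quoted above. It is only an illustration: triangle attention is a learned operation and does not run an explicit test like this, and the function name and the third, deliberately inconsistent example are hypothetical.

```python
from itertools import permutations


def satisfies_triangle_inequality(d_ab, d_bc, d_ac, tol=1e-6):
    """Return True if three pairwise distances could belong to one triangle."""
    sides = (d_ab, d_bc, d_ac)
    # Every side must be no longer than the sum of the other two.
    return all(x <= y + z + tol for x, y, z in permutations(sides))


examples = {
    "catalytic triad (His-Asp, His-Ser, Asp-Ser)": (10.0, 8.0, 12.0),
    "alpha-helix (10-14, 14-18, 10-18)": (5.4, 5.4, 10.8),
    "made-up inconsistent prediction": (3.0, 4.0, 20.0),
}

for name, distances in examples.items():
    # Prints True for the two examples from the notes, False for the made-up one.
    print(name, satisfies_triangle_inequality(*distances))
```

The helix case sits exactly on the boundary (5.4 + 5.4 = 10.8), which corresponds to three points on a straight line, so a small tolerance is used.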
How many types of masks are there?
There are five:
import numpy as np
- msa_mask
msa_mask = np.ones((10, 110), dtype=np.float32)
# Example:
# [[1, 1, 1, ..., 1],   ← All valid
#  [1, 1, 1, ..., 1],
#  ...]
- seq_mask
seq_mask = np.ones(110, dtype=np.float32)
# Example: [1, 1, 1, 1, 1, ..., 1]
- pair_mask
pair_mask = np.ones((110, 110), dtype=np.float32)
# Example:
# [[1, 1, 1, ..., 1],
#  [1, 1, 1, ..., 1],
#  ...]
- bert_mask
# size= draws one uniform sample per MSA position
bert_mask = (np.random.uniform(size=(10, 110)) < 0.15).astype(np.float32)
# Example:
# [[0, 0, 1, 0, 0, ..., 0],   ← Random 15%
#  [0, 1, 0, 0, 0, ..., 1],
#  ...]
- cluster_bias_mask
cluster_bias_mask = np.ones(10, dtype=np.float32)
# Example with duplicates downweighted: [1.0, 1.0, 0.5, 1.0, 0.5, ...]
#                                                  ↑         ↑
#                                             Downweighted duplicates

AlphaFold2 Complete Pipeline: From MSA to Structure
Pipeline Overview with Mask Creation and Usage
┌────────────────────────────────────────────────────────────────────┐
│                        PREPROCESSING PHASE                         │
│                     (Happens ONCE per protein)                     │
└────────────────────────────────────────────────────────────────────┘
Step 1: Load Raw MSA (A3M format)
─────────────────────────────────
Input: alignend_10_Seq.a3m (10 sequences, 1739 positions each)
>sp|P01308|INS_HUMAN
---MALWMR----LLPLL-----ALLALWGPDPAAAFVNQHLCG...
>sp|P01315|INS_PIG
---MALWTR----LLPLL-----ALLALWAPAPAQAFVNQHLCG...
...
        ↓
Step 2: Remove Gaps & Create Deletion Matrix
────────────────────────────────────────────
Query (no gaps): MALWMRLLPLLALLALWGPDPAAAFVNQHLCG...
Length: 110 residues (after removing gaps)
Deletion Matrix [10, 110]:
┌─────────────────────────────────────────┐
│ Seq 0: [3, 0, 0, 0, 0, 0, 4, 0, 0, ...] │  ← 3 gaps before M, 4 before L
│ Seq 1: [3, 0, 0, 0, 0, 0, 4, 0, 0, ...] │
│ Seq 2: [3, 0, 0, 0, 0, 0, 4, 0, 0, ...] │
│ ...                                     │
└─────────────────────────────────────────┘
        ↓
Step 3: Integer Encode MSA
──────────────────────────
MSA [10, 110]:
┌────────────────────────────────────┐
│ [12, 0, 10, 17, 12, 1, 10, 10, ...]│  ← M=12, A=0, L=10, W=17, etc.
│ [12, 0, 10, 17, 16, 1, 10, 10, ...]│
│ ...                                │
└────────────────────────────────────┘
        ↓
Step 4: CREATE MASKS (Static - Never Updated!)
──────────────────────────────────────────────
┌──────────────────────────────────────────────────────────┐
│ MSA Mask [10, 110]                                       │
│   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1]           │  ← All 1s (all valid)
│   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1]           │
│   ...                                                    │
│ Purpose: Mask padding (none in this case)                │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ Seq Mask [110]                                           │
│   [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1, 1, 1]           │  ← All 1s
│ Purpose: Mask invalid residues                           │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ Pair Mask [110, 110]                                     │
│   [[1, 1, 1, ..., 1],                                    │
│    [1, 1, 1, ..., 1],                                    │  ← All 1s
│    ...                                                   │
│    [1, 1, 1, ..., 1]]                                    │
│ Purpose: Mask invalid residue pairs                      │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ BERT Mask [10, 110] (Training Only!)                     │
│   [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, ..., 0, 1, 0]           │  ← 15% are 1s
│   [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, ..., 0, 0, 0]           │    (randomly masked)
│   ...                                                    │
│ Purpose: Masked language modeling loss                   │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ Cluster Bias Mask [10]                                   │
│   [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ...]          │  ← Equal weights
│ Purpose: Weight sequences (downweight duplicates)        │
└──────────────────────────────────────────────────────────┘
        ↓
Step 5: Initialize Pair Representation
──────────────────────────────────────
Pair [110, 110, 128]:
┌─────────────────────────────────────────┐
│ Relative position encoding (65 features)│
│ Separation encoding (7 features)        │
│ Residue type pairs (40 features)        │
└─────────────────────────────────────────┘
        ↓
Step 6: Save All Features
─────────────────────────
Output: alphafold2_ready.npz
✓ aatype [110]
✓ msa [10, 110]
✓ deletion_matrix [10, 110]
✓ deletion_mean [110]
✓ pair [110, 110, 128]
✓ msa_mask [10, 110]        ← CREATED HERE
✓ seq_mask [110]            ← CREATED HERE
✓ pair_mask [110, 110]      ← CREATED HERE
✓ bert_mask [10, 110]       ← CREATED HERE (training)
✓ cluster_bias_mask [10]    ← CREATED HERE
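
Steps 2 and 3 of the preprocessing phase (gap removal with a deletion count, then integer encoding) can be sketched in a few lines of plain Python. This follows the convention used in these notes, where the deletion matrix counts the gap characters immediately before each residue; other A3M parsers may define deletions differently (for example by counting lowercase insertions). The helper names, the `unknown_id` value, and the residue ordering string are assumptions, chosen to reproduce the M=12, A=0, L=10, W=17 encoding shown in Step 3.

```python
# Residue order assumed so that A=0, R=1, L=10, M=12, W=17 (matches Step 3).
RESTYPES = "ARNDCQEGHILKMFPSTWYV"
AA_TO_ID = {aa: i for i, aa in enumerate(RESTYPES)}


def strip_gaps_with_deletions(aligned):
    """Remove '-' characters and count the gaps right before each residue."""
    residues, deletions, pending = [], [], 0
    for ch in aligned:
        if ch == "-":
            pending += 1          # gap: remember it for the next residue
        else:
            residues.append(ch)
            deletions.append(pending)
            pending = 0
    # Trailing gaps (after the last residue) are dropped in this sketch.
    return "".join(residues), deletions


def encode(seq, unknown_id=20):
    """Map one-letter amino acid codes to integer ids."""
    return [AA_TO_ID.get(aa, unknown_id) for aa in seq]


row = "---MALWMR----LLPLL"
seq, deletions = strip_gaps_with_deletions(row)
print(seq)              # MALWMRLLPLL
print(deletions)        # [3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
print(encode(seq)[:8])  # [12, 0, 10, 17, 12, 1, 10, 10]
```

Running the same function over every row of the alignment and stacking the results gives the `[10, 110]` deletion matrix and MSA arrays saved in Step 6.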
┌────────────────────────────────────────────────────────────────────┐
│                      TRAINING/INFERENCE PHASE                      │
│                   (Masks are USED, not modified)                   │
└────────────────────────────────────────────────────────────────────┘
Step 7: Load Features
─────────────────────
import numpy as np
features = np.load('alphafold2_ready.npz')
msa = features['msa']              # [10, 110]
pair = features['pair']            # [110, 110, 128]
msa_mask = features['msa_mask']    # [10, 110]   ← LOADED
pair_mask = features['pair_mask']  # [110, 110]  ← LOADED
seq_mask = features['seq_mask']    # [110]       ← LOADED (used in Step 9)
        ↓
Step 8: Evoformer (48 blocks)
─────────────────────────────
┌─────────────────────────────────────────────────────────┐
│ Block 1 of 48                                           │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌────────────────────────────────────────┐             │
│  │ MSA Row Attention                      │             │
│  │ Input: msa [10, 110, 256]              │             │
│  │ Mask: msa_mask [10, 110] ← USED!       │             │
│  │ ──────────────────────────             │             │
│  │ logits = QK^T / √d                     │             │
│  │ logits += 1e9 * (msa_mask - 1)         │             │  ← Masks padding!
│  │ attention = softmax(logits)            │             │
│  │ output = attention @ V                 │             │
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ MSA Column Attention                   │             │
│  │ Input: msa [10, 110, 256]              │             │
│  │ Mask: msa_mask [10, 110] ← USED!       │             │
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ Outer Product Mean                     │             │
│  │ msa → pair update                      │             │
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ Triangle Multiplication (Outgoing)     │             │
│  │ Mask: pair_mask [110, 110] ← USED!     │             │
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ Triangle Multiplication (Incoming)     │             │
│  │ Mask: pair_mask [110, 110] ← USED!     │             │
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ Triangle Attention (Starting)          │             │
│  │ Input: pair [110, 110, 128]            │             │
│  │ Mask: pair_mask [110, 110] ← USED!     │             │
│  │ ──────────────────────────             │             │
│  │ logits += 1e9 * (pair_mask - 1)        │             │  ← Masks invalid pairs!
│  └────────────────────────────────────────┘             │
│                      ↓                                  │
│  ┌────────────────────────────────────────┐             │
│  │ Triangle Attention (Ending)            │             │
│  │ Mask: pair_mask [110, 110] ← USED!     │             │
│  └────────────────────────────────────────┘             │
│                                                         │
└─────────────────────────────────────────────────────────┘
        ↓
(Repeat 47 more times)
        ↓
Step 9: Structure Module (8 IPA blocks)
───────────────────────────────────────
Input: refined_pair [110, 110, 128]
Mask: seq_mask [110] ← USED!
┌─────────────────────────────────────────┐
│ IPA Block 1 of 8                        │
│ Invariant Point Attention               │
│ Mask: seq_mask [110] ← USED!            │
└─────────────────────────────────────────┘
        ↓
(Repeat 7 more times)
        ↓
Step 10: Output 3D Structure
────────────────────────────
Output: atom_positions [110, 37, 3]
(110 residues, 37 atoms each, xyz coordinates)
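
The attention steps in Steps 8 and 9 all apply their mask the same way. One common approach, sketched below in NumPy, is to add a large negative bias to the logits of masked keys before the softmax, so that padded positions get (numerically) zero weight while each row of attention weights still sums to 1. This is a single-head toy version under simplifying assumptions; the function name, the toy shapes, and the random inputs are hypothetical, and the real model adds further terms such as pair bias and gating.

```python
import numpy as np


def masked_attention(q, k, v, key_mask):
    """Scaled dot-product attention that ignores keys where key_mask == 0."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                          # [n_query, n_key]
    # Masked keys get a bias of -1e9; valid keys get a bias of 0.
    logits = logits + 1e9 * (key_mask[None, :] - 1.0)
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
n_res, d = 6, 4
q, k, v = (rng.normal(size=(n_res, d)) for _ in range(3))
seq_mask = np.array([1, 1, 1, 1, 0, 0], dtype=np.float32)  # last two = padding

out, weights = masked_attention(q, k, v, seq_mask)
print(weights[:, 4:].max())   # padded keys receive no attention
print(weights.sum(axis=-1))   # every row is still a valid distribution
```

Multiplying the weights by the mask after the softmax would also zero out padded keys, but the remaining weights would no longer sum to 1, which is why the bias is applied to the logits instead.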
┌────────────────────────────────────────────────────────────────────┐
│                     TRAINING LOSS COMPUTATION                      │
│                       (BERT mask used here!)                       │
└────────────────────────────────────────────────────────────────────┘
Step 11: Masked MSA Loss (Training Only)
────────────────────────────────────────
┌────────────────────────────────────────────────────┐
│ Predict masked positions                           │
│ predicted_msa = model_output['msa']                │
│ true_msa = batch['true_msa']                       │
│ bert_mask = batch['bert_mask'] ← USED!             │
│                                                    │
│ errors = cross_entropy(predicted_msa, true_msa)    │
│ loss = sum(errors * bert_mask) / sum(bert_mask)    │  ← Only masked positions!
│ ─────────────────────────                          │
│ Only compute loss where bert_mask = 1              │
└────────────────────────────────────────────────────┘

Example with Your Data
Input MSA (alignend_10_Seq.a3m)
10 sequences × 1739 positions (with gaps)
        ↓ Remove gaps
10 sequences × 110 residues (no gaps)

Deletion Matrix Example
Position:    0  1  2  3  4  5  6  7  8  9 10 11
Sequence 0: [3, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, ...]
             ↑                 ↑
     3 gaps before 'M'   4 gaps before 'L'
From: ---MALWMR----LLPLL...
      ^^^      ^^^^
      3 gaps   4 gaps

MSA Mask Example
All positions valid (no padding):
[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1],   ← Sequence 0
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1],   ← Sequence 1
 ...
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 1]]   ← Sequence 9
If we had padding:
[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 0],   ← Last position padded
 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ..., 0],
 ...]

BERT Mask Example (Training)
Random 15% of positions masked:
[[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, ..., 0],   ← Positions 2 and 7 masked
 [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, ..., 1],   ← Positions 1, 5, and 109 masked
 ...]
Model predicts: What amino acid is at masked positions?
Loss computed: Only at positions where bert_mask = 1

Key Takeaways
- Masks are CREATED once during preprocessing (bert_mask is only needed for training)
- Masks are USED throughout training/inference
- Masks are NEVER UPDATED by the model: they are static indicators (binary, except for the weights in cluster_bias_mask)
- Purpose: Tell the model which data is real vs padding/masked
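
In the running example every mask is all ones because nothing is padded. The sketch below shows what the same masks could look like when the 10 × 110 input is padded to a larger fixed shape. The padded sizes (16 sequences, 128 residues) and the function name are hypothetical, and deriving `pair_mask` as the outer product of `seq_mask` with itself is one straightforward construction rather than the only possible one.

```python
import numpy as np


def make_padding_masks(num_seqs, num_res, padded_seqs, padded_res):
    """Build msa/seq/pair masks: 1 for real data, 0 for padding."""
    seq_mask = np.zeros(padded_res, dtype=np.float32)
    seq_mask[:num_res] = 1.0                       # real residues
    row_mask = np.zeros(padded_seqs, dtype=np.float32)
    row_mask[:num_seqs] = 1.0                      # real MSA rows
    msa_mask = row_mask[:, None] * seq_mask[None, :]
    # A residue pair is valid only if both residues are valid.
    pair_mask = seq_mask[:, None] * seq_mask[None, :]
    return msa_mask, seq_mask, pair_mask


msa_mask, seq_mask, pair_mask = make_padding_masks(
    num_seqs=10, num_res=110, padded_seqs=16, padded_res=128
)
print(msa_mask.shape, int(msa_mask.sum()))    # (16, 128) 1100
print(pair_mask.shape, int(pair_mask.sum()))  # (128, 128) 12100
```

The sums confirm that only the real 10 × 110 MSA entries and 110 × 110 residue pairs are marked valid.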
Mask Usage Summary
| Mask | Created | Used In | Purpose |
|---|---|---|---|
| msa_mask | Preprocessing | MSA row and column attention (Evoformer) | Mask padding |
| pair_mask | Preprocessing | Triangle attention and triangle multiplication (Evoformer) | Mask invalid pairs |
| seq_mask | Preprocessing | IPA (Structure Module) | Mask padding |
| bert_mask | Training data aug | Loss computation | Masked LM loss |
| cluster_bias_mask | Preprocessing | MSA weighting | Downweight duplicates |
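
The `bert_mask` row of the table corresponds to the loss formula from Step 11, `sum(errors * bert_mask) / sum(bert_mask)`. A minimal NumPy sketch of that computation follows. The number of classes (23), the small `eps` guard against an all-zero mask, the function name, and the random stand-in data are assumptions made for illustration; they are not taken from the notes above.

```python
import numpy as np


def masked_msa_loss(logits, true_msa, bert_mask, eps=1e-8):
    """Mean cross-entropy over the positions where bert_mask == 1."""
    # Log-softmax over the class axis, computed stably.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Per-position cross-entropy: negative log-probability of the true class.
    errors = -np.take_along_axis(log_probs, true_msa[..., None], axis=-1)[..., 0]
    return (errors * bert_mask).sum() / (bert_mask.sum() + eps)


rng = np.random.default_rng(0)
num_seqs, num_res, num_classes = 10, 110, 23   # num_classes is an assumption
logits = rng.normal(size=(num_seqs, num_res, num_classes))
true_msa = rng.integers(0, num_classes, size=(num_seqs, num_res))
bert_mask = (rng.uniform(size=(num_seqs, num_res)) < 0.15).astype(np.float32)

loss = masked_msa_loss(logits, true_msa, bert_mask)
print(float(loss))
```

Because the per-position errors are multiplied by `bert_mask`, predictions at unmasked positions have no effect on the loss value.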