Graph Classification: DRESS + GNNs
Motivation
The fingerprint-based approach shows that DRESS captures useful structural information for graph classification using only off-the-shelf classifiers. A natural question follows: what happens when we feed DRESS values directly into Graph Neural Networks as edge and node features?
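DRESS acts as an architecture-agnostic preprocessing step: one scalar per edge and one per node, precomputed once and cached. The GNN receives richer structural input, and DRESS itself adds no learnable parameters. In the simplest setup, the only change to the model is one extra input channel for edge features and one for node features.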
Setup
All experiments use the ZINC-12K benchmark (Dwivedi et al., 2023):
- Task: Graph regression (predict constrained solubility)
- Metric: MAE (lower is better)
- Split: 10,000 / 1,000 / 1,000 (train / val / test)
- Parameter budget: ~100K parameters per model
- Seeds: 4 (42, 123, 456, 789)
- Training: 400 epochs max, early stopping (patience 50), ReduceLROnPlateau
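The stopping rule in the training setup can be sketched in a few lines. This is a minimal illustration, not the training script used for these experiments: `early_stop_epoch` is a hypothetical helper, and it assumes that patience counts epochs without a strict improvement in validation MAE. The learning-rate schedule is not modelled.

```python
def early_stop_epoch(val_maes, patience=50, max_epochs=400):
    """Return (stop_epoch, best_epoch, best_mae) for per-epoch validation MAEs.

    Hypothetical helper: training stops once `patience` epochs have passed
    since the last strict improvement, or when `max_epochs` is reached.
    """
    best_mae = float("inf")
    best_epoch = -1
    epoch = -1
    for epoch, mae in enumerate(val_maes[:max_epochs]):
        if mae < best_mae:
            # New best validation MAE: remember it and reset the patience window.
            best_mae, best_epoch = mae, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            break
    return epoch, best_epoch, best_mae


# Toy curve (made-up numbers): improves once, then plateaus.
toy_curve = [1.0, 0.5] + [0.6] * 100
stop_epoch, best_epoch, best_mae = early_stop_epoch(toy_curve)
```

In practice the model checkpoint from `best_epoch` would be the one evaluated on the test split.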
DRESS preprocessing
For each graph in the dataset:
- Call `dress_fit()` to obtain per-edge DRESS values and per-node DRESS values.
- Optionally apply \(\Delta^k\)-DRESS (`delta_dress_fit` with \(k=1\)) for higher-order structural refinement.
- Cache the results. Preprocessing the full ZINC-12K dataset takes < 1 second.
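The compute-once-then-cache pattern from the steps above might look like the sketch below. It deliberately does not call the DRESS library: `compute_fn` is a placeholder for whatever wraps the real call (its signature is not assumed here), and `cached_preprocess`, `toy_compute`, and the cache file name are hypothetical names chosen for illustration.

```python
import os
import pickle
import tempfile


def cached_preprocess(graphs, compute_fn, cache_path):
    """Compute per-graph structural features once and reuse them afterwards.

    `compute_fn` is a placeholder for the real DRESS call; it maps one graph
    to any picklable result (for example, edge values and node values).
    """
    if os.path.exists(cache_path):
        # Cache hit: skip computation entirely.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    results = [compute_fn(g) for g in graphs]
    with open(cache_path, "wb") as f:
        pickle.dump(results, f)
    return results


# Toy stand-in so the sketch runs. This is NOT DRESS: it only counts edges.
def toy_compute(edge_list):
    return len(edge_list)


graphs = [[(0, 1), (1, 2)], [(0, 1)]]
cache_file = os.path.join(tempfile.mkdtemp(), "dress_cache.pkl")
first = cached_preprocess(graphs, toy_compute, cache_file)   # computes and writes
second = cached_preprocess(graphs, toy_compute, cache_file)  # loads from disk
```

Because the values depend only on graph structure, the same cache can be shared across seeds and model variants.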
The GNN receives:
- Edge features: bond type (4 categories) + DRESS edge value (1 scalar)
- Node features: atom type (28 categories) + DRESS node value (1 scalar)
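As a rough illustration of the input layout, the sketch below builds one edge and one node feature vector. It assumes one-hot encoding of the categorical part; the actual models may use learned embeddings instead (the GIN variant concatenates the DRESS node value to atom embeddings). The helper names and the example DRESS values are made up.

```python
def one_hot(index, num_classes):
    """Return a one-hot list of length `num_classes` with a 1.0 at `index`."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def edge_features(bond_type, dress_edge, num_bond_types=4):
    """Bond type (4 categories) followed by the DRESS edge scalar: 5 values."""
    return one_hot(bond_type, num_bond_types) + [float(dress_edge)]


def node_features(atom_type, dress_node, num_atom_types=28):
    """Atom type (28 categories) followed by the DRESS node scalar: 29 values."""
    return one_hot(atom_type, num_atom_types) + [float(dress_node)]


# Made-up values for one bond and one atom.
e = edge_features(bond_type=2, dress_edge=1.37)
n = node_features(atom_type=0, dress_node=2.0)
```

The DRESS scalar is appended unchanged, which matches the plain concatenation described for the GIN and PNA variants.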
Results
| Model | MAE |
|---|---|
| GIN (Dwivedi+23) | 0.526 ± 0.051 |
| GAT (Dwivedi+23) | 0.384 ± 0.007 |
| GCN (Dwivedi+23) | 0.367 ± 0.011 |
| GatedGCN (Dwivedi+23) | 0.282 ± 0.015 |
| GPS + DRESS attn | 0.247 ± 0.004 |
| DRESSNet | 0.240 ± 0.009 |
| GIN + DRESS | 0.227 ± 0.029 |
| PNA + DRESS | 0.213 ± 0.004 |
| PNA (Dwivedi+23) | 0.188 ± 0.004 |
Headline result
A vanilla GIN with DRESS features achieves 0.227, a 57% error reduction over the published GIN baseline (0.526). This simple addition is enough to surpass GatedGCN (0.282), a model with edge gating, residual connections, and batch normalization.
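For reference, the 57% figure is the relative reduction in MAE computed from the two table entries:

\[
\frac{0.526 - 0.227}{0.526} = \frac{0.299}{0.526} \approx 0.568
\]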
Architectures tested
GIN + DRESS. GINEConv with bond type and DRESS edge value concatenated as edge features. DRESS node values are concatenated to atom embeddings. Hidden dimension 88, ~98K parameters.
PNA + DRESS. PNAConv with degree-scalers and the same DRESS edge/node augmentation. Hidden dimension 35, ~95K parameters. Achieves 0.213, behind the published PNA baseline (0.188); the additional edge input channels force a smaller hidden dimension under the parameter budget.
GPS + DRESS attention. GPS transformer layers where DRESS edge values are used as an attention bias in the self-attention mechanism, plus DRESS positional encodings. Hidden dimension 44, ~89K parameters. The lower parameter count (forced by the transformer overhead at 100K budget) limits performance, but it demonstrates that DRESS works with attention-based architectures.
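The text does not specify how the bias enters the attention computation. A common pattern, assumed here, is to add it to the attention logits before the softmax. The sketch below shows only that step; `biased_attention_weights` and the `scale` knob are hypothetical, and the raw scores are taken as given rather than computed from queries and keys.

```python
import math


def biased_attention_weights(scores, dress_bias, scale=1.0):
    """Row-wise softmax over attention logits plus an additive structural bias.

    scores[i][j]     : raw attention logit from node i to node j (given)
    dress_bias[i][j] : DRESS value for edge (i, j), 0.0 where there is no edge
    scale            : hypothetical weight on the bias term
    """
    weights = []
    for s_row, b_row in zip(scores, dress_bias):
        logits = [s + scale * b for s, b in zip(s_row, b_row)]
        m = max(logits)  # subtract the row maximum for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Two nodes with identical raw scores; only the bias differentiates them.
w = biased_attention_weights(
    scores=[[0.0, 0.0], [0.0, 0.0]],
    dress_bias=[[0.0, math.log(3.0)], [0.0, 0.0]],
)
```

With equal raw scores, the biased pair receives three times the weight of the unbiased one in the first row, while the second row stays uniform.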
DRESSNet. A DRESS-native architecture where message passing is explicitly gated by structural similarity: \(\text{gate}_{ij} = \sigma(W \cdot d_{ij})\), \(\text{msg}_{ij} = \text{gate}_{ij} \odot m_{ij}\). Hidden dimension 72, ~98K parameters. Achieves 0.240 with a simple design, ahead of the published GatedGCN baseline (0.282).
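The gating equations can be read as follows for a single edge. This is a minimal sketch, assuming \(d_{ij}\) is a scalar and \(W\) is a weight vector with one entry per hidden channel (no bias term). It is not the DRESSNet implementation, and the function names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_message(message, dress_value, gate_weights):
    """Apply gate_ij = sigmoid(W * d_ij), then msg_ij = gate_ij * m_ij elementwise.

    message      : m_ij, one value per hidden channel
    dress_value  : d_ij, the DRESS scalar of edge (i, j)
    gate_weights : W, one weight per hidden channel (assumed shape)
    """
    gate = [sigmoid(w * dress_value) for w in gate_weights]
    return [g * m for g, m in zip(gate, message)]


# With d_ij = 0 every gate is sigmoid(0) = 0.5, so the message is halved.
halved = gated_message([2.0, -4.0], dress_value=0.0, gate_weights=[1.0, -3.0])
```

Channels with a positive weight open as the DRESS value grows, and channels with a negative weight close, so the same message can be routed differently depending on the structural similarity of the edge.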
Ablation: component contributions
| Variant | MAE |
|---|---|
| GIN + DRESS (no bond) | 0.314 ± 0.025 |
| GIN + bond + DRESS (no node DRESS) | 0.295 ± 0.006 |
| GIN + bond + DRESS (full, \(k=0\)) | 0.235 ± 0.012 |
| GIN + bond + DRESS (full, \(k=1\)) | 0.227 ± 0.029 |
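Each component lowers the mean MAE: adding bond features (0.314 → 0.235), adding DRESS node values (0.295 → 0.235), and moving to \(k=1\) (0.235 → 0.227), although the last gain is smaller than the reported ± values. Even the no-bond variant (0.314) improves substantially on the published GIN baseline (0.526).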
Effect of \(\Delta^k\)-DRESS order
| Model | \(k=0\) | \(k=1\) |
|---|---|---|
| GIN + DRESS | 0.235 ± 0.012 | 0.227 ± 0.029 |
| PNA + DRESS | 0.213 ± 0.004 | 0.212 ± 0.009 |
| DRESSNet | 0.267 ± 0.025 | 0.240 ± 0.009 |
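Higher-order DRESS (\(k=1\)) lowers the mean MAE for all three models, with the largest gains on DRESSNet (0.267 → 0.240) and GIN (0.235 → 0.227). The GIN and PNA differences are smaller than the reported ± values, so only the DRESSNet gain stands out clearly.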
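One quick way to read the table is to compare each gain against the reported ± values. The sketch below does this with the numbers copied from the table. It assumes the ± values are a spread measure over the four seeds (the text does not say which one), and it is a crude check rather than a significance test.

```python
# (mean, spread) pairs copied from the table above: model -> (k=0, k=1)
results = {
    "GIN + DRESS": ((0.235, 0.012), (0.227, 0.029)),
    "PNA + DRESS": ((0.213, 0.004), (0.212, 0.009)),
    "DRESSNet":    ((0.267, 0.025), (0.240, 0.009)),
}


def gain_summary(results):
    """Map each model to (gain in mean MAE, whether the gain exceeds both spreads)."""
    summary = {}
    for model, ((m0, s0), (m1, s1)) in results.items():
        gain = round(m0 - m1, 3)  # positive means k=1 has the lower MAE
        summary[model] = (gain, gain > max(s0, s1))
    return summary


summary = gain_summary(results)
for model, (gain, clear) in summary.items():
    print(f"{model}: gain {gain:.3f}, exceeds both spreads: {clear}")
```

More seeds would be needed to decide whether the smaller differences are real.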
Key takeaways
- Universal plug-in. DRESS features were integrated into GIN, PNA, GPS, and a custom architecture (DRESSNet), so the approach is not tied to any specific GNN design. The gain is largest for GIN (0.526 → 0.227); PNA + DRESS (0.213) does not improve on the published PNA baseline (0.188).
- Negligible-cost preprocessing. DRESS values are precomputed once (< 1 second for ZINC-12K) and cached. DRESS itself has no learnable parameters.
- Simplicity wins. The best approach is the simplest: concatenate the raw DRESS scalar to existing edge/node features. More complex encodings (scatter statistics, RBF expansions) did not improve results.
- Higher-order helps. \(\Delta^k\)-DRESS with \(k=1\) lowers the mean MAE for every model tested, most clearly for DRESSNet; the GIN and PNA differences are within the reported ± values.
Relationship to fingerprint classification
The fingerprint approach summarizes DRESS values into a fixed-size vector and uses Random Forest or GBT. The GNN approach instead feeds raw DRESS values directly into the message-passing layers, letting the network learn how to use them.
Both approaches confirm the same insight: DRESS edge values carry structural information that improves graph-level prediction (classification in the fingerprint experiments, regression on ZINC here), whether consumed by a simple classifier or a neural network.