Graph Retrieval¶
Task¶
Given a query graph, rank a database of graphs by structural similarity. This is the graph analogue of image retrieval or document search — and it requires a fixed-length graph representation that supports efficient nearest-neighbor lookup.
DRESS provides exactly this: an 84-dimensional fingerprint per graph, computed in a single unsupervised pass, with no training data required.
Method¶
- Compute DRESS on every graph in the database (one
dress_fitcall each). - Summarise each graph's edge values into an 84-dimensional percentile fingerprint (80 evenly spaced percentiles + entropy + edge count + unique count + zero fraction).
- Rank database graphs by L1 distance to the query fingerprint.
No GED labels are used, no model is trained, and the fingerprint is compatible with standard approximate nearest-neighbor indices (FAISS, Annoy, etc.) for sub-linear retrieval at scale.
Fingerprint construction¶
Same construction as classification, but with 80 quantiles instead of 20 for finer distributional resolution:
- 80 evenly spaced percentiles of the DRESS edge value distribution.
- Shannon entropy of a 50-bin histogram.
- Number of edges (size signal).
- Number of distinct edge values.
- Fraction of near-zero edges.
Total: 84 dimensions per graph.
Why 84 dimensions?¶
A sweep over n_quantiles ∈ {10, 20, 40, 80, 100, 120, 160} shows
performance saturating around 80–100 quantiles. Beyond that, extra
dimensions add noise without capturing new distributional information —
consistent with the small graph sizes in these benchmarks (LINUX: ~8 nodes,
AIDS: ~9 nodes).
| n_quantiles | Dimensions | LINUX P\@10 (L2) | AIDS P\@10 (L1) |
|---|---|---|---|
| 10 | 14 | 0.360 | 0.201 |
| 20 | 24 | 0.404 | 0.210 |
| 40 | 44 | 0.383 | 0.215 |
| 80 | 84 | 0.434 | 0.223 |
| 100 | 104 | 0.440 | 0.220 |
| 120 | 124 | 0.443 | 0.216 |
| 160 | 164 | 0.398 | 0.219 |
We use 80 quantiles (84d) as the default — near the peak, with no fragile hyperparameter sensitivity.
Ground truth¶
We evaluate against Graph Edit Distance (GED) — the minimum number of node/edge insertions, deletions, and substitutions to transform one graph into another. GED matrices from the GED-EXP benchmark (Bai et al., 2019) provide the "true" ranking for each query.
Datasets: LINUX (800 train / 200 test program call graphs) and AIDS700nef (560 train / 140 test molecular graphs).
Metrics¶
- P\@K: fraction of the predicted top-K that appear in the true GED top-K.
- NDCG\@K: normalised discounted cumulative gain (ranking quality).
- Spearman ρ: full-ranking correlation with GED.
Results¶
LINUX¶
| Method | P\@1 | P\@5 | P\@10 | P\@20 | NDCG\@10 | ρ |
|---|---|---|---|---|---|---|
| Size baseline | 0.170 | 0.293 | 0.312 | 0.375 | 0.872 | 0.867 |
| DRESS Pct-84d L2 | 0.505 | 0.404 | 0.434 | 0.522 | 0.987 | 0.838 |
| DRESS Pct-84d L1 | 0.230 | 0.331 | 0.365 | 0.453 | 0.988 | 0.786 |
| DRESS Pct-24d L1 | 0.290 | 0.394 | 0.460 | 0.556 | 0.987 | 0.818 |
| DRESS Wasserstein | 0.360 | 0.307 | 0.375 | 0.523 | 0.983 | 0.521 |
DRESS achieves P\@1 = 0.505 (3× the size baseline) and NDCG\@10 = 0.987 — near-perfect ranking quality in the top 10.
AIDS700nef¶
| Method | P\@1 | P\@5 | P\@10 | P\@20 | NDCG\@10 | ρ |
|---|---|---|---|---|---|---|
| Size baseline | 0.086 | 0.093 | 0.129 | 0.177 | 0.758 | 0.529 |
| DRESS Pct-84d L1 | 0.207 | 0.230 | 0.223 | 0.216 | 0.846 | 0.472 |
| DRESS Pct-24d L2 | 0.243 | 0.224 | 0.200 | 0.188 | 0.828 | 0.474 |
| DRESS Wasserstein | 0.214 | 0.224 | 0.212 | 0.193 | 0.819 | 0.307 |
DRESS achieves 1.7× the size baseline on P\@10 and NDCG\@10 = 0.846.
Comparison with supervised methods¶
Published GED prediction models (trained on GED labels):
| Method | Type | LINUX ρ | Training required |
|---|---|---|---|
| SimGNN | GNN | ~0.93 | Yes (GED labels) |
| GMN | GNN | ~0.95 | Yes (GED labels) |
| DRESS | Fingerprint | 0.84 | None |
DRESS reaches ~88% of the supervised correlation while requiring zero training, zero GED labels, and zero GPU time. The supervised methods also require \(O(N^2)\) pairwise GED computation for training data, while DRESS computes each fingerprint independently in \(O(m)\).
The ρ vs P\@K tension¶
An interesting pattern: the size baseline wins on Spearman ρ (0.867 vs 0.838 on LINUX) but loses badly on P\@K. This means size captures the broad trend (bigger graphs have higher GED to small ones) but cannot distinguish graphs of similar size — exactly where DRESS excels.
For retrieval, P\@K is the metric that matters: users want the right results in the top positions. A method with perfect ρ but poor P\@10 is useless for practical search.
Observations¶
- L1 distance is more robust than L2 for P\@10, consistent across both datasets. L1 downweights outlier dimensions in the fingerprint.
- Histograms (fixed-bin counts) performed consistently worse than percentile fingerprints — the CDF representation captures distribution shape more faithfully.
- DRESS+Size concatenation did not help on LINUX (where graphs have similar sizes) and helped only marginally on AIDS. The fingerprint already captures size implicitly via the edge count feature.
- Cosine distance was mediocre — magnitude (scale) matters for this task, and cosine discards it.