Why “negative vectors” can't delete data in FAISS – but weighted kernels can

github.com

21 points by loaderchips a month ago · 5 comments

The fix for machine unlearning in vector databases turns out to be conceptually simple, but it requires changing the semantics of retrieval.

Standard FAISS-style indices store vectors and compute:

argmaxᵢ ⟨q, vᵢ⟩

If you insert −v, nothing happens. It’s just another point. The original vector is still maximally similar to itself and remains rank-1.

This isn’t a bug—it’s a consequence of selection-based retrieval.
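
To see why, here is a minimal Rust sketch of selection semantics: a flat inner-product scan, not FAISS’s actual index structures, with illustrative names.

```rust
// Minimal sketch of selection-based retrieval (a flat scan, not a real
// FAISS index). Inserting -target is inert: the original vector is still
// maximally similar to itself and stays rank-1.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// argmaxᵢ ⟨q, vᵢ⟩ over the stored vectors.
fn argmax_search(index: &[Vec<f32>], q: &[f32]) -> Option<usize> {
    index
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| dot(q, a).partial_cmp(&dot(q, b)).unwrap())
        .map(|(i, _)| i)
}

fn main() {
    let target = vec![1.0, 0.0];
    let mut index = vec![target.clone(), vec![0.0, 1.0]];

    // Attempted "deletion" by inserting the negation: just another point.
    index.push(target.iter().map(|x| -x).collect());

    // The target is still rank-1 for its own query.
    assert_eq!(argmax_search(&index, &target), Some(0));
}
```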

If instead you store (vector, weight) pairs and evaluate:

φ(q) = Σᵢ wᵢ · K(q, vᵢ)

you get a different object entirely: a field, not a selection. Now inserting the same vector with w = −1 causes destructive interference. The contribution cancels. The attractor disappears.

Deletion becomes O(1) append-only (add the inverse), not a structural rebuild.

FAISS-style:   Vec<Vec<f32>>        → argmax (selection)
Weighted form: Vec<(Vec<f32>, f32)> → Σ (field)
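
A matching sketch of the weighted form, assuming a Gaussian kernel K(q, v) = exp(−‖q − v‖²/2σ²); the Field type, σ value, and method names here are illustrative, not the repo’s API.

```rust
// Sketch of field-based retrieval state: (vector, weight) sources whose
// kernel contributions superpose into a potential φ.

struct Field {
    sources: Vec<(Vec<f32>, f32)>, // (vector, weight) pairs
    sigma: f32,                    // kernel bandwidth (illustrative)
}

impl Field {
    // Gaussian kernel K(q, v) = exp(-||q - v||² / (2σ²)).
    fn kernel(&self, q: &[f32], v: &[f32]) -> f32 {
        let d2: f32 = q.iter().zip(v).map(|(a, b)| (a - b).powi(2)).sum();
        (-d2 / (2.0 * self.sigma * self.sigma)).exp()
    }

    // φ(q) = Σᵢ wᵢ · K(q, vᵢ): evaluate the field, not select a point.
    fn phi(&self, q: &[f32]) -> f32 {
        self.sources
            .iter()
            .map(|(v, w)| w * self.kernel(q, v))
            .sum()
    }

    // O(1) append-only deletion: append the same vector with weight -w.
    fn delete(&mut self, v: Vec<f32>, w: f32) {
        self.sources.push((v, -w));
    }
}

fn main() {
    let target = vec![1.0, 0.0];
    let mut field = Field { sources: vec![(target.clone(), 1.0)], sigma: 1.0 };

    assert!(field.phi(&target) > 0.99); // attractor present

    field.delete(target.clone(), 1.0);  // destructive interference
    assert!(field.phi(&target).abs() < 1e-6); // φ → 0: exact cancellation
}
```

The evaluation above is a full scan for clarity; because the Gaussian decays rapidly, only nearby sources contribute meaningfully, which is what makes the sparse/local evaluation mentioned in the comments plausible.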

We validated this on 100k vectors:

• FAISS: target stays rank-1 after “deletion”
• Field-based model: exact cancellation (φ → 0), target unretrievable

The deeper point is that this isn’t a trick: it’s a semantic separation.

• FAISS implements a selection operator over discrete points.
• The weighted version implements a field operator where vectors act as kernels in a continuous potential.
• Retrieval becomes gradient ascent to local maxima (sketched below).
• Deletion becomes destructive interference that removes attractors.
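
Continuing the Field sketch above (same assumed Gaussian kernel), retrieval by gradient ascent might look like this; for that kernel, ∇φ(q) = Σᵢ wᵢ · K(q, vᵢ) · (vᵢ − q)/σ².

```rust
// Extends the illustrative Field type from the previous sketch.
impl Field {
    // ∇φ(q) = Σᵢ wᵢ · K(q, vᵢ) · (vᵢ - q) / σ² for the Gaussian kernel.
    fn grad_phi(&self, q: &[f32]) -> Vec<f32> {
        let mut g = vec![0.0; q.len()];
        for (v, w) in &self.sources {
            let k = w * self.kernel(q, v);
            for (gi, (vi, qi)) in g.iter_mut().zip(v.iter().zip(q)) {
                *gi += k * (vi - qi) / (self.sigma * self.sigma);
            }
        }
        g
    }

    // Follow the field uphill from the query. A cancelled vector has net
    // weight 0, contributes nothing to φ or ∇φ, and can never be an attractor.
    fn retrieve(&self, q: &[f32], steps: usize, lr: f32) -> Vec<f32> {
        let mut x = q.to_vec();
        for _ in 0..steps {
            let g = self.grad_phi(&x);
            for (xi, gi) in x.iter_mut().zip(&g) {
                *xi += lr * gi;
            }
        }
        x
    }
}
```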

This shifts deletion from structural (modify index, rebuild, filter) to algebraic (append an inverse element). You get append-only logs, reversible unlearning, and auditable deletion records. The negative weight is the proof.

Implication: current vector DBs can’t guarantee GDPR/CCPA erasure without reconstruction. Field-based retrieval can—provably.

Paper with proofs: https://github.com/nikitph/bloomin/blob/master/negative-vect...

jey a month ago

That makes sense, but how do you efficiently evaluate the Gaussian-kernel-based approach (“operator-based data structures (OBDS)”)? Presumably you want to do it in a way that keeps a dynamically updating data structure instead of computing a low-rank approximation to the kernel, etc.? In my understanding, the upsides of the kNN-based approaches are fast querying and the ability to dynamically insert additional vectors?

  • loaderchipsOP a month ago

    Thank you for the thoughtful comment. Your questions are valid given the title, which I used to make the post more accessible to a general HN audience. To clarify: the core distinction here is not kernelization vs kNN, but field evaluation vs point selection (or selection vs superposition as retrieval semantics). The kernel is just a concrete example.

    FAISS implements selection (argmax ⟨q,v⟩), so vectors are discrete atoms and deletion must be structural. The weighted formulation represents a field: vectors act as sources whose influence superposes into a potential. Retrieval evaluates that field (or follows its gradient), not a point identity. In this regime, deletion is algebraic (append -v for cancellation), evaluation is sparse/local, and no index rebuild is required.

    The paper goes into this in more detail.

ricochet11 a month ago

This isn’t just X, it’s Y.

CamperBob2 a month ago

Hey man, nice slop
