watermark-forge: Green/Red-List Watermarking and Detection for LLM-Generated Text


Pipeline Architecture

The Problem

Teams ship LLM-generated content at scale but have no reliable way to verify its provenance after the fact: classifier-based detectors are brittle, and adversaries can paraphrase around them in one pass.

NEO built watermark-forge as a clean reference implementation of statistical watermarking: a scheme that is invisible to readers, survives common edits, and can be detected by anyone holding the seed without needing the original model.

Green/Red-List Embedding at Sampling Time

watermark-forge ships three focused modules (greenlist.py, embedder.py, and detector.py) implementing the Kirchenbauer et al. scheme. During generation, each step hashes the previous token together with the seed key into a pseudorandom seed, uses that seed to partition the vocabulary into a "green list" (a γ fraction of tokens, typically 0.25) and a "red list" (the remaining 1−γ), and adds a bias δ to the logits of green-list tokens before sampling. Over hundreds of tokens this shifts the distribution of emitted tokens toward the green list by a statistically detectable amount without producing reader-visible artifacts.

from watermark_forge import WatermarkEmbedder

embedder = WatermarkEmbedder(
    model="meta-llama/Llama-3.1-8B-Instruct",
    gamma=0.25,
    delta=2.0,
    seed_key=0xDEADBEEF,
)
watermarked_text = embedder.generate("Summarize the attached policy.", max_tokens=400)

The seed key, γ, and δ form the "watermark configuration". Detection needs only the seed key and γ plus the tokenizer; it does not need δ or the original model.
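
The per-step partition and logit bias described in this section can be sketched in plain Python. This is a minimal illustration, not the code in greenlist.py or embedder.py: the function names, the SHA-256 mixing of key and token, and the toy 1,000-token vocabulary are all assumptions made for the example.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float, seed_key: int) -> set:
    """Return the green-list token ids for the step that follows prev_token."""
    # Assumed mixing step: hash the secret key with the previous token id.
    digest = hashlib.sha256(f"{seed_key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Pseudorandom permutation of the vocabulary; the first gamma fraction is green.
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list, green: set, delta: float) -> list:
    """Add delta to every green-list logit; red-list logits are unchanged."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup: flat logits over a hypothetical 1,000-token vocabulary.
VOCAB, GAMMA, DELTA, KEY = 1000, 0.25, 2.0, 0xDEADBEEF
green = green_list(prev_token=42, vocab_size=VOCAB, gamma=GAMMA, seed_key=KEY)
probs = softmax(bias_logits([0.0] * VOCAB, green, DELTA))
green_mass = sum(probs[i] for i in green)
print(len(green), round(green_mass, 3))
```

With perfectly flat logits, the probability mass on green tokens rises from γ = 0.25 to roughly 0.71 after the bias. Real model logits are far from flat, so the shift on actual text is smaller. Because the partition depends only on the key and the previous token, anyone holding the key can rebuild the same green list without running the model.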

One-Sided Z-Test Detection

The detector walks the token stream, reconstructs the green list at each position using the same seed, counts green-token hits, and computes a one-sided z-score against the null hypothesis of unwatermarked text (expected green fraction = γ). A z-score above 4 corresponds to a one-sided p-value of about 3e-5 or lower, which is strong enough to act on in most pipelines.
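
The test statistic can be written out in a few lines. The sketch below assumes the standard form z = (green hits − γT) / sqrt(T·γ·(1−γ)) for T scored tokens and a normal approximation for the p-value; the function names are illustrative and are not taken from detector.py.

```python
import math


def z_score(green_hits: int, num_tokens: int, gamma: float) -> float:
    """One-sided z-statistic for the green-token count under the null."""
    expected = gamma * num_tokens
    std = math.sqrt(num_tokens * gamma * (1.0 - gamma))
    return (green_hits - expected) / std


def p_value(z: float) -> float:
    """Upper-tail probability of a standard normal, P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Example: 400 scored tokens, 220 of them green (fraction 0.55), gamma = 0.25.
z = z_score(green_hits=220, num_tokens=400, gamma=0.25)
print(f"z = {z:.2f}, p = {p_value(z):.2e}")

# p-value at the z = 4 decision threshold.
print(f"p at z = 4: {p_value(4.0):.2e}")
```

The statistic needs only the green-hit count, the token count, and γ, which is why the detector never has to call the generating model.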

| Scenario | Green fraction | Z-score | Verdict |
| --- | --- | --- | --- |
| Unwatermarked human text | ~0.25 | ~0 | No mark |
| Fully watermarked (δ=2) | ~0.55 | 12-18 | Strong mark |
| Paraphrased watermarked | ~0.35 | 4-7 | Mark present |
| Mixed (50% edited) | ~0.30 | 2-3 | Inconclusive |

Detection operates at the token level, so short snippets (under ~100 tokens) can produce inconclusive verdicts by design.
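
To see why length matters, the z-score formula can be solved for the smallest token count T that reaches a given threshold at a fixed green fraction f: T ≥ z²·γ·(1−γ) / (f−γ)². The helper below is a hypothetical illustration under that normal-approximation assumption, evaluated at the green fractions from the scenario table.

```python
import math


def min_tokens(green_fraction: float, gamma: float, z_threshold: float) -> int:
    """Smallest token count at which green_fraction reaches z_threshold."""
    excess = green_fraction - gamma
    if excess <= 0:
        raise ValueError("green fraction must exceed gamma to be detectable")
    needed = z_threshold**2 * gamma * (1.0 - gamma) / excess**2
    # Small tolerance so floating-point noise does not push exact answers up by one.
    return math.ceil(needed - 1e-9)


for fraction in (0.55, 0.35, 0.30):
    print(fraction, min_tokens(fraction, gamma=0.25, z_threshold=4.0))
```

Under these assumptions, fully watermarked text at a green fraction of 0.55 clears z = 4 in a few dozen tokens, while paraphrased text at 0.35 needs about 300 and half-edited text at 0.30 needs about 1,200. Short or heavily edited snippets therefore often fall below the threshold.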

CLI Workflow

python embedder.py \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --prompt prompts/policy.txt \
  --gamma 0.25 --delta 2.0 --seed 0xDEADBEEF \
  --out watermarked.txt

python detector.py \
  --text watermarked.txt \
  --gamma 0.25 --seed 0xDEADBEEF \
  --threshold 4.0

The detector emits the z-score, p-value, green-token fraction, and a verdict string suitable for logging or gating downstream use.

How to Build This with NEO

Open NEO in VS Code or Cursor and describe what you want to build. A good starting prompt for this project:

"Build a reference implementation of Kirchenbauer green/red-list watermarking for LLM text. At each generation step, hash the previous token to seed a vocabulary partition, bias green-list logits by delta before sampling, and record the configuration. Build a separate detector that walks any text and computes a one-sided z-score against the green-fraction null hypothesis, emitting p-value and verdict. Package as three clean modules (greenlist, embedder, detector) with CLI entrypoints."


NEO generates the project structure and core implementation. From there you iterate: add the Aaronson scheme as a second scheme for comparison, benchmark robustness under paraphrase attacks, or package the detector as a FastAPI service for post-hoc provenance checks. Each request builds on what's already there.

To run the finished project:

git clone https://github.com/dakshjain-1616/watermark-forge
cd watermark-forge
pip install -r requirements.txt
python embedder.py --model meta-llama/Llama-3.1-8B-Instruct --prompt "Write a short essay on..." --seed 0xDEADBEEF --out out.txt
python detector.py --text out.txt --seed 0xDEADBEEF

The detector returns a z-score and verdict that downstream pipelines can use to gate, log, or label content.

NEO built a compact, inspectable watermarking reference that makes the Kirchenbauer scheme easy to plug into experiments and provenance pipelines. See what else NEO ships at heyneo.com.

