Local-first PDF redaction for lawyers and anyone handling confidential documents. Deterministic detection finds names, parties, amounts, dates and identifiers in seconds — no AI required — and every export is machine-verified to be extraction-safe. Nothing leaves your machine.
Workflow
Drop a PDF into the local web app. The file never touches a network.
Detection combines pattern rules, contract-structure extraction (parties, defined-term aliases, signature blocks), NER, and your watchlist. A 100-page contract takes about 15 seconds.
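As a rough illustration of what a deterministic pattern rule can look like, here is a minimal sketch using only the Python standard library. The `Proposal` type, the `PATTERNS` table and the regular expressions themselves are assumptions made for this example; they are not Blackbar's actual rules.

```python
import re
from typing import NamedTuple

class Proposal(NamedTuple):
    """One candidate redaction (hypothetical shape, for illustration only)."""
    category: str
    text: str
    start: int
    end: int

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Illustrative patterns; a real rule set would be broader and locale-aware.
PATTERNS = {
    "amount": re.compile(r"(?:USD|EUR|GBP|\$|€|£)\s?\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
    "date": re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def detect(text: str) -> list[Proposal]:
    """Run every pattern over the text and return matches in reading order."""
    proposals = [
        Proposal(category, m.group(), m.start(), m.end())
        for category, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(proposals, key=lambda p: p.start)

sample = "Acme shall pay $12,500.00 on 3 March 2024. Contact: legal@acme.example"
for proposal in detect(sample):
    print(proposal.category, repr(proposal.text))
```

Because the rules are plain regular expressions, the same input always yields the same proposals, which is what makes this kind of detection deterministic.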
One button adds a local LLM (via Ollama) for messy or unusual documents. It is slower and never required, and it also runs entirely on your machine.
Every detection is a proposal until you accept it. Draw or resize boxes, reject false positives, or redact every occurrence of a term with one click.
Content is removed from the file, not just covered. Export refuses unless two independent extractors confirm the accepted text is gone.
Result
The same contract page, three ways. These are real outputs generated by Blackbar from a synthetic test contract.
The app
A three-pane workspace: proposals grouped by category on the left (with the detection source of each), the document with live redaction boxes in the middle, and the server-rendered redacted preview on the right.
How to
Requirements: macOS or Linux, Python 3.10+, Node 18+. Ollama is optional — without it, Deep scan is simply unavailable and everything else works.
git clone https://codeberg.org/russkysong/blackbar.git
cd blackbar
python3 -m venv .venv
.venv/bin/pip install -e '.[dev]'
.venv/bin/python -m spacy download en_core_web_md
cd frontend && npm install && npm run build && cd ..
.venv/bin/redactor serve
# then open http://127.0.0.1:8000 — or double-click index.html in the project folder
.venv/bin/redactor redact contract.pdf                 # fast, no AI
.venv/bin/redactor redact contract.pdf --deep          # + local LLM deep scan
.venv/bin/redactor redact contract.pdf --alias-labels  # "Person 1" labels + alias map
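To make the idea behind `--alias-labels` concrete, the sketch below shows one plausible way to assign stable labels such as "Person 1" and keep an alias map. The function name `build_alias_map` and the `(category, text)` input shape are hypothetical; the real tool may number and group entities differently.

```python
from collections import defaultdict

def build_alias_map(entities):
    """Map each distinct entity text to a label like 'Person 1'.

    `entities` is an iterable of (category, text) pairs in document order.
    A repeated mention reuses the label it was first given.
    """
    counters = defaultdict(int)  # next index per category
    alias_map = {}
    for category, text in entities:
        if text not in alias_map:
            counters[category] += 1
            alias_map[text] = f"{category} {counters[category]}"
    return alias_map

mentions = [
    ("Person", "Jane Roe"),
    ("Company", "Acme Ltd"),
    ("Person", "John Doe"),
    ("Person", "Jane Roe"),  # second mention, keeps "Person 1"
]
print(build_alias_map(mentions))
```

Numbering in order of first appearance keeps the redacted document readable: a reader can still follow who did what without learning who the parties are.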
Each export produces the redacted PDF plus a JSON manifest: SHA-256 of input and output, every proposal and your decision on it, the alias map, and the verification verdict.
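A manifest of this kind could be assembled roughly as follows. This is a sketch under assumptions: the field names (`input_sha256`, `verification`, and so on) and the `build_manifest` helper are invented for illustration and may not match the JSON that Blackbar actually writes.

```python
import hashlib
import json

def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def build_manifest(input_pdf: bytes, output_pdf: bytes,
                   proposals: list, alias_map: dict, verified: bool) -> str:
    """Serialise an audit record for one export (hypothetical field names)."""
    manifest = {
        "input_sha256": sha256_hex(input_pdf),
        "output_sha256": sha256_hex(output_pdf),
        "proposals": proposals,   # each proposal with the reviewer's decision
        "alias_map": alias_map,
        "verification": "pass" if verified else "fail",
    }
    return json.dumps(manifest, indent=2, sort_keys=True)

print(build_manifest(
    b"%PDF-original", b"%PDF-redacted",
    proposals=[{"text": "Jane Roe", "category": "Person", "decision": "accepted"}],
    alias_map={"Jane Roe": "Person 1"},
    verified=True,
))
```

Hashing both files lets anyone later confirm which input produced which output without needing to see the unredacted original.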
Why trust it
After redaction the file is re-opened and searched by two independent PDF text extractors. If anything you accepted can still be extracted, the export is refused and the output file is destroyed.
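The check described here can be sketched as a small gate that runs every accepted term against the output of each extractor. The extractors are passed in as plain callables so the example stays self-contained; `leaked_terms`, `export_allowed` and the stand-in extractors are hypothetical names, and which PDF libraries Blackbar actually uses is not stated here.

```python
from typing import Callable, Iterable

Extractor = Callable[[bytes], str]  # takes PDF bytes, returns extracted text

def _normalise(text: str) -> str:
    """Collapse whitespace and case so line breaks cannot hide a match."""
    return " ".join(text.split()).casefold()

def leaked_terms(pdf_bytes: bytes, accepted_terms: Iterable[str],
                 extractors: Iterable[Extractor]) -> set:
    """Return every accepted term that any extractor can still recover."""
    terms = list(accepted_terms)
    leaks = set()
    for extract in extractors:
        haystack = _normalise(extract(pdf_bytes))
        for term in terms:
            needle = _normalise(term)
            if needle and needle in haystack:
                leaks.add(term)
    return leaks

def export_allowed(pdf_bytes: bytes, accepted_terms: Iterable[str],
                   extractors: Iterable[Extractor]) -> bool:
    """Export passes only if no extractor recovers any accepted term."""
    return not leaked_terms(pdf_bytes, accepted_terms, extractors)

# Stand-in extractors for demonstration; real ones would parse the PDF.
clean = lambda _pdf: "The party [REDACTED] agrees"
leaky = lambda _pdf: "The party Jane\nRoe agrees"
print(export_allowed(b"", ["Jane Roe"], [clean, leaky]))
```

Using two extractors matters because PDF text can be stored in ways one parser misses; a term counts as gone only when every extractor fails to find it.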
Black bars are burned in after the underlying text is deleted from the content stream. Document metadata and annotations are scrubbed.
Image-only pages (where text detection is blind) are flagged and block export until you explicitly acknowledge them.
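One simple way to express this rule is shown below. It assumes a page counts as image-only when it has no extractable text but at least one embedded image; that heuristic, the `PageInfo` type and the function names are assumptions for illustration, not a description of Blackbar's internals.

```python
from dataclasses import dataclass

@dataclass
class PageInfo:
    number: int       # 1-based page number
    text_chars: int   # characters of extractable text on the page
    image_count: int  # embedded images on the page

def image_only_pages(pages):
    """Pages where text detection is blind: images present, no text layer."""
    return [p.number for p in pages if p.text_chars == 0 and p.image_count > 0]

def blocking_pages(pages, acknowledged):
    """Image-only pages the reviewer has not yet explicitly acknowledged."""
    return [n for n in image_only_pages(pages) if n not in acknowledged]

pages = [PageInfo(1, 1200, 0), PageInfo(2, 0, 1), PageInfo(3, 0, 2)]
print(blocking_pages(pages, acknowledged=set()))  # export blocked while non-empty
print(blocking_pages(pages, acknowledged={2, 3}))
```

Export would proceed only when `blocking_pages` returns an empty list, so a scanned page can never slip through unreviewed.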
Measured on a labeled test corpus: recall 1.000, precision 0.954, 100-page contract in 14.9 seconds. Methodology and eval harness ship in the repository.
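For readers unfamiliar with the two metrics: recall is the share of labeled sensitive spans that were detected, and precision is the share of detections that were actually labeled sensitive. The sketch below computes both from sets of spans. The toy data is invented and the `precision_recall` helper is hypothetical; the repository's eval harness is the authoritative definition of how the published figures were measured.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Precision and recall of predicted spans against labeled (gold) spans.

    By convention here, an empty denominator yields 1.0.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall

# Toy example: spans are (page, start, end) tuples.
gold = {("p1", 0, 8), ("p1", 20, 30), ("p2", 5, 9)}
predicted = gold | {("p2", 40, 44)}  # everything found, plus one false positive
print(precision_recall(predicted, gold))
```

For a redaction tool the asymmetry is deliberate: a missed span (lower recall) is a leak, while a false positive (lower precision) only costs the reviewer a click to reject.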