poveglia: Quarantine Your Imports

Poveglia is named after the Venetian island that served as a quarantine station for ships arriving with plague aboard, holding the dangerous cargo offshore until it was known to be safe. The library does the same for the files your users upload: it quarantines them at the door and runs them through a pipeline of classifiers before any of them reaches your system. Virus scanning, zip-bomb detection, explicit-content and CSAM checks, AI-generated-image detection, and more all sit behind one async classify() call that returns a single verdict.

import asyncio
from poveglia import classify, Status

# my_vision_api, my_csam_api, my_csam_reporter, reject_upload and
# queue_for_human_review are callables you supply yourself.
result = asyncio.run(classify({
    "url": "s3://uploads/photo.jpg",
    "classifiers": ["virus", "explicit", "csam", "policy"],
    "classifier_config": {
        "explicit": {"api_callable": my_vision_api, "threshold": 0.7},
        "csam": {"api_callable": my_csam_api, "callback": my_csam_reporter},
        "policy": {"forbidden_mimetypes": ["video/*"]},
    },
}))

if result.status == Status.FORBID:
    reject_upload(result)
elif result.status == Status.REVIEW:
    queue_for_human_review(result)

The classifiers list doubles as the execution order: the classifiers run in series, in the order listed, over the one file.

Fail open in the pipeline, fail loud in the results

The decision I care most about is that a classifier which throws never aborts the run, and a content verdict never reaches you as an exception. The two are kept rigorously apart. A classifier’s verdict (allow, review, forbid, or mandatory-action) goes into result.status, which is the worst verdict across the whole pipeline. A classifier’s failure (the vision API timed out, the download blew past its cap) goes into result.errors. Neither touches the other.
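To make the separation concrete, here is a minimal sketch of the idea, not Poveglia’s actual source. The Status enum, the Result dataclass, and run_pipeline are stand-ins written for this example; the numeric ordering of the verdicts (in particular that MANDATORY_ACTION outranks FORBID) is an assumption, and short-circuiting is left out to keep it short.

```python
import asyncio
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Stand-in enum. The ordering is an assumption, chosen so max() is "worst".
    ALLOW = 0
    REVIEW = 1
    FORBID = 2
    MANDATORY_ACTION = 3


@dataclass
class Result:
    status: Status = Status.ALLOW
    errors: list = field(default_factory=list)


async def run_pipeline(classifiers, file_bytes):
    """Run (name, coroutine function) pairs in series, keeping verdicts and failures apart."""
    result = Result()
    for name, classifier in classifiers:
        try:
            verdict = await classifier(file_bytes)
        except Exception as exc:
            # A failure is recorded but never aborts the run or changes the status.
            result.errors.append((name, repr(exc)))
            continue
        # A verdict only ever moves the status toward "worse".
        result.status = max(result.status, verdict)
    return result


async def flaky_vision(_data):
    raise TimeoutError("vision API timed out")


async def policy_check(_data):
    return Status.REVIEW


demo = asyncio.run(run_pipeline(
    [("explicit", flaky_vision), ("policy", policy_check)],
    b"fake bytes",
))
print(demo.status.name, demo.errors)
```

The timed-out classifier shows up only in errors, and the REVIEW from the policy check shows up only in status.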

That separation has a sharp edge, and it’s the one thing to internalize before you ship: status == ALLOW does not mean “clean.” Errors don’t raise the status, so a run in which every single classifier threw comes back as ALLOW with a populated errors list. The predicate you actually want is result.is_clean, which is true only when the status is ALLOW and nothing errored. Treat a bare ALLOW as “safe to ship” and a total outage of your scanners reads as a clean bill of health. is_clean is the difference between “nothing flagged it” and “nothing could check it.”
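A sketch of what gating on is_clean rather than on status might look like. StandInResult and route are hypothetical names invented for this example so that it runs on its own, and statuses are plain strings here; only status, errors, and is_clean mirror the attributes described above.

```python
from dataclasses import dataclass, field


@dataclass
class StandInResult:
    # Stand-in for the object classify() returns; not the library's class.
    status: str = "ALLOW"
    errors: list = field(default_factory=list)

    @property
    def is_clean(self) -> bool:
        # True only when nothing flagged the file AND nothing failed to check it.
        return self.status == "ALLOW" and not self.errors


def route(result) -> str:
    """Decide what to do with an upload, treating 'unchecked' as not accepted."""
    if result.is_clean:
        return "accept"
    if result.status == "ALLOW":
        # Nothing flagged the file, but at least one classifier could not run.
        return "retry_or_review"
    if result.status == "REVIEW":
        return "review"
    return "reject"


outage = StandInResult(status="ALLOW", errors=["virus: daemon unreachable"])
print(route(StandInResult()), route(outage))
```

The second branch is the one a bare status check would miss: the outage case is routed away from “accept” even though its status is ALLOW.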

The single exception to the no-raise rule is misconfiguration. Name a classifier that doesn’t exist and classify() raises KeyError immediately, before it downloads a thing, because a typo in your pipeline is a programming error and not a verdict about the file.
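The fail-fast ordering can be sketched as follows: resolve every classifier name before any I/O happens, so a typo raises before a byte is fetched. REGISTRY, classify_sketch, and the downloads list are hypothetical stand-ins for this illustration; the real library discovers classifiers through entry points.

```python
from typing import Callable

# Hypothetical registry of classifier names to callables.
REGISTRY: dict[str, Callable[[bytes], str]] = {
    "virus": lambda data: "ALLOW",
    "policy": lambda data: "ALLOW",
}

downloads = []  # records every simulated download


def classify_sketch(url: str, names: list[str]) -> list[str]:
    # Validate the whole pipeline first; a typo is a programming error.
    unknown = [name for name in names if name not in REGISTRY]
    if unknown:
        raise KeyError(f"unknown classifier(s): {unknown}")
    downloads.append(url)  # stands in for fetching the file
    data = b""
    return [REGISTRY[name](data) for name in names]


caught = None
try:
    classify_sketch("s3://uploads/photo.jpg", ["virus", "polcy"])
except KeyError as exc:
    caught = str(exc)

print(caught, downloads)
```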

A pipeline and a contract, not a moderation service

I’ll be upfront: Poveglia doesn’t ship the hard part. The vision and CSAM classifiers take an api_callable you provide; the virus classifier expects a ClamAV daemon you run; the action classifiers (reporting, legal_hold, metadata) write through backends you wire in. What Poveglia gives you is the orchestration around them: serial execution that short-circuits on FORBID and MANDATORY_ACTION; a scoring mode that runs everything for ranking instead of gating; verdict aggregation; and a lazy content resolver that downloads the file once no matter how many classifiers ask for its bytes, caps that download as a denial-of-service guard, and cleans up its own temp files. Classifiers are discovered through pyproject.toml entry points rather than by file location, so anyone can ship one as a separate package. They can also hand intermediate work to each other through a shared blackboard (the face data your explicit-content check already fetched is sitting right there for the identifiable-faces check, no second API call), as long as each still works when the blackboard is empty.
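The lazy content resolver is the piece that is easiest to picture in code. What follows is a sketch of the technique under stated assumptions, not the library’s implementation: LazyContent, DownloadTooLarge, and fake_fetch are names invented here, and the fetcher is a fake async generator rather than a real S3 or HTTP client.

```python
import asyncio
import os
import tempfile


class DownloadTooLarge(Exception):
    """Raised when the download passes its byte cap."""


class LazyContent:
    def __init__(self, fetch_chunks, max_bytes):
        self._fetch_chunks = fetch_chunks  # async generator function yielding bytes
        self._max_bytes = max_bytes
        self._lock = asyncio.Lock()
        self._path = None

    async def path(self):
        # The lock makes concurrent callers share one download.
        async with self._lock:
            if self._path is None:
                self._path = await self._download()
            return self._path

    async def _download(self):
        fd, path = tempfile.mkstemp(prefix="quarantine-")
        written = 0
        try:
            with os.fdopen(fd, "wb") as out:
                async for chunk in self._fetch_chunks():
                    written += len(chunk)
                    if written > self._max_bytes:
                        raise DownloadTooLarge(f"more than {self._max_bytes} bytes")
                    out.write(chunk)
        except BaseException:
            os.unlink(path)  # never leave a partial file behind
            raise
        return path

    def cleanup(self):
        if self._path is not None:
            os.unlink(self._path)
            self._path = None


calls = {"n": 0}


async def fake_fetch():
    calls["n"] += 1
    yield b"x" * 10
    yield b"y" * 10


async def main():
    content = LazyContent(fake_fetch, max_bytes=1024)
    try:
        # Three "classifiers" ask for the file at the same time.
        paths = await asyncio.gather(*(content.path() for _ in range(3)))
        return calls["n"], len(set(paths)), os.path.getsize(paths[0])
    finally:
        content.cleanup()


fetch_count, distinct_paths, size = asyncio.run(main())
print(fetch_count, distinct_paths, size)
```

Counting bytes as they arrive, rather than trusting a Content-Length header, is what makes the cap a usable denial-of-service guard in this sketch.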

The defaults lean the safe way. An uncertain or malformed score from a safety classifier comes back as REVIEW, never ALLOW; unknown is not clean. The CSAM classifier returns a mandatory action that fires your reporting callback automatically and records the outcome, and if you’ve marked that callback as required, a high-confidence hit with nothing wired up raises rather than quietly rejecting the file, so the compliance gap surfaces instead of hiding. The zip-bomb check trusts measured decompression rather than the sizes an archive declares about itself, since those are precisely what an attacker controls. As the README puts it, Poveglia gives you the tools, not the compliance; what you connect to them is on you.
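“Measured decompression” can be illustrated with the standard library alone. This sketch uses a bare zlib stream as a stand-in for an archive member and a hypothetical helper, inflates_past; it shows the general technique of counting bytes as they come out of the decompressor, and is not how Poveglia’s zip-bomb classifier is actually written.

```python
import zlib


def inflates_past(compressed: bytes, max_output: int, step: int = 64 * 1024) -> bool:
    """Return True as soon as real decompressed output exceeds max_output bytes."""
    decomp = zlib.decompressobj()
    produced = 0
    data = compressed
    while not decomp.eof:
        # max_length bounds how much output a single call may produce.
        out = decomp.decompress(data, step)
        produced += len(out)
        if produced > max_output:
            return True  # stop early; never materialise the whole payload
        data = decomp.unconsumed_tail
        if not out and not data:
            break  # truncated stream: no input left and nothing more to emit
    return False


# About 10 MB of zeros compresses to a few kilobytes.
bomb = zlib.compress(b"\0" * 10_000_000)
small = zlib.compress(b"hello")

print(len(bomb), inflates_past(bomb, 1_000_000), inflates_past(small, 1_000_000))
```

Because the limit is checked against bytes actually produced, nothing the archive says about its own size enters the decision. Corrupt input raises zlib.error here; a real classifier would presumably turn that into a REVIEW rather than let it pass.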

pip install poveglia gets you the core and the dependency-light classifiers; the vision, ClamAV, and object-storage classifiers come in via poveglia[vision], poveglia[clamav], and poveglia[storage] (or poveglia[all]). It needs Python 3.11 or newer and is MIT licensed. Whether your users’ uploads ever make it off the island is, appropriately, up to you.
