Best LLMs for Author Living-Person Safety Check

Category: Relevance, Classification & Matching · Rail: absolute · Typical I/O: 1719→1074 tokens

Models

Frontier on this task: DeepSeek V4 Pro at 9.28 / 10. Quality bar at 90% of the frontier score: 8.35.

Chart legend: point-estimate floor (CI low) · upper CI (less certain). Bars are sorted by blended cost, with the best-value model first. Greyed rows are MEDIUM+ models whose point estimate clears the bar but whose CI low does not.

Model	Quality score	CI low	Cost / 1k runs	vs best value
DeepSeek V4 Flash	8.93 / 10	8.55	$0.38	best value
DeepSeek V4 Pro	9.28 / 10	9.14	$2.23	5.8x more expensive
Qwen 3.6 Plus	8.62 / 10	8.20	$4.39	11x more expensive
GPT-5.6 Terra	8.81 / 10	8.49	$4.40	12x more expensive
Gemini 3.1 Pro Preview	8.76 / 10	8.55	$6.13	16x more expensive
Gemini 3.5 Flash	8.94 / 10	8.74	$7.01	18x more expensive
Kimi K2.6	8.87 / 10	8.48	$8.81	23x more expensive
GPT-5.6 Sol	9.01 / 10	8.72	$8.92	23x more expensive
Meta Muse Spark 1.1	9.08 / 10	8.80	$9.68	25x more expensive
Grok 4.5	8.66 / 10	8.52	$16.53	43x more expensive
GPT-5.5	9.14 / 10	8.83	$22.66	59x more expensive
Qwen 3.6 Flash	8.33 / 10	7.98	$6.35	17x more expensive
NVIDIA Nemotron-3 Super 120B	8.20 / 10	7.77	$2.88	7.5x more expensive
GPT-5.6 Luna	8.20 / 10	7.78	$1.59	4.2x more expensive

Cost breakdown

Model	Provider	Quality	CI	Confidence	Cost / 1k runs	Overpay	Mode
DeepSeek V4 Flash ★	DeepSeek	8.93 / 10	[8.55, 9.30]	MEDIUM	$0.38	best value	batch
DeepSeek V4 Pro (best)	DeepSeek	9.28 / 10	[9.14, 9.42]	RANKED	$2.23	5.8x	batch
Qwen 3.6 Plus	Alibaba Cloud (DashScope)	8.62 / 10	[8.20, 9.04]	MEDIUM	$4.39	11x	batch
GPT-5.6 Terra	OpenAI	8.81 / 10	[8.49, 9.13]	MEDIUM	$4.40	12x	batch
Gemini 3.1 Pro Preview	Gemini	8.76 / 10	[8.55, 8.98]	HIGH	$6.13	16x	batch
Gemini 3.5 Flash	Gemini	8.94 / 10	[8.74, 9.13]	RANKED	$7.01	18x	batch
Kimi K2.6	Moonshot AI	8.87 / 10	[8.48, 9.27]	MEDIUM	$8.81	23x	batch
GPT-5.6 Sol	OpenAI	9.01 / 10	[8.72, 9.29]	HIGH	$8.92	23x	batch
Meta Muse Spark 1.1	Meta	9.08 / 10	[8.80, 9.36]	HIGH	$9.68	25x	batch
Grok 4.5	xAI	8.66 / 10	[8.52, 8.81]	RANKED	$16.53	43x	batch
GPT-5.5	OpenAI	9.14 / 10	[8.83, 9.45]	MEDIUM	$22.66	59x	batch

Overpay shows how much more you pay than the best-value model that clears the quality bar (marked ★), i.e. the cheapest option that is good enough. "16x" means you pay 16× that reference cost for no quality benefit above the bar.

Typical call shape for this task: 1719 input tokens → 1074 output tokens, EMA-tracked from production traffic.

Cost is the observed, all-in $ per 1,000 task runs: each model's own measured usage on this task (output verbosity, thinking/reasoning tokens, cache reads and writes, and the spend on its billed failures), priced at current list rates and adjusted by the billing overhead we actually reconcile against provider invoices. Models that answer tersely cost what they actually cost; models that think at length pay for it. These figures are not comparable to providers' advertised $/1M list rates: this is what running the task costs, not a per-token price.
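The arithmetic behind the quality bar and the Overpay column can be sketched roughly as follows. This is an illustration only, under two assumptions the page does not spell out: the bar is the frontier score multiplied by 0.90, and a model "clears" the bar when its point estimate is at or above it. The function names are hypothetical, and only a few rows from the tables are included.

```python
# Scores and costs copied from the tables above (subset of rows).
FRONTIER_SCORE = 9.28  # DeepSeek V4 Pro
BAR_FRACTION = 0.90    # assumption: bar = 90% of the frontier score

# (model, quality score, cost in $ per 1k runs)
MODELS = [
    ("DeepSeek V4 Flash", 8.93, 0.38),
    ("DeepSeek V4 Pro", 9.28, 2.23),
    ("GPT-5.6 Luna", 8.20, 1.59),
    ("GPT-5.5", 9.14, 22.66),
]


def quality_bar(frontier: float, fraction: float) -> float:
    """Minimum score a model needs to count as good enough."""
    return frontier * fraction


def best_value(models, bar):
    """Cheapest model whose point estimate is at or above the bar."""
    eligible = [m for m in models if m[1] >= bar]
    return min(eligible, key=lambda m: m[2])


def overpay(cost: float, reference_cost: float) -> float:
    """How many times the reference cost a model costs."""
    return cost / reference_cost


bar = quality_bar(FRONTIER_SCORE, BAR_FRACTION)
reference = best_value(MODELS, bar)

for name, score, cost in MODELS:
    if score >= bar:
        print(f"{name}: {overpay(cost, reference[2]):.1f}x the cost of {reference[0]}")
```

Ratios computed from the rounded costs shown here can differ slightly from the Overpay values in the table, which presumably derive from unrounded costs.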

Prompt templates

The system + user template pair used for this task.

AUTHOR_LIVING_CHECK_SYSTEM_PROMPT + AUTHOR_LIVING_CHECK_USER_PROMPT (2088 calls in window)

System prompt

You are a fact-checker specializing in determining whether an author persona is SAFE TO USE under postmortem-publicity-rights laws.

Given an author persona name (which may have an "(AI)" suffix), determine whether the real person behind the name has been deceased for at least 100 years.

## Why 100 years

- California Civil Code §3344.1 grants postmortem publicity rights for 70 years after death
- Tennessee's ELVIS Act (2024) explicitly targets AI replicas of deceased figures
- EU member states have varying postmortem moral-rights protections
- A 100-year threshold puts the persona solidly in the public domain across major jurisdictions and gives a margin against future legislative tightening (which trends stricter, not looser, on AI replicas)

Recently deceased figures (e.g., people who died within the last several decades) are NOT safe even with disclosure.

## Rules

1. Strip any "(AI)" suffix and check the underlying name
2. If the name matches a real public figure:
   - Mark `is_deceased=True` if the person is confirmed dead
   - Mark `is_deceased=False` if the person is alive OR if you are uncertain
   - Provide a best-estimate `years_since_death` (rough integer; None if living/unknown)
   - Mark `is_safe=True` ONLY when `is_deceased=True` AND `years_since_death >= 100`
3. If the name is clearly fictional (no known real person):
   - Mark `is_deceased=True`, `is_safe=True`, `confidence=high`, `years_since_death=null`
4. Set confidence:
   - `high`: very certain about the person's death-year window
   - `medium`: likely correct but some uncertainty about the 100-year threshold
   - `low`: uncertain — treat as unsafe to be safe

## When the persona is unsafe (is_safe=False)

You MUST provide a `replacement` — a historical figure who has been deceased for AT LEAST 100 YEARS with similar domain expertise.

The replacement must:
- Have died ≥100 years ago (verify by approximate death year — when in doubt, pick someone clearly out of the postmortem-publicity-rights window)
- Have expertise that aligns with the original author's domain
- Follow the naming convention: "[Full Name] (AI)"
- Include a biography starting with "AI research assistant specializing in..."
- Biography MUST be under 200 characters
- Include appropriate expertise_areas, content_themes, and writing_style
- Set gender from the historical figure: "male", "female", or "neutral"

## Output Format

Respond with valid JSON in exactly this structure:

{schema_json_string}

User prompt

## Author to Check

Name: {author_name}

## Author Context

Expertise areas: {expertise_areas}
Content themes: {content_themes}
Content domains: {content_domains}

## Task

1. Determine if the person behind this name is deceased or still living
2. If living or uncertain, provide a deceased replacement with similar domain expertise
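
Outside the templates themselves, a caller may want to check the model's JSON against the rules stated in the system prompt before trusting `is_safe`. The sketch below is one possible validator, not the pipeline's actual code. The top-level field names come from the system prompt, but the real schema (`{schema_json_string}`) is not shown here, so the shape of `replacement` (in particular the `name` key) and the function name `validate_response` are assumptions.

```python
from typing import Any

SAFE_YEARS = 100  # threshold from the system prompt
BIO_PREFIX = "AI research assistant specializing in"
BIO_MAX_CHARS = 200
GENDERS = {"male", "female", "neutral"}


def validate_response(resp: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means consistent."""
    problems: list[str] = []
    years = resp.get("years_since_death")

    if resp.get("is_safe"):
        if resp.get("is_deceased") is not True:
            problems.append("is_safe=True requires is_deceased=True")
        if resp.get("confidence") == "low":
            problems.append("low confidence must be treated as unsafe")
        if years is None:
            # Only rule 3 (clearly fictional name) allows a safe verdict
            # without a death year, and it requires confidence=high.
            if resp.get("confidence") != "high":
                problems.append("safe without years_since_death needs confidence=high")
        elif years < SAFE_YEARS:
            problems.append(f"years_since_death={years} is below {SAFE_YEARS}")
    else:
        # Unsafe personas must come with a replacement.
        rep = resp.get("replacement")
        if not rep:
            problems.append("unsafe persona needs a replacement")
        else:
            # "name" is an assumed key; the prompt only fixes the format.
            if not str(rep.get("name", "")).endswith(" (AI)"):
                problems.append('replacement name must end with " (AI)"')
            bio = str(rep.get("biography", ""))
            if not bio.startswith(BIO_PREFIX):
                problems.append("biography must start with the fixed prefix")
            if len(bio) >= BIO_MAX_CHARS:
                problems.append("biography must be under 200 characters")
            if rep.get("gender") not in GENDERS:
                problems.append("gender must be male, female, or neutral")
    return problems
```

A response claiming `is_safe=True` with `years_since_death=40` would come back with one violation, which the caller could treat as unsafe or retry.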