AI text usually sounds robotic for the same boring reasons. The sentences are too even. The transitions are too polished. The wording is technically fine, but it never quite sounds like a person.
So the fix is not just "make this sound human." You need a method that checks the draft, spots the AI tells, and rewrites it in passes until it reads naturally.
That is what the code below does.
What "humanized" actually means
Humanized text is not slangy or messy. It keeps the meaning intact, but it sounds like a real person wrote it once instead of a model polishing it three times.
In practice, that means:
- sentence lengths vary
- ideas do not all arrive in the same shape
- the tone matches the audience
- the wording is specific instead of generic
- the draft does not lean on obvious AI crutches like repetitive transitions, fake nuance, or overclean symmetry
The code at a glance
The code takes a long input string and runs it through an iterative loop.
- It splits the text into markdown chunks using top-level ## headings.
- It runs a detector prompt on each chunk in parallel.
- It collects the strongest AI-writing signals.
- It rewrites the full original text using those findings.
- It repeats until the text looks clean or it hits the max iteration count.
The code does not guess from vibes. It asks the model to point at concrete snippets, explain why they sound AI-like, and say what should change.
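To make that concrete, here is a small sketch of what one finding looks like once the workflow trims the excerpt to its first six words. The sample finding is made up for illustration; the helper mirrors trim_to_first_words from the full code.

```python
def trim_to_first_words(text: str, word_count: int = 6) -> str:
    """Keep only the first `word_count` words so the excerpt stays a short locator."""
    words = text.split()
    if len(words) <= word_count:
        return text
    return " ".join(words[:word_count])


# A made-up finding, shaped like one entry in the detector's `snippets` list.
finding = {
    "snippet": "At the end of the day, what matters most is consistency.",
    "reason": "template transition followed by a meta-opener",
    "fix": "state the point directly",
}

# Only the excerpt is shortened; reason and fix pass through untouched.
trimmed = {**finding, "snippet": trim_to_first_words(finding["snippet"])}
print(trimmed["snippet"])
```

The trimmed excerpt is only meant to locate the line, so the rewrite step gets a pointer plus a reason and a fix rather than a long quote.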
The full workflow code
This is the actual implementation.
```python
import argparse
import json
import os
import sys
import re
import uuid
from typing import Any, Dict, List
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

from google import genai
from google.genai import types
from pydantic import BaseModel, Field


MODEL = "gemini-3.5-flash"
INPUT_COST_PER_1M = 1.50
OUTPUT_COST_PER_1M = 9.00


def load_local_env_var(key_name: str) -> str:
    env_path = os.path.join(os.path.dirname(__file__), ".env")
    if not os.path.exists(env_path):
        raise KeyError(key_name)

    with open(env_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            if name.strip() == key_name:
                value = value.strip().strip('"').strip("'")
                if value:
                    return value

    raise KeyError(key_name)


if "GEMINI_API_KEY" not in os.environ:
    os.environ["GEMINI_API_KEY"] = load_local_env_var("GEMINI_API_KEY")

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
OUTPUT_DIR = Path(__file__).resolve().parent / "data"


class Snippet(BaseModel):
    snippet: str = Field(description="Short excerpt from the text")
    reason: str = Field(description="Why the excerpt reads AI-like")
    fix: str = Field(description="What to change")


class DetectionResult(BaseModel):
    has_issues: bool = Field(description="Whether the text has AI-writing issues")
    snippets: List[Snippet] = Field(default_factory=list, max_length=5)


class RewriteResult(BaseModel):
    rewritten_text: str = Field(description="The full rewritten text")


def get_usage_count(usage: Any, names: List[str]) -> int:
    for name in names:
        value = getattr(usage, name, None)
        if value is not None:
            return int(value)
    return 0


def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    return (prompt_tokens / 1_000_000.0) * INPUT_COST_PER_1M + (output_tokens / 1_000_000.0) * OUTPUT_COST_PER_1M


def trim_to_first_words(text: str, word_count: int = 6) -> str:
    words = text.split()
    if len(words) <= word_count:
        return text
    return " ".join(words[:word_count])


def split_into_chunks(text: str) -> List[str]:
    chunks: List[str] = []
    current: List[str] = []

    for line in text.splitlines():
        if re.match(r"^##\s+", line):
            if current:
                chunk = "\n".join(current).strip()
                if chunk:
                    chunks.append(chunk)
                current = []
            current.append(line)
        else:
            current.append(line)

    if current:
        chunk = "\n".join(current).strip()
        if chunk:
            chunks.append(chunk)

    return chunks or [text]


def call_gemini(prompt: str, *, response_model: type[BaseModel]) -> tuple[str, BaseModel, int, int, float]:
    response = client.models.generate_content(
        model=MODEL,
        contents=prompt,
        config=types.GenerateContentConfig(
            temperature=0.2,
            top_p=0.95,
            response_mime_type="application/json",
            response_schema=response_model,
        ),
    )

    raw_text = (response.text or "").strip()
    if not raw_text:
        raise RuntimeError("Gemini returned empty text")

    parsed = response_model.model_validate_json(raw_text)
    usage = getattr(response, "usage_metadata", None)
    prompt_tokens = get_usage_count(usage, ["prompt_token_count", "input_token_count"])
    output_tokens = get_usage_count(usage, ["candidates_token_count", "output_token_count", "response_token_count"])
    cost = estimate_cost(prompt_tokens, output_tokens)
    return raw_text, parsed, prompt_tokens, output_tokens, cost


def _detect_single_chunk(index: int, chunk_text: str) -> tuple[int, str, List[Dict[str, Any]], float, int, int, int]:
    detection_raw, detection, detection_prompt_tokens, detection_output_tokens, detection_cost = call_gemini(
        build_detection_prompt(chunk_text),
        response_model=DetectionResult,
    )

    chunk_snippets = [
        {**snippet.model_dump(), "snippet": trim_to_first_words(snippet.snippet)}
        for snippet in detection.snippets
    ]
    return index, detection_raw, chunk_snippets, detection_cost, detection_prompt_tokens, detection_output_tokens, len(chunk_snippets)


def detect_issues_in_chunks(chunks: List[str]) -> tuple[List[Dict[str, Any]], float, int]:
    return detect_issues_in_chunks_parallel(chunks, verbose=True)


def detect_issues_in_chunks_silent(chunks: List[str]) -> tuple[List[Dict[str, Any]], float, int]:
    return detect_issues_in_chunks_parallel(chunks, verbose=False)


def detect_issues_in_chunks_parallel(chunks: List[str], verbose: bool) -> tuple[List[Dict[str, Any]], float, int]:
    all_snippets: List[Dict[str, Any]] = []
    seen = set()
    total_cost = 0.0
    total_chunks = 0

    max_workers = max(1, min(8, len(chunks)))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(_detect_single_chunk, index, chunk_text) for index, chunk_text in enumerate(chunks, start=1)]
        chunk_results = [future.result() for future in as_completed(futures)]

    for index, detection_raw, chunk_snippets, detection_cost, detection_prompt_tokens, detection_output_tokens, snippet_count in sorted(chunk_results, key=lambda row: row[0]):
        if verbose:
            print(f"\n=== Chunk {index} Detection Result ===")
            print(detection_raw)
            print(f"Chunk {index} detection cost: ${detection_cost:.6f} ({detection_prompt_tokens} input tokens, {detection_output_tokens} output tokens)")
            if chunk_snippets:
                print(json.dumps(chunk_snippets, ensure_ascii=False, indent=2))

        for snippet in chunk_snippets:
            key = (snippet.get("snippet"), snippet.get("reason"), snippet.get("fix"))
            if key in seen:
                continue
            seen.add(key)
            all_snippets.append(snippet)

        total_cost += detection_cost
        total_chunks += 1

    return all_snippets, total_cost, total_chunks


def process_text(input_text: str, max_iterations: int = 10, rewrite_style: str = "", verbose: bool = True) -> dict:
    current_text = input_text
    total_cost = 0.0
    iteration = 0
    first_iteration_issues: List[Dict[str, Any]] = []
    forced_rewrite_done = False

    for iteration in range(1, max_iterations + 1):
        chunks = split_into_chunks(current_text)
        if verbose:
            print(f"Iteration {iteration}: {len(chunks)} chunk(s)")
        all_snippets, detection_cost, _ = detect_issues_in_chunks_parallel(chunks, verbose=verbose)

        if iteration == 1:
            first_iteration_issues = all_snippets

        has_issues = bool(all_snippets)
        if verbose:
            print(f"Iteration {iteration}: {len(all_snippets)} issue(s) found")

        iteration_cost = detection_cost

        should_force_rewrite = bool(rewrite_style.strip()) and not forced_rewrite_done
        should_rewrite = has_issues or should_force_rewrite

        if not should_rewrite:
            total_cost += iteration_cost
            if verbose:
                print(f"Iteration {iteration} total cost: ${iteration_cost:.6f}")
            break

        rewrite_raw, rewrite_output, rewrite_prompt_tokens, rewrite_output_tokens, rewrite_cost = call_gemini(
            build_rewrite_prompt(input_text, all_snippets, rewrite_style=rewrite_style),
            response_model=RewriteResult,
        )
        current_text = rewrite_output.rewritten_text
        forced_rewrite_done = True
        iteration_cost += rewrite_cost
        total_cost += iteration_cost
        if verbose:
            print(f"Iteration {iteration}: rewrite applied")

    return {
        "final_output": current_text,
        "total_cost": round(total_cost, 6),
        "iterations": iteration,
        "model": MODEL,
        "first_iteration_issues": first_iteration_issues,
    }


def build_detection_prompt(text: str) -> str:
    return f"""Audit the text for AI-writing tells.

Focus on concrete signals, not vague vibes:

- Em dash overuse: repeated `—`, lists built around em dashes, em dashes used as a default stylistic crutch, or multiple paragraphs that lean on em dashes for rhythm.
- Contrast formulas: "it's X, not Y", "not only X, but also Y", "this isn't just X, it's Y", "the real answer isn't X".
- Meta-openers: "what usually gets skipped here is...", "what people often miss is...", "the part people forget is...", "what matters most is...", "what gets overlooked is...", "this is exactly the sort of...", "this is basically why...", "this is the kind of...".
- One-line paragraph patterns: lots of short single-sentence paragraphs in a row.
- Paragraph stubs: a paragraph that starts with a short setup sentence and then immediately follows with a tiny paragraph of 2-5 words, or a paragraph that is itself only 2-5 words long.
- Outline scaffolding: numbered headings or label-only sections like `### 1. Security`, `### 2. Reliability`, or repeated short section labels that read like an outline instead of prose.
- Choppy fragmentation: paragraphs with 2-4 very short sentences that all carry pieces of one idea and could naturally become a single sentence. Prioritize this signal even if the paragraph also contains a meta-opener.
- Template transitions: "that works, until it doesn't", "in today's world", "at the end of the day", "ultimately", "overall".
- Symmetry: repeated sentence openings, repeated clause shapes, evenly balanced comparisons, or neat bullet/list structures that feel engineered.
- Generic polish: buzzwords, hedging, fake nuance, or conclusion sentences that sound like a marketing page.
- Sycophancy: excessive praise toward the user, flattering language, "you nailed it", "great question", "this is exactly the right approach", or over-agreeable validation that adds no substance.
- Over-polished cadence: text that is technically clean but too symmetrical, too tidy, too templated, or too optimized for scannability.
- Repetitive feature-list formatting: bold labels, em-dash lists, or repeated "X — Y" constructions that create a slide-deck feel.

Return JSON only in this exact shape:
{{
  "has_issues": true|false,
  "snippets": [
    {{"snippet": "short identifying excerpt", "reason": "short reason", "fix": "short fix"}}
  ]
}}

Rules:
- Return at most 5 snippets.
- Snippets must be short. Use only the first few words needed to identify the line.
- Keep `reason` and `fix` concise.
- If the text is natural, return `has_issues: false` and `snippets: []`.
- Prefer the strongest signals only.
- If a paragraph splits one idea across multiple short sentences, flag that fragmentation even if each sentence is individually understandable.
- If a paragraph is just 2-5 words, always flag it.

Text:
{text}"""


def build_rewrite_prompt(original_text: str, issues: List[Dict[str, Any]], rewrite_style: str = "") -> str:
    issues_json = json.dumps(issues, ensure_ascii=False, indent=2)
    rewrite_style_block = f"\nAdditional rewrite style guidance:\n{rewrite_style}\n" if rewrite_style.strip() else ""
    return f"""Rewrite the text so it sounds human and natural while preserving meaning, facts, proper nouns, and formatting that still helps the content.

Use these rules:
- Direct over ornate.
- Specific over vague.
- Mix short and long sentences.
- Prefer simple verbs and concrete nouns.
- Delete fluff instead of disguising it.
- Avoid robotic symmetry, generic closings, and over-polished transitions.
- Keep the level of formality appropriate to the original content.

Address these flagged issues:
{issues_json}
{rewrite_style_block}

Return valid JSON only with this schema:
{{
  "rewritten_text": "the full rewritten text"
}}

Original text:
{original_text}"""


def main() -> int:
    parser = argparse.ArgumentParser(description="Humanise text with iterative Gemini rewrites.")
    parser.add_argument("--input_text", required=True, help="The text to humanise")
    parser.add_argument("--max_iterations", type=int, default=10, help="Maximum rewrite/detection loops to run")
    parser.add_argument("--rewrite_style", default="", help="Additional style guidance for the rewrite prompt")
    args = parser.parse_args()

    summary = process_text(args.input_text, max_iterations=args.max_iterations, rewrite_style=args.rewrite_style, verbose=True)
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    output_path = OUTPUT_DIR / f"final_rewrite_{uuid.uuid4().hex}.json"
    output_path.write_text(summary["final_output"], encoding="utf-8")
    console_summary = {
        "output_path": str(output_path),
        "iterations": summary["iterations"],
        "total_cost": summary["total_cost"],
        "model": summary["model"],
    }
    print(json.dumps(console_summary, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    try:
        raise SystemExit(main())
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        raise SystemExit(1)
```
How the code works
The code looks long, but the logic is simple.
1. It loads the Gemini API key securely
The workflow expects GEMINI_API_KEY to be present in the runtime environment. If it is not there, it tries to read a local .env file inside the workflow folder.
That matters because TeamCopilot workflows are meant to run with secrets injected safely, not pasted into prompts or hardcoded into source.
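Here is a minimal sketch of that fallback, run against a throwaway file so it works anywhere. The helper name read_env_value, the temp folder, and the fake key value are assumptions for the demo; the parsing rules follow load_local_env_var in the full code.

```python
import os
import tempfile


def read_env_value(env_path: str, key_name: str) -> str:
    """Return the value for key_name from a KEY=value file, or raise KeyError."""
    if not os.path.exists(env_path):
        raise KeyError(key_name)
    with open(env_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blanks, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, value = line.split("=", 1)
            if name.strip() == key_name:
                # Surrounding quotes are stripped, so KEY="abc" and KEY=abc match.
                value = value.strip().strip('"').strip("'")
                if value:
                    return value
    raise KeyError(key_name)


with tempfile.TemporaryDirectory() as folder:
    env_path = os.path.join(folder, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write('# local secrets\nGEMINI_API_KEY="fake-key-for-demo"\n')

    # The runtime environment wins; the file is only a fallback.
    api_key = os.environ.get("GEMINI_API_KEY") or read_env_value(env_path, "GEMINI_API_KEY")
    file_value = read_env_value(env_path, "GEMINI_API_KEY")

print(file_value)
```

If neither the environment nor the file has the key, the lookup raises KeyError and the script stops before any model call is made.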
2. It uses structured outputs
Both the detector and the rewrite step use Pydantic models.
That means the model cannot just return a messy blob of text. It has to return valid JSON in the shape the workflow expects:
- DetectionResult gives has_issues plus up to five flagged snippets
- RewriteResult gives the full rewritten text
That fixed shape makes the workflow predictable, which is what you want if other people are going to reuse the output.
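A rough sketch of what that contract buys you, using only the standard library as a stand-in for Pydantic. The real workflow validates with DetectionResult.model_validate_json; the parse_detection helper and the sample payloads here are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    snippet: str
    reason: str
    fix: str


@dataclass
class DetectionResult:
    has_issues: bool
    snippets: List[Snippet]


def parse_detection(raw: str) -> DetectionResult:
    """Parse raw JSON and reject anything that does not match the expected shape."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("has_issues"), bool):
        raise ValueError("has_issues must be a boolean")
    items = data.get("snippets", [])
    if not isinstance(items, list) or len(items) > 5:
        raise ValueError("snippets must be a list of at most 5 items")
    try:
        # Missing or unexpected keys make the dataclass constructor fail.
        snippets = [Snippet(**item) for item in items]
    except TypeError as exc:
        raise ValueError(f"malformed snippet: {exc}") from exc
    return DetectionResult(has_issues=data["has_issues"], snippets=snippets)


good = parse_detection(
    '{"has_issues": true, "snippets": '
    '[{"snippet": "At the end of the day", "reason": "template transition", "fix": "cut it"}]}'
)
print(good.snippets[0].fix)

try:
    parse_detection('{"has_issues": "maybe", "snippets": []}')
except ValueError as exc:
    print(f"rejected: {exc}")
```

Either the payload fits the shape and downstream code can rely on every field, or the call fails loudly at the boundary.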
3. It scans in chunks
The workflow splits the input on top-level ## headings so it can inspect each section separately.
That helps because long blog drafts usually mix clean sections with sections that still sound synthetic. Chunking keeps the detector focused instead of asking it to judge a giant wall of text all at once.
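A quick sketch of the splitter on a tiny draft. The function body follows split_into_chunks from the full code in a slightly condensed form; the sample draft is made up.

```python
import re
from typing import List


def split_into_chunks(text: str) -> List[str]:
    """Split text into chunks that each start at a line beginning with '## '."""
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A new "## " heading closes the chunk collected so far.
        if re.match(r"^##\s+", line) and current:
            chunk = "\n".join(current).strip()
            if chunk:
                chunks.append(chunk)
            current = []
        current.append(line)
    if current:
        chunk = "\n".join(current).strip()
        if chunk:
            chunks.append(chunk)
    # Text with no headings (or no content) comes back as a single chunk.
    return chunks or [text]


draft = """Intro paragraph before any heading.

## Setup
Install the tool.

### Details
Still part of Setup.

## Usage
Run it."""

chunks = split_into_chunks(draft)
for number, chunk in enumerate(chunks, start=1):
    print(number, repr(chunk.splitlines()[0]))
```

Two details worth noticing: text before the first ## heading becomes its own chunk, and a ### subheading does not start a new chunk because the pattern requires whitespace right after the two hashes.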
4. It runs detection in parallel
Each chunk is checked in parallel with a thread pool of up to eight workers, which keeps the process moving on longer drafts.
The detector is not just looking for one thing. It flags patterns like:
- em dash overuse
- contrast formulas like "it's X, not Y"
- meta-openers like "what people often miss is"
- overclean section scaffolding
- repetitive sentence shapes
- generic closing lines
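The fan-out and merge can be sketched without any model call. In this sketch fake_detect is a hypothetical stand-in that flags one hard-coded phrase; the pool sizing, ordering, and de-duplication follow detect_issues_in_chunks_parallel in the full code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


def fake_detect(index: int, chunk: str) -> Tuple[int, List[Dict[str, str]]]:
    """Stand-in for the model call: flags one hard-coded phrase."""
    findings = []
    if "at the end of the day" in chunk.lower():
        findings.append({"snippet": "At the end of the day", "reason": "template transition", "fix": "cut it"})
    return index, findings


def detect_all(chunks: List[str]) -> List[Dict[str, str]]:
    # One worker per chunk, capped at eight, and never fewer than one.
    max_workers = max(1, min(8, len(chunks)))
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_detect, i, c) for i, c in enumerate(chunks, start=1)]
        results = [future.result() for future in as_completed(futures)]

    merged: List[Dict[str, str]] = []
    seen = set()
    # Futures finish in any order, so sort by chunk index before merging.
    for _, findings in sorted(results, key=lambda row: row[0]):
        for finding in findings:
            key = (finding["snippet"], finding["reason"], finding["fix"])
            if key not in seen:
                seen.add(key)
                merged.append(finding)
    return merged


chunks = [
    "## A\nAt the end of the day, it works.",
    "## B\nClean text.",
    "## C\nAt the end of the day, ship it.",
]
issues = detect_all(chunks)
print(len(issues))
```

Chunks A and C produce the same finding, so the merged list holds it once. Threads fit here because each real call spends its time waiting on the network.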
5. It rewrites only when needed
If the detector finds issues, the workflow rewrites the full original text using those findings.
If it finds nothing, it stops early.
There is one exception. If you pass a non-empty rewrite_style, the workflow forces at least one rewrite pass even if the detector says the text is clean. That is useful when you want a specific voice or house style layered in.
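The control flow is easier to see with the model calls swapped for stubs. In this sketch run_loop, detect, and rewrite are hypothetical names, and the stubs just look for two filler words; the stop and force conditions follow process_text in the full code.

```python
from typing import Callable, List


def run_loop(
    text: str,
    detect: Callable[[str], List[str]],
    rewrite: Callable[[str, List[str]], str],
    max_iterations: int = 10,
    rewrite_style: str = "",
) -> dict:
    """Detect, then rewrite, until nothing is flagged or the cap is reached."""
    current = text
    iterations = 0
    forced_rewrite_done = False
    for iterations in range(1, max_iterations + 1):
        issues = detect(current)
        force = bool(rewrite_style.strip()) and not forced_rewrite_done
        if not issues and not force:
            break  # clean text and no pending style pass: stop early
        # As in the full code, each rewrite starts from the original text.
        current = rewrite(text, issues)
        forced_rewrite_done = True
    return {"final_output": current, "iterations": iterations}


def detect(text: str) -> List[str]:
    """Stub detector: flags two filler words."""
    return [word for word in ("ultimately", "overall") if word in text.lower()]


def rewrite(text: str, issues: List[str]) -> str:
    """Stub rewriter: deletes each flagged word."""
    for word in issues:
        text = text.replace(word + " ", "")
    return text


result = run_loop("This ultimately helps the team.", detect, rewrite)
clean = run_loop("This helps the team.", detect, rewrite)
styled = run_loop("This helps the team.", detect, rewrite, rewrite_style="warm")
print(result["iterations"], clean["iterations"], styled["iterations"])
```

The flagged draft takes two passes (one rewrite, one confirming scan), the clean draft stops after a single scan, and the clean draft with a style string gets exactly one forced rewrite before the confirming scan.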
6. It writes the final result to disk
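When the loop is done, the workflow writes the final rewrite to a file in data/ and prints a small JSON summary to stdout. The file is named final_rewrite_ plus a random ID with a .json extension, but it holds the rewritten text itself, not a JSON document.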
That summary includes:
- output_path
- iterations
- total_cost
- model
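For a sense of where total_cost comes from, here is the arithmetic as a small sketch. The per-million rates are the constants hardcoded in the script (whether they match current pricing is not verified here), and the token counts are made up.

```python
INPUT_COST_PER_1M = 1.50   # USD per million input tokens, as set in the script
OUTPUT_COST_PER_1M = 9.00  # USD per million output tokens, as set in the script


def estimate_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Price one call from its token counts."""
    return (prompt_tokens / 1_000_000) * INPUT_COST_PER_1M + (output_tokens / 1_000_000) * OUTPUT_COST_PER_1M


# Hypothetical (input, output) token counts: one detection call, one rewrite call.
calls = [(1200, 150), (1500, 900)]
total_cost = round(sum(estimate_cost(p, o) for p, o in calls), 6)
print(total_cost)
```

Output tokens cost six times as much as input tokens at these rates, so the rewrite call, which returns the whole draft, tends to dominate the total.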
How to use it
In TeamCopilot, the workflow takes these inputs:
- input_text: the text you want humanized
- max_iterations: how many detect and rewrite loops to run, default 10
- rewrite_style: optional style guidance that forces at least one rewrite pass
A simple run looks like this.
```json
{
  "input_text": "AI tools are changing how teams work. We need to use these tools to make our workflows faster and more efficient, which ultimately helps us deliver better results.",
  "max_iterations": 10,
  "rewrite_style": "Keep it warm, practical, and a little conversational."
}
```
The workflow might return a summary like this:
```json
{
  "output_path": "data/final_rewrite_7a1d8f2c3d9e4b31b4d2d1ef9c1c5e5f.json",
  "iterations": 2,
  "total_cost": 0.008412,
  "model": "gemini-3.5-flash"
}
```
The file at output_path contains the final rewritten text.
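Outside TeamCopilot, the script's own main() reads the same three inputs as command-line flags. This sketch rebuilds just that parser and feeds it a sample argument list; build_parser is a hypothetical wrapper name, since the full code builds the parser inline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Same three flags that main() defines in the full code."""
    parser = argparse.ArgumentParser(description="Humanise text with iterative Gemini rewrites.")
    parser.add_argument("--input_text", required=True, help="The text to humanise")
    parser.add_argument("--max_iterations", type=int, default=10, help="Maximum rewrite/detection loops to run")
    parser.add_argument("--rewrite_style", default="", help="Additional style guidance for the rewrite prompt")
    return parser


# Parse a sample argv instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--input_text", "AI tools are changing how teams work.",
    "--rewrite_style", "Keep it warm, practical, and a little conversational.",
])
print(args.max_iterations, repr(args.rewrite_style))
```

Only input_text is required. Leaving out max_iterations falls back to 10, and leaving out rewrite_style means no forced rewrite pass.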
Example inputs and outputs
Here are simple before and after examples.
Example 1
Input
```text
You are exactly right! The code indeed has a bug.
```
Output
```text
The code has a bug.
```
Here is another example with a style guide added.
Example 2
Input
```text
That's exactly the right approach. The way to solve this problem is to use an iterative loop with a detector and a rewrite step.
```
rewrite_style
```text
Write it as a friendly reddit comment in first person style.
```
Output
```text
I've had the best luck tackling this with an iterative loop. You basically just need a detector and a rewrite step to get it done.
```
Why this is better than a single rewrite prompt
A single prompt can help, but it usually stops at surface cleanup.
This workflow is stronger because it separates the problem into two jobs:
- find the AI tells
- rewrite around those tells
That is how editors work too. First they spot the bad habits. Then they fix them. Then they read it again.
FAQ
What does the code actually do?
It audits text for AI-writing patterns, flags the strongest issues, rewrites the full draft, and repeats until the text reads naturally or the maximum iteration count is reached.
Does it change the meaning of the text?
It is designed not to. The rewrite prompt tells the model to preserve meaning, facts, proper nouns, and useful formatting.
Why does it split text into chunks?
Long drafts are easier to inspect section by section. Chunking also lets the detector focus on one section at a time instead of making one big judgment on the entire post.
Why use structured output instead of plain text?
Structured output makes the workflow reliable. The detector must return a clear JSON object with snippets, reasons, and fixes. The rewrite step must return the final text in a known shape.
What if the detector finds nothing?
The workflow stops early and returns the current text. That saves time and cost.
What does rewrite_style do?
It adds extra guidance for tone or voice. If you pass a non-empty value, the workflow runs at least one rewrite pass even when the detector does not find obvious issues.
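As a small sketch of how the style text reaches the model: the rewrite prompt gets an extra section only when the value is non-blank. The helper name style_block is an assumption; the string it returns matches the one built inside build_rewrite_prompt.

```python
def style_block(rewrite_style: str = "") -> str:
    """Return the extra prompt section, or an empty string when no style is given."""
    if not rewrite_style.strip():
        return ""
    return f"\nAdditional rewrite style guidance:\n{rewrite_style}\n"


with_style = style_block("Keep it warm and practical.")
without_style = style_block("   ")  # whitespace only counts as no style

print(repr(with_style))
print(repr(without_style))
```

A value that is only whitespace is treated the same as an empty one, so it neither changes the prompt nor forces a rewrite.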
Can I use this on blog posts only?
No. It can be used on blog posts, docs, support text, internal notes, landing pages, and other long-form writing.
What kind of AI tells does the code look for?
It looks for things like repetitive transitions, overly symmetrical sentence shapes, one-line paragraph patterns, vague buzzwords, fake nuance, em dash overuse, and contrast formulas that read like template writing.
What is the best input format?
Use a full draft with ## headings and normal paragraph flow. The splitter uses those headings to chunk the text, and the code is built to work on substantial text, not just a sentence or two.
How many iterations should I use?
The default is 10. In practice, most drafts should not need that many. If the text is already decent, the code stops early.
Is the output deterministic?
No. It uses a model, so two runs can produce slightly different rewrites. The structure of the code stays the same, though, which is what matters for repeatability.
Does this replace human editing?
No. It gets the draft much closer, but the best results still come from a human pass at the end.
How is this different from a generic AI humanizer?
Most generic tools just rewrite surface phrasing. This code does the more useful thing: it detects the patterns first, rewrites in passes, and gives you a predictable process you can reuse.
Support the project
If this was useful, star TeamCopilot on GitHub.
TeamCopilot is a shared AI agent for teams with centralized context, permissions, and workflows.
