One API.
More Less tokens.
Lower bills.

Send any content. Get back a trimmer version that does the same job with fewer tokens.

Avg. reduction
−68%
P50 latency
147ms
Lossless on intent
99.4%
POST /v1/trim · bearer ltk_••••
compressing
input3,184
tok
<article class="prose">
  <h1>Welcome to our brand new docs</h1>
  <p>This extensive guide will help you learn
  about all of the APIs that are available.</p>
  <ul>
    <li>Auth</li>
    <li>Rate limits</li>
  </ul>
</article>
output0
tok
saved
2,247 tok
ratio
−71%
latency
112 ms
$ saved
$0.011
how it works

A small, surgical AI that only does one thing: drop tokens that don't carry weight.

Trained on token-equivalence — keep the meaning your model uses, throw away the rest. No fuzzy compression. No surprises in production.

01 · ingest

Send any payload

POST raw HTML, prompt strings, or chat history. We accept up to 1MB per request, stream up to 10MB.

inputhtml
modelltk-trim-1
02 · trim

Tokens recompiled

Our model rewrites the payload into a semantically equivalent form, optimized for the model you name.

input
3,184
output
937
03 · ship

Forward to any LLM

Pipe the trimmed payload into Claude, GPT, Gemini, Llama. Same intent, fewer tokens billed.

claude-opus-4.7gpt-5gemini-2.5llama-4
/v1/trim

The whole API is one endpoint.

One verb. One contract. Stays the same as we ship more types. Drop it in front of anything that bills per token.

RESTstreamingnode · python · goopenapi 3.1
Read the full docs ↗
200 OK · 112ms
# request
curl https://api.leantokn.dev/v1/trim \
  -H "authorization: Bearer ltk_••••" \
  -H "content-type: application/json" \
  -d '{
    "input":  "<html>…</html>",
    "type":   "html",
    "target": "claude-opus-4.7"
  }'

# response
{
  "result":   "# Welcome\n\nThis guide…",
  "type":     "markdown",
  "stats": {
    "input_tokens":    3184,
    "output_tokens":   937,
    "tokens_saved":    2247,
    "savings_percent": 71,
    "usd_saved":       0.011,
    "duration_ms":     112
  }
}
verified · openapi.json· v2026.05⎘ copy
what leantokn trims

One API. Many shapes of waste.

Every capability ships as a new type on the same endpoint. Your integration never changes.

HTML → Markdown

Strip the bytes a model already reads through. Average payload drops 65–75%.

live

Prompt trim

Detect dead weight in prompts — boilerplate, unused examples, duplicated context.

in training

Chat summary

Live, streaming summarization of chat history. Replace the first N turns with a tight digest.

coming

RAG distill

Trim retrieved documents to only the spans that load-bear on the user's question.

coming
initialize connection

Plug in. Watch your bill shrink.

1M tokens free, no card. Drop one line in front of any model call. If we don't save you money, the bill is on us.