
Models

The Compresr compression models — `latte_v1` and `latte_v2`, their parameters, and the canonical meaning of target_compression_ratio.

Compresr exposes two query-specific compression models on the public API:

latte_v1 — battle-tested, predictable.
latte_v2 — up to 5x faster at the same compression quality, plus a dynamic mode that picks the compression ratio per input automatically.

Both consume a context plus a query and return only the spans of context that carry signal for the query. latte_v2 accepts every parameter latte_v1 accepts, with the same defaults and semantics — swapping models is a single string change to compression_model_name. When in doubt, start with latte_v2 + target_compression_ratio=0.5.

Both models are served by the same endpoints: POST /compress/question-specific/ (single), …/stream (SSE), and …/batch (up to 100 rows).
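A minimal Python sketch of a single-item call, using only the standard library. The host `https://api.example.com` and the bearer-token `Authorization` header are hypothetical placeholders, since this page does not state the real base URL or auth scheme. The body fields and the `/compress/question-specific/` path come from this page. The function only builds the request; pass it to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical host: the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_compress_request(
    context: str,
    query: str,
    model: str = "latte_v2",
    target_compression_ratio: Optional[float] = 0.5,
    api_key: str = "YOUR_API_KEY",
) -> urllib.request.Request:
    """Build (but do not send) a POST to the single-item endpoint."""
    body = {
        "context": context,
        "query": query,
        "compression_model_name": model,  # "latte_v1" or "latte_v2"
    }
    # Leave the field out entirely to get the model default.
    if target_compression_ratio is not None:
        body["target_compression_ratio"] = target_compression_ratio
    return urllib.request.Request(
        url=f"{BASE_URL}/compress/question-specific/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; substitute whatever your account uses.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_compress_request("Long source text ...", "What is the refund policy?")
```

Switching models is the single string change described above, e.g. `build_compress_request(ctx, q, model="latte_v1")`.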

Parameters

Parameter	`latte_v1`	`latte_v2`	Default	Purpose
`context`	✓ required	✓ required	—	Source text to compress. Empty string returns an empty result, no billing.
`query`	✓ required	✓ required	—	Question/intent that grounds relevance. Cannot be empty.
`compression_model_name`	✓ required	✓ required	—	`"latte_v1"` or `"latte_v2"`; anything else is a `422`.
`target_compression_ratio`	✓	✓ (ignored if `dynamic=true`)	model default	Fixed compression strength — see below.
`coarse`	✓	✓	`true`	Paragraph-level scoring; set `false` for finer precision at higher latency.
`heuristic_chunking`	✓	✓	`false`	Structure-aware chunking (paragraphs, code blocks, sections) before scoring.
`disable_placeholders`	✓	✓	`false`	Return kept spans without `[...]` markers between dropped regions.
`dynamic`	—	✓	`false`	Pick the ratio per input; overrides `target_compression_ratio`.
`dynamic_min_ratio`	—	✓	`1.5`	Floor on the chosen Nx ratio. Must be `≥ 1.0`.
`dynamic_max_ratio`	—	✓	`10.0`	Ceiling on the chosen Nx ratio. Must be `≥ dynamic_min_ratio`.

A `—` in a model column means the parameter is rejected with 422 Unprocessable Entity on that model; in the Default column it means the parameter has no default. The wire format is always snake_case; the TypeScript SDK accepts the camelCase forms (targetCompressionRatio, dynamicMinRatio, …).
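The rules in the table can be pre-checked on the client before a request goes out. The sketch below is a hypothetical helper (`validate_payload` is not part of any Compresr SDK) that reports the conditions this page says produce a 422. The server remains the authority, and this helper deliberately skips type checks and the target_compression_ratio bounds.

```python
from typing import Any

ALLOWED_MODELS = {"latte_v1", "latte_v2"}
# Parameters the table marks as unsupported on latte_v1 (rejected with 422 there).
V2_ONLY_PARAMS = {"dynamic", "dynamic_min_ratio", "dynamic_max_ratio"}


def validate_payload(payload: dict[str, Any]) -> list[str]:
    """Return the problems that, per the parameter table, would cause a 422."""
    errors: list[str] = []
    for field in ("context", "query", "compression_model_name"):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    if payload.get("query") == "":
        errors.append("query cannot be empty")

    model = payload.get("compression_model_name")
    if model is not None and model not in ALLOWED_MODELS:
        errors.append(f"unknown model: {model!r}")
    if model == "latte_v1":
        for name in sorted(V2_ONLY_PARAMS & set(payload)):
            errors.append(f"{name} is not accepted by latte_v1")

    # Fall back to the documented defaults when a bound is not sent.
    lo = payload.get("dynamic_min_ratio", 1.5)
    hi = payload.get("dynamic_max_ratio", 10.0)
    if lo < 1.0:
        errors.append("dynamic_min_ratio must be >= 1.0")
    if hi < lo:
        errors.append("dynamic_max_ratio must be >= dynamic_min_ratio")
    return errors
```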

Dynamic mode (`latte_v2` only)

Set dynamic=true and the model chooses a compression ratio per input, always inside [dynamic_min_ratio, dynamic_max_ratio]. Use it when input difficulty varies and one fixed ratio would be a guess; use a fixed target_compression_ratio when you need a predictable token budget per call.

Fixed vs dynamic — pick one per call

Sending both dynamic=true and target_compression_ratio is not an error: dynamic=true always wins, and target_compression_ratio is silently ignored for that request. The dynamic, dynamic_min_ratio, and dynamic_max_ratio parameters are rejected on latte_v1 with 422.
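The precedence rule fits in a few lines. `resolve_mode` is a hypothetical illustration of the documented latte_v2 behavior (dynamic=true wins, otherwise a fixed ratio, otherwise the model default), filling in the documented bound defaults of 1.5 and 10.0. It does not predict which ratio the model actually picks inside those bounds.

```python
from typing import Any


def resolve_mode(payload: dict[str, Any]) -> tuple[str, Any]:
    """Which ratio setting latte_v2 honors for this payload, per the documented precedence."""
    if payload.get("dynamic") is True:
        # dynamic=true wins; any target_compression_ratio is ignored.
        bounds = (
            payload.get("dynamic_min_ratio", 1.5),
            payload.get("dynamic_max_ratio", 10.0),
        )
        return "dynamic", bounds
    if "target_compression_ratio" in payload:
        return "fixed", payload["target_compression_ratio"]
    return "model_default", None
```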

target_compression_ratio

Controls how aggressive the compression is. The value is interpreted in one of two ways depending on its range. Both models share these semantics; every page that mentions a ratio refers back to this table.

Value	Meaning	Example
`0 < r ≤ 1`	Removal strength	`0.5` removes ~50% of tokens
`r > 1`	Nx target (max `200`)	`4` → ~¼ original
omit	Model default	—

Bounds

r = 0 and values above 200 are rejected with 422 Unprocessable Entity — the API does not silently clamp.
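Because out-of-range values are rejected rather than clamped, a caller may prefer to fail early. A minimal sketch of such a check; `check_target_ratio` is a hypothetical helper, and rejecting negative values is an assumption drawn from the table above, which only defines r > 0.

```python
def check_target_ratio(r: float) -> float:
    """Raise on values the API would answer with 422; never clamp silently."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(r, bool) or not isinstance(r, (int, float)):
        raise TypeError("target_compression_ratio must be a number")
    # Valid values fall in (0, 1] (removal strength) or (1, 200] (Nx target).
    if not 0 < r <= 200:
        raise ValueError(f"target_compression_ratio {r} is outside (0, 200]")
    return r
```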

Not a keep-fraction

0.3 does not mean "keep 30%"; it means "remove ~30%".
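To make the two interpretations concrete, the hypothetical helper below estimates the share of tokens kept for a given value. It is approximate by design, since the page only promises "~" figures, and it is not something the API exposes.

```python
def approx_kept_fraction(r: float) -> float:
    """Approximate share of context tokens kept for target_compression_ratio=r."""
    if not 0 < r <= 200:
        raise ValueError("target_compression_ratio must be in (0, 200]")
    if r <= 1:
        # Removal strength: r is roughly the share removed, so ~(1 - r) is kept.
        return 1 - r
    # Nx target: output is roughly 1/r of the original.
    return 1 / r
```

With this helper, 0.3 maps to roughly 0.7 (about 70% kept, not 30%), and 4 maps to 0.25.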

Examples
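The batch endpoint accepts up to 100 rows per call, so larger workloads need to be split. A minimal sketch, assuming each row is a dict carrying the same fields as a single request (the exact batch body schema is not shown on this page); `batched_rows` is a hypothetical helper.

```python
from typing import Iterator

MAX_BATCH_ROWS = 100  # per-call row limit for the batch endpoint


def batched_rows(rows: list[dict], size: int = MAX_BATCH_ROWS) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows, preserving order."""
    if not 1 <= size <= MAX_BATCH_ROWS:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_ROWS}")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical row shape: the same fields a single request carries.
docs = [f"Document {i} text" for i in range(250)]
rows = [
    {
        "context": doc,
        "query": "Which sections mention pricing?",
        "compression_model_name": "latte_v2",
        "target_compression_ratio": 0.5,
    }
    for doc in docs
]
batches = list(batched_rows(rows))
```

Each element of `batches` then holds at most 100 rows and can be sent as one batch request.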


For workload patterns (RAG, agent retrieval, search-result trimming) and end-to-end examples, see the query-specific compression guide.
