Models

Two compression models are available, each suited to a different use case.

espresso_v1

General-purpose compression that needs no query. It removes redundant tokens while preserving meaning, which makes it ideal for pre-compressing documents, system prompts, or any context you want to reuse across multiple queries.

Parameters

  • context — text to compress

Best For

  • System prompts
  • Document pre-processing
  • Reusable context
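
As a minimal sketch of what an espresso_v1 request might carry, the snippet below builds a JSON body containing only the context parameter. The "model" field name and the JSON body shape are assumptions made for illustration; only context and the model name espresso_v1 come from this page.

```python
import json


def build_espresso_request(context: str) -> str:
    """Serialize a hypothetical espresso_v1 request body as JSON."""
    if not context.strip():
        raise ValueError("context must contain text to compress")
    # "model" is an assumed field name; "context" is the documented parameter.
    # No query is included because espresso_v1 does not take one.
    return json.dumps({"model": "espresso_v1", "context": context})


# A system prompt is compressed once and can then be reused across queries.
system_prompt = "You are a helpful assistant. Always answer concisely and cite sources."
body = build_espresso_request(system_prompt)
print(body)
```

Because the request does not depend on a query, the compressed result can be cached and shared by every later call that uses the same prompt or document.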

latte_v1

Query-specific compression that preserves tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.

Parameters

  • context — text to compress
  • query — what to preserve

Best For

  • RAG pipelines
  • Q&A systems
  • Query-aware compression
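
In a RAG pipeline, each retrieved chunk would typically be compressed against the same user question. The sketch below prepares one latte_v1 payload per chunk. The "model" field name and the dictionary shape are assumptions for illustration; context and query are the documented parameters.

```python
from typing import Dict, List


def build_latte_requests(chunks: List[str], query: str) -> List[Dict[str, str]]:
    """Pair every retrieved chunk with the query that guides compression."""
    if not query.strip():
        raise ValueError("latte_v1 needs a query describing what to preserve")
    # "model" is an assumed field name; "context" and "query" are documented.
    return [
        {"model": "latte_v1", "context": chunk, "query": query}
        for chunk in chunks
    ]


question = "What is the refund window for annual plans?"
retrieved = [
    "Annual plans can be refunded within 30 days of purchase.",
    "Monthly plans renew automatically on the billing date.",
]
payloads = build_latte_requests(retrieved, question)
for payload in payloads:
    print(payload)
```

Unlike the general-purpose case, these payloads are tied to one question, so a compressed chunk should not be reused for a different query.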

Compression Ratio

Set target_compression_ratio to control how aggressively the text is compressed. The value is the fraction of the input to remove, so higher values compress more.

Value   Result
0.2     Light, keeps 80%
0.5     Balanced, keeps 50% (default)
0.9     Aggressive, keeps 10%
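
The relationship between the ratio and the amount of text kept is simple arithmetic: the kept share is 1 minus target_compression_ratio. The helper below is a hypothetical illustration of that calculation, not part of the service. It assumes the ratio applies to token counts and must lie between 0 and 1, and since the ratio is a target, actual output sizes may differ.

```python
def expected_kept_tokens(original_tokens: int,
                         target_compression_ratio: float = 0.5) -> int:
    """Estimate how many tokens remain after compression."""
    # Assumed valid range; the documented examples are 0.2, 0.5 and 0.9.
    if not 0.0 <= target_compression_ratio <= 1.0:
        raise ValueError("target_compression_ratio must be between 0 and 1")
    kept_fraction = 1.0 - target_compression_ratio
    return round(original_tokens * kept_fraction)


print(expected_kept_tokens(1000, 0.2))  # 800 (light)
print(expected_kept_tokens(1000))       # 500 (balanced, default)
print(expected_kept_tokens(1000, 0.9))  # 100 (aggressive)
```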