Models
Compression models for different use cases.
espresso_v1
General-purpose compression — no query needed. Removes redundant tokens while preserving meaning. Ideal for pre-compressing documents, system prompts, or any context you want to reuse across multiple queries.
Parameters
context — text to compress
Best For
- System prompts
- Document pre-processing
- Reusable context
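A minimal sketch of what an espresso_v1 request body might look like. Only the `context` parameter and the `espresso_v1` model name come from this page; the payload layout and field name `model` are assumptions for illustration.

```python
import json

def build_espresso_request(context: str) -> str:
    """Build a hypothetical request body for general-purpose compression."""
    payload = {
        "model": "espresso_v1",  # general-purpose model; note: no query field
        "context": context,      # the text to compress
    }
    return json.dumps(payload)

# Example: pre-compress a system prompt once, then reuse it across queries.
request_body = build_espresso_request("You are a helpful assistant. ...")
print(request_body)
```

Because espresso_v1 is query-agnostic, the compressed output can be cached and reused for every subsequent query against the same context.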
latte_v1
Query-specific compression that preserves tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.
Parameters
context — text to compress
query — what to preserve
Best For
- RAG pipelines
- Q&A systems
- Query-aware compression
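A companion sketch for latte_v1, which takes both documented parameters. As above, only `context`, `query`, and the model name come from this page; the payload shape is an assumption.

```python
import json

def build_latte_request(context: str, query: str) -> str:
    """Build a hypothetical request body for query-specific compression."""
    payload = {
        "model": "latte_v1",
        "context": context,  # text to compress
        "query": query,      # tokens relevant to this query are preserved
    }
    return json.dumps(payload)

# Example RAG usage: compress a retrieved passage around the user's question.
request_body = build_latte_request(
    "Full retrieved passage about the annual fiscal report ...",
    "What was the net revenue?",
)
print(request_body)
```

In a RAG pipeline this would typically run between retrieval and generation, so the LLM receives only the answer-relevant portion of each retrieved chunk.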
Compression Ratio
Control how aggressively to compress via target_compression_ratio.
| Value | Result |
|---|---|
| 0.2 | Light — keeps 80% |
| 0.5 | Balanced — keeps 50% (default) |
| 0.9 | Aggressive — keeps 10% |
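Per the table, `target_compression_ratio` is the fraction of tokens removed, so the kept fraction is its complement. A small helper makes the mapping explicit:

```python
def tokens_kept(target_compression_ratio: float) -> float:
    # The ratio is the fraction removed (per the table above),
    # so the fraction kept is 1 minus the ratio.
    return 1.0 - target_compression_ratio

for ratio in (0.2, 0.5, 0.9):
    print(f"ratio={ratio} -> keeps {tokens_kept(ratio):.0%}")
```

Running this reproduces the table: 0.2 keeps 80%, 0.5 keeps 50%, and 0.9 keeps 10%.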