realvco Docs

How Tokens Are Priced

AI providers bill a conversation by the number of tokens it uses, not by the minute. This page lays out the mechanics in plain language.


What Is a Token?

A token is the smallest unit the model processes. Not characters, not words — the model segments text into its own pieces.

Rough conversion:

Language | Conversion
-------- | ----------
English  | 1 token ≈ 0.75 word (e.g., “apple” = 1 token, “refrigerator” = 2 tokens)
Chinese  | 1 character ≈ 1.5–2 tokens (e.g., “你好” = 3–4 tokens)
Code     | 1 token ≈ 0.25–0.5 line (varies by language)

Example

In English:

“Hello, the weather is nice today, let’s go to the park!”

That’s 11 words, which the model typically splits into about 13 tokens.

In Chinese:

“你好,今天天氣真好,我們去公園散步吧!”

That’s 16 characters (not counting punctuation), which the model splits into about 28 tokens.

For the same content, Chinese usually consumes more tokens than English. Chatting in Chinese often costs 30–50% more than chatting in English.
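The rough ratios above can be turned into a quick estimator. This is only a sketch of the heuristic, not a real tokenizer: the function names are made up for illustration, words are counted by whitespace, and punctuation is ignored for Chinese even though it also consumes tokens. Actual counts depend on each model's tokenizer.

```python
import math

# Rough ratios from the conversion table on this page (heuristics, not exact).
ENGLISH_WORDS_PER_TOKEN = 0.75          # 1 token ≈ 0.75 word
CHINESE_TOKENS_PER_CHAR = (1.5, 2.0)    # 1 character ≈ 1.5–2 tokens


def estimate_english_tokens(text: str) -> int:
    """Estimate tokens for English text from its whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / ENGLISH_WORDS_PER_TOKEN)


def estimate_chinese_tokens(text: str) -> tuple[int, int]:
    """Return a (low, high) token estimate based on the number of CJK characters.

    Simplification: only characters in the basic CJK block are counted;
    punctuation is skipped.
    """
    chars = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    low, high = CHINESE_TOKENS_PER_CHAR
    return math.ceil(chars * low), math.ceil(chars * high)


print(estimate_english_tokens("Hello, the weather is nice today, let's go to the park!"))
print(estimate_chinese_tokens("你好,今天天氣真好,我們去公園散步吧!"))
```

For the two sample sentences this prints 15 and (24, 32). The English estimate is a little above the "about 13" quoted above, and the Chinese range brackets the "about 28", which is the kind of accuracy to expect from a rule of thumb.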


Cost per Exchange

Formula:

Exchange cost =
  Input tokens × input price
+ Output tokens × output price
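
As a minimal sketch, the formula can be written as a small function. The function and parameter names are illustrative, and it assumes prices are quoted in USD per 1 million tokens, which is how vendors typically list them.

```python
def exchange_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1m: float, output_price_per_1m: float) -> float:
    """Cost in USD of one exchange, with prices given per 1 million tokens."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


# Hypothetical exchange: 480 input tokens and 250 output tokens
# at $2.50 (input) and $10.00 (output) per 1M tokens.
print(f"${exchange_cost(480, 250, 2.50, 10.00):.4f}")  # $0.0037
```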

Key concept: input includes the conversation history.

Message 1 from you: 100 tokens
AI reply 1: 300 tokens
────────────────────────────────
Message 2 from you: 80 tokens + prior 400 = 480 tokens input
AI reply 2: 250 tokens
────────────────────────────────
Message 3 from you: 60 tokens + prior 730 = 790 tokens input
AI reply 3: 200 tokens

The longer the chat, the more each request costs, because every turn carries history. This is why context compression exists.
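
The accumulation in the three-turn example can be reproduced with a short loop. This sketch assumes the whole history is resent on every turn with no compression or caching; the function name is illustrative.

```python
def input_tokens_per_turn(turns: list[tuple[int, int]]) -> list[int]:
    """Given (user_message_tokens, reply_tokens) for each turn, return the
    input tokens billed on each turn when the full history is resent."""
    history = 0
    billed = []
    for user_tokens, reply_tokens in turns:
        billed.append(history + user_tokens)      # prior history + new message
        history += user_tokens + reply_tokens     # both sides join the history
    return billed


turns = [(100, 300), (80, 250), (60, 200)]
print(input_tokens_per_turn(turns))        # [100, 480, 790]
print(sum(input_tokens_per_turn(turns)))   # 1370
```

The three user messages total only 240 tokens, yet 1,370 input tokens are billed, which is the gap that context compression tries to close.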


Model Pricing (2026-04)

OpenAI

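Model       | Input / 1M tokens | Output / 1M tokens | Best For
----------- | ----------------- | ------------------ | --------
GPT-4o      | $2.50             | $10.00             | Complex reasoning, creative work
GPT-4o-mini | $0.15             | $0.60              | Daily chat, support
o1          | $15.00            | $60.00             | Deep reasoning (slow + expensive)
o1-mini     | $3.00             | $12.00             | Mid-tier reasoning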

Anthropic

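Model             | Input / 1M tokens | Output / 1M tokens | Best For
----------------- | ----------------- | ------------------ | --------
Claude 3.5 Sonnet | $3.00             | $15.00             | Code, long-context
Claude 3.5 Haiku  | $0.80             | $4.00              | Fast responses
Claude 3 Opus     | $15.00            | $75.00             | Flagship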

Google

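Model            | Input / 1M tokens | Output / 1M tokens | Best For
---------------- | ----------------- | ------------------ | --------
Gemini 1.5 Pro   | $1.25             | $5.00              | Long context (2M)
Gemini 1.5 Flash | $0.075            | $0.30              | Cheap, high volume
Gemini 2.0 Flash | $0.10             | $0.40              | Newer version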

Others

Model                 | Note
--------------------- | ----
Groq (Llama, Mixtral) | Fast and cheap
DeepSeek              | Very low price, strong on Chinese
Azure OpenAI          | Same models as OpenAI, slightly different pricing

Prices change often — providers are in a price war. The table above reflects 2026-04; confirm current pricing on each vendor’s site.


Example: One Day of Support Conversations

Ada handles 50 customer questions per day. Each conversation averages 5 turns, with ~100 tokens per turn on either side.

Total tokens:

  • Message portion: 50 × 5 × 2 × 100 = 50,000 tokens (raw messages)
  • With history: because each turn resends the prior turns, the billed total is roughly 150,000 input tokens + 50,000 output tokens per day

By model:

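Model         | Cost / Day (USD) | Cost / Month (30 days, USD)
------------- | ---------------- | ---------------------------
GPT-4o-mini   | $0.05            | $1.50
GPT-4o        | $0.88            | $26.40
Claude Sonnet | $1.20            | $36.00
Gemini Flash  | $0.03            | $0.90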

Takeaway: for the same workload, the most expensive model here costs about 40× as much as the cheapest.
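
The daily figures in this example can be re-derived from the estimate of 150,000 input and 50,000 output tokens per day. The sketch below makes two assumptions: that "Claude Sonnet" and "Gemini Flash" refer to the Claude 3.5 Sonnet and Gemini 1.5 Flash rows of the pricing tables, and that those listed prices still apply. The names `PRICES` and `daily_cost` are illustrative.

```python
# (input, output) prices in USD per 1M tokens, copied from the pricing tables.
# Assumption: "Claude Sonnet" = Claude 3.5 Sonnet, "Gemini Flash" = Gemini 1.5 Flash.
PRICES = {
    "GPT-4o-mini": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
}

DAILY_INPUT_TOKENS = 150_000   # estimate from the example, history included
DAILY_OUTPUT_TOKENS = 50_000   # estimate from the example


def daily_cost(model: str) -> float:
    """Daily cost in USD for the example workload on the given model."""
    input_price, output_price = PRICES[model]
    return (DAILY_INPUT_TOKENS * input_price
            + DAILY_OUTPUT_TOKENS * output_price) / 1_000_000


for model in PRICES:
    day = daily_cost(model)
    print(f"{model:<18} ${day:.2f}/day  ${day * 30:.2f}/month")
```

Monthly values printed here can differ by a few cents from the table, because the table multiplies the already-rounded daily figure by 30.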


Checking Your Own Consumption

In the Admin Panel:

Quickest “gut check”:

  1. Ask the AI any question
  2. Look for the token count; some chat UIs show “this exchange: X tokens” at the bottom
  3. Multiply by the model’s per-token price (the per-1M price ÷ 1,000,000) → cost of this exchange

Estimating Your Bill

Personal use (~20 exchanges/day):

  • GPT-4o-mini → $5–10 / month
  • Claude Haiku → $15–30 / month
  • GPT-4o → $30–60 / month

Small-team support (100 exchanges/day):

  • GPT-4o-mini → $30–60 / month
  • Claude Sonnet → $200–400 / month

Mid-size e-commerce support (500 exchanges/day):

  • GPT-4o-mini → $150–300 / month
  • GPT-4o → $2,000–4,000 / month

Rule of thumb: picking the right model can cut cost by 10–20×. If your exchanges are as simple as saying “hello” and getting “hi” back within seconds, you probably don’t need o1 at $15 per 1M input tokens.