Tokens are the basic unit of text processed by large language models such as OpenAI's GPT series and Anthropic's Claude. One token is roughly 0.75 English words, or about 0.55 Chinese characters (1 character ≈ 1.8 tokens). ng.cc's online token counter is based on OpenAI's cl100k_base encoding, providing accurate token estimates for GPT-4, GPT-3.5, and GPT-4 Turbo. It shows real‑time character, word, and token counts, plus automatic API cost calculation. 100% client‑side – your prompts never leave your browser.
⚡ Real‑time stats
Count as you type, with zero delay. Supports mixed English/Chinese text and automatically identifies Chinese characters, English words, digits, and punctuation.
💰 Cost estimation
Based on official OpenAI pricing, computes input/output costs. Supports GPT-4 Turbo and GPT-3.5 Turbo.
🔢 Three‑metric display
Tokens, characters, and words side by side – fully evaluate text length.
🔒 Privacy first
100% local computation, zero network requests. Your API prompts, business copy, paper drafts stay in your browser.
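The cost estimation works roughly like the sketch below. The prices shown are illustrative placeholders from a past pricing snapshot, not current rates – always check OpenAI's official pricing page:

```python
# Sketch of the input/output cost calculation.
# Prices are illustrative placeholders (USD per 1K tokens), not current rates.
PRICE_PER_1K = {
    "gpt-4-turbo": (0.01, 0.03),      # (input, output)
    "gpt-3.5-turbo": (0.0005, 0.0015),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for one call."""
    p_in, p_out = PRICE_PER_1K[model]
    return input_tokens / 1000 * p_in + output_tokens / 1000 * p_out
```

For example, a call with 1,000 input and 1,000 output tokens on the first model above would cost $0.01 + $0.03 = $0.04 at those placeholder rates.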
🎯 Token Counting Use Cases
🤖 Prompt optimization: Ensure prompts fit model context windows (GPT-4 Turbo 128K).
💰 Cost control: Estimate API costs before calling, avoid surprises.
📚 Long document processing: Calculate total tokens for novels like The Three‑Body Problem.
🎓 Research: Statistically analyze token conversion ratios across languages.
🧪 Model comparison: Compare token counts for the same text under different encodings.
💡 Token Estimation Details
🔹 Chinese characters
1 Chinese char ≈ 1.8 tokens. In cl100k_base, common Chinese characters encode as a single token, while rarer ones are split into 2‑3 tokens. This tool uses a weighted average, which is more accurate than a flat factor of 2.
🔹 English characters
1 English letter ≈ 0.25 tokens; 1 English word ≈ 1.3 tokens. Common words like "the" and "is" occupy a single token; longer words may be split into multiple tokens.
🔹 Punctuation/Spaces
About 0.5 tokens per character. Punctuation is often merged into an adjacent word's token; isolated punctuation counts as 1 token.
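The per‑category ratios above can be combined into a rough estimator. A minimal sketch, assuming simple regex classification – the function name and patterns are illustrative, not this tool's actual implementation:

```python
import re

# Rough token estimator from the ratios above: CJK ≈ 1.8 tokens/char,
# English ≈ 1.3 tokens/word, punctuation ≈ 0.5 tokens/char.
# Illustrative only; the real cl100k_base tokenizer is exact BPE.
CJK = re.compile(r"[\u4e00-\u9fff]")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
PUNCT = re.compile(r"[^\w\s]")

def estimate_tokens(text: str) -> int:
    cjk = len(CJK.findall(text))
    words = len(WORD.findall(text))
    punct = len(PUNCT.findall(text))
    return round(cjk * 1.8 + words * 1.3 + punct * 0.5)
```

For example, a two‑character Chinese string estimates to round(2 × 1.8) = 4 tokens, while "the cat" estimates to round(2 × 1.3) = 3.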
📊 Model Context Windows
GPT-3.5 Turbo (16K): 16,385 tokens
GPT-4 (8K): 8,192 tokens
GPT-4 (32K): 32,768 tokens
GPT-4 Turbo (128K): 128,000 tokens
Claude 2.1: 200,000 tokens
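These limits can be checked programmatically before a call. A small sketch using the sizes listed above – the dictionary keys and the output‑reservation default are assumptions for illustration, not official model identifiers:

```python
# Context-window sizes from the list above.
CONTEXT_WINDOW = {
    "gpt-3.5-turbo-16k": 16385,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
    "gpt-4-turbo": 128000,
    "claude-2.1": 200000,
}

def fits_context(prompt_tokens: int, model: str, reserve_output: int = 1024) -> bool:
    """True if the prompt plus reserved output tokens fits the model's window."""
    return prompt_tokens + reserve_output <= CONTEXT_WINDOW[model]
```

Reserving room for the model's reply matters: a 8,000‑token prompt technically fits GPT‑4's 8,192‑token window, but leaves almost no space for output.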
❓ Frequently Asked Questions
❓ Q1: Is this tool 100% accurate compared to OpenAI's official counter?
This tool uses an estimation algorithm; its results are within ±5% of OpenAI's official tiktoken library. The official library performs exact BPE (Byte‑Pair Encoding) tokenization, whereas this is a lightweight frontend implementation that doesn't require loading large vocabulary files. For most scenarios the estimate is accurate enough; for 100% precision, use tiktoken or the official OpenAI SDK.
❓ Q2: Why 1.8 tokens per Chinese character instead of 2?
Measurements show that in cl100k_base: common characters (like “的”, “是”) use 1 token, while rare characters use 2‑3 tokens. This tool applies a weighted average, which is closer to the true distribution than a fixed factor of 2. You can test with pure Chinese paragraphs – the error is usually within an acceptable range.
❓ Q3: How can I reduce token consumption?
1. Trim your prompt: remove redundant modifiers.
2. Prefer English: for the same information, English usually costs fewer tokens than Chinese.
3. Shorten instructions: replace "Please explain in detail" with "Explain".
4. Reuse conversation context via the assistant role.
Our "Concise" polishing mode can help compress text.
❓ Q4: Does it support token counting for other models?
This tool is based on the cl100k_base encoding and is optimized for OpenAI's GPT‑4/GPT‑3.5 series. Anthropic's Claude uses a different tokenizer and is not compatible. Llama 2, ERNIE, and other models also define tokens differently; do not apply this estimate to them directly.
❓ Q5: Is my text sent to any server?
Absolutely not. This is a static HTML page – all token counting and cost estimation run inside your browser's JavaScript engine. You can even go offline and it still works. Your prompts, API keys, and business data never leave your device.
❓ Q6: What is the maximum text length supported?
There is no hard limit, but browsers have performance limits for very large textareas. We recommend entering no more than 10,000 characters at a time; for longer documents, split them and count in batches. File upload is planned for a future version.
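Batch counting, as suggested above, can be sketched in a few lines. Note that splitting at arbitrary character boundaries may cut a token in two at each seam, so the summed counts are approximate:

```python
def iter_chunks(text: str, size: int = 10_000):
    """Yield slices of at most `size` characters for batch-by-batch counting."""
    for i in range(0, len(text), size):
        yield text[i:i + size]
```

A 25,000‑character document would yield three batches of 10,000, 10,000, and 5,000 characters.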
🔗 Recommended Tools
This token counter is part of the ng.cc AI toolkit. You might also like: