How fast are Cerebras models?

Last updated: June 30, 2025

Here are the approximate output speeds, in tokens per second, of each of our available models:

  • llama3.1-8b: ~2200 tok/sec

  • llama-3.3-70b: ~2100 tok/sec

  • llama-4-scout-17b-16e-instruct: ~2600 tok/sec

  • qwen-3-32b: ~2100 tok/sec

  • deepseek-r1-distill-llama-70b: ~1700 tok/sec
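
To turn these figures into a rough sense of response time, you can divide the number of output tokens by the model's speed. The sketch below does that arithmetic in Python. The speeds are copied from the list above; the function name and the 1,000-token example are illustrative assumptions, not part of any Cerebras SDK. The estimate covers token generation only and ignores network latency, queueing, and the time before the first token, so actual response times may be longer.

```python
# Approximate output speeds in tokens per second, copied from the list above.
OUTPUT_SPEED_TOK_PER_SEC = {
    "llama3.1-8b": 2200,
    "llama-3.3-70b": 2100,
    "llama-4-scout-17b-16e-instruct": 2600,
    "qwen-3-32b": 2100,
    "deepseek-r1-distill-llama-70b": 1700,
}


def estimate_generation_seconds(model: str, output_tokens: int) -> float:
    """Rough generation time: output tokens divided by tokens per second.

    Hypothetical helper for illustration only; it does not account for
    latency before the first token or for network overhead.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / OUTPUT_SPEED_TOK_PER_SEC[model]


if __name__ == "__main__":
    # Print the estimated time to generate 1,000 output tokens per model.
    for model in OUTPUT_SPEED_TOK_PER_SEC:
        seconds = estimate_generation_seconds(model, 1000)
        print(f"{model}: ~{seconds:.2f} s per 1,000 output tokens")
```

At these speeds, a 1,000-token response works out to roughly half a second of generation time on every listed model.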