Cerebras Code FAQ
What is Cerebras Code?
Cerebras Code is a set of subscription plans that give developers API access to high-speed code generation, powered by ZAI-GLM 4.7. It runs on Cerebras hardware at up to 1,000 tokens/sec.
What plans are available?
Cerebras Code Pro ($50/month**)
| Requests Per Minute | Tokens Per Minute | Tokens Per Day |
| 50 | 1,000,000 | 24,000,000 |
Cerebras Code Max ($200/month**)
| Requests Per Minute | Tokens Per Minute | Tokens Per Day |
| 120 | 1,500,000 | 120,000,000 |
Note that the requests-per-second limit is 10% of the requests-per-minute limit.
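As a worked example, and assuming the 10% rule applies directly to the per-minute figures listed for each plan, the implied per-second limits can be computed as in the sketch below. The helper name and the plan dictionary are illustrative only and are not part of any Cerebras API.

```python
# Illustrative only: derive the per-second request limit from the
# per-minute limit using the 10% rule stated in this FAQ.
PLAN_RPM = {"pro": 50, "max": 120}  # requests per minute, taken from the plan tables


def requests_per_second(rpm: int) -> float:
    """Return 10% of the requests-per-minute limit."""
    return rpm / 10


for plan, rpm in PLAN_RPM.items():
    print(plan, requests_per_second(rpm))
# prints:
# pro 5.0
# max 12.0
```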
Both plans offer tremendous value over buying a la carte.
**All rate limits listed are subject to change.
What happens if I hit my daily limit?
Once you reach your quota, you’ll get a clear error. You can either wait for your daily limit to reset or upgrade to a higher plan (e.g. Pro → Max) for more throughput.
Why am I seeing variable speeds, sometimes much lower than advertised?
We’re currently operating at near 100% utilization due to high demand. During peak times, your request may be briefly queued. Queue times are typically shorter during off-peak times (outside US working hours).
Our hardware generates output tokens at approximately 1,000 tokens/sec. However, some third-party integrations and applications include server latency and queue time in their reported speed calculation. As a result, the token generation speed reported by a third party may be lower than the actual token generation speed of our hardware.
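The difference between the two measurements can be illustrated with a small sketch. The timings below are made-up numbers chosen for easy arithmetic, not measurements, and the function name is hypothetical. The example assumes a tool that divides output tokens by the full wall-clock time of the request.

```python
def tokens_per_second(output_tokens: int, seconds: float) -> float:
    """Throughput over whatever time window the caller chooses to measure."""
    return output_tokens / seconds


# Hypothetical timings for a single request (illustrative, not measured).
output_tokens = 2000
queue_s = 1.5        # time spent waiting in the queue
network_s = 0.5      # network and server latency
generation_s = 2.0   # time the hardware spends generating tokens

# Speed of generation alone.
hardware_speed = tokens_per_second(output_tokens, generation_s)

# Speed as a tool would report it if it timed the whole request.
end_to_end_speed = tokens_per_second(
    output_tokens, queue_s + network_s + generation_s
)

print(hardware_speed)    # 1000.0
print(end_to_end_speed)  # 500.0
```

In this example the same request appears half as fast when queue time and latency are counted, even though generation itself ran at the same rate.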
We are actively working to reduce queue times by bringing up more systems to address our demand.
I’m getting 429 errors even though I haven’t hit my limits—what’s going on?
Cerebras enforces requests per second (RPS) limits. Because our inference is much faster than other providers, some tools (like RooCode) may send rapid bursts that exceed those limits. To avoid errors, configure retry delays between requests to smooth out spikes.
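One common way to add retry delays is exponential backoff with jitter. The following is a minimal sketch, not an official client: `send_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function your tool uses to make the HTTP call. It is assumed to return a `(status_code, body)` tuple.

```python
import random
import time
from typing import Callable, Tuple


def send_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send`, retrying on HTTP 429 with exponential backoff and jitter.

    `send` is a hypothetical callable that performs one request and
    returns (status_code, body). The last response is returned even if
    it is still a 429 after `max_retries` retries.
    """
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Delay doubles on each retry and is capped at max_delay.
        delay = min(max_delay, base_delay * (2 ** attempt))
        # Random jitter keeps parallel requests from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    return status, body
```

The `sleep` parameter exists only so the delay can be replaced in tests. With the default values, the waits start at roughly half a second and grow up to the cap, which spreads a burst of requests over several seconds instead of one.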
Can I cancel or change my plan?
Yes—plans are monthly and can be managed anytime via your dashboard.
Scheduled downgrades and cancellations take place at the end of your billing period. You'll still have access to your current rate limits until the end of the billing period. There are no prorated refunds.
Upgrades will immediately bill the prorated difference (between the old and new plan) for the current billing cycle. Your rate limits will also be immediately upgraded.
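As a rough illustration of a prorated upgrade charge, the sketch below assumes proration is linear in the number of days remaining in the billing cycle. That method is an assumption made for this example. The actual calculation is not specified here and may differ, and the function name is hypothetical.

```python
def prorated_upgrade_charge(
    old_price: float, new_price: float, days_remaining: int, days_in_cycle: int
) -> float:
    """Price difference scaled by the fraction of the cycle remaining.

    Assumes simple day-based proration, which may not match actual billing.
    """
    return (new_price - old_price) * days_remaining / days_in_cycle


# Upgrading from Pro ($50) to Max ($200) halfway through a 30-day cycle.
print(prorated_upgrade_charge(50, 200, days_remaining=15, days_in_cycle=30))  # 75.0
```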
What happens when I cancel?
All plans are prepaid. When you cancel, your access continues through the end of your billing cycle. There are no partial refunds.
For questions, please email support@cerebras.net.
What are messages? How do they differ from rate limits?
Cerebras initially described plan usage in terms of messages, to align with how other coding platforms describe their services. Because the number of messages varies with your choice of IDE or tool, we are providing explicit rate limits going forward. This transparency will help developers better plan their usage.
