| Model | Tokens/Minute (TPM) | Requests/Minute (RPM) | Latency | Throughput | Error Handling |
|---|---|---|---|---|---|
| gpt-4o | Up to 450K TPM per region | Varies by deployment | Low (real-time) | High (streaming) | Handles large prompts; monitor for 429/500 responses; implement retries with backoff |
| gpt-4 | Varies by deployment | Varies by deployment | Moderate (~1.3s avg) | Moderate | Monitor 429; limit prompt size; retries |
| gpt-4.1 | Up to 30K TPM (enforced) | Varies by deployment | Moderate | Moderate | Known 500 errors in some regions; monitor 429/500 |
| gpt-4.1-mini | Not publicly documented | Not publicly documented | Likely low | Likely high | General best practices apply |
| gpt-4-32k | Varies by deployment | Varies by deployment | Higher (context size) | Lower | Monitor 429; 32K max prompt |
| gpt-35-turbo-16k | Varies by deployment | Varies by deployment | Low (~900ms avg) | High | Monitor 429; 16K max prompt |
| gpt-35-turbo | Varies by deployment | Varies by deployment | Low (~900ms avg) | High | Monitor 429; 4K max prompt |
| gpt-35-turbo-instruct | Varies by deployment | Varies by deployment | Low | High | Monitor 429; 4K max prompt |
| gpt-4.5-preview | Not publicly documented | Not publicly documented | Experimental | Experimental | Pre-release; expect bugs; robust error handling |
| gpt-4.1-nano | Not publicly documented | Not publicly documented | Likely very low | Likely very high | General best practices apply |
| gpt-image-1 | Not publicly documented | Not publicly documented | Moderate | Moderate | Monitor image-specific errors |
| gpt-4o-mini / tts / audio | Not publicly documented | Not publicly documented | Very low (real-time) | High | Monitor audio errors; use proper input format |
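
The "retries/backoff" advice in the Error Handling column can be sketched as follows. This is a minimal illustration and not any SDK's API: `ApiError`, `call_with_backoff`, and the `send` callable are hypothetical names, and the sketch assumes the client surfaces the HTTP status code (and, optionally, a retry hint in seconds) on the exception.

```python
import random
import time

RETRYABLE_STATUS = {429, 500}  # the status codes the table says to monitor


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, if the service sent a hint


def call_with_backoff(send, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call send() and retry on 429/500 with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except ApiError as err:
            # Give up on non-retryable codes or once the attempts are used up.
            if err.status not in RETRYABLE_STATUS or attempt == max_attempts:
                raise
            if err.retry_after is not None:
                delay = err.retry_after  # prefer the server's hint
            else:
                delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; in production the default `time.sleep` is used.
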

| Category | GPT-4o | GPT-4.1 (Turbo) | Winner |
|---|---|---|---|
| Reasoning | Equal or slightly better | Strong performance | Tie |
| Coding | Better real-time | Better in benchmarks | GPT-4.1 |
| Math | Better interpretive | Better symbolic precision | Tie / GPT-4.1 |
| Instruction Following | More expressive | More formal | GPT-4o |
| Multilingual | Better tokenization | Less efficient | GPT-4o |
| Image Understanding | Native support | Not supported | GPT-4o |
| Speech/TTS | Built-in STT/TTS | Not supported | GPT-4o |
| Expressiveness | Dynamic & expressive | Flat tone | GPT-4o |
| Factual Accuracy | Similar cutoff | Similar cutoff | Tie |
| Steerability | Strong tone/style ctrl | Text only | GPT-4o |
| Token Efficiency | Better compression | Slightly worse | GPT-4o |

Summary:

| Model | Avg Latency (Time to First Token) | Notes |
|---|---|---|
| GPT-4o | ~5 seconds (Azure) | Optimized for low latency + multimodal tasks |
| GPT-4.1 | ~45 seconds to generate 1000–1500 tokens | Higher latency, especially for long completions |

| Model | TPM | RPM | Notes |
|---|---|---|---|
| GPT-4o | 150,000 | 900 | Higher quotas available via enterprise |
| GPT-4.1 | 3,000/PTU | Varies | Dependent on Provisioned Throughput |
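
To see which of the two quotas binds for a given workload, a rough estimate can be computed from the GPT-4o row above. This is a sketch only: `max_requests_per_minute` is a hypothetical helper, and `tokens_per_request` is assumed to be the average number of tokens (prompt plus completion) that one request counts against the TPM quota.

```python
def max_requests_per_minute(tpm_limit, rpm_limit, tokens_per_request):
    """Requests per minute that fit under both the TPM and the RPM limit."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    by_tokens = tpm_limit // tokens_per_request  # ceiling imposed by TPM
    return min(rpm_limit, by_tokens)


# GPT-4o row: 150,000 TPM and 900 RPM
print(max_requests_per_minute(150_000, 900, 100))    # 900 (RPM-bound)
print(max_requests_per_minute(150_000, 900, 2_000))  # 75 (TPM-bound)
```

Small requests hit the RPM ceiling first, while large prompts exhaust the token budget long before 900 requests are sent.
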

Prompt: Calculate ΔH (kJ/mol NaNO₃) for a calorimeter (heat capacity 451 J/°C) in which 0.0300 mol NaOH reacts with 1000 mL of 0.0300 M HNO₃ and the temperature rises from 23.000 °C to 23.639 °C. Assume a specific heat of 4.18 J/(g·°C) and a density of 1.00 g/mL.
GPT-4.1 Output:
GPT-4o Output:
Correct Calculation:
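
The arithmetic can be checked with a short script built only from the numbers in the prompt. It is a sketch under two stated assumptions: the solution mass is taken as 1000 mL × 1.00 g/mL (the mass of the added NaOH is neglected), and the heat absorbed by the solution and the calorimeter equals the heat released by the reaction, so ΔH is negative.

```python
# Values from the prompt
calorimeter_constant = 451.0          # J/°C
moles_naoh = 0.0300                   # mol
volume_ml = 1000.0                    # mL of HNO3 solution
molarity_hno3 = 0.0300                # mol/L
t_initial, t_final = 23.000, 23.639   # °C
specific_heat = 4.18                  # J/(g·°C)
density = 1.00                        # g/mL

delta_t = t_final - t_initial                          # 0.639 °C
mass_solution = volume_ml * density                    # 1000 g (NaOH mass neglected)
q_solution = mass_solution * specific_heat * delta_t   # about 2671 J
q_calorimeter = calorimeter_constant * delta_t         # about 288 J
q_total = q_solution + q_calorimeter                   # about 2959 J

# NaOH + HNO3 -> NaNO3 + H2O is 1:1, so the smaller amount limits the product.
moles_hno3 = molarity_hno3 * volume_ml / 1000.0        # 0.0300 mol
moles_nano3 = min(moles_naoh, moles_hno3)

# Heat gained by the surroundings is heat lost by the reaction.
delta_h_kj_per_mol = -q_total / moles_nano3 / 1000.0
print(f"{delta_h_kj_per_mol:.1f} kJ/mol")              # -98.6 kJ/mol
```

Both heat terms matter here: leaving out the calorimeter's 451 J/°C would give roughly −89.0 kJ/mol instead of −98.6 kJ/mol.
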