====== MAIT-510 – Learn Azure OpenAI: GPT ======

===== Model Overview and Comparison =====

^ Model ^ Tokens/Minute (TPM) ^ Requests/Minute (RPM) ^ Latency ^ Throughput ^ Error Handling ^
| **gpt-4o** | Up to 450K per region | Varies by deployment | Low (real-time) | High (streaming) | Handles large prompts; monitor for 429/500; implement retries/backoff |
| **gpt-4** | Varies by deployment | Varies by deployment | Moderate (~1.3s avg) | Moderate | Monitor 429; limit prompt size; use retries |
| **gpt-4.1** | Up to 30K TPM (enforced) | Varies by deployment | Moderate | Moderate | Known 500 errors in some regions; monitor 429/500 |
| **gpt-4.1-mini** | Not publicly documented | Not publicly documented | Likely low | Likely high | General best practices apply |
| **gpt-4-32k** | Varies by deployment | Varies by deployment | Higher (larger context) | Lower | Monitor 429; 32K max prompt |
| **gpt-35-turbo-16k** | Varies by deployment | Varies by deployment | Low (~900ms avg) | High | Monitor 429; 16K max prompt |
| **gpt-35-turbo** | Varies by deployment | Varies by deployment | Low (~900ms avg) | High | Monitor 429; 4K max prompt |
| **gpt-35-turbo-instruct** | Varies by deployment | Varies by deployment | Low | High | Monitor 429; 4K max prompt |
| **gpt-4.5-preview** | Not publicly documented | Not publicly documented | Experimental | Experimental | Pre-release; expect bugs; use robust error handling |
| **gpt-4.1-nano** | Not publicly documented | Not publicly documented | Likely very low | Likely very high | General best practices apply |
| **gpt-image-1** | Not publicly documented | Not publicly documented | Moderate | Moderate | Monitor image-specific errors |
| **gpt-4o-mini / tts / audio** | Not publicly documented | Not publicly documented | Very low (real-time) | High | Monitor audio errors; use the proper input format |

===== GPT-4o vs GPT-4.1 (Turbo) Comparison =====

^ Category ^ GPT-4o ^ GPT-4.1 (Turbo) ^ Winner ^
| Reasoning | Equal or slightly better | Strong performance | Tie |
| Coding | Better in real-time use | Better in benchmarks | GPT-4.1 |
| Math | Better interpretive reasoning | Better symbolic precision | Tie / GPT-4.1 |
| Instruction Following | More expressive | More formal | GPT-4o |
| Multilingual | Better tokenization | Less efficient | GPT-4o |
| Image Understanding | Native support | Not supported | GPT-4o |
| Speech/TTS | Built-in STT/TTS | Not supported | GPT-4o |
| Expressiveness | Dynamic and expressive | Flat tone | GPT-4o |
| Factual Accuracy | Similar knowledge cutoff | Similar knowledge cutoff | Tie |
| Steerability | Strong tone/style control | Text only | GPT-4o |
| Token Efficiency | Better compression | Slightly worse | GPT-4o |

**Summary:**
* **GPT-4.1**: best for symbolic reasoning, coding, and structured QA.
* **GPT-4o**: best for multimodal work, expressiveness, token efficiency, and speech/image tasks.

===== Latency Comparison =====

^ Model ^ Avg Latency (Time to First Token) ^ Notes ^
| GPT-4o | ~5 seconds (Azure) | Optimized for low latency and multimodal tasks |
| GPT-4.1 | ~45 seconds for 1000–1500 tokens | Higher latency, especially for long completions |

===== Throughput Comparison =====

^ Model ^ TPM ^ RPM ^ Notes ^
| GPT-4o | 150,000 | 900 | Higher quotas available via enterprise agreements |
| GPT-4.1 | 3,000 per PTU | Varies | Depends on Provisioned Throughput Units |

===== Use Cases =====

==== 1. Automated IT Support & Triage ====
* **Use**: GPT-4o or GPT-4.1
* **Tasks**: triage tickets, apply Tier-1 fixes, generate CLI commands, summarize alerts
* **Benefits**: faster resolution, reduces L1 workload, integrates with ServiceNow or Azure DevOps

==== 2. Infrastructure-as-Code Review ====
* **Use**: GPT-4.1
* **Tasks**: review Bicep/ARM/Pulumi templates, validate configurations
* **Benefits**: promotes standardization, catches misconfigurations

==== 3. Security & Policy Review ====
* **Use**: GPT-4o / GPT-4.1
* **Tasks**: analyze IAM roles, firewall rules, and audit logs; translate policies
* **Benefits**: faster audits, stronger compliance, cross-team alignment

===== Manual Testing: Thermodynamics Prompt =====

**Prompt:** Calculate ΔH (kJ/mol NaNO₃) using a calorimeter (451 J/°C), 0.0300 mol NaOH, and 1000 mL of 0.0300 M HNO₃, with the temperature rising from 23.000 °C to 23.639 °C. Assume a specific heat of 4.18 J/g·°C and a density of 1.00 g/mL.

**GPT-4.1 output:**
* Heat absorbed by the solution: **2673.3 J** ❌ (should be 2671.02 J)
* Heat absorbed by the calorimeter: 288.4 J
* Total q: 2961.7 J
* ΔH = **–98.7 kJ/mol**

**GPT-4o output:**
* Heat absorbed by the solution: **2672.82 J** ❌
* Heat absorbed by the calorimeter: 288.69 J
* Total q: 2961.51 J
* ΔH = **–98.7 kJ/mol**

**Correct calculation:**
* Solution: 1000 g × 4.18 J/g·°C × 0.639 °C = **2671.02 J** ✅
* Calorimeter: 451 J/°C × 0.639 °C = **288.19 J** (both models' values, 288.4 J and 288.69 J, are slightly high)
* Total q = 2671.02 + 288.19 = **2959.21 J**
* ΔH = –2959.21 J / 0.0300 mol ≈ **–98.6 kJ/mol**

===== Conclusion =====
* GPT-4.1 gave better **explanations**, but made arithmetic errors.
* GPT-4o showed better **numerical skill**, but also rounded incorrectly.
* Both models accepted feedback, yet repeated the **same mistake**.
* ChatGPT (web version) corrected its error and gave the **correct final answer**.
* The Playground versions seem more prone to **repeating numeric errors**.
* **GPT-4.1**: best for detailed QA/debug work.
* **GPT-4o**: best for expressive, real-time, multimodal tasks.
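The calorimetry arithmetic from the manual test is easy to re-check with a short script. All constants below come straight from the prompt (calorimeter constant 451 J/°C, 1000 mL at 1.00 g/mL, specific heat 4.18 J/g·°C, ΔT = 0.639 °C, 0.0300 mol):

```python
# Re-checking the calorimetry arithmetic from the thermodynamics prompt.
mass_g = 1000.0            # 1000 mL of solution at 1.00 g/mL
c_solution = 4.18          # specific heat, J/g·°C
c_calorimeter = 451.0      # calorimeter constant, J/°C
delta_t = 23.639 - 23.000  # temperature rise, °C
mol_nano3 = 0.0300         # moles of NaNO3 formed

q_solution = mass_g * c_solution * delta_t   # heat absorbed by the solution, J
q_calorimeter = c_calorimeter * delta_t      # heat absorbed by the calorimeter, J
q_total = q_solution + q_calorimeter         # total heat released, J
dh_kj_per_mol = -q_total / mol_nano3 / 1000  # molar enthalpy, kJ/mol

print(round(q_solution, 2))      # 2671.02
print(round(q_calorimeter, 2))   # 288.19
print(round(dh_kj_per_mol, 1))   # -98.6
```

Note that the calorimeter term comes out to about 288.19 J, a little below the values both Playground models produced, which shifts the final answer slightly when carried through at full precision.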
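The "monitor for 429/500; implement retries/backoff" guidance in the model overview table can be sketched as a small wrapper with exponential backoff and jitter. This is a minimal illustration, not the Azure OpenAI SDK's built-in retry logic: ''TransientAPIError'' and ''flaky_completion'' are hypothetical stand-ins for a real client's rate-limit/server-error exceptions, and in practice ''fn'' would wrap the actual chat-completion call.

```python
import random
import time

# Statuses the overview table flags as retryable: 429 (throttling) and 500.
RETRYABLE_STATUSES = {429, 500, 503}

class TransientAPIError(Exception):
    """Hypothetical stand-in for an HTTP error raised by an API client."""
    def __init__(self, status, retry_after=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, from a Retry-After header

def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on a retryable error, wait with exponential backoff + jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientAPIError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_retries:
                raise  # non-retryable status, or retries exhausted
            if err.retry_after is not None:
                delay = err.retry_after  # honor the service's Retry-After hint
            else:
                delay = min(max_delay,
                            base_delay * 2 ** attempt * random.uniform(0.5, 1.5))
            sleep(delay)

# Demo with a stub that gets throttled twice, then succeeds.
attempts = {"n": 0}
def flaky_completion():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientAPIError(429, retry_after=0)
    return "ok"

result = call_with_backoff(flaky_completion)  # succeeds on the third attempt
```

Honoring the Retry-After value when the service provides one matters under sustained throttling, since blind exponential backoff can either hammer the endpoint too soon or wait far longer than required.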