wiki:ai:ai-operational-plan: current revision 2025/05/28 17:37 by ddehamer; page created 2025/05/28 17:35 by ddehamer. The current revision demotes the section and subsection headings by one level.
====== Operational Plan: Managed AI Services Team ======
===== 1. Team Structure & Roles =====
| Role | Responsibilities |
| Customer Success Engineer | Handle service requests, documentation, |
===== 2. Scope of Services =====
AI Platforms Only:
- Issue response and remediation
===== 3. Core Operations =====
==== Provisioning & Deployment ====
· Use IaC tools (Terraform, Bicep, Deployment Manager)
· Bootstrap scripts for API/
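A minimal sketch of driving the IaC step non-interactively, assuming Terraform is the tool and each environment has its own working directory (the `envs/staging` directory and `staging.tfvars` file are hypothetical). The helper builds an init, plan and apply sequence using Terraform's `-chdir`, `-input=false`, `-out` and `-var-file` options. It then applies the saved plan file, so what gets applied is exactly what was reviewed. By default it only prints the commands.

```python
import shlex
import subprocess
from typing import List, Optional


def terraform_commands(workdir: str, var_file: Optional[str] = None,
                       plan_file: str = "tfplan") -> List[List[str]]:
    """Return the init, plan and apply argv lists for one environment directory."""
    base = ["terraform", f"-chdir={workdir}"]
    plan = base + ["plan", "-input=false", f"-out={plan_file}"]
    if var_file:
        plan.append(f"-var-file={var_file}")  # path is relative to workdir
    return [
        base + ["init", "-input=false"],
        plan,
        # Applying the saved plan file applies only what `plan` showed.
        base + ["apply", "-input=false", plan_file],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(terraform_commands("envs/staging", var_file="staging.tfvars"))
```

Bicep or Deployment Manager would need their own command builders. The plan-then-apply split is the part that carries over.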
==== Automation & Shell Command Support ====
· Secure shell (SSH) access with audit logging
· CI/CD pipelines for model deployment
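Audit logging can cover commands run by automation as well as interactive SSH sessions. This minimal sketch assumes audit records are emitted as JSON lines through Python's standard `logging` module; the `audit` logger name and the record fields are illustrative choices. Each command runs without a shell, and its operator, start time, argv and exit code are logged.

```python
import getpass
import json
import logging
import subprocess
import sys
from datetime import datetime, timezone
from typing import List

audit_log = logging.getLogger("audit")  # illustrative logger name
audit_log.setLevel(logging.INFO)


def run_audited(argv: List[str], operator: str = "") -> subprocess.CompletedProcess:
    """Run argv without a shell and emit one JSON audit record for the execution."""
    started = datetime.now(timezone.utc).isoformat()
    result = subprocess.run(argv, capture_output=True, text=True)
    record = {
        "operator": operator or getpass.getuser(),
        "started": started,
        "argv": argv,
        "exit_code": result.returncode,
    }
    audit_log.info(json.dumps(record))
    return result


if __name__ == "__main__":
    logging.basicConfig(format="%(message)s")
    out = run_audited([sys.executable, "-c", "print('ok')"], operator="oncall")
    print(out.stdout.strip())
```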
==== Monitoring & Observability ====
· System monitoring: CPU, GPU, disk
· Alerts via Slack/
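The monitoring and alerting items reduce to two steps: compare sampled metrics against thresholds, then post any breaches to a chat channel. This minimal sketch assumes a Slack incoming webhook, which accepts a JSON body with a `text` field, and uses hypothetical threshold values. Only disk usage is sampled here, via `shutil.disk_usage`, because CPU and GPU collection depend on whichever agent or exporter is actually deployed.

```python
import json
import shutil
import urllib.request
from typing import Dict, List

# Hypothetical thresholds in percent; real values belong in the monitoring config.
THRESHOLDS = {"cpu_pct": 90.0, "gpu_pct": 95.0, "disk_pct": 85.0}


def disk_pct(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def breaches(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return one readable line for every metric above its threshold."""
    return [
        f"{name} at {value:.1f}% (threshold {thresholds[name]:.0f}%)"
        for name, value in sorted(metrics.items())
        if name in thresholds and value > thresholds[name]
    ]


def slack_request(webhook_url: str, host: str, lines: List[str]) -> urllib.request.Request:
    """Build, but do not send, a POST to a Slack incoming webhook."""
    body = json.dumps({"text": f"[{host}] " + "; ".join(lines)}).encode()
    return urllib.request.Request(webhook_url, data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


if __name__ == "__main__":
    sample = {"disk_pct": disk_pct("/"), "cpu_pct": 42.0}  # cpu value is a placeholder
    alerts = breaches(sample, THRESHOLDS)
    print(alerts or "all metrics within thresholds")
    # Sending is left out: urllib.request.urlopen(slack_request(url, host, alerts))
```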
==== Python Programming Services ====
· Support JupyterHub
· Support SDKs: openai, boto3, google-cloud-aiplatform,
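Supporting several provider SDKs starts with knowing which of them, and which versions, are installed in a given environment, such as a JupyterHub user image. This minimal sketch uses the standard-library `importlib.metadata`. The names in the tuple are the PyPI distribution names of the SDKs listed, and others can be appended.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# PyPI distribution names of the SDKs named in the plan.
SUPPORTED_SDKS = ("openai", "boto3", "google-cloud-aiplatform")


def sdk_versions(names: Iterable[str] = SUPPORTED_SDKS) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in sdk_versions().items():
        print(f"{name:28} {ver or 'not installed'}")
```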
==== Issue Remediation Workflow ====
· Detection – Alert received
· Postmortem – RCA documentation
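The remediation workflow is an ordered sequence of stages. Recording when each stage is reached yields the timings that postmortems and SLA reviews rely on. In this minimal sketch the caller passes in the stage list, and the class rejects out-of-order or repeated stages. Only Detection and Postmortem are used in the example; the full sequence would come from the team's runbook, and the incident ID is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


class Incident:
    """Track one incident through an ordered list of workflow stages."""

    def __init__(self, incident_id: str, stages: List[str]) -> None:
        self.incident_id = incident_id
        self.stages = list(stages)
        self.reached: Dict[str, datetime] = {}  # insertion order = stage order

    def advance(self, stage: str, at: Optional[datetime] = None) -> None:
        """Record that `stage` was reached; it must be the next stage in sequence."""
        if len(self.reached) >= len(self.stages):
            raise ValueError(f"{self.incident_id}: all stages already recorded")
        expected = self.stages[len(self.reached)]
        if stage != expected:
            raise ValueError(f"{self.incident_id}: expected {expected!r}, got {stage!r}")
        self.reached[stage] = at if at is not None else datetime.now(timezone.utc)

    def elapsed(self) -> Optional[timedelta]:
        """Time from the first recorded stage to the most recent one."""
        if not self.reached:
            return None
        times = list(self.reached.values())
        return times[-1] - times[0]


if __name__ == "__main__":
    t0 = datetime(2025, 5, 28, 9, 0, tzinfo=timezone.utc)
    inc = Incident("INC-0001", ["Detection", "Postmortem"])  # hypothetical incident ID
    inc.advance("Detection", at=t0)
    inc.advance("Postmortem", at=t0 + timedelta(hours=2))
    print(inc.elapsed())  # 2:00:00
```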
===== 4. Security and Access Control =====
· RBAC and IAM per platform with least privilege
· Data encryption at rest and in transit
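Least privilege can be checked mechanically by comparing what each role is granted with what its duties require. Encryption in transit is commonly enforced on the client side by refusing old TLS versions. This minimal sketch covers both. The role names and permission strings are hypothetical placeholders, not any cloud provider's IAM action names, and TLS 1.2 as the floor is an assumed policy value.

```python
import ssl
from typing import Dict, Set

# Hypothetical role definitions; real ones come from each platform's IAM/RBAC export.
GRANTED: Dict[str, Set[str]] = {
    "ml-engineer": {"model.deploy", "model.read", "endpoint.read", "billing.read"},
    "support-engineer": {"model.read", "endpoint.read"},
}
REQUIRED: Dict[str, Set[str]] = {
    "ml-engineer": {"model.deploy", "model.read", "endpoint.read"},
    "support-engineer": {"model.read", "endpoint.read"},
}


def excess_permissions(granted: Dict[str, Set[str]],
                       required: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per role, the permissions granted beyond what the role is documented to need."""
    report: Dict[str, Set[str]] = {}
    for role, perms in granted.items():
        extra = perms - required.get(role, set())
        if extra:
            report[role] = extra
    return report


def strict_client_context() -> ssl.SSLContext:
    """Client TLS context that verifies certificates and refuses TLS below 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname checks on by default
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    print(excess_permissions(GRANTED, REQUIRED))  # {'ml-engineer': {'billing.read'}}
    print(strict_client_context().minimum_version)
```

A role that has no entry in the required map is reported with all of its grants. This flags undocumented roles for review instead of silently passing them.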
===== 5. Toolchain =====
IaC: Terraform, Bicep, Deployment Manager
CI/CD: GitHub, GitLab, Azure DevOps
===== 6. SLA & Reporting =====
| Metric | Target |
| Monthly Review | Cost, optimization, |
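SLA reports usually express availability as the share of the reporting period in which the service was up: availability = (period − downtime) / period × 100. This minimal sketch implements that calculation and its inverse, the downtime budget. The 99.9% target and the 30-day month in the example are hypothetical values, not targets taken from this plan.

```python
def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Availability over a period as a percentage: (period - downtime) / period * 100."""
    if period_minutes <= 0 or not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie between 0 and a positive period length")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def downtime_budget(period_minutes: float, target_pct: float) -> float:
    """Minutes of downtime allowed in the period while still meeting target_pct."""
    return period_minutes * (100.0 - target_pct) / 100.0


if __name__ == "__main__":
    month = 30 * 24 * 60  # 43,200 minutes in a 30-day month
    print(round(downtime_budget(month, 99.9), 1))  # 43.2
    print(round(availability_pct(month, 60), 3))   # 99.861
```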
===== 7. Knowledge Management =====
· Maintain runbooks and playbooks