====== SOW Analyzer – Teams Bot Application ======

===== 1. Purpose and Scope =====

This document provides **full technical documentation** for the SOW Analyzer Teams Bot application.

It is intended for:

  * Engineers maintaining or extending the system
  * Security and architecture reviewers
  * Operations and platform teams
  * Knowledge transfer and onboarding

This document describes:

  * Application runtime behavior
  * Source code responsibilities (file-by-file)
  * Authentication and cross-tenant identity
  * Teams message and file upload handling
  * Data handling, storage, and retention
  * Error cases and operational constraints

---

===== 2. Functional Description =====

The SOW Analyzer is a **document ingestion and analysis system** exposed through **Microsoft Teams**.

Users interact with the system by uploading a **PDF Statement of Work** directly into a **1:1 Teams chat** with a bot.

The system:

  - Validates the uploaded file
  - Downloads the PDF securely
  - Compares the document against known SOW templates
  - Uses Azure OpenAI to generate a structured comparison
  - Returns a summarized analysis to the user

At no point does the system store uploaded SOWs long-term.

---

===== 3. High-Level System Flow =====

==== 3.1 Request Lifecycle ====

    User uploads PDF in Teams
    → Teams stores file in OneDrive/SharePoint
    → Teams sends attachment metadata to bot
    → Bot validates + downloads file
    → Bot POSTs PDF to analyzer endpoint
    → Analyzer extracts text and compares templates
    → Azure OpenAI produces structured diff
    → Result returned to Teams

==== 3.2 Key Constraints ====

  * PDF-only uploads
  * Personal chat only
  * Stateless processing
  * No persistence of uploaded documents

---

===== 4. Tenant and Identity Model =====

==== 4.1 Tenant Separation ====

| Component | Tenant |
|----------|--------|
| Teams users | CDW Tenant |
| Bot App Registration | CDW Tenant |
| Azure Bot Service | Azure Tenant |
| Web App (FastAPI) | Azure Tenant |
| Azure OpenAI | Azure Tenant |
| Blob Storage (templates) | Azure Tenant |

==== 4.2 Bot Identity ====

The bot uses a **single-tenant App Registration in the CDW Tenant**.

This App Registration:

  * Is referenced by Azure Bot Service
  * Signs JWT tokens presented to the bot
  * Is validated by the Bot Framework Adapter

Required environment variables:

    MicrosoftAppId
    MicrosoftAppPassword
    MicrosoftAppTenantId

The bot **will reject** any token whose AppId does not match `MicrosoftAppId`.

---

===== 5. Teams Integration Details =====

==== 5.1 Teams App Manifest ====

Key configuration:

    "bots": [
      {
        "botId": "",
        "scopes": ["personal"],
        "supportsFiles": true
      }
    ]

==== 5.2 Supported Interaction Types ====

| Interaction | Supported |
|------------|-----------|
| 1:1 Chat | YES |
| Group Chat | NO |
| Channel | NO |
| Adaptive Cards | Optional |
| Message Extensions | NO |

==== 5.3 File Upload Behavior ====

Teams uploads files to **OneDrive or SharePoint**, then sends metadata to the bot.

Expected attachment type:

    application/vnd.microsoft.teams.file.download.info

The bot **does not receive raw bytes directly**.

---

===== 6. Source Code Breakdown =====

===== 6.1 app.py =====

==== Purpose ====

Acts as the FastAPI entry point for:

  * Teams bot messages
  * SOW analysis workflow

==== Endpoints ====

=== POST /api/messages ===

  * Receives Teams activities
  * Delegates processing to `teams_messages()`
  * No business logic lives here

=== POST /analyze-sow ===

  * Accepts multipart/form-data
  * Expects field name: `file`
  * Reads PDF bytes into memory
  * Executes SOW analysis pipeline
  * Returns structured JSON

==== Important Behavior ====

  * No disk writes
  * No blob uploads
  * File exists only in memory

---

===== 6.2 teams_bot.py =====

==== Purpose ====

Implements all Teams bot logic.

==== Major Responsibilities ====

  * Bot Framework authentication
  * Activity validation
  * File validation and download
  * Integration with analyzer
  * User-facing messaging

==== Adapter Initialization ====

    BotFrameworkAdapterSettings(
        MicrosoftAppId,
        MicrosoftAppPassword,
        channel_auth_tenant=MicrosoftAppTenantId
    )

This enforces:

  * Single-tenant authentication
  * Cross-tenant correctness

==== on_turn() Execution Flow ====

1. Trust incoming service URL
2. Ignore non-message activities
3. Extract attachments
4. If no attachment → return silently
5. Extract filename + download URL
6. Enforce `.pdf` extension
7. Download file bytes
8. Validate PDF magic bytes (`%PDF`)
9. POST file to analyzer
10. Format and send result

==== PDF Enforcement ====

Two layers:

  * Filename check
  * Byte-level PDF header check

==== File Size Enforcement ====

Maximum allowed size enforced before analysis.

==== Storage Behavior ====

  * File exists only as a Python byte array
  * Eligible for garbage collection after request

---

===== 6.3 gpt_compare_multi.py =====

==== Purpose ====

Performs AI-driven comparison between uploaded SOW and known templates.
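
Part of that comparison is getting the model's reply into a predictable shape. The following is a minimal sketch, not the module's actual code, of parsing a reply and coercing it into the fields listed under Output Schema in this section. The names `normalize_comparison` and `SCHEMA_DEFAULTS` are hypothetical, and falling back to empty defaults and dropping unknown keys are assumed policies.

```python
import json
from typing import Any, Dict

# Field names and empty defaults mirror the documented output schema.
SCHEMA_DEFAULTS: Dict[str, Any] = {
    "chosen_template_title": "",
    "summary": "",
    "missing_sections": [],
    "extra_sections": [],
    "changed_clauses": {},
}


def normalize_comparison(raw_reply: str) -> Dict[str, Any]:
    """Parse a model reply and coerce it into the documented schema.

    Hypothetical helper: the real module may normalize differently.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}

    result: Dict[str, Any] = {}
    for key, default in SCHEMA_DEFAULTS.items():
        value = data.get(key)
        if isinstance(value, type(default)):
            # Value has the expected type, so keep it unchanged.
            result[key] = value
        else:
            # Missing or wrongly typed: use a fresh copy of the default.
            result[key] = type(default)(default)
    return result
```

With this approach a malformed or partial reply still yields a dictionary with all five keys, so the message formatting in the bot never has to check for missing fields.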
==== Responsibilities ====

  * Load templates
  * Construct structured prompt
  * Call Azure OpenAI
  * Parse JSON response
  * Normalize output fields

==== Output Schema ====

    {
      "chosen_template_title": "string",
      "summary": "string",
      "missing_sections": [],
      "extra_sections": [],
      "changed_clauses": {}
    }

==== AI Safety ====

  * No fine-tuning
  * No training on customer data
  * Prompt-only inference

---

===== 6.4 layout_client.py =====

==== Purpose ====

Provides access to stored SOW templates.

==== Storage ====

  * Azure Blob Storage
  * Read-only access

==== Important Notes ====

  * Only templates are stored
  * Uploaded SOWs are never written here

---

===== 7. Data Handling and Retention =====

==== 7.1 Uploaded PDFs ====

  * Stored in memory only
  * Lifetime limited to request scope
  * Never written to disk or blob

==== 7.2 External Storage ====

  * Teams stores original file in OneDrive/SharePoint
  * Retention governed by M365 policies

==== 7.3 Logging ====

  * File contents never logged
  * Metadata only (filename, size, correlation id)

---

===== 8. Error Handling =====

==== User-Facing Errors ====

| Condition | Message |
|---------|---------|
| Non-PDF upload | "I can only analyze PDF files" |
| Download failure | "I couldn’t download the file" |
| Analyzer failure | "Analysis failed" |

==== Internal Errors ====

  * Logged without sensitive data
  * Returned as generic user messages

---

===== 9. Security Model =====

==== Authentication ====

  * Bot Framework JWT validation
  * AppId + TenantId enforcement

==== Authorization ====

  * Teams-scoped access
  * Optional user allow-listing via AAD object ID

==== Data Protection ====

  * No persistent storage
  * No request body logging
  * No file caching

---

===== 10. Operational Considerations =====

==== Deployment ====

  * Azure App Service
  * Environment variable configuration
  * No startup jobs or migrations

==== Scaling ====

  * Stateless
  * Horizontal scale-out supported

==== Monitoring ====

  * Application Insights optional
  * Sensitive logging disabled

---

===== 11. Known Limitations =====

  * Personal chat only
  * PDF files only
  * Channel uploads not supported
  * No user authentication beyond Teams identity

---

===== 12. Compliance Summary =====

The system:

  * Does not store customer documents
  * Minimizes data exposure
  * Uses Azure-native security boundaries
  * Aligns with enterprise data governance practices

---

===== 13. Maintenance Notes =====

Future enhancements may include:

  * Channel support via Graph
  * Adaptive Card responses
  * Template versioning
  * Per-user access controls
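
If Adaptive Card responses are added, the analyzer result could be mapped onto a card payload along the following lines. This is a sketch only: `build_result_card` is a hypothetical helper, the card version `1.4` and the layout are assumptions, and the input is assumed to follow the output schema from section 6.3.

```python
from typing import Any, Dict, Iterable


def build_result_card(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map an analyzer result onto an Adaptive Card payload (assumed layout)."""

    def joined(items: Iterable[Any]) -> str:
        # Render a list of section names as one comma-separated string.
        text = ", ".join(str(item) for item in items)
        return text if text else "None"

    return {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.4",  # assumed; pick a version the target clients support
        "body": [
            {
                "type": "TextBlock",
                "text": result.get("chosen_template_title") or "Unknown template",
                "weight": "Bolder",
                "wrap": True,
            },
            {"type": "TextBlock", "text": result.get("summary", ""), "wrap": True},
            {
                "type": "FactSet",
                "facts": [
                    {
                        "title": "Missing sections",
                        "value": joined(result.get("missing_sections", [])),
                    },
                    {
                        "title": "Extra sections",
                        "value": joined(result.get("extra_sections", [])),
                    },
                    {
                        "title": "Changed clauses",
                        "value": str(len(result.get("changed_clauses", {}))),
                    },
                ],
            },
        ],
    }
```

The returned dictionary would typically be sent as a message attachment with content type `application/vnd.microsoft.card.adaptive`. How that fits into the existing `on_turn()` flow is left open here.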