Ready for approval 2026/02/12 17:42 by mcarver | Approver: @ai-us-principals
SOW Analyzer – Technical User Guide & Reference Manual
1. System Overview
The SOW Analyzer is a Microsoft Teams-integrated AI validation system that evaluates uploaded Statements of Work (SOWs) against predefined “golden” template documents.
Architecture Components:
Microsoft Teams Bot (Bot Framework)
Azure Web App (FastAPI backend)
Azure Document Intelligence (prebuilt-layout model)
Azure OpenAI (GPT-based semantic comparison engine)
The system supports:
Uploaded documents are processed in-memory and are not persisted.
2. High-Level Architecture Flow
Teams User
↓
Teams Bot (teams_bot.py)
↓
File Download + Validation
↓
POST /analyze-sow (app.py)
↓
Azure Document Intelligence (layout extraction)
↓
Azure OpenAI GPT comparison
↓
Structured Response to Teams
3. Teams Bot (teams_bot.py)
3.1 Message Handling
Entry point:
async def on_turn(turn_context: TurnContext)
Behavior:
3.2 Supported File Types
Defined in:
SUPPORTED_FILE_TYPES = {
".pdf": "application/pdf",
".doc": "application/msword",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
Extension gating:
if not any(filename.lower().endswith(ext) for ext in SUPPORTED_FILE_TYPES.keys()):
3.3 File Signature Validation
Function:
def _validate_file_signature(filename: str, data: bytes)
Validation Rules:
PDF → Must begin with %PDF
DOCX → Must begin with PK (ZIP header)
DOC → Must begin with OLE header (D0 CF 11 E0 A1 B1 1A E1)
Purpose:
Prevent file type spoofing
Enforce integrity of uploaded documents
Mitigate malicious upload risk
3.4 Download Logic
Function:
async def _download_bytes(url: str, filename: str) -> bytes
Security controls:
3.5 Analyzer POST Logic
Function:
async def _post_to_analyzer(filename: str, file_bytes: bytes, content_type: str)
Behavior:
4. Backend API (app.py)
4.1 Endpoint Definition
@app.post("/analyze-sow")
async def analyze_sow(file: UploadFile = File(...))
Validation:
Content-Type allowlist
Extension allowlist
Empty file detection
Allowed MIME types:
4.2 Content-Type Mapping Logic
if filename.endswith(".docx"):
content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
elif filename.endswith(".doc"):
content_type = "application/msword"
else:
content_type = "application/pdf"
Purpose:
def extract_layout_text(file_bytes: bytes, content_type: str) -> str
Uses:
Call Pattern:
poller = client.begin_analyze_document(
"prebuilt-layout",
document=file_bytes,
content_type=content_type
)
Output:
Structured page-aware text
Preserves headings, paragraphs, and tables
Returns full reconstructed textual representation
6. GPT Comparison Engine
6.1 Function
compare_against_best_template(
uploaded_sow_text,
golden_templates
)
Purpose:
Compare uploaded SOW text against approved template corpus
Identify missing clauses
Detect semantic deviations
Provide structured gap analysis
Characteristics:
7. Golden Template Management
Golden templates are loaded at application startup and held in memory.
Design Principles:
8. Security Architecture
8.1 No Persistent Storage
Uploaded files:
Processing is memory-bound only.
Controls implemented:
Extension allowlist
MIME type enforcement
Signature verification
File size limits
HTTP status validation
Timeout configuration
8.3 AI Isolation
Service separation:
Teams Bot → Presentation layer
FastAPI → Orchestration layer
Document Intelligence → Extraction layer
Azure OpenAI → Reasoning layer
Benefits:
Clear responsibility boundaries
Easier auditing
Reduced lateral movement risk
Improved troubleshooting clarity
9. Configuration Reference
Required Environment Variables:
Timeouts:
Download: 60s
Analyzer call: 180s
File Size Limit:
10. Error Handling Behavior
Error Categories:
Unsupported file type → User-facing validation message
Empty file → 400 response
Layout extraction error → 500 with diagnostic message
GPT comparison error → 500
Analyzer HTTP failure → surfaced to Teams user
Errors are intentionally explicit to simplify debugging in production.
11. Extension Points
Potential enhancements:
Risk scoring model
Structured JSON scoring output
Audit log integration
Template versioning
Clause-level similarity scoring
Confidence scoring metrics
12. Operational Characteristics
Performance Factors:
Scalability:
13. Compliance Considerations
No long-term data retention
No model fine-tuning on customer data
Deterministic template baseline
Azure-hosted inference services
Explicit file validation controls
14. Summary
The SOW Analyzer is a stateless, AI-driven SOW validation engine embedded directly in Microsoft Teams. It combines structured document extraction and semantic reasoning while maintaining strong governance controls and zero persistent storage of uploaded contractual documents.
It is designed for:
Secure enterprise AI adoption
Operational contract standardization
Reduced review cycle time
Enforced template compliance
Scalable AI document validation