wiki:ai:video-keyword-transcription (created 2025/05/30 17:07 by ddehamer; current revision 2025/06/02 12:41)

azure_video_highlight_demo.py

<code python>
#!/Users/don.dehamer/.local/pipx/venvs/azure-ai-textanalytics/bin/python3.13

...

    for phrase in key_phrases:
        print("-", phrase)
</code>

===== Telemetry =====

To connect telemetry from this script and the Azure Cognitive Services it uses (Video Indexer and Text Analytics) to **Azure Application Insights**, you need to **instrument the Python script** to:

  - Send custom telemetry (e.g., video processing status, API latency, errors)
  - Optionally, use **dependency tracking** to capture outbound API calls

----

===== ✅ Step-by-Step: Connect Script Telemetry to Application Insights =====

----

==== 🧱 1. Create an Application Insights Resource ====

  - Go to the Azure Portal
  - Click **Create a resource**
  - Search for **Application Insights**
  - Choose:
    * **Resource group**
    * **Region**
    * **Name**
    * **Application Type**: General
  - Click **Create**
  - After it's created, go to the resource and copy the **Instrumentation Key** or **Connection String**
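
If you prefer the command line, the same resource can be created with the Azure CLI. This is a sketch only: the resource name, group, and region below are placeholders, and the ''application-insights'' CLI extension may need to be installed first.

<code bash>
# One-time: add the Application Insights extension to the Azure CLI
az extension add --name application-insights

# Create the resource (names, group, and region are placeholders)
az monitor app-insights component create \
  --app video-keyword-telemetry \
  --resource-group my-resource-group \
  --location westus2 \
  --application-type other

# Fetch the connection string to paste into the script
az monitor app-insights component show \
  --app video-keyword-telemetry \
  --resource-group my-resource-group \
  --query connectionString --output tsv
</code>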

----

==== 🧪 2. Install Application Insights SDK for Python ====

<code bash>
pip install opencensus-ext-azure opencensus-ext-requests opencensus
</code>

----

==== 🧰 3. Add Instrumentation to Your Script ====

At the top of your script, import and configure the telemetry client:

<code python>
from opencensus.ext.azure.log_exporter import AzureLogHandler
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace.samplers import ProbabilitySampler
from opencensus.trace.tracer import Tracer
import logging

# Replace with your Application Insights connection string
APP_INSIGHTS_CONNECTION_STRING = "InstrumentationKey=<your-key>"

# Set up logging
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(connection_string=APP_INSIGHTS_CONNECTION_STRING))
logger.setLevel(logging.INFO)

# Set up tracing (optional)
tracer = Tracer(
    exporter=AzureExporter(connection_string=APP_INSIGHTS_CONNECTION_STRING),
    sampler=ProbabilitySampler(1.0),
)
</code>

----

==== 📝 4. Log Custom Events, Metrics, and Errors ====

Throughout your script, add telemetry like this:

<code python>
logger.info("Uploading video to Azure Video Indexer...")

# On success
logger.info("Video uploaded successfully. Video ID: %s", video_id)

# On API error
logger.error("Chunk %d failed: %s", i, response[0].error)

# Add custom dimensions
logger.info("Processing completed", extra={
    "custom_dimensions": {
        "video_id": video_id,
        "transcript_length": len(transcript),
        "total_key_phrases": len(all_phrases)
    }
})
</code>
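
Under the hood, the keys passed via ''extra'' are attached as attributes on the standard ''logging.LogRecord'', which is what ''AzureLogHandler'' reads. A stdlib-only sketch of that mechanism, using an in-memory handler instead of a real App Insights connection:

<code python>
import logging

class ListHandler(logging.Handler):
    """Collects log records in memory instead of exporting them."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

demo_logger = logging.getLogger("telemetry_demo")
handler = ListHandler()
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)

# Same pattern as above: "extra" keys become attributes on the LogRecord
demo_logger.info("Processing completed", extra={
    "custom_dimensions": {"video_id": "abc123", "total_key_phrases": 42}
})

record = handler.records[0]
print(record.custom_dimensions["video_id"])  # abc123
print(record.getMessage())                   # Processing completed
</code>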

----

==== 📡 5. Monitor in Application Insights ====

After running the script:

  * Go to **Application Insights > Logs (Analytics)** and run queries like:

<code kusto>
traces
| where customDimensions.video_id contains "xyz"
</code>

Or view:

  * **Failures** (to track errors)
  * **Performance** (request/response durations if traced)
  * **Custom Events & Metrics**

----

===== 🧠 Bonus: Dependency Tracking =====

If you want to **automatically track outbound HTTP requests** (e.g., calls to Video Indexer or Text Analytics APIs):

<code python>
import requests
from opencensus.ext.requests import trace

# Patch the requests library so outbound HTTP calls are traced as dependencies
trace.trace_integration()
</code>

This will auto-record dependency durations and failures to App Insights.
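
Conceptually, dependency tracking is just a wrapper around each outbound call that records its duration and outcome. A stdlib-only sketch of the idea (the real integration hooks ''requests'' for you, so you would not write this by hand):

<code python>
import time

def track_dependency(name, func, *args, **kwargs):
    """Call func, timing it and reporting the outcome like a dependency telemetry item."""
    start = time.perf_counter()
    success = False
    try:
        result = func(*args, **kwargs)
        success = True
        return result
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        # A real exporter would send this to App Insights; here we just print it
        print(f"dependency={name} success={success} duration_ms={duration_ms:.1f}")

# Example: wrap a stand-in for an API call
total = track_dependency("video-indexer-upload", lambda: sum(range(1000)))
print(total)  # 499500
</code>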

----

===== 📘 Summary =====

^ Element ^ Purpose ^
| ''AzureLogHandler'' | Sends custom logs to App Insights |
| ''Tracer + AzureExporter'' | Sends operation spans and dependency telemetry |
| ''trace_integration()'' | Automatically tracks outgoing HTTP requests |
| Custom ''logger.info()'' | Manually report app behavior, events, and errors |

===== Combined code in one script =====

**Fun fact:** Python 3.13 broke compatibility with the Text Analytics SDK, so you have to run an older Python version for this to work.

<code python>
#!/Users/don.dehamer/.local/pipx/venvs/azure-ai-textanalytics/bin/python3.13

import time
import requests
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential
from opencensus.ext.azure.log_exporter import AzureLogHandler
from opencensus.ext.azure.trace_exporter import AzureExporter
from opencensus.trace.samplers import ProbabilitySampler
from opencensus.trace.tracer import Tracer
import logging

# --- CONFIGURATION ---

# Azure Video Indexer
LOCATION = "trial"  # or your region
ACCOUNT_ID = "your_account_id"
VIDEO_INDEXER_API_KEY = "your_video_indexer_api_key"
VIDEO_PATH = "sample_video.mp4"
VIDEO_NAME = "demo_video_ai"

# Azure Text Analytics
TEXT_ANALYTICS_KEY = "your_text_analytics_key"
TEXT_ANALYTICS_ENDPOINT = "https://your-textanalytics-resource.cognitiveservices.azure.com/"

# Application Insights
APP_INSIGHTS_CONNECTION_STRING = "InstrumentationKey=your_instrumentation_key"

# --- Logging and Telemetry Setup ---
logger = logging.getLogger(__name__)
logger.addHandler(AzureLogHandler(connection_string=APP_INSIGHTS_CONNECTION_STRING))
logger.setLevel(logging.INFO)

tracer = Tracer(
    exporter=AzureExporter(connection_string=APP_INSIGHTS_CONNECTION_STRING),
    sampler=ProbabilitySampler(1.0),
)

# --- Utility Functions ---

def split_text_by_characters(text, max_chars=4000):
    chunks = []
    while len(text) > max_chars:
        end = text.rfind('.', 0, max_chars)
        if end == -1:
            end = max_chars
        chunk = text[:end + 1].strip()
        chunks.append(chunk)
        text = text[end + 1:].strip()
    if text:
        chunks.append(text)
    return chunks

# --- Azure Video Indexer Functions ---

def get_access_token():
    url = f"https://api.videoindexer.ai/Auth/{LOCATION}/Accounts/{ACCOUNT_ID}/AccessToken?allowEdit=true"
    headers = {"Ocp-Apim-Subscription-Key": VIDEO_INDEXER_API_KEY}
    response = requests.get(url, headers=headers)
    response.raise_for_status()
    return response.text.strip('"')

def upload_video(token):
    logger.info("Uploading video to Azure Video Indexer...")
    with open(VIDEO_PATH, 'rb') as video_file:
        files = {'file': (VIDEO_NAME, video_file, 'video/mp4')}
        url = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos?name={VIDEO_NAME}&accessToken={token}"
        response = requests.post(url, files=files)
        response.raise_for_status()
        video_id = response.json()['id']
        logger.info("Video uploaded successfully.", extra={"custom_dimensions": {"video_id": video_id}})
        return video_id

def wait_for_processing(token, video_id):
    logger.info("Waiting for video indexing to complete...")
    url = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos/{video_id}/Index?accessToken={token}"
    while True:
        response = requests.get(url)
        response.raise_for_status()
        state = response.json().get('state')
        logger.info(f"Indexing state: {state}")
        if state == 'Processed':
            return
        time.sleep(10)

def download_transcript(token, video_id):
    logger.info("Downloading transcript from Azure Video Indexer...")
    url = f"https://api.videoindexer.ai/{LOCATION}/Accounts/{ACCOUNT_ID}/Videos/{video_id}/Captions?format=ttml&accessToken={token}"
    response = requests.get(url)
    response.raise_for_status()
    return response.text

# --- Azure Text Analytics Function ---

def extract_key_phrases(text, video_id):
    client = TextAnalyticsClient(endpoint=TEXT_ANALYTICS_ENDPOINT, credential=AzureKeyCredential(TEXT_ANALYTICS_KEY))
    chunks = split_text_by_characters(text)
    all_phrases = []

    for i, chunk in enumerate(chunks):
        try:
            response = client.extract_key_phrases([chunk])
            if not response[0].is_error:
                all_phrases.extend(response[0].key_phrases)
                logger.info(f"Processed chunk {i + 1} successfully.", extra={"custom_dimensions": {"chunk_length": len(chunk)}})
            else:
                logger.error(f"Chunk {i + 1} failed: {response[0].error}", extra={"custom_dimensions": {"chunk": i + 1}})
        except Exception as e:
            logger.exception(f"Chunk {i + 1} raised exception: {str(e)}", extra={"custom_dimensions": {"chunk": i + 1}})

    logger.info("All chunks processed.", extra={"custom_dimensions": {"video_id": video_id, "total_key_phrases": len(all_phrases)}})
    return list(set(all_phrases))

# --- MAIN EXECUTION FLOW ---

if __name__ == "__main__":
    try:
        token = get_access_token()
        video_id = upload_video(token)
        wait_for_processing(token, video_id)
        transcript = download_transcript(token, video_id)
        key_phrases = extract_key_phrases(transcript, video_id)

        print("\n--- Key Phrases Extracted ---")
        for phrase in key_phrases:
            print("-", phrase)
    except Exception:
        logger.exception("Unhandled exception occurred during processing.")
</code>
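
The ''split_text_by_characters'' helper in the script above can be exercised on its own. A standalone copy with ''max_chars'' shrunk so the sentence-boundary splitting is visible:

<code python>
def split_text_by_characters(text, max_chars=4000):
    """Split text into chunks of roughly max_chars, preferring to break at sentence ends."""
    chunks = []
    while len(text) > max_chars:
        end = text.rfind('.', 0, max_chars)
        if end == -1:
            end = max_chars
        chunk = text[:end + 1].strip()
        chunks.append(chunk)
        text = text[end + 1:].strip()
    if text:
        chunks.append(text)
    return chunks

text = "First sentence. Second sentence. Third sentence."
print(split_text_by_characters(text, max_chars=20))
# → ['First sentence.', 'Second sentence.', 'Third sentence.']
</code>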
  
[[ai_knowledge|AI Knowledge]]
  