wiki:ai:ml-pipeline-test [2025/06/06 13:46] (current) – ddehamer
<code python>
pipeline_job.outputs["
</code>
  * Logs and artifacts are persisted and inspectable via both SDK and UI
  * The pipeline can now be scheduled, automated, and extended
| + | |||
| + | ===== Uses for the output model ===== | ||
| + | |||
| + | ==== 🚀 1. Deploy the Model for Real-Time Inference ==== | ||
| + | |||
| + | ==== Purpose: ==== | ||
| + | |||
| + | Allow other applications (e.g., web apps, mobile apps, services) to query the model in real time via an API. | ||
| + | |||
| + | ==== Implementation: | ||
| + | |||
| + | * Deploy the model using **Azure ML Online Endpoints** | ||
| + | * Wrap it in a scoring script ('' | ||
| + | * Use Azure’s **managed REST API** for secure, scalable access | ||
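The scoring script follows Azure ML's ''init()''/''run()'' contract: ''init()'' is called once when the container starts, ''run()'' once per request. A minimal sketch, assuming the trained model was saved as a single ''model.pkl'' file (the file name is an assumption, not taken from this pipeline):

<code python>
# Minimal scoring-script sketch for an Azure ML online endpoint.
# Azure ML calls init() once at startup and run() for each request.
import json
import os
import pickle

model = None

def init():
    """Load the model once when the serving container starts."""
    global model
    # AZUREML_MODEL_DIR points at the registered model's files;
    # "model.pkl" is an assumed file name.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)

def run(raw_data):
    """Score one JSON request shaped like {"data": [[...feature row...], ...]}."""
    rows = json.loads(raw_data)["data"]
    preds = model.predict(rows)
    return json.dumps({"predictions": [int(p) for p in preds]})
</code>

The JSON payload shape here is a convention, not a requirement — whatever ''run()'' parses is what callers must send.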
| + | |||
| + | ==== Example Use Cases: ==== | ||
| + | |||
| + | * Predict customer churn during a support call | ||
| + | * Make fraud detection decisions as a transaction is processed | ||
| + | * Recommend next-best-actions in a CRM interface | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🗃 2. Use the Model for Batch Scoring ==== | ||
| + | |||
| + | === Purpose: === | ||
| + | |||
| + | Process large datasets periodically to generate predictions at scale. | ||
| + | |||
| + | === Implementation: | ||
| + | |||
| + | * Use **Azure ML batch endpoints**, | ||
| + | * Read input from blob storage or a database | ||
| + | * Write predictions back to storage for analysis or ingestion into other systems | ||
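The core of the batch step — read rows, score them, write predictions back — can be sketched without any Azure dependency. In Azure ML this logic would live in the batch endpoint's scoring script; the file layout and ''id'' column here are illustrative assumptions:

<code python>
# Standalone sketch of a batch-scoring step: score every row of a CSV
# and write an id,prediction file next to the input.
import csv

def score_file(model, in_path, out_path):
    """Score every row of in_path and write id,prediction rows to out_path."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "prediction"])
        for row in rows:
            # Everything except the id column is treated as a numeric feature.
            features = [float(v) for k, v in row.items() if k != "id"]
            writer.writerow([row["id"], model.predict([features])[0]])
</code>

On a batch endpoint, Azure ML hands the script mini-batches of input files and collects the outputs for you; the per-row logic stays the same.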
| + | |||
| + | === Example Use Cases: === | ||
| + | |||
| + | * Score all users nightly to update risk profiles | ||
| + | * Predict part failures across all equipment in a factory | ||
| + | * Run loan approval predictions across pending applications | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🧪 3. Evaluate and Explain the Model ==== | ||
| + | |||
| + | === Purpose: === | ||
| + | |||
| + | Ensure the model is fair, explainable, | ||
| + | |||
| + | === Tools: === | ||
| + | |||
| + | * **Responsible AI Dashboard** for fairness, explanation, | ||
| + | * **SHAP or LIME** for feature importance | ||
| + | * **Model metrics dashboards** for precision, recall, ROC, etc. | ||
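To make the dashboard metrics concrete, here is a hand-rolled sketch of precision, recall, and ROC AUC using only the standard library; in practice you would use ''sklearn.metrics'' rather than this:

<code python>
# Precision/recall from hard predictions, and ROC AUC from scores.
# AUC is computed as the probability that a random positive example
# is ranked above a random negative one (ties count half).

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
</code>

Tuning a decision threshold (the last use case below) is exactly a trade between these two numbers: raising the threshold typically raises precision and lowers recall.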
| + | |||
| + | === Example Use Cases: === | ||
| + | |||
| + | * Validate that your loan approval model isn’t biased against a demographic group | ||
| + | * Provide per-prediction feature attributions for compliance | ||
| + | * Tune decision thresholds based on business objectives | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🔐 4. Embed the Model in a Business Workflow ==== | ||
| + | |||
| + | === Purpose: === | ||
| + | |||
| + | Integrate predictions into real-time or batch operational systems to drive action. | ||
| + | |||
| + | === Integration Options: === | ||
| + | |||
| + | * Azure Functions or Logic Apps (real-time triggers) | ||
| + | * Azure Data Factory or Synapse pipelines (batch workflows) | ||
| + | * Event Grid / Event Hub for prediction-driven messaging | ||
| + | |||
| + | === Example Use Cases: === | ||
| + | |||
| + | * Auto-assign support tickets based on urgency prediction | ||
| + | * Escalate flagged transactions to fraud review team | ||
| + | * Enqueue predicted high-risk patients into care follow-up workflow | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🛡 5. Monitor and Manage the Model in Production ==== | ||
| + | |||
| + | === Purpose: === | ||
| + | |||
| + | Ensure the model performs well over time as real-world data changes. | ||
| + | |||
| + | === Actions: === | ||
| + | |||
| + | * Monitor prediction drift and data quality with **Azure ML Data Monitor** | ||
| + | * Set up retraining pipelines if performance drops | ||
| + | * Use **MLflow** or Azure model registry to version models and manage lifecycles | ||
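One common drift score behind this kind of monitoring is the Population Stability Index (PSI), which compares a feature's baseline distribution with its current one. This is an illustrative standalone sketch, not Azure ML Data Monitor's actual implementation; the 0.2 cut-off is a widely used rule of thumb, not a standard:

<code python>
# Population Stability Index over fixed, half-open bins [edge_i, edge_{i+1}).
# PSI near 0 means the distributions match; values above ~0.2 are often
# treated as significant drift.
import math

def psi(baseline, current, edges):
    """PSI between two samples of one feature, using shared bin edges."""
    def bin_fractions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        total = max(len(values), 1)
        # Floor each fraction to avoid log(0) on empty bins.
        return [max(c / total, 1e-6) for c in counts]
    b, c = bin_fractions(baseline), bin_fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
</code>

A retraining pipeline can then be triggered when the PSI of key features (or of the prediction scores themselves) crosses the agreed threshold.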
| + | |||
| + | === Example Use Cases: === | ||
| + | |||
| + | * Detect concept drift in customer behavior post-promotion | ||
| + | * Auto-retrain recommendation model every 2 weeks | ||
| + | * Compare performance of two deployed model versions (A/B testing) | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🔁 6. Retrain or Fine-Tune the Model ==== | ||
| + | |||
| + | === Purpose: === | ||
| + | |||
| + | Keep the model up-to-date with fresh data, domain changes, or new features. | ||
| + | |||
| + | === Strategies: === | ||
| + | |||
| + | * Use a scheduled pipeline to retrain with new labeled data | ||
| + | * Add new features or tune hyperparameters | ||
| + | * Replace the model with an upgraded architecture (e.g., switching from logistic regression to XGBoost) | ||
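Swapping architectures is easiest when the model type is chosen by name rather than hard-coded. A small selector sketch — the factory names are illustrative, and it assumes scikit-learn is installed (a gradient-boosting classifier stands in for XGBoost to avoid an extra dependency):

<code python>
# Choose the model architecture by name so that upgrading it is a
# configuration change rather than a code change in train.py.
def make_model(name, **params):
    # Imports are deferred so unused branches cost nothing.
    if name == "logistic_regression":
        from sklearn.linear_model import LogisticRegression
        return LogisticRegression(**params)
    if name == "gradient_boosting":
        from sklearn.ensemble import GradientBoostingClassifier
        return GradientBoostingClassifier(**params)
    raise ValueError(f"unknown model type: {name}")
</code>

With this in place, the scheduled retraining pipeline can pass the model name as a pipeline parameter, and an architecture upgrade becomes a one-line config edit.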
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🧠 Real-World Examples by Industry ==== | ||
| + | |||
| + | ^ Industry ^ Use of '' | ||
| + | | Finance | Credit risk scoring, fraud detection | | ||
| + | | Retail | Product recommendation, | ||
| + | | Healthcare | Diagnosis support, patient readmission risk | | ||
| + | | Manufacturing | Predictive maintenance, | ||
| + | | Logistics | Delivery delay prediction, route optimization | | ||
| + | | Cybersecurity | Threat classification, | ||
| + | |||
| + | ===== Reusability ===== | ||
| + | |||
| + | ===== ✅ Reusable As-Is If: ===== | ||
| + | |||
| + | You are solving **the same kind of problem** (e.g., binary classification using logistic regression) and the following stay consistent: | ||
| + | |||
| + | * **Input data structure**: | ||
| + | * '' | ||
| + | * **Preprocessing logic**: You still just sum '' | ||
| + | * **Model type**: You're still using a '' | ||
| + | * **Output format**: You expect the model to be saved as '' | ||
| + | |||
| + | ==== In this case: ==== | ||
| + | |||
| + | ✅ You only need to change the **CSV file** and re-register it as a new version of '' | ||
| + | |||
| + | <code -> | ||
| + | pythonCopyEditinput_data=Input(type=AssetTypes.URI_FILE, | ||
| + | </ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🔄 Requires Changes If: ==== | ||
| + | |||
| + | Your pipeline needs to be adapted for a different data structure or task. Here’s when you'd need to modify the scripts: | ||
| + | |||
| + | === 🔁 If your data columns change: === | ||
| + | |||
| + | * You'll need to update: | ||
| + | * '' | ||
| + | * '' | ||
| + | * Possibly retrain on different targets (multi-class, | ||
| + | |||
| + | === 🔁 If your model type changes: === | ||
| + | |||
| + | * If you switch from '' | ||
| + | * Update '' | ||
| + | * Possibly adjust hyperparameters and training logic | ||
| + | |||
| + | === 🔁 If your pipeline steps change: === | ||
| + | |||
| + | * Want to add validation? | ||
| + | * Want to split data into train/test? | ||
| + | * Want to evaluate model metrics? | ||
| + | * You’ll need new component scripts and return more outputs (e.g., '' | ||
| + | |||
| + | === 🔁 If your deployment format changes: === | ||
| + | |||
| + | * If your consumers expect ONNX or TensorFlow SavedModel instead of '' | ||
| + | * Serialize the model differently | ||
| + | * Possibly update the pipeline to convert formats | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🧰 To Make it Highly Reusable: ==== | ||
| + | |||
| + | You can make the pipeline truly production-grade and reusable by: | ||
| + | |||
| + | ^ Feature ^ How to Do It ^ | ||
| + | | Parametrize column names | Add '' | ||
| + | | Generalize preprocessing | Add preprocessing config file or flags | | ||
| + | | Model selector | Add '' | ||
| + | | Versioned output naming | Return '' | ||
| + | | Dynamic data input | Register new data via CLI, UI, or pipeline parameter | | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== ✅ Summary ==== | ||
| + | |||
| + | ^ Scenario ^ Reusable? ^ What to Change ^ | ||
| + | | Same data structure and model type | ✅ | Just update the input dataset version | | ||
| + | | Same structure, different model | 🔁 | Modify '' | ||
| + | | Different data columns or prediction target | 🔁 | Modify '' | ||
| + | | More complex workflow (e.g., evaluation, deployment) | 🔁 | Add steps and new component scripts | | ||
| + | |||
| + | ===== How to Deploy Model ===== | ||
| + | |||
| + | ==== ✅ High-Level Overview ==== | ||
| + | |||
| + | - **Prepare Scoring Script ('' | ||
| + | - **Create Inference Environment** | ||
| + | - **Register the Trained Model** | ||
| + | - **Create an Online Endpoint** | ||
| + | - **Deploy the Model to the Endpoint** | ||
| + | - **Test the Deployed Service** | ||
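Step 6 can be sketched with the standard library: build the authenticated JSON request and POST it to the endpoint's scoring URI. The URI and key are placeholders you would copy from the endpoint's details page, and the payload shape assumes the ''{"data": [...]}'' contract used by the scoring script:

<code python>
# Sketch of calling a deployed online endpoint. The scoring URI and key
# below are placeholders, not real values.
import json
import urllib.request

def build_request(scoring_uri, api_key, rows):
    """Assemble the authenticated scoring request for a batch of feature rows."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,  # presence of data makes this a POST
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def call_endpoint(req):
    """Send the request and decode the endpoint's JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
</code>

Usage would look like ''call_endpoint(build_request(scoring_uri, api_key, [[0.1, 0.2, 0.3]]))'' once the endpoint exists.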
| + | |||
| + | ====== Errors Encountered During Session ====== | ||
| + | |||
| + | ===== 🔁 Environment Definition Issue ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Dataset Reference Issue ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | When submitting the pipeline, Azure ML failed to resolve the dataset because the dataset path was given as ''" | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The dataset path was updated to use the full Azure ML URI syntax: ''" | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Output Not Persisted ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | Even though the '' | ||
| + | |||
| + | ==== ✅ Root Cause: ==== | ||
| + | |||
| + | The output directory was not explicitly registered in the pipeline job, and Azure ML silently discarded it. | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The pipeline job was updated to explicitly register '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Missing Script Execution ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ==== ✅ Root Cause: ==== | ||
| + | |||
| + | The wrong '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The correct file (''/ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Scoping Error in train.py ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | Print statements accessing '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The logging and '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Model Download Error ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | An attempt to use the '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Silent Step Failure Due to Typo ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The dataset path was mistyped as ''" | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The typo was corrected, and the step executed normally once a valid dataset path was provided. | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== ✅ Final Outcome ===== | ||
| + | |||
| + | After resolving these issues: | ||
| + | |||
| + | * The pipeline executed end-to-end | ||
| + | * The model output was persisted and downloadable | ||
| + | * Logs confirmed proper script execution | ||
| + | * The deployment strategy was outlined, ready for API-based use | ||
| [[ai_knowledge|AI Knowledge]] | [[ai_knowledge|AI Knowledge]] | ||