wiki:ai:ml-pipeline-test [2025/06/06 13:46] (current) ddehamer
print("Files in output dir:", os.listdir(args.model_output))
</code>
  
  NOTE: The print statements at the end were for troubleshooting and should be removed for production runs.
ml_client.jobs.stream(pipeline_job.name)
</code>
  
  NOTE: This is run from the Notebook, not from a Python script. At least not without changes.
| Logistics | Delivery delay prediction, route optimization |
| Cybersecurity | Threat classification, anomaly detection |

===== Reusability =====

==== ✅ Reusable As-Is If: ====

You are solving **the same kind of problem** (e.g., binary classification using logistic regression) and the following stay consistent:

  * **Input data structure**: New datasets have the same column names:
    * ''feature1'', ''feature2'', ''label''
  * **Preprocessing logic**: You still just sum ''feature1 + feature2'' to create ''feature_sum''
  * **Model type**: You're still using a ''LogisticRegression'' model from scikit-learn
  * **Output format**: You expect the model to be saved as ''model.joblib''

=== In this case: ===

✅ You only need to change the **CSV file** and re-register it as a new version of ''sample-csv-data'', then update the pipeline call with the new version:
<code python>
input_data = Input(type=AssetTypes.URI_FILE, path="azureml:sample-csv-data:5")
</code>

----

==== 🔄 Requires Changes If: ====

Your pipeline needs to be adapted for a different data structure or task. Here's when you'd need to modify the scripts:

=== 🔁 If your data columns change: ===

  * You'll need to update:
    * ''prep.py'' to transform the new columns appropriately
    * ''train.py'' to use the correct feature and label columns
    * Possibly retrain on different targets (multi-class, regression, etc.)

=== 🔁 If your model type changes: ===

  * If you switch from ''LogisticRegression'' to ''XGBoost'', ''RandomForest'', or a neural network:
    * Update ''train.py'' to import and instantiate the new model
    * Possibly adjust hyperparameters and training logic

=== 🔁 If your pipeline steps change: ===

  * Want to add validation?
  * Want to split data into train/test?
  * Want to evaluate model metrics?
    * You'll need new component scripts that return more outputs (e.g., ''metrics.json'')
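As an illustration, a hypothetical evaluation component could look like the sketch below. The file name ''evaluate.py'', the accuracy metric, and the demo values are assumptions; only the ''metrics.json'' output comes from the text above.

```python
# evaluate.py -- sketch of an added evaluation step that scores predictions
# and writes metrics.json as an extra pipeline output (illustrative only).
import json

def evaluate(y_true, y_pred):
    """Compute a simple classification metric without extra dependencies."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "n_samples": len(y_true)}

def write_metrics(metrics, path="metrics.json"):
    """Persist metrics where the pipeline can surface them as an output."""
    with open(path, "w") as f:
        json.dump(metrics, f, indent=2)

metrics = evaluate([0, 1, 1, 0], [0, 1, 0, 0])
write_metrics(metrics)  # accuracy here is 0.75 (3 of 4 correct)
```

In a real pipeline this script would take the model and a held-out dataset as inputs and write ''metrics.json'' to a registered output folder.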

=== 🔁 If your deployment format changes: ===

  * If your consumers expect ONNX or a TensorFlow SavedModel instead of ''joblib'', you'll need to:
    * Serialize the model differently
    * Possibly update the pipeline to convert formats

----

==== 🧰 To Make It Highly Reusable: ====

You can make the pipeline truly production-grade and reusable by:

^ Feature ^ How to Do It ^
| Parameterize column names | Add ''--feature_cols'' and ''--label_col'' arguments |
| Generalize preprocessing | Add a preprocessing config file or flags |
| Model selector | Add a ''--model_type'' argument (''logistic'', ''xgb'', etc.) |
| Versioned output naming | Return ''model_output'' with model name + timestamp |
| Dynamic data input | Register new data via CLI, UI, or pipeline parameter |
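The first three rows of the table could be sketched with ''argparse'' as below. The argument names mirror the table; the defaults and the registry mapping are assumptions, not code from the original pipeline.

```python
# Hypothetical CLI for a generalized train.py: column names and model type
# become pipeline parameters instead of hard-coded values.
import argparse

# Illustrative mapping from --model_type values to model constructors.
MODEL_REGISTRY = {
    "logistic": "sklearn.linear_model.LogisticRegression",
    "xgb": "xgboost.XGBClassifier",
}

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--feature_cols", default="feature1,feature2",
                        help="comma-separated feature column names")
    parser.add_argument("--label_col", default="label")
    parser.add_argument("--model_type", choices=sorted(MODEL_REGISTRY),
                        default="logistic")
    args = parser.parse_args(argv)
    args.feature_cols = args.feature_cols.split(",")
    return args

args = parse_args(["--feature_cols", "f1,f2,f3", "--model_type", "xgb"])
# args.feature_cols == ["f1", "f2", "f3"]; args.model_type == "xgb"
```

Because ''choices'' is set, an unknown ''--model_type'' fails fast at parse time instead of deep inside training.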

----

==== ✅ Summary ====

^ Scenario ^ Reusable? ^ What to Change ^
| Same data structure and model type | ✅ | Just update the input dataset version |
| Same structure, different model | 🔁 | Modify ''train.py'' only |
| Different data columns or prediction target | 🔁 | Modify ''prep.py'' and ''train.py'' |
| More complex workflow (e.g., evaluation, deployment) | 🔁 | Add steps and new component scripts |

===== How to Deploy the Model =====

==== ✅ High-Level Overview ====

  - **Prepare Scoring Script (''score.py'')**
  - **Create Inference Environment**
  - **Register the Trained Model**
  - **Create an Online Endpoint**
  - **Deploy the Model to the Endpoint**
  - **Test the Deployed Service**

====== Errors Encountered During Session ======

===== 🔁 Environment Definition Issue =====

==== ❌ Problem: ====

The ''conda_file'' was passed as a multi-line string instead of a dictionary. Azure ML interpreted it as a file path, resulting in a ''FileNotFoundError''.

==== ✅ Solution: ====

The ''conda_file'' was rewritten as a Python dictionary inside the ''Environment()'' constructor, which Azure ML correctly interpreted and registered.
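For illustration, here is a minimal conda specification expressed as a dictionary; the channel, package list, and base image are assumptions, not the session's exact environment.

```python
# An inline conda spec: passing a dict like this as conda_file avoids the
# "interpreted as a file path" failure described above.
conda_spec = {
    "name": "ml-pipeline-env",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["scikit-learn", "pandas", "joblib"]},
    ],
}

# Hypothetical usage with the v2 SDK (not executed here):
# env = Environment(
#     name="ml-pipeline-env",
#     image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
#     conda_file=conda_spec,
# )
```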

----

===== 🔁 Dataset Reference Issue =====

==== ❌ Problem: ====

When submitting the pipeline, Azure ML failed to resolve the dataset because the dataset path was given as ''"sample-csv-data:4"'' without the required ''azureml:'' prefix. This caused a ''ValidationException'' about a missing asset version.

==== ✅ Solution: ====

The dataset path was updated to use the full Azure ML URI syntax: ''"azureml:sample-csv-data:4"'', resolving the issue.

----

===== 🔁 Output Not Persisted =====

==== ❌ Problem: ====

Even though the ''train.py'' script wrote a ''model.joblib'' file, Azure ML did not surface the output in the UI or download tools.

==== ✅ Root Cause: ====

The output directory was not explicitly registered in the pipeline job, so Azure ML silently discarded it.

==== ✅ Solution: ====

The pipeline job was updated to explicitly register ''model_output'' using ''pipeline_job.outputs[...]''. Additionally, a unique name was generated for the training component to avoid using cached versions that might not include the output.

----

===== 🔁 Missing Script Execution =====

==== ❌ Problem: ====

The ''train.py'' file executed but produced no logs or output.

==== ✅ Root Cause: ====

The wrong ''train.py'' file (outside of the ''/src'' folder) was being edited, so Azure ML was executing an outdated or incorrect version.

==== ✅ Solution: ====

The correct file (''/src/train.py'') was updated with ''print()'' statements to confirm execution. After correcting this, output logs began appearing as expected.

----

===== 🔁 Scoping Error in train.py =====

==== ❌ Problem: ====

Print statements accessing ''args.model_output'' were placed outside the ''main()'' function, resulting in a ''NameError''.

==== ✅ Solution: ====

The logging and ''print()'' statements were moved inside the ''main()'' function, ensuring access to the ''args'' object.
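The corrected structure looks roughly like this; the argument name follows the session, while the ''argv'' parameter is an added convenience so the function can be exercised without real command-line arguments.

```python
# Everything that touches `args` lives inside main(), so nothing raises a
# NameError at import time.
import argparse
import os

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_output", required=True)
    args = parser.parse_args(argv)
    # ... training and model saving would happen here ...
    print("Files in output dir:", os.listdir(args.model_output))
    return args

# In the real script, call it under a guard:
# if __name__ == "__main__":
#     main()
```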

----

===== 🔁 Model Download Error =====

==== ❌ Problem: ====

An attempt to use the ''overwrite=True'' parameter in ''ml_client.jobs.download()'' caused a ''TypeError'' because that parameter is unsupported in the Azure ML v2 SDK.

==== ✅ Solution: ====

The ''overwrite'' parameter was removed, and when needed, the local folder was deleted manually before calling ''download()'' again.
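The manual workaround can be sketched as below; the folder name is illustrative, and the commented download call assumes an authenticated ''MLClient'' named ''ml_client''.

```python
# Manual "overwrite": delete the target folder before re-downloading,
# since the v2 SDK's download() has no overwrite flag.
import os
import shutil

download_dir = "./downloaded_model"  # illustrative path
if os.path.exists(download_dir):
    shutil.rmtree(download_dir)

# Hypothetical re-download (not executed here):
# ml_client.jobs.download(name=pipeline_job.name,
#                         download_path=download_dir,
#                         output_name="model_output")
```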

----

===== 🔁 Silent Step Failure Due to Typo =====

==== ❌ Problem: ====

The dataset path was mistyped as ''"asureml:"'' instead of ''"azureml:"'', causing the ''prep_step'' to fail silently with no user-code execution.

==== ✅ Solution: ====

The typo was corrected, and the step executed normally once a valid dataset path was provided.
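A tiny guard like the following (purely illustrative, not part of the original scripts) would have caught the typo before submission instead of failing silently:

```python
# Fail fast if a dataset path lacks the required azureml: prefix.
def check_asset_uri(path):
    if not path.startswith("azureml:"):
        raise ValueError(f"dataset path must start with 'azureml:', got {path!r}")
    return path

check_asset_uri("azureml:sample-csv-data:4")    # passes
# check_asset_uri("asureml:sample-csv-data:4")  # would raise ValueError
```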

----

===== ✅ Final Outcome =====

After resolving these issues:

  * The pipeline executed end-to-end
  * The model output was persisted and downloadable
  * Logs confirmed proper script execution
  * The deployment strategy was outlined, ready for API-based use
  
[[ai_knowledge|AI Knowledge]]
  
  