NOTE: The print statements at the end were for troubleshooting and shouldn't be left in the final script.
<code python>
ml_client.jobs.stream(pipeline_job.name)
</code>
| - | |||
| - | Download | ||
NOTE: This is run from the notebook, not from a Python script.
| Logistics | Delivery delay prediction, route optimization |
| Cybersecurity | Threat classification |
| + | |||
| + | ===== Reusability ===== | ||
| + | |||
| + | ===== ✅ Reusable As-Is If: ===== | ||
| + | |||
| + | You are solving **the same kind of problem** (e.g., binary classification using logistic regression) and the following stay consistent: | ||
| + | |||
| + | * **Input data structure**: | ||
| + | * '' | ||
| + | * **Preprocessing logic**: You still just sum '' | ||
| + | * **Model type**: You're still using a '' | ||
| + | * **Output format**: You expect the model to be saved as '' | ||
| + | |||
| + | ==== In this case: ==== | ||
| + | |||
| + | ✅ You only need to change the **CSV file** and re-register it as a new version of '' | ||
| + | |||
| + | <code -> | ||
| + | pythonCopyEditinput_data=Input(type=AssetTypes.URI_FILE, | ||
| + | </ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🔄 Requires Changes If: ==== | ||
| + | |||
| + | Your pipeline needs to be adapted for a different data structure or task. Here’s when you'd need to modify the scripts: | ||
| + | |||
| + | === 🔁 If your data columns change: === | ||
| + | |||
| + | * You'll need to update: | ||
| + | * '' | ||
| + | * '' | ||
| + | * Possibly retrain on different targets (multi-class, | ||
| + | |||
| + | === 🔁 If your model type changes: === | ||
| + | |||
| + | * If you switch from '' | ||
| + | * Update '' | ||
| + | * Possibly adjust hyperparameters and training logic | ||
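
One low-effort way to make the model type swappable is a small registry that ''train.py'' can drive from a command-line argument. This is a sketch, not the original script: the registry name, its entries, and ''build_model'' are all illustrative.

```python
import importlib

# Illustrative registry: map a short name to (module, class) for each
# estimator the pipeline should support.
MODEL_REGISTRY = {
    "logreg": ("sklearn.linear_model", "LogisticRegression"),
    "random_forest": ("sklearn.ensemble", "RandomForestClassifier"),
}

def build_model(name, **hyperparams):
    """Instantiate the estimator registered under `name`."""
    module_name, class_name = MODEL_REGISTRY[name]
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**hyperparams)
```

''train.py'' would then accept something like ''%%--model-type logreg%%'' and call ''build_model'' instead of hard-coding the estimator, so adding a new model is one registry line rather than a script rewrite.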
| + | |||
| + | === 🔁 If your pipeline steps change: === | ||
| + | |||
| + | * Want to add validation? | ||
| + | * Want to split data into train/test? | ||
| + | * Want to evaluate model metrics? | ||
| + | * You’ll need new component scripts and return more outputs (e.g., '' | ||
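
As an illustration of what an added split/evaluation step might compute, here is a pure-Python sketch; the 80/20 split ratio, the accuracy metric, and the JSON metrics file are arbitrary choices for the example, not taken from the original pipeline.

```python
import json
import random

def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle indices and hold out a fraction of rows for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def write_metrics(path, y_true, y_pred):
    """An evaluation component would persist its metric as a pipeline output."""
    with open(path, "w") as f:
        json.dump({"accuracy": accuracy(y_true, y_pred)}, f)
```

In a real pipeline these would live in a new component script whose metrics file is declared as an additional output of the job.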
| + | |||
| + | === 🔁 If your deployment format changes: === | ||
| + | |||
| + | * If your consumers expect ONNX or TensorFlow SavedModel instead of '' | ||
| + | * Serialize the model differently | ||
| + | * Possibly update the pipeline to convert formats | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ==== 🧰 To Make it Highly Reusable: ==== | ||
| + | |||
| + | You can make the pipeline truly production-grade and reusable by: | ||
| + | |||
| + | ^ Feature ^ How to Do It ^ | ||
| + | | Parametrize column names | Add '' | ||
| + | | Generalize preprocessing | Add preprocessing config file or flags | | ||
| + | | Model selector | Add '' | ||
| + | | Versioned output naming | Return '' | ||
| + | | Dynamic data input | Register new data via CLI, UI, or pipeline parameter | | ||
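
A minimal sketch of the first and third rows of the table, using stdlib ''argparse''; the argument names below are illustrative, not the ones in the actual scripts:

```python
import argparse

def parse_args(argv=None):
    """Command-line interface a generalized training script might expose."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--feature-columns", nargs="+", required=True,
                        help="Columns to use as model inputs")
    parser.add_argument("--target-column", required=True,
                        help="Column to predict")
    parser.add_argument("--model-type", default="logreg",
                        help="Which registered estimator to train")
    return parser.parse_args(argv)
```

The pipeline definition would then forward these as component inputs, so reusing the pipeline on new data becomes a matter of changing parameters rather than editing scripts.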
| + | |||
| + | ---- | ||
| + | |||
| + | ==== ✅ Summary ==== | ||
| + | |||
| + | ^ Scenario ^ Reusable? ^ What to Change ^ | ||
| + | | Same data structure and model type | ✅ | Just update the input dataset version | | ||
| + | | Same structure, different model | 🔁 | Modify '' | ||
| + | | Different data columns or prediction target | 🔁 | Modify '' | ||
| + | | More complex workflow (e.g., evaluation, deployment) | 🔁 | Add steps and new component scripts | | ||
| + | |||
| + | ===== How to Deploy Model ===== | ||
| + | |||
| + | ==== ✅ High-Level Overview ==== | ||
| + | |||
| + | - **Prepare Scoring Script ('' | ||
| + | - **Create Inference Environment** | ||
| + | - **Register the Trained Model** | ||
| + | - **Create an Online Endpoint** | ||
| + | - **Deploy the Model to the Endpoint** | ||
| + | - **Test the Deployed Service** | ||
| + | |||
| + | ====== Errors Encountered During Session ====== | ||
| + | |||
| + | ===== 🔁 Environment Definition Issue ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Dataset Reference Issue ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | When submitting the pipeline, Azure ML failed to resolve the dataset because the dataset path was given as ''" | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The dataset path was updated to use the full Azure ML URI syntax: ''" | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Output Not Persisted ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | Even though the '' | ||
| + | |||
| + | ==== ✅ Root Cause: ==== | ||
| + | |||
| + | The output directory was not explicitly registered in the pipeline job, and Azure ML silently discarded it. | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The pipeline job was updated to explicitly register '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Missing Script Execution ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ==== ✅ Root Cause: ==== | ||
| + | |||
| + | The wrong '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The correct file (''/ | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Scoping Error in train.py ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | Print statements accessing '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The logging and '' | ||
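
The general shape of this bug, as an illustrative reconstruction rather than the original ''train.py'' code:

```python
def train():
    model = "fitted-model"          # local to train(); invisible outside
    print("inside train:", model)   # OK: same scope as the assignment
    return model

# print("outside:", model)          # NameError: 'model' is not defined here

model = train()                     # bind the result at module scope first
print("outside train:", model)      # now OK
```

The fix is simply to log inside the function that owns the variable, or to bind the function's return value before referencing it.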
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Model Download Error ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | An attempt to use the '' | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The '' | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== 🔁 Silent Step Failure Due to Typo ===== | ||
| + | |||
| + | ==== ❌ Problem: ==== | ||
| + | |||
| + | The dataset path was mistyped as ''" | ||
| + | |||
| + | ==== ✅ Solution: ==== | ||
| + | |||
| + | The typo was corrected, and the step executed normally once a valid dataset path was provided. | ||
| + | |||
| + | ---- | ||
| + | |||
| + | ===== ✅ Final Outcome ===== | ||
| + | |||
| + | After resolving these issues: | ||
| + | |||
| + | * The pipeline executed end-to-end | ||
| + | * The model output was persisted and downloadable | ||
| + | * Logs confirmed proper script execution | ||
| + | * The deployment strategy was outlined, ready for API-based use | ||
| [[ai_knowledge|AI Knowledge]] | [[ai_knowledge|AI Knowledge]] | ||