wiki:ai:ml-pipeline-test [2025/06/06 13:29] ddehamer
wiki:ai:ml-pipeline-test [2025/06/06 13:46] (current) ddehamer
print("Files in output dir:", os.listdir(args.model_output))
</code>
  
  NOTE: The print statements at the end were added for troubleshooting and should be removed for production runs.
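
Rather than deleting the troubleshooting output before a production run, one option (not what the original script does) is to send it through Python's standard ''logging'' module at DEBUG level, so it only appears when a run enables debug logging. A minimal sketch; ''log_output_dir'' and the ''"train"'' logger name are hypothetical:

<code python>
import logging
import os

# Hypothetical logger name; its level is set by whoever configures logging.
logger = logging.getLogger("train")


def log_output_dir(model_output: str) -> list:
    """Return the sorted file names in `model_output`, logging them at DEBUG level."""
    files = sorted(os.listdir(model_output))
    # Only emitted when DEBUG is enabled, e.g. logging.basicConfig(level=logging.DEBUG)
    # for a troubleshooting run; a run configured at INFO or above stays quiet.
    logger.debug("Files in output dir %s: %s", model_output, files)
    return files
</code>

In ''train.py'' the call would sit at the end of ''main()'', where ''args'' is in scope: ''log_output_dir(args.model_output)''.
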
ml_client.jobs.stream(pipeline_job.name)
</code>
  
  NOTE: This is run from the notebook, not from a Python script; running it as a standalone script would require changes.
  - **Deploy the Model to the Endpoint**
  - **Test the Deployed Service**

====== Errors Encountered During Session ======

===== 🔁 Environment Definition Issue =====

==== ❌ Problem: ====

The ''conda_file'' was passed as a multi-line string instead of a dictionary. Azure ML interpreted it as a file path, resulting in a ''FileNotFoundError''.

==== ✅ Solution: ====

The ''conda_file'' was rewritten as a Python dictionary inside the ''Environment()'' constructor, which Azure ML correctly interpreted and registered.
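
A sketch of the dictionary form. The structure mirrors a standard conda ''environment.yml''; the environment name, Python version, channel, and package list are placeholders, not the values used in this session. The Azure ML call is left as a comment because it needs a workspace connection:

<code python>
def build_conda_spec(python_version, pip_packages, name="train-env"):
    """Return a conda environment spec as a dict (same shape as environment.yml)."""
    return {
        "name": name,
        "channels": ["conda-forge"],
        "dependencies": [
            f"python={python_version}",
            "pip",
            {"pip": list(pip_packages)},
        ],
    }


# Placeholder Python version and packages.
conda_spec = build_conda_spec("3.10", ["scikit-learn", "pandas", "joblib"])

# Passing the dict (not a YAML string) keeps Azure ML from treating it as a path:
# from azure.ai.ml.entities import Environment
# env = Environment(
#     name="train-env",
#     image="<base-image>",  # placeholder: use the base image from your setup
#     conda_file=conda_spec,
# )
# ml_client.environments.create_or_update(env)
</code>
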

----

===== 🔁 Dataset Reference Issue =====

==== ❌ Problem: ====

When submitting the pipeline, Azure ML failed to resolve the dataset because the dataset path was given as ''"sample-csv-data:4"'' without the required ''azureml:'' prefix. This caused a ''ValidationException'' about a missing asset version.

==== ✅ Solution: ====

The dataset path was updated to use the full Azure ML URI syntax: ''"azureml:sample-csv-data:4"'', resolving the issue.
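
A pre-submission check along these lines would catch a missing prefix before the job is sent. This is a hypothetical helper, not part of the Azure ML SDK, and its pattern is deliberately loose: it only enforces the ''azureml:<name>:<version>'' shape used here, not Azure ML's full naming rules, and it would reject other reference forms the SDK may accept.

<code python>
import re

# Loose pattern for the name:version form only (an assumption, not the
# official grammar): "azureml:" + name + ":" + version, no extra colons.
_ASSET_URI = re.compile(r"^azureml:(?P<name>[^:\s]+):(?P<version>[^:\s]+)$")


def parse_asset_uri(path):
    """Split 'azureml:<name>:<version>' into (name, version), else raise ValueError."""
    match = _ASSET_URI.match(path)
    if match is None:
        raise ValueError(f"{path!r} is not of the form 'azureml:<name>:<version>'")
    return match.group("name"), match.group("version")
</code>

For example, ''parse_asset_uri("azureml:sample-csv-data:4")'' returns ''("sample-csv-data", "4")'', while ''"sample-csv-data:4"'' raises ''ValueError''.
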

----

===== 🔁 Output Not Persisted =====

==== ❌ Problem: ====

Even though the ''train.py'' script wrote a ''model.joblib'' file, Azure ML did not surface the output in the UI or download tools.

==== ✅ Root Cause: ====

The output directory was not explicitly registered in the pipeline job, and Azure ML silently discarded it.

==== ✅ Solution: ====

The pipeline job was updated to explicitly register ''model_output'' using ''pipeline_job.outputs[...]''. Additionally, a unique name was generated for the training component to avoid using cached versions that might not include the output.
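
The unique-name step could look like the sketch below. ''unique_component_name'' and the ''train_model'' base name are hypothetical, and how the name is attached to the component depends on how the component is defined in the notebook, so that part is left as a comment.

<code python>
import uuid


def unique_component_name(base="train_model"):
    """Append a random 8-hex-digit suffix so every submission uses a new name."""
    return f"{base}_{uuid.uuid4().hex[:8]}"


component_name = unique_component_name()
# Hypothetical: use component_name as the name when defining the training
# component, adapting to however the component is created in the notebook.
</code>

A random suffix forces a new name on every run, including runs where nothing changed. Under the same reasoning, deriving the suffix from a hash of the ''/src'' folder would change the name only when the code changes.
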

----

===== 🔁 Missing Script Execution =====

==== ❌ Problem: ====

The ''train.py'' file executed but produced no logs or output.

==== ✅ Root Cause: ====

The wrong ''train.py'' file (outside of the ''/src'' folder) was being edited, and Azure ML was executing an outdated or incorrect version.

==== ✅ Solution: ====

The correct file (''/src/train.py'') was updated with ''print()'' statements to confirm execution. After correcting this, output logs began appearing as expected.
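
Printing which file is actually running makes this kind of mix-up visible in the job logs. A minimal sketch with hypothetical helper names: print the resolved path and a short SHA-256 of the running script from inside ''main()'', then compare it with the same fingerprint computed for the local ''src/train.py'' in the notebook. Different hashes mean Azure ML ran a different copy.

<code python>
import hashlib
from pathlib import Path


def fingerprint_bytes(data):
    """Short SHA-256 prefix, enough to tell two versions of a file apart."""
    return hashlib.sha256(data).hexdigest()[:12]


def script_fingerprint(path):
    """Return (absolute path, short hash) for the file at `path`."""
    resolved = Path(path).resolve()
    return str(resolved), fingerprint_bytes(resolved.read_bytes())

# Hypothetical usage, at the start of main() in src/train.py:
#     print("Running %s (sha256 %s)" % script_fingerprint(__file__))
# and in the notebook before submitting:
#     print(script_fingerprint("src/train.py"))
</code>
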

----

===== 🔁 Scoping Error in train.py =====

==== ❌ Problem: ====

Print statements accessing ''args.model_output'' were placed outside the ''main()'' function, resulting in a ''NameError''.

==== ✅ Solution: ====

The logging and ''print()'' statements were moved inside the ''main()'' function, ensuring access to the ''args'' object.
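
A minimal skeleton of that structure, with ''args'' created and used only inside ''main()''. It is a sketch, not the session's ''train.py'': the model is a placeholder dict, and it is saved with the standard-library ''pickle'' as ''model.pkl'' so the example runs without ''joblib'' (the real script writes ''model.joblib'').

<code python>
import argparse
import os
import pickle


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train and save a model.")
    # Assumed: the component's command passes the output folder path here.
    parser.add_argument("--model_output", type=str, required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)  # `args` is local to main()

    model = {"placeholder": True}  # stand-in for the trained estimator
    os.makedirs(args.model_output, exist_ok=True)
    model_path = os.path.join(args.model_output, "model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # Diagnostics belong here, where `args` is in scope. The same print at
    # module level raises NameError: name 'args' is not defined.
    print("Files in output dir:", os.listdir(args.model_output))
    return model_path


# In src/train.py the module ends with the usual entry point:
# if __name__ == "__main__":
#     main()
</code>
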

----

===== 🔁 Model Download Error =====

==== ❌ Problem: ====

An attempt to use the ''overwrite=True'' parameter in ''ml_client.jobs.download()'' caused a ''TypeError'' because that parameter is unsupported in the Azure ML v2 SDK.

==== ✅ Solution: ====

The ''overwrite'' parameter was removed, and if needed, the local folder was deleted manually before calling ''download()'' again.
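
The manual cleanup can be scripted as below. ''clear_download_dir'' and the ''./downloaded_model'' folder are hypothetical; the ''download()'' call in the comment uses the v2 SDK's ''download_path'' and ''output_name'' keyword arguments, with ''model_output'' assumed to be the registered pipeline output.

<code python>
import shutil
from pathlib import Path


def clear_download_dir(path):
    """Remove the directory `path` and everything under it; return True if it existed."""
    target = Path(path)
    if not target.exists():
        return False
    shutil.rmtree(target)
    return True

# Assumed follow-up in the notebook:
# clear_download_dir("./downloaded_model")
# ml_client.jobs.download(
#     name=pipeline_job.name,
#     download_path="./downloaded_model",
#     output_name="model_output",
# )
</code>
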

----

===== 🔁 Silent Step Failure Due to Typo =====

==== ❌ Problem: ====

The dataset path was mistyped as ''"asureml:"'' instead of ''"azureml:"'', causing the ''prep_step'' to fail silently with no user-code execution.

==== ✅ Solution: ====

The typo was corrected, and the step executed normally once a valid dataset path was provided.
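
Because this failure produced no user-code logs, catching the typo on the client side is cheaper than diagnosing it afterwards. A hypothetical check using the standard library's ''difflib'' to flag a scheme that is close to, but not exactly, ''azureml'':

<code python>
import difflib

# Schemes this notebook expects; extend if other URI schemes are used.
KNOWN_SCHEMES = ["azureml"]


def check_scheme(path):
    """Raise ValueError if the part before the first ':' is a near-miss of a known scheme."""
    scheme, sep, _ = path.partition(":")
    if not sep or scheme in KNOWN_SCHEMES:
        return path  # no scheme, or an exact match
    close = difflib.get_close_matches(scheme, KNOWN_SCHEMES, n=1, cutoff=0.8)
    if close:
        raise ValueError(
            f"Unknown scheme {scheme!r} in {path!r}; did you mean {close[0]!r}?"
        )
    return path  # unrelated scheme, left for other checks
</code>

Paths with no colon, or with an unrelated prefix, pass through unchanged; a missing ''azureml:'' prefix needs a separate check.
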

----

===== ✅ Final Outcome =====

After resolving these issues:

  * The pipeline executed end-to-end
  * The model output was persisted and downloadable
  * Logs confirmed proper script execution
  * The deployment strategy was outlined, ready for API-based use
  
[[ai_knowledge|AI Knowledge]]
  
  
wiki/ai/ml-pipeline-test.1749216542.txt.gz · Last modified: by ddehamer