wiki:ai:cli-workspace-deployment-upload

Differences

This shows you the differences between two versions of the page.

wiki:ai:cli-workspace-deployment-upload [2025/06/04 15:46] ddehamer
wiki:ai:cli-workspace-deployment-upload [2025/06/04 15:51] (current) ddehamer
Line 252:
 In **Azure Machine Learning (Azure ML)**, datasets used for training can be in a variety of formats. The platform is flexible, but depending on how you plan to use the dataset (e.g., for AutoML, custom training, or pipelines), certain formats are more appropriate.

-==== ✅ Commonly Supported Formats ====
+=== ✅ Commonly Supported Formats ===

 ^ Format ^ Description ^ Notes ^
Line 277:
 ----

-==== ✅ Where to Store ====
+=== ✅ Where to Store ===

 Datasets must be accessible to Azure ML compute. Typical storage options:
Line 314:
 ----
  
-==== ✅ Option 1: Export a Table to CSV Using SQL (INTO OUTFILE) ====
+=== ✅ Option 1: Export a Table to CSV Using SQL (INTO OUTFILE) ===

 <code ->
Line 333:
 ----

-==== ✅ Option 2: Use mysql CLI and sed/awk (CSV-like output) ====
+=== ✅ Option 2: Use mysql CLI and sed/awk (CSV-like output) ===

 <code ->
Line 344:
 ----

-==== ✅ Option 3: Python Script to Dump Entire Database to CSV Files (per Table) ====
+=== ✅ Option 3: Python Script to Dump Entire Database to CSV Files (per Table) ===

 If you want a full database export (one CSV per table):
Line 369:
 ----

-==== 🔐 Permissions Reminder ====
+=== 🔐 Permissions Reminder ===

 If you're getting errors with ''INTO OUTFILE'', check:
Line 392:
 ----
  
-===== ✅ 1. Via Azure ML Studio (Web UI) =====
+=== ✅ 1. Via Azure ML Studio (Web UI) ===

-==== 📍 Steps: ====
+== 📍 Steps: ==

   - Go to Azure ML Studio
Line 409:
 ----

-===== ✅ 2. Via Python SDK =====
+=== ✅ 2. Via Python SDK ===

-==== 🔧 Example: Upload a CSV as a tabular dataset ====
+== 🔧 Example: Upload a CSV as a tabular dataset ==

 <code ->
Line 442:
 ----

-===== ✅ 3. Via Azure CLI =====
+=== ✅ 3. Via Azure CLI ===

 <code ->
Line 456:
 ----

-===== ✅ 4. From Azure Blob Storage or External URL =====
+=== ✅ 4. From Azure Blob Storage or External URL ===

 If your data is already in Azure Blob:
Line 478:
 ----

-===== 🔄 Upload Types =====
+=== 🔄 Upload Types ===

 ^ Type ^ Use Case ^
Line 508:
 ----
  
-===== ✅ Goal =====
+=== ✅ Goal ===

 You want to **input a character name** and retrieve **associated details** from your dataset. This suggests:
Line 517:
 ----

-===== ✅ Step-by-Step Plan (Using Azure ML + OpenAI or Embeddings) =====
+=== ✅ Step-by-Step Plan (Using Azure ML + OpenAI or Embeddings) ===

-==== Step 1: Understand Your Data ====
+== Step 1: Understand Your Data ==

 I'll examine your dataset.
Line 535:
 ----

-===== ✅ ML-Based Query System Options =====
+=== ✅ ML-Based Query System Options ===

 Since you're aiming to **query by character** and get full details, you have two main paths:
Line 541:
 ----

-==== 🔹 Option 1: Traditional Filter-Based Query ====
+== 🔹 Option 1: Traditional Filter-Based Query ==

 No ML needed; just build a UI or API to filter rows where ''character = input''.
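A minimal sketch of such a filter in plain Python, with no ML involved. The sample rows and the helper name ''find_by_character'' are hypothetical; only the column names ''character'', ''figure_name'', ''manufacturer'', and ''approximate_value'' come from this page. Matching here ignores case and surrounding spaces, which is an assumption beyond the strict ''character = input'' test.

```python
import csv
import io

# Hypothetical sample data; real rows would come from the exported CSV.
SAMPLE_CSV = """character,figure_name,manufacturer,approximate_value
Captain Nova,Nova Deluxe Figure,Acme Toys,45.00
Star Ranger,Ranger Classic Figure,Acme Toys,20.00
Captain Nova,Nova Vintage Figure,Retro Co,60.00
"""


def find_by_character(rows, name):
    """Return every row whose 'character' field equals name (case-insensitive)."""
    wanted = name.strip().lower()
    return [row for row in rows if row["character"].strip().lower() == wanted]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
matches = find_by_character(rows, "captain nova")
for row in matches:
    print(row["figure_name"], row["approximate_value"])
```

The same function could sit behind a small API endpoint or UI form; an unknown name simply returns an empty list.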
Line 549:
 ----

-==== 🔹 Option 2: AI-Powered Search (Semantic or Fuzzy Matching) ====
+== 🔹 Option 2: AI-Powered Search (Semantic or Fuzzy Matching) ==

 Use **OpenAI embeddings** + **Azure ML** to let the user type:
Line 557:
 **Steps:**

-=== Step 1: Generate Embeddings for Each Row ===
+= Step 1: Generate Embeddings for Each Row =

 Use OpenAI's ''text-embedding-ada-002'' or Azure OpenAI equivalent on a string like:
Line 571:
 ----

-===== ✅ Recommendation =====
+=== ✅ Recommendation ===

 Would you prefer:
Line 581:
 I can generate code or deploy scripts for either path. Which would you like?

-== You said: ==
+= You said: =
  
 I would like to use ML for this. Do you have suggestions for some example that would be more suited to using ML with this dataset?
Line 593:
 ----
  
-===== ✅ 1. Price Estimation (Regression) =====
+=== ✅ 1. Price Estimation (Regression) ===

-==== 📌 Problem: ====
+== 📌 Problem: ==

 Predict ''approximate_value'' based on other fields (''character'', ''type'', ''manufacturer'', ''list_price'', etc.).

-==== ✅ Why it fits: ====
+== ✅ Why it fits: ==

   * You have labeled numeric data.
   * It's useful for appraising unlisted items.
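To make the regression idea concrete, here is a small sketch that fits a straight line from ''list_price'' to ''approximate_value'' with ordinary least squares. It is deliberately reduced to one numeric feature and uses made-up numbers; the function names ''fit_line'' and ''predict'' are hypothetical, and a real model would also encode the categorical fields.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted line to a new list price."""
    return slope * x + intercept


# Hypothetical training data: list_price -> approximate_value.
list_prices = [10.0, 20.0, 30.0, 40.0]
approx_values = [25.0, 45.0, 65.0, 85.0]

slope, intercept = fit_line(list_prices, approx_values)
estimate = predict(slope, intercept, 50.0)
print(f"slope={slope:.2f} intercept={intercept:.2f} estimate={estimate:.2f}")
```

The toy data lies exactly on a line, so the fit is perfect here; real collectible values will scatter around any fitted line.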
  
-==== 🔧 ML Type: ====
+== 🔧 ML Type: ==

   * **Regression** (e.g., Linear Regression, XGBoost, AutoML)
Line 610:
 ----

-===== ✅ 2. Category Prediction (Classification) =====
+=== ✅ 2. Category Prediction (Classification) ===

-==== 📌 Problem: ====
+== 📌 Problem: ==

 Predict the ''type'' of collectible or its ''location'' based on fields like ''character'', ''property'', and ''manufacturer''.

-==== ✅ Why it fits: ====
+== ✅ Why it fits: ==

   * Helps auto-label new entries.
   * Enables predictive organization/sorting.

-==== 🔧 ML Type: ====
+== 🔧 ML Type: ==

   * **Classification**
Line 627:
 ----

-===== ✅ 3. Clustering Characters by Similarity (Unsupervised) =====
+=== ✅ 3. Clustering Characters by Similarity (Unsupervised) ===

-==== 📌 Problem: ====
+== 📌 Problem: ==

 Group similar characters or figures automatically to find patterns or suggest groupings.

-==== ✅ Why it fits: ====
+== ✅ Why it fits: ==

   * Explores hidden structure.
   * Could be visualized or used to build recommendation systems.

-==== 🔧 ML Type: ====
+== 🔧 ML Type: ==

   * **Clustering (e.g., KMeans)** using embeddings of text descriptions.
Line 644:
 ----

-===== ✅ 4. Anomaly Detection =====
+=== ✅ 4. Anomaly Detection ===

-==== 📌 Problem: ====
+== 📌 Problem: ==

 Find collectibles that are priced much higher/lower than similar items.

-==== ✅ Why it fits: ====
+== ✅ Why it fits: ==

   * Helps find mispriced entries or rare items.
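As a lightweight illustration, the sketch below flags unusual prices with a robust z-score based on the median absolute deviation (MAD). This is a simpler stand-in, not one of the ML methods named in this section, and it assumes the values passed in already belong to one group of similar items. The threshold of 3.5, the function name ''flag_outliers'', and the sample prices are all assumptions.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indexes of values whose robust z-score exceeds the threshold."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad == 0:
        # All values (or most of them) are identical; nothing can be scored.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, dev in enumerate(deviations) if 0.6745 * dev / mad > threshold]


# Hypothetical prices for one group of similar items.
prices = [20.0, 22.0, 19.0, 21.0, 23.0, 250.0]
outliers = flag_outliers(prices)
print([prices[i] for i in outliers])
```

Because it relies on medians rather than means, a single extreme price does not distort the baseline it is compared against.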
  
-==== 🔧 ML Type: ====
+== 🔧 ML Type: ==

   * **Anomaly Detection (Isolation Forest, One-Class SVM)**
Line 660:
 ----

-===== ✅ 5. Embedding-Based Semantic Search (LLM-enhanced ML) =====
+=== ✅ 5. Embedding-Based Semantic Search (LLM-enhanced ML) ===

-==== 📌 Problem: ====
+== 📌 Problem: ==

 Let users search using natural language and get intelligent matches.

-==== ✅ Why it fits: ====
+== ✅ Why it fits: ==

   * Transforms character/item descriptions into embeddings.
   * Allows vector similarity search over the dataset.
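The vector similarity step can be sketched without any external service. The example below ranks items by cosine similarity to a query vector. The three-dimensional vectors are toy placeholders, not real embeddings, and the names ''cosine_similarity'', ''top_matches'', and the item labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query_vec, items, k=2):
    """Return the labels of the k items most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), label) for label, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy vectors standing in for embeddings of item descriptions.
ITEMS = [
    ("item A", [0.9, 0.1, 0.0]),
    ("item B", [0.1, 0.9, 0.0]),
    ("item C", [0.8, 0.2, 0.1]),
]

query = [1.0, 0.0, 0.0]
results = top_matches(query, ITEMS)
print(results)
```

In practice the query vector and the item vectors would have to come from the same embedding model; a linear scan like this is fine for small datasets, while larger ones usually call for a vector index.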
  
-==== 🔧 Tools: ====
+== 🔧 Tools: ==

   * Azure OpenAI (for embeddings)
Line 679:
 ----

-===== 🧠 Suggestion =====
+=== 🧠 Suggestion ===

 To get started with a real ML training task:

-==== ▶️ Start with Price Estimation using Regression: ====
+== ▶️ Start with Price Estimation using Regression: ==

   * **Input**: ''character'', ''figure_name'', ''property'', ''type'', ''manufacturer'', ''list_price'', etc.
Line 690:
 Would you like me to generate a full training script (e.g., in Python with scikit-learn or in Azure ML SDK) for this?

-== You said: ==
+= You said: =

 Yes
Line 698:
 Your regression model has been successfully trained to predict ''approximate_value'' using collectible attributes. The evaluation shows:

-==== ✅ Root Mean Squared Error (RMSE): ~30.15 ====
+== ✅ Root Mean Squared Error (RMSE): ~30.15 ==
  
 This means that the model's predictions are typically off by about **$30**, which is not bad for a first pass considering potential variance in collectible values. Because RMSE squares each error before averaging, a few large misses raise it more than they would raise a plain average error.
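For reference, RMSE can be computed by hand as in the sketch below. The numbers are made up and are not the evaluation data behind the ~30.15 figure; ''rmse'' is a hypothetical helper name.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values: errors are 10, -20, and 0.
actual = [100.0, 50.0, 80.0]
predicted = [90.0, 70.0, 80.0]

score = rmse(actual, predicted)
print(f"RMSE: {score:.2f}")
```

The mean absolute error for the same toy data is 10, while the RMSE is higher because the single error of 20 is squared before averaging.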
Line 704:
 ----

-===== 🧪 Next Steps You Can Take =====
+=== 🧪 Next Steps You Can Take ===

-==== 🔧 Improvements ====
+== 🔧 Improvements ==

   * Use more advanced models (e.g., **XGBoost**, **LightGBM**).
wiki/ai/cli-workspace-deployment-upload.1749051998.txt.gz · Last modified: by ddehamer