Submit a training job from your labeled dataset. Viam runs the job on cloud infrastructure – no GPU provisioning or framework installation needed. Training logs are available for 7 days after the job completes.
For background on model frameworks (TFLite and TensorFlow), task types, and how deployment works, see the overview.
Name the model, for example part-inspector-v1 or package-detector-v1. This name identifies the model in your organization’s registry.
The training job starts. You will see a confirmation message with the job ID.
If you prefer the command line, use the Viam CLI:
viam train submit managed \
  --dataset-id=YOUR-DATASET-ID \
  --model-org-id=YOUR-ORG-ID \
  --model-name=part-inspector-v1 \
  --model-type=single_label_classification \
  --model-framework=tflite \
  --model-labels=good-part,defective-part
Required flags:
| Flag | Description | Accepted values |
|---|---|---|
| --dataset-id | Dataset to train on | Your dataset ID |
| --model-org-id | Organization to save the model in | Your organization ID |
| --model-name | Name for the trained model | Any string |
| --model-type | Task type | single_label_classification, multi_label_classification, object_detection |
| --model-framework | Model framework | tflite, tensorflow |
| --model-labels | Labels to train on | Comma-separated list of labels from your dataset |
--model-version is optional and defaults to the current timestamp.
The command returns a training job ID that you can use to check status.
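For scripted submissions, the same command can be assembled programmatically. The following is a minimal sketch, not part of the Viam tooling: `build_submit_command` is a hypothetical helper that checks the task type and framework against the accepted values in the table above and returns an argument list that could be passed to `subprocess.run`.

```python
import shlex

# Accepted values, copied from the flag table in this section.
MODEL_TYPES = {
    "single_label_classification",
    "multi_label_classification",
    "object_detection",
}
MODEL_FRAMEWORKS = {"tflite", "tensorflow"}


def build_submit_command(dataset_id, org_id, model_name, model_type,
                         model_framework, labels):
    """Return the argv list for `viam train submit managed` (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type!r}")
    if model_framework not in MODEL_FRAMEWORKS:
        raise ValueError(f"unsupported model framework: {model_framework!r}")
    if not labels:
        raise ValueError("at least one label is required")
    return [
        "viam", "train", "submit", "managed",
        f"--dataset-id={dataset_id}",
        f"--model-org-id={org_id}",
        f"--model-name={model_name}",
        f"--model-type={model_type}",
        f"--model-framework={model_framework}",
        f"--model-labels={','.join(labels)}",
    ]


if __name__ == "__main__":
    cmd = build_submit_command(
        "YOUR-DATASET-ID", "YOUR-ORG-ID", "part-inspector-v1",
        "single_label_classification", "tflite",
        ["good-part", "defective-part"],
    )
    # Print a shell-quoted command for review before running it.
    print(shlex.join(cmd))
```

Validating the values before invoking the CLI surfaces typos in the task type or framework early, rather than after the command has been sent.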
Web UI:
CLI:
Check the status of a training job:
viam train get --job-id=YOUR-JOB-ID
View training logs:
viam train logs --job-id=YOUR-JOB-ID
Training logs expire after 7 days. If you need to retain logs for longer, copy them before they expire.
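One way to keep logs past the 7-day window is to save the CLI output to a file. The sketch below assumes that `viam train logs` writes the logs to standard output; `archive_training_logs` and its `run` parameter are hypothetical names introduced here for illustration.

```python
import subprocess
from pathlib import Path


def archive_training_logs(job_id, out_dir, run=subprocess.run):
    """Save the output of `viam train logs` to <out_dir>/<job_id>.log.

    `run` is injectable so the function can be exercised without the CLI.
    Assumption: the CLI prints the logs to standard output.
    """
    result = run(
        ["viam", "train", "logs", f"--job-id={job_id}"],
        capture_output=True, text=True, check=True,
    )
    out_path = Path(out_dir) / f"{job_id}.log"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result.stdout)
    return out_path
```

Calling `archive_training_logs("YOUR-JOB-ID", "training-logs")` shortly after a job finishes would leave a local copy that does not expire.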
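Python SDK:
import asyncio

# connect() is assumed to be defined elsewhere and to return an authenticated ViamClient.
async def main():
    viam_client = await connect()
    ml_training_client = viam_client.ml_training_client

    # Fetch the metadata for a single training job.
    job = await ml_training_client.get_training_job(
        id="YOUR-TRAINING-JOB-ID",
    )
    print(f"Status: {job.status}")
    print(f"Model name: {job.model_name}")
    print(f"Created: {job.created_on}")

    viam_client.close()

asyncio.run(main())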
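Go SDK:
// Assumes ctx, logger, and mlTrainingClient are already set up and fmt is imported.
job, err := mlTrainingClient.GetTrainingJob(ctx, "YOUR-TRAINING-JOB-ID")
if err != nil {
	logger.Fatal(err)
}
// %v prints the status whether it is a numeric enum or a string.
fmt.Printf("Status: %v\n", job.Status)
fmt.Printf("Model name: %s\n", job.ModelName)
fmt.Printf("Created: %s\n", job.CreatedOn)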
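A single status check is often wrapped in a polling loop that waits for the job to finish. The following is a minimal sketch under stated assumptions: `fetch_status` is a hypothetical coroutine that returns the job's status as text (for example, a wrapper around a status lookup that converts the status to its name), and the terminal status names ending in `COMPLETED`, `FAILED`, or `CANCELED` are assumptions, not confirmed by this page.

```python
import asyncio

# Assumption: status names ending with one of these mark a finished job.
TERMINAL_SUFFIXES = ("COMPLETED", "FAILED", "CANCELED")


def is_terminal(status_name):
    """Return True if the status name denotes a finished job."""
    return str(status_name).upper().endswith(TERMINAL_SUFFIXES)


async def wait_for_job(fetch_status, job_id, interval_s=30.0, timeout_s=3600.0):
    """Poll fetch_status(job_id) until the status is terminal, then return it."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while True:
        status = await fetch_status(job_id)
        if is_terminal(status):
            return status
        if loop.time() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_s} s")
        # Sleep between checks to avoid hammering the API.
        await asyncio.sleep(interval_s)
```

Passing the lookup in as a parameter keeps the loop independent of any particular SDK and makes it easy to test with a stub.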
After training completes, test the model against a variety of images before deploying it.
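When testing a classification model by hand, it can help to tally results per label so that weak classes stand out. The sketch below is illustrative only: `per_label_accuracy` is a hypothetical helper, and the sample results are made-up placeholders, not output from a real model.

```python
from collections import defaultdict


def per_label_accuracy(results):
    """Compute accuracy per expected label from (expected, predicted) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for expected, predicted in results:
        totals[expected] += 1
        if predicted == expected:
            correct[expected] += 1
    return {label: correct[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Hypothetical outcomes from checking four test images.
    results = [
        ("good-part", "good-part"),
        ("good-part", "defective-part"),
        ("defective-part", "defective-part"),
        ("defective-part", "defective-part"),
    ]
    for label, accuracy in per_label_accuracy(results).items():
        print(f"{label}: {accuracy:.0%}")
```

A label with noticeably lower accuracy than the others is a candidate for additional targeted data collection before retraining.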
When training completes, the model is stored in your organization’s registry. See Deploy a model to a machine to configure the module, ML model service, and vision service on your machine.
After deploying, improve your model by collecting targeted data where it struggles (edge cases, counterexamples, varied conditions), using auto-annotation to label efficiently, and retraining. If your machine is configured to use the model, the new version deploys automatically.
To review past training jobs:
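Python SDK:
import asyncio

ORG_ID = "YOUR-ORG-ID"

# connect() is assumed to be defined elsewhere and to return an authenticated ViamClient.
async def main():
    viam_client = await connect()
    ml_training_client = viam_client.ml_training_client

    # List every training job in the organization.
    jobs = await ml_training_client.list_training_jobs(
        org_id=ORG_ID,
    )
    for job in jobs:
        print(f"Job: {job.id}, Status: {job.status}, "
              f"Model: {job.model_name}, Created: {job.created_on}")

    viam_client.close()

asyncio.run(main())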
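Go SDK:
// Assumes ctx, logger, orgID, and mlTrainingClient are already set up and fmt is imported.
// TrainingStatusUnspecified is passed as the status filter.
jobs, err := mlTrainingClient.ListTrainingJobs(
	ctx, orgID, app.TrainingStatusUnspecified)
if err != nil {
	logger.Fatal(err)
}
for _, job := range jobs {
	fmt.Printf("Job: %s, Status: %d, Model: %s, Created: %s\n",
		job.ID, job.Status, job.ModelName, job.CreatedOn)
}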
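With many past jobs, a common follow-up is to find the most recent job for each model. The sketch below is a hypothetical helper that works on plain tuples rather than SDK objects; how the fields of a listed job map onto `(model_name, created_on, job_id)` is an assumption left to the caller.

```python
def latest_job_per_model(jobs):
    """Map each model name to the ID of its most recently created job.

    `jobs` is an iterable of (model_name, created_on, job_id) tuples, where
    created_on is any comparable value such as a datetime.
    """
    latest = {}
    for model_name, created_on, job_id in jobs:
        # Keep the job with the greatest creation time seen so far.
        if model_name not in latest or created_on > latest[model_name][0]:
            latest[model_name] = (created_on, job_id)
    return {name: job_id for name, (_, job_id) in latest.items()}
```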