A roundup of technical Q&As from the DVC and CML community. This month: CML updates, working with multiple datasets, using DVC stages, and more.
This is a really good question from @v2.03.99!
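When you use dvc exp run, DVC automatically tracks each experiment run, so you can list and compare the runs later with dvc exp show. With dvc repro, tracking each experiment is up to you, for example by committing the results to Git after every run.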
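Here is a minimal sketch of the dvc exp run workflow, assuming a params.yaml with a hypothetical train.lr parameter. The script builds one dvc exp run command per value and passes the override with --set-param, so DVC records each run as its own experiment. It only prints the commands unless you call sweep with dry_run=False.

```python
import subprocess
from typing import Dict, List, Sequence


def exp_run_cmd(overrides: Dict[str, object]) -> List[str]:
    """Build a `dvc exp run` command with one --set-param flag per override."""
    cmd = ["dvc", "exp", "run"]
    for key, value in overrides.items():
        cmd += ["--set-param", f"{key}={value}"]
    return cmd


def sweep(param: str, values: Sequence[object], dry_run: bool = True) -> List[List[str]]:
    """Build (and optionally run) one experiment per value of `param`."""
    cmds = [exp_run_cmd({param: value}) for value in values]
    if not dry_run:
        for cmd in cmds:
            # DVC records each run as a separate experiment; no manual bookkeeping needed.
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    # "train.lr" is a hypothetical parameter name in params.yaml.
    for cmd in sweep("train.lr", [0.001, 0.01, 0.1]):
        print(" ".join(cmd))
```

After the runs finish, dvc exp show puts them side by side. Doing the same sweep with dvc repro would overwrite the workspace outputs on every run, so you would have to save or commit each result yourself.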
A great question here from @quarkquark!
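You can debug DVC in VS Code. First, install debugpy (pip install debugpy) and start your DVC command under it, replacing someport with a free port number and mycommand with the DVC command you want to debug:
$ python -m debugpy --listen someport --wait-for-client -m dvc mycommand
Then, in VS Code, go to "Run and Debug" > "Remote Attach" and enter localhost and the same port.
This lets you step through the stages of your pipeline in the IDE. You can find more details here.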
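If you debug often, a small wrapper script saves retyping the debugpy command. The helper below is hypothetical (debug_dvc.py is our name for it, not part of DVC). It builds the same command as above with port 5678, chosen only as an example, and assumes debugpy is installed in the same environment as DVC.

```python
import shlex
import subprocess
import sys
from typing import List, Sequence


def debug_cmd(dvc_args: Sequence[str], port: int = 5678) -> List[str]:
    """Build a command that runs DVC under debugpy and waits for an IDE to attach."""
    return [
        sys.executable, "-m", "debugpy",
        "--listen", str(port),   # VS Code "Remote Attach" connects to this port
        "--wait-for-client",     # pause until the debugger is attached
        "-m", "dvc", *dvc_args,  # run DVC as a module with the given arguments
    ]


if __name__ == "__main__":
    # e.g. `python debug_dvc.py repro`; with no arguments, just print the command.
    args = sys.argv[1:]
    if args:
        subprocess.run(debug_cmd(args), check=True)
    else:
        print(shlex.join(debug_cmd(["repro"])))
```

Run python debug_dvc.py repro, then use "Remote Attach" in VS Code with localhost and port 5678.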
Thanks for asking, @CarsonM!
You can use dvc list to list the contents of a DVC repository, including the files and directories tracked by DVC and stored in its remote, without cloning the repo or pulling the data first. For example:
$ dvc list https://github.com/iterative/dataset-registry/ fashion-mnist/raw
This is a really interesting question from @BrownZ!
It really depends on your use case. Separate remotes might be useful if you want granular control over permissions for each dataset.
In general, we suggest using a single remote and setting up a data registry, a repository that uses DVC to version and manage your different datasets.
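As a rough sketch of the data registry pattern: each project imports the datasets it needs from one registry repo with dvc import, which records the source repo in a .dvc file so the data can later be refreshed with dvc update. The helper below only builds the commands. It reuses the dataset-registry URL and fashion-mnist/raw path from the dvc list example; the destination directory is hypothetical.

```python
from typing import List, Optional

# Registry repo taken from the dvc list example; swap in your own registry.
REGISTRY = "https://github.com/iterative/dataset-registry"


def import_cmd(path: str, out: Optional[str] = None, rev: Optional[str] = None) -> List[str]:
    """Build a `dvc import` command that fetches `path` from the shared registry."""
    cmd = ["dvc", "import", REGISTRY, path]
    if out is not None:
        cmd += ["--out", out]  # local destination in the consuming project
    if rev is not None:
        cmd += ["--rev", rev]  # pin a Git revision (branch, tag, or commit) of the registry
    return cmd


if __name__ == "__main__":
    # "data/fashion-mnist" is a hypothetical destination directory.
    print(" ".join(import_cmd("fashion-mnist/raw", out="data/fashion-mnist")))
```

With this layout, only the registry repo needs the single DVC remote configured; consuming projects point at the registry rather than at storage directly.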
It's awesome that community members like @pria want to keep up with our releases!
You can follow all of our releases via GitHub notifications. Release notes are available at https://github.com/iterative/cml/releases, and you can subscribe to release updates by clicking the Watch button in the top right, selecting Custom, and checking Releases.
Thanks for the question, @1cybersheep1!
CML currently supports three source code management (SCM) platforms: GitHub, GitLab, and Bitbucket. Support for other SCMs may be added to the roadmap later on.
Excellent question from @luke_imm!
In GitHub Actions, you can mount volumes into your job's container, but you have to declare them in the workflow YAML, under the container's volumes key (jobs.<job_id>.container.volumes).
Join us in Discord to get all your DVC and CML questions answered!