Every month we share news, findings, interesting reads, community takeaways, and everything else along the way. Look here for updates about DVC, our journey as a startup, projects by our users and big ideas about best practices in ML and data science.
A big hello from DVC mascot DeeVee.
Welcome to the May Heartbeat, our monthly roundup of cool happenings, new releases, good reads, and other noteworthy developments in the DVC community.
DVC turns 3. On May 4th, we celebrated DVC's third birthday! Fearless leader Dmitry Petrov wrote a retrospective about how the team has grown and what we've learned from our users, contributors, and colleagues. Thanks to everyone who celebrated with us!
Ambassador program launched. We've just kicked off our ambassador program with the help of our first ambassador, Marcel Ribeiro-Dantas. Marcel is an early-stage researcher at the Institut Curie, a veteran ambassador of the Fedora Project, and a data science blogger. Becoming an ambassador is a way for folks who are passionate about contributing to the DVC community to get recognized for their efforts. It's also a way for us to support volunteers financially for meetups and travel, and to give them chances to work more closely with our team. The program is ideal for anyone who already enjoys blogging about DVC, contributing code, or hosting get-togethers (virtual or otherwise), especially advanced students and early-career data scientists and engineers! Learn more about it here.
DVC is part of 2020 Google Season of Docs. Another way to get involved with DVC is through Google Season of Docs, a program we're participating in for the second year in a row. The program gives technical writers paid experience working with the DVC team in fall 2020. Right now, we're accepting proposals from interested writers. Find out more here.
5,000 GitHub Stars. It finally happened: we passed 5,000 stars on our GitHub repo!
To coincide with DVC's 3rd birthday, we shared a pre-release of DVC 1.0. The release is expected in a few weeks, but you can experiment with 1.0 now (and open issues in our project repo if you find a bug 🐛). Some major new features include:
Run cache, a cache of pipelines you've reproduced in your local workspace. If you re-run dvc repro on a pipeline version that's already been executed, run cache saves you compute time by returning the cached result.
Multi-stage DVC files. Users reported that their DVC pipelines changed a lot, so we've made pipeline .dvc files more human-readable and editable.
Plots. We've got plots powered by Vega-Lite for making beautiful visualizations that compare model performance across commits! Developer Paweł Redzyński is hard at work:
Visual aids come to DVC 1.0, with my little help. pic.twitter.com/Fd1qVr7rHb— Pablito (@Paffciu1) May 12, 2020
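For readers curious how a run cache can skip work, here is a toy, in-memory sketch of the idea behind the run cache feature described above. It assumes a stage is identified by its command plus the content hashes of its dependencies: if that combination has been seen before, the stored output is returned instead of re-running the command. The names `stage_key`, `ToyRunCache`, and `repro` are hypothetical and illustrate the concept only. They are not DVC's internals or API.

```python
import hashlib
import json
from typing import Callable, Dict


def content_hash(data: bytes) -> str:
    """MD5 content hash of one dependency's bytes."""
    return hashlib.md5(data).hexdigest()


def stage_key(cmd: str, deps: Dict[str, bytes]) -> str:
    """Hypothetical cache key: the command plus each dependency's content hash."""
    payload = {
        "cmd": cmd,
        "deps": sorted((path, content_hash(data)) for path, data in deps.items()),
    }
    return hashlib.sha256(json.dumps(payload).encode()).hexdigest()


class ToyRunCache:
    """In-memory stand-in for a run cache (an illustration, not DVC's implementation)."""

    def __init__(self) -> None:
        self._results: Dict[str, bytes] = {}
        self.executions = 0  # counts how many times a command actually ran

    def repro(self, cmd: str, deps: Dict[str, bytes],
              execute: Callable[[Dict[str, bytes]], bytes]) -> bytes:
        key = stage_key(cmd, deps)
        if key in self._results:
            # Same command and same inputs: reuse the stored output.
            return self._results[key]
        self.executions += 1
        output = execute(deps)
        self._results[key] = output
        return output


# Usage: a hypothetical "training" step that depends on data.csv.
def train(deps: Dict[str, bytes]) -> bytes:
    return b"model trained on " + deps["data.csv"]


cache = ToyRunCache()
first = cache.repro("python train.py", {"data.csv": b"v1"}, train)
second = cache.repro("python train.py", {"data.csv": b"v1"}, train)  # cache hit
third = cache.repro("python train.py", {"data.csv": b"v2"}, train)   # new data, runs again
```

Keying on dependency contents rather than timestamps means that switching back to a pipeline version you already ran can reuse its result, which is the compute saving the run cache bullet describes.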
You can read more about the big updates coming in DVC 1.0 in our birthday blog.
Developers weren't the only ones hustling this month…
First ever virtual DVC Meetup. Marcel, our new ambassador, led an initiative to organize a virtual meetup! Marcel shared his latest scientific work on creating a new comprehensive dataset about mobility during the COVID-19 pandemic and then passed the mic to our two guest speakers. Data scientist Elizabeth Hutton spoke about how she was building a workflow for her NLP team with DVC, and DAGsHub co-founder Dean Pleban shared his custom remote file system setup for modeling Reddit post popularity. It was well attended for our first ever virtual hangout: we logged 40 individual logins, with more than 30 people staying the whole time! A video of the meetup is on the event page, so you can still check out the talks and discussion we enjoyed.
It was awesome speaking at the @DVCorg meetup about @reddit post popularity prediction and DVC #remote working file systems. Also a lot of #DAGs. pic.twitter.com/5WKTlIEvHK— Dean 🐶 (@DeanPlbn) May 7, 2020
Some blogs we like. As usual, there's a lot of share-worthy writing in the data science and MLOps space:
Last, here are some of our favorite tweets to read this past month:
Data version control from @DVCorg is one of the best new tools I've used in a while. Moving data via the cloud is just a push or pull command away.— Liam Brannigan (@braaannigan) May 6, 2020
Recommend for anyone who works on multiple machines or shares data with collaborators
Getting around to learning @DVCorg, and loving it so far. Versioning data with git-style semantics gives you a lot of functionality with surprisingly little cognitive overhead.— Tim Garvin (@tcgarvin) May 8, 2020
Thank you, thank you very much.
As always, we want to hear what you're making with DVC and what you're reading. Tell us in the blog comments, and stay in touch on Twitter and our Discord channel. Happy coding!