This month you will find:
❓ Will NLP have more impact than Computer Vision,
🐙 Dmitry Petrov speaks at GitHub Universe,
🧐 CML in research at NeurIPS,
❣️ Unstructured Data Catalog coming,
✅ SOC 2 Type 1 Compliance,
🚀 MLEM adds SageMaker and Kubernetes deployment,
👀 Lots of new docs,
🚀 Upcoming events, and more!
Image generated with the help of Stable Diffusion
Welcome to November! In the US, this is the time of year we reflect and give thanks. It's been a productive year despite the world's rather extreme challenges. There's lots to be thankful for. Here are some of those things from the last month in the Iterative Community.
In this article entitled The Biggest Opportunity In Generative AI Is Language, Not Images, Robert Toews argues that AI-powered text generation will create many orders of magnitude more value than AI-powered image generation.
Language is humanity’s single most important invention. More than anything else, it is what sets us apart from every other species on the planet. Language enables us to reason abstractly, to develop complex ideas about what the world is and could be, to communicate these ideas to one another, and to build on them across generations and geographies. Almost nothing about modern civilization would be possible without language.
He points to many examples across industry and academia that have already benefited from large language models (LLMs) and stand to gain even more in the coming years. Read the article for the full list of applications.
The State of AI Report is published each year and covers the most interesting things its authors, Nathan Benaich, Ian Hogarth, Othmane Sebbouh, and Nitarshan Rajkumar, come across in the world of AI throughout the year.
Be sure to digest the whole report for even more AI advances!
💓 So for our “Pulse check” this month:
Do you agree that NLP will have more impact than computer vision? Tell us about what you are working on with NLP. We'd love to connect you with others working through similar issues and to learn how we can improve our tools to help with your NLP projects.
Join us in the #general channel in Discord to weigh in.
We would like to thank Francesco Calcavecchia, vvssttkk, and deepyaman for their contributions to GTO, MLEM, and CML, respectively. They will each receive a personalized shirt noting their contribution! And many thanks to Mert Bozkir for leading the Hacktoberfest charge here at Iterative!
One of our Community Champions, João Santiago of Billie.io, gives an introduction to DVC that sets up the remainder of the session, in which Carsten Behring, author of Metamorph and the scicloj.ml platform, presents how NLP pipelines can be managed with DVC, Clojure, and Python.
Last month we reported on CML turning up in research here. Well, this work will be presented at the virtual workshop Challenges in Deploying and Monitoring Machine Learning Systems at NeurIPS this year on December 9th. Find out more and register here.
Research on CML to be presented at NeurIPS (Source link)
Do you use Amazon S3, Azure Blob Storage, or Google Cloud Storage? We have a new solution for finding and managing your datasets of unstructured data like images, audio files, and PDFs! Extend your DVC environment with the first data catalog and query language (SQL->DQL) for unstructured data and machine learning. Learn more on our website and/or schedule a meeting with us!
In case you missed it, MLEM announced a release on Halloween! MLEM now supports SageMaker and Kubernetes in addition to Heroku and Docker. Learn how easy it now is to package your models for deployment with only a few lines of code, and never get lost in the Kubernetes docs again! Find the blog post here, and be sure to visit the docs!
We are very excited to announce that Iterative is now SOC 2 Type 1 compliant. This certification signals to our customers our commitment to Security, Availability, Processing Integrity, Confidentiality, and Privacy within our organization. We made it through the rigorous audit and learned a lot as a team along the way. Guro Bokum reviews the five key learnings in this blog piece. You can find the full report on our Security and Privacy page.
On November 8th, our CEO, Dmitry Petrov, spoke at GitHub Universe on ML with Git: experiment tracking in Codespaces. In his presentation, he shows how to use the DVC extension for VS Code and Codespaces to streamline your machine learning experimentation process. If you are registered, you can find his video on the event platform below. We expect the video to be available on YouTube in the next couple of months. We'll keep you updated!
Jupyter Notebooks are great for prototyping, but eventually you will want to move toward reproducible experiments. Converting a notebook to a DVC pipeline requires a bit of a mental shift. Rob de Wit shows you how to get there with an intermediate step: use Papermill to build a one-stage DVC pipeline that executes the entire notebook, then use the resulting pipeline to run and version ML experiments. Look out for a future post with a more advanced pipeline!
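As a rough illustration of that intermediate step, the Python sketch below generates a one-stage dvc.yaml whose single command runs the whole notebook through the Papermill CLI (papermill INPUT_NOTEBOOK OUTPUT_NOTEBOOK). The helper name papermill_stage_yaml and every file path are hypothetical placeholders, not taken from Rob's post; his pipeline may declare its dependencies and outputs differently.

```python
# A minimal sketch, not the approach from Rob de Wit's post: the helper name
# and all paths used below are hypothetical placeholders.


def papermill_stage_yaml(stage, notebook, executed, deps, outs, metrics):
    """Return dvc.yaml text for one stage that runs `notebook` with Papermill."""
    lines = [
        "stages:",
        f"  {stage}:",
        # Papermill CLI form: papermill INPUT_NOTEBOOK OUTPUT_NOTEBOOK
        f"    cmd: papermill {notebook} {executed}",
        "    deps:",
        # Listing the notebook itself means editing it invalidates the stage.
        f"      - {notebook}",
    ]
    lines += [f"      - {path}" for path in deps]
    lines.append("    outs:")
    # The executed copy of the notebook is tracked as an output as well.
    lines += [f"      - {path}" for path in [executed, *outs]]
    if metrics:
        lines.append("    metrics:")
        lines += [f"      - {path}" for path in metrics]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = papermill_stage_yaml(
        stage="notebook",
        notebook="train.ipynb",            # hypothetical notebook
        executed="train.executed.ipynb",   # hypothetical executed copy
        deps=["data/raw.csv"],             # hypothetical input data
        outs=["models/model.pkl"],         # hypothetical model file
        metrics=["metrics.json"],          # hypothetical metrics file
    )
    print(text, end="")
```

Saving the printed text as dvc.yaml would let dvc repro (or dvc exp run) re-execute the notebook whenever it or its declared inputs change. Treating the whole notebook as one opaque stage is what keeps this step simple; splitting it into several stages is the more advanced pipeline the post alludes to.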
At our next meetup on December 14th, Sami Jawhar will present An Open Discussion of Parallel Data Pipelines with DVC and TPI, an advanced use case for distributing experiments in the cloud. Sami is a great discussion driver. If you are interested in higher-level use cases, you will want to join the discussion!
On January 11th, Francesco Calcavecchia will join us to talk about his recent contribution to MLEM through his work on GTO, and how it helps him build a Git-based model registry at E.ON Energie Deutschland.
We had a great time at ODSC West! We had lively conversations with conference-goers and attended excellent sessions! Dmitry had a packed room for his in-person talk Why You Need a GitOps-based Machine Learning Model Registry, and Alex Kim presented CI/CD for Machine Learning virtually. At each of the conferences we've sponsored this year, we've run a game called Deevee's Ramen Run. (If you don't know the Ramen connection, you need to spend more time reading the monthly Heartbeats 😉) Below are the top three winners of the game.
We were also at the MLOps Summit in London only a week later, this time with a different set of team members attending and staffing the booth. Aside from attending a variety of great talks, we met many wonderful people from all over the world, which led to some really interesting discussions about how different companies approach MLOps.
Casper da Costa-Luis gave a well-received talk at the summit on how to painlessly run ML experiments in the cloud with CML. The recording will be made available in the near future, so look out for that! The talk answered at least one of the questions in Deevee's Ramen Run, which yielded some surprised (but excited!) winners this time around.
Gema Parreño Piqueras presented at TechWeek in Spain with her talk Reproducibility and Version Control are Important: Follow up with the DVC extension for VS Code. She will be presenting the same talk at Codemotion. You can find her talk in Spanish at 2:02 below!
Stay tuned to our Newsletter for what we will be up to conference-wise in 2023!
The team has been busy improving the docs for you. See all the latest and greatest updates below.
dvc plots show!
dvc ls-url. Find the description, options, and example code here.
cml comment. Find the options here.
And finally, this month's winning Tweet is a thread from Robert Boscacci.
Managing large files (📹🔊📸) for deep learning projects can be a nightmare 😰. Git isn't built to handle them natively.— Robert Boscacci (@cinemarob1) September 16, 2022
Here's how to use DVC to seamlessly track and version large files. 🚀
🎁 BONUS: Learn to sync those files with remote ☁️ storage such as @awscloud S3! pic.twitter.com/eO1BEwHEbF
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.