This month you will find:
🎙 Andrew Ng Intel Keynote talk,
🇺🇸 White House Blueprint for AI Bill of Rights,
🧐 CML in research,
🎥 Nadia Nahar video: Collaboration Challenges in ML-Enabled Systems,
🐉 DVC-Hydra integration,
🗣 CI/CD for Machine Learning upcoming webinar,
🚀 New hire, and more!
Welcome to October! As the days grow shorter or longer depending on your hemisphere, we bring you the latest and greatest from the Iterative Community.
At Intel’s Innovation conference, Andrew Ng gave a keynote on democratizing AI. He posits that while large companies have embraced AI, most smaller companies outside of the consumer-based domains still struggle. He provides two main reasons for this: small datasets and customization.
According to Ng, data-centric AI will be the key to unlocking that potential, forcing a paradigm shift away from code-centric AI. In this scenario, people could take mostly ready-built ML tech and focus on the data to ensure it captures all necessary domain knowledge.
For example, two companies that produce cornflakes and medication could take the same ML model and train it on their respective datasets. As long as they have the right tools and practices and provide a domain-representative dataset, the same model can produce effective results for both. If you want to see some of the tools Ng uses, make sure to check out his keynote.
What do you think? Does the average data scientist need a different set of skills in the near future? Are you in one of these smaller industries that are starting to embrace AI? We'd love to read your thoughts! Join us in our discussion of this topic on Discord!
If you recall, last month's Heartbeat called your attention to the EU AI Act. This act proposes new rules that would require open source developers to adhere to guidelines across a spectrum of categories, including risk management, data governance, technical documentation and transparency, standards and accuracy, and cybersecurity. Not to be outdone, the US White House released a Blueprint for an AI Bill of Rights. The White House Office of Science and Technology Policy (OSTP) has defined 5 categories for these rights:
There's definitely some overlap here with the EU AI Act and some catching up with Data Privacy in the mix. There's lots to unpack, compare, and contrast on scope and philosophy between the two. It's nice to see that major attention is given to these issues.
One way to connect the AI Bill of Rights to Andrew Ng's talk is to see both as signs of the AI space maturing. To Ng's point, as we move from frenzied, all-important model development toward a data-centric approach and the democratization it brings, the focus shifts in a way that lets us adequately address these hard and important issues. Improving the efficiency of tooling will help with this too. That's why we are here.
What do you think? Do the efficiencies we are gaining open up room for improved time/attention to bake protections into the process or am I too hopeful? Head to Discord and share your thoughts!
Did you hear? DVC has a new integration with Hydra. Now you can use Hydra composition to configure your DVC experiments. You can also append and remove parameters on the fly, as well as do a grid search of parameters. Random search functionality is coming; weigh in on the issue here. Find out more in David de la Iglesia's blog post.
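If grid search and on-the-fly parameter changes are new to you, here is a rough sketch of the idea in plain Python. It is not DVC's or Hydra's implementation, and it does not call either library. The helper names `apply_override` and `expand_grid`, the flat parameter dictionary, and the parameter names are all made up for illustration. The only thing borrowed from Hydra is the override notation, where a leading `+` appends a parameter and a leading `~` removes one.

```python
from itertools import product


def apply_override(params: dict, override: str) -> dict:
    """Return a copy of flat `params` with one Hydra-style override applied.

    Toy rules (an assumption for illustration, not Hydra's real parser):
      "key=value"   set a key
      "+key=value"  append a new key
      "~key"        remove a key
    """
    updated = dict(params)
    if override.startswith("~"):
        # Removal: drop the key, ignoring any "=value" suffix.
        key = override[1:].split("=", 1)[0]
        updated.pop(key, None)
        return updated
    key, value = override.lstrip("+").split("=", 1)
    updated[key] = value
    return updated


def expand_grid(grid: dict) -> list:
    """Expand {"key": [v1, v2], ...} into one list of overrides per combination."""
    keys = list(grid)
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*(grid[key] for key in keys))
    ]


# Hypothetical baseline parameters, as they might appear in a params file.
base = {"train.lr": "0.01", "train.epochs": "10"}

# A 2 x 2 grid yields four experiment configurations.
grid = {"train.lr": ["0.01", "0.1"], "train.epochs": ["10", "20"]}
for overrides in expand_grid(grid):
    run_params = base
    for override in overrides:
        run_params = apply_override(run_params, override)
    print(run_params)
```

Each printed dictionary stands in for one experiment. In the real integration, DVC and Hydra handle the composition and the queuing of runs for you, so check the blog post for the actual commands.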
If you missed the October Meetup, where Nadia Nahar presented her team's research on "Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process," don't worry, there's a video! Catch it below!
Join us for our next meetup on November 16th. We will have Dmytro Filatov of DeepX presenting Continuous Computer Vision with DVC and CML and Jelle Bouwman demoing Iterative Studio Model Registry. Be sure to register here!
Join Alex Kim on November 30th with ODSC to learn about CI/CD for Machine Learning. CML is a project that helps ML and data science practitioners automate model training and evaluation using best practices and tools from software engineering, such as GitLab CI/CD (as well as GitHub Actions and Bitbucket Pipelines). The idea is to automatically train your model and test it in a production-like environment every time your data or code changes. In this talk, you'll learn how to:
Sign up for the talk here.
Alex Kim webinar CI/CD for Machine Learning for ODSC (Source link)
It's Hacktoberfest month and we are participating! Find out all the information in Mert Bozkir's blog post. But if you just want to jump in, find all the open Hacktoberfest issues. Follow along in the #hacktoberfest channel in Discord to keep up to date for the rest of the month and be sure to read next month's Heartbeat to learn of the
Ivan Longin joins us from Zadar, Croatia, as a Senior Software Engineer on the Iterative Studio team. When Ivan's not working, he likes to spend time outdoors, swimming in good weather, or just walking (or, more often, running) after his one-year-old! Been there three times over! ❤️ Welcome Ivan!
This month was full of great content. We wanted to give a shout-out to all of
it, so we are trying out a more abbreviated list.
Thanks to all these amazing Community members that are sharing their knowledge! 🚀
A little belated but neverthless hugely interesting post by my co founders @m_a_upson in which he touches on some core tools we use at Mantis like @DVCorg, @Rasa_HQ and continuous machine learning. It comes with code 💻 so you can take some of what you will read and use 🚀 https://t.co/PHgLXtvckz
Nick Sorros (@nsorros), September 19, 2022
Do you have any use case questions or need support? Join us in Discord!
Head to the DVC Forum to discuss your ideas and best practices.