Data Scientists Need DevOps

Author: Ankita Garg
Posted: Jun 03, 2022

When most data scientists begin working, they are armed with all of the cool math concepts they learned in school. However, they quickly realise that the majority of data science work involves converting data into the format required by the model. Furthermore, the model under development is usually part of an application built for an end user, so it has to be shipped and maintained like any other software. That makes DevOps relevant to data scientists, including those who are planning to get the AWS DevOps professional certification.

A proper data scientist would keep their model code version controlled in Git. VSTS (Visual Studio Team Services, since renamed Azure DevOps) would then pull the code from Git. The code would then be wrapped in a Docker image and pushed to a Docker container registry. Once the image was in the registry, it would be orchestrated using Kubernetes.
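The flow above can be pictured as an ordered list of stages in which a failure stops everything after it. The following is only a minimal sketch of that idea: the stage names mirror the paragraph, while `run_pipeline` and the placeholder steps are hypothetical and do not call Git, VSTS, Docker, or Kubernetes.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a step that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            break  # a failed stage blocks every later stage
        completed.append(name)
    return completed


# Placeholder steps (assumptions): each lambda stands in for real work
# such as cloning a repository or building an image.
stages: List[Stage] = [
    ("fetch code from Git", lambda: True),
    ("build Docker image", lambda: True),
    ("push image to registry", lambda: True),
    ("deploy with Kubernetes", lambda: True),
]

completed_stages = run_pipeline(stages)
```

If the image build returned `False`, nothing would be pushed or deployed, which is the main guarantee a CI/CD tool gives you.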

Say all of that to the average data scientist, and their mind will shut down completely. Most data scientists understand how to generate a static report or a CSV file containing predictions. But how do we version control the model and integrate it into an app? How will people interact with our website based on its predictions? How will it scale?

All of this would necessitate confidence testing to ensure that nothing falls below a predetermined threshold, approval from various parties, and orchestration between various cloud servers (with all their ugly firewall rules). This is where a basic understanding of DevOps would come in handy.
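A threshold check of this kind can be automated so that a weak model never reaches the approval step. Here is a minimal sketch, assuming the evaluation metrics arrive as a dictionary. The metric names, the threshold values, and the functions `failed_checks` and `gate` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical minimum acceptable values; real thresholds are a business decision.
THRESHOLDS: Dict[str, float] = {"accuracy": 0.90, "recall": 0.80}


def failed_checks(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of metrics that are missing or below their threshold."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(name)
    return failures


def gate(metrics: Dict[str, float], thresholds: Dict[str, float] = THRESHOLDS) -> bool:
    """Return True only if every threshold is met."""
    return not failed_checks(metrics, thresholds)
```

For example, `gate({"accuracy": 0.95, "recall": 0.75})` returns `False` because recall is below its minimum, so the pipeline would stop before anyone is asked to approve the release.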

Developers have their own chain of command (project managers), who want features for their products released as soon as possible. For data scientists, this means changing the model structure and variables. Developers are unconcerned about what happens to the machinery. Is there smoke coming from a data centre? They couldn't care less, as long as they get the data they need to finish the final product.

IT is at the other end of the spectrum. Their job is to keep all of the servers, networks, and pretty firewall rules running smoothly. Cybersecurity is another major concern for them. They couldn't care less about the company's customers, as long as the machines work flawlessly. DevOps serves as a bridge between developers and IT.

The remainder of this blog will go over the entire Continuous Integration and Deployment process in great detail (or at least the parts relevant to a data scientist). Before continuing, please keep the following in mind: understand the business problem and avoid becoming attached to the tools. The tools mentioned in this blog will change, but the underlying problem will essentially remain the same (for the foreseeable future at least). Now imagine that you send your code to production, and it works. Perfect.

There are no complaints. As time passes, you keep developing the application and adding new features. However, one of these features introduces a bug into your code, causing your production application to crash. You had hoped that one of your numerous unit tests would catch it.
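To make the idea of a unit test concrete, here is a minimal sketch using Python's built-in `unittest` module. The function `scale_to_unit_range` is a hypothetical data-preparation step, not something from a real project. The second test guards against the kind of edge case (a constant column) that a new feature could easily break.

```python
import unittest


def scale_to_unit_range(values):
    """Rescale numbers to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class ScaleToUnitRangeTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(scale_to_unit_range([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_column_does_not_divide_by_zero(self):
        self.assertEqual(scale_to_unit_range([3, 3, 3]), [0.0, 0.0, 0.0])
```

Saved in a file, these tests can be run with `python -m unittest`. A Continuous Integration server would run the same command on every commit and block the release if any test fails.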

About the Author

I am an IT engineer with 5 years of experience. I have completed multiple trainings and certifications.
