Building a Continuous System using Machine Learning and DevOps
Why are 60% of Machine Learning projects never implemented?
The major reason is the manual work that goes on behind the scenes, much of which is due to hyperparameters. Hyperparameters are parameters whose values are set before the learning process begins and must be specified manually, unlike model parameters, which the ML model learns on its own from the training data during training.
Hyperparameters are important because they affect the behavior of the training algorithm and have a significant impact on the performance of the model being trained. Since they are decisive in this regard, they need to be set judiciously. Tuning them by hand, however, is a serious hindrance to automation, which we can hardly imagine life without today.
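To make the distinction concrete, here is a minimal sketch (not the article's nn.py, which is a CNN) that fits y = w * x with plain gradient descent. The learning rate and the number of epochs are hyperparameters chosen before training, while w is the model parameter learned from the data; the toy dataset and the values used are made up for illustration.

```python
# A toy model y = w * x trained with gradient descent on mean squared error.
# learning_rate and epochs are hyperparameters; w is the learned model parameter.


def train(xs, ys, learning_rate, epochs):
    """Learn w from the data using fixed, user-chosen hyperparameters."""
    w = 0.0  # model parameter: starts at 0 and is updated by training
    n = len(xs)
    for _ in range(epochs):
        # d/dw of (1/n) * sum((w*x - y)^2) is (2/n) * sum((w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the ideal w is 2.0

undertrained = train(xs, ys, learning_rate=0.01, epochs=5)
converged = train(xs, ys, learning_rate=0.01, epochs=100)
diverged = train(xs, ys, learning_rate=0.2, epochs=20)

print(f"{undertrained:.2f}")  # 1.11: too few epochs
print(f"{converged:.2f}")     # 2.00: enough epochs
print(abs(diverged) > 1e6)    # True: learning rate too large, training blows up
```

The same data and the same algorithm give very different results depending only on the hyperparameters, which is exactly the kind of value an automated pipeline has to tweak.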
So, to rectify this, an automated system can be developed that tweaks these hyperparameter values to improve accuracy. This is possible by integrating Machine Learning with DevOps tools like Git and Jenkins.
Since the setup has multiple parts, let's go through each part one by one.
Part 1 : Setting up a Dockerfile for Creating a Docker Image for Machine Learning
To execute the ML code inside a Docker container, a Docker image with the required libraries can be created using a Dockerfile.
The Dockerfile created in this case is as follows:
The image built from it contains the entire setup described in the Dockerfile. The COPY instruction copies the program into the image at build time, and the CMD instruction runs it once a container is created from the generated image.
Part 2 : GitHub Setup
In this part, the developer pushes the code to GitHub.
First, we set up the local repository on our machine, which can be done using Git Bash. In our case, we created a folder named “ml_code” to hold the program files. The commands to set this up are as follows:
mkdir ml_code
cd ml_code
vim nn.py # you can use Notepad or any other text editor
Before initializing the local repository, we first need to create an empty repository on GitHub. After that, we convert the existing directory, i.e. ml_code, into a Git repository and push the program file “nn.py” to the GitHub repository using the following commands:
git init
git add *
git commit -m "CNN"
git remote add origin https://github.com/satyamcs1999/MLOps.git
git push -u origin master
To push the program files to the master branch of the GitHub repository, we would normally have to run “git push -u origin master” (only the first time) or “git push” (afterwards). However, by using the hooks/ directory within .git/, we can set things up so that every commit is also pushed automatically, without a separate command. To do this, we create a file named “post-commit” in .git/hooks/ (it must be executable, otherwise Git ignores it), and the script to be included is as follows:
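A hook does not have to be a shell script: Git runs any executable file placed in .git/hooks/. As a minimal sketch, assuming Python 3 is available and the upstream branch was already set once with git push -u origin master, a post-commit hook could look like the following. It is a hypothetical version, not necessarily the script used in this project, and the push_command and run_hook names are illustrative.

```python
#!/usr/bin/env python3
# Hypothetical .git/hooks/post-commit written in Python.
# Git runs it after every successful commit, so each commit is pushed
# without typing "git push". Make it executable: chmod +x .git/hooks/post-commit

import subprocess
import sys
from pathlib import Path


def push_command(remote=None, branch=None):
    """Build the push command; with no arguments it relies on the upstream set by -u."""
    cmd = ["git", "push"]
    if remote:
        cmd.append(remote)
        if branch:
            cmd.append(branch)
    return cmd


def run_hook():
    """Push the new commit; on failure only warn, since the commit already exists."""
    result = subprocess.run(push_command())
    if result.returncode != 0:
        print("post-commit: push failed, run 'git push' manually", file=sys.stderr)
    return result.returncode


# Only push when this file is actually running as the installed hook,
# so importing or test-running it elsewhere has no side effects.
if __name__ == "__main__" and Path(sys.argv[0]).name == "post-commit":
    sys.exit(run_hook())
```

Because post-commit runs after the commit has been recorded, its exit status cannot undo the commit; a failed push (for example, when offline) simply leaves the commit to be pushed later.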
After this setup, the GitHub repository looks like this:
Part 3 : Jenkins Jobs
Job 1 : This job pulls the code as soon as Jenkins detects changes in the connected GitHub repository.
Setting up a Webhook using ngrok
The triggers we use are GitHub hook triggers, which can be set up by adding a webhook to our GitHub repository. To create the webhook, we need a public URL that GitHub can reach. Since Jenkins runs locally on port 8080, we generate such a URL with ngrok, which exposes the local port through a tunnel:
./ngrok http 8080
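Once ngrok is running, the public URL it prints is the base of the webhook's Payload URL. The sketch below assumes the ngrok agent's local API is on its default address (127.0.0.1:4040) and that Jenkins has the GitHub plugin, which receives hooks at /github-webhook/; it reads the tunnel list and builds that URL. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# ngrok agent's local API (default address); lists the currently open tunnels
NGROK_TUNNELS_API = "http://127.0.0.1:4040/api/tunnels"


def fetch_tunnels(api_url=NGROK_TUNNELS_API):
    """Return the parsed JSON from the ngrok agent's /api/tunnels endpoint."""
    with urlopen(api_url, timeout=5) as response:
        return json.load(response)


def public_https_url(tunnels_json):
    """Pick the first HTTPS public URL from an /api/tunnels response."""
    for tunnel in tunnels_json.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    raise ValueError("no HTTPS tunnel found; is './ngrok http 8080' running?")


def jenkins_webhook_url(public_url):
    """Payload URL for GitHub: the Jenkins GitHub plugin listens on /github-webhook/."""
    return public_url.rstrip("/") + "/github-webhook/"


# Example with a response shaped like ngrok's (the subdomain is made up):
sample = {"tunnels": [{"public_url": "https://abc123.ngrok.io", "proto": "https"}]}
print(jenkins_webhook_url(public_https_url(sample)))
# https://abc123.ngrok.io/github-webhook/
```

On the machine running ngrok, jenkins_webhook_url(public_https_url(fetch_tunnels())) gives the value to paste into the Payload URL field of the webhook in the GitHub repository settings.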
Job 1's build script checks whether a directory called dlcode exists. If it does, the code from the GitHub repository is copied into dlcode; if not, the script first creates the directory and then copies the code. This job is the upstream project for Job 2.
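The same check-then-copy logic can be sketched in Python as follows, assuming the job's workspace holds the freshly pulled repository. The paths, the sync_to_dlcode name and the choice to skip the .git folder are hypothetical, not taken from the article's script.

```python
import shutil
import tempfile
from pathlib import Path


def sync_to_dlcode(workspace, dlcode):
    """Copy the pulled repo from the Jenkins workspace into dlcode."""
    src, dst = Path(workspace), Path(dlcode)
    if not dst.is_dir():
        # dlcode is missing: create it first, as Job 1 does
        dst.mkdir(parents=True)
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing directory;
    # the .git folder is skipped because only the program files are needed
    shutil.copytree(src, dst, dirs_exist_ok=True,
                    ignore=shutil.ignore_patterns(".git"))
    return sorted(p.name for p in dst.iterdir())


# Demo with throwaway directories standing in for the workspace and dlcode
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp, "workspace")
    (workspace / ".git").mkdir(parents=True)
    (workspace / "nn.py").write_text("print('training...')\n")
    copied = sync_to_dlcode(workspace, Path(tmp, "dlcode"))
    print(copied)  # ['nn.py']
```

Inside Jenkins, the call might look like sync_to_dlcode(os.environ["WORKSPACE"], "/dlcode"), where WORKSPACE is the variable Jenkins sets for each build and /dlcode is an assumed location.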
Job 2 : This job mounts the code in the dlcode directory to the train directory inside a newly created container and starts executing it. It then reads the resulting accuracy and compares it with the specified threshold. If the threshold is met, it sends an email stating “Threshold Accuracy Reached!!!” and ends the pipeline early with exit 1; the non-zero exit status marks the build as failed, so Job 3 is not triggered. Otherwise, the pipeline moves on to Job 3. This job is thus the upstream project for Job 3 and the downstream project for Job 1.
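The decision logic of Job 2 can be sketched in Python as below. Everything named here is an assumption for illustration: the image name, the /dlcode host path, the /train mount point inside the container, the threshold value, and the convention that nn.py writes its final accuracy to a text file in the mounted directory. Because the directory is mounted, a file the container writes under /train appears in /dlcode on the host, where the job can read it.

```python
from email.message import EmailMessage
from pathlib import Path

IMAGE = "mlops-image"      # hypothetical name of the image built in Part 1
HOST_CODE_DIR = "/dlcode"  # directory filled by Job 1 (assumed location)
THRESHOLD = 0.90           # hypothetical accuracy threshold


def docker_run_command(image=IMAGE, host_dir=HOST_CODE_DIR, script="nn.py"):
    """Mount host_dir at /train in a fresh container and run the training script."""
    return ["docker", "run", "--rm", "-v", f"{host_dir}:/train",
            image, "python3", f"/train/{script}"]


def read_accuracy(path):
    """Read the accuracy that nn.py is assumed to write, e.g. '0.93'."""
    return float(Path(path).read_text().strip())


def success_email(to_addr, accuracy):
    """Build (but do not send) the notification email."""
    msg = EmailMessage()
    msg["Subject"] = "Threshold Accuracy Reached!!!"
    msg["To"] = to_addr
    msg.set_content(f"Final model accuracy: {accuracy:.2%}")
    return msg


def job2_exit_code(accuracy, threshold=THRESHOLD):
    """1 = threshold reached, stop the pipeline; 0 = let Job 3 keep tuning."""
    return 1 if accuracy > threshold else 0


print(" ".join(docker_run_command()))
print(job2_exit_code(0.95), job2_exit_code(0.80))  # 1 0
```

In a real job, the command list could be run with subprocess.run(..., check=True) and the message handed to smtplib.SMTP(host).send_message(msg), with the SMTP host and addresses depending on your setup. Note that exit 1 signals success here: it deliberately fails the build so that Job 3 is not triggered.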
Job 3 : This job runs only if the accuracy in the previous job did not meet the threshold. It first checks whether the container has stopped and then checks the accuracy. If the accuracy does not satisfy the threshold, the number of epochs is increased by 10 and training is run again, after which the accuracy is checked once more. This loop repeats until the accuracy is greater than the threshold; once that happens, an email stating “Threshold Accuracy Reached!!!” is sent to the developer and the overall process is complete.
It is the downstream project for Job 2.
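The retraining loop of Job 3 can be sketched as follows. How nn.py stores its epoch count is not shown in the article, so this sketch assumes the script contains a literal such as epochs=5 and rewrites it with a regular expression (a sed command would be the shell equivalent). The train callable is a hypothetical stand-in for rerunning the container and reading the accuracy, and max_rounds is a safety cap added here; the loop as described simply runs until the threshold is exceeded.

```python
import re

# Assumed form of the epoch setting inside nn.py, e.g. "model.fit(X, y, epochs=5)"
EPOCHS_PATTERN = re.compile(r"epochs\s*=\s*(\d+)")


def bump_epochs(source, step=10):
    """Return (new_source, new_epochs) with the first epochs=<int> increased by step."""
    match = EPOCHS_PATTERN.search(source)
    if match is None:
        raise ValueError("no 'epochs=<int>' setting found in the training script")
    new_epochs = int(match.group(1)) + step
    new_source = source[:match.start(1)] + str(new_epochs) + source[match.end(1):]
    return new_source, new_epochs


def tune_until_threshold(source, accuracy, train, threshold, step=10, max_rounds=20):
    """Add `step` epochs and retrain until accuracy is greater than threshold."""
    rounds = 0
    while accuracy <= threshold:
        if rounds == max_rounds:
            raise RuntimeError(f"threshold not reached after {max_rounds} rounds")
        source, _ = bump_epochs(source, step)
        accuracy = train(source)  # e.g. rerun the container, then read the accuracy
        rounds += 1
    return source, accuracy, rounds


def fake_train(source):
    """Stand-in for real training: more epochs give higher accuracy, capped at 0.99."""
    epochs = int(EPOCHS_PATTERN.search(source).group(1))
    return min(0.99, 0.50 + epochs / 100)


script = "model.fit(X, y, epochs=5)\n"
final_script, final_acc, rounds = tune_until_threshold(
    script, fake_train(script), fake_train, threshold=0.80)
print(final_script.strip(), f"{final_acc:.2f}", rounds)
# model.fit(X, y, epochs=35) 0.85 3
```

Starting from the accuracy reported by Job 2, each round adds 10 epochs; once the loop returns, the job sends the “Threshold Accuracy Reached!!!” email.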
Email received after successful execution of the process
On GitHub, the default branch of new repositories is now named main instead of master. Therefore, before pushing the code to the GitHub repository, the local branch can be renamed from master to main using the command below (and main then used in place of master in the push commands):
git branch -M main
Thank You !!!
Satyam Singh - ARTH Learner - ARTH - The School of Technologies | LinkedIn
GitHub Repository (mentioned above): https://github.com/satyamcs1999/MLOps
GitHub Repository (contains the complete code used above, with README):