
machine learning models

Murk Sindhya Memon
| July 5

Machine Learning (ML) is a powerful tool that can be used to solve a wide variety of problems. However, building and deploying a machine-learning model is not a simple task. It requires a comprehensive understanding of the end-to-end machine learning lifecycle. 

The development of a Machine Learning Model can be divided into three main stages: 

  • Building your ML data pipeline: This stage involves gathering data, cleaning it, and preparing it for modeling. 
  • Getting your ML model ready for action: This stage involves building and training a machine learning model using efficient machine learning algorithms. 
  • Making sense of your ML model: This stage involves deploying the model into production and using it to make predictions. 
Machine Learning Model Deployment

Building your ML data pipeline 

The first step of crafting a Machine Learning Model is to develop a pipeline for gathering, cleaning, and preparing data. This pipeline should be designed to ensure that the data is of high quality and that it is ready for modeling. 

The following steps are involved in pipeline development: 

  • Gathering data: The first step is to gather the data that will be used to train the model. Data can be collected from a variety of sources, such as online databases, sensors, or social media.
  • Cleaning data: Once the data has been gathered, it needs to be cleaned. This involves removing any errors or inconsistencies in the data. 

  • Exploratory data analysis (EDA): EDA is a process of exploring data to gain insights into its distribution, relationships, and patterns. This information can be used to inform the design of the model. 
  • Model design: Once the data has been cleaned and explored, it is time to design the model. This involves choosing the right machine-learning algorithm and tuning the model’s hyperparameters. 
  • Training and validation: The next step is to train the model on a subset of the data. Once the model has been trained, it can be evaluated on a holdout set of data to measure its performance. 
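
To make the cleaning and holdout steps above concrete, here is a minimal sketch in plain Python. The field names (`sqft`, `price`), the helper names, and the 20% holdout fraction are illustrative assumptions, not part of any particular project; in practice you would likely reach for a library such as pandas or scikit-learn.

```python
import random

def clean_records(records, required_fields):
    """Drop rows with missing required fields, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        # Skip rows where any required field is missing or empty.
        if any(row.get(field) in (None, "") for field in required_fields):
            continue
        # A hashable fingerprint of the row lets us detect exact duplicates.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned

def train_holdout_split(rows, holdout_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into training and holdout sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Hypothetical raw data with one duplicate and one incomplete row.
raw = [
    {"sqft": 850, "price": 200_000},
    {"sqft": 850, "price": 200_000},   # exact duplicate
    {"sqft": None, "price": 310_000},  # missing feature
    {"sqft": 1200, "price": 290_000},
    {"sqft": 1500, "price": 360_000},
    {"sqft": 2000, "price": 450_000},
    {"sqft": 1100, "price": 260_000},
]
cleaned = clean_records(raw, required_fields=["sqft", "price"])
train_rows, holdout_rows = train_holdout_split(cleaned)
print(len(cleaned), len(train_rows), len(holdout_rows))
```

Fixing the random seed keeps the split reproducible, which makes later comparisons between models fair.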

Getting your machine learning model ready for action  

Once the pipeline has been developed, the next step is to train the model. This involves using a machine learning algorithm to learn the relationship between the features and the target variable. 

The following steps are involved in training: 

  • Choosing a machine learning algorithm: There are many different machine learning algorithms available. The choice of algorithm will depend on the specific problem that is being solved. 
  • Tuning hyperparameters: Hyperparameters are parameters that control the behavior of the machine learning algorithm. These parameters need to be tuned to achieve the best performance. 
  • Training the model: Once the algorithm and hyperparameters have been chosen, the model can be trained on a dataset. 
  • Evaluating the model: Once the model has been trained, it can be evaluated on a holdout set of data to measure its performance. 
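
The tuning and evaluation steps above can be sketched without any libraries. The example below assumes a deliberately tiny model (a single slope with an L2 penalty) and made-up numbers; the point is only the loop: fit once per candidate hyperparameter value, score each fit on the holdout set, and keep the best.

```python
def fit_ridge_slope(xs, ys, penalty):
    """Closed-form slope for y ≈ w * x with an L2 penalty on w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mean_squared_error(xs, ys, w):
    """Average squared difference between the targets and the predictions w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (illustrative): the training targets are a little too steep,
# so some regularization should help on the holdout set.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.4, 4.2, 6.9, 8.6]
holdout_x, holdout_y = [5.0, 6.0], [10.1, 11.8]

# Try each candidate penalty and record its holdout error.
results = {}
for penalty in [0.0, 0.1, 1.0, 10.0]:
    w = fit_ridge_slope(train_x, train_y, penalty)
    results[penalty] = mean_squared_error(holdout_x, holdout_y, w)

best_penalty = min(results, key=results.get)
print(best_penalty, results[best_penalty])
```

Libraries such as scikit-learn wrap this same idea in tools like grid search with cross-validation, which is usually the better choice for real projects.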

Making sense of your ML model’s predictions 

Once the model has been trained, it can be deployed into production and used to make predictions. 

The following steps are involved in inference: 

  • Deploying the model: The model can be deployed in a variety of ways, such as a web service, a mobile app, or a desktop application. 
  • Making predictions: Once the model has been deployed, it can be used to make predictions on new data. 
  • Monitoring the model: It is important to monitor the model’s performance in production to ensure that it is still performing as expected. 
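
As one possible way to approach the monitoring step, the sketch below tracks a rolling mean absolute error and flags the model when that error drifts well above what was measured before deployment. The class name, window size, and tolerance are assumptions chosen for illustration, and the approach presumes that true outcomes eventually become available for comparison.

```python
from collections import deque

class ErrorMonitor:
    """Track a rolling mean absolute error and flag degradation against a baseline."""

    def __init__(self, baseline_mae, window=100, tolerance=1.5):
        self.baseline_mae = baseline_mae
        self.tolerance = tolerance
        # Only the most recent `window` errors are kept.
        self.errors = deque(maxlen=window)

    def record(self, prediction, actual):
        self.errors.append(abs(prediction - actual))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def is_degraded(self):
        # Degraded means the recent error exceeds tolerance * baseline error.
        return self.rolling_mae() > self.tolerance * self.baseline_mae

monitor = ErrorMonitor(baseline_mae=2.0, window=3)

# Early traffic: predictions are close to the actual outcomes.
for prediction, actual in [(10, 11), (20, 19), (30, 31)]:
    monitor.record(prediction, actual)
before_drift = monitor.is_degraded()

# Later traffic: the data has shifted and errors grow.
for prediction, actual in [(10, 18), (20, 29), (30, 37)]:
    monitor.record(prediction, actual)
after_drift = monitor.is_degraded()

print(before_drift, after_drift)
```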

Conclusion 

Developing a Machine Learning Model is a complex process, but it is essential for building and deploying successful machine-learning applications. By following the steps outlined in this blog, you can increase your chances of success. 

Here are some additional tips for building and deploying machine-learning models: 

  • Establish a strong baseline model. Before you deploy a machine learning model, it is important to have a baseline model that you can use to measure the performance of your deployed model. 
  • Use a production-ready machine learning framework. There are a number of machine learning frameworks available, but not all of them are suitable for production deployment. When choosing a machine learning framework for production deployment, it is important to consider factors such as scalability, performance, and ease of maintenance. 
  • Use a continuous integration and continuous delivery (CI/CD) pipeline. A CI/CD pipeline automates the process of building, testing, and deploying your machine-learning model. This can help to ensure that your model is always up-to-date and that it is deployed in a consistent and reliable manner. 
  • Monitor your deployed model. Once your model is deployed, it is important to monitor its performance. This will help you to identify any problems with your model and to make necessary adjustments. 
  • Use visualizations to understand the insights better. A model can surface many insights, and these can be visualized with software such as Power BI. 
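
To illustrate the first tip, a baseline can be as simple as always predicting the average of the training targets. The sketch below uses made-up numbers and a hypothetical list of candidate-model predictions; it only shows how the comparison might be set up.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

# Illustrative numbers only.
train_targets = [40, 50, 60, 70]
holdout_targets = [45, 65, 80]
model_predictions = [48, 61, 75]  # hypothetical output of a candidate model

# Baseline: always predict the mean of the training targets.
baseline_prediction = mean(train_targets)
baseline_mae = mean_absolute_error(
    holdout_targets, [baseline_prediction] * len(holdout_targets)
)
model_mae = mean_absolute_error(holdout_targets, model_predictions)

beats_baseline = model_mae < baseline_mae
print(baseline_mae, model_mae, beats_baseline)
```

If a candidate model cannot beat a baseline this simple, that is usually a sign to revisit the data or features before deploying anything.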

 

Data Science Dojo
Stephanie Kirmer
| March 3

Data science model deployment can sound intimidating if you have never had a chance to try it in a safe space. Do you want to make a REST API or a full frontend app? What does it take to do either of these? It’s not as hard as you might think. 

In this series, we’ll go through how you can take machine learning models and deploy them to a web app or a REST API (using Saturn Cloud) so that others can interact with them. In this app, we’ll let the user make some feature selections and then the model will predict an outcome for them. But using this same idea, you could easily do other things, such as letting the user retrain the model, upload things like images, or conduct other interactions with your model. 

Just to keep things interesting, we’re going to do this same project with two frameworks, Voilà and Flask, so you can see how they both work and decide what’s right for your needs. In Flask, we’ll create both a REST API and a web app version.

Learn data science with Data Science Dojo and Saturn Cloud
Our toolkit
 

Other helpful links 

The project – Deploying machine learning models

The first steps of our process are exactly the same, whether we are going for Voilà or Flask. We need to get some data and build a model! I will take the US Department of Education’s College Scorecard data and build a quick linear regression model that accepts a few inputs and predicts a student’s likely earnings two years after graduation. (You can get this data yourself at https://collegescorecard.ed.gov/data/.) 

About measurements 

According to the data codebook: “the cohort of evaluated graduates for earnings metrics consists of those individuals who received federal financial aid, but excludes those who were subsequently enrolled in school during the measurement year, died before the end of the measurement year, received a higher-level credential than the credential level of the field of the study measured, or did not work during the measurement year.” 

Load data 

I already did some data cleaning and uploaded the features I wanted to a public bucket on S3, for easy access. This way, I can load it quickly when the app is run. 
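
A loading step along these lines might look like the sketch below. It assumes the cleaned data is stored as a CSV; the URL is a hypothetical placeholder (the actual bucket location is not given here), and the `load_data` name is illustrative. The demonstration reads an in-memory CSV so that it runs without network access.

```python
import io

import pandas as pd

# Hypothetical placeholder; the actual bucket location is not given in this text.
DATA_URL = "https://example-bucket.s3.amazonaws.com/college_data.csv"

def load_data(source=DATA_URL):
    """Read the pre-cleaned CSV from a URL, a local path, or a file-like object."""
    return pd.read_csv(source)

# Demonstration with an in-memory CSV so the sketch runs offline.
sample_csv = io.StringIO(
    "region,tuition,earn_mdn_hi_2yr\n"
    "Midwest,12000,41000\n"
    "West,30000,52000\n"
)
df = load_data(sample_csv)
print(df.shape)
```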

Format for training 

The dataset gives us a handful of features and our outcome. We just need to split it into features and target with scikit-learn to be ready to model. (Note that all of these functions will be run exactly as written in each of our apps.) 

 Our features are: 

  • Region: geographic location of college 
  • Locale: type of city or town the college is in 
  • Control: type of college (public/private/for-profit) 
  • Cipdesc_new: major field of study (CIP code) 
  • Creddesc: credential (bachelor, master, etc.) 
  • Adm_rate_all: admission rate 
  • Sat_avg_all: average SAT score for admitted students (proxy for college prestige) 
  • Tuition: cost to attend the institution for one year 


Our target outcome is earn_mdn_hi_2yr: median earnings measured two years after completion of degree.
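
The split described above might be sketched as follows, using scikit-learn's `train_test_split`. The lowercase column names are an assumption about how the columns are spelled in the data, and the function name, test size, and sample rows are all illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names, based on the feature list above.
FEATURES = ["region", "locale", "control", "cipdesc_new", "creddesc",
            "adm_rate_all", "sat_avg_all", "tuition"]
TARGET = "earn_mdn_hi_2yr"

def format_for_training(df, test_size=0.25, random_state=42):
    """Separate features from the target, then split into train and test sets."""
    X = df[FEATURES]
    y = df[TARGET]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)

# Made-up rows, only to show the shape of the inputs and outputs.
df = pd.DataFrame({
    "region": ["Midwest", "West", "South", "Northeast"] * 2,
    "locale": ["City", "Suburb", "Town", "Rural"] * 2,
    "control": ["Public", "Private nonprofit"] * 4,
    "cipdesc_new": ["Computer Science", "Nursing", "History", "Biology"] * 2,
    "creddesc": ["Bachelor's Degree", "Master's Degree"] * 4,
    "adm_rate_all": [0.45, 0.62, 0.80, 0.30, 0.55, 0.70, 0.25, 0.90],
    "sat_avg_all": [1200, 1100, 1050, 1400, 1150, 1000, 1450, 980],
    "tuition": [12000, 30000, 9000, 45000, 15000, 28000, 50000, 8000],
    "earn_mdn_hi_2yr": [52000, 61000, 35000, 58000, 50000, 63000, 70000, 33000],
})
X_train, X_test, y_train, y_test = format_for_training(df)
print(X_train.shape, X_test.shape)
```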
 

Train model 

We are going to use a scikit-learn Pipeline to make our feature engineering as easy and quick as possible. The training function returns the trained model along with the R-squared value on the test sample, which gives us a quick and straightforward measure of the model’s performance. 
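
One way such a training function could look is sketched below. This is not necessarily the exact pipeline used for the app: the choice of one-hot encoding and scaling, the reduced set of columns, and the synthetic data are all assumptions made to keep the example short and runnable.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# A reduced, assumed subset of the features to keep the sketch short.
CATEGORICAL = ["control", "creddesc"]
NUMERIC = ["sat_avg_all", "tuition"]

def train_model(X_train, X_test, y_train, y_test):
    """Fit a preprocessing + linear regression pipeline; return it with test R-squared."""
    preprocess = ColumnTransformer([
        ("categorical", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("numeric", StandardScaler(), NUMERIC),
    ])
    model = Pipeline([
        ("preprocess", preprocess),
        ("regressor", LinearRegression()),
    ])
    model.fit(X_train, y_train)
    # For regressors, score() returns R-squared on the given data.
    return model, model.score(X_test, y_test)

# Synthetic data with a known linear signal plus noise (illustrative only).
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "control": rng.choice(["Public", "Private nonprofit", "Private for-profit"], n_rows),
    "creddesc": rng.choice(["Bachelor's Degree", "Master's Degree"], n_rows),
    "sat_avg_all": rng.uniform(900, 1500, n_rows),
    "tuition": rng.uniform(5_000, 50_000, n_rows),
})
df["earn_mdn_hi_2yr"] = (
    20 * df["sat_avg_all"]
    + 0.2 * df["tuition"]
    + 8_000 * (df["creddesc"] == "Master's Degree")
    + rng.normal(0, 1_000, n_rows)
)

X_train, X_test, y_train, y_test = train_test_split(
    df[CATEGORICAL + NUMERIC], df["earn_mdn_hi_2yr"], random_state=0
)
model, r_squared = train_model(X_train, X_test, y_train, y_test)
print(round(r_squared, 3))
```

Keeping the preprocessing inside the pipeline means the same transformations are applied automatically when the app calls `model.predict` on a user's selections.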

Now we have a model, and we’re ready to put together the app! All these functions will be run when the app runs, because training is so fast that it doesn’t make sense to save a model object to be loaded later. If your model doesn’t train this fast, save your model object and load it in your app when you need to predict. 
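
For slower models, saving and reloading could look like the sketch below, which uses joblib (a common choice for scikit-learn objects). The tiny model and the file name are stand-ins for illustration; a temporary directory is used so the example cleans up after itself.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# A tiny stand-in model; in practice this would be your trained pipeline.
model = LinearRegression().fit(
    np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0])
)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.joblib")  # hypothetical file name
    joblib.dump(model, model_path)      # done once, after training
    restored = joblib.load(model_path)  # done in the app, at startup

# The restored model should give the same predictions as the original.
same_predictions = np.allclose(model.predict([[4.0]]), restored.predict([[4.0]]))
print(same_predictions)
```

Only load model files you created yourself or otherwise trust, since joblib files are based on pickle and can execute code when loaded.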


Visualization 

In addition to building a model and creating predictions, we want our app to show a visual of the prediction against a relevant distribution. The same plot function can be used for both apps, because we are using plotly for the job. 

The function below accepts the type of degree and the major, to generate the distributions, as well as the prediction that the model has given. That way, the viewer can see how their prediction compares to others. Later, we’ll see how the different app frameworks use the plotly object. 

 

 This is the general visual we’ll be generating — but because it’s plotly, it’ll be interactive! 

Deploying machine learning models

You might be wondering whether your favorite visualization library could work here. The answer is, maybe! Every Python visualization library has idiosyncrasies and is not likely to be supported in exactly the same way by Voilà and Flask. I chose plotly because it has interactivity and is fully functional in both frameworks, but you are welcome to try your own visualization tool and see how it goes.  

Wrapping up

In conclusion, deploying machine learning models to a web app or REST API can seem daunting, but it’s not as difficult as it may seem. By using frameworks like Voilà and Flask, along with libraries like scikit-learn, plotly, and pandas, you can create an app that allows users to interact with machine learning models. In this project, we used the US Department of Education’s College Scorecard data to build a linear regression model that predicts a student’s likely earnings two years after graduation.

 
