How a Typical Machine Learning System Works
- Data Collection and Preparation: The process begins with collecting data from various sources such as sensors, databases, user input, or web services. The data is then cleaned, formatted, and preprocessed to handle missing values, noise, and irrelevant features.
- Model Selection: Based on the problem type (e.g., classification, regression, clustering), a suitable algorithm or model is selected. This could be a decision tree, a linear regression model, a neural network, or any other appropriate technique.
- Training the Model: The selected model is trained on a training dataset. The learning algorithm adjusts the model's internal parameters to minimize a cost/loss function (a measure of how wrong the model is). This is the core learning phase.
- Evaluation: After training, the model is evaluated on a separate validation or test dataset to check how well it generalizes to new, unseen data. Performance metrics such as accuracy, precision, recall, or RMSE are used here.
- Inference (Prediction): Once validated, the trained model is deployed and used to make predictions on new input data. This process is called inference. For example, predicting whether a new email is spam or not.
- Model Updating (Optional): In batch learning, the model is retrained periodically with updated data. In online learning, the model continuously learns and updates with new data in real time.
This end-to-end cycle, from data to prediction, is how most machine learning systems operate in real-world applications; the sketch below walks through the same steps in code.
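As a minimal sketch of that cycle, the following uses scikit-learn on synthetic data; the dataset, the imputation strategy, and the choice of a linear model are all illustrative assumptions rather than a recommendation:

```python
# Minimal sketch of the end-to-end cycle with scikit-learn.
# The synthetic dataset and the linear model are illustrative
# assumptions, not prescriptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# 1. Data collection and preparation: synthetic data with a few
#    missing values, imputed with the column mean.
rng = np.random.default_rng(42)
X = rng.uniform(0, 100_000, size=(200, 1))             # one numeric feature
y = 4.0 + 5e-5 * X[:, 0] + rng.normal(0, 0.3, 200)     # noisy linear target
X[rng.choice(200, size=5, replace=False), 0] = np.nan  # simulate missing values
X = SimpleImputer(strategy="mean").fit_transform(X)

# 2. Model selection: a linear model suits this regression problem.
model = LinearRegression()

# 3. Training: the algorithm adjusts the model's parameters to
#    minimize its loss on the training split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)

# 4. Evaluation: check generalization on held-out data (RMSE here).
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"test RMSE: {rmse:.3f}")

# 5. Inference: predict for a new, unseen input.
print(f"prediction at 40,000: {model.predict([[40_000.0]])[0]:.2f}")

# 6. Updating: in batch learning you would periodically refit on fresh
#    data; an online learner (e.g., SGDRegressor.partial_fit) would
#    update incrementally instead.
```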
For example, suppose you want to know if money makes people happy, so you download the Better Life Index data from the OECD’s website and stats about gross domestic product (GDP) per capita from the IMF’s website. Then you join the tables and sort by GDP per capita. Table 1-1 shows an excerpt of what you get (this is the Data Collection and Preparation step).
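In code, that join-and-sort step might look like the sketch below; the file names and column names are assumptions for illustration, not the actual OECD/IMF schemas:

```python
# Hypothetical sketch of the join-and-sort step with pandas.
# File and column names are illustrative assumptions.
import pandas as pd

oecd_bli = pd.read_csv("oecd_better_life_index.csv")  # Country, Life satisfaction
imf_gdp = pd.read_csv("imf_gdp_per_capita.csv")       # Country, GDP per capita

# Join the two tables on the country name, then sort by GDP per capita.
stats = oecd_bli.merge(imf_gdp, on="Country").sort_values("GDP per capita")
print(stats[["Country", "GDP per capita", "Life satisfaction"]].head())
```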
There does seem to be a trend here! Although the data is noisy (i.e., partly random), it looks like life satisfaction goes up more or less linearly as the country’s GDP per capita increases. So you decide to model life satisfaction as a linear function of GDP per capita. This step is called model selection: you selected a linear model of life satisfaction with just one attribute, GDP per capita (Equation 1-1).
Equation 1-1. A simple linear model
$$\text{life\_satisfaction} = θ_0 + θ_1 × \text{GDP\_per\_capita}$$
This model has two model parameters, $θ_0$ and $θ_1$. By tweaking these parameters, you can make your model represent any linear function, as shown in Figure 1-18.
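To make that concrete, here is a tiny sketch showing how different parameter settings produce different lines; the θ values are arbitrary examples, not fitted parameters:

```python
# Different (theta0, theta1) pairs define different lines, as in
# Figure 1-18. The values below are arbitrary examples.
def life_satisfaction(gdp_per_capita, theta0, theta1):
    return theta0 + theta1 * gdp_per_capita

for theta0, theta1 in [(0.0, 2e-5), (8.0, -1e-5), (4.0, 5e-5)]:
    prediction = life_satisfaction(40_000.0, theta0, theta1)
    print(f"theta0={theta0}, theta1={theta1}: f(40,000) = {prediction:.2f}")
```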
Before you can use your model, you need to define the parameter values $θ_0$ and $θ_1$. How can you know which values will make your model perform best? To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is. For Linear Regression problems, people typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance.
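As one common choice of such a cost function, a mean squared error sketch for this one-feature model could look like this; the function name and the sample points are illustrative:

```python
# Sketch of a mean squared error (MSE) cost for the one-feature
# linear model. MSE is one common distance measure; the function
# name and sample points are illustrative.
import numpy as np

def mse_cost(gdp, life_sat, theta0, theta1):
    """Average squared distance between the line's predictions and the labels."""
    predictions = theta0 + theta1 * gdp
    return np.mean((predictions - life_sat) ** 2)

# Made-up training points, purely for illustration.
gdp = np.array([9_054.0, 27_195.0, 37_675.0, 50_962.0])
life_sat = np.array([4.9, 5.8, 6.5, 7.3])
print(mse_cost(gdp, life_sat, 4.85, 4.91e-5))  # lower is better
```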
This is where the Linear Regression algorithm comes in: you feed it your training examples, and it finds the parameters that make the linear model fit your data best. This is called training the model. In our case, the algorithm finds that the optimal parameter values are $θ_0 = 4.85$ and $θ_1 = 4.91 × 10^{-5}$.
Now the model fits the training data as closely as possible (for a linear model), as you can see in Figure 1-19.
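In code, this training step is a single fit call; the sketch below uses scikit-learn with placeholder data, and with the actual OECD/IMF table the fitted values should come out near the ones above:

```python
# Sketch of the training step with scikit-learn. X and y are
# placeholders; with the real OECD/IMF data the fitted parameters
# should come out near theta0 = 4.85 and theta1 = 4.91e-5.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[9_054.0], [27_195.0], [37_675.0], [50_962.0]])  # GDP per capita
y = np.array([4.9, 5.8, 6.5, 7.3])                             # life satisfaction

model = LinearRegression()
model.fit(X, y)  # training: finds the best-fitting theta0 and theta1

theta0, theta1 = model.intercept_, model.coef_[0]
print(f"theta0 = {theta0:.2f}, theta1 = {theta1:.2e}")
```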
You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus’s GDP per capita, find $22,587, and then apply your model to find that life satisfaction is likely to be somewhere around $4.85 + 22{,}587 × 4.91 × 10^{-5} = 5.96$.
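Continuing the hypothetical model object from the previous sketch, that inference step is a one-liner:

```python
# Inference: predict Cyprus's life satisfaction from its GDP per
# capita, reusing the `model` fitted in the previous sketch.
cyprus_gdp_per_capita = 22_587.0
prediction = model.predict([[cyprus_gdp_per_capita]])[0]
print(f"Predicted life satisfaction for Cyprus: {prediction:.2f}")
# With theta0 = 4.85 and theta1 = 4.91e-5, this evaluates to about 5.96.
```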