Use of Machine Learning in Data Science

May 29, 2025

Machine learning (ML) is one of the most powerful tools in data science. While statistics helps us describe data and test hypotheses about it, ML enables computers to learn patterns from data automatically and use them to make predictions or decisions.

Here’s how ML is used in data science:

1. Making Predictions

ML models can use past data to predict future outcomes.
Example: Predicting sales for the next month, predicting exam scores from study hours.

2. Classifying Data

ML can separate data into categories.
Example: Email → spam or not spam, medical diagnosis → disease present or not.
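
As a minimal sketch of classification, the snippet below trains scikit-learn's LogisticRegression on a made-up spam dataset. The single feature (number of suspicious words per email) and all the numbers are assumptions chosen for illustration, not real data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: suspicious-word count per email (one feature)
suspicious_words = np.array([0, 1, 1, 2, 6, 7, 8, 9]).reshape(-1, 1)
is_spam = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = not spam, 1 = spam

# Train a simple binary classifier
clf = LogisticRegression()
clf.fit(suspicious_words, is_spam)

# Classify two new emails: one with 0 suspicious words, one with 10
new_emails = np.array([[0], [10]])
predicted_labels = clf.predict(new_emails)
print("Predicted labels (0 = not spam, 1 = spam):", predicted_labels)
```

A real spam filter would use many more features (for example, word counts from the full message text), but the fit/predict pattern stays the same.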

3. Finding Patterns and Groups

ML can discover hidden structures in data.
Example: Grouping customers with similar buying behavior (customer segmentation).
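
Customer segmentation can be sketched with k-means clustering, one common unsupervised method. The customer table below (monthly visits and average spend) is invented for illustration, and the choice of two clusters is an assumption made to match the toy data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [monthly visits, average spend]
customers = np.array([
    [2, 20], [3, 25], [2, 22],        # occasional, low-spend shoppers
    [20, 200], [22, 210], [21, 190],  # frequent, high-spend shoppers
])

# Group the customers into 2 segments; no labels are given to the model
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

print("Segment of each customer:", segments)
print("Segment centers:")
print(kmeans.cluster_centers_)
```

The segment numbers themselves (0 or 1) are arbitrary; what matters is which customers end up in the same group.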

4. Recommendation Systems

ML personalizes experiences by suggesting items.
Example: Netflix recommending movies, Amazon suggesting products.
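
One simple way to build a recommender is user-based collaborative filtering: find users with similar ratings and suggest what they liked. The sketch below uses only NumPy. The ratings matrix, the movie names, and the helper cosine_similarity are hypothetical, and this is not a description of how Netflix or Amazon actually work.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = movies, 0 = not rated yet
ratings = np.array([
    [5, 4, 0, 1, 0],  # user 0 (the user we recommend for)
    [5, 5, 4, 1, 1],  # user 1: similar taste to user 0
    [1, 1, 1, 5, 5],  # user 2: very different taste
], dtype=float)
movie_names = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"]

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0
similarities = np.array(
    [cosine_similarity(ratings[target], ratings[u]) for u in range(len(ratings))]
)
similarities[target] = 0.0  # do not compare the user with themselves

# Similarity-weighted average rating per movie
# (the other users have rated every movie in this toy matrix)
scores = similarities @ ratings / similarities.sum()

# Only recommend movies the target user has not rated yet
scores[ratings[target] > 0] = -np.inf
recommended = movie_names[int(np.argmax(scores))]
print("Recommended for user 0:", recommended)
```

Because user 1 rates movies much like user 0 does, user 1's high rating for Movie C outweighs user 2's preference for Movie E.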

5. Detecting Anomalies

ML identifies unusual patterns in data.
Example: Detecting fraudulent transactions in banking, spotting cyberattacks in networks.
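
A rough sketch of anomaly detection, assuming scikit-learn's IsolationForest as the detector. The transaction amounts are made up, and contamination=0.1 encodes the assumption that about 10% of the points are unusual.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts; one is far larger than the rest
amounts = np.array([25, 30, 22, 27, 31, 24, 29, 26, 28, 5000])
X = amounts.reshape(-1, 1)  # sklearn expects a 2D feature array

# Fit the detector; fit_predict returns -1 for anomalies, 1 for normal points
detector = IsolationForest(contamination=0.1, random_state=0)
labels = detector.fit_predict(X)

flagged = amounts[labels == -1]
print("Flagged as unusual:", flagged)
```

In practice the flagged transactions would usually go to a human reviewer or a second check rather than being blocked outright.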

6. Understanding Natural Language

ML helps computers understand and process human language.
Example: Chatbots, Google Translate, sentiment analysis of social media posts.
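
Sentiment analysis can be sketched as text classification: turn each post into word counts, then train a classifier on labeled examples. The posts and labels below are invented, and a bag-of-words model with Naive Bayes is just one simple choice among many.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labeled posts
posts = [
    "I love this product",
    "great service and friendly staff",
    "absolutely wonderful experience",
    "terrible quality, very disappointed",
    "awful support, I hate it",
    "worst purchase ever",
]
sentiments = ["positive", "positive", "positive",
              "negative", "negative", "negative"]

# Pipeline: word counts -> Naive Bayes classifier
sentiment_model = make_pipeline(CountVectorizer(), MultinomialNB())
sentiment_model.fit(posts, sentiments)

new_posts = ["I love the friendly staff", "awful quality, worst support"]
predictions = sentiment_model.predict(new_posts)
print(list(zip(new_posts, predictions)))
```

With only six training posts the model simply memorizes which words appeared with which label; real systems are trained on far larger datasets.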

7. Working with Images and Videos

ML is widely used for recognition and detection.
Example: Face recognition in phones, self-driving cars detecting traffic signs.
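
For a small taste of image recognition, the sketch below classifies 8x8 images of handwritten digits that ship with scikit-learn. A k-nearest-neighbors classifier is used here for simplicity; it is an assumption of this example, and production systems such as face recognition typically rely on deep neural networks instead.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Each sample is an 8x8 grayscale image flattened into 64 pixel values
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# Classify an image by looking at the 3 most similar training images
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

accuracy = knn.score(X_test, y_test)
print(f"Accuracy on unseen digit images: {accuracy:.2f}")
```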

8. Automating Decisions

ML reduces the need for human intervention in repetitive decision-making.
Example: Credit scoring, loan approval, automated medical image diagnosis.
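
An automated decision can be sketched with a decision tree, which learns explicit if/else rules from past decisions. The applicant data below (income in thousands and credit score) is entirely made up, and the feature names income_k and credit_score are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical past applications: [income in thousands, credit score]
applicants = np.array([
    [30, 580], [45, 600], [28, 620], [60, 640],  # rejected
    [40, 700], [55, 720], [80, 750], [35, 780],  # approved
])
approved = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Learn decision rules from the past outcomes
tree = DecisionTreeClassifier(random_state=0)
tree.fit(applicants, approved)

# Show the learned rule in readable form
print(export_text(tree, feature_names=["income_k", "credit_score"]))

# Decide on two new applications
new_applicants = np.array([[50, 760], [50, 590]])
decisions = tree.predict(new_applicants)
print("Decisions (1 = approve, 0 = reject):", decisions)
```

In this toy data only the credit score separates approvals from rejections cleanly, so the tree learns a single threshold on it. For high-stakes decisions such as lending, models like this are normally audited for fairness and kept under human oversight.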

In Short

👉 Machine Learning gives intelligence to data science.

Statistics → helps us describe the data and test what it says.
Machine Learning → learns patterns from the data to predict what is likely to happen next and to help automate decisions.

Example Python code: predicting exam scores from study hours

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Sample data
study_hours = [2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [50, 55, 60, 65, 70, 75, 80, 85]

# Convert to arrays (X should be 2D for sklearn)
X = np.array(study_hours).reshape(-1, 1)
y = np.array(exam_scores)

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Predict exam scores for new study hours
new_hours = np.array([10, 11, 12]).reshape(-1, 1)
predicted_scores = model.predict(new_hours)

print("Predicted Scores for [10, 11, 12] hours of study:")
print(predicted_scores)

# Visualization
plt.scatter(study_hours, exam_scores, color="blue", label="Actual Data")
plt.plot(study_hours, model.predict(X), color="red", label="Regression Line")
plt.xlabel("Study Hours")
plt.ylabel("Exam Scores")
plt.title("Study Hours vs Exam Scores (Linear Regression)")
plt.legend()
plt.show()
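
To see what the regression model above actually learned, it can help to print the fitted slope and intercept. The sketch below refits the same sample data so that it runs on its own. Because the sample scores rise by exactly 5 points per extra hour, the fitted line is score = 5 × hours + 40, which is why the predictions for 10, 11, and 12 hours come out as 90, 95, and 100.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same sample data as in the example above
X = np.array([2, 3, 4, 5, 6, 7, 8, 9]).reshape(-1, 1)
y = np.array([50, 55, 60, 65, 70, 75, 80, 85])

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # extra points per additional study hour
intercept = model.intercept_  # predicted score at 0 study hours
r_squared = model.score(X, y) # 1.0 means the line fits the data perfectly

print(f"score = {slope:.1f} * hours + {intercept:.1f}")
print(f"R^2 on the training data: {r_squared:.3f}")
```

Real data is rarely this tidy, so an R^2 of 1.0 here reflects the artificial sample rather than what to expect in practice.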
