Model Question Paper and Answers – KTU GNEST305 AI and Data Science
Q1. List the types of machine learning systems with one example each. (CO1)
- Supervised Learning – The model is trained on labeled data (input–output pairs) to predict outcomes. Example: Spam email detection (spam or not spam).
- Unsupervised Learning – The model finds hidden patterns or groupings in unlabeled data. Example: Customer segmentation in marketing.
- Reinforcement Learning – The model learns by interacting with an environment and receiving rewards or penalties. Example: Self-driving cars improving navigation by trial and error.
Q2. What is the difference between classification and regression? (CO1)
- Classification: Predicts categorical or discrete outcomes. The output is a label or class. Example: Predicting whether a patient has a disease (Yes/No).
- Regression: Predicts continuous numerical outcomes. The output is a real value. Example: Predicting house prices or stock values.
Key point: Classification → categories; Regression → numbers.
Q3. What is the importance of linear algebra in data representation? (CO2)
Linear algebra provides the foundation for representing and processing data in machine learning:
- Data is stored in vectors and matrices for efficient manipulation.
- Algorithms like PCA, SVD, and neural networks rely on matrix operations.
- It enables compact representation of large datasets and fast computations.
Thus, linear algebra is essential for data transformations, feature extraction, and optimization in ML.
Q4. Define Singular Value Decomposition (SVD) with its component matrices and their relevance with reference to the rows and columns of the data matrix. (CO2)
SVD decomposes a matrix A into three matrices, A = UΣVᵀ:
- U → left singular vectors (represent row features).
- Σ → diagonal matrix of singular values (importance of components).
- Vᵀ → right singular vectors (represent column features).
Relevance:
- Captures relationships among rows and columns.
- Reduces dimensionality by keeping only the top singular values.
- Widely used in data compression, noise reduction, and recommendation systems.
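As a quick illustration, here is a minimal NumPy sketch (the 3×2 matrix is an assumed example, not from the question paper):

```python
import numpy as np

# Assumed example data matrix: 3 samples (rows) x 2 features (columns)
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# Thin SVD: A = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(A, full_matrices=False)

print("U (left singular vectors, row features):\n", U)
print("Singular values (component importance):", S)
print("Vt (right singular vectors, column features):\n", Vt)

# Keeping only the top singular value gives a rank-1 approximation
A1 = S[0] * np.outer(U[:, 0], Vt[0, :])
print("Rank-1 approximation:\n", A1)
```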
Q5. Define a random variable with an example. (CO3)
A random variable is a function that assigns numerical values to outcomes of a random process or experiment.
- Discrete random variable: Takes countable values. Example: Rolling a die → values {1, 2, 3, 4, 5, 6}.
- Continuous random variable: Takes infinitely many possible values within a range. Example: Measuring the height of students.
Q6. Differentiate between correlation and regression in terms of their applicability in statistical modelling. (CO3)
- Correlation:
  - Measures the strength and direction of association between two variables.
  - Does not imply causation.
  - Example: Height and weight correlation.
- Regression:
  - Establishes a functional or predictive relationship between variables.
  - One variable is dependent, the other(s) independent.
  - Example: Predicting sales based on advertising expenditure.
Q7. What are the main benefits of data science in modern industries? (CO4)
- Improved decision-making: Data-driven insights help managers choose better strategies.
- Efficiency and automation: AI/ML models automate repetitive tasks.
- Personalization: Enhances customer experience (e.g., product recommendations).
- Innovation: Helps in discovering new business opportunities and products.
Thus, data science boosts competitiveness and growth in industries.
Q8. What is meant by Big Data, and how is it related to data science? (CO4)
- Big Data: Refers to massive datasets characterized by the 3Vs:
  - Volume (large size),
  - Velocity (speed of generation),
  - Variety (different formats like text, images, videos).
- Relation to Data Science: Data science provides methods and algorithms to process, analyze, and extract insights from Big Data, which cannot be handled by traditional tools.
PART B
Q9 (a). In binary classification, a model outputs a probability of p = 0.8. If the decision boundary is 0.5, state the predicted class. Justify your reasoning. (CO1)
- Since p = 0.8 > 0.5, the model predicts Class 1.
- Reason: If probability ≥ 0.5, the instance is classified as positive (Class 1); otherwise, negative (Class 0).
Q9 (b). Apply one step of K-means clustering to the dataset {2, 4, 8, 10} with k = 2 and initial centroids 2 and 8. Show new clusters. (CO1)
- Initial centroids: C1 = 2, C2 = 8
- Distances from data points:
  - 2 → C1(0), C2(6) → Cluster 1
  - 4 → C1(2), C2(4) → Cluster 1
  - 8 → C1(6), C2(0) → Cluster 2
  - 10 → C1(8), C2(2) → Cluster 2
- New clusters:
  - Cluster 1 = {2, 4}
  - Cluster 2 = {8, 10}
- New centroids:
  - C1 = (2 + 4)/2 = 3
  - C2 = (8 + 10)/2 = 9
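For reference, here is a minimal Python sketch of this single assignment-and-update step (not a full K-means loop):

```python
# One K-means step on {2, 4, 8, 10} with initial centroids 2 and 8
points = [2, 4, 8, 10]
centroids = [2.0, 8.0]

# Assignment step: each point joins the cluster of the nearest centroid
clusters = [[], []]
for p in points:
    nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
    clusters[nearest].append(p)

# Update step: each centroid moves to the mean of its cluster
new_centroids = [sum(c) / len(c) for c in clusters]

print(clusters)       # [[2, 4], [8, 10]]
print(new_centroids)  # [3.0, 9.0]
```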
Q9 (c). A dataset has 100 samples. A classification model misclassifies 12 samples. Compute its accuracy. (CO1)
- Total samples = 100
- Correct predictions = 100 − 12 = 88
- Accuracy = Correct / Total = 88 / 100 = 0.88 = 88%
Q10 (a). For input x = [1, 1], weights w = [0.5, 0.5], and bias b = 0, compute the perceptron output before applying activation. (CO1)
- Formula: output = w·x + b = w₁x₁ + w₂x₂ + b
- Calculation: (0.5 × 1) + (0.5 × 1) + 0 = 1.0
- Output before activation = 1.0
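A one-line NumPy check of the same computation (minimal sketch):

```python
import numpy as np

x = np.array([1, 1])      # input
w = np.array([0.5, 0.5])  # weights
b = 0.0                   # bias

pre_activation = np.dot(w, x) + b  # (0.5 * 1) + (0.5 * 1) + 0
print(pre_activation)              # 1.0
```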
Q10 (b). Explain with an example how a Multi-Layer Perceptron (MLP) can solve a problem that a single perceptron cannot. (CO1)
- A single perceptron can only solve linearly separable problems.
- The XOR problem is not linearly separable → it cannot be solved by one perceptron.
- An MLP with hidden layers applies nonlinear transformations, making it possible to classify XOR correctly.
- Example: An MLP with two hidden neurons can separate the XOR points, as sketched below.
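A minimal sketch with hand-picked weights (one common construction among many, assumed for illustration) shows the idea: one hidden unit acts as OR, the other as AND, and the output fires when OR is true but AND is not.

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: OR gate
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: AND gate
    return step(h1 - h2 - 0.5)  # output: OR AND (NOT AND) = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_mlp(a, b))  # prints 0, 1, 1, 0
```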
Q10 (c). Logistic regression is used to predict whether a patient has a disease (Yes/No). Explain how the probability output can be used for decision-making. (CO1)
- Logistic regression outputs a probability between 0 and 1.
- A decision threshold (e.g., 0.5) is set:
  - If probability ≥ 0.5 → predict “Disease: Yes”.
  - If probability < 0.5 → predict “Disease: No”.
- This allows doctors to make binary decisions based on risk level.
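A minimal sketch of the thresholding step (the coefficients and features are assumed, purely for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical fitted model: z = w0 + w1*age + w2*marker_level
w0, w1, w2 = -6.0, 0.05, 1.2   # assumed coefficients
age, marker_level = 60, 3.0    # assumed patient data

p = sigmoid(w0 + w1 * age + w2 * marker_level)
decision = "Disease: Yes" if p >= 0.5 else "Disease: No"
print(f"P(disease) = {p:.2f} -> {decision}")
```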
Q11 (a). Find the eigenvalues of the given diagonal matrix. (CO2)
The matrix is diagonal, so its eigenvalues are simply its diagonal entries.
Q11 (b). Apply spectral decomposition to the given symmetric matrix A. (CO2)
- Compute the eigenvalues λ₁ and λ₂ from the characteristic equation det(A − λI) = 0.
- Find the corresponding (unnormalized) eigenvectors for λ₁ and for λ₂.
- Normalize the eigenvectors so each has unit norm.
- Spectral decomposition: A = QΛQᵀ, where Q holds the orthonormal eigenvectors as columns and Λ = diag(λ₁, λ₂).
Thus A = λ₁q₁q₁ᵀ + λ₂q₂q₂ᵀ, expressing A as a sum of rank-1 components.
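Since the exam's matrix is not reproduced above, here is a hedged NumPy sketch with an assumed symmetric matrix that carries out the same decomposition:

```python
import numpy as np

# Assumed symmetric matrix, for illustration only
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is for symmetric matrices: real eigenvalues, orthonormal eigenvectors
eigvals, Q = np.linalg.eigh(A)
Lam = np.diag(eigvals)

# Check the spectral decomposition A = Q Λ Qᵀ
print(np.allclose(A, Q @ Lam @ Q.T))  # True

# Equivalently, A as a sum of rank-1 components λᵢ qᵢ qᵢᵀ
A_rebuilt = sum(eigvals[i] * np.outer(Q[:, i], Q[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))      # True
```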
Q11 (c). For a covariance matrix, will the eigenvectors be orthogonal? (CO2)
Yes. A covariance matrix is symmetric, and real symmetric matrices always have eigenvectors that can be chosen to be orthogonal. (This is why PCA uses the covariance matrix's eigenvectors as its principal axes.)
Q12 (a). A dataset has 100 features. After applying PCA, only 10 features are retained. What does this imply about dimensionality reduction and why might it be useful in Machine Learning? (CO2)
Implication: The top 10 principal components capture the majority of the data variance (most information) while the remaining 90 components contribute little.
Why useful: reduces feature dimensionality → less storage and faster training, lowers risk of overfitting, removes noise/redundant features, and simplifies visualization and interpretation.
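A short scikit-learn sketch (synthetic low-rank data, assumed for illustration) shows how the retained variance can be checked:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 features largely driven by 10 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 10))
mixing = rng.normal(size=(10, 100))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 100))

pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)   # shape (500, 10)

print(X_reduced.shape)
print("Variance explained by 10 components:",
      round(pca.explained_variance_ratio_.sum(), 3))
```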
Q12 (b). A noisy image matrix is approximated using only the first two largest singular values in its SVD. Explain computationally why this reduces storage and how it helps in image compression. (CO2)
- SVD form: A = UΣVᵀ. For an m × n image A, the full SVD stores U (m × m), Σ (m × n), and Vᵀ (n × n).
- Rank-k approximation (k = 2): store only the first k columns of U, the k largest singular values, and the first k rows of Vᵀ, i.e., A ≈ σ₁u₁v₁ᵀ + σ₂u₂v₂ᵀ.
- Storage cost: roughly k(m + n + 1) numbers instead of mn. For large images this is a big reduction when k ≪ min(m, n).
- Compression & denoising: keeping only the largest singular values preserves the main image structure (high energy) and discards small singular values that often correspond to noise, so the image is compressed and less noisy.
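A minimal NumPy sketch (using an assumed random "image") makes the storage arithmetic concrete:

```python
import numpy as np

m, n, k = 512, 512, 2
img = np.random.rand(m, n)  # assumed grayscale "image", illustration only

U, S, Vt = np.linalg.svd(img, full_matrices=False)

# Rank-2 approximation: keep only the two largest singular values
img_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

full_storage = m * n           # 262,144 numbers
compressed = k * (m + n + 1)   # 2,050 numbers
print(full_storage, compressed)
```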
Q12 (c). For the given matrix A, find AᵀA and compute the singular values of A. (CO2)
- Compute AᵀA (this product is always symmetric).
- Find the eigenvalues of AᵀA from its characteristic equation.
- The singular values of A are the square roots of these eigenvalues: σᵢ = √λᵢ.
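Because the matrix itself is not shown above, here is a hedged sketch with an assumed 2×2 matrix demonstrating the procedure:

```python
import numpy as np

# Assumed matrix, for illustration only
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

AtA = A.T @ A                             # always symmetric
eigvals = np.linalg.eigvalsh(AtA)         # eigenvalues of AᵀA (ascending)
singular_values = np.sqrt(eigvals)[::-1]  # σᵢ = √λᵢ, descending

print(singular_values)                     # from AᵀA: [4. 2.]
print(np.linalg.svd(A, compute_uv=False))  # direct SVD check: [4. 2.]
```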
Q13 (a). A coin is tossed 3 times. List outcomes and probability distribution for at least two heads. (CO3)
- Sample space (8 outcomes): {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
- Event E: at least 2 heads = {HHH, HHT, HTH, THH}
- P(E) = 4/8 = 0.5
- Probability distribution (X = number of heads):
  - P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
  - Total = 1/8 + 3/8 + 3/8 + 1/8 = 1
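The distribution can be enumerated in a few lines of Python (minimal sketch):

```python
from itertools import product
from collections import Counter

# All 2^3 = 8 outcomes of three coin tosses
outcomes = list(product("HT", repeat=3))

# Distribution of X = number of heads
counts = Counter(seq.count("H") for seq in outcomes)
for x in sorted(counts):
    print(f"P(X = {x}) = {counts[x]}/8")

# P(at least 2 heads)
favourable = sum(1 for seq in outcomes if seq.count("H") >= 2)
print(f"P(X >= 2) = {favourable}/8")  # 4/8 = 0.5
```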
Q13 (b). A bag contains 3 red and 2 blue balls. One ball is drawn at random. Find the probability that it is
(i) red
(ii) blue.
Total balls = 3 red + 2 blue = 5
(i) Probability of red = 3/5 = 0.6
(ii) Probability of blue = 2/5 = 0.4
Final Answer: P(Red) = 0.6, P(Blue) = 0.4
Q13 (c). A factory has 3 machines A, B, C producing 30%, 50%, and 20% of items, respectively. Their defect rates are 2%, 3%, and 4%. If a randomly chosen item is defective, find the probability it was produced by machine B (Bayes' theorem). (CO3)
- Machine contributions: P(A) = 0.30, P(B) = 0.50, P(C) = 0.20
- Defect rates: P(D|A) = 0.02, P(D|B) = 0.03, P(D|C) = 0.04
Step 1. Total defective probability:
P(D) = 0.30 × 0.02 + 0.50 × 0.03 + 0.20 × 0.04 = 0.006 + 0.015 + 0.008 = 0.029
Step 2. Probability the defective item came from B:
P(B|D) = P(D|B) P(B) / P(D) = 0.015 / 0.029 ≈ 0.517
Answer: Probability ≈ 51.7%.
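The arithmetic can be checked with a short Python sketch:

```python
# Prior probability that an item comes from each machine
prior = {"A": 0.30, "B": 0.50, "C": 0.20}
# Probability of a defect given the machine
defect_rate = {"A": 0.02, "B": 0.03, "C": 0.04}

# Law of total probability: P(D)
p_defect = sum(prior[m] * defect_rate[m] for m in prior)

# Bayes' theorem: P(B | D)
p_b_given_d = prior["B"] * defect_rate["B"] / p_defect

print(round(p_defect, 3))     # 0.029
print(round(p_b_given_d, 3))  # 0.517
```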
Q14 (a). The marks of 5 students are: 50, 60, 70, 80, 90. Compute the mean and variance. Use these parameters to interpret the data distribution. (CO3)
Answer (a)
- Mean: x̄ = (50 + 60 + 70 + 80 + 90)/5 = 350/5 = 70
- Deviations from mean: −20, −10, 0, +10, +20; squared deviations sum = 400 + 100 + 0 + 100 + 400 = 1000
- Population variance: σ² = 1000/5 = 200
- Sample variance (unbiased): s² = 1000/4 = 250
- Interpretation: Mean 70 is the center. Variance 200 (std ≈ 14.14) shows moderate spread around the mean. The data are symmetric and evenly spaced about 70 (no skew), so the distribution is centered and fairly dispersed.
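A quick check with Python's statistics module (minimal sketch):

```python
import statistics

marks = [50, 60, 70, 80, 90]

print(statistics.mean(marks))              # 70
print(statistics.pvariance(marks))         # 200 (population, divide by n)
print(statistics.variance(marks))          # 250 (sample, divide by n - 1)
print(round(statistics.pstdev(marks), 2))  # 14.14
```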
Q14 (b). The data are x = (1, 2, 3, 4, 5) and y = (2, 4, 6, 8, 10). Compute the correlation coefficient. What does the result indicate?
Step 1: Recall formula
The Pearson correlation coefficient is:
r = cov(x, y) / (sₓ · s_y)
where cov(x, y) = (1/n) Σ(xᵢ − x̄)(yᵢ − ȳ) and sₓ, s_y are the standard deviations of x and y.
Step 2: Compute means
x̄ = (1 + 2 + 3 + 4 + 5)/5 = 3, ȳ = (2 + 4 + 6 + 8 + 10)/5 = 6
Step 3: Compute covariance
Make a table:

| xᵢ | yᵢ | xᵢ − x̄ | yᵢ − ȳ | (xᵢ − x̄)(yᵢ − ȳ) |
|---|---|---|---|---|
| 1 | 2 | −2 | −4 | 8 |
| 2 | 4 | −1 | −2 | 2 |
| 3 | 6 | 0 | 0 | 0 |
| 4 | 8 | +1 | +2 | 2 |
| 5 | 10 | +2 | +4 | 8 |

Sum = 20, so cov(x, y) = 20/5 = 4
Step 4: Compute standard deviations
For x: Σ(xᵢ − x̄)² = 4 + 1 + 0 + 1 + 4 = 10, so sₓ = √(10/5) = √2
For y: Σ(yᵢ − ȳ)² = 16 + 4 + 0 + 4 + 16 = 40, so s_y = √(40/5) = 2√2
Step 5: Correlation coefficient
r = 4 / (√2 × 2√2) = 4/4 = 1
Interpretation
- r = 1 means a perfect positive linear correlation.
- As x increases, y increases in exact proportion (y = 2x).
- Graphically, the points lie exactly on a straight line with slope 2.
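A short NumPy check of the result (minimal sketch):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Pearson r from the definition (population covariance and std)
cov = np.mean((x - x.mean()) * (y - y.mean()))  # 4.0
r = cov / (x.std() * y.std())                   # 4 / (√2 · 2√2) = 1.0

print(r)                        # 1.0
print(np.corrcoef(x, y)[0, 1])  # 1.0, built-in cross-check
```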
Q14 (c). A sample of size n is drawn from a normal distribution N(μ, σ²) with known variance σ². Show that the sample mean is the MLE of μ. (CO3)
Answer (c)
- Likelihood: L(μ) = ∏ᵢ₌₁ⁿ (1/√(2πσ²)) exp(−(xᵢ − μ)²/(2σ²))
- Log-likelihood: ln L(μ) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σᵢ(xᵢ − μ)²
- Derivative w.r.t. μ: d(ln L)/dμ = (1/σ²) Σᵢ(xᵢ − μ)
- Set to zero → Σᵢ(xᵢ − μ) = 0 → μ̂ = (1/n) Σᵢ xᵢ
- So the MLE of μ is the sample mean x̄. (The second derivative, −n/σ², is negative, confirming a maximum.)
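As a numerical sanity check (a minimal sketch with simulated data and an assumed σ), the log-likelihood evaluated over a grid of μ values peaks at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                                      # known (assumed) σ
x = rng.normal(loc=5.0, scale=sigma, size=100)   # simulated sample

def log_likelihood(mu):
    n = len(x)
    return (-n / 2 * np.log(2 * np.pi * sigma**2)
            - np.sum((x - mu) ** 2) / (2 * sigma**2))

mus = np.linspace(3.0, 7.0, 4001)
best_mu = mus[np.argmax([log_likelihood(m) for m in mus])]

print(round(best_mu, 3))   # ≈ sample mean
print(round(x.mean(), 3))  # the MLE
```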
Q15 (a). A bank wants to predict whether a loan applicant will default (Yes/No). The dataset contains: Age, Income, Loan Amount, Previous Default (Yes/No).
(i) Write the steps to build a classification model using machine learning.
Steps in building the model:
1. Data Collection – Gather applicant data (Age, Income, Loan Amount, Previous Default, and the Default label).
2. Data Preprocessing – Handle missing values, encode categorical variables (e.g., Yes/No → 0/1), normalize features if needed.
3. Feature Selection/Engineering – Use relevant predictors like income-to-loan ratio, past default history.
4. Train-Test Split – Divide dataset into training set (to build model) and test set (to evaluate performance).
5. Model Training – Train a classification algorithm on the training data.
6. Prediction & Evaluation – Use the model on test data and evaluate with metrics like accuracy, precision, recall, F1-score.
(ii) Suggest which algorithm (Logistic Regression / Decision Tree) could be used and why.
- Logistic Regression is simple, interpretable, and works well if the relationship between features and default is mostly linear.
- Decision Tree is better when the dataset has non-linear patterns or mixed categorical + numerical variables, and it provides clear decision rules.
In practice, Decision Tree may be more useful here since financial default often depends on non-linear conditions (e.g., low income + high loan amount + previous default).
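A hedged scikit-learn sketch of these steps (the data, column names, and label rule are all synthetic assumptions for illustration, not a real lending model):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Synthetic applicant data (assumed)
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(21, 65, n),
    "income": rng.normal(50_000, 15_000, n),
    "loan_amount": rng.normal(20_000, 8_000, n),
    "previous_default": rng.integers(0, 2, n),  # Yes/No -> 1/0
})
# Toy label: default more likely with high loan-to-income and a past default
risk = df["loan_amount"] / df["income"] + 0.5 * df["previous_default"]
df["default"] = (risk + rng.normal(0, 0.2, n) > 0.7).astype(int)

# Train-test split, model training, and evaluation
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="default"), df["default"], test_size=0.2, random_state=0)

model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", round(accuracy_score(y_test, pred), 3))
print("F1:", round(f1_score(y_test, pred), 3))
```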
Q15 (b). Discuss any two applications of machine learning in data science.
1. Healthcare
Machine learning is widely used to improve patient care and medical research.
- By analyzing patient records (age, blood pressure, glucose level, medical history, etc.), classification models can predict whether a person is at risk of diseases such as diabetes, heart disease, or cancer.
- This allows early diagnosis and preventive care before the disease becomes severe.
- ML also assists in medical imaging (e.g., detecting tumors in X-ray/MRI scans) and in personalized treatment plans by learning from large datasets of patient outcomes.
- Overall, ML in healthcare leads to better decision-making, reduced costs, and improved patient survival rates.
2. E-commerce
Machine learning powers most modern e-commerce platforms.
- Recommender systems suggest products to users by studying purchase history, browsing behavior, and ratings (e.g., “Customers who bought this also bought…”).
- ML helps in dynamic pricing, where prices are adjusted in real time based on demand, user location, and buying patterns.
- It is also used in fraud detection by identifying unusual transactions, and in customer segmentation, allowing targeted marketing campaigns.
- These applications increase customer satisfaction, sales, and overall business efficiency.
Q16 (a) A university wants to analyze students’ performance data – attendance, assignment scores, exam scores. Describe how the data science process would be applied here.
Step 1: Data Collection
- Gather data from attendance records, assignment submissions, and exam scores for all students.
- Include relevant attributes such as student ID, course, and semester.
Step 2: Data Preprocessing
- Handle missing values (e.g., missing attendance or exam scores).
- Normalize/standardize scores if needed to bring them to a common scale.
- Encode categorical data (e.g., grade levels or pass/fail) if required.
Step 3: Data Exploration & Visualization
- Use summary statistics (mean, variance) to understand trends.
- Plot histograms, boxplots, or scatterplots to detect patterns or outliers.
Step 4: Feature Engineering
- Derive new features like average score, assignment-to-exam ratio, or attendance percentage.
Step 5: Modeling & Analysis
- Apply predictive models (e.g., regression to predict exam scores) or classification models (e.g., pass/fail prediction).
- Use clustering to group students by performance levels.
Step 6: Evaluation
- Evaluate models using metrics like accuracy, RMSE, or F1-score.
- Validate findings with cross-validation or on a test dataset.
Step 7: Interpretation & Decision Making
- Identify factors affecting performance (e.g., low attendance → lower scores).
- Provide insights to improve teaching strategies, student engagement, and interventions for at-risk students.
Summary: The data science process turns raw student data into actionable insights for academic improvement through systematic collection, cleaning, modeling, and interpretation.
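A brief pandas sketch of the early steps (column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical student records
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5],
    "attendance_pct": [95, 60, 80, None, 70],
    "assignment_score": [88, 55, 72, 90, 65],
    "exam_score": [91, 50, 75, 85, 60],
})

# Step 2: preprocessing - fill missing attendance with the column median
df["attendance_pct"] = df["attendance_pct"].fillna(df["attendance_pct"].median())

# Step 3: exploration - summary statistics
print(df[["attendance_pct", "assignment_score", "exam_score"]].describe())

# Step 4: feature engineering - derived average score
df["avg_score"] = df[["assignment_score", "exam_score"]].mean(axis=1)
print(df.head())
```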
Q16 (b). A bank receives the following types of data daily: 500,000 transaction records (numerical), 50,000 customer feedback forms (text), ATM surveillance videos (image/video).
(a) Identify the Big Data characteristics present.
(b) Briefly explain why traditional methods fail and how data science techniques overcome this.
(a) Big Data Characteristics (3Vs)
| Characteristic | Explanation in this case |
|---|---|
| Volume | Huge amounts of data: 500,000 transactions, 50,000 feedback forms, video streams. |
| Velocity | Data arrives continuously, especially transactions and videos, requiring fast processing. |
| Variety | Data is multi-format: numerical (transactions), text (feedback), images/videos (ATM cameras). |
| Veracity (optional) | Data may contain errors/noise, e.g., incomplete forms or unclear video frames. |
| Value (optional) | Extracting insights can improve banking services and fraud detection. |
(b) Why traditional methods fail & how data science helps
- Traditional methods fail:
  - Relational databases and spreadsheets cannot handle massive volume efficiently.
  - Structured-only methods cannot process unstructured data (text, images, video).
  - Real-time analysis of high-velocity data is not feasible with traditional tools.
- Data science techniques overcome this:
  - Big Data frameworks (Hadoop, Spark) handle large-scale storage and processing.
  - Machine learning and NLP analyze text feedback automatically.
  - Computer vision algorithms process surveillance videos for security/fraud detection.
  - Predictive analytics enables real-time decision-making and insights from diverse data types.
Summary: The bank’s daily data exemplifies Big Data (Volume, Velocity, Variety), and data science provides scalable, automated methods to extract actionable insights where traditional methods fail.