ML QP
Machine Learning Lab External
1. a) Create a small dataset with at least 10 records and 4 columns:
· Age (numeric)
· Height (numeric)
· Weight (numeric)
· City (categorical, with possible values: "City A", "City B", "City C")
b) Ensure that some entries in the Age and Weight columns contain missing values.
· Handle missing data by:
o Replacing missing values in the Age column with the average age.
o Replacing missing values in the Weight column with the median weight.
c) Normalize the Age, Height, and Weight columns using min-max scaling to bring all values between 0 and 1.
d) Display the original dataset, the dataset with missing values handled, and the normalized dataset.
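A minimal sketch of parts a) to d), assuming pandas and NumPy are available. The values and the variable names (`original`, `filled`, `normalized`) are made up for illustration; any dataset with the same columns would work.

```python
import numpy as np
import pandas as pd

# Illustrative values only; np.nan marks the missing entries.
original = pd.DataFrame({
    "Age": [25, 32, np.nan, 41, 29, np.nan, 37, 50, 23, 45],
    "Height": [165, 172, 158, 180, 169, 175, 162, 178, 160, 171],
    "Weight": [60, np.nan, 52, 85, 70, 77, np.nan, 82, 55, 68],
    "City": ["City A", "City B", "City C", "City A", "City B",
             "City C", "City A", "City B", "City C", "City A"],
})

# Impute: mean for Age, median for Weight (both skip NaN by default).
filled = original.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())
filled["Weight"] = filled["Weight"].fillna(filled["Weight"].median())

# Min-max scaling: (x - min) / (max - min) maps each column onto [0, 1].
numeric_cols = ["Age", "Height", "Weight"]
col_min = filled[numeric_cols].min()
col_max = filled[numeric_cols].max()
normalized = filled.copy()
normalized[numeric_cols] = (filled[numeric_cols] - col_min) / (col_max - col_min)

print("Original:\n", original)
print("\nMissing values handled:\n", filled)
print("\nNormalized:\n", normalized)
```

The imputation is done before scaling so that the minimum and maximum are computed on complete columns.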
e) Create a small dataset with 10 records and 3 numerical columns:
· Height (numeric)
· Weight (numeric)
· Age (numeric)
f) Draw a box plot for the Height, Weight, and Age columns to visualize their distributions.
· Customize the box plot to:
o Use a different color for each box.
o Add labels to the x-axis and y-axis with appropriate titles.
o Add a title to the plot: "Distribution of Height, Weight, and Age".
g) Display the box plot.
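One possible way to approach parts e) to g), assuming matplotlib and pandas. The data values, colors, and axis labels are illustrative choices, not prescribed by the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only.
data = pd.DataFrame({
    "Height": [165, 172, 158, 180, 169, 175, 162, 178, 160, 171],
    "Weight": [60, 72, 52, 85, 70, 77, 58, 82, 55, 68],
    "Age": [25, 32, 28, 41, 29, 35, 37, 50, 23, 45],
})
columns = list(data.columns)
colors = ["skyblue", "lightgreen", "salmon"]  # one color per box

fig, ax = plt.subplots()
# patch_artist=True draws the boxes as filled patches so they can be colored.
bp = ax.boxplot([data[c] for c in columns], patch_artist=True)
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

# Boxes are drawn at positions 1..n by default; label those ticks.
ax.set_xticks(list(range(1, len(columns) + 1)))
ax.set_xticklabels(columns)
ax.set_xlabel("Variable")
ax.set_ylabel("Value")
ax.set_title("Distribution of Height, Weight, and Age")
plt.show()
```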
2. Write a Python script that performs the following tasks:
a) Create a small dataset with 15 records and 2 numerical columns:
· Age (numeric)
· Income (numeric)
b) Draw a scatter plot to visualize the relationship between Age and Income.
c) Draw a histogram to visualize the distribution of Income.
d) Customize the plots by:
· Adding appropriate titles to the scatter plot and histogram.
· Labeling the x-axis and y-axis in both plots.
· Using different colors for the scatter plot and histogram for better visualization.
e) Display both plots.
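A short sketch of parts a) to e), assuming matplotlib and pandas. The values are invented, and treating Income as "thousands" is an assumption made only to label the axes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative values only: 15 records.
data = pd.DataFrame({
    "Age": [22, 25, 28, 30, 33, 35, 38, 40, 43, 45, 48, 50, 53, 55, 58],
    "Income": [28, 32, 35, 41, 45, 48, 55, 53, 60, 66, 64, 72, 70, 75, 78],
})

# Two axes side by side: scatter on the left, histogram on the right.
fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

ax_scatter.scatter(data["Age"], data["Income"], color="tab:blue")
ax_scatter.set_title("Age vs. Income")
ax_scatter.set_xlabel("Age (years)")
ax_scatter.set_ylabel("Income (thousands)")

counts, bin_edges, _ = ax_hist.hist(
    data["Income"], bins=5, color="tab:orange", edgecolor="black"
)
ax_hist.set_title("Distribution of Income")
ax_hist.set_xlabel("Income (thousands)")
ax_hist.set_ylabel("Number of records")

fig.tight_layout()
plt.show()
```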
f) Create a small dataset with 10 records and 3 numerical columns:
· Height (numeric)
· Weight (numeric)
· Age (numeric)
g) Apply scaling to the dataset using:
· Standardization (Z-score scaling) for each numerical column.
· Min-Max Scaling to scale each column between 0 and 1.
h) Display the original dataset, the standardized dataset, and the min-max scaled dataset.
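Both scalings in parts f) to h) can be written directly from their formulas. The sketch below assumes pandas and uses made-up values; `standardized` and `min_max` are illustrative names.

```python
import pandas as pd

# Illustrative values only.
data = pd.DataFrame({
    "Height": [165, 172, 158, 180, 169, 175, 162, 178, 160, 171],
    "Weight": [60, 72, 52, 85, 70, 77, 58, 82, 55, 68],
    "Age": [25, 32, 28, 41, 29, 35, 37, 50, 23, 45],
})

# Z-score: z = (x - mean) / std, giving each column mean 0 and std 1.
# ddof=0 uses the population standard deviation, the same convention
# as scikit-learn's StandardScaler.
standardized = (data - data.mean()) / data.std(ddof=0)

# Min-max: (x - min) / (max - min), giving each column the range [0, 1].
min_max = (data - data.min()) / (data.max() - data.min())

print("Original:\n", data)
print("\nStandardized:\n", standardized.round(3))
print("\nMin-max scaled:\n", min_max.round(3))
```

Standardization keeps the shape of each distribution but is unbounded; min-max scaling fixes the range but is sensitive to a single extreme value.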
3. Write a Python script that performs the following tasks:
a) Create a small dataset with 10 records and 4 numerical columns:
· Feature1 (numeric)
· Feature2 (numeric)
· Feature3 (numeric)
· Feature4 (numeric)
b) Apply PCA (Principal Component Analysis) to reduce the dataset to 2 principal components.
c) Display the original dataset, the transformed dataset with 2 principal components, and the explained variance ratio for the principal components.
d) Comment your code to explain each step in the PCA process, including the importance of PCA in reducing dimensionality.
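A commented sketch of the PCA steps, assuming scikit-learn and pandas. The feature values are invented for illustration.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Illustrative values only: 10 records, 4 correlated features.
data = pd.DataFrame({
    "Feature1": [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1],
    "Feature2": [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9],
    "Feature3": [1.2, 0.3, 1.1, 0.9, 1.6, 1.2, 1.0, 0.6, 0.8, 0.5],
    "Feature4": [0.5, 1.9, 0.6, 0.8, 0.2, 0.5, 0.9, 1.6, 1.2, 1.7],
})

# Step 1: PCA centers the data (subtracts each column's mean). It does not
# rescale, so features with larger variance dominate; standardizing first
# is a common extra step when units differ.
# Step 2: it finds orthogonal directions of maximum variance.
# Step 3: it projects the data onto the first n_components directions.
pca = PCA(n_components=2)
transformed = pca.fit_transform(data)

print("Original:\n", data)
print("\nTransformed (2 components):\n", transformed.round(3))
# Fraction of total variance captured by each kept component.
print("\nExplained variance ratio:", pca.explained_variance_ratio_.round(3))
```

Reducing 4 columns to 2 is useful when the kept components retain most of the variance: later models get fewer, uncorrelated inputs, and the data can be plotted in 2D.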
4. a) Create a small dataset with 5 records and 3 numerical columns:
- Feature1 (numeric)
- Feature2 (numeric)
- Feature3 (numeric)
b) Apply Singular Value Decomposition (SVD) to decompose the dataset matrix into three matrices: U, S, and V^T.
c) Reconstruct the original matrix using the decomposed matrices and display the result.
d) Explain the meaning of each matrix in the SVD decomposition and how they are used in dimensionality reduction.
e) Apply SVD to an image.
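The sketch below covers parts a) to e) with NumPy. In the decomposition A = U S V^T, the columns of U are the left singular vectors (one row per record), S holds the singular values in descending order, and the rows of V^T are the right singular vectors (directions in feature space). Keeping only the first k singular values gives a rank-k approximation, which is the basis for dimensionality reduction and image compression. The matrix values are invented, `low_rank` is a hypothetical helper, and the "image" is a synthetic 64x64 array standing in for a real grayscale image file.

```python
import numpy as np

# Illustrative 5x3 dataset.
A = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 10.0],
    [2.0, 1.0, 0.0],
    [3.0, 5.0, 8.0],
])

# Compact SVD: U is 5x3, S is a 1D array of 3 singular values, Vt is 3x3.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction: A = U @ diag(S) @ Vt (up to floating-point error).
reconstructed = U @ np.diag(S) @ Vt
print("Singular values:", S.round(3))
print("Reconstructed:\n", reconstructed.round(3))


def low_rank(matrix, k):
    """Return the rank-k approximation built from the k largest singular values."""
    U_k, S_k, Vt_k = np.linalg.svd(matrix, full_matrices=False)
    return U_k[:, :k] @ np.diag(S_k[:k]) @ Vt_k[:k, :]


# Synthetic stand-in for a grayscale image: a smooth gradient plus noise.
rng = np.random.default_rng(0)
image = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
image = image + 0.05 * rng.random((64, 64))

compressed = low_rank(image, k=5)
rel_error = np.linalg.norm(image - compressed) / np.linalg.norm(image)
print(f"Relative error of the rank-5 image: {rel_error:.4f}")
```

For a real image, the array could be loaded with an image library and passed to `low_rank` one channel at a time.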
5. Write a Python script that performs the following tasks:
a) Create a small dataset with 10 records, 2 features, and a target label. The dataset should contain:
- Feature1 (numeric)
- Feature2 (numeric)
- Class Label (categorical), with values 0 and 1 to represent two classes
b) Apply Linear Discriminant Analysis (LDA):
· Use LDA to reduce the dataset to a single linear discriminant that best separates the two classes.
c) Display the original dataset, the transformed dataset after LDA, and the coefficients for each feature in the linear discriminant.
d) Explain each step in the LDA process, including how LDA works for dimensionality reduction and its importance in classification.
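A commented sketch of the LDA task, assuming scikit-learn and pandas. The data is invented, and reading "coefficients" as the fitted model's `coef_` attribute is one interpretation of part c).

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative values only: two well-separated classes.
data = pd.DataFrame({
    "Feature1": [1.0, 1.5, 2.0, 2.2, 2.8, 5.0, 5.5, 6.1, 6.4, 7.0],
    "Feature2": [2.1, 1.8, 2.9, 2.4, 3.1, 5.9, 6.3, 5.6, 6.8, 7.2],
    "Class": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})
X = data[["Feature1", "Feature2"]]
y = data["Class"]

# LDA uses the class labels: it looks for the direction that maximizes the
# distance between class means relative to the spread within each class.
# With 2 classes there is at most (classes - 1) = 1 discriminant.
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)

print("Original:\n", data)
print("\nProjected onto the discriminant:\n", X_lda.ravel().round(3))
# One weight per feature in the linear decision function.
print("\nCoefficients:", lda.coef_.round(3))
```

Unlike PCA, which ignores labels, LDA picks the projection that keeps the classes apart, so the single output column can be fed straight into a classifier.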
6. a) Create a small dataset with 10 records, representing a relationship between two variables:
- X (independent variable): Represents study hours.
- Y (dependent variable): Represents scores.
b) Fit a Linear Regression model to predict Y based on X.
c) Display the following outputs:
- The coefficients (slope and intercept) of the linear regression model.
- Predictions made by the model for each value in X.
- A scatter plot of the data points with the regression line.
d) Explain each step of the linear regression process, including how the model is fitted to the data and how predictions are made.
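A minimal sketch of parts a) to d), assuming scikit-learn, NumPy, and matplotlib. The hours and scores are invented.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative values only.
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
scores = np.array([35, 42, 50, 55, 61, 68, 74, 80, 86, 93])
X = hours.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

# Fitting picks the slope and intercept that minimize the sum of
# squared differences between actual and predicted scores.
model = LinearRegression().fit(X, scores)
slope, intercept = model.coef_[0], model.intercept_

# Each prediction is slope * x + intercept.
predictions = model.predict(X)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("predictions:", np.round(predictions, 2))

fig, ax = plt.subplots()
ax.scatter(hours, scores, color="tab:blue", label="Data")
ax.plot(hours, predictions, color="tab:red", label="Regression line")
ax.set_xlabel("Study hours")
ax.set_ylabel("Score")
ax.set_title("Linear regression: score vs. study hours")
ax.legend()
plt.show()
```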
e) Create a small dataset with 10 records that represents a relationship between three variables:
- X1 (independent variable): Represents advertising spending (in thousands).
- X2 (independent variable): Represents number of salespersons.
- Y (dependent variable): Represents sales (in thousands).
f) Fit a Regularized Regression (Ridge) model to predict Y based on X1 and X2.
g) Display the following outputs:
- The coefficients for each feature in the Ridge regression model.
- The predictions made by the model for each record in the dataset.
- A visualization comparing the actual and predicted values.
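A sketch of the Ridge task, assuming scikit-learn, NumPy, and matplotlib. The data is invented, and `alpha=1.0` is an arbitrary regularization strength chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative values only.
ad_spend = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]   # X1, thousands
salespeople = [2, 3, 3, 4, 5, 5, 6, 7, 7, 8]          # X2
sales = [50, 68, 80, 98, 118, 130, 148, 168, 178, 198]  # Y, thousands

X = np.column_stack([ad_spend, salespeople])
y = np.array(sales)

# Ridge minimizes squared error plus alpha * (sum of squared coefficients),
# which shrinks the coefficients and stabilizes them when features correlate.
model = Ridge(alpha=1.0).fit(X, y)
predictions = model.predict(X)

print("Coefficients (X1, X2):", model.coef_.round(3))
print("Intercept:", round(float(model.intercept_), 3))
print("Predictions:", predictions.round(2))

# Compare actual and predicted sales record by record.
records = np.arange(1, len(y) + 1)
fig, ax = plt.subplots()
ax.plot(records, y, "o-", color="tab:blue", label="Actual")
ax.plot(records, predictions, "s--", color="tab:red", label="Predicted")
ax.set_xlabel("Record")
ax.set_ylabel("Sales (thousands)")
ax.set_title("Ridge regression: actual vs. predicted sales")
ax.legend()
plt.show()
```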
7. Write a Python script that performs the following tasks:
a) Create a small dataset with 10 records that represents a relationship between two variables:
o X (independent variable): Represents the years of experience.
o Y (dependent variable): Represents the salary (in thousands).
b) Fit a Polynomial Regression model to predict Y based on X with a degree of 2.
c) Display the following outputs:
o The transformed polynomial features.
o The coefficients of the polynomial regression model.
o The predictions made by the model for each value in X.
o A scatter plot of the data points with the polynomial regression curve.
d) Explain each step of the polynomial regression process, including how the data is transformed and how predictions are made using the polynomial terms.
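A sketch of the polynomial regression steps, assuming scikit-learn, NumPy, and matplotlib. The salaries are invented and deliberately follow an exact quadratic (0.5x^2 + 1.5x + 28) so the recovered coefficients are easy to check by hand.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative values only.
years = np.arange(1, 11)
salary = np.array([30, 33, 37, 42, 48, 55, 63, 72, 82, 93])
X = years.reshape(-1, 1)

# Step 1: expand x into the columns [x, x^2]. The bias column is left out
# because LinearRegression fits its own intercept.
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

# Step 2: fit an ordinary linear model on the expanded columns.
model = LinearRegression().fit(X_poly, salary)

# Step 3: predict with intercept + c1 * x + c2 * x^2.
predictions = model.predict(X_poly)

print("Polynomial features:\n", X_poly)
print("Coefficients (x, x^2):", model.coef_.round(3))
print("Intercept:", round(float(model.intercept_), 3))
print("Predictions:", predictions.round(2))

# Evaluate the curve on a dense grid so it looks smooth.
x_grid = np.linspace(1, 10, 100).reshape(-1, 1)
fig, ax = plt.subplots()
ax.scatter(years, salary, color="tab:blue", label="Data")
ax.plot(x_grid, model.predict(poly.transform(x_grid)),
        color="tab:red", label="Degree-2 fit")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary (thousands)")
ax.set_title("Polynomial regression (degree 2)")
ax.legend()
plt.show()
```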
e) Create a small dataset with 10 records that represents the likelihood of passing an exam based on study hours:
o Hours (independent variable): Represents the number of hours a student studied.
o Pass (dependent variable): A binary variable where 1 indicates the student passed the exam and 0 indicates the student did not pass.
f) Fit a Logistic Regression model to predict whether a student passes or fails the exam based on study hours.
g) Display the following outputs:
o The coefficients and intercept of the logistic regression model.
o The probability predictions and final predictions (0 or 1) made by the model for each record in the dataset.
o A scatter plot showing study hours against the probability of passing, with a decision boundary.
h) Explain each step of the logistic regression process, including how the logistic function is used to make predictions and how to interpret the coefficients.
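A sketch of the logistic regression task, assuming scikit-learn, NumPy, and matplotlib. The data is invented, and the model uses scikit-learn's default settings (which include L2 regularization).

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative values only.
hours = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0, 5.5])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])
X = hours.reshape(-1, 1)

model = LogisticRegression().fit(X, passed)
w, b = model.coef_[0, 0], model.intercept_[0]

# The logistic (sigmoid) function turns the linear score w*x + b into a
# probability between 0 and 1.
probabilities = model.predict_proba(X)[:, 1]
manual = 1 / (1 + np.exp(-(w * hours + b)))  # same values, computed by hand
predictions = model.predict(X)               # 1 where probability > 0.5

# The decision boundary is where the probability equals 0.5: w*x + b = 0.
boundary = -b / w

# Each extra hour multiplies the odds of passing by exp(w).
print(f"coefficient={w:.3f}, intercept={b:.3f}, odds ratio={np.exp(w):.3f}")
print("probabilities:", probabilities.round(3))
print("predictions:", predictions)

x_grid = np.linspace(0, 6, 200)
fig, ax = plt.subplots()
ax.scatter(hours, probabilities, c=passed, cmap="coolwarm", edgecolors="k")
ax.plot(x_grid, 1 / (1 + np.exp(-(w * x_grid + b))), color="gray")
ax.axvline(boundary, linestyle="--", color="k", label="Decision boundary")
ax.set_xlabel("Study hours")
ax.set_ylabel("Probability of passing")
ax.set_title("Logistic regression: probability of passing")
ax.legend()
plt.show()
```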
8. Write a Python script that performs the following tasks:
a) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
b) Fit a Support Vector Machine (SVM) classifier to predict the class label based on X1 and X2.
c) Display the following outputs:
o The support vectors identified by the SVM.
o The decision boundary plot along with the data points, showing how the SVM separates the classes.
d) Explain each step of the SVM process, including the significance of support vectors and how SVM creates the decision boundary.
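A sketch of the SVM task, assuming scikit-learn, NumPy, and matplotlib. The points are invented, and a linear kernel with `C=1.0` is an illustrative choice; the question does not name a kernel.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import SVC

# Illustrative values only: two separated clusters.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [1.5, 1.5],
              [6, 5], [7, 7], [6, 7], [7, 5], [6.5, 6]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# A linear SVM looks for the line with the widest margin between classes.
model = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors are the training points on or inside the margin.
# They alone determine where the boundary lies.
support_vectors = model.support_vectors_
print("Support vectors:\n", support_vectors)

# Evaluate the model on a grid to draw the regions and the margin.
xx, yy = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 8, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
regions = model.predict(grid).reshape(xx.shape)
scores = model.decision_function(grid).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")
# Level 0 is the decision boundary; levels -1 and +1 are the margin edges.
ax.contour(xx, yy, scores, levels=[-1, 0, 1], colors="k",
           linestyles=["--", "-", "--"])
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
ax.scatter(support_vectors[:, 0], support_vectors[:, 1], s=150,
           facecolors="none", edgecolors="k", label="Support vectors")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_title("SVM decision boundary")
ax.legend()
plt.show()
```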
e) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
f) Fit a K-Nearest Neighbors (KNN) classifier with k=3 to predict the class label based on X1 and X2.
g) Display the following outputs:
o The predictions made by the model for each point in the dataset.
o A plot showing the data points colored by their class, with a decision boundary illustrating the KNN classification.
h) Explain each step of the KNN process, including how the KNN algorithm classifies a new data point based on the majority vote of its nearest neighbors.
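A sketch of the KNN task with k=3, assuming scikit-learn, NumPy, and matplotlib. The points and the query point `new_point` are invented for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Illustrative values only: two separated clusters.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [1.5, 1.5],
              [6, 5], [7, 7], [6, 7], [7, 5], [6.5, 6]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# KNN has no real training step: fit() just stores the points.
model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predictions = model.predict(X)
print("Predictions:", predictions)

# Classifying a new point: find its 3 nearest stored points (Euclidean
# distance by default) and take the majority class among them.
new_point = np.array([[3.0, 3.0]])
distances, indices = model.kneighbors(new_point)
votes = y[indices[0]]
print("Neighbor labels:", votes, "-> predicted:", model.predict(new_point)[0])

# Predict over a grid to show the regions assigned to each class.
xx, yy = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 8, 200))
regions = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_title("KNN (k=3) decision boundary")
plt.show()
```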
9. Write a Python script that performs the following tasks:
i) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
j) Fit a Random Forest (RF) classifier to predict the class label based on X1 and X2.
k) Display the following outputs:
o The decision boundary plot along with the data points, showing how the RF separates the classes.
l) Explain each step of the RF process.
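A sketch of the Random Forest task, assuming scikit-learn, NumPy, and matplotlib. The points are invented; `n_estimators=50` and `random_state=0` are arbitrary illustrative settings.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative values only: two separated clusters.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [1.5, 1.5],
              [6, 5], [7, 7], [6, 7], [7, 5], [6.5, 6]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Step 1: draw a bootstrap sample (sampling with replacement) per tree.
# Step 2: grow a decision tree on each sample, considering a random
#         subset of features at each split.
# Step 3: combine the trees; the forest predicts the class with the
#         highest average predicted probability.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
accuracy = model.score(X, y)
print("Training accuracy:", accuracy)

# Predict over a grid; tree splits are axis-aligned, so the regions
# have rectangular edges.
xx, yy = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 8, 200))
regions = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_title("Random Forest decision boundary")
plt.show()
```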
m) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
n) Fit a K-Nearest Neighbors (KNN) classifier (with label encoding and scaling) with k=3 to predict the class label based on X1 and X2.
o) Display the following outputs:
o The predictions made by the model for each point in the dataset.
o A plot showing the data points colored by their class, with a decision boundary illustrating the KNN classification.
p) Explain each step of the KNN process, including how the KNN algorithm classifies a new data point based on the majority vote of its nearest neighbors.
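A sketch of KNN with label encoding and scaling, assuming scikit-learn, NumPy, and matplotlib. Two things here are assumptions made to motivate the preprocessing: the labels are stored as the strings "Class 0" and "Class 1", and X2 is on a much larger scale than X1.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Illustrative values only; X2 is roughly 100 times larger than X1.
X = np.array([[1, 200], [2, 100], [2, 300], [3, 200], [1.5, 150],
              [6, 500], [7, 700], [6, 700], [7, 500], [6.5, 600]])
labels = np.array(["Class 0"] * 5 + ["Class 1"] * 5)

# Label encoding: map the string labels to integers (sorted order,
# so "Class 0" -> 0 and "Class 1" -> 1).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Scaling matters for KNN because it relies on distances: without it,
# X2 would dominate. The pipeline applies the same scaling at predict time.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)

predictions = model.predict(X)
predicted_labels = encoder.inverse_transform(predictions)
print("Predictions:", predicted_labels)

# The grid is in original units; the pipeline scales it internally.
xx, yy = np.meshgrid(np.linspace(0, 8, 200), np.linspace(0, 800, 200))
regions = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_title("KNN (k=3) with scaling")
plt.show()
```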
10. Write a Python script that performs the following tasks:
a) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
b) Fit an AdaBoost classifier to predict the class label based on X1 and X2.
c) Display the following outputs:
o The accuracy of the model on the dataset.
o The importance of each feature as determined by the AdaBoost model.
d) Visualize the results:
o Plot the data points colored by their class.
o Show the decision boundary created by the AdaBoost model.
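A sketch of the AdaBoost task, assuming scikit-learn, NumPy, and matplotlib. The points are invented and arranged so that no single axis-aligned split separates the classes, which gives boosting something to do. The default base learner (a depth-1 decision tree) is used; `n_estimators=50` and `random_state=0` are illustrative settings.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# Illustrative values only: classes split roughly along a diagonal.
X = np.array([[1, 1], [2, 2], [1, 3], [3, 1], [2, 1],
              [3, 3], [4, 2], [2, 4], [4, 4], [3, 4]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# AdaBoost fits weak learners one after another. After each round it
# raises the weight of misclassified points so the next learner focuses
# on them, then combines all learners with a weighted vote.
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

accuracy = model.score(X, y)  # accuracy on the training data itself
importances = model.feature_importances_
print("Accuracy:", accuracy)
print("Feature importances (X1, X2):", importances.round(3))

# Predict over a grid to show the combined decision regions.
xx, yy = np.meshgrid(np.linspace(0, 5, 200), np.linspace(0, 5, 200))
regions = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
ax.contourf(xx, yy, regions, alpha=0.3, cmap="coolwarm")
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k")
ax.set_xlabel("X1")
ax.set_ylabel("X2")
ax.set_title("AdaBoost decision boundary")
plt.show()
```

Accuracy measured on the training points is optimistic; with more data, a held-out test split would give a fairer estimate.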
e) Create a small dataset with 10 records that represents two classes of points in a 2D space:
o X1 (feature 1): Represents the x-coordinate of the point.
o X2 (feature 2): Represents the y-coordinate of the point.
o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.
f) Fit an XGBoost classifier to predict the class label based on X1 and X2.
g) Display the following outputs:
o The accuracy of the model on the dataset.
o The importance of each feature as determined by the XGBoost model.
h) Visualize the results:
o Plot the data points colored by their class.
o Show the decision boundary created by the XGBoost model.