ML QP

Machine Learning Lab External

1. a) Create a small dataset with at least 10 records and 4 columns:

· Age (numeric)

· Height (numeric)

· Weight (numeric)

· City (categorical, with possible values: "City A", "City B", "City C")

b) Ensure that some entries in the Age and Weight columns contain missing values.

· Handle missing data by:

o Replacing missing values in the Age column with the average age.

o Replacing missing values in the Weight column with the median weight.

c) Normalize the Age, Height, and Weight columns using min-max scaling to bring all values between 0 and 1.

d) Display the original dataset, the dataset with missing values handled, and the normalized dataset.

e) Create a small dataset with 10 records and 3 numerical columns:

· Height (numeric)

· Weight (numeric)

· Age (numeric)

f) Draw a box plot for the Height, Weight, and Age columns to visualize their distributions.

· Customize the box plot to:

· Use different colors for each box plot.

· Add labels to the x-axis and y-axis with appropriate titles.

· Add a title to the plot as "Distribution of Height, Weight, and Age".

g) Display the box plot.
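Parts (b) and (c) of this question can be sketched as follows. This is a minimal illustration using NumPy only, not a model answer: the Age and Weight values are hypothetical, and the helper name `min_max` is introduced here for illustration.

```python
import numpy as np

# Hypothetical values; np.nan marks the missing entries required by part (b).
age = np.array([25, 32, np.nan, 41, 29, 35, np.nan, 50, 23, 38], dtype=float)
weight = np.array([60, np.nan, 72, 80, 55, np.nan, 68, 90, 58, 75], dtype=float)

# Mean imputation for Age, median imputation for Weight.
# nanmean / nanmedian ignore the missing entries when computing the statistic.
age_filled = np.where(np.isnan(age), np.nanmean(age), age)
weight_filled = np.where(np.isnan(weight), np.nanmedian(weight), weight)

def min_max(x):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1."""
    return (x - x.min()) / (x.max() - x.min())

age_scaled = min_max(age_filled)
weight_scaled = min_max(weight_filled)

print("Age after imputation:   ", age_filled)
print("Weight after imputation:", weight_filled)
print("Age scaled:   ", age_scaled.round(3))
print("Weight scaled:", weight_scaled.round(3))
```

Imputation is done before scaling so that the filled values take part in the minimum and maximum used for normalization.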

2. Write a Python script that performs the following tasks:

a) Create a small dataset with 15 records and 2 numerical columns:

· Age (numeric)

· Income (numeric)

b) Draw a scatter plot to visualize the relationship between Age and Income.

c) Draw a histogram to visualize the distribution of Income.

d) Customize the plots by:

· Adding appropriate titles to the scatter plot and histogram.

· Labeling the x-axis and y-axis in both plots.

· Using different colors for the scatter plot and histogram for better visualization.

e) Display both plots.

f) Create a small dataset with 10 records and 3 numerical columns:

· Height (numeric)

· Weight (numeric)

· Age (numeric)

g) Apply scaling to the dataset using:

· Standardization (Z-score scaling) for each numerical column.

· Min-Max Scaling to scale each column between 0 and 1.

h) Display the original dataset, the standardized dataset, and the min-max scaled dataset.
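The two scalings in part (g) can be sketched column-wise with NumPy. The records are hypothetical, and the standard deviation used is the population one (`ddof=0`), which is an assumption about the intended convention.

```python
import numpy as np

# Hypothetical records: columns are Height (cm), Weight (kg), Age (years).
data = np.array([
    [150, 50, 21], [160, 58, 25], [165, 63, 30], [170, 68, 35], [172, 72, 28],
    [175, 75, 40], [178, 80, 45], [180, 85, 33], [185, 90, 50], [190, 95, 38],
], dtype=float)

# Standardization: subtract each column's mean and divide by its standard
# deviation, so every column ends up with mean 0 and standard deviation 1.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Min-max scaling: map each column's minimum to 0 and its maximum to 1.
col_min, col_max = data.min(axis=0), data.max(axis=0)
min_max_scaled = (data - col_min) / (col_max - col_min)

print("Original:\n", data)
print("Standardized:\n", standardized.round(3))
print("Min-max scaled:\n", min_max_scaled.round(3))
```

Standardized values are unbounded and centred on zero, while min-max values are confined to the interval from 0 to 1.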

3. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records and 4 numerical columns:

· Feature1 (numeric)

· Feature2 (numeric)

· Feature3 (numeric)

· Feature4 (numeric)

b) Apply PCA (Principal Component Analysis) to reduce the dataset to 2 principal components.

c) Display the original dataset, the transformed dataset with 2 principal components, and the explained variance ratio for the principal components.

d) Comment your code to explain each step in the PCA process, including the importance of PCA in reducing dimensionality.
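The PCA steps asked for above can be sketched from first principles with NumPy, using an eigen-decomposition of the covariance matrix. The data are randomly generated placeholders (with Feature3 deliberately correlated with Feature1), and a library such as scikit-learn could be used instead; this version only makes each step visible.

```python
import numpy as np

# Hypothetical 10 x 4 dataset; Feature3 is made to depend on Feature1 so that
# there is redundancy for PCA to remove.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
X[:, 2] = X[:, 0] + 0.1 * X[:, 2]

# Step 1: centre each feature so the covariance is measured about the mean.
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix of the features (4 x 4).
cov = np.cov(X_centered, rowvar=False)

# Step 3: eigen-decomposition. eigh returns eigenvalues in ascending order,
# so reorder them from largest to smallest variance.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: keep the two directions with the most variance and project onto them.
components = eigvecs[:, :2]
X_reduced = X_centered @ components

# Step 5: share of the total variance captured by each kept component.
explained_variance_ratio = eigvals[:2] / eigvals.sum()

print("Transformed data:\n", X_reduced.round(3))
print("Explained variance ratio:", explained_variance_ratio.round(3))
```

The sign of each principal component is arbitrary, so the transformed columns may come out negated relative to another implementation.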

4. a) Create a small dataset with 5 records and 3 numerical columns:

Feature1 (numeric)
Feature2 (numeric)
Feature3 (numeric)

b) Apply Singular Value Decomposition (SVD) to decompose the dataset matrix into three matrices: U, S, and V^T.

c) Reconstruct the original matrix using the decomposed matrices and display the result.

d) Explain the meaning of each matrix in the SVD decomposition and how they are used in dimensionality reduction.

e) Apply SVD to an image.
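A minimal sketch of the decomposition, the reconstruction, and a low-rank approximation follows. The 5 x 3 matrix is hypothetical, the helper `low_rank` is introduced here for illustration, and a small synthetic array stands in for a grayscale image because no image file is specified.

```python
import numpy as np

# Hypothetical 5 x 3 dataset (Feature1, Feature2, Feature3).
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10], [2, 0, 1], [3, 1, 4]], dtype=float)

# Reduced SVD: U is 5 x 3, S holds the 3 singular values, Vt is 3 x 3.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Reconstruction: A = U * diag(S) * V^T.
A_rebuilt = U @ np.diag(S) @ Vt
print("Singular values:", S.round(3))
print("Reconstruction matches original:", np.allclose(A, A_rebuilt))

def low_rank(M, k):
    """Best rank-k approximation of M, keeping only the k largest singular values."""
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# Synthetic stand-in for a grayscale image: a smooth gradient plus a bright
# square. By construction it has rank 2.
ramp = np.linspace(0, 1, 32)
image = np.outer(ramp, ramp)
image[8:16, 20:28] += 1.0

for k in (1, 2):
    error = np.linalg.norm(image - low_rank(image, k))
    print(f"rank {k}: reconstruction error {error:.6f}")
```

For a real image, the same `low_rank` call could be applied to each channel after loading the pixels into a 2-D array; storing only the first k columns of U, k singular values, and k rows of V^T is what makes this a form of compression.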

5. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records, 2 features, and a target label. The dataset should contain:

Feature1 (numeric)
Feature2 (numeric)
Class Label (categorical), with values 0 and 1 to represent two classes

b) Apply Linear Discriminant Analysis (LDA):

· Use LDA to reduce the dataset to a single linear discriminant that best separates the two classes.

c) Display the original dataset, the transformed dataset after LDA, and the coefficients for each feature in the linear discriminant.

d) Explain each step in the LDA process, including how LDA works for dimensionality reduction and its importance in classification.
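For two classes, the single linear discriminant can be sketched with Fisher's formula, w proportional to S_w^{-1}(m1 - m0), where S_w is the within-class scatter matrix and m0, m1 are the class means. The data below are hypothetical, and the scale of `w` is arbitrary, so the coefficients may differ by a constant factor from those reported by a library implementation.

```python
import numpy as np

# Hypothetical dataset: two features and a binary class label.
X = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 1.5],
              [4.0, 4.5], [4.5, 4.0], [5.0, 5.2], [4.2, 5.0], [4.8, 4.4]])
y = np.array([0] * 5 + [1] * 5)

# Step 1: class means.
X0, X1 = X[y == 0], X[y == 1]
m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

# Step 2: within-class scatter, i.e. the spread of each class about its own mean.
S_w = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# Step 3: discriminant direction. It maximizes the distance between projected
# class means relative to the projected within-class spread.
w = np.linalg.solve(S_w, m1 - m0)

# Step 4: project the 2-D points onto the 1-D discriminant axis.
projected = X @ w

print("Coefficients per feature:", w.round(3))
print("Projected class 0:", projected[y == 0].round(2))
print("Projected class 1:", projected[y == 1].round(2))
```

Unlike PCA, this projection uses the class labels, which is why the reduced feature is directly useful for classification.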

6. a) Create a small dataset with 10 records, representing a relationship between two variables:

X (independent variable): Represents study hours.
Y (dependent variable): Represents scores.

b) Fit a Linear Regression model to predict Y based on X.

c) Display the following outputs:

The coefficients (slope and intercept) of the linear regression model.
Predictions made by the model for each value in X.
A scatter plot of the data points with the regression line.

d) Explain each step of the linear regression process, including how the model is fitted to the data and how predictions are made.

e) Create a small dataset with 10 records that represents a relationship between three variables:

X1 (independent variable): Represents advertising spending (in thousands).
X2 (independent variable): Represents number of salespersons.
Y (dependent variable): Represents sales (in thousands).

f) Fit a regularized (Ridge) regression model to predict Y based on X1 and X2.

g) Display the following outputs:

The coefficients for each feature in the Ridge regression model.
The predictions made by the model for each record in the dataset.
A visualization comparing the actual and predicted values.
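The two fits in this question can be sketched with NumPy: ordinary least squares for the study-hours data and the closed-form Ridge solution for the sales data. All values are hypothetical, the helper `ridge_fit` is introduced here for illustration, and the penalty strength `alpha=1.0` is an arbitrary choice.

```python
import numpy as np

# Simple linear regression: hypothetical study hours and scores.
hours = np.arange(1, 11, dtype=float)
scores = np.array([35, 42, 50, 53, 61, 68, 72, 80, 85, 93], dtype=float)

# polyfit with deg=1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(hours, scores, deg=1)
predicted = slope * hours + intercept
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

# Ridge regression: hypothetical advertising spend, salespersons, and sales.
ad_spend = np.array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55], dtype=float)
salespersons = np.array([2, 3, 3, 4, 5, 5, 6, 6, 7, 8], dtype=float)
sales = np.array([25, 34, 41, 50, 61, 66, 77, 82, 93, 104], dtype=float)
X = np.column_stack([ad_spend, salespersons])

def ridge_fit(X, y, alpha):
    """Closed-form Ridge fit; data are centred so the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Solve (Xc^T Xc + alpha * I) coef = Xc^T yc.
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

coef, ridge_intercept = ridge_fit(X, sales, alpha=1.0)
sales_predicted = X @ coef + ridge_intercept
print("Ridge coefficients:", coef.round(3), "intercept:", round(ridge_intercept, 3))
```

Setting `alpha` to 0 recovers ordinary least squares; larger values shrink the coefficients, which helps when the two predictors are strongly correlated, as they are here.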

7. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records that represents a relationship between two variables:

o X (independent variable): Represents the years of experience.

o Y (dependent variable): Represents the salary (in thousands).

b) Fit a Polynomial Regression model to predict Y based on X with a degree of 2.

c) Display the following outputs:

o The transformed polynomial features.

o The coefficients of the polynomial regression model.

o The predictions made by the model for each value in X.

o A scatter plot of the data points with the polynomial regression curve.

d) Explain each step of the polynomial regression process, including how the data is transformed and how predictions are made using the polynomial terms.

e) Create a small dataset with 10 records that represents the likelihood of passing an exam based on study hours:

o Hours (independent variable): Represents the number of hours a student studied.

o Pass (dependent variable): A binary variable where 1 indicates the student passed the exam and 0 indicates the student did not pass.

f) Fit a Logistic Regression model to predict whether a student passes or fails the exam based on study hours.

g) Display the following outputs:

o The coefficients and intercept of the logistic regression model.

o The probability predictions and final predictions (0 or 1) made by the model for each record in the dataset.

o A scatter plot showing study hours against the probability of passing, with a decision boundary.

h) Explain each step of the logistic regression process, including how the logistic function is used to make predictions and interpret the coefficients.
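Both models in this question can be sketched with NumPy. The salary values are hypothetical and were chosen to follow 28 + 1.5x + 0.5x^2 exactly so the fitted coefficients are easy to check; the pass/fail labels are also hypothetical. The logistic model is fitted by plain gradient descent with an assumed learning rate of 0.1 on mean-centred hours, rather than by a library solver.

```python
import numpy as np

# Polynomial regression (degree 2): years of experience vs salary in thousands.
years = np.arange(1, 11, dtype=float)
salary = np.array([30, 33, 37, 42, 48, 55, 63, 72, 82, 93], dtype=float)

# Transformed features: a column of ones, x, and x squared.
poly_features = np.column_stack([np.ones_like(years), years, years ** 2])
poly_coef, *_ = np.linalg.lstsq(poly_features, salary, rcond=None)
salary_predicted = poly_features @ poly_coef
print("Polynomial coefficients (1, x, x^2):", poly_coef.round(3))

# Logistic regression: study hours vs pass (1) or fail (0).
hours = np.arange(1, 11, dtype=float)
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
hours_centered = hours - hours.mean()  # centring speeds up gradient descent

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.0, 0.0
for _ in range(5000):
    p = sigmoid(w * hours_centered + b)
    # Gradient of the mean log-loss with respect to w and b.
    w -= 0.1 * np.mean((p - passed) * hours_centered)
    b -= 0.1 * np.mean(p - passed)

probabilities = sigmoid(w * hours_centered + b)
labels = (probabilities >= 0.5).astype(int)
boundary = hours.mean() - b / w  # hours at which the probability equals 0.5

print("Logistic weight:", round(w, 3), "decision boundary (hours):", round(boundary, 3))
print("Probabilities:", probabilities.round(3))
print("Predicted labels:", labels)
```

The weight `w` is the change in log-odds of passing per extra hour of study, so a positive value means more hours raise the predicted probability.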

8. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

b) Fit a Support Vector Machine (SVM) classifier to predict the class label based on X1 and X2.

c) Display the following outputs:

o The support vectors identified by the SVM.

o The decision boundary plot along with the data points, showing how the SVM separates the classes.

d) Explain each step of the SVM process, including the significance of support vectors and how SVM creates the decision boundary.

e) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

f) Fit a K-Nearest Neighbors (KNN) classifier with k=3 to predict the class label based on X1 and X2.

g) Display the following outputs:

o The predictions made by the model for each point in the dataset.

o A plot showing the data points colored by their class, with a decision boundary illustrating the KNN classification.

h) Explain each step of the KNN process, including how the KNN algorithm classifies a new data point based on the majority vote of its nearest neighbors.
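The KNN half of this question (parts f and g) can be sketched with NumPy, including the grid evaluation that underlies a decision-boundary plot. The points are hypothetical and the helper `knn_predict` is introduced here for illustration; the SVM half is not covered by this sketch.

```python
import numpy as np

# Hypothetical 2-D points: five in each class.
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [1, 1],
              [6, 5], [7, 7], [6, 7], [7, 6], [8, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def knn_predict(X_train, y_train, points, k=3):
    """Label each row of `points` by majority vote among its k nearest training rows."""
    # Pairwise Euclidean distances, shape (n_points, n_train).
    distances = np.linalg.norm(points[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points for every query point.
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = y_train[nearest]
    # bincount tallies the votes per class; argmax picks the majority.
    return np.array([np.bincount(row).argmax() for row in votes])

train_predictions = knn_predict(X, y, X)
print("Predictions for the dataset:", train_predictions)

# Decision boundary: classify every node of a regular grid covering the data.
xs, ys = np.meshgrid(np.linspace(0, 9, 50), np.linspace(0, 9, 50))
grid = np.column_stack([xs.ravel(), ys.ravel()])
grid_labels = knn_predict(X, y, grid).reshape(xs.shape)
print("Share of grid assigned to class 1:", grid_labels.mean().round(3))
```

Passing `xs`, `ys`, and `grid_labels` to a filled contour plot (for example matplotlib's `contourf`) and overlaying the points coloured by `y` would produce the requested figure.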

9. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

b) Fit a Random Forest (RF) classifier to predict the class label based on X1 and X2.

c) Display the following outputs:

o The decision boundary plot along with the data points, showing how the RF separates the classes.

d) Explain each step of the RF process.

e) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

f) Apply label encoding and feature scaling, then fit a K-Nearest Neighbors (KNN) classifier with k=3 to predict the class label based on X1 and X2.

g) Display the following outputs:

o The predictions made by the model for each point in the dataset.

o A plot showing the data points colored by their class, with a decision boundary illustrating the KNN classification.

h) Explain each step of the KNN process, including how the KNN algorithm classifies a new data point based on the majority vote of its nearest neighbors.
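The preprocessing that distinguishes the KNN task in this question, label encoding and scaling, can be sketched as follows. The string labels, the deliberately mismatched feature ranges, and the query point `new_point` are all hypothetical and chosen only to show why scaling matters for a distance-based method; the Random Forest part is not covered here.

```python
import numpy as np

# Hypothetical points whose second feature has a much larger range than the first.
X = np.array([[1, 200], [2, 180], [2, 220], [3, 210], [1, 190],
              [6, 800], [7, 900], [6, 850], [7, 820], [8, 870]], dtype=float)
labels = np.array(["Class 0"] * 5 + ["Class 1"] * 5)

# Label encoding: map each distinct string to an integer code (sorted order).
classes, y = np.unique(labels, return_inverse=True)
print("Classes:", classes, "encoded labels:", y)

# Scaling: standardize each feature so neither dominates the distance.
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

# A new point must be scaled with the statistics of the training data.
new_point = np.array([2.5, 205.0])
new_scaled = (new_point - mean) / std

# 3-nearest-neighbour vote in the scaled space.
distances = np.linalg.norm(X_scaled - new_scaled, axis=1)
nearest = np.argsort(distances)[:3]
predicted = classes[np.bincount(y[nearest]).argmax()]
print("Nearest neighbours:", nearest, "predicted:", predicted)
```

Decoding the winning integer back through `classes` returns the prediction in the original label vocabulary.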

10. Write a Python script that performs the following tasks:

a) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

b) Fit an AdaBoost classifier to predict the class label based on X1 and X2.

c) Display the following outputs:

o The accuracy of the model on the dataset.

o The importance of each feature as determined by the AdaBoost model.

d) Visualize the results:

o Plot the data points colored by their class.

o Show the decision boundary created by the AdaBoost model.

e) Create a small dataset with 10 records that represents two classes of points in a 2D space:

o X1 (feature 1): Represents the x-coordinate of the point.

o X2 (feature 2): Represents the y-coordinate of the point.

o Label (target variable): A binary label, where 0 represents Class 0 and 1 represents Class 1.

f) Fit an XGBoost classifier to predict the class label based on X1 and X2.

g) Display the following outputs:

o The accuracy of the model on the dataset.

o The importance of each feature as determined by the XGBoost model.

h) Visualize the results:

o Plot the data points colored by their class.

o Show the decision boundary created by the XGBoost model.
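The boosting idea behind the AdaBoost part can be sketched from scratch with decision stumps (one-split rules) in NumPy. The data, the helper names `stump_predict` and `best_stump`, and the choice of 3 boosting rounds are assumptions made for illustration; the feature importance shown is a rough proxy based on each stump's vote weight. XGBoost is not covered here, as it requires the separate `xgboost` package.

```python
import numpy as np

# Hypothetical points; no single axis-aligned split separates the classes.
X = np.array([[1, 1], [2, 2], [1, 3], [2, 5], [5, 1],
              [4, 4], [5, 5], [6, 4], [4, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_signed = np.where(y == 1, 1, -1)  # AdaBoost works with labels -1 / +1

def stump_predict(X, feature, threshold, polarity):
    """Predict +1 on one side of the threshold and -1 on the other."""
    return np.where(polarity * (X[:, feature] - threshold) > 0, 1, -1)

def best_stump(X, y_signed, weights):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            for polarity in (1, -1):
                pred = stump_predict(X, feature, threshold, polarity)
                error = weights[pred != y_signed].sum()
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best

weights = np.full(len(X), 1.0 / len(X))  # start with equal sample weights
ensemble = []
for _ in range(3):
    error, feature, threshold, polarity = best_stump(X, y_signed, weights)
    error = np.clip(error, 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - error) / error)  # vote weight of this stump
    pred = stump_predict(X, feature, threshold, polarity)
    # Increase the weight of misclassified points, decrease the rest.
    weights = weights * np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()
    ensemble.append((alpha, feature, threshold, polarity))

# Final prediction: sign of the alpha-weighted sum of stump votes.
scores = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
predicted = np.where(scores > 0, 1, 0)
accuracy = (predicted == y).mean()

# Rough importance proxy: share of total vote weight spent on each feature.
importance = np.zeros(X.shape[1])
for a, f, _, _ in ensemble:
    importance[f] += a
importance /= importance.sum()

print("Accuracy:", accuracy)
print("Feature importance (proxy):", importance.round(3))
```

Each round concentrates weight on the points the previous stumps got wrong, which is how a combination of weak one-split rules ends up fitting data that no single split can separate.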

Pappula Sarala