18IT040_Practical_Exam_Work
Task-1:
Dataset Description using Orange tool.
What needs to be done to improve the classification accuracy on the given dataset? Obtain the maximum possible classification accuracy by applying the following methods.
→Pre-processing
o Encoding
o Normalization
o Missing value handling
o Feature Selection
Compare your accuracy with and without the pre-processing steps. Perform the classification and visualize the accuracy before and after pre-processing in Orange/Python.
About the dataset
To examine the impact of potentially influential factors on the spread of COVID-19 in the United States, this dataset contains the number of confirmed cases and deaths, along with other factors relevant to the pandemic, in each country and for each day since the beginning of the outbreak.
Using the Data Info widget in Orange, we can inspect the loaded dataset. It shows the dataset name, features, description, row count, column count, and targets.
Dataset: https://www.kaggle.com/gpreda/coronavirus-2019ncov
Our dataset has 1,241,952 rows and 8 columns. Confirmed cases and deaths are the target variables. The data can be viewed in tabular form using the Data Table widget.
Before Pre-processing:
The Data Sampler widget implements several data sampling methods. The Test and Score widget compares each model’s predictions for the target variable with the actual values and tells us how good the model is. We used k-nearest neighbors (KNN) as the learner.
We observed the evaluation results of the model by testing on the training data.
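The evaluation step above can be sketched in plain Python. This is a minimal illustration of how a KNN learner is scored, not the Orange implementation; the toy points, the labels "low" and "high", and the helper names knn_predict and accuracy are all hypothetical. It also scores a small held-out set, because accuracy measured on the training data tends to be optimistic.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical toy data (not taken from the COVID-19 dataset).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low", "low", "low", "high", "high", "high"]
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = ["low", "high"]

# Score on the training data (as in the report) and on held-out data.
train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

In Orange, the Data Sampler and Test and Score widgets play the roles of the split and the accuracy function here.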
Next, we performed pre-processing:
o Encoding
o Normalization
o Missing value handling
o Feature Selection
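The four steps listed above can be sketched in plain Python as follows. This is only an illustration of the ideas under stated assumptions, not what Orange's Preprocess widget does internally: the column names, the toy values, and the helper functions are hypothetical, mean imputation and min-max scaling are just one common choice each, and dropping constant columns stands in for a real feature-ranking method.

```python
from statistics import mean


def impute_mean(column):
    """Missing value handling: replace None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column):
    """Normalization: rescale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def one_hot(column):
    """Encoding: turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


def drop_constant(features):
    """Very simple feature selection: drop columns that hold a single value."""
    return {name: col for name, col in features.items() if len(set(col)) > 1}


# Hypothetical toy columns (not taken from the COVID-19 dataset).
cases = [10, None, 30, 50]
region = ["A", "B", "A", "C"]
source = ["x", "x", "x", "x"]  # constant, so it carries no information

features = {"cases": min_max(impute_mean(cases))}
for cat, col in one_hot(region).items():
    features[f"region={cat}"] = col
features["source=x"] = one_hot(source)["x"]
features = drop_constant(features)

for name, col in features.items():
    print(name, col)
```

Normalization matters for KNN in particular, because the learner relies on distances and a feature with a large numeric range would otherwise dominate them.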
We then observed the evaluation results and the accuracy after pre-processing:
Task-2:
Generate a dashboard of the pre-processed dataset from Task-1.
Find the maximum data insights by plotting a bar chart, box plot, pie plot, and stack plot using Power BI dashboard visualization.
We used Power BI for its interactive visualizations and business intelligence capabilities, and created a dashboard for the pre-processed data generated in Task-1. The report includes a pie chart, donut chart, bar chart, and the data in tabular form.