Comp-activ dataset - Problem
Context
The comp-activ database contains activity measures of computer systems. The data was collected from a Sun SPARCstation 20/712 with 128 MB of memory, running in a multi-user university department where users performed a variety of tasks, such as accessing the internet, editing files, and running CPU-intensive programs.
As an aspiring data scientist, you aim to build a linear equation that predicts 'usr' (the percentage of time the CPUs run in user mode).
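In its standard multiple-regression form, the linear equation has the shape below, where $x_1, \dots, x_p$ stand for the system attributes. The coefficients are assumed here to be estimated by ordinary least squares; the brief does not name an estimation method.

```latex
\widehat{\mathrm{usr}} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p,
\qquad
(\hat\beta_0, \dots, \hat\beta_p) = \arg\min_{\beta} \sum_{i=1}^{n} \Bigl( \mathrm{usr}_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```

Each coefficient $\beta_j$ is the expected change in usr for a one-unit increase in $x_j$ with the other attributes held fixed, which is how such an equation expresses each attribute's influence.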
Your goal is to analyze various system attributes to understand how they influence 'usr'.
Problem 1 - Define the problem and perform Exploratory Data Analysis
- Problem definition
- Check the shape, data types, and statistical summary
- Univariate analysis
- Multivariate analysis
- Use appropriate visualizations to identify patterns and insights
- Key observations on individual variables and on the relationships between variables
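A minimal pandas sketch of the tabular checks listed above, assuming the data has been loaded into a DataFrame with a numeric `usr` column. The helper name `eda_summary` and the column names `lread`, `scall`, and `freemem` are hypothetical, and the toy values are made up for illustration, not taken from the comp-activ data. Plots such as histograms and heatmaps would normally accompany these tables.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str = "usr") -> dict:
    """Collect basic EDA checks: shape, data types, summary statistics,
    skewness (univariate) and correlation with the target (multivariate)."""
    numeric = df.select_dtypes(include="number")
    return {
        "shape": df.shape,                # (rows, columns)
        "dtypes": df.dtypes,              # data type of each column
        "summary": numeric.describe().T,  # count, mean, std, min, quartiles, max
        "skew": numeric.skew(),           # asymmetry of each distribution
        # Pearson correlation of each numeric column with the target, strongest first
        "corr_with_target": numeric.corr()[target]
            .drop(target)
            .sort_values(key=lambda s: s.abs(), ascending=False),
    }


# Illustrative toy frame only; NOT the real comp-activ data.
toy = pd.DataFrame({
    "lread": [1, 5, 3, 20, 2],
    "scall": [2000, 2500, 1800, 4000, 2100],
    "freemem": [300, 150, 400, 50, 350],
    "usr": [90, 85, 92, 70, 91],
})
report = eda_summary(toy)
print(report["shape"])
print(report["corr_with_target"])
```

Ranking correlations by absolute value surfaces the attributes most strongly associated with `usr`, whether the relationship is positive or negative.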
8
Problem 1 - Data Pre-processing
Prepare the data for modeling:
- Missing Value Treatment (if needed)
- Outlier Detection
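The two pre-processing steps above can be sketched with pandas as follows, assuming numeric columns. Median imputation and Tukey's 1.5 × IQR rule are common choices rather than ones the brief prescribes, and capping is shown only as one possible treatment. The function names and the toy values are hypothetical.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


def impute_median(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing numeric values with each column's median (one common choice)."""
    out = df.copy()
    cols = out.select_dtypes(include="number").columns
    out[cols] = out[cols].fillna(out[cols].median())
    return out


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple:
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] count as outliers."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask marking the outliers in s."""
    lower, upper = iqr_bounds(s, k)
    return (s < lower) | (s > upper)


def cap_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """One possible treatment: clip values to the fences instead of dropping rows."""
    lower, upper = iqr_bounds(s, k)
    return s.clip(lower=lower, upper=upper)


# Illustrative toy frame only; NOT the real comp-activ data.
toy = pd.DataFrame({
    "freemem": [300, None, 400, 50, 350, 10000],
    "usr": [90, 85, 92, 70, 91, 88],
})
print(missing_report(toy))
filled = impute_median(toy)
print(iqr_outliers(filled["freemem"]))
capped = cap_outliers(filled["freemem"])
```

Capping keeps every row, which matters when the dataset is later split and fitted; dropping outlier rows is the usual alternative.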
Exploratory Data Analysis (EDA): This step builds an understanding of the data by checking its shape, data types, and statistical summary. Univariate and multivariate analyses then reveal patterns and insights, typically with plots such as histograms, box plots, scatter plots, and correlation heatmaps.
Data Pre-processing: This step prepares the data for modeling. It includes treating missing values, detecting and treating outliers, feature engineering, encoding categorical variables, and splitting the data into training and testing sets.
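A minimal sketch of the encoding and splitting steps, followed by fitting the linear equation for `usr` by ordinary least squares. It assumes pandas and NumPy, a 70/30 random split, and a categorical column named `runqsz` with labels `CPU_Bound` and `Not_CPU_Bound`. Those names, the function `prepare_and_fit`, and the noise-free synthetic data are illustrative assumptions, not the actual dataset; in practice a library such as scikit-learn or statsmodels would typically do the fitting.

```python
import numpy as np
import pandas as pd


def prepare_and_fit(df: pd.DataFrame, target: str = "usr",
                    test_size: float = 0.3, seed: int = 1):
    """Encode, split, and fit usr ~ b0 + b1*x1 + ... + bp*xp by least squares.

    Returns (intercept, slopes, test_r2).
    """
    # One-hot encode text/categorical columns; drop_first avoids perfectly collinear dummies.
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=float)
    y = df[target].astype(float)

    # Random train/test split using pandas only.
    test_idx = X.sample(frac=test_size, random_state=seed).index
    X_train, X_test = X.drop(index=test_idx), X.loc[test_idx]
    y_train, y_test = y.drop(index=test_idx), y.loc[test_idx]

    # Ordinary least squares on the training set, with a column of ones for the intercept.
    A = np.column_stack([np.ones(len(X_train)), X_train.to_numpy()])
    coef, *_ = np.linalg.lstsq(A, y_train.to_numpy(), rcond=None)
    intercept = coef[0]
    slopes = pd.Series(coef[1:], index=X.columns)

    # R^2 on the held-out test set.
    pred = intercept + X_test.to_numpy() @ slopes.to_numpy()
    resid = y_test.to_numpy() - pred
    r2 = 1 - (resid ** 2).sum() / ((y_test.to_numpy() - y_test.mean()) ** 2).sum()
    return intercept, slopes, r2


# Synthetic, noise-free toy data for illustration only; NOT the comp-activ data.
rng = np.random.default_rng(0)
n = 50
toy = pd.DataFrame({
    "lread": rng.uniform(0, 50, n),
    "freemem": rng.uniform(0, 2000, n),
    "runqsz": rng.choice(["CPU_Bound", "Not_CPU_Bound"], n),
})
toy["usr"] = (95 - 0.4 * toy["lread"] + 0.002 * toy["freemem"]
              - 3 * (toy["runqsz"] == "Not_CPU_Bound"))

intercept, slopes, r2 = prepare_and_fit(toy)
print(round(intercept, 3), slopes.round(3).to_dict(), round(r2, 4))
```

Because the toy target is an exact linear function of the inputs, the fit recovers the generating coefficients; on real data the residuals and test-set R² show how well a linear equation actually explains `usr`.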