Introduction:
The dataset contains wine quality data: chemical attributes such as acidity, sugar content, pH, and alcohol percentage, a quality rating from 3 to 9 (higher values indicate better quality), and the wine color (red or white). This project applies data mining methods, including multiple regression models and classification trees, to predict wine quality.
The project is implemented in Python using the sklearn, numpy, pandas, and matplotlib libraries, and follows these steps:
- Data Preprocessing: Cleaning the data to handle missing values, outliers, and anomalies.
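- Determining Data Mining Task: Framing quality prediction as a supervised learning task (regression on the numeric rating, with classification trees as an alternative).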
- Determining Data Mining Technique: Utilizing a multiple linear regression model.
- Model Evaluation: Assessing the chosen model’s performance using metrics such as mean squared error (MSE) and mean absolute error (MAE).
- Deployment of Optimal Technique/Model: Applying the selected model to new data records.
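
The two evaluation metrics named in the steps above can be illustrated with a minimal sketch in plain Python. The function names `mse` and `mae` and the sample ratings are hypothetical, chosen only for illustration; they are not taken from the dataset or from the project's code. In practice, `sklearn.metrics` provides `mean_squared_error` and `mean_absolute_error`, which compute the same quantities.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute prediction errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical quality ratings and model predictions (illustrative only).
actual = [5, 6, 7, 5]
predicted = [5.5, 6.0, 6.0, 4.5]

print("MSE:", mse(actual, predicted))  # errors -0.5, 0, 1, 0.5 -> 1.5 / 4 = 0.375
print("MAE:", mae(actual, predicted))  # absolute errors sum to 2.0 -> 2.0 / 4 = 0.5
```

Because MSE squares each error, the single one-point miss contributes most of its value, while MAE weights every error in proportion to its size. Reporting both shows whether a model's error comes from a few large misses or from many small ones.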