Description
Created a normalized database (3NF) and used SQL join statements to fetch data into Pandas DataFrames
Explored the data with ydata-profiling and a correlation matrix, identifying the necessary data cleanup tasks
Performed stratified train/test split based on data analysis
Developed preprocessing pipelines including StandardScaler, MinMaxScaler, log transformation, and OneHotEncoder
Experimented with multiple classifiers: LogisticRegression, RidgeClassifier, RandomForestClassifier, and XGBClassifier
Conducted feature engineering, attribute combination, and selection using Correlation Threshold, Feature Importance, and Variance Threshold
Applied PCA for dimensionality reduction, creating a scree plot to select optimal components
Designed and executed two custom experiments to further optimize model performance
Logged all experiment results in MLflow on DagsHub, creating F1-score plots for model comparison
Saved the final model using joblib and created a FastAPI application to serve it
Containerized the FastAPI application using Docker, pushed to Docker Hub, and deployed to a cloud platform
Developed a Streamlit app for real-time interaction with the deployed model
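
The data-loading and splitting steps listed above could look roughly like the sketch below. The project's actual schema is not shown here, so the `customer` and `outcome` tables, their columns, and the in-memory SQLite database are all hypothetical stand-ins. Stratifying on the target label is also an assumption; the list only says the split was stratified based on the data analysis.

```python
import sqlite3

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical two-table schema (not the project's real one), built in memory.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        age INTEGER,
        segment TEXT
    );
    CREATE TABLE outcome (
        customer_id INTEGER REFERENCES customer(customer_id),
        label INTEGER
    );
    """
)
customers = [(i, 20 + i % 40, "a" if i % 3 else "b") for i in range(100)]
outcomes = [(i, int(i % 5 == 0)) for i in range(100)]  # 20% positive class
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", customers)
conn.executemany("INSERT INTO outcome VALUES (?, ?)", outcomes)

# Join the normalized tables back into one flat DataFrame.
query = """
    SELECT c.customer_id, c.age, c.segment, o.label
    FROM customer AS c
    JOIN outcome AS o ON o.customer_id = c.customer_id
"""
df = pd.read_sql_query(query, conn)
conn.close()

# Stratified split: both partitions keep the same class balance as the full set.
X = df.drop(columns=["label"])
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```

With an imbalanced target, `stratify=y` keeps the positive rate the same in the train and test sets, which makes F1 scores on the test set easier to compare across models.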
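
The preprocessing and classifier steps in the list above can be combined in a single scikit-learn pipeline. The following is a minimal sketch on synthetic data, not the project's code: the column names, the assignment of each transformer to a column, and the choice of `np.log1p` for the log transformation are all assumptions made for illustration. LogisticRegression stands in for any of the classifiers mentioned.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Synthetic data with hypothetical columns (the real features are not shown).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame(
    {
        "age": rng.normal(40, 10, n),
        "score": rng.uniform(0, 100, n),
        "income": rng.lognormal(10, 1, n),  # right-skewed, so log-transform it
        "segment": rng.choice(["a", "b", "c"], n),
    }
)
y = (df["score"] + rng.normal(0, 10, n) > 50).astype(int)

# Log transform followed by standardization for the skewed column.
log_then_scale = Pipeline(
    [
        ("log", FunctionTransformer(np.log1p)),
        ("scale", StandardScaler()),
    ]
)

# Route each column to its own transformer.
preprocess = ColumnTransformer(
    [
        ("standard", StandardScaler(), ["age"]),
        ("minmax", MinMaxScaler(), ["score"]),
        ("log", log_then_scale, ["income"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["segment"]),
    ]
)

model = Pipeline(
    [
        ("preprocess", preprocess),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
test_f1 = f1_score(y_test, model.predict(X_test))
print(f"Test F1: {test_f1:.3f}")
```

Keeping the transformers inside the pipeline means they are fitted on the training split only, and swapping in another classifier only requires changing the `clf` step.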
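
For the PCA step in the list above, the numbers behind a scree plot come from the explained variance ratio of each component. The sketch below uses synthetic data, and the 95% cumulative-variance cutoff is an assumed selection rule; the list does not say which criterion was used to pick the number of components.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 observed features driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)

# Per-component and cumulative share of variance (the scree plot's y-values).
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)

# Assumed rule: keep the fewest components whose cumulative share reaches 95%.
threshold = 0.95
n_components = int(np.searchsorted(cumulative, threshold) + 1)

for i, (ratio, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {ratio:.3f} (cumulative {total:.3f})")
print(f"Components kept: {n_components}")
```

Plotting `explained` against the component index gives the scree plot itself; the elbow in that curve is an alternative to a fixed threshold.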