Description

  • Created a normalized database in third normal form (3NF) and used SQL JOIN queries to load the data into pandas DataFrames

  • Explored the data with ydata-profiling and a correlation matrix to identify the required data-cleaning tasks

  • Performed a stratified train/test split, with the stratification choice informed by the exploratory analysis

  • Built preprocessing pipelines using StandardScaler, MinMaxScaler, a log transformation, and OneHotEncoder

  • Experimented with multiple classifiers: LogisticRegression, RidgeClassifier, RandomForestClassifier, and XGBClassifier

  • Performed feature engineering (including combining attributes) and feature selection using a correlation threshold, feature importance, and a variance threshold

  • Applied PCA for dimensionality reduction, using a scree plot to choose the number of components

  • Designed and executed two custom experiments to further optimize model performance

  • Logged all experiment results to MLflow on DagsHub and created F1-score plots to compare the models

  • Saved the final model using joblib and created a FastAPI application to serve it

  • Containerized the FastAPI application with Docker, pushed the image to Docker Hub, and deployed it to a cloud platform

  • Developed a Streamlit app for real-time interaction with the deployed model
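The database step (normalized 3NF tables joined back together with SQL and loaded into pandas) can be sketched as follows. This is a minimal illustration assuming SQLite; the `regions`, `customers` and `outcomes` tables and all their columns are hypothetical stand-ins, not the project's actual schema.

```python
import sqlite3

import pandas as pd

# Hypothetical 3NF schema: each fact is stored once, and tables reference
# each other through foreign keys instead of repeating attributes.
SCHEMA = """
CREATE TABLE regions (
    region_id   INTEGER PRIMARY KEY,
    region_name TEXT NOT NULL
);
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    region_id   INTEGER NOT NULL REFERENCES regions (region_id),
    age         INTEGER
);
CREATE TABLE outcomes (
    customer_id INTEGER PRIMARY KEY REFERENCES customers (customer_id),
    label       INTEGER NOT NULL
);
"""

# Join the normalized tables back into one flat row per customer for modeling.
QUERY = """
SELECT c.customer_id, c.age, r.region_name, o.label
FROM customers AS c
JOIN regions  AS r ON r.region_id   = c.region_id
JOIN outcomes AS o ON o.customer_id = c.customer_id
ORDER BY c.customer_id
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database filled with a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO regions VALUES (?, ?)", [(1, "north"), (2, "south")])
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)", [(10, 1, 34), (11, 2, 51), (12, 1, 27)]
    )
    conn.executemany("INSERT INTO outcomes VALUES (?, ?)", [(10, 0), (11, 1), (12, 0)])
    return conn


conn = build_demo_db()
df = pd.read_sql_query(QUERY, conn)
conn.close()
print(df)
```

Keeping the data normalized avoids update anomalies in storage, while the single JOIN query produces the flat, one-row-per-sample table that scikit-learn expects.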
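For the exploration step, a profiling report covers far more ground, but the two signals that most directly turn into cleanup tasks (missing values and highly correlated columns) can be sketched with pandas alone. The toy columns and the 0.9 correlation cutoff below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the project's dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
df = pd.DataFrame(
    {
        "x": x,
        "x_noisy_copy": x + rng.normal(scale=0.05, size=n),  # almost redundant with "x"
        "z": rng.normal(size=n),
    }
)
df.loc[::20, "z"] = np.nan  # inject some missing values to flag for cleanup

# Missing values per column: a typical input to the cleanup task list.
missing = df.isna().sum()

# Pearson correlation matrix (pandas uses pairwise-complete observations).
corr = df.corr()


def high_corr_pairs(corr: pd.DataFrame, threshold: float = 0.9) -> list:
    """Return (col_a, col_b, r) for each column pair with |r| >= threshold."""
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


print(missing)
print(corr.round(2))
print(high_corr_pairs(corr))
```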
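A stratified split keeps the class (or category) proportions the same in the training and test sets. The sketch below assumes stratification on an imbalanced binary target; the project may have stratified on a different attribute, and the data here is synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced data: 10% positive class (hypothetical).
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = np.array([1] * 100 + [0] * 900)

# stratify=y preserves the class ratio in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(f"positive rate: train={y_train.mean():.3f}, test={y_test.mean():.3f}")
```

If stratification was on a continuous feature instead, a common approach is to bin it first (for example with `pandas.cut`) and pass the bin labels as `stratify`.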
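The preprocessing step can be expressed as a single scikit-learn `ColumnTransformer` that routes each column to its own transformer. This is a sketch under assumptions: the column names and which scaler each column gets are hypothetical, and the log transformation is implemented here with `FunctionTransformer(np.log1p)`, which may differ from the project's own implementation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (
    FunctionTransformer,
    MinMaxScaler,
    OneHotEncoder,
    StandardScaler,
)

# Hypothetical columns: roughly normal, bounded, right-skewed, categorical.
df = pd.DataFrame(
    {
        "age": [25, 40, 58, 33],
        "score": [0.2, 0.9, 0.5, 0.7],
        "income": [30_000, 85_000, 120_000, 52_000],
        "region": ["north", "south", "north", "east"],
    }
)

# Skewed feature: log1p to compress the long tail, then standardize.
log_then_scale = Pipeline(
    [
        ("log", FunctionTransformer(np.log1p)),
        ("scale", StandardScaler()),
    ]
)

preprocess = ColumnTransformer(
    [
        ("standard", StandardScaler(), ["age"]),
        ("minmax", MinMaxScaler(), ["score"]),
        ("log", log_then_scale, ["income"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["region"]),
    ]
)

X = preprocess.fit_transform(df)
print(X.shape)  # 3 numeric columns + 3 one-hot region columns
```

Wrapping everything in one transformer means the same fitted preprocessing is reused at prediction time, which avoids train/serve skew once the model is deployed.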
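Comparing several classifiers on the same split and metric can be sketched as a loop over a dictionary of estimators. The synthetic dataset, hyperparameters and the choice of F1 on the positive class are assumptions. xgboost's `XGBClassifier` exposes the same scikit-learn `fit`/`predict` interface, so it could be added to the dictionary when the package is installed; it is left out to keep the sketch dependency-light.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, mildly imbalanced binary problem (stand-in for the real data).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Linear models get scaling; tree ensembles do not need it.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "ridge_classifier": make_pipeline(StandardScaler(), RidgeClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = f1_score(y_test, model.predict(X_test))

# Print models from best to worst F1.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>20}: F1 = {score:.3f}")
```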
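The three feature-selection criteria can be chained as sketched below: drop near-constant columns, then one column from each highly correlated pair, then rank what remains by random-forest importance. The thresholds (`1e-8`, `0.95`, "at or above the mean importance") and the toy columns are assumptions, not the project's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold

# Toy data: one informative, one near-duplicate, one noise and one constant column.
rng = np.random.default_rng(0)
n = 400
signal = rng.normal(size=n)
X = pd.DataFrame(
    {
        "signal": signal,
        "signal_scaled": 2 * signal + rng.normal(scale=0.01, size=n),  # near-duplicate
        "noise": rng.normal(size=n),
        "constant": np.ones(n),
    }
)
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# 1) Variance threshold: drop columns whose variance is (almost) zero.
vt = VarianceThreshold(threshold=1e-8).fit(X)
X_var = X.loc[:, vt.get_support()]


# 2) Correlation threshold: for each pair with |r| > threshold, drop the later column.
def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


X_uncorr = drop_correlated(X_var)

# 3) Feature importance: rank the survivors and keep those at or above the mean.
rf = RandomForestClassifier(n_estimators=200, max_features=None, random_state=0)
rf.fit(X_uncorr, y)
importances = pd.Series(rf.feature_importances_, index=X_uncorr.columns)
importances = importances.sort_values(ascending=False)
selected = importances[importances >= importances.mean()].index.tolist()

print(list(X_var.columns), list(X_uncorr.columns))
print(importances.round(3))
print("selected:", selected)
```

The cheap filters run first so the more expensive importance ranking only sees columns that carry distinct information.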
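A scree plot shows the explained-variance ratio of each principal component; the cumulative sum of those ratios gives one common way to pick the component count. The sketch below prints the scree values as text instead of plotting them, and the 95% cumulative-variance cutoff is an assumption (the project chose its count from the plot itself). The data is synthetic with three latent factors.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 observed columns driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + rng.normal(scale=0.1, size=(500, 10))

# PCA is variance-based, so standardize first.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA().fit(X_scaled)
explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)

# Scree values: variance explained by each component, plus the running total.
for i, (ev, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i:>2}: {ev:.3f} (cumulative {cum:.3f})")

# Smallest number of components whose cumulative ratio reaches 95%.
n_components = int(np.searchsorted(cumulative, 0.95) + 1)
X_reduced = PCA(n_components=n_components).fit_transform(X_scaled)
print(n_components, X_reduced.shape)
```

scikit-learn can apply the same rule directly with `PCA(n_components=0.95, svd_solver="full")`; the explicit version above makes the scree values visible.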
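Logging experiments to MLflow on DagsHub can be sketched as one run per model, each recording its F1 score so the MLflow UI can chart the runs side by side. This assumes DagsHub's MLflow endpoint of the form `https://dagshub.com/<user>/<repo>.mlflow` (placeholders) and credentials supplied through the standard `MLFLOW_TRACKING_USERNAME` and `MLFLOW_TRACKING_PASSWORD` environment variables; the helper names and the experiment name are hypothetical.

```python
from typing import Dict, Optional, Sequence

from sklearn.metrics import f1_score


def f1_per_model(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Dict[str, float]:
    """Compute the binary F1 score of each model's test-set predictions."""
    return {name: float(f1_score(y_true, y_pred)) for name, y_pred in predictions.items()}


def log_f1_runs(
    scores: Dict[str, float],
    tracking_uri: Optional[str] = None,
    experiment_name: str = "model-comparison",  # hypothetical experiment name
) -> None:
    """Log one MLflow run per model so the UI can chart F1 across runs."""
    import mlflow  # imported lazily so the F1 helper works without MLflow installed

    if tracking_uri:
        # e.g. "https://dagshub.com/<user>/<repo>.mlflow" (placeholders, not real values)
        mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    for model_name, f1 in scores.items():
        with mlflow.start_run(run_name=model_name):
            mlflow.log_param("model", model_name)
            mlflow.log_metric("f1", f1)
```

Calling `log_f1_runs(f1_per_model(y_test, preds), tracking_uri=...)` after each experiment would record every model under one experiment, where the MLflow UI's run comparison view can plot the `f1` metric.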
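Persisting the model with joblib and serving it from FastAPI can be sketched as below. The toy `LogisticRegression` stands in for the final model, and the `/predict` route, the `PredictRequest` schema and the `{"features": [...]}` payload shape are all hypothetical; the real application's endpoint and input schema may differ.

```python
from typing import List

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


def train_and_save(path: str) -> LogisticRegression:
    """Fit a toy model and persist it with joblib (stand-in for the final model)."""
    X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    y = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X, y)
    joblib.dump(model, path)
    return model


def create_app(model_path: str):
    """Build a FastAPI app that loads the saved model once and serves predictions."""
    # Lazy imports: the training/saving part above runs without FastAPI installed.
    from fastapi import FastAPI
    from pydantic import BaseModel

    class PredictRequest(BaseModel):  # hypothetical request schema
        features: List[float]

    model = joblib.load(model_path)
    app = FastAPI()

    @app.post("/predict")  # hypothetical route
    def predict(request: PredictRequest) -> dict:
        label = model.predict(np.array([request.features]))[0]
        return {"prediction": int(label)}

    return app
```

After `train_and_save("model.joblib")`, the app could be served locally with `uvicorn.run(create_app("model.joblib"), host="0.0.0.0", port=8000)`. Loading the model once at startup, rather than per request, keeps prediction latency low.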
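A Streamlit front end for the deployed model mostly needs to collect inputs, POST them to the API and display the reply. The sketch below assumes the API accepts JSON of the form `{"features": [...]}` at a `/predict` URL; the endpoint, payload shape and helper names are hypothetical. It uses the standard library's `urllib` for the HTTP call.

```python
import json
import urllib.request
from typing import Dict, List, Sequence


def build_payload(values: Sequence[float]) -> Dict[str, List[float]]:
    """Shape user input into the JSON body the API is assumed to expect."""
    return {"features": [float(v) for v in values]}


def request_prediction(api_url: str, values: Sequence[float], timeout: float = 10.0) -> dict:
    """POST the features to the deployed model and return the decoded JSON reply."""
    body = json.dumps(build_payload(values)).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    """Streamlit UI: collect features, call the API, show the result."""
    import streamlit as st  # imported lazily so the helpers above work without Streamlit

    st.title("Model demo")
    api_url = st.text_input("Prediction endpoint", "http://localhost:8000/predict")
    raw = st.text_input("Feature values (comma-separated)", "0.5, 1.5")
    if st.button("Predict"):
        try:
            values = [float(v) for v in raw.split(",")]
        except ValueError:
            st.error("Please enter numbers separated by commas.")
            return
        try:
            st.json(request_prediction(api_url, values))
        except (OSError, ValueError) as exc:  # URLError subclasses OSError; bad JSON is ValueError
            st.error(f"Request failed: {exc}")
```

Saved as `app.py` with a `main()` call at the end, this would start with `streamlit run app.py`; Streamlit reruns the script on every interaction, so the button branch executes only on the run triggered by the click.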