CS Senior Work & Experiments

ML • Algorithms • Prototypes


Overview

A collection of capstone work, course projects, and experimental builds completed during the Computer Science program. Focus areas include data pipelines, classical ML baselines compared against shallow neural nets, evaluation methodology, lightweight deployment, and readable code.

Reproducibility • Clean baselines • Metrics literacy • Readable code

Tech Stack

  • Python (pandas, scikit-learn, numpy, matplotlib)
  • Jupyter / Colab for EDA & experiments
  • FastAPI (simple demos) • Docker (optional)
  • GitHub for versioning & reports

Capstone Snapshot

Problem

Supervised classification task with imbalanced classes. Goal: build a reliable baseline, then compare tuned classical models to a shallow MLP while avoiding overfitting.

  • Pipeline: split → scale/encode → model → cross-validated metrics
  • Report: accuracy + macro F1 + confusion matrix
  • Risk control: stratified CV, fixed random seeds, leakage checks
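The reporting and risk-control steps above can be sketched as follows. This is a minimal illustration, not the capstone code itself: the data is a synthetic stand-in generated with `make_classification`, and the names `SEED`, `model`, and `y_oof` are chosen here for the example. Scaling sits inside the pipeline so that it is re-fit on each training fold, which is one simple guard against leakage.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # fixed seed so folds and data are reproducible

# Assumption: synthetic stand-in data with roughly a 90% / 10% class balance.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=SEED
)

# The scaler is part of the pipeline, so it only ever sees training folds.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class ratio roughly constant across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

# Accuracy and macro F1 side by side; on imbalanced data they can diverge.
results = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "f1_macro"])
for name in ("accuracy", "f1_macro"):
    fold_scores = results[f"test_{name}"]
    print(f"{name}: {fold_scores.mean():.3f} ± {fold_scores.std():.3f}")

# Out-of-fold predictions: each sample is predicted by a model that never saw it.
y_oof = cross_val_predict(model, X, y, cv=cv)
cm = confusion_matrix(y, y_oof)
print(cm)  # rows are true classes, columns are predicted classes
```

Reading the two metrics together is the point: a model that mostly predicts the majority class can still score high accuracy, while macro F1 and the minority-class row of the confusion matrix expose the weakness.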

Approach

  • Baselines: Logistic Regression, k-NN, Decision Tree
  • Tuned: Random Forest, Gradient Boosting
  • Neural: shallow MLP with early stopping
  • Model selection by CV mean ± std (averaging over folds gives a lower-variance estimate than a single train/test split)
Pipeline diagram (placeholder)
Confusion matrix (placeholder)
ROC / PR curves (placeholder)
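The model comparison described under Approach might look roughly like the sketch below. It is illustrative only: the data is synthetic, only one model per family is shown, and the hyperparameters (`hidden_layer_sizes=(32,)`, `n_estimators=100`, and so on) are assumptions picked for the example rather than the tuned values from the capstone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42

# Assumption: synthetic imbalanced data standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.85, 0.15], random_state=SEED
)

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
    # early_stopping=True holds out part of each training fold and stops
    # when the held-out score has not improved for n_iter_no_change epochs.
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(32,),
            early_stopping=True,
            validation_fraction=0.15,
            n_iter_no_change=10,
            max_iter=500,
            random_state=SEED,
        ),
    ),
}

# The same folds are reused for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

summary = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_macro")
    summary[name] = (scores.mean(), scores.std())
    print(f"{name:>14}: macro F1 {scores.mean():.3f} ± {scores.std():.3f}")

# Pick by mean score; the std indicates how much to trust small differences.
best = max(summary, key=lambda name: summary[name][0])
print("selected:", best)
```

When two means differ by less than about one standard deviation, the simpler model is usually the safer choice.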

Mini Projects

  • Data Pipeline: CSV → clean → features → train/test split
  • Eval Toolkit: reusable functions for CV, plots, and reports
  • FastAPI Demo: lightweight predict endpoint with input schema
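As a rough sketch of the Data Pipeline item, the steps CSV → clean → features → train/test split could be written as below. Everything specific here is hypothetical: the inline CSV, the column names (`age`, `income`, `city`, `label`), and the helper names `load_and_clean` and `build_features` are invented for illustration. The missing `income` value is filled after the split, using the training median only, so no test-set statistic leaks into preprocessing.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical raw data: one duplicate row, one missing label, one missing income.
RAW_CSV = """age,income,city,label
34,52000,Austin,0
45,,Boston,0
29,48000,Austin,1
52,91000,Denver,0
34,52000,Austin,0
41,67000,Boston,0
38,58000,Denver,1
27,39000,Austin,0
60,83000,Boston,
33,61000,Denver,0
48,75000,Austin,1
55,88000,Boston,0
31,44000,Denver,0
"""


def load_and_clean(source):
    """Read the CSV, drop exact duplicates and rows without a label."""
    return (
        pd.read_csv(source)
        .drop_duplicates()
        .dropna(subset=["label"])
        .assign(label=lambda d: d["label"].astype(int))
        .reset_index(drop=True)
    )


def build_features(df):
    """One-hot encode the categorical column; return features and labels."""
    X = pd.get_dummies(df.drop(columns=["label"]), columns=["city"], dtype=float)
    return X, df["label"]


df = load_and_clean(io.StringIO(RAW_CSV))
X, y = build_features(df)

# Stratify so both splits keep approximately the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Learned statistics come from the training split only, then apply to both.
income_median = X_train["income"].median()
X_train = X_train.fillna({"income": income_median})
X_test = X_test.fillna({"income": income_median})

print(X_train.shape, X_test.shape)
```

In a fuller version the imputation would move into a scikit-learn `Pipeline` (for example with `SimpleImputer`) so that cross-validation re-fits it per fold.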
Reproducible CV (template)
from scipy.sparse import issparse
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = ...  # placeholder: replace with your feature matrix and label vector
seed = 42

# Shuffled, stratified folds with a fixed seed give repeatable splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

# Centering would densify a sparse matrix, so skip it for sparse input.
pipe = Pipeline([
    ('scale', StandardScaler(with_mean=not issparse(X))),
    ('clf', LogisticRegression(max_iter=1000, random_state=seed)),
])

# Macro F1 weights every class equally, which suits imbalanced labels.
scores = cross_val_score(pipe, X, y, cv=cv,
                         scoring=make_scorer(f1_score, average='macro'))
print(f"CV macro F1: {scores.mean():.3f} ± {scores.std():.3f}")
Version Log
v0.1: Baselines & CV scaffold
v0.2: Tune RF/GB; add MLP with early stopping
v0.3: Plotting utilities (confusion matrix, ROC/PR)
v0.4: FastAPI demo + schema validation
v0.5: Final report & slide polish