CS Senior Work & Experiments
ML • Algorithms • Prototypes
Overview
A collection of capstone work, course projects, and experimental builds completed during the Computer Science program. Focus areas include data pipelines, classical ML baselines versus shallow neural nets, evaluation methodology, and lightweight deployment with an emphasis on readable code.
Reproducibility • Clean baselines • Metrics literacy • Readable code
Tech Stack
- Python (pandas, scikit-learn, numpy, matplotlib)
- Jupyter / Colab for EDA & experiments
- FastAPI (simple demos) • Docker (optional)
- GitHub for versioning & reports
Capstone Snapshot
Problem
Supervised classification task with imbalanced classes. Goal: build a reliable baseline, then compare tuned classical models to a shallow MLP while avoiding overfitting.
- Pipeline: split → scale/encode → model → cross-validated metrics
- Report: accuracy + macro F1 + confusion matrix (see the reporting sketch after this list)
- Risk control: stratified CV, fixed random seeds, leakage checks
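A minimal sketch of that reporting step, assuming a fitted scikit-learn estimator named model and the same X, y used throughout this page (all names are illustrative):

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Stratified hold-out split with a fixed seed (same risk controls as above)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy alone can look fine on imbalanced data; macro F1 weights every class equally
print("accuracy :", accuracy_score(y_test, y_pred))
print("macro F1 :", f1_score(y_test, y_pred, average='macro'))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))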
Approach
- Baselines: Logistic Regression, k-NN, Decision Tree
- Tuned: Random Forest, Gradient Boosting
- Neural: shallow MLP with early stopping
- Model selection by CV mean ± std (reduces variance vs. a single split); see the comparison sketch below
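A minimal sketch of that selection loop, assuming the same X and y as in the CV template below; the candidate settings are illustrative defaults, not the final tuned configurations:

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

seed = 42
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

candidates = {
    'logreg': LogisticRegression(max_iter=1000, random_state=seed),
    'knn': KNeighborsClassifier(),
    'tree': DecisionTreeClassifier(random_state=seed),
    'rf': RandomForestClassifier(n_estimators=300, random_state=seed),
    'gb': GradientBoostingClassifier(random_state=seed),
    'mlp': MLPClassifier(hidden_layer_sizes=(64,), early_stopping=True,
                         max_iter=500, random_state=seed),
}

# Rank candidates by CV mean macro F1; the std shows how stable each estimate is
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring='f1_macro')
    print(f"{name:7s} {scores.mean():.3f} ± {scores.std():.3f}")

In practice each candidate sits inside the scaling Pipeline from the template so preprocessing is refit on every training fold.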
Pipeline diagram (placeholder)
Confusion matrix (placeholder)
ROC / PR curves (placeholder)
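These figures can be produced with scikit-learn's display helpers; a minimal sketch assuming a fitted estimator model and a hold-out X_test, y_test (binary ROC/PR shown, names illustrative):

import matplotlib.pyplot as plt
from sklearn.metrics import (ConfusionMatrixDisplay, RocCurveDisplay,
                             PrecisionRecallDisplay)

# Confusion matrix on the held-out test set
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.savefig('confusion_matrix.png', bbox_inches='tight')

# ROC and PR curves (binary case; for multiclass, plot one-vs-rest per class)
RocCurveDisplay.from_estimator(model, X_test, y_test)
plt.savefig('roc_curve.png', bbox_inches='tight')

PrecisionRecallDisplay.from_estimator(model, X_test, y_test)
plt.savefig('pr_curve.png', bbox_inches='tight')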
Mini Projects
- Data Pipeline: CSV → clean → features → train/test split
- Eval Toolkit: reusable functions for CV, plots, and reports
- FastAPI Demo: lightweight predict endpoint with input schema (see the sketch below)
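A minimal sketch of the demo endpoint, assuming a trained scikit-learn Pipeline saved to model.pkl and a flat numeric feature vector as input; the path and field names are illustrative:

from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Predict demo")
model = joblib.load("model.pkl")   # trained pipeline (illustrative path)

class PredictRequest(BaseModel):
    features: List[float]          # input schema: one flat feature vector

class PredictResponse(BaseModel):
    prediction: int

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # scikit-learn expects a 2D array, so wrap the single sample in a list
    pred = model.predict([req.features])[0]
    return PredictResponse(prediction=int(pred))

Run locally with uvicorn (e.g. uvicorn main:app --reload, assuming the file is saved as main.py).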
Artifacts
Placeholder for artifact links; replace with the actual files once uploaded.
Reproducible CV (template)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, f1_score
from scipy import sparse
X, y = ...  # your features/labels
seed = 42

# Stratified folds preserve the class ratio in every split (important with imbalance)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

# Scaling lives inside the pipeline so it is fit on each training fold only (no leakage);
# sparse inputs (e.g. one-hot encoded) cannot be mean-centered, hence with_mean=False
pipe = Pipeline([
    ('scale', StandardScaler(with_mean=False) if sparse.issparse(X) else StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000, random_state=seed)),
])
scores = cross_val_score(pipe, X, y, cv=cv,
                         scoring=make_scorer(f1_score, average='macro'))
print(f"CV macro F1: {scores.mean():.3f} ± {scores.std():.3f}")
Version Log
v0.1 – Baselines & CV scaffold
v0.2 – Tune RF/GB; add MLP with early stopping
v0.3 – Plotting utilities (confusion matrix, ROC/PR)
v0.4 – FastAPI demo + schema validation
v0.5 – Final report & slide polish