CS Senior Work & Experiments
ML • Algorithms • Prototypes
Overview
A collection of capstone work, course projects, and experimental builds completed during the Computer Science program. Focus areas include data pipelines, classical ML baselines versus shallow neural nets, evaluation methodology, and lightweight deployment with an emphasis on readable code.
Reproducibility • Clean baselines • Metrics literacy • Readable code
Tech Stack
- Python (pandas, scikit-learn, numpy, matplotlib)
- Jupyter / Colab for EDA & experiments
- FastAPI (simple demos) • Docker (optional)
- GitHub for versioning & reports
Capstone Snapshot
Problem
Supervised classification task with imbalanced classes. Goal: build a reliable baseline, then compare tuned classical models to a shallow MLP while avoiding overfitting.
- Pipeline: split → scale/encode → model → cross-validated metrics
- Report: accuracy + macro F1 + confusion matrix (see the reporting sketch after this list)
- Risk control: stratified CV, fixed random seeds, leakage checks
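A minimal sketch of that reporting step, assuming a fitted scikit-learn estimator named model and the same X, y used throughout this page (all names are illustrative):

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Stratified hold-out split with a fixed seed (same risk controls as above)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy alone can look fine on imbalanced data; macro F1 weights every class equally
print("accuracy :", accuracy_score(y_test, y_pred))
print("macro F1 :", f1_score(y_test, y_pred, average='macro'))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))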
Approach
- Baselines: Logistic Regression, k-NN, Decision Tree
- Tuned: Random Forest, Gradient Boosting
- Neural: shallow MLP with early stopping
- Model selection by CV mean ± std (reduces variance vs. a single split); see the comparison sketch below
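A minimal sketch of that selection loop, assuming the same X and y as in the CV template below; the candidate settings are illustrative defaults, not the final tuned configurations:

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

seed = 42
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

candidates = {
    'logreg': LogisticRegression(max_iter=1000, random_state=seed),
    'knn': KNeighborsClassifier(),
    'tree': DecisionTreeClassifier(random_state=seed),
    'rf': RandomForestClassifier(n_estimators=300, random_state=seed),
    'gb': GradientBoostingClassifier(random_state=seed),
    'mlp': MLPClassifier(hidden_layer_sizes=(64,), early_stopping=True,
                         max_iter=500, random_state=seed),
}

# Rank candidates by CV mean macro F1; the std shows how stable each estimate is
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring='f1_macro')
    print(f"{name:7s} {scores.mean():.3f} ± {scores.std():.3f}")

In practice each candidate sits inside the scaling Pipeline from the template so preprocessing is refit on every training fold.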
Pipeline diagram (placeholder)
Confusion matrix (placeholder)
ROC / PR curves (placeholder)
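These figures can be produced with scikit-learn's display helpers; a minimal sketch assuming a fitted estimator model and a hold-out X_test, y_test (binary ROC/PR shown, names illustrative):

import matplotlib.pyplot as plt
from sklearn.metrics import (ConfusionMatrixDisplay, RocCurveDisplay,
                             PrecisionRecallDisplay)

# Confusion matrix on the held-out test set
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.savefig('confusion_matrix.png', bbox_inches='tight')

# ROC and PR curves (binary case; for multiclass, plot one-vs-rest per class)
RocCurveDisplay.from_estimator(model, X_test, y_test)
plt.savefig('roc_curve.png', bbox_inches='tight')

PrecisionRecallDisplay.from_estimator(model, X_test, y_test)
plt.savefig('pr_curve.png', bbox_inches='tight')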
Mini Projects
- Data Pipeline: CSV → clean → features → train/test split
- Eval Toolkit: reusable functions for CV, plots, and reports
- FastAPI Demo: lightweight predict endpoint with input schema (see the sketch below)
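A minimal sketch of the demo endpoint, assuming a trained scikit-learn Pipeline saved to model.pkl and a flat numeric feature vector as input; the path and field names are illustrative:

from typing import List

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Predict demo")
model = joblib.load("model.pkl")   # trained pipeline (illustrative path)

class PredictRequest(BaseModel):
    features: List[float]          # input schema: one flat feature vector

class PredictResponse(BaseModel):
    prediction: int

@app.post("/predict", response_model=PredictResponse)
def predict(req: PredictRequest) -> PredictResponse:
    # scikit-learn expects a 2D array, so wrap the single sample in a list
    pred = model.predict([req.features])[0]
    return PredictResponse(prediction=int(pred))

Run locally with uvicorn (e.g. uvicorn main:app --reload, assuming the file is saved as main.py).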
Artifacts
Placeholder for artifact links; replace with the actual files once uploaded.
Reproducible CV (template)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, f1_score
from scipy import sparse
X, y = ...  # your features/labels
seed = 42

# Stratified folds preserve the class ratio in every split (important with imbalance)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

# Scaling lives inside the pipeline so it is fit on each training fold only (no leakage);
# sparse inputs (e.g. one-hot encoded) cannot be mean-centered, hence with_mean=False
pipe = Pipeline([
    ('scale', StandardScaler(with_mean=False) if sparse.issparse(X) else StandardScaler()),
    ('clf', LogisticRegression(max_iter=1000, random_state=seed)),
])
scores = cross_val_score(pipe, X, y, cv=cv,
                         scoring=make_scorer(f1_score, average='macro'))
print(f"CV macro F1: {scores.mean():.3f} ± {scores.std():.3f}")
Version Log
v0.1 – Baselines & CV scaffold
v0.2 – Tune RF/GB; add MLP with early stopping
v0.3 – Plotting utilities (confusion matrix, ROC/PR)
v0.4 – FastAPI demo + schema validation
v0.5 – Final report & slide polish