Training Loop
A Minimal Classifier With Advanced Habits
This tiny example is not an ASR system. It is a clean template for
habits that matter later: explicit tensor shapes, metrics reported
per data split, deterministic seeds, and small-batch overfit checks.
import random

import numpy as np
import torch
from torch import nn


def seed_everything(seed=7):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


class TinyClassifier(nn.Module):
    def __init__(self, in_features, hidden, classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.LayerNorm(hidden),
            nn.Linear(hidden, classes),
        )

    def forward(self, x):
        # x: [batch, features]
        return self.net(x)


def train_step(model, batch, optimizer):
    model.train()
    x, y = batch
    logits = model(x)
    assert logits.shape[0] == y.shape[0]
    loss = nn.functional.cross_entropy(logits, y)
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return float(loss.detach())
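The code above covers seeding and the training step but not the per-split metrics mentioned at the start. One possible sketch follows. The names `evaluate` and `evaluate_splits`, and the dict-of-batches layout, are assumptions made for illustration and are not part of the original example. A stand-in `nn.Sequential` model keeps the block self-contained.

```python
import torch
from torch import nn


@torch.no_grad()
def evaluate(model, batches):
    """Return mean loss and accuracy over an iterable of (x, y) batches."""
    model.eval()  # switch off train-only behavior such as dropout
    total_loss, correct, count = 0.0, 0, 0
    for x, y in batches:
        logits = model(x)  # [batch, classes]
        assert logits.shape[0] == y.shape[0]
        # Sum per batch, divide once at the end, so uneven batch sizes
        # are weighted correctly.
        total_loss += float(
            nn.functional.cross_entropy(logits, y, reduction="sum")
        )
        correct += int((logits.argmax(dim=-1) == y).sum())
        count += y.shape[0]
    return {"loss": total_loss / count, "accuracy": correct / count}


def evaluate_splits(model, splits):
    """splits: dict mapping a split name to its batches (hypothetical layout)."""
    return {name: evaluate(model, batches) for name, batches in splits.items()}


# Usage with random stand-in data; real splits would come from a dataset.
torch.manual_seed(7)
demo_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
demo_splits = {
    "train": [(torch.randn(32, 8), torch.randint(0, 3, (32,)))],
    "val": [(torch.randn(10, 8), torch.randint(0, 3, (10,)))],
}
for name, metrics in evaluate_splits(demo_model, demo_splits).items():
    print(name, metrics)
```

Reporting train and validation numbers side by side makes it easier to tell an optimization problem (both bad) from a generalization problem (only validation bad).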
Question: Why include a tiny-batch overfit test?
If the model cannot drive loss near zero on a handful of examples,
something basic is broken: labels, shapes, masking, optimizer, feature
scale, loss wiring, or train/eval mode. This test is cheap and catches
failures before expensive experiments.
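A minimal sketch of such a check is shown here, assuming a stand-in model and random data. The helper name `overfit_tiny_batch`, the Adam optimizer, the step count, and the 0.1 threshold are all illustrative choices, not values from the original example.

```python
import torch
from torch import nn


def overfit_tiny_batch(model, x, y, steps=500, lr=1e-2):
    """Train repeatedly on one fixed batch and return the loss history."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    losses = []
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad(set_to_none=True)
        loss.backward()
        optimizer.step()
        losses.append(float(loss.detach()))
    return losses


torch.manual_seed(7)
# Stand-in model and data: 8 random examples, 16 features, 4 classes.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
x = torch.randn(8, 16)          # [batch, features]
y = torch.randint(0, 4, (8,))   # [batch]

losses = overfit_tiny_batch(model, x, y)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
# The labels are random, so a low final loss means pure memorization,
# which is exactly what this check is meant to confirm.
assert losses[-1] < 0.1, "could not memorize 8 examples: check the pipeline"
```

Because the labels carry no signal, success here says nothing about generalization. It only shows that data, loss, and optimizer are wired together correctly.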
Question: How does this map to speech?
Replace the fixed-size feature vectors with sequences of frames or
learned audio tokens, add padding masks, and choose a sequence
objective such as CTC or frame-level cross-entropy. The same
discipline remains: verify tensor axes, loss behavior, gradient scale,
per-split metrics, deterministic fixtures, and a small overfit run.
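As one illustration of the padding-mask step, the sketch below computes a frame-level cross-entropy that ignores padded frames. Frame-level cross-entropy is only one possible sequence objective and is picked here as an assumption. The helper names `lengths_to_mask` and `masked_frame_loss` are hypothetical.

```python
import torch
from torch import nn


def lengths_to_mask(lengths, max_len):
    """Return a bool mask [batch, time], True on real (non-padded) frames."""
    positions = torch.arange(max_len, device=lengths.device)  # [time]
    return positions.unsqueeze(0) < lengths.unsqueeze(1)      # [batch, time]


def masked_frame_loss(logits, targets, lengths):
    """Mean cross-entropy over real frames only.

    logits:  [batch, time, classes]
    targets: [batch, time], values at padded positions are ignored
    lengths: [batch], number of valid frames per sequence
    """
    batch, time, classes = logits.shape
    assert targets.shape == (batch, time)
    assert lengths.shape == (batch,)
    mask = lengths_to_mask(lengths, time)
    per_frame = nn.functional.cross_entropy(
        logits.reshape(batch * time, classes),
        targets.reshape(batch * time),
        reduction="none",
    ).reshape(batch, time)
    # Zero out padded frames, then average over the real ones.
    return (per_frame * mask).sum() / mask.sum()


torch.manual_seed(7)
logits = torch.randn(2, 5, 3)            # [batch, time, classes]
targets = torch.randint(0, 3, (2, 5))    # [batch, time]
lengths = torch.tensor([5, 2])           # second sequence has 3 padded frames
loss = masked_frame_loss(logits, targets, lengths)

# Deterministic fixture: changing logits under the padding must not
# change the loss.
perturbed = logits.clone()
perturbed[1, 2:] = torch.randn(3, 3) * 10.0
assert torch.allclose(loss, masked_frame_loss(perturbed, targets, lengths))
```

The final assertion is the kind of small fixture the text recommends: it fails immediately if the mask is built along the wrong axis or dropped from the loss.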