Exercise 9
Recognizing Handwritten Digits with PyTorch (FCN → CNN)
Today you’ll recognize handwritten digits (MNIST) with PyTorch and PyTorch Lightning. You’ll first run a fully-connected network (FCN), then convert it into a convolutional neural network (CNN). The CNN should achieve higher accuracy.
Dataset: MNIST (same as lecture). If needed, download from the course Indico.
Part 1: Review the lecture notebook (context)
- Open the lecture notebook from Indico and scroll through:
- A NN coded from scratch (NumPy).
- A PyTorch/Lightning implementation of the same idea.
- Focus on three key differences (vs. your NumPy version):
- Mini-batch gradient descent
- Validation set (for monitoring/tuning)
- Optimizers/learning-rate handling (e.g., Adam)
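To make the first and third differences concrete, here is a minimal torch-only sketch of mini-batch gradient descent with Adam on a toy dataset. The data and model are stand-ins (not the MNIST setup), just to show the batching loop and the optimizer doing the chain rule for you:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in data: 256 samples, 4 features, binary labels
X = torch.randn(256, 4)
y = (X.sum(dim=1) > 0).long()

model = nn.Linear(4, 2)  # tiny stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

batch_size = 64
epoch_losses = []
for epoch in range(5):
    for i in range(0, len(X), batch_size):  # mini-batches, not the full dataset
        xb, yb = X[i:i + batch_size], y[i:i + batch_size]
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()   # autograd: no manual chain rule
        optimizer.step()  # Adam update
    epoch_losses.append(loss.item())
```

Lightning wraps exactly this loop for you; the point here is only to see where the mini-batches and the optimizer step live.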
- Skim the Lightning module (class) and try to map what each method is responsible for:
- __init__: define layers and the loss
- forward: model forward pass
- training_step, validation_step, test_step: per-batch logic + logging
- configure_optimizers: define the optimizer (e.g., Adam)
Helpful references:
- PyTorch: Neural Networks Tutorial: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
- Lightning: What is a LightningModule?: https://lightning.ai/docs/pytorch/stable/common/lightning_module.html
Part 2: Run the fully-connected (FCN) MNIST model in Lightning
- Open the provided Lightning FCN code in the lecture notebook. This is just the same code we wrote from scratch in class, but all fancy! No more manual chain rule.
- Run training. Note:
- Train/val accuracy
- Time per epoch
- Compare to your NumPy version:
- Did mini-batch training/Adam help?
- Is validation accuracy stable?
Part 3: Convert the FCN to a CNN in Lightning
Goal: Modify the FCN so it uses convolution and pooling (a basic CNN).
Step A — Adjust the data shape for images
CNNs expect images as [batch, channels, height, width].
- Batch size: the number of images in each batch (e.g., 64, as before).
- Channels: the number of color channels in the image. MNIST is grayscale, so 1.
If your dataloader currently does:

X_tensor = torch.from_numpy(X)  # shape: [N, 784]

change it to:

X_tensor = torch.from_numpy(X).view(-1, 1, 28, 28)  # shape: [N, 1, 28, 28]
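You can sanity-check the reshape with a dummy array standing in for the flattened MNIST data:

```python
import numpy as np
import torch

# Dummy stand-in for the flattened MNIST array (64 images of 28x28 = 784 pixels)
X = np.zeros((64, 784), dtype=np.float32)

X_flat = torch.from_numpy(X)                      # shape: [64, 784]
X_img  = torch.from_numpy(X).view(-1, 1, 28, 28)  # shape: [64, 1, 28, 28]

print(X_flat.shape, X_img.shape)
```

The -1 lets PyTorch infer the batch dimension, so the same line works for any number of images.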
Reference (Tensors & shapes): https://pytorch.org/docs/stable/tensors.html
Step B — Replace the FCN with a small CNN
Start simple (match exactly first, then experiment):
self.model = nn.Sequential(
nn.Conv2d(1, 32, kernel_size=3, stride=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Conv2d(32, 64, kernel_size=3, stride=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2),
nn.Flatten(), # → [batch, 64*5*5]
nn.Linear(64 * 5 * 5, 64),
nn.ReLU(),
nn.Linear(64, 10),
)
Why 64 * 5 * 5?
- Input: 1×28×28
- After Conv(3×3): 32×26×26
- After MaxPool(2): 32×13×13
- After Conv(3×3): 64×11×11
- After MaxPool(2): 64×5×5
- Flatten → 64 × 5 × 5 = 1600 features
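You can verify this bookkeeping with plain arithmetic, using the standard output-size formulas for Conv2d and MaxPool2d:

```python
def conv_out(n, kernel=3, stride=1, pad=0):
    """Spatial size after Conv2d: floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

def pool_out(n, kernel=2):
    """Spatial size after MaxPool2d (default stride equals kernel)."""
    return n // kernel

n = 28                 # MNIST input: 1 x 28 x 28
n = conv_out(n)        # Conv(3x3)  -> 26
n = pool_out(n)        # MaxPool(2) -> 13
n = conv_out(n)        # Conv(3x3)  -> 11
n = pool_out(n)        # MaxPool(2) -> 5
features = 64 * n * n  # 64 channels x 5 x 5 = 1600
print(features)
```

The same two helpers let you recompute the nn.Linear input size whenever you change kernel size, padding, or the number of filters.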
Some layer options:
- Conv2d: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
- MaxPool2d: https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html
- Flatten: https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html
- Dropout / Dropout2d: https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html
- Linear: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
Step C — Keep the training loop logic the same
- training_step, validation_step, test_step stay the same.
- Change the loss function to nn.CrossEntropyLoss().
- Optimizer: keeping it the same as before, Adam with lr=1e-3, is fine to start!
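A quick check of what nn.CrossEntropyLoss expects: raw logits straight from the model (no Softmax) and labels as int64 class indices. The tensors below are illustrative dummies:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

logits = torch.randn(8, 10)          # raw model outputs, shape [batch, classes]
labels = torch.randint(0, 10, (8,))  # int64 class indices, shape [batch]

loss = loss_fn(logits, labels)
print(loss.item())
```

CrossEntropyLoss applies log-softmax internally, which is why the model's last layer should output raw logits.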
Step D — Train and compare
- Train your CNN for ~10 epochs.
- Record val/test accuracy and compare to the FCN.
Part 4: Experiments to try (short and focused)
Try these changes one at a time. Record val accuracy for each.
- Convolutional filters: change 32 → 16 or 64 → 128
- Kernel size: 3 → 5 or 7
- Add padding=1 to the convolutional layers to keep spatial size
- Add dropout layers, e.g., Dropout(0.5) or Dropout2d(0.25)
- Batch size: 64 → 128 or 32
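If you try the padding=1 experiment, remember that the flattened feature count changes. A sketch of the adjusted stack (assuming the same two-conv architecture as above): with padding=1 each conv keeps the spatial size, so 28 → 28 → 14 → 14 → 7, and Flatten gives 64 × 7 × 7 = 3136 features instead of 1600.

```python
import torch
import torch.nn as nn

# Same two-conv stack, but padding=1 keeps spatial size at each conv:
# 28 -> conv(pad=1) 28 -> pool 14 -> conv(pad=1) 14 -> pool 7
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),               # -> [batch, 64*7*7]
    nn.Linear(64 * 7 * 7, 64),  # note: 3136 inputs, not 1600
    nn.ReLU(),
    nn.Linear(64, 10),
)

out = model(torch.randn(4, 1, 28, 28))  # dummy batch of 4 images
print(out.shape)
```

Forgetting to update the nn.Linear input size is the most common source of shape-mismatch errors in these experiments.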
What is the best validation accuracy you can achieve?
Part 5: Test your model
- Only evaluate on the test set once, after all modeling choices are finalized. Using test results to make changes introduces bias; if you’ve peeked, create a new test set and start over!
- Test your model on the test set.
- Record the test accuracy.
- How good can you get?? If you show Jarred a better CNN test accuracy than the default, or at least a nice attempt at one, he will give you a secret word ✨
Troubleshooting
Shape mismatch (common):
Print shapes inside forward:

def forward(self, x):
    print("Input:", x.shape)
    for layer in self.model:
        x = layer(x)
        print(layer.__class__.__name__, x.shape)
    return x

Ensure the nn.Linear input size matches your flattened feature count.
Accuracy is stuck at a low value:
- Confirm normalization to [0, 1]
- Check that labels are of type int64
- Try a slightly higher learning rate (e.g., 2e-3) or lower (5e-4)
- Ensure you didn't add Softmax before CrossEntropyLoss
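The Softmax pitfall is worth demonstrating: nn.CrossEntropyLoss already applies log-softmax internally, so applying Softmax first compresses the logits and the loss can never get close to zero. A minimal torch-only check with a hand-picked confident prediction:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

# A confident, correct prediction for class 0
logits = torch.tensor([[10.0, 0.0]])
target = torch.tensor([0])

loss_correct = loss_fn(logits, target)                        # pass raw logits
loss_double = loss_fn(torch.softmax(logits, dim=1), target)   # wrong: softmax twice

print(loss_correct.item(), loss_double.item())
```

With raw logits the loss is tiny, as it should be for a confident correct answer; with the extra Softmax the loss stays large no matter how confident the model is, which is exactly the "accuracy stuck at a low value" symptom.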