Exercise 9

Recognizing Handwritten Digits with PyTorch (FCN → CNN)

Today you’ll recognize handwritten digits (MNIST) with PyTorch and PyTorch Lightning. You’ll first run a fully-connected network (FCN), then convert it into a convolutional neural network (CNN). The CNN should achieve higher accuracy.

Dataset: MNIST (same as lecture). If needed, download from the course Indico.


Part 1: Review the lecture notebook (context)

  1. Open the lecture notebook from Indico and scroll through:
    • A NN coded from scratch (NumPy).
    • A PyTorch/Lightning implementation of the same idea.
  2. Focus on three key differences (vs. your NumPy version):
    • Mini-batch gradient descent
    • Validation set (for monitoring/tuning)
    • Optimizers/learning-rate handling (e.g., Adam)
  3. Skim the Lightning module (class) and try to map what each method is responsible for:
    • __init__: define layers and loss
    • forward: model forward pass
    • training_step, validation_step, test_step: per-batch logic + logging
    • configure_optimizers: define optimizer (e.g., Adam)


Part 2: Run the fully-connected (FCN) MNIST model in Lightning

  1. Open the provided Lightning FCN code in the lecture notebook. This is just the same code we wrote from scratch in class, but all fancy! No more manual chain rule.
  2. Run training. Note:
    • Train/val accuracy
    • Time per epoch
  3. Compare to your NumPy version:
    • Did mini-batch training/Adam help?
    • Is validation accuracy stable?
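
When recording accuracy, it helps to know exactly how it is computed from the network output. A small sketch (the logits and targets here are made-up values):

```python
import torch

# Two examples, three classes; rows are raw logits from the model
logits = torch.tensor([[2.0, 0.1, -1.0],
                       [0.0, 3.0,  0.5]])
targets = torch.tensor([0, 2])  # true class indices

preds = logits.argmax(dim=1)                    # predicted class per example
acc = (preds == targets).float().mean().item()  # fraction correct
print(acc)  # 0.5 (first example right, second wrong)
```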

Part 3: Convert the FCN to a CNN in Lightning

Goal: Modify the FCN so it uses convolution and pooling (a basic CNN).

Step A — Adjust the data shape for images

CNNs expect images as [batch, channels, height, width].

  • Batch size is the number of images in each batch (like 64 from before)

  • Channels is the number of color channels in the image. MNIST is grayscale, so 1.

  • If your dataloader currently does:

    X_tensor = torch.from_numpy(X)  # shape: [N, 784]
    

    change it to:

    X_tensor = torch.from_numpy(X).view(-1, 1, 28, 28)  # shape: [N, 1, 28, 28]
    

Reference (Tensors & shapes): https://pytorch.org/docs/stable/tensors.html
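
A quick way to convince yourself the reshape works (the dummy data below stands in for your MNIST array):

```python
import numpy as np
import torch

X = np.random.rand(8, 784).astype(np.float32)   # 8 flattened 28x28 images
X_tensor = torch.from_numpy(X).view(-1, 1, 28, 28)
print(X_tensor.shape)  # torch.Size([8, 1, 28, 28])
```

The `-1` lets PyTorch infer the batch dimension from the other three sizes.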

Step B — Replace the FCN with a small CNN

Start simple (match exactly first, then experiment):

self.model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),

    nn.Conv2d(32, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),

    nn.Flatten(),                # → [batch, 64*5*5]
    nn.Linear(64 * 5 * 5, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
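
Before wiring this into your Lightning module, you can sanity-check the architecture by pushing a dummy batch through it:

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(32, 64, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Dummy batch of 2 grayscale 28x28 images
out = model(torch.zeros(2, 1, 28, 28))
print(out.shape)  # torch.Size([2, 10]) — one logit per digit class
```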

Why 64 * 5 * 5?

  • Input: 1×28×28
  • After Conv(3×3): 32×26×26
  • After MaxPool(2): 32×13×13
  • After Conv(3×3): 64×11×11
  • After MaxPool(2): 64×5×5
  • Flatten → 64 × 5 × 5 = 1600 features
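
The shape arithmetic above follows the standard output-size formula for convolutions and pooling, which you can compute by hand (the helper name `out_size` is just for this sketch):

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length of a square conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

s = out_size(28, kernel=3)           # Conv(3x3):  26
s = out_size(s, kernel=2, stride=2)  # MaxPool(2): 13
s = out_size(s, kernel=3)            # Conv(3x3):  11
s = out_size(s, kernel=2, stride=2)  # MaxPool(2): 5
features = 64 * s * s                # 64 channels * 5 * 5 = 1600
```

Re-run this whenever you change kernel sizes or padding in Part 4, so the `nn.Linear` input size stays correct.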


Step C — Keep the training loop logic the same

  • training_step, validation_step, test_step stay the same.
  • Change the loss function to nn.CrossEntropyLoss()
  • Optimizer: keep it the same as before; Adam with lr=1e-3 is fine to start.

Step D — Train and compare

  • Train your CNN for ~10 epochs.
  • Record val/test accuracy and compare to FCN

Part 4: Experiments to try (short and focused)

Try these changes one at a time. Record val accuracy for each.

  • Convolutional Filters: change 32 → 16 or 64 → 128
  • Kernel size: 3 → 5 or 7
  • Add padding=1 to the convolutional layers to keep spatial size (remember to update the nn.Linear input size to match)
  • Add dropout layers, use Dropout(0.5) or Dropout2d(0.25), for example
  • Batch size: 64 → 128 or 32
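
As a concrete check of the padding experiment: with kernel_size=3 and padding=1, the spatial size is preserved, so the feature map entering `Flatten` is larger than in the default model. A quick sketch:

```python
import torch
from torch import nn

conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)
x = torch.zeros(1, 1, 28, 28)
y = conv(x)
print(y.shape)  # torch.Size([1, 32, 28, 28]) — spatial size unchanged
```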

What is the best validation accuracy you can achieve?


Part 5: Test your model

  • Only evaluate on the test set once, after all modeling choices are finalized. Using test results to make changes introduces bias; if you’ve peeked, create a new test set and start over!
  • Test your model on the test set.
  • Record the test accuracy.
  • How good can you get?? If you show Jarred a better CNN test accuracy than the default, or at least a nice attempt at one, he will give you a secret word ✨

Troubleshooting

  • Shape mismatch (common):

    • Print shapes inside forward:

      def forward(self, x):
          print("Input:", x.shape)
          # Apply each layer once, printing the shape after it
          for layer in self.model:
              x = layer(x)
              print(type(layer).__name__, "→", x.shape)
          return x
      
    • Ensure the nn.Linear input size matches your flattened feature count.

  • Accuracy is stuck at a low value:

    • Confirm normalization to [0,1]
    • Check that labels are of type int64
    • Try a slightly higher learning rate (e.g., 2e-3) or lower (5e-4)
    • Ensure you didn’t add Softmax before CrossEntropyLoss
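
The first two checks can be done in a couple of lines. This sketch uses made-up raw data in place of your MNIST arrays:

```python
import numpy as np
import torch

# Raw pixel values in 0–255 → scale to [0, 1]
X = (np.random.rand(4, 1, 28, 28) * 255).astype(np.float32)
X_tensor = torch.from_numpy(X / 255.0)

# Building labels from Python ints gives int64, which CrossEntropyLoss expects
y = torch.tensor([3, 1, 4, 1])
print(X_tensor.min().item(), X_tensor.max().item(), y.dtype)
```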

Submit the answer: