Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers