How Transformers Learn Structured Data: Insights From Hierarchical Filtering