Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs