Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads