D4: Improving LLM Pretraining via Document De-Duplication and Diversification