Analyzing Similarity Metrics for Data Selection for Language Model Pretraining