On Masked Pre-training and the Marginal Likelihood