Quantifying Uncertainty in Error Consistency: Towards Reliable Behavioral Comparison of Classifiers