MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset
6 months ago · arXiv