Detecting Instruction Fine-tuning Attack on Language Models with Influence Function

