MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

