ARES: Alternating Reinforcement Learning and Supervised Fine-Tuning for Enhanced Multi-Modal Chain-of-Thought Reasoning Through Diverse AI Feedback

Published in EMNLP 2024