Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
