Problem statement & goal
Classic reinforcement learning struggled when the “state” was raw pixels and the action space was large. The question: can a single deep network learn to play Atari from screen input only, using rewards as the only supervision?