Problem statement & goal
Before BERT, NLP models were typically pre-trained with unidirectional (left-to-right) language models, so each token could condition only on the context to its left. BERT’s goal: learn deep bidirectional representations from unlabeled text, conditioning on both left and right context in every layer, so that a single small task-specific output layer suffices for many downstream benchmarks.
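A minimal sketch of this recipe, not from the paper itself: a pre-trained bidirectional encoder plus one small task-specific layer. It assumes the Hugging Face `transformers` package; the model name and two-label task are illustrative.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertClassifier(nn.Module):
    def __init__(self, num_labels: int):
        super().__init__()
        # Deep bidirectional encoder, pre-trained on unlabeled text.
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        # The only task-specific addition: a single linear layer.
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        # Use the [CLS] token's final hidden state as the sequence summary.
        cls = out.last_hidden_state[:, 0]
        return self.head(cls)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["a downstream example"], return_tensors="pt", padding=True)
model = BertClassifier(num_labels=2)
logits = model(batch["input_ids"], batch["attention_mask"])  # shape: (1, 2)
```

Swapping the head (e.g., a span-prediction layer for QA) is all that changes between benchmarks; the encoder weights are shared and merely fine-tuned.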