Hardly anyone in the data science community would disagree that the release of BERT was one of the most exciting developments in NLP.
For those who haven’t heard yet: BERT is a transformer-based method for pretraining contextual word representations that achieves state-of-the-art results across a wide range of natural language processing tasks. The BERT paper was recognized as the best long paper