Researchers have long worked on methods and models that help machines interpret sentences, including sarcastic remarks.

A great deal of effort is going into helping machines classify text, and it has produced several pre-trained models that can be fine-tuned to solve many Natural Language Processing (NLP) problems efficiently.

This article briefly describes five state-of-the-art (SOTA) pre-trained models for text classification.


XLNet

XLNet, a language model developed by researchers at Carnegie Mellon University and Google, outperforms BERT on a range of language processing tasks, including sentiment analysis and text classification. It also performed remarkably well on the GLUE benchmark for English.

A language model is built in two phases, pre-training and fine-tuning, and XLNet focuses mainly on the pre-training phase. It introduces a new pre-training objective called Permutation Language Modeling (PLM), in which the model learns to predict each token from the tokens that come before it in a randomly sampled ordering of the sequence, rather than only in left-to-right order.
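To make the permutation idea more concrete, here is a minimal sketch in plain Python. It is not XLNet's implementation: the function names are made up for illustration, and the example only shows how a sampled ordering decides which tokens are visible when each token is predicted.

```python
import random

def sample_factorization_order(seq_len, seed=None):
    """Return a random permutation of the positions 0..seq_len-1."""
    rng = random.Random(seed)
    order = list(range(seq_len))
    rng.shuffle(order)
    return order

def permutation_mask(order):
    """Build a visibility mask from a factorization order.

    mask[i][j] is True when position i may look at position j,
    that is, when j comes earlier than i in the sampled order.
    """
    rank = {pos: step for step, pos in enumerate(order)}
    n = len(order)
    return [[rank[j] < rank[i] for j in range(n)] for i in range(n)]

tokens = ["New", "York", "is", "a", "city"]
order = sample_factorization_order(len(tokens), seed=0)
mask = permutation_mask(order)

# Show which tokens are visible at each prediction step.
for step, pos in enumerate(order):
    context = [tokens[j] for j in range(len(tokens)) if mask[pos][j]]
    print(f"step {step}: predict {tokens[pos]!r} given {context}")
```

Because a new ordering can be sampled for every training example, each token ends up being predicted from many different combinations of its neighbours, on both its left and its right.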


Enhanced Representation through Knowledge Integration (ERNIE)

ERNIE, developed by Baidu, is the pre-trained model that outperformed both XLNet and BERT on the English GLUE benchmark. ERNIE 1.0 reached notable milestones but received far less recognition than ERNIE 2.0, which outperformed its predecessor and drew wide attention in the second half of 2019.

ERNIE also outperformed other SOTA models in areas such as extracting the meaning of a sentence and sentiment analysis.


Binary Partition Transformer (BPT)

The Binary Partition Transformer applies the popular Transformer architecture to tasks such as machine translation and text classification.

Transformers rely on self-attention, and this is what makes them expensive. Self-attention operates on the sentence itself, relating every word to every other word, so its cost grows quadratically with the length of the sentence. BPT makes self-attention more efficient by treating the Transformer as a graph neural network.
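The cost of self-attention is easier to see in code. The sketch below is plain, full self-attention in standard-library Python, not BPT's more efficient variant. To keep it short, it assumes the input vectors serve directly as queries, keys, and values, with no learned projections.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: each input vector is used as its own
    query, key, and value (no learned weight matrices).
    """
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # One score per (query, key) pair: n * n scores in total.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, vectors))
               for i in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(word_vectors)
print(sum(len(row) for row in weights), "attention weights for",
      len(word_vectors), "words")
```

Every word is scored against every other word, so doubling the number of words roughly quadruples the number of scores. That pairwise table is the cost that more efficient attention schemes try to reduce.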


Neural Attentive Bag-of-Entities (NABoE)

Neural Attentive Bag-of-Entities (NABoE) is a neural network model that outperformed traditional pre-trained models. It represents a document as a bag of entities, which it builds from the Wikipedia corpus.

Building the bag can be seen as searching Wikipedia for all the entities related to a given entity name. The model then prepares a smaller subset made up of only those entities that are closely related to the particular document.
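The two steps described above, collecting candidate entities and then narrowing them down, can be sketched as follows. This is a toy illustration, not the NABoE model. The name dictionary, the keyword sets, and the overlap score are all hypothetical stand-ins for the Wikipedia-derived dictionary and the learned attention that a real system would use.

```python
import math

# Hypothetical toy dictionary: surface name -> candidate entities.
NAME_TO_ENTITIES = {
    "apple": ["Apple Inc.", "Apple (fruit)"],
    "iphone": ["iPhone"],
}

# Hypothetical keyword sets standing in for learned entity vectors.
ENTITY_KEYWORDS = {
    "Apple Inc.": {"iphone", "company", "released"},
    "Apple (fruit)": {"fruit", "tree", "eat"},
    "iPhone": {"apple", "phone", "released"},
}

def detect_entities(words):
    """Step 1: collect every candidate entity for every known name."""
    found = []
    for word in words:
        for entity in NAME_TO_ENTITIES.get(word, []):
            if entity not in found:
                found.append(entity)
    return found

def attention_weights(entities, words):
    """Step 2: weight candidates by relevance to the document.

    The relevance score here is simple keyword overlap (an
    assumption for illustration), passed through a softmax.
    """
    if not entities:
        return {}
    vocab = set(words)
    scores = [len(ENTITY_KEYWORDS[e] & vocab) for e in entities]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {e: x / total for e, x in zip(entities, exps)}

doc = "apple released a new iphone".split()
candidates = detect_entities(doc)
weights = attention_weights(candidates, doc)
for entity in candidates:
    print(f"{entity}: {weights[entity]:.2f}")
```

In this toy document the fruit sense of "apple" receives a much smaller weight than the company sense, which is the kind of narrowing the paragraph above describes.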


Text-to-Text Transfer Transformer (T5)

The Text-to-Text Transfer Transformer (T5) by Google is built around transfer learning.

The most fascinating feature of this model is that it converts every problem into text input and returns its output as text too. For text classification, this means the model accepts the input as text and returns the predicted label as words. Although the model is still being researched to discover more uses and features, T5 has achieved SOTA results on more than 20 NLP tasks.
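Here is a small sketch of what casting classification as text-to-text can look like. No model is called. The helper names, the label vocabulary, and the "sst2 sentence" task prefix are assumptions chosen for illustration, so treat the exact strings as placeholders rather than a required format.

```python
# Assumed label vocabulary for a two-class sentiment task.
LABEL_TO_TEXT = {0: "negative", 1: "positive"}
TEXT_TO_LABEL = {text: label for label, text in LABEL_TO_TEXT.items()}

def make_example(sentence, label, task_prefix="sst2 sentence"):
    """Turn a labelled sentence into an (input text, target text) pair."""
    return {
        "input": f"{task_prefix}: {sentence}",
        "target": LABEL_TO_TEXT[label],
    }

def decode_prediction(generated_text):
    """Map generated text back to a class id.

    Returns None when the text is not a known label, which can
    happen because the model is free to generate any words.
    """
    return TEXT_TO_LABEL.get(generated_text.strip().lower())

example = make_example("The film was a delight.", 1)
print(example["input"])
print(example["target"])
print(decode_prediction(" Positive "))
```

Since both sides of the pair are plain strings, the same model and training setup can be reused for other tasks simply by changing the prefix and the target words.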