When linear models outperform complex deep learning models
FastText, by Facebook Research, is a library for efficient learning of word representations and text classification. It supports both supervised (classification) and unsupervised (embedding) representations of words and sentences.
However, the documentation of the FastText package does not detail the implemented classifier or the processing steps. Here we trace the algorithmic implementation underlying the FastText package.
In a nutshell, there is no magic, just a few smart steps:
- A sentence/document vector is obtained by averaging the word/n-gram embeddings.
- For the classification task, multinomial logistic regression is used, where the sentence/document vector corresponds to the features.
- When applying FastText to problems with a large number of classes, you can use hierarchical softmax to speed up the computation.
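The first two steps can be sketched in plain Python. The embeddings and weights below are made-up values for illustration only, not FastText output:

```python
import math

# Toy 4-dimensional word embeddings (hypothetical values for illustration).
embeddings = {
    "great": [0.9, 0.1, 0.4, 0.0],
    "movie": [0.2, 0.8, 0.1, 0.3],
    "plot":  [0.1, 0.7, 0.2, 0.5],
}

def sentence_vector(words):
    """Average the word vectors into a single sentence/document vector."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for w in words:
        for i, x in enumerate(embeddings[w]):
            vec[i] += x
    return [x / len(words) for x in vec]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(words, W, b):
    """Multinomial logistic regression on the averaged sentence vector."""
    v = sentence_vector(words)
    logits = [sum(wi * xi for wi, xi in zip(row, v)) + bi
              for row, bi in zip(W, b)]
    return softmax(logits)

# Hypothetical trained weights for two classes (e.g. positive / negative).
W = [[1.0, -0.5, 0.3, 0.2], [-1.0, 0.5, -0.3, -0.2]]
b = [0.0, 0.0]
probs = classify(["great", "movie"], W, b)
```

In FastText the embedding matrix and the classifier weights are learned jointly; here they are fixed only to show how the forward pass combines averaging with a softmax classifier.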
FastText “Bag of Tricks for Efficient Text Classification”:
- Starts with word representations that are averaged into a text representation, which is fed to a linear classifier (multinomial logistic regression).
- The text representation is a hidden state that can be shared among features and classes.
- A softmax layer yields a probability distribution over the pre-defined classes.
- Hierarchical softmax: based on a Huffman coding tree; reduces the computational complexity from O(kh) to O(h log(k)), where k is the number of classes and h is the dimension of the text representation.
- Uses a bag of n-grams to capture partial information about local word order; full word order is not used explicitly, which maintains efficiency without losing accuracy.
- Uses the hashing trick for a fast and memory-efficient mapping of the n-grams.
- It is written in C++ and supports multithreading during training.
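The n-gram hashing trick can be sketched as follows. The FNV-1a hash and the small bucket count are illustrative choices, not FastText's exact configuration (FastText hashes n-grams into a much larger table, on the order of millions of buckets by default):

```python
# Sketch of the hashing trick for n-gram features: instead of storing a
# growing dictionary of all n-grams, each n-gram is hashed into one of a
# fixed number of buckets. Collisions are tolerated in exchange for a
# bounded memory footprint.
N_BUCKETS = 1000  # illustrative; FastText's default table is far larger

def fnv1a(s):
    """32-bit FNV-1a hash of a string."""
    h = 0x811C9DC5
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def ngram_bucket(ngram):
    """Map an n-gram to a fixed bucket id."""
    return fnv1a(ngram) % N_BUCKETS

def features(words, n=2):
    """Unigram buckets plus hashed bigram buckets for a tokenized sentence."""
    feats = [ngram_bucket(w) for w in words]
    for i in range(len(words) - n + 1):
        feats.append(ngram_bucket(" ".join(words[i:i + n])))
    return feats

feats = features(["the", "cat", "sat"])
```

Because the mapping is a pure hash, no n-gram vocabulary has to be stored or looked up, which is what keeps the feature extraction both fast and memory-efficient.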
FastText vs Deep Learning for text classification:
Across different text classification tasks, FastText achieves accuracy on par with deep learning models while being orders of magnitude faster to train and evaluate.
Note: the high accuracy of such a simple algorithm is an indicator that text classification problems are still not understood well enough to construct truly efficient nonlinear classification models.