The .csv files in this subdirectory are derived from the following raw corpus, which poses a document-classification problem: https://www.kaggle.com/datasets/kw5454331/anti-lgbt-cyberbullying-texts
Each .csv file is obtained by preprocessing the raw corpus (lowercasing, stopword removal, stemming, etc.) and representing it as a bag of words. The files differ in vocabulary size: each vocabulary consists of the words that appear most frequently in the documents of the corpus.
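
As a rough illustration of the kind of pipeline described above, the following Python sketch lowercases the text, removes stopwords, applies a crude stemmer, keeps the most frequent words as the vocabulary, and counts them per document. It is not the script used to produce these files: the stopword list, the `naive_stem` helper, the function names, the example sentences, and the choice to rank words by total number of occurrences are all assumptions made for the example.

```python
import re
from collections import Counter

# Toy stopword list: an assumption for illustration, not the list used for the .csv files.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of", "to", "in"}


def naive_stem(token):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def build_vocabulary(docs, size):
    """Keep the `size` most frequent stems across the whole corpus."""
    counts = Counter(tok for doc in docs for tok in preprocess(doc))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:size]]


def vectorize(doc, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(preprocess(doc))
    return [counts[word] for word in vocabulary]


# Made-up example documents, not taken from the corpus.
docs = [
    "The cats are chasing the dog",
    "A dog chased cats and birds",
    "Birds sing",
]
vocab = build_vocabulary(docs, size=3)
rows = [vectorize(d, vocab) for d in docs]
print(vocab)
print(rows)
```

Changing the `size` argument yields representations of different vocabulary size from the same preprocessed text, which is the sense in which the .csv files differ from one another.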