Propaganda is commonly defined as information of a biased or misleading nature, possibly purposefully shaped, to promote an agenda or a cause. In this project we build a machine learning system for the Detection of Propaganda Techniques in News Articles. The project has two subtasks: Span Identification (SI) and Technique Classification (TC). We placed 17th on the leaderboard for the SI task and 20th for the TC task.
The propaganda detection pipeline includes two subtasks.
Task-2 is a 14-class classification task. The distribution amongst the classes is shown below. The dataset is highly imbalanced.
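As a rough illustration of how such an imbalance can be measured, the sketch below computes per-class shares and the ratio between the largest and smallest class. The label list and its counts are made up for the example and are not the project's data; `class_distribution` is a hypothetical helper name.

```python
from collections import Counter


def class_distribution(labels):
    """Return per-class shares (largest first) and the max/min count ratio."""
    counts = Counter(labels)
    total = len(labels)
    shares = {label: count / total for label, count in counts.most_common()}
    # Ratio of the most frequent class to the least frequent one.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


# Illustrative labels only; the real task has 14 classes.
labels = ["Loaded_Language"] * 6 + ["Name_Calling,Labeling"] * 3 + ["Slogans"]
shares, imbalance_ratio = class_distribution(labels)

for label, share in shares.items():
    print(f"{label}: {share:.0%}")
print("imbalance ratio:", imbalance_ratio)
```

A large ratio is a hint that plain accuracy will be dominated by the majority classes, so per-class metrics are worth reporting alongside it.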
Many propaganda spans include words such as god, church, and Muslim, which suggests that religious themes are commonly used in propaganda.
Task-1 – Span Identification Task: For the baseline architecture, we created P/NP (propaganda / non-propaganda) tags and trained a bidirectional LSTM model. Please check here for more details.
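The P/NP tagging step can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing: it assumes whitespace tokenization and character-offset span annotations with an exclusive end, and `tag_tokens` is a hypothetical helper name. The resulting tag sequence is the kind of target a sequence model such as a bidirectional LSTM would be trained to predict.

```python
import re


def tag_tokens(text, spans):
    """Tag each whitespace-delimited token as P or NP.

    spans: list of (start, end) character offsets, end exclusive.
    A token is tagged P if it overlaps any span, otherwise NP.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        overlaps = any(
            tok_start < end and start < tok_end for start, end in spans
        )
        tagged.append((match.group(), "P" if overlaps else "NP"))
    return tagged


# Made-up sentence and annotation; (13, 27) covers "worst traitors".
text = "They are the worst traitors in history"
spans = [(13, 27)]
tagged = tag_tokens(text, spans)

for token, tag in tagged:
    print(f"{token}\t{tag}")
```

Tagging at the token level keeps the model's output aligned with its input sequence; predicted P runs can then be mapped back to character offsets for scoring.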
Task-2 – Technique Classification Task: For the baseline architecture, we used three features: the context, the span within that context, and the ratio of the context length to the span length. Please check here for more details.
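A minimal sketch of assembling these three features is shown below. It assumes lengths are measured in characters and that the span is given as character offsets into the context; the source does not say whether characters or tokens were used, and `build_features` is a hypothetical helper name.

```python
def build_features(context, span_start, span_end):
    """Build the three baseline features for one annotated span.

    Assumption: lengths are character counts; offsets are end-exclusive.
    """
    span = context[span_start:span_end]
    # Guard against an empty span to avoid division by zero.
    length_ratio = len(context) / len(span) if span else 0.0
    return {
        "context": context,
        "span": span,
        "length_ratio": length_ratio,
    }


# Made-up example; (13, 27) covers "worst traitors".
context = "They are the worst traitors in history"
features = build_features(context, 13, 27)
print(features["span"], round(features["length_ratio"], 2))
```

The two text fields would still need to be vectorized before being passed to a classifier, while the ratio can be used directly as a numeric feature.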
Team Information
SI task
TC task
Ilya Loshchilov and Frank Hutter. 2017. Fixing weight decay regularization in Adam. CoRR, abs/1711.05101.
Lance Ramshaw and Mitch Marcus. 1995. Text chunking using transformation-based learning. In Third Workshop on Very Large Corpora.
Li, W., Li, S., Liu, C. et al. 2021. Span identification and technique classification of propaganda in news articles. Complex Intell. Syst. https://doi.org/10.1007/s40747-021-00393-y
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
Giovanni Da San Martino, Seunghak Yu, Alberto Barrón-Cedeño, Rostislav Petrov, and Preslav Nakov. 2019. Fine-grained analysis of propaganda in news articles. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November.
Giovanni Da San Martino, Alberto Barrón-Cedeño, Henning Wachsmuth, Rostislav Petrov, and Preslav Nakov. 2020. SemEval-2020 task 11: Detection of propaganda techniques in news articles. In Proceedings of the 14th International Workshop on Semantic Evaluation, SemEval 2020, Barcelona, Spain, September.