A New Dataset for Causality Identification in Argumentative Texts

Khalid Al Khatib, Michael Völske, Anh Le, Shahbaz Syed, Martin Potthast, Benno Stein


In Sessions:

Sigdial Poster Session 2: (Thursday, 14:00 CEST, Foyer)

Abstract: Existing datasets for causality identification in argumentative texts have several limitations, such as the type of input text (e.g., only claims), the causality type (e.g., only positive), and the linguistic patterns investigated (e.g., only verb connectives). To address these limitations, we build the Webis-Causality-23 dataset, with sophisticated inputs (all units from arguments), a balanced distribution of causality types, and a larger number of linguistic patterns denoting causality. The dataset contains 1485 examples, derived by combining two paradigms, distant supervision and uncertainty sampling, to identify diverse, high-quality samples of causality relations and to annotate them in a cost-effective manner.
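
To make the combination of the two paradigms concrete, the following is a minimal sketch and not the authors' pipeline. It assumes that distant supervision means filtering sentences with a small list of causal connective patterns, and that uncertainty sampling means ranking the surviving candidates by the entropy of a classifier's predicted class probabilities. The pattern list, the function names, and the `predict_proba` callable are all hypothetical and chosen only for illustration.

```python
import math
import re

# Hypothetical connective patterns (assumption); the dataset's actual
# pattern inventory is not listed in the abstract.
CAUSAL_PATTERNS = [r"\bleads? to\b", r"\bcauses?\b", r"\bprevents?\b", r"\bbecause of\b"]


def distant_candidates(sentences):
    """Distant supervision step: keep sentences that match any causal pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in CAUSAL_PATTERNS]
    return [s for s in sentences if any(c.search(s) for c in compiled)]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(candidates, predict_proba, k):
    """Uncertainty sampling step: return the k candidates whose predicted
    class distribution has the highest entropy."""
    ranked = sorted(candidates, key=lambda s: entropy(predict_proba(s)), reverse=True)
    return ranked[:k]
```

In this sketch, `predict_proba` stands in for any model that maps a sentence to a probability distribution over causality classes; the candidates returned by `most_uncertain` would be the ones sent to human annotators, which is where the cost saving would come from under these assumptions.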