[1] Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt. ArabicaQA: A comprehensive dataset for Arabic question answering. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2049–2059, 2024.
[2] Raushan Turganbay and Viacheslav Surkov and Dmitrii Evseev and Mikhail Drobyshevskiy. Generative Question Answering Systems over Knowledge Graphs and Text. 1112–1126, 2023.
[3] Raphael Gruber and Abdelrahman Abdallah and Michael Färber and Adam Jatowt. ComplexTempQA: A Large-Scale Dataset for Complex Temporal Question Answering. arXiv preprint arXiv:2406.04866. 2024.
[4] Yiming Cui and Ting Liu and Wanxiang Che and Li Xiao and Zhipeng Chen and Wentao Ma and Shijin Wang and Guoping Hu. A span-extraction dataset for Chinese machine reading comprehension. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 5883–5889, 2019.
[5] ByungHoon So and Kyuhong Byun and Kyungwon Kang and Seongjin Cho. JaQuAD: Japanese question answering dataset for machine reading comprehension. arXiv preprint arXiv:2202.01764. 2022.
[6] Qi Liu and Matt J. Kusner and Phil Blunsom. A Survey on Contextual Embeddings. DeepAI. 2020.
[7] Satanjeev Banerjee and Alon Lavie. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. 2005.
[8] Naoya Inoue. R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason. Journal of Natural Language Processing. 27, 2020.
[9] Yang Bai and Daisy Zhe Wang. More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering. arXiv preprint arXiv:2109.12264. https://arxiv.org/abs/2109.12264. 2021.
[10] Feng Gao and Jiancheng Ni and Peng Gao and Zili Zhou and Yan-Yan Li and Hamido Fujita. Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension. CoRR. https://arxiv.org/abs/2101.11954. 2021.
[11] Daniel Khashabi and Tushar Khot and Ashish Sabharwal and Peter Clark and Oren Etzioni and Dan Roth. Question Answering via Integer Programming over Semi-Structured Knowledge. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI). 2232–2238, 2016.
[12] Linfeng Song and Zhiguo Wang and Mo Yu and Yue Zhang and Radu Florian and Daniel Gildea. Evidence Integration for Multi-hop Reading Comprehension with Graph Neural Networks. IEEE Transactions on Knowledge and Data Engineering. IEEE. 2020.
[13] Danqi Chen. Neural reading comprehension and beyond. Stanford University. 2018.
[14] Karl Moritz Hermann and Tomáš Kočiský and Edward Grefenstette and Lasse Espeholt and Will Kay and Mustafa Suleyman and Phil Blunsom. Teaching machines to read and comprehend. Advances in Neural Information Processing Systems. 28: 1693–1701, 2015.
[15] Pranav Rajpurkar and Jian Zhang and Konstantin Lopyrev and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics. 2383–2392, 2016.
[16] Mokanarangan Thayaparan and Marco Valentino and André Freitas. A Survey on Explainability in Machine Reading Comprehension. arXiv preprint arXiv:2010.00389. 2020.
[17] Tushar Khot and Peter Clark and Michal Guerquin and Peter Jansen and Ashish Sabharwal. QASC: A dataset for question answering via sentence composition. Proceedings of the AAAI Conference on Artificial Intelligence. 34: 8082–8090, 2020.
[18] Zhuosheng Zhang and Hai Zhao and Rui Wang. Machine reading comprehension: The role of contextualized language models and beyond. arXiv preprint arXiv:2005.06249. 2020.
[19] Wenhu Chen and Hanwen Zha and Zhiyu Chen and Wenhan Xiong and Hong Wang and William Wang. HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data. Findings of the Association for Computational Linguistics: EMNLP 2020. 1026–1036, 2020.
[20] Xanh Ho and Anh-Khoa Duong Nguyen and Saku Sugawara and Akiko Aizawa. Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps. Proceedings of the 28th International Conference on Computational Linguistics. 6609–6625, 2020.
[21] Adam Trischler and Tong Wang and Xingdi Yuan and Justin Harris and Alessandro Sordoni and Philip Bachman and Kaheer Suleman. NewsQA: A machine comprehension dataset. arXiv preprint arXiv:1611.09830. 2016.
[22] Razieh Baradaran and Razieh Ghiasi and Hossein Amirkhani. A survey on machine reading comprehension systems. Natural Language Engineering. 1–50, 2020.
[23] Tao Shen and Tianyi Zhou and Guodong Long and Jing Jiang and Shirui Pan and Chengqi Zhang. DiSAN: Directional self-attention network for RNN/CNN-free language understanding. Proceedings of the AAAI Conference on Artificial Intelligence. 2017.
[24] Alon Talmor and Jonathan Berant. Repartitioning of the ComplexWebQuestions dataset. arXiv preprint arXiv:1807.09623. 2018.
[25] Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out. 74–81, 2004.
[26] Kishore Papineni and Salim Roukos and Todd Ward and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311–318, 2002.
[27] Todor Mihaylov and Peter Clark and Tushar Khot and Ashish Sabharwal. Can a suit of armor conduct electricity? A new dataset for open book question answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2381–2391, 2018.
[28] Zhilin Yang and Peng Qi and Saizheng Zhang and Yoshua Bengio and William W. Cohen and Ruslan Salakhutdinov and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2369–2380, 2018.
[29] Shanshan Liu and Xin Zhang and Sheng Zhang and Hui Wang and Weiming Zhang. Neural machine reading comprehension: Methods and trends. Applied Sciences. 9(18): 3698, 2019.
[30] Linfeng Song and Zhiguo Wang and Mo Yu and Yue Zhang and Radu Florian and Daniel Gildea. Evidence Integration for Multi-hop Reading Comprehension with Graph Neural Networks. IEEE Transactions on Knowledge and Data Engineering. IEEE. 2020.
[31] Yichen Jiang and Mohit Bansal. Self-assembling Modular Networks for Interpretable Multi-hop Reasoning. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 4474–4484, 2019.
[32] Danqi Chen and Jason Bolton and Christopher D. Manning. A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2358–2367, Association for Computational Linguistics. 2016.
[33] Bhuwan Dhingra and Kathryn Mazaitis and William W. Cohen. Quasar: Datasets for Question Answering by Search and Reading. arXiv preprint arXiv:1707.03904. 2017.
[34] Johannes Welbl and Pontus Stenetorp and Sebastian Riedel. Constructing Datasets for Multi-hop Reading Comprehension Across Documents. Transactions of the Association for Computational Linguistics. 6: 287–302, MIT Press. 2018.
[35] Sewon Min and Victor Zhong and Richard Socher and Caiming Xiong. Efficient and Robust Question Answering from Minimal Context over Documents. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1725–1735, Association for Computational Linguistics. 2018.
[36] Yiming Cui and Zhipeng Chen and Si Wei and Shijin Wang and Ting Liu and Guoping Hu. Attention-over-Attention Neural Networks for Reading Comprehension. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 593–602, Association for Computational Linguistics. 2017.
[37] Guokun Lai and Qizhe Xie and Hanxiao Liu and Yiming Yang and Eduard Hovy. RACE: Large-scale Reading Comprehension Dataset from Examinations. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 785–794, Association for Computational Linguistics. 2017.
[38] Robin Jia and Percy Liang. Adversarial Examples for Evaluating Reading Comprehension Systems. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2021–2031, Association for Computational Linguistics. 2017.
[39] Tomáš Kočiský and Jonathan Schwarz and Phil Blunsom and Chris Dyer and Karl Moritz Hermann and Gábor Melis and Edward Grefenstette. The NarrativeQA Reading Comprehension Challenge. Transactions of the Association for Computational Linguistics. 6: 317–328, MIT Press. 2018.
[40] Jonathan Berant and Vivek Srikumar and Pei-Chun Chen and Abby Vander Linden and Brittany Harding and Brad Huang and Peter Clark and Christopher D. Manning. Modeling Biological Processes for Reading Comprehension. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1499–1510, 2014.
[41] Mandar Joshi and Eunsol Choi and Daniel S. Weld and Luke Zettlemoyer. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1601–1611, 2017.
[42] Wen-tau Yih and Matthew Richardson and Christopher Meek and Ming-Wei Chang and Jina Suh. The Value of Semantic Parse Labeling for Knowledge Base Question Answering. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 201–206, 2016.
[43] Kurt Bollacker and Colin Evans and Praveen Paritosh and Tim Sturge and Jamie Taylor. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. 1247–1250, 2008.
[44] Hongye Tan and Xiaoyue Wang and Yu Ji and Ru Li and Xiaoli Li and Zhiwei Hu and Yunxiao Zhao and Xiaoqi Han. GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 1319–1330, 2021.
[45] Abdalghani Abujabal and Rishiraj Saha Roy and Mohamed Yahya and Gerhard Weikum. ComQA: A Community-Sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters. arXiv preprint arXiv:1809.09528. 2018.
[46] Bhawna Piryani and Jamshid Mozafari and Adam Jatowt. ChroniclingAmericaQA: A Large-Scale Question Answering Dataset Based on Historical American Newspaper Pages. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2038–2048, 2024.
[47] Shengkun Ma and Hao Peng and Lei Hou and Juanzi Li. MRCEval: A Comprehensive, Challenging and Accessible Machine Reading Comprehension Benchmark. arXiv preprint arXiv:2503.07144. 2025.
[48] Alena Fenogenova and Vladislav Mikhailov and Denis Shevelev. Read and Reason with MuSeRC and RuCoS: Datasets for Machine Reading Comprehension for Russian. Proceedings of the 28th International Conference on Computational Linguistics. 6481–6497, 2020.