survey
Open Access

Academic Plagiarism Detection: A Systematic Literature Review

Published: 16 October 2019

Abstract

This article summarizes the research on computational methods to detect academic plagiarism by systematically reviewing 239 research papers published between 2013 and 2018. To structure the presentation of the research contributions, we propose novel technically oriented typologies for plagiarism prevention and detection efforts, the forms of academic plagiarism, and computational plagiarism detection methods. We show that academic plagiarism detection is a highly active research field. Over the period we review, the field has seen major advances regarding the automated detection of strongly obfuscated and thus hard-to-identify forms of academic plagiarism. These improvements mainly originate from better semantic text analysis methods, the investigation of non-textual content features, and the application of machine learning. We identify a research gap in the lack of methodologically thorough performance evaluations of plagiarism detection systems. Concluding from our analysis, we see the integration of heterogeneous analysis methods for textual and non-textual content features using machine learning as the most promising area for future research contributions to improve the detection of academic plagiarism further.

References

  1. Assad Abbas, Limin Zhang, and Samee U. Khan. 2014. A literature review on the state-of-the-art in patent analysis. World Pat. Inf. 37 (2014), 3–13. DOI:10.1016/j.wpi.2013.12.006Google ScholarGoogle ScholarCross RefCross Ref
  2. Asad Abdi, Norisma Idris, Rasim M. Alguliyev, and Ramiz M. Aliguliyev. 2015. PDLK: Plagiarism detection using linguistic knowledge. Expert Syst. Appl. 42, 22 (2015), 8936--8946. DOI:10.1016/j.eswa.2015.07.048Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery. 2014. Expanded n-grams for semantic text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).Google ScholarGoogle Scholar
  4. Sadia Afroz, Aylin Caliskan Islam, Ariel Stolerman, Rachel Greenstadt, and Damon McCoy. 2014. Doppelgänger finder: Taking stylometry to the underground. In Proceedings of the 2014 IEEE Symposium on Security and Privacy. 212--226.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Naveed Afzal, Yanshan Wang, and Hongfang Liu. 2016. MayoNLP at SemEval-2016 Task 1: Semantic textual similarity based on lexical semantic net and deep learning semantic model. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 674--679.Google ScholarGoogle ScholarCross RefCross Ref
  6. Basant Agarwal, Heri Ramampiaro, Helge Langseth, and Massimiliano Ruocco. 2018. A deep network model for paraphrase detection in short text messages. Inf. Process. Manag. 54, 6 (2018), 922--937. DOI:10.1016/j.ipm.2018.06.005Google ScholarGoogle ScholarCross RefCross Ref
  7. Eneko Agirre, Carmen Banea, Daniel Cer, Mona Diab, Aitor Gonzalez-Agirre, Rada Mihalcea, German Rigau, and Janyce Wiebe. 2016. Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 497--511.Google ScholarGoogle ScholarCross RefCross Ref
  8. Mayank Agrawal and Dilip Kumar Sharma. 2016. A state of art on source code plagiarism detection. In Proceedings of the 2016 2nd International Conference on Next Generation Computing Technologies (NGCT’16). 236--241. DOI:10.1109/NGCT.2016.7877421Google ScholarGoogle ScholarCross RefCross Ref
  9. Mohammad Al-Smadi, Zain Jaradat, Mahmoud Al-Ayyoub, and Yaser Jararweh. 2017. Paraphrase identification and semantic text similarity analysis in arabic news tweets using lexical, syntactic, and semantic features. Inf. Process. Manag. 53, 3 (2017), 640--652. DOI:10.1016/j.ipm.2017.01.002Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Houda Alberts. 2017. Author clustering with the aid of a simple distance measure—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).Google ScholarGoogle Scholar
  11. Hanan Aldarmaki and Mona Diab. 2016. GWU NLP at SemEval-2016 Shared Task 1: Matrix factorization for crosslingual STS. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 663--667.Google ScholarGoogle ScholarCross RefCross Ref
  12. Mahmoud Alewiwi, Cengiz Orencik, and Erkay Savas. 2016. Efficient top-k similarity document search utilizing distributed file systems and cosine similarity. Cluster Comput. 19, 1 (2016), 109--126. DOI:10.1007/s10586-015-0506-0Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Zakiy Firdaus Alfikri and Ayu Purwarianti. 2014. Detailed analysis of extrinsic plagiarism detection system using machine learning approach (naive bayes and svm). Indones. J. Electr. Eng. Comput. Sci. 12, 11 (2014), 7884--7894.Google ScholarGoogle Scholar
  14. Muna Alsallal, Rahat Iqbal, Saad Amin, and Anne James. 2013. Intrinsic plagiarism detection using latent semantic indexing and stylometry. In Proceedings of the 2013 6th International Conference on Developments in eSystems Engineering. 145--150. DOI:10.1109/DeSE.2013.34Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Muna AlSallal, Rahat Iqbal, Saad Amin, Anne James, and Vasile Palade. 2016. An integrated machine learning approach for extrinsic plagiarism detection. In Proceedings of the 2016 9th International Conference on Developments in eSystems Engineering (DeSE’16). 203--208. DOI:10.1109/DeSE.2016.1Google ScholarGoogle ScholarCross RefCross Ref
  16. Muna AlSallal, Rahat Iqbal, Vasile Palade, Saad Amin, and Victor Chang. 2019. An integrated approach for intrinsic plagiarism detection. Fut. Gener. Comput. Syst. 96 (2019), 700--712. DOI:10.1016/j.future.2017.11.023Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, and Luis Villaseñor-Pineda. 2018. Semantically-informed distance and similarity measures for paraphrase plagiarism identification. J. Intell. Fuzzy Syst. 34, 5 (2018), 2983--2990.Google ScholarGoogle ScholarCross RefCross Ref
  18. Faisal Alvi, Mark Stevenson, and Paul Clough. 2014. Hashing and merging heuristics for text reuse detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14). 939--946.Google ScholarGoogle Scholar
  19. Faisal Alvi, Mark Stevenson, and Paul Clough. 2017. Plagiarism detection in texts obfuscated with homoglyphs. In Advances in Information Retrieval. 669--675.Google ScholarGoogle Scholar
  20. Salha Alzahrani. 2015. Arabic plagiarism detection using word correlation in N-Grams with K-Overlapping approach—Working notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15).Google ScholarGoogle Scholar
  21. Salha M. Alzahrani, Naomie Salim, and Ajith Abraham. 2012. Understanding plagiarism linguistic patterns, textual features, and detection methods. IEEE Trans. Syst. Man, Cybern. C Appl. Rev. 42, 2 (2012), 133--149.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Habibollah Asghari, Salar Mohtaj, Omid Fatemi, Heshaam Faili, Paolo Rosso, and Martin Potthast. 2016. Algorithms and corpora for persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16). 61.Google ScholarGoogle Scholar
  23. Duygu Ataman, Jose G. C. De Souza, Marco Turchi, and Matteo Negri. 2016. FBK HLT-MT at SemEval-2016 Task 1: Cross-lingual semantic similarity measurement using quality estimation features and compositional bilingual word embeddings. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 570--576.Google ScholarGoogle ScholarCross RefCross Ref
  24. Farzindar Atefeh and Wael Khreich. 2015. A survey of techniques for event detection in twitter. Comput. Intell. 31, 1 (2015), 132--164. DOI:10.1111/coin.12017Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Douglas Bagnall. 2015. Author identification using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).Google ScholarGoogle Scholar
  26. Douglas Bagnall. 2016. Authorship clustering using multi-headed recurrent neural networks—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16).Google ScholarGoogle Scholar
  27. Marco Baroni, Georgiana Dinu, and Germán Kruszewski. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 238--247.Google ScholarGoogle ScholarCross RefCross Ref
  28. Alberto Barrón-Cedeño, Parth Gupta, and Paolo Rosso. 2013. Methods for cross-language plagiarism detection. Knowl.-Based Syst. 50 (2013), 211--217. DOI:10.1016/j.knosys.2013.06.018Google ScholarGoogle ScholarCross RefCross Ref
  29. Alberto Barrón-Cedeño, Marta Vila, M. Antònia Martí, and Paolo Rosso. 2013. Plagiarism meets paraphrasing: insights for the next generation in automatic plagiarism detection. Comput. Linguist. 39, 4 (2013), 917--947. DOI:10.1162/COLI_a_00153Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Alberto Bartoli, Alex Dagri, Andrea De Lorenzo, Eric Medvet, and Fabiano Tarlao. 2015. An author verification approach based on differential features—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).Google ScholarGoogle Scholar
  31. Jeffrey Beall. 2016. Best practices for scholarly authors in the age of predatory journals. Ann. R. Coll. Surg. Engl. 98, 2 (2016), 77--79.Google ScholarGoogle ScholarCross RefCross Ref
  32. Imene Bensalem, Imene Boukhalfa, Paolo Rosso, Lahsen Abouenour, Kareem Darwish, and Salim Chikhi. 2015. Overview of the AraPlagDet PAN@FIRE2015 shared task on arabic plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15).Google ScholarGoogle Scholar
  33. Imene Bensalem, Salim Chikhi, and Paolo Rosso. 2013. Building arabic corpora from wikisource. In Proceedings of the 2013 ACS International Conference on Computer Systems and Applications (AICCSA’13). 1--2. DOI:10.1109/AICCSA.2013.6616474Google ScholarGoogle ScholarCross RefCross Ref
  34. Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2013. A new corpus for the evaluation of arabic intrinsic plagiarism detection. In Information Access Evaluation: Multilinguality, Multimodality, and Visualization. 53--58.Google ScholarGoogle Scholar
  35. Imene Bensalem, Paolo Rosso, and Salim Chikhi. 2014. Intrinsic plagiarism detection using n-gram classes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1459--1464.Google ScholarGoogle ScholarCross RefCross Ref
  36. Ergun Bicici. 2016. RTM at SemEval-2016 Task 1: Predicting semantic similarity with referential translation machines and related statistics. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 758--764.Google ScholarGoogle ScholarCross RefCross Ref
  37. Victoria Bobicev. 2013. Authorship detection with PPM—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  38. Hadj Ahmed Bouarara, Amine Rahmani, Reda Mohamed Hamou, and Abdelmalek Amine. 2014. Machine learning tool and meta-heuristic based on genetic algorithms for plagiarism detection over mail service. In Proceedings of the 2014 IEEE/ACIS 13th International Conference on Computer and Information Science (ICIS’14). 157--162. DOI:10.1109/ICIS.2014.6912125Google ScholarGoogle ScholarCross RefCross Ref
  39. Barry Bozeman, Daniel Fay, and Catherine P. Slade. 2013. Research collaboration in universities and academic entrepreneurship: The-state-of-the-art. J. Technol. Transf. 38, 1 (2013), 1--67. DOI:10.1007/s10961-012-9281-8Google ScholarGoogle ScholarCross RefCross Ref
  40. Pearl Brereton, Barbara A. Kitchenham, David Budgen, Mark Turner, and Mohamed Khalil. 2007. Lessons from applying the systematic literature review process within the software engineering domain. J. Syst. Softw. 80, 4 (2007), 571--583. DOI:10.1016/j.jss.2006.07.009Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Tomáš Brychcín and Lukáš Svoboda. 2016. UWB at SemEval-2016 Task 1: Semantic textual similarity using lexical, syntactic, and semantic information. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 588--594.Google ScholarGoogle ScholarCross RefCross Ref
  42. Davide Buscaldi, Joseph Le Roux, Jorge J. García Flores, and Adrian Popescu. 2013. LIPN-CORE: Semantic text similarity using n-grams, wordnet, syntactic analysis, ESA and information retrieval based features. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics. 63.Google ScholarGoogle Scholar
  43. Esteban Castillo, Ofelia Cervantes, Darnes Vilariño, David Pinto, and Saul León. 2014. Unsupervised method for the authorship identification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).Google ScholarGoogle Scholar
  44. Daniel Castro, Yaritza Adame, María Pelaez, and Rafael Muñoz. 2015. Authorship verification, combining linguistic features and different similarity functions—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).Google ScholarGoogle Scholar
  45. Daniele Cerra, Mihai Datcu, and Peter Reinartz. 2014. Authorship analysis based on data compression. Pattern Recogn. Lett. 42 (2014), 79--84. DOI:10.1016/j.patrec.2014.01.019Google ScholarGoogle ScholarCross RefCross Ref
  46. Zdenek Ceska. 2008. Plagiarism detection based on singular value decomposition. In Advances in Natural Language Processing. Springer, 108--119.Google ScholarGoogle Scholar
  47. Man Yan Miranda Chong. 2013. A study on plagiarism detection and plagiarism direction identification using natural language processing techniques. Ph. D Thesis. University of Wolverhampton.Google ScholarGoogle Scholar
  48. Hussain A. Chowdhury and Dhruba K. Bhattacharyya. 2016. Plagiarism: Taxonomy, tools and detection techniques. In Proceedings of the 19th National Convention on Knowledge, Library and Information Networking (NACLIN’16).Google ScholarGoogle Scholar
  49. Daniela Chudá, Jozef Lačný, Maroš Maršalek, Pavel Michalko, and Ján Súkeník. 2013. Plagiarism detection in slovak texts on the web. In Proceedings of the Conference on Plagiarism across Europe and Beyond. 249--260.Google ScholarGoogle Scholar
  50. Guy J. Curtis and Joseph Clare. 2017. How prevalent is contract cheating and to what extent are students repeat offenders? J. Acad. Ethics 15, 2 (2017), 115--124. DOI:10.1007/s10805-017-9278-xGoogle ScholarGoogle ScholarCross RefCross Ref
  51. Guy J. Curtis and Lucia Vardanega. 2016. Is plagiarism changing over time? A 10-year time-lag study with three points of measurement. High. Educ. Res. Dev. 35, 6 (2016), 1167--1179. DOI:10.1080/07294360.2016.1161602Google ScholarGoogle ScholarCross RefCross Ref
  52. Michiel van Dam. 2013. A basic character n-gram approach to authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  53. Avishek Dan and Pushpak Bhattacharyya. 2013. Cfilt-core: Semantic textual similarity using universal networking language. In Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics (*SEM’13). 216--220.Google ScholarGoogle Scholar
  54. Ali Daud, Wahab Khan, and Dunren Che. 2017. Urdu language processing: a survey. Artif. Intell. Rev. 47, 3 (2017), 279--311. DOI:10.1007/s10462-016-9482-xGoogle ScholarGoogle ScholarDigital LibraryDigital Library
  55. Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41, 6 (1990), 391. DOI:10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9Google ScholarGoogle ScholarCross RefCross Ref
  56. T. Dharani and I. Laurence Aroquiaraj. 2013. A survey on content based image retrieval. In Proceedings of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering. 485--490. DOI:10.1109/ICPRIME.2013.6496719Google ScholarGoogle Scholar
  57. Michal Ďuračík, Emil Kršák, and Patrik Hrkút. 2017. Current trends in source code analysis, plagiarism detection and issues of analysis big datasets. Proc. Eng. 192 (2017), 136--141. DOI:10.1016/j.proeng.2017.06.024Google ScholarGoogle ScholarCross RefCross Ref
  58. Nava Ehsan and Azadeh Shakery. 2016. Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information. Inf. Process. Manag. 52, 6 (2016), 1004--1017. DOI:10.1016/j.ipm.2016.04.006Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Nava Ehsan and Azadeh Shakery. 2016. A pairwise document analysis approach for monolingual plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16). 145--148.Google ScholarGoogle Scholar
  60. Nava Ehsan, Frank Wm. Tompa, and Azadeh Shakery. 2016. Using a dictionary and n-gram alignment to improve fine-grained cross-language plagiarism detection. In Proceedings of the 2016 ACM Symposium on Document Engineering (DocEng’16). 59--68. DOI:10.1145/2960811.2960817Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Taiseer Abdalla Elfadil Eisa, Naomie Salim, and Salha Alzahrani. 2015. Existing plagiarism detection techniques: A systematic mapping of the scholarly literature. Online Inf. Rev. 39, 3 (2015), 383--400.Google ScholarGoogle ScholarCross RefCross Ref
  62. El-Sayed M. El-Alfy, Radwan E. Abdel-Aal, Wasfi G. Al-Khatib, and Faisal Alvi. 2015. Boosting paraphrase detection through textual similarity metrics with abductive networks. Appl. Soft Comput. 26, (2015), 444--453. DOI:10.1016/j.asoc.2014.10.021Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Victoria Elizalde. 2013. Using statistic and semantic analysis to detect plagiarism—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  64. Victoria Elizalde. 2014. Using noun phrases and tf-idf for plagiarized document retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).Google ScholarGoogle Scholar
  65. Erik von Elm, Greta Poglia, Bernhard Walder, and Martin R. Tramèr. 2004. Different patterns of duplicate publication: An Analysis of articles used in systematic reviews. JAMA 291, 8 (2004), 974--980. DOI:10.1001/jama.291.8.974Google ScholarGoogle ScholarCross RefCross Ref
  66. Fezeh Esteki and Faramarz Safi Esfahani. 2016. A plagiarism detection approach based on SVM for persian texts. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16). 149--153.Google ScholarGoogle Scholar
  67. Asli Eyecioglu and Bill Keller. 2015. Twitter paraphrase identification with simple overlap features and SVMs. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 64--69.Google ScholarGoogle ScholarCross RefCross Ref
  68. Jody Condit Fagan. 2017. An evidence-based review of academic web search engines, 2014--2016: Implications for librarians’ practice and research agenda. Inf. Technol. Libr. 36, 2 (2017), 7.Google ScholarGoogle Scholar
  69. Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database (Language, Speech, and Communication). The MIT Press.Google ScholarGoogle Scholar
  70. Vanessa Wei Feng and Graeme Hirst. 2013. Authorship verification with entity coherence and other rich linguistic features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  71. Rafael Ferreira, George D. C. Cavalcanti, Fred Freitas, Rafael Dueire Lins, Steven J. Simske, and Marcelo Riss. 2018. Combining sentence similarities measures to identify paraphrases. Comput. Speech Lang. 47 (2018), 59--73. DOI:10.1016/j.csl.2017.07.002Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Jérémy Ferrero, Frederic Agnes, Laurent Besacier, and Didier Schwab. 2017. CompiLIG at SemEval-2017 Task 1: Cross-language plagiarism detection methods for semantic textual similarity. arXiv:1704.01346.Google ScholarGoogle Scholar
  73. Jérémy Ferrero, Frédéric Agnes, Laurent Besacier, and Didier Schwab. 2017. Using word embedding for cross-language plagiarism detection. arXiv:1702.03082.Google ScholarGoogle Scholar
  74. Jérémy Ferrero, Laurent Besacier, Didier Schwab, and Frédéric Agnes. 2017. Deep investigation of cross-language plagiarism detection methods. arXiv:1705.08828.Google ScholarGoogle Scholar
  75. Tomáš Foltýnek and Irene Glendinning. 2015. Impact of policies for plagiarism in higher education across europe: Results of the project. Acta Univ. Agric. Silvic. Mendel. Brun. 63, 1 (2015), 207--216.Google ScholarGoogle ScholarCross RefCross Ref
  76. Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2013. Cross-language plagiarism detection using a multilingual semantic network. In Advances in Information Retrieval. 710--713.Google ScholarGoogle Scholar
  77. Marc Franco-Salvador, Parth Gupta, and Paolo Rosso. 2014. Knowledge graphs as context models: Improving the detection of cross-language plagiarism with paraphrasing. In Bridging Between Information Retrieval and Databases: PROMISE Winter School 2013, Nicola Ferro (ed.). Springer-Verlag, Berlin, 227--236. DOI:10.1007/978-3-642-54798-0_12Google ScholarGoogle Scholar
  78. Marc Franco-Salvador, Parth Gupta, Paolo Rosso, and Rafael E. Banchs. 2016. Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language. Knowl.-Based Syst. 111 (2016), 87--99. DOI:10.1016/j.knosys.2016.08.004Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. Marc Franco-Salvador, Paolo Rosso, and Manuel Montes-y-Gómez. 2016. A systematic study of knowledge graph analysis for cross-language plagiarism detection. Inf. Process. Manag. 52, 4 (2016), 550--570. DOI:10.1016/j.ipm.2015.12.004Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Marc Franco-Salvador, Paolo Rosso, and Roberto Navigli. 2014. A knowledge-based representation for cross-language document retrieval and categorization. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 414--423.Google ScholarGoogle ScholarCross RefCross Ref
  81. Jordan Fréry, Christine Largeron, and Mihaela Juganaru-Mathieu. 2014. UJM at CLEF in Author Identification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).Google ScholarGoogle Scholar
  82. Evgeniy Gabrilovich and Shaul Markovitch. 2007. Computing semantic relatedness using wikipedia-based explicit semantic analysis. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI’07). 1606--1611.Google ScholarGoogle Scholar
  83. Jean-Gabriel Ganascia, Peirre Glaudes, and Andrea Del Lungo. 2014. Automatic detection of reuses and citations in literary texts. Lit. Linguist. Comput. 29, 3 (2014), 412--421. DOI:10.1093/llc/fqu020Google ScholarGoogle ScholarCross RefCross Ref
  84. Yasmany García-Mondeja, Daniel Castro-Castro, Vania Lavielle-Castro, and Rafael Muñoz. 2017. Discovering author groups using a b-compact graph-based clustering—notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).Google ScholarGoogle Scholar
  85. Urvashi Garg and Vishal Goyal. 2016. Maulik: A plagiarism detection tool for hindi documents. Ind. J. Sci. Technol. 9, 12 (2016).Google ScholarGoogle Scholar
  86. Shahabeddin Geravand and Mahmood Ahmadi. 2014. An efficient and scalable plagiarism checking system using bloom filters. Comput. Electr. Eng. 40, 6 (2014), 1789--1800.Google ScholarGoogle ScholarDigital LibraryDigital Library
  87. M. R. Ghaeini. 2013. Intrinsic author identification using modified weighted KNN—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  88. Erfaneh Gharavi, Kayvan Bijari, Kiarash Zahirnia, and Hadi Veisi. 2016. A deep learning approach to persian plagiarism detection. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16). 154--159.Google ScholarGoogle Scholar
  89. Lee Gillam. 2013. Guess again and see if they line up: Surrey's runs at plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).Google ScholarGoogle Scholar
  90. Bela Gipp. 2014. Citation-based Plagiarism Detection -Detecting Disguised and Cross-language Plagiarism Using Citation Pattern Analysis. Springer Vieweg Research. Retrieved from http://www.springer.com/978-3-658-06393-1.Google ScholarGoogle Scholar
  91. Bela Gipp and Norman Meuschke. 2011. Citation pattern matching algorithms for citation-based plagiarism detection: Greedy citation tiling, citation chunking and longest common citation sequence. In Proceedings of the 11th ACM Symposium on Document Engineering. 249--258. DOI:10.1145/2034691.2034741Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. Bela Gipp, Norman Meuschke, and Joeran Beel. 2011. Comparative evaluation of text- and citation-based plagiarism detection approaches using guttenplag. In Proceedings of 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11). 255--258. DOI:10.1145/1998076.1998124Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. Bela Gipp, Norman Meuschke, and Corinna Breitinger. 2014. Citation‐based plagiarism detection: Practicability on a large‐scale scientific corpus. J. Assoc. Inf. Sci. Technol. 65, 8 (2014), 1527--1540. DOI:10.1002/asi.23228Google ScholarGoogle ScholarDigital LibraryDigital Library
  94. Bela Gipp, Norman Meuschke, Corinna Breitinger, Jim Pitman, and Andreas Nürnberger. 2014. Web-based demonstration of semantic similarity detection using citation pattern visualization for a cross language plagiarism case. In Proceedings of the International Conference on Enterprise Information Systems (ICEIS’14). 677--683. DOI:10.5220/0004985406770683Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto, and Paolo Rosso. 2018. A resource-light method for cross-lingual semantic textual similarity. Knowl.-Based Syst. 143 (2018), 1--9. DOI:10.1016/j.knosys.2017.11.041Google ScholarGoogle ScholarCross RefCross Ref
  96. Lila Gleitman and Anna Papafragou. 2005. Language and thought. In The Cambridge Handbook of Thinking and Reasoning, Keith J. Holyoak and Robert G. Morrison (eds.). Cambridge University Press, 633--661.Google ScholarGoogle Scholar
  97. Demetrios G. Glinos. 2014. A hybrid architecture for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14). 958--965.Google ScholarGoogle Scholar
  98. Helena Gómez-Adorno, Yuridiana Alemán, Darnes Vilariño Ayala, Miguel A Sanchez-Perez, David Pinto, and Grigori Sidorov. 2017. Author clustering using hierarchical clustering analysis—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).Google ScholarGoogle Scholar
  99. Helena Gómez-Adorno, Grigori Sidorov, David Pinto, and Ilia Markov. 2015. A graph based authorship identification approach—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).Google ScholarGoogle Scholar
  100. Philipp Gross and Pashutan Modaresi. 2014. Plagiarism alignment detection by merging context seeds—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).Google ScholarGoogle Scholar
  101. Deepa Gupta, Vani Kanjirangat, and L. M. Leema. 2016. Plagiarism detection in text documents using sentence bounded stop word n-grams. J. Eng. Sci. Technol. 11, 10 (2016), 1403--1420.Google ScholarGoogle Scholar
  102. Deepa Gupta, Vani Kanjirangat, and Charan Kamal Singh. 2014. Using natural language processing techniques and fuzzy-semantic similarity for automatic external plagiarism detection. In Proceedings of the 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI’14). 2694--2699. DOI:10.1109/ICACCI.2014.6968314Google ScholarGoogle ScholarCross RefCross Ref
  103. Josue Gutierrez, Jose Casillas, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2015. Homotopy based classification for author verification task—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).Google ScholarGoogle Scholar
  104. Yaakov HaCohen-Kerner and Aharon Tayeb. 2017. Rapid detection of similar peer-reviewed scientific papers via constant number of randomized fingerprints. Inf. Process. Manag. 53, 1 (2017), 70--86. DOI:10.1016/j.ipm.2016.06.007Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Matthias Hagen, Martin Potthast, and Benno Stein. 2015. Source retrieval for plagiarism detection from large web corpora: Recent approaches. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  106. Osama Haggag and Samhaa Smhaa El-Beltagy. 2013. Plagiarism candidate retrieval using selective query formulation and discriminative query scoring. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  107. Oren Halvani and Lukas Graner. 2017. Author clustering based on compression-based dissimilarity scores—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  108. Oren Halvani and Martin Steinebach. 2014. VEBAV - A simple, scalable and fast authorship verification scheme—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  109. Oren Halvani, Martin Steinebach, and Ralf Zimmermann. 2013. Authorship verification via k-nearest neighbor estimation—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  110. Oren Halvani and Christian Winter. 2015. A generic authorship verification scheme based on equal error rates—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  111. Christian Hänig, Robert Remus, and Xose De La Puente. 2015. Exb themis: Extensive feature extraction from word alignments for semantic textual similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 264--268.
  112. Sarah Harvey. 2014. Author verification using PPM with parts of speech tagging—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  113. Hua He, John Wieting, Kevin Gimpel, Jinfeng Rao, and Jimmy Lin. 2016. UMD-TTIC-UW at SemEval-2016 Task 1: Attention-based multi-perspective convolutional neural networks for textual similarity measurement. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 1103--1108.
  114. Oumaima Hourrane and El Habib Benlahmar. 2017. Survey of plagiarism detection approaches and big data techniques related to plagiarism candidate retrieval. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications (BDCA’17). 15:1--15:6. DOI:10.1145/3090354.3090369
  115. Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster, and Malvina Nissim. 2015. GLAD: Groningen lightweight authorship detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  116. Syed Fawad Hussain and Asif Suryani. 2015. On retrieving intelligently plagiarized documents using semantic similarity. Eng. Appl. Artif. Intell. 45 (2015), 246--258. DOI:10.1016/j.engappai.2015.07.011
  117. Ashraf S. Hussein. 2015. A plagiarism detection system for Arabic documents. In Intelligent Systems 2014, D. Filev, J. Jabłkowski, J. Kacprzyk, M. Krawczak, I. Popchev, L. Rutkowski, V. Sgurev, E. Sotirova, P. Szynkarczyk, and S. Zadrozny (Eds.). Springer International Publishing, 541--552.
  118. Ashraf S. Hussein. 2015. Arabic document similarity analysis using n-grams and singular value decomposition. In Proceedings of the 2015 IEEE 9th International Conference on Research Challenges in Information Science (RCIS’15). 445--455. DOI:10.1109/RCIS.2015.7128906
  119. Radu Tudor Ionescu, Marius Popescu, and Aoife Cahill. 2014. Can characters reveal your native language? A language-independent approach to native language identification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP’14). 1363--1373.
  120. Hideo Itoh. 2016. RICOH at SemEval-2016 Task 1: IR-based semantic textual similarity estimation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 691--695.
  121. Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2013. Proximity based one-class classification with common n-gram dissimilarity for authorship verification task—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  122. Magdalena Jankowska, Vlado Kešelj, and Evangelos Milios. 2014. Ensembles of proximity-based one-class classifiers for author verification—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  123. Arun Jayapal and Binayak Goswami. 2013. Vector space model and overlap metric for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  124. Zhuoren Jiang, Miao Chen, and Xiaozhong Liu. 2014. Semantic annotation with rescoredESA: Rescoring concept features generated from explicit semantic analysis. In Proceedings of the 7th International Workshop on Exploiting Semantic Annotations in Information Retrieval (ESAIR’14). 25--27. DOI:10.1145/2663712.2666192
  125. M. A. C. Jiffriya, M. A. C. Akmal Jahan, and Roshan G. Ragel. 2014. Plagiarism detection on electronic text based assignments using vector space model. In Proceedings of the 7th International Conference on Information and Automation for Sustainability. 1--5. DOI:10.1109/ICIAFS.2014.7069593
  126. M. A. C. Jiffriya, M. A. C. Akmal Jahan, Roshan G. Ragel, and Sampath Deegalla. 2013. AntiPlag: Plagiarism detection on electronic submissions of text based assignments. In Proceedings of the 2013 IEEE 8th International Conference on Industrial and Information Systems. 376--380. DOI:10.1109/ICIInfS.2013.6732013
  127. Patrick Juola. 2017. Detecting contract cheating via stylometric methods. In Proceedings on the Conference on Plagiarism across Europe and Beyond. 187--198. Retrieved from https://plagiarism.pefka.mendelu.cz/files/proceedings17.pdf.
  128. Patrick Juola and Efstathios Stamatatos. 2013. Overview of the author identification task at PAN 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  129. Rune Borge Kalleberg. 2015. Towards detecting textual plagiarism using machine learning methods. University of Agder. Retrieved from https://brage.bibsys.no/xmlui/bitstream/handle/11250/299460/Rune Borge Kalleberg.pdf?sequence=1.
  130. Rafael-Michael Karampatsis. 2015. CDTDS: Predicting paraphrases in Twitter via support vector regression. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 75--79.
  131. Daniel Karaś, Martyna Śpiewak, and Piotr Sobecki. 2017. OPI-JSA at CLEF 2017: Author clustering and style breach detection—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  132. Roman Kern. 2013. Grammar checker features for author identification and author profiling—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  133. Imtiaz H. Khan, Muazzam A. Siddiqui, Kamal M. Jambi, Muhammad Imran, and Abobakr A. Bagais. 2014. Query optimization in Arabic plagiarism detection: An empirical study. Int. J. Intell. Syst. Appl. 7, 1 (2014), 73.
  134. Jamal Ahmad Khan. 2017. Style breach detection: An unsupervised detection model—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  135. Mahmoud Khonji and Youssef Iraqi. 2014. A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF)—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  136. Khadijeh Khoshnavataher, Vahid Zarrabi, Salar Mohtaj, and Habibollah Asghari. 2015. Developing monolingual Persian corpus for extrinsic plagiarism detection using artificial obfuscation—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  137. Barbara Kitchenham. 2004. Procedures for performing systematic reviews. Keele University Technical Report TR/SE-0401. Keele University. 33.
  138. Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John Bailey, and Stephen Linkman. 2009. Systematic literature reviews in software engineering—A systematic literature review. Inf. Softw. Technol. 51, 1 (2009), 7--15. DOI:10.1016/j.infsof.2008.09.009
  139. Mirco Kocher. 2016. UniNE at CLEF 2016: Author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16).
  140. Mirco Kocher and Jacques Savoy. 2015. UniNE at CLEF 2015: Author identification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  141. Mirco Kocher and Jacques Savoy. 2017. UniNE at CLEF 2017: Author clustering—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  142. Leilei Kong, Yong Han, Zhongyuan Han, Haihao Yu, Qibo Wang, Tinglei Zhang, and Haoliang Qi. 2014. Source retrieval based on learning to rank and text alignment based on plagiarism type recognition for plagiarism detection—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  143. Leilei Kong, Zhimao Lu, Yong Han, Haoliang Qi, Zhongyuan Han, Qibo Wang, Zhenyuan Hao, and Jing Zhang. 2015. Source retrieval and text alignment corpus construction for plagiarism detection—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  144. Leilei Kong, Zhimao Lu, Haoliang Qi, and Zhongyuan Han. 2014. Detecting high obfuscation plagiarism: Exploring multi-features fusion via machine learning. Int. J. u-and e-Serv. Sci. Technol. 7, 4 (2014), 385--396.
  145. Leilei Kong, Haoliang Qi, Cuixia Du, Mingxing Wang, and Zhongyuan Han. 2013. Approaches for source retrieval and text alignment of plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  146. Moshe Koppel and Yaron Winter. 2014. Determining if two documents are written by the same author. J. Assoc. Inf. Sci. Technol. 65, 1 (2014), 178--187.
  147. Niraj Kumar. 2014. A graph based automatic plagiarism detection technique to handle artificial word reordering and paraphrasing. In Computational Linguistics and Intelligent Text Processing. 481--494.
  148. Marcin Kuta and Jacek Kitowski. 2014. Optimisation of character n-gram profiles method for intrinsic plagiarism detection. In Artificial Intelligence and Soft Computing. 500--511.
  149. Mikhail Kuznetsov, Anastasia Motrenko, Rita Kuznetsova, and Vadim Strijov. 2016. Methods for intrinsic plagiarism detection and author diarization. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16). 912--919. Retrieved from http://ceur-ws.org/Vol-1609/.
  150. Robert Layton, Paul Watters, and Richard Dazeley. 2013. Local n-grams for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  151. Paola Ledesma, Gibran Fuentes, Gabriela Jasso, Angel Toledo, and Ivan Meza. 2013. Distance learning for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  152. Taemin Lee, Jeongmin Chae, Kinam Park, and Soonyoung Jung. 2013. CopyCaptor: Plagiarized source retrieval system using global word frequency and local feedback—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  153. Chi-kiu Lo, Cyril Goutte, and Michel Simard. 2016. CNRC at SemEval-2016 task 1: Experiments in crosslingual semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 668--673.
  154. Tara C. Long, Mounir Errami, Angela C. George, Zhaohui Sun, and Harold R. Garner. 2009. Responding to possible plagiarism. Science 323, 5919 (2009), 1293--1294. DOI:10.1126/science.1167408
  155. Ahmed Magooda, Ashraf Y. Mahgoub, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for extrinsic plagiarism detection (RDI_RED)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15).
  156. Peyman Mahdavi, Zahra Siadati, and Farzin Yaghmaee. 2014. Automatic external Persian plagiarism detection using vector space model. In Proceedings of the 2014 4th International eConference on Computer and Knowledge Engineering (ICCKE’14). 697--702.
  157. Ashraf Y. Mahgoub, Ahmed Magooda, Mohsen Rashwan, Magda B. Fayek, and Hazem Raafat. 2015. RDI System for intrinsic plagiarism detection (RDI_RID)—Working Notes for PAN-AraPlagDet at FIRE 2015. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’15).
  158. Promita Maitra, Souvick Ghosh, and Dipankar Das. 2015. Authorship verification - an approach based on random forest—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  159. Cristhian Mayor, Josue Gutierrez, Angel Toledo, Rodrigo Martinez, Paola Ledesma, Gibran Fuentes, and Ivan Meza. 2014. A single author style representation for the author verification task—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  160. Norman Meuschke and Bela Gipp. 2013. State-of-the-art in detecting academic plagiarism. Int. J. Educ. Integr. 9, 1 (2013), 50--71.
  161. Norman Meuschke and Bela Gipp. 2014. Reducing computational effort for plagiarism detection by using citation characteristics to limit retrieval space. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries. 197--200.
  162. Norman Meuschke, Christopher Gondek, Daniel Seebacher, Corinna Breitinger, Daniel A. Keim, and Bela Gipp. 2018. An adaptive image-based plagiarism detection approach. In Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’18). DOI:10.1145/3197026.3197042
  163. Norman Meuschke, Moritz Schubotz, Felix Hamborg, Tomáš Skopal, and Bela Gipp. 2017. Analyzing mathematical content to detect academic plagiarism. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM’17). 2211--2214. DOI:10.1145/3132847.3133144
  164. Norman Meuschke, Nicolas Siebeck, Moritz Schubotz, and Bela Gipp. 2017. Analyzing semantic concept patterns to detect academic plagiarism. In Proceedings of the 6th International Workshop on Mining Scientific Publications (WOSP’17). 46--53. DOI:10.1145/3127526.3127535
  165. Norman Meuschke, Vincent Stange, Moritz Schubotz, Michael Kramer, and Bela Gipp. 2019. Improving academic plagiarism detection for STEM documents by analyzing mathematical content and citations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL’19).
  166. Pashutan Modaresi and Philipp Gross. 2014. A language independent author verifier using fuzzy c-means clustering—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  167. H. F. Moed, W. J. M. Burger, J. G. Frankfort, and A. F. J. Van Raan. 1985. The application of bibliometric indicators: Important field- and time-dependent factors to be considered. Scientometrics 8, 3--4 (1985), 177--203. DOI:10.1007/BF02016935
  168. Majid Mohebbi and Alireza Talebpour. 2016. Texts semantic similarity detection based graph approach. Int. Arab J. Inf. Technol. 13, 2 (2016), 246--251.
  169. Mozhgan Momtaz, Kayvan Bijari, Mostafa Salehi, and Hadi Veisi. 2016. Graph-based approach to text alignment for plagiarism detection in Persian documents. In Proceedings of the Forum for Information Retrieval Evaluation (FIRE’16). 176--179.
  170. Erwan Moreau, Arun Jayapal, and Carl Vogel. 2014. Author verification: exploring a large set of parameters using a genetic algorithm—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  171. Erwan Moreau, Arun Jayapal, Gerard Lynch, and Carl Vogel. 2015. Author verification: Basic stacked generalization applied to predictions from a set of heterogeneous learners—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  172. Erwan Moreau and Carl Vogel. 2013. Style-based distance features for author verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  173. Maxim Mozgovoy, Tuomo Kakkonen, and Georgina Cosma. 2010. Automatic student plagiarism detection: Future perspectives. J. Educ. Comput. Res. 43, 4 (2010), 511--531.
  174. Aibek Musaev, De Wang, Saajan Shridhar, and Calton Pu. 2015. Fast text classification using randomized explicit semantic analysis. In Proceedings of the 2015 IEEE International Conference on Information Reuse and Integration. 364--371. DOI:10.1109/IRI.2015.62
  175. El Moatez Billah Nagoudi, Ahmed Khorsi, Hadda Cherroun, and Didier Schwab. 2018. 2L-APD: A Two-Level Plagiarism Detection System for Arabic Documents. Cybern. Inf. Technol. 18, 1 (2018), 124--138. DOI:10.2478/cait-2018-0011
  176. Rao Muhammad Adeel Nawab, Mark Stevenson, and Paul Clough. 2017. An IR-based approach utilizing query expansion for plagiarism detection in MEDLINE. IEEE/ACM Trans. Comput. Biol. Bioinforma. 14, 4 (2017), 796--804. DOI:10.1109/TCBB.2016.2542803
  177. Philip M. Newton. 2018. How common is commercial contract cheating in higher education and is it increasing? A Systematic Review. Front. Educ. 3 (2018). DOI:10.3389/feduc.2018.00067
  178. Le Thanh Nguyen, Nguyen Xuan Toan, and Dinh Dien. 2016. Vietnamese plagiarism detection method. In Proceedings of the 7th Symposium on Information and Communication Technology (SoICT’16). 44--51. DOI:10.1145/3011077.3011109
  179. Gabriel Oberreuter and Juan D. Velásquez. 2013. Text mining applied to plagiarism detection: The use of words for detecting deviations in the writing style. Exp. Syst. Appl. 40, 9 (2013), 3756--3763.
  180. Milan Ojsteršek, Janez Brezovnik, Mojca Kotar, Marko Ferme, Goran Hrovat, Albin Bregant, and Mladen Borovič. 2014. Establishing of a Slovenian open access infrastructure: A technical point of view. Program 48, 4 (2014), 394--412. DOI:10.1108/PROG-02-2014-0005
  181. Adeva Oktoveri, Agung Toto Wibowo, and Ari Moesriami Barmawi. 2014. Non-relevant document reduction in anti-plagiarism using asymmetric similarity and AVL tree index. In Proceedings of the 2014 5th International Conference on Intelligent and Advanced Systems (ICIAS’14). 1--5. DOI:10.1109/ICIAS.2014.6869547
  182. Ahmed Hamza Osman and Naomie Salim. 2013. An improved semantic plagiarism detection scheme based on Chi-squared automatic interaction detection. In Proceedings of the 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE’13). 640--647. DOI:10.1109/ICCEEE.2013.6634015
  183. Caleb Owens and Fiona A. White. 2013. A 5‐year systematic strategy to reduce plagiarism among first‐year psychology university students. Aust. J. Psychol. 65, 1 (2013), 14--21. DOI:10.1111/ajpy.12005
  184. María Leonor Pacheco, Kelwin Fernandes, and Aldo Porco. 2015. Random forest with increased generalization: A universal background approach for authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  185. Yurii Palkovskii and Alexei Belov. 2013. Using hybrid similarity methods for plagiarism detection—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  186. Yurii Palkovskii and Alexei Belov. 2014. Developing high-resolution universal multi-type n-gram plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14). 984--989.
  187. Guy Paré, Marie-Claude Trudel, Mirou Jaana, and Spyros Kitsiou. 2015. Synthesizing information systems knowledge: A typology of literature reviews. Inf. Manag. 52, 2 (2015), 183--199. DOI:10.1016/j.im.2014.08.008
  188. Merin Paul and Sangeetha Jamal. 2015. An Improved SRL based plagiarism detection technique using sentence ranking. Procedia Comput. Sci. 46 (2015), 223--230. DOI:10.1016/j.procs.2015.02.015
  189. Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. 425--430.
  190. Jian Peng, Kim-Kwang Raymond Choo, and Helen Ashman. 2016. Bit-level n-gram based forensic authorship analysis on social media: Identifying individuals from linguistic profiles. J. Netw. Comput. Appl. 70 (2016), 171--182. DOI:10.1016/j.jnca.2016.04.001
  191. Solange de L. Pertile, Viviane P. Moreira, and Paolo Rosso. 2015. Comparing and combining content‐ and citation‐based approaches for plagiarism detection. J. Assoc. Inf. Sci. Technol. 67, 10 (2015), 2511--2526. DOI:10.1002/asi.23593
  192. Solange de L. Pertile, Paolo Rosso, and Viviane P. Moreira. 2013. Counting co-occurrences in citations to identify plagiarised text fragments. In Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages. 150--154.
  193. Timo Petmanson. 2013. Authorship identification using correlations of frequent features—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  194. Mohammad Taher Pilehvar, David Jurgens, and Roberto Navigli. 2013. Align, disambiguate and walk: A unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1341--1351.
  195. Gaspar Pizarro V. and Juan D. Velásquez. 2017. Docode 5: Building a real-world plagiarism detection system. Eng. Appl. Artif. Intell. 64 (Jun. 2017), 261--271. DOI:10.1016/j.engappai.2017.06.001
  196. Juan-Pablo Posadas-Durán, Grigori Sidorov, Ildar Batyrshin, and Elibeth Mirasol-Meléndez. 2015. Author verification using syntactic n-grams—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  197. Martin Potthast, Tim Gollub, Matthias Hagen, Martin Tippmann, Johannes Kiesel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein. 2013. Overview of the 5th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  198. Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, Paolo Rosso, and Benno Stein. 2014. Overview of the 6th International Competition on Plagiarism Detection. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  199. Martin Potthast, Matthias Hagen, and Benno Stein. 2016. Author Obfuscation: Attacking the state of the art in authorship verification. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16).
  200. Martin Potthast, Francisco Rangel, Michael Tschuggnall, Efstathios Stamatatos, Paolo Rosso, and Benno Stein. 2017. Overview of PAN’17: Author identification, author profiling, and author obfuscation. In Proceedings of the 7th International Conference of the CLEF Initiative. DOI:10.1007/978-3-319-65813-1_25
  201. Martin Potthast, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. 2010. An Evaluation framework for plagiarism detection. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters (COLING’10). 997--1005.
  202. Martin Potthast, Benno Stein, Andreas Eiselt, Alberto Barrón-Cedeño, and Paolo Rosso. 2009. Overview of the 1st international competition on plagiarism detection. In Proceedings of the SEPLN 09 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN’09). 1--9.
  203. Amit Prakash and Sujan Kumar Saha. 2014. Experiments on document chunking and query formation for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  204. Piotr Przybyła, Nhung T. H. Nguyen, Matthew Shardlow, Georgios Kontonatsios, and Sophia Ananiadou. 2016. NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 614--620.
  205. Javad Rafiei, Salar Mohtaj, Vahid Zarrabi, and Habibollah Asghari. 2015. Source retrieval plagiarism detection based on noun phrase and keyword phrase extraction—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  206. Shima Rakian, Esfahani Faramarz Safi, and Hamid Rastegari. 2015. A Persian fuzzy plagiarism detection approach. J. Inf. Syst. Telecommun. 3, 3 (2015), 182--190.
  207. N. Riya Ravi and Deepa Gupta. 2015. Efficient paragraph based chunking and download filtering for plagiarism source retrieval—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  208. N. Riya Ravi, Vani Kanjirangat, and Deepa Gupta. 2016. Exploration of fuzzy C means clustering algorithm in external plagiarism detection system. In Intelligent Systems Technologies and Applications. Springer, 127--138.
  209. Andi Rexha, Stefan Klampfl, Mark Kröll, and Roman Kern. 2015. Towards authorship attribution for bibliometrics using stylometric features. In Proceedings of the Conference on Computational Linguistics and Bibliometrics co-located with the International Conference on Scientometrics and Informetrics (CLBib@ ISSI). 44--49.
  210. Diego Antonio Rodríguez Torrejón and José Manuel Martín Ramos. 2014. CoReMo 2.3 Plagiarism detector text alignment module—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  211. Paolo Rosso, Francisco Rangel, Martin Potthast, Efstathios Stamatatos, Michael Tschuggnall, and Benno Stein. 2016. Overview of PAN’16. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. 332--350.
  212. Frantz Rowe. 2014. What literature review is not: Diversity, boundaries and recommendations. Eur. J. Inf. Syst. 23, 3 (2014), 241--255. DOI:10.1057/ejis.2014.7
  213. Barbara Rychalska, Katarzyna Pakulska, Krystyna Chodorowska, Wojciech Walczak, and Piotr Andruszkiewicz. 2016. Samsung Poland NLP Team at SemEval-2016 Task 1: Necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 602--608.
  214. Kamil Safin and Rita Kuznetsova. 2017. Style breach detection with neural sentence embeddings—Notebook for PAN at CLEF 2017. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  215. Anuj Saini and Aayushi Verma. 2016. Anuj@ DPIL-FIRE2016: a novel paraphrase detection method in Hindi language using machine learning. In Proceedings of the Forum for Information Retrieval Evaluation. 141--152.
  216. Miguel A. Sanchez-Perez, Alexander Gelbukh, and Grigori Sidorov. 2015. Dynamically adjustable approach through obfuscation type recognition—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  217. Miguel A. Sanchez-Perez, Grigori Sidorov, and Alexander F. Gelbukh. 2014. A winning approach to text alignment for text reuse detection at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14). 1004--1011.
  218. Fernando Sánchez-Vega, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda, and Paolo Rosso. 2013. Determining and characterizing the reused text for plagiarism detection. J. Assoc. Inf. Sci. Technol. 65, 5 (2013), 1804--1813. DOI:10.1016/j.eswa.2012.09.021
  219. Yunita Sari and Mark Stevenson. 2015. A machine learning-based intrinsic method for cross-topic and cross-genre authorship verification—Notebook for PAN at CLEF 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  220. Yunita Sari and Mark Stevenson. 2016. Exploring word embeddings and character n-grams for author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16).
  221. Satyam, Anand, Arnav Kumar Dawn, and Sujan Kumar Saha. 2014. Statistical analysis approach to author identification using latent semantic analysis—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  222. Taneeya Satyapanich, Hang Gao, and Tim Finin. 2015. Ebiquity: Paraphrase and semantic similarity in Twitter using skipgrams. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 51--55.
  223. Andreas Schmidt, Reinhold Becker, Daniel Kimmig, Robert Senger, and Steffen Scholz. 2014. A concept for plagiarism detection based on compressed bitmaps. In Proceedings of the 6th International Conference on Advances in Databases, Knowledge, and Data Applications. 30--34.
  224. Shachar Seidman. 2013. Authorship verification using the impostors method—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  225. Prasha Shrestha, Suraj Maharjan, and Thamar Solorio. 2014. Machine translation evaluation metric for text alignment—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  226. Prasha Shrestha and Thamar Solorio. 2013. Using a variety of n-grams for the detection of different kinds of plagiarism. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  227. Muazzam Ahmed Siddiqui, Imtiaz Hussain Khan, Kamal Mansoor Jambi, Salma Omar Elhaj, and Abobakr Bagais. 2014. Developing an Arabic plagiarism detection corpus. Comput. Sci. Inf. Technol. 4, 2014 (2014), 261–269. DOI:10.5121/csit.2014.41221
  228. L. Sindhu and Sumam Mary Idicula. 2015. Fingerprinting based detection system for identifying plagiarism in Malayalam text documents. In Proceedings of the 2015 International Conference on Computing and Network Communications (CoCoNet’15). 553–558. DOI:10.1109/CoCoNet.2015.7411242
  229. Abdul Sittar, Hafiz Rizwan Iqbal, and Rao Muhammad Adeel Nawab. 2016. Author diarization using cluster-distance approach. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16). 1000–1007.
  230. Sidik Soleman and Ayu Purwarianti. 2014. Experiments on the Indonesian plagiarism detection using latent semantic analysis. In Proceedings of the 2014 2nd International Conference on Information and Communication Technology (ICoICT’14). 413–418. DOI:10.1109/ICoICT.2014.6914098
  231. Hussein Soori, Michal Prilepok, Jan Platos, Eshetie Berhan, and Vaclav Snasel. 2014. Text similarity based on data compression in Arabic. In AETA 2013: Recent Advances in Electrical Engineering and Related Sciences. Springer, 211–220.
  232. Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño. 2014. Overview of the author identification task at PAN 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  233. Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, and Benno Stein. 2015. Overview of the PAN/CLEF 2015 Evaluation Lab. In Experimental IR Meets Multilinguality, Multimodality, and Interaction: Proceedings of the 6th International Conference of the CLEF Initiative (CLEF’15). 518–538. DOI:10.1007/978-3-319-24027-5_49
  234. Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. 2015. Overview of the author identification task at PAN 2015. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  235. Benno Stein, Sven zu Eissen, and Martin Potthast. 2007. Strategies for retrieving plagiarized documents. In Proceedings of the 30th Annual International ACM SIGIR Conference. 825–826. DOI:10.1145/1277741.1277928
  236. Imam Much Ibnu Subroto and Ali Selamat. 2014. Plagiarism detection through internet using hybrid artificial neural network and support vectors machine. Telecommun. Comput. Electron. Control. 12, 1 (2014), 209–218.
  237. Šimon Suchomel and Michal Brandejs. 2014. Heterogeneous queries for synoptic and phrasal search—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  238. Šimon Suchomel and Michal Brandejs. 2015. Improving synoptic querying for source retrieval. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’15).
  239. Šimon Suchomel, Jan Kasprzak, and Michal Brandejs. 2013. Diverse queries and feature type selection for plagiarism discovery—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  240. M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. DLS@CU: Sentence similarity from word alignment. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14). 241–246.
  241. M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2014. Back to basics for monolingual alignment: Exploiting word similarity and contextual evidence. Trans. Assoc. Comput. Linguist. 2 (2014), 219–230.
  242. M. D. Arafat Sultan, Steven Bethard, and Tamara Sumner. 2015. DLS@CU: Sentence similarity from word alignment and semantic vector composition. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 148–153.
  243. Junfeng Tian and Man Lan. 2016. ECNU at SemEval-2016 Task 1: Leveraging word embedding from macro and micro views to boost performance for semantic textual similarity. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval’16). 621–627.
  244. Diego A. Rodríguez Torrejón and José Manuel Martín Ramos. 2013. Text alignment module in CoReMo 2.1 plagiarism detector. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  245. Michael Tschuggnall and Günther Specht. 2013. Detecting plagiarism in text documents through grammar-analysis of authors. Datenbanksysteme für Business, Technologie und Web (BTW) 2013, Volker Markl, Gunter Saake, Kai-Uwe Sattler, Gregor Hackenbroich, Bernhard Mitschang, Theo Härder, and Veit Köppen (Eds.). Gesellschaft für Informatik e.V., 241–259.
  246. Michael Tschuggnall and Günther Specht. 2013. Using grammar-profiles to intrinsically expose plagiarism in text documents. In Natural Language Processing and Information Systems. 297–302.
  247. Michael Tschuggnall, Efstathios Stamatatos, Ben Verhoeven, Walter Daelemans, Günther Specht, Benno Stein, and Martin Potthast. 2017. Overview of the author identification task at PAN-2017: Style breach detection and author clustering. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’17).
  248. Alper Kursat Uysal and Serkan Gunal. 2014. Text classification using genetic algorithm oriented latent semantic features. Expert Syst. Appl. 41, 13 (2014), 5938–5947. DOI:10.1016/j.eswa.2014.03.041
  249. Vani Kanjirangat and Deepa Gupta. 2014. Using K-means cluster based techniques in external plagiarism detection. In Proceedings of the 2014 International Conference on Contemporary Computing and Informatics (IC3I’14). 1268–1273. DOI:10.1109/IC3I.2014.7019659
  250. Vani Kanjirangat and Deepa Gupta. 2015. Investigating the impact of combined similarity metrics and POS tagging in extrinsic text plagiarism detection system. In Proceedings of the 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI’15). 1578–1584. DOI:10.1109/ICACCI.2015.7275838
  251. Vani Kanjirangat and Deepa Gupta. 2016. Study on extrinsic text plagiarism detection techniques and tools. J. Eng. Sci. Technol. Rev. 9, 5 (2016), 9–23.
  252. Vani Kanjirangat and Deepa Gupta. 2017. Detection of idea plagiarism using syntax–semantic concept extractions with genetic algorithm. Expert Syst. Appl. 73 (2017), 11–26. DOI:10.1016/j.eswa.2016.12.022
  253. Vani Kanjirangat and Deepa Gupta. 2017. Identifying document-level text plagiarism: A two-phase approach. J. Eng. Sci. Technol. 12, 12 (2017), 3226–3250.
  254. Vani Kanjirangat and Deepa Gupta. 2017. Text plagiarism classification using syntax based linguistic features. Expert Syst. Appl. 88 (2017), 448–464. DOI:10.1016/j.eswa.2017.07.006
  255. Anna Vartapetiance and Lee Gillam. 2013. A textual modus operandi: Surrey's simple system for author identification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  256. Juan D Velásquez, Yerko Covacevich, Francisco Molina, Edison Marrese-Taylor, Cristián Rodríguez, and Felipe Bravo-Marquez. 2016. DOCODE 3.0 (DOcument COpy DEtector): A system for plagiarism detection by applying an information fusion process from multiple documental data sources. Inf. Fus. 27 (2016), 64–75. DOI:10.1016/j.inffus.2015.05.006
  257. Ondřej Veselý, Tomáš Foltýnek, and Jiří Rybička. 2013. Source retrieval via naïve approach and passage selection heuristics—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  258. Darnes Vilariño, David Pinto, Helena Gómez, Saúl León, and Esteban Castillo. 2013. Lexical-syntactic and graph-based features for authorship verification—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  259. Ngoc Phuoc An Vo, Octavian Popescu, and Tommaso Caselli. 2014. FBK-TR: SVM for semantic relatedeness and corpus patterns for RTE. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval’14). 289–293.
  260. Hai Hieu Vu, Jeanne Villaneau, Farida Saïd, and Pierre-François Marteau. 2014. Sentence similarity by combining explicit semantic analysis and overlapping N-grams. In Text, Speech and Dialogue. 201–208.
  261. Elizabeth Wager. 2014. Defining and responding to plagiarism. Learn. Publ. 27, 1 (2014), 33–42. DOI:10.1087/20140105
  262. Wafa Wali, Bilel Gargouri, and Abdelmajid Ben Hamadou. 2015. Supervised learning to measure the semantic similarity between Arabic sentences. In Computational Collective Intelligence. 158–167.
  263. John Walker. 1998. Student plagiarism in universities: What are we doing about it? High. Educ. Res. Dev. 17, 1 (1998), 89–106. DOI:10.1080/0729436980170105
  264. Shuai Wang, Haoliang Qi, Leilei Kong, and Cuixia Nu. 2013. Combination of VSM and Jaccard coefficient for external plagiarism detection. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics. 1880–1885. DOI:10.1109/ICMLC.2013.6890902
  265. Debora Weber-Wulff. 2014. False Feathers: A Perspective on Academic Plagiarism. Springer, Berlin.
  266. Debora Weber-Wulff, Christopher Möller, Jannis Touras, and Elin Zincke. 2013. Plagiarism Detection Software Test 2013. Retrieved from http://plagiat.htw-berlin.de/wp-content/uploads/Testbericht-2013-color.pdf.
  267. Agung Toto Wibowo, Kadek W. Sudarmadi, and Ari M. Barmawi. 2013. Comparison between fingerprint and winnowing algorithm to detect plagiarism fraud on Bahasa Indonesia documents. In Proceedings of the 2013 International Conference of Information and Communication Technology (ICoICT’13). 128–133. DOI:10.1109/ICoICT.2013.6574560
  268. Kyle Williams, Hung-Hsuan Chen, Sagnik Ray Chowdhury, and C. Lee Giles. 2013. Unsupervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2013. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’13).
  269. Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Classifying and ranking search engine results as potential sources of plagiarism. In Proceedings of the 2014 ACM Symposium on Document Engineering (DocEng’14). 97–106. DOI:10.1145/2644866.2644879
  270. Kyle Williams, Hung-Hsuan Chen, and C. Lee Giles. 2014. Supervised ranking for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  271. Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013. A lightweight and high performance monolingual word aligner. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 702–707.
  272. Takeru Yokoi. 2015. Sentence-based plagiarism detection for Japanese document based on common nouns and part-of-speech structure. In Intelligent Software Methodologies, Tools and Techniques. 297–308.
  273. Guido Zarrella, John Henderson, Elizabeth M. Merkhofer, and Laura Strickhart. 2015. MITRE: Seven systems for semantic similarity in tweets. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 12–17.
  274. Chunxia Zhang, Xindong Wu, Zhendong Niu, and Wei Ding. 2014. Authorship identification from unstructured texts. Knowl.-Based Syst. 66 (2014), 99–111. DOI:10.1016/j.knosys.2014.04.025
  275. Jiang Zhao and Man Lan. 2015. ECNU: Leveraging word embeddings to boost performance for paraphrase in Twitter. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval’15). 34–39.
  276. Valentin Zmiycharov, Dimitar Alexandrov, Hristo Georgiev, Yasen Kiprov, Georgi Georgiev, Ivan Koychev, and Preslav Nakov. 2016. Experiments in authorship-link ranking and complete author clustering—Notebook for PAN at CLEF 2016. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’16).
  277. Sven Meyer Zu Eissen and Benno Stein. 2006. Intrinsic plagiarism detection. In Proceedings of the European Conference on Information Retrieval. 565–569.
  278. Denis Zubarev and Ilya Sochenkov. 2014. Using sentence similarity measure for plagiarism source retrieval—Notebook for PAN at CLEF 2014. In Proceedings of the Conference and Labs of the Evaluation Forum and Workshop (CLEF’14).
  279. Teddi Fishman. 2009. 'We know it when we see it' is not good enough: Toward a standard definition of plagiarism that transcends theft, fraud, and copyright. In Proceedings of the 4th Asia Pacific Conference on Educational Integrity (4APCEI'09). 5.


Published in

ACM Computing Surveys, Volume 52, Issue 6
November 2020
806 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/3368196
Editor: Sartaj Sahni

Copyright © 2019 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 16 October 2019
• Revised: 1 August 2019
• Accepted: 1 August 2019
• Received: 1 March 2019


Qualifiers

• survey
• Research
• Refereed
