DOI: 10.1145/2030376.2030394
research-article

Link spamming Wikipedia for profit

Published: 01 September 2011

ABSTRACT

Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers to entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. In this work, we examine one particular abuse, link spamming: the addition of promotional or otherwise inappropriate hyperlinks.

Our analysis focuses on the wiki model and, in particular, on the collaborative encyclopedia Wikipedia. A principal goal of spammers is to maximize exposure: the number of people who view a link. By creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement, that is, the window between a spam edit and its removal by human patrollers.
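
The contrast between the two models can be made concrete with a simple exposure estimate. The sketch below assumes, as a simplification that is not taken from the paper, that a link's exposures equal the article's average view rate multiplied by the time the link survives before removal, with views arriving uniformly. The `Placement` type, the article names, and every view rate and persistence time are hypothetical placeholders, not measurements from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One spam link on one article (all values hypothetical)."""
    article: str
    views_per_hour: float  # assumed average page-view rate of the article
    hours_live: float      # assumed time until editors remove the link


def exposures(placements: list[Placement]) -> float:
    """Estimate total exposures as the sum of view rate x persistence.

    Assumes views arrive uniformly, so a link live for t hours on a page
    viewed r times per hour is seen roughly r * t times.
    """
    return sum(p.views_per_hour * p.hours_live for p in placements)


# Status quo (hypothetical): one link on a low-traffic article, relying on persistence.
status_quo = [Placement("Niche_topic", views_per_hour=2.0, hours_live=24.0)]

# Latency-exploiting model (hypothetical): many high-traffic articles at once,
# each link surviving only the short window before a human reverts it.
burst = [Placement(f"Popular_{i}", views_per_hour=3000.0, hours_live=0.25)
         for i in range(50)]

print(exposures(status_quo))  # 48.0
print(exposures(burst))       # 37500.0
```

Under these assumed inputs, a link that is removed quickly but placed on many heavily viewed pages can outperform a link that persists on an obscure page, which is the intuition behind exploiting enforcement latency rather than relying on persistence.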

Statistical estimation suggests that our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, giving them an incentive to exploit it. In response, we examine mitigation strategies.
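
Whether such a strategy is economically viable comes down to comparing the revenue that exposures generate against the attacker's costs. The sketch below assumes a simple funnel (exposures to clicks to sales) with a fixed per-campaign cost; the function names, the parameters `click_rate`, `conversion_rate`, `revenue_per_sale`, and `campaign_cost`, and the example values are all hypothetical and do not come from the paper.

```python
def expected_profit(exposures: float, click_rate: float, conversion_rate: float,
                    revenue_per_sale: float, campaign_cost: float) -> float:
    """Expected profit of one campaign under an assumed linear funnel.

    exposures -> clicks (click_rate) -> sales (conversion_rate) -> revenue.
    """
    for name, rate in (("click_rate", click_rate),
                       ("conversion_rate", conversion_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {rate}")
    sales = exposures * click_rate * conversion_rate
    return sales * revenue_per_sale - campaign_cost


def break_even_exposures(click_rate: float, conversion_rate: float,
                         revenue_per_sale: float, campaign_cost: float) -> float:
    """Exposures at which expected revenue exactly covers campaign_cost."""
    revenue_per_exposure = click_rate * conversion_rate * revenue_per_sale
    if revenue_per_exposure <= 0.0:
        return float("inf")  # no revenue per view: the campaign never pays off
    return campaign_cost / revenue_per_exposure


# Hypothetical inputs only; substitute measured values where available.
needed = break_even_exposures(click_rate=0.01, conversion_rate=0.02,
                              revenue_per_sale=50.0, campaign_cost=100.0)
print(f"break-even at about {needed:,.0f} exposures")
```

Because profit is linear in exposures while the cost is fixed per campaign under this model, any technique that multiplies exposures per campaign pushes the attacker past break-even, which is why a higher-exposure strategy raises the economic concern.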

            Published in: CEAS '11: Proceedings of the 8th Annual Collaboration, Electronic messaging, Anti-Abuse and Spam Conference (ACM Other conferences), September 2011, 230 pages
            ISBN: 9781450307888
            DOI: 10.1145/2030376

            Copyright © 2011 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher: Association for Computing Machinery, New York, NY, United States
