ABSTRACT
Collaborative functionality is an increasingly prevalent web technology. To encourage participation, these systems usually have low barriers to entry and permissive privileges. Unsurprisingly, ill-intentioned users try to leverage these characteristics for nefarious purposes. This work examines one such abuse, link spamming: the addition of promotional or otherwise inappropriate hyperlinks.
Our analysis focuses on the wiki model and, in particular, the collaborative encyclopedia Wikipedia. A principal goal of spammers is to maximize exposure, the number of people who view a link. Creating and analyzing the first Wikipedia link spam corpus, we find that existing spam strategies perform quite poorly in this regard. The status quo spamming model relies on link persistence to accumulate exposures, a strategy that fails given the diligence of the Wikipedia community. Instead, we propose a model that exploits the latency inherent in human anti-spam enforcement.
Statistical estimation suggests our novel model would produce significantly more link exposures than status quo techniques. More critically, the strategy could prove economically viable for perpetrators, incentivizing its exploitation. We therefore also address mitigation strategies.
Index Terms
Link spamming Wikipedia for profit