Review Article
The Challenge of Big Data and Data Science
- Henry E. Brady
- Vol. 22:297-323 (Volume publication date May 2019) https://doi.org/10.1146/annurev-polisci-090216-023229
- First published as a Review in Advance on January 21, 2019
Copyright © 2019 by Annual Reviews. All rights reserved
Abstract
Big data and data science are transforming the world in ways that spawn new concerns for social scientists, such as the impacts of the internet on citizens and the media, the repercussions of smart cities, the possibilities of cyber-warfare and cyber-terrorism, the implications of precision medicine, and the consequences of artificial intelligence and automation. Along with these changes in society, powerful new data science methods support research using administrative, internet, textual, and sensor-audio-video data. Burgeoning data and innovative methods facilitate answering previously hard-to-tackle questions about society by offering new ways to form concepts from data, to do descriptive inference, to make causal inferences, and to generate predictions. They also pose challenges as social scientists must grasp the meaning of concepts and predictions generated by convoluted algorithms, weigh the relative value of prediction versus causal inference, and cope with ethical challenges as their methods, such as algorithms for mobilizing voters or determining bail, are adopted by policy makers.
- Article Type: Review Article