ABSTRACT
Compilers are critical, widely used, and complex software. Bugs in them have significant impact and can cause serious damage when they silently miscompile a safety-critical application. An in-depth understanding of compiler bugs can help detect and fix them. To this end, we conduct the first empirical study on the characteristics of the bugs in two mainstream compilers, GCC and LLVM. Our study is significant in scale: it exhaustively examines about 50K bugs and 30K bug-fix revisions spanning more than a decade. This paper details our systematic study. Summary findings include: (1) in both compilers, C++ is the most buggy component, accounting for around 20% of the total bugs and twice as many as the second most buggy component; (2) the bug-revealing test cases are typically small, with 80% having fewer than 45 lines of code; (3) most of the bug fixes touch a single source file with small modifications (on average, 43 lines for GCC and 38 for LLVM); (4) the average lifetime of a bug is 200 days for GCC and 111 days for LLVM; and (5) high priority tends to be assigned to optimizer bugs; most notably, 30% of the bugs in GCC’s inter-procedural analysis component are labeled P1 (the highest priority). This study deepens our understanding of compiler bugs. For application developers, it shows that even mature production compilers still have many bugs, which may affect development. For researchers and compiler developers, it sheds light on interesting characteristics of compiler bugs and highlights challenges and opportunities for testing and debugging compilers more effectively.