Abstract
To meet the needs of a diverse range of workloads, asymmetric multicore processors (AMPs) have been proposed, which feature cores of different microarchitectures or ISAs. However, given the diversity inherent in their design and application scenarios, several challenges must be addressed to architect AMPs effectively and to leverage their potential for optimizing both sequential and parallel performance. Several recent techniques address these challenges. In this article, we present a survey of architectural and system-level techniques proposed for designing and managing AMPs. By classifying the techniques according to several key characteristics, we underscore their similarities and differences. We clarify the terminology used in this research field and identify challenges that are worthy of future investigation. We hope that, beyond synthesizing the existing work on AMPs, this survey will spark novel ideas for architecting future AMPs that can make a definite impact on the landscape of next-generation computing systems.
- Hongtao Zhong, Steven A. Lieberman, and Scott A. Mahlke. 2007. Extending multicore architectures to exploit hybrid parallelism in single-thread applications. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’07). 25--36. Google ScholarDigital Library
- Yuhao Zhu and Vijay Janapa Reddi. 2013. High-performance and energy-efficient mobile web browsing on big/little systems. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’13). 13--24. Google ScholarDigital Library