A Survey of Techniques for Architecting and Managing Asymmetric Multicore Processors

Author:
Sparsh Mittal

Oak Ridge National Laboratory, Tennessee, USA

ORCID: 0000-0002-2908-993X

ACM Computing Surveys, Volume 48, Issue 3, Article No. 45, pp. 1–38. https://doi.org/10.1145/2856125

Published: 08 February 2016

Abstract

To meet the needs of a diverse range of workloads, researchers have proposed asymmetric multicore processors (AMPs), which feature cores with different microarchitectures or ISAs. However, given the diversity inherent in their design and application scenarios, several challenges must be addressed to architect AMPs effectively and to leverage their potential for optimizing both sequential and parallel performance. Several recent techniques address these challenges. In this article, we present a survey of architectural and system-level techniques proposed for designing and managing AMPs. By classifying the techniques based on several key characteristics, we underscore their similarities and differences. We clarify the terminology used in this research field and identify challenges that are worthy of future investigation. We hope that, beyond synthesizing existing work on AMPs, this survey will spark novel ideas for architecting future AMPs that can make a definite impact on the landscape of next-generation computing systems.
Ying Zhang, Lide Duan, Bin Li, Lu Peng, and Srinivasan Sadagopan. 2014a. Energy efficient job scheduling in single-ISA heterogeneous chip-multiprocessors. In Proceedings of the International Symposium on Quality Electronic Design (ISQED’14). 660--666.Google ScholarCross Ref
Ying Zhang, Li Zhao, Ramesh Illikkal, Ravi Iyer, Andrew Herdrich, and Lu Peng. 2014b. QoS management on heterogeneous architecture for parallel applications. In Proceedings of the IEEE International Conference on Computer Design (ICCD’14). 332--339.Google ScholarCross Ref
Hongtao Zhong, Steven A. Lieberman, and Scott A. Mahlke. 2007. Extending multicore architectures to exploit hybrid parallelism in single-thread applications. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’07). 25--36. Google ScholarDigital Library
Yuhao Zhu and Vijay Janapa Reddi. 2013. High-performance and energy-efficient mobile web browsing on big/little systems. In Proceedings of the International Symposium on High Performance Computer Architecture (HPCA’13). 13--24. Google ScholarDigital Library

Index Terms

1. Computer systems organization
2. General and reference
  1. Document types
    1. Reference works

Published in
ACM Computing Surveys Volume 48, Issue 3

February 2016

619 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/2856149

Editor:

Sartaj Sahni
Department of Computer and Information Science and Engineering, University of Florida, Gainesville

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher

Association for Computing Machinery

New York, NY, United States
Publication History
- Published: 8 February 2016
- Accepted: 1 November 2015
- Revised: 1 August 2015
- Received: 1 April 2015
Author Tags
Review

asymmetric multicore processor

big/little system

classification

heterogeneous multicore architecture

reconfigurable AMP
Qualifiers
- survey
- Research
- Refereed