Concurrent Data Structures for Near-Memory Computing




Loading...







Concurrent Data Structures for Near-Memory Computing

concurrent data structures, with a signi•cantly simpler design KEYWORDS concurrent data structures; parallel programs; processing-in-memory; near-memory computing 1 NEAR-MEMORY COMPUTING „e performance gap between memory and CPU has grown ex-ponentially Memory vendors have focused mainly on improv-

Concurrent Data Structures for Near-Memory Computing

Data Structures + Hardware 6 • Tight integration between algorithmic design and hardware characteristics • Memory becomes an active component in managing data • Managing data structures in PIM • Old work: pointer chasing for sequential data structures • Our work: concurrent data structures

HybriDS: Cache-Conscious Concurrent Data Structures for Near

near-memory compute units, and between the host threads and near-memory compute units is required to meet high concurrency and correctness guarantees In summary, this paper makes the following contributions: •We propose hybrid data structures, which are cache-conscious concurrent data structures designed for new NMP architectures

Concurrent Data Structures with Near-Data-Processing:an

the-art concurrent data structures This paper makes the following contributions: •We define a generic near-data-processing (NDP) architecture that is well-suited for concurrent data structures •We implement actual software kernels of the NDP-based concurrent data structures on a cycle-accurate full-system

Design by Jiwon Choe, PhD, Brown University, May 2022 In

Abstract of \Concurrent Data Structures with Near-Memory Processing: Software-Hardware Co-Design" by Jiwon Choe, Ph D , Brown University, May 2022 In recent years, there has been a renewed interest in near-memory processing (NMP) architectures as a workaround for the performance and energy issues of frequent and irregular memory access,

LPM: A Systematic Methodology for Concurrent Data Access

and software simulators con?rm our theoretical ?ndings, and they show that the LPM approach can be applied in diverse computing platforms and can effectivelyguide performance optimization of memory systems Index Terms—Memory wall, memory stall time, ef?ciency, performance optimization, layered performance matching (LPM), memory

Searches related to concurrent data structures for near memory computing filetype:pdf

How to Design Data Structures for PIM? Zhiyu Liu, Irina Calciu, Maurice Herlihy, and Onur Mutlu, "Concurrent Data Structures for Near-Memory Computing" Proceedings of the29th ACM Symposium on Parallelism in Algorithms and Architectures(SPAA), Washington, DC, USA, July 2017 [Slides (pptx) ( pdf )] 12

Concurrent Data Structures for Near-Memory Computing 61212_3a11ee5221ad6f7d97ae0c8ffec8c62c65fa8.pdf

© 2017 VMware Inc. All rights reserved.Concurrent Data Structures for Near-Memory ComputingZhiyuLiu (Brown)Irina Calciu(VMware Research)Maurice Herlihy(Brown)OnurMutlu(ETH)

Concurrent Data Structures2Are used everywhere: kernel, libraries, applicationsIssues:•Difficult to design and implement •Data layout and memory/cache hierarchy play crucial role in performanceCacheMemory

The Memory Wall3CPUL1 cacheL2 cacheL3 cacheMemory< 10 nsTens of nsHundredsof ns2.2 GHz = 220K cycles during this timeData movement

Near Memory Computing4•Also called Processing In Memory (PIM)•Avoid data movement by doing computation in memory•Old idea •New advances in 3D integration and die-stacked memory•Viable in the near future

Near Memory Computing: Architecture5•Vaults: memory partitions•PIM cores: lightweight•Fast access to its own vault•Communication•Between a CPU and a PIM•Between PIMs•Via messages sent to buffersCPUCrossbarNetworkCPUPIMcoreVaultPIM memoryPIMcorePIMcoreCPUCPUVaultVaultDRAM

Data Structures + Hardware 6•Tight integration between algorithmic designand hardware characteristics•Memory becomes an active component in managing data •Managing data structures in PIM •Old work: pointer chasing for sequential data structures•Our work: concurrent data structures

CONTENTION7LOWHIGHCacheable Pointer-chasingGoals: PIM Concurrent Data Structures1. How do PIM data structures compare to state-of-the-art concurrent data structures?2. How to design efficient PIM data structures?

CONTENTION8LOWHIGHCacheable Pointer-chasingGoals: PIM Concurrent Data Structures1. How do PIM data structures compare to state-of-the-art concurrent data structures?2. How to design efficient PIM data structures?e.g., uniform distribution skiplist& linkedlist

9111134678101114447101010812141214716161616Pointer Chasing Data Structures

CONTENTION10LOWHIGHCacheable Pointer-chasingGoals: PIM Concurrent Data Structures1. How do PIM data structures compare to state-of-the-art concurrent data structures?2. How to design efficient PIM data structures?e.g., uniform distribution skiplist& linkedlist

Naïve PIM Skiplist11CPUCPUCPUadd(30)add(5)delete(80)PIM Memory VaultPIMcore111134678101114447101010812141214716161616Low Latency

Concurrent Data Structures12CPUCPUCPUadd(30)add(5)delete(80)111134678101114447101010812141214716161616DRAMHigh Latency

Flat Combining13CPUCPUCPUadd(30)add(5)delete(80)111134678101114447101010812141214716161616Sequential accessCombiner lockDRAMHigh Latency

01000000200000030000004000000500000060000007000000800000090000001000000005101520253014SkiplistThroughputLock-freeFC Ops/sThreadsIntel Xeon E7-4850v3 28 hardware threads, 2.2 GHz

PIM Performance15NSize ofthe skiplistpNumber of processesLCPULatencyof a memory access from the CPULLLCLatency ofa LLC access LATOMICLatencyof an atomic instruction (by the CPU)LPIMLatency of a memory access from the PIM core LMSGLatencyof a message from the CPU to the PIM core

PIM Performance16LCPU= r1LPIM= r2LLLCLMSG= LCPUr1 = r2 = 3

PIM Performance17AlgorithmThroughputLock-freep / ( B * LCPU)Flat Combining (FC)1 / ( B * LCPU)PIM1 / ( B * LPIM+ LMSG )B= average number of nodes accessed during a skiplistoperation

01000000200000030000004000000500000060000007000000800000090000001000000005101520253018SkiplistThroughputLock-freeOps/sThreadsPIM (expected)FC

CONTENTION19LOWHIGHCacheable Pointer-chasingGoals: PIM Concurrent Data Structures1. How do PIM data structures compare to state-of-the-art concurrent data structures?2. How to design efficient PIM data structures?e.g., uniform distribution skiplist& linkedlist

New PIM algorithm: Exploit Partitioning20CPUCrossbarNetworkCPUPIMcoreVaultPIM memoryPIMcorePIMcoreCPUCPUVaultVaultDRAM

PIM Skiplistw/ Partitioning2111113467810112026447101010812202012207CPUCPUCPU11020CPUcacheVault1Vault2Vault3PIMcorePIMcorePIMcorePIM Memory

PIM Performance22AlgorithmThroughputLock-freep / ( B * LCPU)Flat Combining (FC)1 / ( B * LCPU)PIM1 / ( B * LPIM+ LMSG )FC + kpartitionsk / ( B * LCPU )PIM + kpartitionsk/(B *LPIM+LMSG )B= average number of nodes accessed during a skiplistoperation

23010000002000000300000040000005000000600000070000008000000900000010000000051015202530Lock-freeFC Ops/sThreadsFC w/ 16 partitions FC w/ 8 partitions FC w/ 4 partitions SkiplistThroughput

24020000004000000600000080000001000000012000000140000001600000018000000051015202530Lock-freeFC Ops/sThreadsFC w/ 16 partitions FC w/ 8 partitions FC w/ 4 partitions PIM w/ 8 partitions (expected) PIM w/ 16 partitions (expected) SkiplistThroughput

CONTENTION25LOWHIGHCacheable Pointer-chasingGoals: PIM Concurrent Data Structures1. How do PIM data structures compare to state-of-the-art concurrent data structures?2. How to design efficient PIM data structures?e.g., FIFO queue

26HEADTAILenqueuedequeueFIFO Queue

PIM FIFO Queue27TailCPUCPUPIMcoreDeq()Deq()1dequeuesanode2retrievesrequest3sendsbackthenode1retrievesrequestPIM Memory Vault

Pipelining2812Timeline3Deq()Deq()Deq()123123Canoverlap the execution of the next request ParallelizeEnqsandDeqs29HeadTailCPUCPUCPUVault2PIMcoreVault1PIMcoreenqsdeqs

Conclusion30PIM is becoming feasible in the near futureWe investigate Concurrent Data Structures (CDS) for PIMResults:•Naïve PIM data structures are less efficient than CDS•New PIM algorithms can leverage PIM features •They outperform efficient CDS•They are simpler to design and implement

Thankyou!https://research.vmware.com/icalciu@vmware.com

© 2017 VMware Inc. All rights reserved.Concurrent Data Structures for Near-Memory ComputingZhiyuLiu (Brown)Irina Calciu(VMware Research)Maurice Herlihy(Brown)OnurMutlu(ETH)

33

Politique de confidentialité -Privacy policy