Peripherals Bus Delivery

一、HCCS

Cache一致性总线HCCS，可以实现CPU和CPU之间的高速互联，通信速率高达每秒30GT，是业界主流CPU互联速率的2倍多。

通过多CPU互联，我们率先实现256个物理核的NUMA架构，从而推出业界首款兼容ARM架构的最强算力4路服务器。
异构计算的兴起，使得CPU与NPU之间的互联协议也很关键。华为创新性的将HCCS同样应用于CPU与NPU的高速互联
，构建了xPU间的统一Cache一致性架构，xPU之间可以进行直接内存访问，实现高速数据交互。同时基于此架构，
可实现通用算力和AI算力的灵活组合，打造最强算力的异构计算服务器。

二、HBM HIGH Bandwidth memory

        
https://china.xilinx.com/support/documentation/white_papers/c_wp485-hbm.pdf
硅与硅的堆叠结构允许通过非常小、分布非常密集的微凸块来连接相邻的硅器件

三、CXL

https://zhuanlan.zhihu.com/p/65435956

        
      
https://www.techdesignforums.com/practice/technique/compute-express-link-cxl-protocols/
The CXL standard defines three protocols that are dynamically multiplexed together before 
being transported via a standard PCIe 5.0 PHY at 32GT/s.

The CXL.io protocol is an enhanced version of a PCIe 5.0 protocol that can be used for 
initialization, link-up, device discovery and enumeration, and register access. It 
provides a non-coherent load/store interface for I/O devices.

The CXL.cache protocol defines interactions between a host and a device, allowing attached
 CXL devices to efficiently cache host memory with extremely low latency using a request and 
 response approach.

The CXL.mem protocol provides a host processor with access to the memory of an attached 
device using load and store commands, with the host CPU acting as a master and the CXL
 device acting as a subordinate. This approach can support both volatile and persistent memory architectures.

https://www.synopsys.com/designware-ip/technical-bulletin/compute-express-link-standard-2019q3.html

Peripherals Bus Delivery

一、HCCS

二、HBM HIGH Bandwidth memory

三、CXL

Further Reading

The Kernel Concurrency Sanitizer (KCSAN)

Kernel Probe (kprobe)

Kernel userfaultfd