

# Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

Zi Yu Xue\*, Yannan Nellie Wu\*, Joel S. Emer\*^, Vivienne Sze\*

### **Sparse Tensors are Large and Highly Sparse**

- Tensor computation relies on tiling to improve data reuse and arithmetic intensity. Larger tiles maximize data reuse.
- Operations on sparse tensors, particularly those with multiple sparse operands, are especially challenging to tile effectively because they have even lower arithmetic intensity.
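The relationship between tile size, sparsity, and arithmetic intensity can be illustrated with a minimal sketch. The model below is an assumption made for illustration and is not taken from the poster: it considers one square output block of a matrix multiplication, treats nonzeros as uniformly random and independent, and ignores metadata and output traffic. The function name and its parameters are hypothetical.

```python
def arithmetic_intensity(tile, k, density_a=1.0, density_b=1.0):
    """Effectual multiply-accumulates per operand element fetched.

    Assumed model (illustrative only): a tile x tile output block with
    reduction length k, uniformly random nonzeros, and no metadata or
    output traffic.
    """
    # A multiply is effectual only when both operand elements are nonzero.
    macs = density_a * density_b * tile * tile * k
    # Only the nonzeros of each operand tile are fetched.
    fetched = (density_a + density_b) * tile * k
    return macs / fetched


for tile in (32, 64):
    dense = arithmetic_intensity(tile, k=8)
    sparse = arithmetic_intensity(tile, k=8, density_a=0.1, density_b=0.1)
    print(f"tile={tile}: dense={dense:.1f}, both operands 10% dense={sparse:.1f}")
```

Under this model, doubling the tile size doubles the intensity, while making both operands sparse lowers it, because effectual work shrinks with the product of the densities and fetched data shrinks only with their sum.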



### **Current Tiling Approaches are Insufficient**





- Same number of nonzeros per tile $\Rightarrow$ Ideal buffer utilization
- Varying coordinate range in second sparse operand $\Rightarrow$ Hard to tile second operand

### **Uniform Shape**



- All tiles must fit in buffer $\Rightarrow$ Low buffer utilization
- Fixed coordinate ranges $\Rightarrow$ Easy to tile both operands
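The trade-off between the two tiling styles can be sketched on a one-dimensional example. This is a minimal illustration under assumed inputs (a synthetic coordinate list with a dense region and a sparse region); the helper names `uniform_shape_tiles` and `uniform_occupancy_tiles` are hypothetical and do not come from the poster.

```python
import random


def uniform_shape_tiles(coords, dim, tile_width):
    """Split [0, dim) into fixed-width coordinate ranges; count nonzeros in each."""
    counts = [0] * ((dim + tile_width - 1) // tile_width)
    for c in coords:
        counts[c // tile_width] += 1
    return counts


def uniform_occupancy_tiles(coords, nnz_per_tile):
    """Group sorted coordinates so that every tile holds the same number of
    nonzeros. Returns the (first, last) coordinate covered by each tile."""
    coords = sorted(coords)
    return [(coords[i], coords[min(i + nnz_per_tile, len(coords)) - 1])
            for i in range(0, len(coords), nnz_per_tile)]


random.seed(0)
dim = 1024
# Assumed skewed sparsity: 96 nonzeros in [0, 128), 32 nonzeros in [128, 1024).
coords = random.sample(range(0, 128), 96) + random.sample(range(128, dim), 32)

# Uniform shape: the buffer must be sized for the fullest tile.
counts = uniform_shape_tiles(coords, dim, tile_width=128)
buffer_capacity = max(counts)
utilization = sum(counts) / (buffer_capacity * len(counts))
print("uniform shape: nonzeros per tile =", counts)
print("uniform shape: average buffer utilization =", round(utilization, 3))

# Uniform occupancy: every tile is full, but coordinate ranges differ.
ranges = uniform_occupancy_tiles(coords, nnz_per_tile=32)
widths = [hi - lo + 1 for lo, hi in ranges]
print("uniform occupancy: coordinate range width per tile =", widths)
```

In this constructed input the fixed-width tiles leave most of the buffer empty on average (128 nonzeros spread over 8 tiles sized for 96), while the equal-occupancy tiles span coordinate ranges of different widths, which is what makes a second sparse operand hard to partition along the same boundaries.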

## \*MIT, ^NVIDIA



enables tile size estimation and does not need the exact tile size, since tiles that do not fit entirely in the buffer are still supported by the hardware
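The point above, that an estimated tile size suffices because the hardware tolerates tiles that overflow the buffer, can be sketched as follows. This is a hypothetical illustration and not the authors' Swiftiles algorithm: it assumes tiles are groups of consecutive rows, estimates the tile size from the average row density alone, and compares that with a conservative size at which every tile must fit. All function names and the sample data are assumptions.

```python
def tile_occupancies(row_nnz, rows_per_tile):
    """Nonzero count of each tile when rows are grouped consecutively."""
    return [sum(row_nnz[i:i + rows_per_tile])
            for i in range(0, len(row_nnz), rows_per_tile)]


def overbooked_fraction(row_nnz, rows_per_tile, buffer_capacity):
    """Fraction of tiles whose nonzero count exceeds the buffer capacity."""
    tiles = tile_occupancies(row_nnz, rows_per_tile)
    return sum(1 for occ in tiles if occ > buffer_capacity) / len(tiles)


def estimate_rows_per_tile(row_nnz, buffer_capacity):
    """Estimate a tile size from the average density only (assumed heuristic);
    no exact per-tile sizes are computed."""
    mean_nnz = sum(row_nnz) / len(row_nnz)
    return max(1, int(buffer_capacity / mean_nnz))


def conservative_rows_per_tile(row_nnz, buffer_capacity):
    """Largest tile size at which no tile overflows (0 if none exists)."""
    fitting = [r for r in range(1, len(row_nnz) + 1)
               if overbooked_fraction(row_nnz, r, buffer_capacity) == 0]
    return max(fitting, default=0)


# Assumed per-row nonzero counts with a single unusually dense row.
row_nnz = [2, 3, 1, 2, 12, 2, 1, 3, 2, 2, 1, 3, 2, 2, 1, 3]
capacity = 12

estimated = estimate_rows_per_tile(row_nnz, capacity)
conservative = conservative_rows_per_tile(row_nnz, capacity)
print("estimated rows per tile:", estimated)
print("tile occupancies:", tile_occupancies(row_nnz, estimated))
print("fraction overbooked:", overbooked_fraction(row_nnz, estimated, capacity))
print("rows per tile if every tile must fit:", conservative)
```

With this input, one dense row forces the conservative choice down to a single row per tile, whereas the density-based estimate uses larger tiles and leaves only a minority of tiles overbooked. The sketch relies on overflowing tiles still being handled by the hardware, as the text states.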



# Conclusion

Tiling is key to improving data reuse and thus reducing memory traffic for sparse tensor algebra applications. We balance the tiling strategy's adaptability and efficiency by overbooking. We support overbooking in hardware with Tailors and speculatively tile with Swiftiles.

MIT AI Hardware Program, NSERC PGS-D