

# LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space

## Introduction

- Data movement is expensive, especially to and from off-chip DRAM.



# Methodology (1)

### (1) Explore systematically: choose the right representation of the design space

### Insight 1.1

- Reuse opportunity across DNN layers: the output of one layer is the input of the next, so it can be kept on-chip and reused $\rightarrow$ fewer DRAM transfers.

A challenge: **on-chip memory is limited**.
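A back-of-the-envelope sketch of why fusion helps and where the capacity limit bites. It assumes a hypothetical two-layer chain in which layer 1 writes an intermediate tensor that layer 2 reads, and it ignores weights; the function names and sizes are illustrative, not taken from LoopTree.

```python
def dram_traffic(in_elems: int, inter_elems: int, out_elems: int, fused: bool) -> int:
    """Elements moved to/from DRAM for a hypothetical two-layer chain (weights ignored)."""
    traffic = in_elems + out_elems      # read the chain's input, write its final output
    if not fused:
        traffic += 2 * inter_elems      # write the intermediate to DRAM, then read it back
    return traffic


def fits_on_chip(inter_elems: int, buffer_elems: int) -> bool:
    """Fusing whole layers only works if the entire intermediate fits in the buffer."""
    return inter_elems <= buffer_elems


# Illustrative sizes: 56x56x64 activations everywhere, a 128K-element buffer.
act = 56 * 56 * 64
print(dram_traffic(act, act, act, fused=False))  # 802816
print(dram_traffic(act, act, act, fused=True))   # 401408
print(fits_on_chip(act, buffer_elems=128 * 1024))  # False
```

The last line is the catch: in this example fusion halves DRAM traffic, but the whole intermediate does not fit on-chip, which is why fusion has to be combined with choices such as tiling.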

Address challenges of fusion via comprehensive exploration.

There are many ways to fuse. Specifically, we highlight three design choices: dataflow, tiling, and recomputation.

Gap: prior work is scattered and not comprehensive.





Need to represent only two; the remainder is implied.

### Insight 1.2



Note: many features of LoopTree are not shown, including parallelization, pipelining, and storing data in multiple levels of memory.

Michael Gilbert, Yannan Nellie Wu, Joel S. Emer, and Vivienne Sze

#### Choose exactly one: refetch, reuse, recompute

- Refetch: store in DRAM and refetch, at the cost of **expensive DRAM transfers**.
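To make the three-way choice concrete, here is a minimal sketch for one common case, which is an assumption rather than something shown on the poster: when a fused conv-to-conv chain is tiled by rows, adjacent tiles of the intermediate tensor overlap by `kernel - 1` rows, and each option pays for that overlap in a different resource. `OverlapCost`, `overlap_cost`, and its parameters are hypothetical, and the model ignores stride, padding, and the extra producer inputs that recomputation needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlapCost:
    extra_buffer_elems: int  # on-chip capacity held from one tile to the next
    extra_macs: int          # redundant MACs per tile
    extra_dram_elems: int    # extra DRAM writes + reads per tile


def overlap_cost(choice: str, kernel: int, row_elems: int, macs_per_elem: int) -> OverlapCost:
    """Per-tile cost of the (kernel - 1) intermediate rows shared by adjacent
    row tiles of a hypothetical fused conv -> conv chain."""
    halo = (kernel - 1) * row_elems  # intermediate elements shared by adjacent tiles
    if choice == "reuse":      # keep the halo rows in the on-chip buffer
        return OverlapCost(halo, 0, 0)
    if choice == "recompute":  # produce the halo rows again (producer inputs ignored)
        return OverlapCost(0, halo * macs_per_elem, 0)
    if choice == "refetch":    # write the halo to DRAM, read it back for the next tile
        return OverlapCost(0, 0, 2 * halo)
    raise ValueError(f"unknown choice: {choice!r}")


# Illustrative sizes: consumer kernel of 3, rows of 56 x 64 channels,
# and 9 * 64 MACs to produce one intermediate element (a 3x3 producer conv).
for choice in ("reuse", "recompute", "refetch"):
    print(choice, overlap_cost(choice, kernel=3, row_elems=56 * 64, macs_per_elem=9 * 64))
```

Each option spends a different resource (buffer capacity, compute, or DRAM bandwidth), so the best choice depends on the tiling and the available on-chip memory.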



# Methodology (2)

### Insight 2.1

DNNs are compute- and data-intensive, so simulation takes a long time. However, their operations follow a predictable pattern.

*Figure: element of B produced/consumed.*

Can use a compact data structure to represent hardware states and actions over time.

Benefit of choosing the right representation: The LoopTree mapping abstraction makes calculating these states and actions easy.
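A minimal sketch of the idea, assuming a hypothetical 1D chain in which layer 2 is a stride-1 convolution with `kernel` taps, tiled over its output rows with `tile` rows per step. The rows of the intermediate tensor B needed at each step form an interval with a closed form, so a model can compute them in O(1) per step instead of enumerating elements as a naive simulator would. The function names are illustrative; this is not LoopTree's actual data structure.

```python
def needed_b_rows(step: int, tile: int, kernel: int) -> range:
    """Rows of B needed to produce output tile `step`, computed in closed form."""
    start = step * tile
    return range(start, start + tile + kernel - 1)


def needed_b_rows_bruteforce(step: int, tile: int, kernel: int) -> set[int]:
    """The same set, enumerated element by element."""
    rows = set()
    for out_row in range(step * tile, (step + 1) * tile):
        for k in range(kernel):  # each output row reads `kernel` consecutive rows of B
            rows.add(out_row + k)
    return rows


def new_b_rows(step: int, tile: int, kernel: int) -> range:
    """Rows of B that must be produced at `step` (not needed by the previous step)."""
    cur = needed_b_rows(step, tile, kernel)
    if step == 0:
        return cur
    prev = needed_b_rows(step - 1, tile, kernel)
    return range(prev.stop, cur.stop)


for step in range(3):
    print(step, needed_b_rows(step, tile=4, kernel=3), new_b_rows(step, tile=4, kernel=3))
```

After the first step, every step needs exactly `tile` new rows of B; this kind of regularity is what lets a compact representation replace step-by-step simulation.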

### Insight 2.2

Frame analysis as set operations on data.

- **Data to reuse from buffer** = (data required for operations) ∩ (data in buffer)
- **Data to refetch from DRAM** = (required data − reused data) ∩ (data in DRAM)
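The two formulas map directly onto set operations. Below is a minimal Python sketch over sets of row indices; `classify` is a hypothetical name, and the third set (data that must be recomputed) is an assumption derived from the "choose exactly one: refetch, reuse, recompute" rule, not a formula from the poster. A real model would use a compact representation such as intervals rather than explicit Python sets.

```python
def classify(required: set[int], in_buffer: set[int],
             in_dram: set[int]) -> tuple[set[int], set[int], set[int]]:
    reused = required & in_buffer              # required ∩ buffer
    refetched = (required - reused) & in_dram  # (required − reused) ∩ DRAM
    # Assumption: whatever is still missing must be recomputed,
    # following the "choose exactly one" rule.
    recomputed = required - reused - refetched
    return reused, refetched, recomputed


# Rows 4..9 of an intermediate tensor are needed; the buffer still holds rows 0..5,
# and only rows 6 and 7 were spilled to DRAM.
reused, refetched, recomputed = classify(set(range(4, 10)), set(range(0, 6)), {6, 7})
print(sorted(reused), sorted(refetched), sorted(recomputed))  # [4, 5] [6, 7] [8, 9]
```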

### (2) Evaluate designs with a versatile model

Need a fast and accurate model that supports a wide design space



Eliminate assumptions about a specific dataflow or tiling.

### Results

#### Comprehensive exploration enables efficient fusion

On-chip memory capacity required for fusing ResNet (lower is better)



# Conclusion

For more efficient designs:

- consider a wide design space,
- explore systematically, and
- use a versatile model

# Learn More

Appeared in TCAS-AI and ISPASS 2023.



# Acknowledgements

This work is sponsored by the MIT AI Hardware Program