# ARM Compute Library introduction

Nesterov Alexander, Obolenskiy Arseniy

**ITLab** 

November 15, 2024

### Contents

- ARM Compute Library
- Build ACL
- Return to examples
- ACL operators
- ACL activation operator
- Validate activation operator
- TensorInfo for operators
- Configure activation operator
- Tensor for operators
- Run activation operator
- Get ONNX model

## **ARM Compute Library**

### Compute Library latest release 24.09

The Compute Library is a collection of low-level machine learning functions optimized for the Arm® Cortex®-A, Arm® Neoverse® and Arm® Mali™ GPU architectures.

The library provides superior performance to other open source alternatives and immediate support for new Arm® technologies e.g. SVE2.

#### Key Features:

- Open source software available under a permissive MIT license
- Over 100 machine learning functions for CPU and GPU
- Multiple convolution algorithms (GeMM, Winograd, FFT, Direct and indirect-GeMM)
- Support for multiple data types: FP32, FP16, INT8, UINT8, BFLOAT16
- Micro-architecture optimization for key ML primitives
- Highly configurable build options enabling lightweight binaries
- Advanced optimization techniques such as kernel fusion, fast math enablement and texture utilization
- Device and workload specific tuning using OpenCL tuner and GeMM optimized heuristics

| Repository  | Link                                                             |
|-------------|------------------------------------------------------------------|
| Release     | https://github.com/arm-software/ComputeLibrary                   |
| Development | https://review.mlplatform.org/#/admin/projects/ml/ComputeLibrary |

Source: https://github.com/ARM-software/ComputeLibrary

## Supported Architectures/Technologies

- Arm® CPUs:
  - Arm® Cortex®-A processor family using Arm® Neon™ technology
  - Arm® Neoverse® processor family
  - Arm® Cortex®-R processor family with Armv8-R AArch64 architecture using Arm® Neon™ technology
  - Arm® Cortex®-X1 processor using Arm® Neon™ technology
- Arm® Mali™ GPUs:
  - Arm® Mali™-G processor family
  - Arm® Mali™-T processor family
- x86

Source: https://github.com/ARM-software/ComputeLibrary



## Supported Systems

- Android™
- Bare Metal
- Linux®
- OpenBSD®
- macOS®
- Tizen™

Source: https://github.com/ARM-software/ComputeLibrary



## **Build ACL**

### **Building for macOS**

To natively compile the library with accelerated CPU support:

`scons Werror=1 -j8 neon=1 opencl=0 os=macos arch=armv8.2-a build=native`

Source: https://artificial-intelligence.sites.arm.com/computelibrary/latest/how\_to\_build.xhtml

## Return to examples



Source: docs.openvino.ai

## ACL operators

#### Supported Operators

Compute Library supports the operators listed in the operators table of its documentation.

Compute Library supports a wide range of data types; details can be found directly in the documentation of each kernel/function. The main data types that the Machine Learning functions support are the following:

- BFLOAT16: 16-bit non-standard brain floating point
- QASYMM8: 8-bit unsigned asymmetric quantized
- QASYMM8\_SIGNED: 8-bit signed asymmetric quantized
- QSYMM8\_PER\_CHANNEL: 8-bit signed symmetric quantized (Used for the weights)
- QSYMM8: 8-bit unsigned symmetric quantized
- QSYMM16: 16-bit unsigned symmetric quantized
- F32: 32-bit single precision floating point
- F16: 16-bit half precision floating point
- S32: 32-bit signed integer
- U8: 8-bit unsigned char
- All: Agnostic to any specific data type
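To make "8-bit unsigned asymmetric quantized" concrete, here is a minimal sketch of the usual affine quantization scheme: a real value is divided by a scale, shifted by an offset (zero point) and clamped to the range 0..255. The type `QuantParams` and the functions `quantize_u8` and `dequantize_u8` are hypothetical names used only for illustration. They are not part of the Compute Library API, and the library's own rounding policy may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper type (not an ACL class): parameters of an affine quantization.
struct QuantParams {
    float scale;   // real-valued step between two adjacent quantized levels
    int   offset;  // quantized code that represents the real value 0.0 (zero point)
};

// q = round(x / scale) + offset, clamped to the unsigned 8-bit range [0, 255].
inline std::uint8_t quantize_u8(float x, const QuantParams& qp) {
    const int q = static_cast<int>(std::lround(x / qp.scale)) + qp.offset;
    return static_cast<std::uint8_t>(std::max(0, std::min(255, q)));
}

// Inverse mapping: x = scale * (q - offset).
inline float dequantize_u8(std::uint8_t q, const QuantParams& qp) {
    return qp.scale * static_cast<float>(static_cast<int>(q) - qp.offset);
}
```

With `scale = 0.5` and `offset = 10`, the real value 1.0 maps to code 12 and back to 1.0, while values outside the representable range saturate at 0 or 255.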

Compute Library supports the following data layouts (fast changing dimension from right to left):

- NHWC: The native layout of Compute Library that delivers the best performance where channels are in the fastest changing dimension
- NCHW: Legacy layout where width is in the fastest changing dimension
- NDHWC: New data layout for supporting 3D operators
- All: Agnostic to any specific data layout

where N = batches, C = channels, H = height, W = width, D = depth
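The difference between NHWC and NCHW can be sketched as two ways of flattening the same logical coordinate (n, c, h, w) into a linear element index. The sketch assumes a dense buffer without padding. Real Compute Library tensors may carry padding and strides, so this is an illustration of the ordering only. `Dims4`, `offset_nhwc` and `offset_nchw` are hypothetical names, not library functions.

```cpp
#include <cstddef>

// Hypothetical helper (not part of ACL): logical extents of a 4D tensor.
struct Dims4 {
    std::size_t n, c, h, w;
};

// NHWC: channels change fastest, then width, then height, then batch.
inline std::size_t offset_nhwc(const Dims4& d, std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w) {
    return ((n * d.h + h) * d.w + w) * d.c + c;
}

// NCHW: width changes fastest, then height, then channels, then batch.
inline std::size_t offset_nchw(const Dims4& d, std::size_t n, std::size_t c,
                               std::size_t h, std::size_t w) {
    return ((n * d.c + c) * d.h + h) * d.w + w;
}
```

For a 1x3x2x2 tensor (N=1, C=3, H=2, W=2), stepping to the next channel moves one element in NHWC but four elements (H*W) in NCHW, which is why per-pixel channel loops are contiguous in NHWC.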

Source: https://artificial-intelligence.sites.arm.com/computelibrary/latest/operators\_list.xhtml



## ACL activation operator



Source: https://artificial-intelligence.sites.arm.com/computelibrary/latest/operators\_list.xhtml

## Validate activation operator



Source: Description of activation operator

## TensorInfo for operators

### TensorInfo() [10/11]

```
TensorInfo ( const TensorShape & tensor_shape, size_t num_channels, DataType data_type, DataLayout data_layout )
```

Constructor.

#### **Parameters**

- [in] tensor\_shape: It specifies the size for each dimension of the tensor in number of elements.
- [in] num\_channels: It indicates the number of channels for each tensor element.
- [in] data\_type: Data type to use for each tensor element.
- [in] data\_layout: The data layout setting for the tensor data.

Source: Description of TensorInfo
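As a rough illustration of how the constructor parameters relate to memory, the sketch below computes the number of bytes a dense tensor would need from its shape, the number of channels per element and the element type. It deliberately uses hypothetical stand-ins (`ElemType`, `element_size`, `dense_bytes`) instead of the library's `TensorShape` and `DataType`, and it ignores padding, so the size reported by a real `TensorInfo` may be larger.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not ACL types.
enum class ElemType { U8, F16, F32 };

// Bytes occupied by one channel of one element.
inline std::size_t element_size(ElemType t) {
    switch (t) {
        case ElemType::U8:  return 1;
        case ElemType::F16: return 2;
        case ElemType::F32: return 4;
    }
    return 0;
}

// Dense size in bytes: product of all dimension sizes
// x channels per element x bytes per channel.
inline std::size_t dense_bytes(const std::vector<std::size_t>& shape,
                               std::size_t num_channels, ElemType t) {
    std::size_t elements = 1;
    for (std::size_t dim : shape) {
        elements *= dim;
    }
    return elements * num_channels * element_size(t);
}
```

For example, a shape of 3 x 224 x 224 x 1 with one F32 channel per element gives 3 * 224 * 224 * 4 = 602112 bytes.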



## Configure activation operator



Source: Description of activation operator



## Tensor for operators

**Public Member Functions**

| Return type         | Member function                                                 | Description                                                                          |
|---------------------|-----------------------------------------------------------------|--------------------------------------------------------------------------------------|
|                     | `Tensor (IRuntimeContext *ctx=nullptr)`                         | Constructor.                                                                         |
|                     | `~Tensor ()=default`                                            | Destructor: free the tensor's memory.                                                |
|                     | `Tensor (Tensor &&)=default`                                    | Allow instances of this class to be move constructed.                                |
| `Tensor &`          | `operator= (Tensor &&)=default`                                 | Allow instances of this class to be moved.                                           |
| `TensorAllocator *` | `allocator ()`                                                  | Return a pointer to the tensor's allocator.                                          |
| `ITensorInfo *`     | `info () const override`                                        | Interface to be implemented by the child class to return the tensor's metadata.      |
| `ITensorInfo *`     | `info () override`                                              | Interface to be implemented by the child class to return the tensor's metadata.      |
| `uint8_t *`         | `buffer () const override`                                      | Interface to be implemented by the child class to return a pointer to CPU memory.    |
| `void`              | `associate_memory_group (IMemoryGroup *memory_group) override`  | Associates a memory manageable object with the memory group that manages it.         |

Source: Description of Tensor

## Run activation operator



- All the kernels are enqueued on the queue associated with CLScheduler.
- The queue is then flushed.

#### Note

The function does not wait for the kernels to finish executing; it is the user's responsibility to wait.

`prepare()` is called on the first run if it has not been called already.

Implements IFunction.

Source: Description of activation operator



## Get ONNX model



Source: https://docs.ultralytics.com/integrations/onnx/