KISS Seminars

Optimization and Deployment of Deep Neural Networks for Heterogeneous Systems

The increasing popularity of neural networks means they must also run on embedded systems. These are typically heterogeneous hardware platforms comprising, for example, multi-core CPUs with GPUs as well as dedicated AI accelerators, and they are constrained in resources such as memory and power consumption. As a result, common network architectures such as ResNet, VGG or U-Net either execute very slowly or fail to exploit the available hardware.


To nevertheless execute pre-trained neural networks efficiently at inference time, they must be optimized for the specific use case and adapted to the target hardware while maintaining model accuracy. For application-specific optimization, compression methods such as quantization or pruning can significantly reduce the number of required computational operations. The simplified network can then be adapted to the target platform with dedicated tools, so-called Deep Learning (DL) compilers such as TVM or TensorRT, so that the hardware is used optimally. Substantial performance improvements can be achieved this way. In practice, however, high complexity and a lack of interoperability between the tools make this difficult.
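
To give a flavor of such a toolchain, the following minimal sketch compiles an ONNX model with the TVM Python API and runs a single inference. The file name resnet18.onnx as well as the input name and shape are assumptions for illustration; a real project would adjust these to its own model and target hardware.

    # Minimal sketch: compiling an ONNX model with TVM for a CPU target.
    # File name, input name and input shape are illustrative assumptions.
    import numpy as np
    import onnx
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    onnx_model = onnx.load("resnet18.onnx")   # hypothetical model file
    shape_dict = {"input": (1, 3, 224, 224)}  # assumed input name and shape

    # Import the ONNX graph into TVM's Relay intermediate representation.
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    # Compile for a generic CPU; "cuda" or other targets work analogously.
    target = "llvm"
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

    # Run one inference with random data to sanity-check the compiled module.
    dev = tvm.device(target, 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input("input", np.random.rand(1, 3, 224, 224).astype("float32"))
    module.run()
    print(module.get_output(0).numpy().shape)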

Our seminar therefore provides basic know-how about the respective tools as well as their practical usage, so that neural networks can be adapted quickly to given requirements and constraints.

We will address the following topics:

  • Basics of the DL compilers TensorRT and TVM
  • Basics of exchange formats, e.g. ONNX (see the export sketch after this list)
  • Application of TensorRT and TVM in detail
    • Practical examples
    • Throughput maximization on GPUs
    • Inference with reduced computational accuracy
    • Best practice
  • Performance profiling to identify bottlenecks
  • Basics of compression tools, e.g. compression in PyTorch / TFLite, Intel Labs Distiller, Microsoft Neural Network Intelligence (NNI)
  • Interaction between compression tools and DL compilers
  • Integration and deployment of neural networks in C/C++
  • Target platforms, including CPUs (x86/ARM), Nvidia desktop GPUs, Nvidia Jetson boards, the Google Coral Edge TPU and NPUs
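
As a taste of the exchange-format topic, here is a minimal sketch that exports a PyTorch model to ONNX. The torchvision model and the file name are chosen purely for illustration; the resulting file could, for example, serve as input to the TVM sketch above.

    # Minimal sketch: exporting a PyTorch model to the ONNX exchange format.
    # Model choice and file name are illustrative assumptions.
    import torch
    import torchvision

    model = torchvision.models.resnet18(weights=None).eval()
    dummy_input = torch.randn(1, 3, 224, 224)  # example input used for tracing

    torch.onnx.export(
        model,
        dummy_input,
        "resnet18.onnx",
        input_names=["input"],
        output_names=["output"],
        opset_version=13,
    )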

Target group:
Our seminar is aimed at software developers who already have hands-on experience with neural networks, as well as at companies that are active in the field of AI and embedded systems or would like to become so.

The seminar materials as well as practical examples can be viewed and downloaded on our GitLab portal.

Advanced Training Methods for Deep Neural Networks


The development of more accurate AI models, especially in audio, image and video signal processing, requires ever more computing power in addition to extensive data sets.
On the one hand, the number of learnable parameters of a neural network grows; on the other hand, the resolution of the data increases as well. This can quickly exceed the capabilities of a single computer, for example because too little GPU memory is available or training simply takes too long.
Multi-GPU systems or even larger computing clusters promise a remedy. However, the neural network then has to be adapted manually to the capabilities of the hardware, for example by partitioning it accordingly.
The goal of the seminar is to train a neural network as efficiently as possible by making the best possible use of both the deep learning frameworks and the hardware.
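
To make the partitioning idea concrete, here is a minimal sketch of manually splitting a model across two GPUs in PyTorch; the toy architecture and the assumption of two available CUDA devices are purely illustrative.

    # Minimal sketch: manual model partitioning ("model parallelism") across
    # two GPUs in PyTorch. Toy layers and device names are assumptions.
    import torch
    import torch.nn as nn

    class TwoStageNet(nn.Module):
        def __init__(self):
            super().__init__()
            # The first half of the network lives on GPU 0, the second half
            # on GPU 1, so no single device has to hold all parameters.
            self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
            self.stage2 = nn.Sequential(nn.Linear(4096, 10)).to("cuda:1")

        def forward(self, x):
            x = self.stage1(x.to("cuda:0"))
            # Move the intermediate activations over to the second device.
            return self.stage2(x.to("cuda:1"))

    model = TwoStageNet()
    out = model(torch.randn(8, 1024))
    print(out.device)  # cuda:1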


In detail, the seminar is divided into the following chapters:

  • Basics of resource consumption during training
  • Use of "Activation Checkpointing"
  • Data Parallel Training
  • Model Parallel Training
  • Pipeline Parallel Training
  • Distributed training
  • Mixed-precision training / use of tensor cores (see the sketch after this list)
  • Input pipelines & data handling
  • Data dependencies, receptive field & scaling
  • Performance profiling
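
As one example from this list, the following minimal sketch shows mixed-precision training with torch.cuda.amp; the model, the random data and the optimizer are illustrative placeholders.

    # Minimal sketch: mixed-precision training with torch.cuda.amp.
    # Model, random data and optimizer are illustrative placeholders.
    import torch
    import torch.nn as nn

    model = nn.Linear(512, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    scaler = torch.cuda.amp.GradScaler()  # scales the loss against FP16 underflow

    for _ in range(10):  # dummy training loop with random data
        inputs = torch.randn(32, 512, device="cuda")
        targets = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        # The forward pass runs in mixed FP16/FP32 precision and can use
        # tensor cores where the hardware provides them.
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(inputs), targets)
        scaler.scale(loss).backward()  # backward pass on the scaled loss
        scaler.step(optimizer)         # unscales gradients, then steps
        scaler.update()                # adapts the scale factor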

Target group:
Our seminar is aimed at software developers who already have hands-on experience in developing neural networks, as well as at companies that are active in the field of AI or would like to become so.

The seminar materials as well as practical examples can be viewed and downloaded on our GitLab portal.

Simplification and Compression of Neural Networks

The last KISS seminar addressed the challenges of using large neural models. It presented different techniques for reducing the complexity and size of a trained model.

The seminar discussed an end-to-end pipeline for model compression. It explained various state-of-the-art compression algorithms, with a focus on quantization and pruning algorithms. Practical use cases using common compression frameworks exemplified typical application scenarios.

By these means, the seminar introduced different compression tools and demonstrated their usability with several practical examples and hands-on exercises.
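
As a flavor of the quantization techniques covered, the following minimal sketch applies post-training dynamic quantization in PyTorch; the toy model is an illustrative assumption.

    # Minimal sketch: post-training dynamic quantization in PyTorch.
    # The toy model is an illustrative assumption.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

    # Replace the Linear layers with dynamically quantized INT8 versions:
    # weights are stored in 8 bits, activations are quantized on the fly.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    print(quantized(torch.randn(1, 256)).shape)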

In more detail, the following topics were addressed:

  • Deep Compression: End-to-End compression
  • Comparison of different compression methods
  • Model Profiling
  • Model adaptations for compression
  • Compression with PyTorch (see the pruning sketch after this list)
  • Introduction to Intel Labs Distiller
  • Introduction to Microsoft Neural Network Intelligence
  • Recommendations for best practice
  • Comparisons with compiler optimizations
  • Useful examples
  • Hands-on exercises
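
As a flavor of the pruning techniques covered, here is a minimal sketch of magnitude pruning with torch.nn.utils.prune; the toy layer and the sparsity level are illustrative assumptions.

    # Minimal sketch: L1-magnitude pruning of a single layer with
    # torch.nn.utils.prune. Toy layer and sparsity level are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    layer = nn.Linear(256, 256)

    # Zero out the 30% of weights with the smallest absolute value.
    prune.l1_unstructured(layer, name="weight", amount=0.3)
    print(float((layer.weight == 0).float().mean()))  # ~0.3 sparsity

    # Make the pruning permanent by removing the re-parametrization.
    prune.remove(layer, "weight")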

The seminar materials as well as examples will be available online soon.

If your company has specific needs concerning the training content, the seminar can also be tailored to you as an individual training. To learn more about this option, please contact us.

Also, if you have any suggestions, requests or questions about our seminars, feel free to contact us here!


Partners

The Friedrich-Alexander-Universität Erlangen-Nürnberg is a project partner.


Credits Header: Fraunhofer IIS/fotomek – fotolia.de