A chapter-by-chapter walkthrough of CUTLASS, generated from its source code and tutorial markdown.
CUTLASS is organized as a set of connected concepts and components. Start with the broad overview, then drill down chapter by chapter.
CUTLASS is a high-performance CUDA C++ template library for implementing matrix-matrix multiplication (GEMM) and related linear algebra primitives. Its hierarchical abstractions target NVIDIA GPUs from Volta through Blackwell and support advanced features such as sparse GEMM, block scaling, and Stream-K scheduling. The project also includes a Profiler for benchmarking, extensive unit tests, and a Python-based CuTe DSL for rapid kernel development.
Source Repository: https://github.com/NVIDIA/cutlass
Follow sequentially or jump to any topic. Start with Build Configuration.
This tutorial was automatically generated by Code IQ and rendered with the shared tutorial site builder. It can be produced for any repository tutorial folder that follows the numbered markdown chapter layout.