https://doi.org/10.37547/ajast/Volume05Issue11-08
Mapping The Computational And Memory-Bound Characteristics Of Image Filtering Operations Using The Roofline Model
Abstract
The Roofline model offers a compact way to reason about whether an image-processing kernel is limited by floating-point throughput or by memory bandwidth. This article applies the Roofline framework to canonical filters (generic 3×3 convolution, Sobel edge detection with gradient magnitude, and separable Gaussian blurs of size 5×5 and 7×7) to map their arithmetic intensity and predict performance on representative CPU and GPU profiles. We formalize arithmetic intensity under explicitly stated assumptions for single-channel 32-bit images and two-pass separable pipelines, and we derive the bandwidth-bound ceiling and compute-bound plateau for each architecture. Using an illustrative device pair (a CPU with 500 GFLOP/s peak and 50 GB/s bandwidth; a GPU with 10 TFLOP/s peak and 300 GB/s bandwidth), we show that the studied filters have intensities between ≈0.38 and ≈1.25 FLOPs/byte. These values lie well below the ridge points of 10 FLOPs/byte (CPU) and ≈33 FLOPs/byte (GPU), so every filter falls in the bandwidth-limited region on both devices, and improving blocking, reuse, and data movement will usually pay off more than raw FLOP optimization. A Roofline chart and a comparative table report the predicted ceilings in GFLOP/s and the corresponding pixel-throughput ceilings for each kernel. We discuss implications for schedule design, separability, and pipeline organization, and argue that Roofline-guided reasoning helps prioritize cache tiling, fusion, and memory-traffic reduction before micro-optimizing scalar FLOPs.
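The Roofline bound used in the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the `DEVICES` table are assumptions introduced here, the peak and bandwidth figures are the illustrative ones quoted in the abstract, and the two intensities are the approximate endpoints of the reported range.

```python
# Minimal Roofline sketch. All names here are hypothetical, not from the article.

def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(compute peak, intensity * memory bandwidth)."""
    return min(peak_gflops, intensity * bandwidth_gbs)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (FLOPs/byte) at which a kernel stops being bandwidth-bound."""
    return peak_gflops / bandwidth_gbs

# Illustrative device pair from the abstract: (peak GFLOP/s, bandwidth GB/s).
DEVICES = {"CPU": (500.0, 50.0), "GPU": (10_000.0, 300.0)}

# Approximate endpoints of the intensity range reported in the abstract.
INTENSITIES = (0.38, 1.25)

if __name__ == "__main__":
    for name, (peak, bw) in DEVICES.items():
        print(f"{name}: ridge point = {ridge_point(peak, bw):.1f} FLOPs/byte")
        for ai in INTENSITIES:
            bound = attainable_gflops(ai, peak, bw)
            regime = "bandwidth-bound" if ai < ridge_point(peak, bw) else "compute-bound"
            print(f"  AI = {ai:.2f} FLOPs/byte -> {bound:.1f} GFLOP/s ({regime})")
```

Under these figures the ceilings work out to roughly 19 to 62.5 GFLOP/s on the CPU and 114 to 375 GFLOP/s on the GPU, a small fraction of either device's peak, which is the quantitative basis for putting memory-traffic reduction first.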
Keywords
Roofline model, arithmetic intensity, memory bandwidth
Copyright License
Copyright (c) 2025 Gaybullayeva Mahbuba, Ismailov Shixnazar

This work is licensed under a Creative Commons Attribution 4.0 International License.