Hybrid Reactive Autoscaling for Task-Based Pipelines on Kubernetes

Abstract

We present Python-to-Kubernetes (PTK), a hybrid autoscaling framework for pipeline-oriented, task-based Python applications on Kubernetes. PTK coordinates queue-length-driven horizontal scaling for CPU, memory, and GPU, together with reactive in-place vertical scaling of CPU and memory. The framework introduces source-code annotations, enabling users to define task-specific scaling constraints and automatically generate Kubernetes manifests. A periodic controller uses utilization and queue metrics to coordinate horizontal and vertical scaling, improving resource efficiency while maintaining pipeline performance. In a streaming machine learning (ML) inference pipeline, PTK sustains the target throughput while reducing hourly cost by 40.6%, CPU by 32.1%, and memory by 22.4%, and lowering the GPU count from 4 to 3, compared with an uncoordinated baseline that combines the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). It also cuts peak cost by 23.6% compared with a queue-driven HPA baseline.
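
The coordination idea described in the abstract can be illustrated with a minimal sketch of one control period. This is not PTK's actual algorithm or API: the names `TaskConstraints` and `decide`, the utilization thresholds, the scaling factors, and the rule of skipping vertical resizing in a period where the replica count changes are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskConstraints:
    # Hypothetical per-task limits, standing in for values a user
    # might supply through source-code annotations.
    min_replicas: int = 1
    max_replicas: int = 8
    min_cpu_millicores: int = 250
    max_cpu_millicores: int = 4000
    target_queue_per_replica: int = 10


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def decide(replicas, cpu_millicores, queue_length, cpu_utilization, c):
    """Return (replicas, cpu_millicores) for the next control period."""
    # Horizontal step: size the replica count to the queue backlog.
    wanted = math.ceil(queue_length / c.target_queue_per_replica)
    new_replicas = clamp(wanted, c.min_replicas, c.max_replicas)
    if new_replicas != replicas:
        # Assumed coordination rule: when the replica count changes,
        # leave per-pod resources alone so the two scalers do not
        # react to the same signal in the same period.
        return new_replicas, cpu_millicores

    # Vertical step: adjust the per-pod CPU request reactively.
    # The 0.8 / 0.4 thresholds and 5/4, 4/5 factors are illustrative.
    if cpu_utilization > 0.8:
        cpu_millicores = cpu_millicores * 5 // 4
    elif cpu_utilization < 0.4:
        cpu_millicores = cpu_millicores * 4 // 5
    new_cpu = clamp(cpu_millicores, c.min_cpu_millicores, c.max_cpu_millicores)
    return replicas, new_cpu
```

A periodic controller would call such a function once per interval for each task, then apply the result through the Kubernetes API. Memory could be handled the same way as CPU in the vertical step.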

Authors

Nagiyev, Andrey
Bajrovic, Enes
Benkner, Siegfried

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title	The 16th IEEE International Conference on Cloud Computing Technology and Science
Divisions	Scientific Computing
Subjects	Programming (General); Software Engineering; Programming Languages; Application Software; System Architecture (General)
Event Location	Shenzhen, China
Event Type	Conference
Event Dates	14-16 Nov 2025
Date	2025