Hybrid Reactive Autoscaling for Task-Based Pipelines on Kubernetes
We present Python-to-Kubernetes (PTK), a hybrid autoscaling framework for pipeline-oriented, task-based Python applications on Kubernetes. PTK coordinates queue-length-driven horizontal scaling for CPU, memory, and GPU, together with reactive in-place vertical scaling of CPU and memory. The framework introduces source-code annotations, enabling users to define task-specific scaling constraints and automatically generate Kubernetes manifests. A periodic controller uses utilization and queue metrics to coordinate horizontal and vertical scaling, improving resource efficiency while maintaining pipeline performance. In a streaming machine learning (ML) inference pipeline, PTK sustains the target throughput while reducing hourly cost by 40.6%, CPU by 32.1%, and memory by 22.4%, and lowering the GPU count from 4 to 3, compared with an uncoordinated baseline that combines the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). It also cuts peak cost by 23.6% compared with a queue-driven HPA baseline.
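The abstract describes a periodic controller that combines queue-length-driven horizontal scaling with reactive vertical scaling under user-defined, task-specific constraints. The following is a minimal sketch of one such control step, not PTK's actual implementation: the `TaskLimits` fields, the function names, the one-replica-per-N-queued-items rule, and the 70% utilization target are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaskLimits:
    # Hypothetical per-task constraints, standing in for the kind of
    # bounds a source-code annotation might declare.
    min_replicas: int
    max_replicas: int
    min_cpu_millicores: int
    max_cpu_millicores: int
    target_queue_per_replica: int


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))


def desired_replicas(queue_length: int, limits: TaskLimits) -> int:
    """Horizontal step (assumed rule): one replica per N queued items."""
    wanted = ceil_div(queue_length, limits.target_queue_per_replica)
    return clamp(wanted, limits.min_replicas, limits.max_replicas)


def desired_cpu(current_cpu: int, utilization_pct: int, limits: TaskLimits,
                target_pct: int = 70) -> int:
    """Vertical step (assumed rule): resize the CPU request so that the
    observed usage would sit at target_pct of the new request."""
    wanted = ceil_div(current_cpu * utilization_pct, target_pct)
    return clamp(wanted, limits.min_cpu_millicores, limits.max_cpu_millicores)


def control_step(queue_length: int, current_cpu: int, utilization_pct: int,
                 limits: TaskLimits) -> tuple[int, int]:
    """One periodic iteration: returns (replica count, CPU millicores)."""
    return (desired_replicas(queue_length, limits),
            desired_cpu(current_cpu, utilization_pct, limits))
```

In this sketch both decisions are made in the same step from the same metrics snapshot, which is one simple way to avoid the conflicting adjustments that can arise when horizontal and vertical autoscalers act independently.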
- Nagiyev, Andrey
- Bajrovic, Enes
- Benkner, Siegfried
Category: Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title: The 16th IEEE International Conference on Cloud Computing Technology and Science
Divisions: Scientific Computing
Subjects: Programming (General); Software Engineering; Programming Languages; Application Software; System Architecture (General)
Event Location: Shenzhen, China
Event Type: Conference
Event Dates: 14-16 Nov 2025
Date: 2025
