Hybrid Reactive Autoscaling for Task-Based Pipelines on Kubernetes

Abstract

We present Python-to-Kubernetes (PTK), a hybrid autoscaling framework for pipeline-oriented, task-based Python applications on Kubernetes. PTK coordinates queue-length-driven horizontal scaling for CPU, memory, and GPU, together with reactive in-place vertical scaling of CPU and memory. The framework introduces source-code annotations, enabling users to define task-specific scaling constraints and automatically generate Kubernetes manifests. A periodic controller uses utilization and queue metrics to coordinate horizontal and vertical scaling, improving resource efficiency while maintaining pipeline performance. In a streaming machine learning (ML) inference pipeline, PTK sustains the target throughput while reducing hourly cost by 40.6%, CPU by 32.1%, and memory by 22.4%, and lowering the GPU count from 4 to 3, compared with an uncoordinated baseline that combines the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA). It also cuts peak cost by 23.6% compared with a queue-driven HPA baseline.
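
The periodic coordination described above can be illustrated with a minimal Python sketch. It is not PTK's implementation: the names `TaskState`, `Limits`, and `decide`, the per-replica queue target, and the rule "resize CPU only when the replica count is unchanged" are all assumptions made for illustration. It only shows how one control step might combine a queue-length signal (horizontal) with a utilization signal (vertical) under user-defined bounds.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskState:
    """Observed state of one pipeline task (hypothetical structure)."""
    replicas: int
    cpu_request: float      # cores requested per replica
    queue_length: int       # pending items in the task's input queue
    cpu_utilization: float  # used CPU as a fraction of the request


@dataclass
class Limits:
    """Per-task scaling constraints, e.g. as a user might annotate them."""
    min_replicas: int
    max_replicas: int
    min_cpu: float
    max_cpu: float
    target_queue_per_replica: int
    target_utilization: float


def clamp(value, low, high):
    return max(low, min(high, value))


def decide(state: TaskState, limits: Limits) -> tuple[int, float]:
    """One control step: return (desired replicas, desired CPU request)."""
    # Horizontal: enough replicas to keep the queue per replica at the target.
    wanted = math.ceil(state.queue_length / limits.target_queue_per_replica)
    replicas = clamp(wanted, limits.min_replicas, limits.max_replicas)

    # Assumed coordination rule: if the replica count changes, leave the
    # CPU request alone this period so the two actions do not interfere.
    if replicas != state.replicas:
        return replicas, state.cpu_request

    # Vertical: size the request so observed usage sits at the target.
    used = state.cpu_request * state.cpu_utilization
    cpu = clamp(used / limits.target_utilization, limits.min_cpu, limits.max_cpu)
    return replicas, round(cpu, 3)


limits = Limits(min_replicas=1, max_replicas=8, min_cpu=0.25, max_cpu=4.0,
                target_queue_per_replica=10, target_utilization=0.6)
backlog = decide(TaskState(2, 2.0, 45, 0.3), limits)  # long queue: scale out
steady = decide(TaskState(2, 2.0, 20, 0.3), limits)   # steady queue: shrink CPU
```

In this sketch a growing queue triggers scale-out first, and over-provisioned CPU is reclaimed only once the replica count is stable. A real controller would also have to handle memory, GPUs, and metric noise.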

Authors
  • Nagiyev, Andrey
  • Bajrovic, Enes
  • Benkner, Siegfried
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
The 16th IEEE International Conference on Cloud Computing Technology and Science
Divisions
Scientific Computing
Subjects
Programming (General)
Software Engineering
Programming Languages
Application Software
System Architecture (General)
Event Location
Shenzhen, China
Event Type
Conference
Event Dates
14-16 Nov 2025
Date
2025