Enabling Small Language Models for Text-to-Process Extraction: Balancing Accuracy and Efficiency through Distant Supervision

Abstract

In recent years, Large Language Models (LLMs) have received increasing attention for tackling open issues in information systems engineering. A prominent example is the extraction of processes from unstructured texts. However, the adoption of LLMs for this task is impeded in many real-world scenarios by issues such as hardware limitations, confidential processes, personal information, real-time settings, or concerns about ecological sustainability. In this paper, we therefore explore how specialized Small Language Models (SLMs) can be used to extract process model information from textual descriptions. We propose a framework for synthesizing distantly supervised training data using LLMs, effectively shifting their use away from online inference toward offline training. We find that SLMs can match the extraction quality of modern LLMs that are orders of magnitude larger, as measured on one of the most popular process model extraction datasets. This makes SLMs promising candidates for implementation in on-premise information systems without the need for prohibitively expensive specialized hardware. We make our approach publicly available.

Authors
  • Neuberger, Julian
  • van der Aa, Han
  • Khrop, Ivan
  • A. López, Hugo
Shortfacts
Category
Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title
International Conference on Advanced Information Systems Engineering 2026 (CAiSE'26)
Divisions
Workflow Systems and Technology
Subjects
Computer Science, General
Event Location
Verona, Italy
Event Type
Conference
Event Dates
8-12 Jun 2026
Date
June 2026