Enabling Small Language Models for Text-to-Process Extraction: Balancing Accuracy and Efficiency through Distant Supervision
In recent years, Large Language Models (LLMs) have received increasing attention for tackling open issues in information systems engineering. A prominent example is the extraction of processes from unstructured texts. However, the adoption of LLMs for this task is impeded in many real-world scenarios by issues such as hardware limitations, confidential processes, personal information, real-time settings, or concerns about ecological sustainability. In this paper, we therefore explore how specialized Small Language Models (SLMs) can be used to extract process model information from textual descriptions. We propose a framework for synthesizing distantly supervised training data using LLMs, effectively shifting their use away from online inference toward offline training. We find that SLMs can match the extraction quality of modern LLMs that are orders of magnitude larger, measured on one of the most popular process model extraction datasets. This makes SLMs promising candidates for implementation in on-premise information systems without the need for prohibitively expensive specialized hardware. We make our approach publicly available.
- Neuberger, Julian
- van der Aa, Han
- Khrop, Ivan
- A. López, Hugo
Category: Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title: International Conference on Advanced Information Systems Engineering 2026 (CAiSE'26)
Divisions: Workflow Systems and Technology
Subjects: Computer Science (General)
Event Location: Verona, Italy
Event Type: Conference
Event Dates: 8-12 Jun 2026
Date: June 2026
