Enabling Small Language Models for Text-to-Process Extraction: Balancing Accuracy and Efficiency through Distant Supervision

Abstract

In recent years, Large Language Models (LLMs) have received increasing attention for tackling open issues in information systems engineering. A prominent example is the extraction of processes from unstructured texts. However, the adoption of LLMs for this task is impeded in many real-world scenarios by issues such as hardware limitations, confidential processes, personal information, real-time settings, or concerns about ecological sustainability. In this paper, we therefore explore how specialized Small Language Models (SLMs) can be used to extract process model information from textual descriptions. We propose a framework for synthesizing distantly supervised training data using LLMs, effectively shifting their use away from online inference toward offline training. We find that SLMs can match the extraction quality of modern LLMs that are orders of magnitude larger, measured on one of the most popular process model extraction datasets. This makes SLMs promising candidates for implementation in on-premise information systems without the need for prohibitively expensive specialized hardware. We make our approach publicly available.

Authors

Neuberger, Julian
van der Aa, Han
Khrop, Ivan
López, Hugo A.

Shortfacts

Category	Paper in Conference Proceedings or in Workshop Proceedings (Paper)
Event Title	International Conference on Advanced Information Systems Engineering 2026 (CAiSE'26)
Divisions	Workflow Systems and Technology
Subjects	Computer Science, General
Event Location	Verona, Italy
Event Type	Conference
Event Dates	8-12 Jun 2026
Date	June 2026