Automatic Software Requirements Extraction from Natural Language Texts Using Natural Language Processing and Large Language Models

Document Type: Original Article

Authors
1 Yazd Islamic Azad University
2 Islamic Azad University, Central Tehran Branch
3 Department of Computer Engineering, Islamic Azad University, Meybod Branch, Iran
Abstract
Automatic extraction of software requirements from natural language texts remains a central challenge in Requirements Engineering due to ambiguity, polysemy, and the heterogeneity of information sources. In recent years, approaches based on Natural Language Processing (NLP) have provided comparatively controllable and structured outputs; however, they face limitations when dealing with implicit expressions, complex sentence structures, and unstructured data. In contrast, Large Language Models (LLMs), with their semantic understanding and reasoning capabilities, demonstrate strong potential for processing complex textual content and conversational data. Nevertheless, output instability, prompt sensitivity, and the risk of generating inaccurate or fabricated content make it difficult to apply them directly in engineering contexts.

This study adopts an analytical–comparative approach to examine a reference framework based on a large language model and compares it with three representative approaches for requirements extraction from documents and user feedback. The findings indicate that although LLM-based approaches offer advantages in handling unstructured data, achieving reliable outputs requires standardization mechanisms and robust quality control strategies. Accordingly, the study emphasizes the necessity of developing hybrid approaches that integrate semantic intelligence with structured control mechanisms.
Keywords