Software Engineering

Software engineering offers a wide range of applications for applied machine learning research. In cooperation with the Department of Software Engineering, the group is active in software quality, software testing, complex event processing, and the Internet of Things.

Recently, the group conducted research on finding security vulnerabilities in source code and published a manually validated dataset of refactoring transformations on real systems to facilitate software quality research. The group also provided a detailed analysis of the differences between bytecode-based and source-code-based instrumentation methods in program analysis, and introduced new methods for hierarchical delta debugging.

Current active research areas are as follows:

Machine learning on source code, source code similarity
Software quality models
Testing, test suite quality
Fault localization, automated debugging
Test-code traceability

People

László Vidács (contact), Péter Pusztai, Péter Hegedűs, Gergő Balogh, András Kicsi, László Tóth, Balázs Nagy

Projects

Multi-label classification for tagging user feedback given in natural language
When users or customers express their expectations of a piece of software, they use natural language. A single sentence of feedback or a single requirement often covers more than one aspect of those expectations, so it can belong to more than one class. The objective of the project is to develop machine-learning (and deep-learning) based methods for multi-label classification and to build a tagger tool on top of these methods. The method is also to be extended to support multi-label tagging of individual sentences.
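
To illustrate what multi-label tagging means in practice, the following minimal sketch trains one independent classifier per label (the "binary relevance" strategy) and returns every label whose classifier fires. It is not the project's method: the naive Bayes scoring, the class name BinaryRelevanceTagger, the label names, and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


class BinaryRelevanceTagger:
    """Hypothetical tagger: one naive Bayes classifier per label."""

    def fit(self, sentences, label_sets):
        self.labels = sorted({l for ls in label_sets for l in ls})
        self.vocab = {w for s in sentences for w in tokenize(s)}
        self.models = {}
        for label in self.labels:
            # Word counts and document counts for "has label" / "lacks label".
            counts = {True: Counter(), False: Counter()}
            docs = {True: 0, False: 0}
            for sentence, ls in zip(sentences, label_sets):
                y = label in ls
                docs[y] += 1
                counts[y].update(tokenize(sentence))
            self.models[label] = (counts, docs)
        return self

    def _log_score(self, tokens, counts, docs, y):
        # Laplace-smoothed class prior and multinomial word likelihoods.
        score = math.log((docs[y] + 1) / (docs[True] + docs[False] + 2))
        denom = sum(counts[y].values()) + len(self.vocab)
        for t in tokens:
            if t in self.vocab:
                score += math.log((counts[y][t] + 1) / denom)
        return score

    def predict(self, sentence):
        tokens = tokenize(sentence)
        tags = set()
        for label, (counts, docs) in self.models.items():
            positive = self._log_score(tokens, counts, docs, True)
            negative = self._log_score(tokens, counts, docs, False)
            if positive > negative:
                tags.add(label)
        return tags


# Invented toy feedback; a sentence may carry several labels at once.
train_sentences = [
    "The app crashes when I upload a photo",
    "Please add a dark mode option",
    "The login page is slow and crashes often",
    "Search is slow, please add filters",
]
train_labels = [
    {"bug"},
    {"feature"},
    {"bug", "performance"},
    {"performance", "feature"},
]

tagger = BinaryRelevanceTagger().fit(train_sentences, train_labels)
print(sorted(tagger.predict("The app crashes and is slow")))
print(sorted(tagger.predict("Please add a dark mode")))
```

Binary relevance ignores correlations between labels, which keeps the sketch short; a real tagger would more likely use a learned sentence representation and a multi-label output layer.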

Feature extraction and analysis
Product line architecture is a timely answer to the challenges of maintaining several program variants within the same codebase. Semi-automatic feature extraction starts from the high-level feature list provided by domain experts and uses information retrieval and static code analysis techniques to determine the program code implementing a given feature. To support the adoption of a product line architecture for separately developed but similar products, feature-level similarity and quality metrics are defined.
Partners: SZEGED Software Ltd., Hungary

Co-evolution analysis of production and test code
Motivated by the large number of automated tests, we apply association rule learning to the version control history of software projects to reason about developer behaviour in maintaining tests and production code in parallel.
Partners: University of Klagenfurt, Austria

Recovering test-to-code traceability links from source code
Recovering traceability links between tests and the class or method they are intended to test facilitates software evolution activities such as fault localization and program understanding. The JUnit framework promotes a clean naming convention for establishing such links. When naming conventions are not followed, we apply natural language processing, information retrieval and static analysis techniques to recover the links.
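
The two-stage idea behind traceability recovery (try the naming convention first, fall back to textual similarity) can be sketched as follows. This is an illustrative simplification, not the group's implementation: the function names, the FooTest / TestFoo patterns, and the Jaccard token overlap that stands in for the information retrieval step are all assumptions.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def link_by_convention(test_class, production_classes):
    """Return the class implied by a 'FooTest' or 'TestFoo' name, if it exists."""
    candidates = (
        re.sub(r"Tests?$", "", test_class),  # FooTest / FooTests -> Foo
        re.sub(r"^Test", "", test_class),    # TestFoo -> Foo
    )
    for candidate in candidates:
        if candidate != test_class and candidate in production_classes:
            return candidate
    return None


def link_by_similarity(test_class, production_classes):
    """Fallback: pick the class whose name tokens overlap most (Jaccard)."""
    test_tokens = set(split_identifier(test_class)) - {"test", "tests"}
    best, best_score = None, 0.0
    for cls in sorted(production_classes):  # sorted for deterministic ties
        tokens = set(split_identifier(cls))
        union = test_tokens | tokens
        score = len(test_tokens & tokens) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cls, score
    return best


def recover_link(test_class, production_classes):
    """Try the naming convention first, then the similarity fallback."""
    return (link_by_convention(test_class, production_classes)
            or link_by_similarity(test_class, production_classes))


# Hypothetical class names, used only to exercise the sketch.
production = {"OrderService", "PaymentGateway", "InvoicePrinter"}
for test_name in ("OrderServiceTest", "TestPaymentGateway",
                  "InvoicePrintingChecks", "UtilsTest"):
    print(test_name, "->", recover_link(test_name, production))
```

Comparing only class names is the weakest possible textual signal; the project description implies that method bodies, comments and static call information are also taken into account.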

Classification of non-functional requirements
Requirements engineering is one of the first tasks in the software development process, and it fundamentally influences the quality of the software under development. Requirements are mostly given in natural language and may be functional or non-functional. Non-functional requirements are the foundation of the quality aspects of the software, such as security, usability and reliability, so classifying them is one of the most important tasks of software engineering. The objective of the project is to develop machine-learning (and deep-learning) based methods and tools that support system analysts in classifying non-functional requirements given in natural language. The resulting collection of classified non-functional requirements can be used in both the analysis and design phases.

Selected publications

Kicsi A, Vidács L, Gyimóthy T. 2020. TestRoutes: A Manually Curated Method Level Dataset for Test-to-Code Traceability. Proceedings of the 17th International Conference on Mining Software Repositories, MSR 2020. :593-597.

Csuvik V, Horváth D, Horváth F, Vidács L. 2020. Utilizing Source Code Embeddings to Identify Correct Patches. Proceedings of the Second International Workshop on Intelligent Bug Fixing (IBF 2020). :18-25.

Tóth L, Nagy B, Gyimóthy T, Vidács L. 2020. Why Will My Question Be Closed? NLP-Based Pre-Submission Predictions of Question Closing Reasons on Stack Overflow. Proceedings of the 42nd International Conference on Software Engineering, NIER Track (ICSE 2020). :105-108.

Kicsi A, Csuvik V, Vidács L, Horváth F, Beszédes Á, Gyimóthy T, Kocsis F. 2019. Feature Analysis using Information Retrieval, Community Detection and Structural Analysis Methods in Product Line Adoption. Journal of Systems and Software. 155:70-90.

Tóth L, Nagy B, Janthó D, Vidács L, Gyimóthy T. 2019. Towards an Accurate Prediction of the Question Quality at Stack Overflow Using a Deep-Learning-Based NLP Approach. Proceedings of ICSOFT 2019, 14th International Conference on Software Technologies. :631-639.

Csuvik V, Kicsi A, Vidács L. 2019. Source Code Level Word Embeddings in Aiding Semantic Test-to-Code Traceability. Proceedings of the 10th International Workshop on Software and Systems Traceability, (SST 2019 @ ICSE). :29-36.

Csuvik V, Kicsi A, Vidács L. 2019. Evaluation of Textual Similarity Techniques in Code Level Traceability. Proceedings of the 19th International Conference on Computational Science and Its Applications (ICCSA 2019). :529-543.

Kicsi A, Vidács L, Csuvik V, Horváth F, Beszédes Á, Kocsis F. 2018. Supporting Product Line Adoption by Combining Syntactic and Textual Feature Extraction. New Opportunities for Software Reuse - 17th International Conference on Software Reuse (ICSR 2018). :1-16.

Hegedüs P, Kádár I, Ferenc R, Gyimóthy T. 2018. Empirical evaluation of software maintainability based on a manually validated refactoring dataset. Information and Software Technology. 95:313-327.

Antal G, Hegedüs P, Tóth Z, Ferenc R, Gyimóthy T. 2018. Static JavaScript Call Graphs: A Comparative Study. SCAM. :177-186.

Kicsi A, Csuvik V, Vidács L, Beszédes Á, Gyimóthy T. 2018. Feature Level Complexity and Coupling Analysis in 4GL Systems. Proceedings of the 18th International Conference on Computational Science and Its Applications (ICCSA 2018).

Bán D, Ferenc R, Siket I, Kiss Á, Gyimóthy T. 2018. Prediction Models for Performance, Power, and Energy Efficiency of Software Executed on Heterogeneous Hardware. Journal of Supercomputing. :25.

Kicsi A, Tóth L, Vidács L. 2018. Exploring the Benefits of Utilizing Conceptual Information in Test-to-Code Traceability. Proceedings of the IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2018 @ ICSE).

Hodován R, Kiss Á, Gyimóthy T. 2018. Grammarinator: A Grammar-based Open Source Fuzzer. Proceedings of the 9th Workshop on Automating Test Case Design, Selection and Evaluation (A-TEST 2018).

Tóth L, Vidács L. 2018. Study of Various Classifiers for Identification and Classification of Non-Functional Requirements. Proceedings of the 18th International Conference on Computational Science and Its Applications (ICCSA 2018). 10964:492-503.

Horváth F, Gergely T, Beszédes Á, Tengeri D, Balogh G, Gyimóthy T. 2017. Code Coverage Differences of Java Bytecode and Source Code Instrumentation Tools. Software Quality Journal.

Gábor S, Gábor A, Nagy C, Ferenc R, Gyimóthy T. 2017. Empirical Study on Refactoring Large-scale Industrial Systems and Its Effects on Maintainability. Journal of Systems and Software. 129:107-126.

Tengeri D, Vidács L, Beszédes Á, Jász J, Balogh G, Vancsics B, Gyimóthy T. 2016. Relating Code Coverage, Mutation Score and Test Suite Reducibility to Defect Density. Proceedings of the 9th IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW'16); The 11th International Workshop on Mutation Analysis (MUTATION'16). :174-179.