Register of Final Theses
  
Description of the thesis
Type of studies Academic bachelor studies
Study programme name Datorsistēmas (Computer Systems)
Title Lielu valodas modeļu veiktspējas novērtēšana eksāmenos un testos
Title in English Evaluating the Performance of Large Language Models in Exams and Tests
Structural unit 33000 Faculty of Computer Science, Information Technology and Energy
Supervisor Alla Anohina-Naumeca
Reviewer Vadims Zīlnieks
Abstract Many tests of the capabilities of large language models (LLMs) exist, most of them automated, and each has its own advantages and limitations. The primary objective of the thesis is to evaluate LLM performance on complex reasoning tasks in education. The thesis examines the testing of the models GPT-4o, Claude Opus 4, Gemini 2.5 Pro, DeepSeek V3, and LLaMA 4 Maverick. The testing is based on real test assignments from the artificial intelligence course. The tested tasks include state-space and graph search, optimization and logical planning, game tree construction and decision-making, as well as basic machine learning and semantic structure interpretation. The results show that the models perform better on tasks with prepared answer options, while performing worse on tasks involving uninformed and heuristic search algorithms on a given graph. The bachelor's thesis consists of 54 pages; it contains 1 figure, 1 formula, 9 tables, 4 appendices, and 61 references.
Keywords Lielie valodu modeļi (LLM), MI izglītībā, Automatizēta novērtēšana, Eksāmenu izvērtēšana, GPT-4o, Claude Opus 4, Gemini 2.5 Pro, DeepSeek V3, LLaMA 4 Maverick, Promptu inženierija, Cilvēka-MI saskaņa.
Keywords in English Large Language Models (LLMs), AI in Education, Automated Assessment, Exam Evaluation, GPT-4o, Claude Opus 4, Gemini 2.5 Pro, DeepSeek V3, LLaMA 4 Maverick, Prompt Engineering, Human-AI Concordance.
Language eng
Year 2025
Date and time of thesis upload 02.09.2025 23:50:08