Cukurova NLP (CuNLP) Research Group

Cukurova NLP (CuNLP) is a Natural Language Processing research group operating within the Department of Computer Engineering at Cukurova University.

Our group focuses primarily on Turkish NLP, developing innovative solutions at the intersection of artificial intelligence and linguistics.

Current Focus

We are currently working on Large Language Models (LLMs), particularly on hallucination and reasoning problems.
We welcome hardworking and motivated students who are passionate about LLMs and eager to explore these topics.

Undergraduate and graduate students interested in joining our research are encouraged to apply.
Prior experience or coursework in Machine Learning, Natural Language Processing, or related areas will be considered an advantage.

We also welcome collaborations with researchers and groups who share similar interests.

Our Team

Current Members

Dr. Umut Orhan – Group Lead
Research Interests: Natural Language Processing, Large Language Models, and Semantic Representation
Dr. B. Tahir Tahiroğlu – Turkish Linguist
Focus: Turkish syntax, morphology, and linguistic resources for NLP
Ferhat Albayrak (Ph.D. Student) – Evaluating the reasoning capabilities of LLMs
Melisa Biçer (Ph.D. Student) – Enhancing reasoning in LLMs through graph-based systems
Eren Demir (M.Eng. Student) – Measuring semantic similarity in LLMs
Arda Mülayim (M.Eng. Student) – Detecting hallucinations in LLMs
Çağrı Sayallar (B.Sc. Student) – BERT-based Turkish lemmatization

Former Members

Çağatay N. Tülü (Ph.D.) – Developed the SemSpace semantic space model using WordNet relations
Enis Arslan (Ph.D.) – Focused on Turkish morphology and morphological parsing
Erhan Turan (Ph.D.) – Designed a machine-readable dictionary for Turkish
Elif Gülfidan Dayıoğlu (Ph.D.) – Worked on open-ended exam evaluation using SemSpace vectors and deep learning

Resources and Tools

CU-CE – An LLM-based chatbot built for students of the Çukurova University Computer Engineering Department.
🔗 https://t.me/CU_CengBOT
Turkish Corpus for Morphological Disambiguation
If you use this corpus in your publication, please cite:
U. Orhan, E. Arslan. “Learning Word-Vector Quantization: A Case Study in Morphological Disambiguation,” ACM Transactions on Asian and Low-Resource Language Information Processing, 19(5), 72, 2020.
🔗 Download Dataset
Learning Word-Vector Quantization (in MATLAB)
If you use this code or dataset in your publication, please cite the same paper as above.
🔗 Download Code
CU-NLP Dataset for Automatic Short Answer Grading
If you use this dataset in your publication, please cite:
C.N. Tulu, O. Ozkaya, U. Orhan. “Short Answer Grading with SemSpace Sense Vectors and MaLSTM,” IEEE Access, 9, 19270–19280, 2021.
🔗 Download Dataset
Synset Vectors Computed by Generalized SemSpace
If you use this dataset in your publication, please cite:
U. Orhan, E.G. Tosun, O. Ozkaya. “Intent Detection Using Contextualized Deep SemSpace,” Arabian Journal for Science and Engineering, 48, 2009–2020, 2023.
🔗 Download Dataset

Join Us

We are looking for motivated master’s and Ph.D. students who are passionate about Artificial Intelligence and Natural Language Processing.
If you are interested in joining our research group or collaborating with us, please contact Dr. Umut Orhan directly via email at uorhan@cu.edu.tr.

Undergraduate students with a strong interest in Large Language Models (LLMs), reasoning, or hallucination detection are also welcome to get involved in our ongoing projects.