
In 2016 I found myself drawn into the world of Natural Language Processing, not only out of curiosity but through a nationally funded research project. Supported by TÜBİTAK, the Scientific and Technological Research Council of Turkey (Project No. 215E256), our core team at Çukurova University set out to answer a deceptively simple question:
If a machine is ever to understand Turkish, to truly understand it as more than a sequence of letters, shouldn’t it also have its own dictionary?
But not an ordinary dictionary.
A living one.
A dictionary that could update itself — a web of words that continuously evolves.
That vision became the foundation of our project, “The Live Turkish Dictionary Network Design with Weighted Graphs,” which I led from April 2016 to April 2019.
We completed it on schedule, but the real story lies in how it reshaped our understanding of language itself.
Natural Language Processing (NLP) sits at the crossroads of linguistics and artificial intelligence — a field where morphology meets meaning.
But working with Turkish was far from easy. Unlike English, where morphology is relatively light, Turkish is an agglutinative language — each suffix adds a new layer of meaning, blurring the boundary between form and semantics. Finding the true stem of a word can feel like solving a riddle.
We began our journey with these challenges. Our goal was not only to design a morphological analyzer, but also to build an intelligent system capable of automatically discovering semantic relations among words in the dictionary.
Traditional resources such as WordNet had long guided this type of research, yet their manual, expert-based construction was slow and expensive. We dreamed of something more dynamic — a dictionary that could learn from its own patterns.
At the project’s core was a finite-state morphological analyzer tailored for Turkish morphosyntax.
We implemented Viterbi-based disambiguation to choose among the competing morphological analyses of a word in context, and proposed a novel stem discovery algorithm based on recurrence statistics.
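To make the disambiguation step concrete, here is a minimal sketch of Viterbi decoding over candidate analyses. It is not the project's implementation: the function name `viterbi_disambiguate`, the coarse tags (`Num`, `Noun`, `Verb`), and every probability below are invented for illustration. The example uses "yüz kitap" ("a hundred books"), where "yüz" on its own could be a numeral, a noun ("face"), or a verb stem ("swim").

```python
import math

FLOOR = 1e-6  # assumed smoothing value for unseen transitions or emissions


def viterbi_disambiguate(words, candidates, transition, emission, start="<s>"):
    """Pick the most probable tag sequence given per-word candidate tags.

    transition[(prev_tag, tag)] and emission[(word, tag)] are probabilities.
    All names and numbers here are hypothetical, not taken from the project.
    """
    # best[i][tag] = (log score of the best path ending in tag, previous tag)
    best = [{}]
    for tag in candidates[0]:
        score = math.log(transition.get((start, tag), FLOOR))
        score += math.log(emission.get((words[0], tag), FLOOR))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        column = {}
        for tag in candidates[i]:
            emit = math.log(emission.get((words[i], tag), FLOOR))
            prev_tag, prev_score = max(
                ((p, s + math.log(transition.get((p, tag), FLOOR)))
                 for p, (s, _) in best[i - 1].items()),
                key=lambda item: item[1],
            )
            column[tag] = (prev_score + emit, prev_tag)
        best.append(column)

    # Follow back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


words = ["yüz", "kitap"]
candidates = [["Num", "Noun", "Verb"], ["Noun"]]
transition = {
    ("<s>", "Num"): 0.3, ("<s>", "Noun"): 0.4, ("<s>", "Verb"): 0.3,
    ("Num", "Noun"): 0.8, ("Noun", "Noun"): 0.3, ("Verb", "Noun"): 0.1,
}
emission = {
    ("yüz", "Num"): 0.3, ("yüz", "Noun"): 0.4, ("yüz", "Verb"): 0.3,
    ("kitap", "Noun"): 1.0,
}
tags = viterbi_disambiguate(words, candidates, transition, emission)
print(tags)  # ['Num', 'Noun']
```

Looked at in isolation, "yüz" is most likely a noun under these toy numbers. Once the following word is taken into account, the numeral reading wins, which is exactly what decoding over the whole sequence buys.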
Meanwhile, our system began to study thousands of definitions from the TDK Contemporary Turkish Dictionary, gradually recognizing recurring definitional patterns. These patterns revealed hidden links — synonymy, hierarchy, and antonymy — between words.
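A rough way to picture pattern-based relation extraction is a handful of regular expressions applied to definition text. The sketch below is only an assumption about what such patterns might look like. The three patterns, the function `extract_relations`, and the toy definitions are invented here and are not quoted from the TDK dictionary or from our system.

```python
import re

# Hypothetical definitional patterns; the first one that matches wins.
PATTERNS = [
    ("antonym", re.compile(r"^(\w+) karşıtı$")),               # "X karşıtı": opposite of X
    ("hypernym", re.compile(r"\bbir (?:tür|çeşit) (\w+)$")),   # "... bir tür X": a kind of X
    ("synonym", re.compile(r"^(\w+)$")),                       # one-word definition
]


def extract_relations(definitions):
    """Return (headword, relation, target) triples found in definition text."""
    relations = []
    for headword, definition in definitions.items():
        text = definition.strip().rstrip(".").lower()
        for label, pattern in PATTERNS:
            match = pattern.search(text)
            if match:
                relations.append((headword, label, match.group(1)))
                break
    return relations


# Toy definitions written for this example only.
toy_definitions = {
    "soğuk": "Sıcak karşıtı.",
    "serçe": "Küçük bir tür kuş.",
    "siyah": "Kara.",
    "masa": "Üzerinde yemek yenen mobilya.",
}
relations = extract_relations(toy_definitions)
for triple in relations:
    print(triple)
```

The last entry matches none of the patterns and so yields no relation. In a real setting the pattern inventory would have to be learned or curated from recurring definition shapes rather than written by hand like this.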
Using our MentionSense architecture, we transformed these findings into a weighted semantic graph that connected meanings rather than mere entries.
Each edge in this network carried a measurable strength, allowing the system to reason about the closeness of concepts.
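One simple way to reason about closeness on such a graph is to score a pair of words by the strongest path between them, taking the product of edge weights along the path. This is a sketch under assumptions, not a description of MentionSense: the function `path_strength`, the toy graph, and the choice of weights in the range (0, 1] are all made up for illustration.

```python
import heapq
import math


def path_strength(graph, source, target):
    """Largest product of edge weights over any path from source to target.

    Assumes every weight lies in (0, 1]. Uses Dijkstra on cost = -log(weight),
    so the cheapest path is the one with the largest weight product.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return math.exp(-cost)
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost - math.log(weight)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return 0.0  # target is not reachable from source


# Hypothetical weighted edges between Turkish words.
toy_graph = {
    "siyah": {"kara": 0.9, "renk": 0.6},
    "kara": {"siyah": 0.9, "koyu": 0.5},
    "renk": {"koyu": 0.2},
}
strength = path_strength(toy_graph, "siyah", "koyu")
print(round(strength, 2))  # 0.45, via "kara"
```

Here "siyah" reaches "koyu" more strongly through "kara" (0.9 × 0.5) than through "renk" (0.6 × 0.2), so the indirect but well-supported route determines the score.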
In the end, what we built was more than a dictionary — it was a computational model of how meanings resonate across the Turkish language.
Through this network, we discovered that semantic proximity could be quantified using graph-theoretic measures such as PageRank and bidirectional accessibility.
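The sketch below shows how both measures could be computed on a small weighted graph. It is illustrative only. `weighted_pagerank` is a plain power-iteration version of PageRank with assumed defaults, and `mutual_pairs` encodes just one possible reading of bidirectional accessibility, namely that two words point to each other directly. The toy graph and its weights are invented.

```python
def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank where a node shares rank in proportion to edge weight."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not graph.get(n))
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for node, neighbors in graph.items():
            total = sum(neighbors.values())
            for neighbor, weight in neighbors.items():
                new_rank[neighbor] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


def mutual_pairs(graph):
    """Pairs (a, b) with a < b where each word has a direct edge to the other."""
    return {
        (a, b)
        for a, neighbors in graph.items()
        for b in neighbors
        if a < b and a in graph.get(b, {})
    }


# Hypothetical edges: headword -> {word used in its definition: weight}
toy_graph = {
    "siyah": {"kara": 0.9, "renk": 0.6},
    "kara": {"siyah": 0.9, "renk": 0.4},
    "kırmızı": {"renk": 0.7},
    "renk": {},
}
ranks = weighted_pagerank(toy_graph)
top_word = max(ranks, key=ranks.get)
pairs = mutual_pairs(toy_graph)
print(top_word, pairs)
```

In this toy graph the general word "renk" (color) collects the most rank because several entries point to it, while "siyah" and "kara" (both "black") form the only mutual pair, which is the kind of signal a synonym cluster would produce.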
This enabled not only the detection of synonym clusters but also the identification of new lemma candidates.
We even devised a procedure to generate definition suggestions for newly formed or compound words.
In essence, the project showed that Turkish — with all its morphological depth — can be modeled as a weighted, dynamic semantic network.
It revealed that language is not a static tree with roots and branches, but a living web that constantly rebalances itself.
I was fortunate to lead a passionate and interdisciplinary team:
Our work resulted in a number of journal and conference publications, including: