Kurdish language needs more standardization for AI personal assistants: expert

“We need more people to work on the Kurdish language in the field of Natural Language Processing.”
Peshmerge Morad, a software engineer, graduated earlier this month from the University of Twente with his MA thesis “Part-of-Speech tagging for Northern Kurdish” (Photo: Peshmerge Morad).

ERBIL (Kurdistan 24) - Peshmerge Morad, a Dutch-Kurdish software engineer who graduated earlier this month from the University of Twente in the Netherlands with his MA thesis “Part-of-Speech tagging for Northern Kurdish (Kurmanji),” says more people need to work on the Kurdish language in Natural Language Processing (NLP) if AI personal assistants like Amazon Alexa, Google Voice, or Apple Siri are to understand Kurdish.

Morad, originally from Afrin, fled Syria in 2012 because of his political activities against the Syrian government at the University of Aleppo. He now lives in the Netherlands.

“We need more people to work on the Kurdish language in the field of NLP. In addition, Kurdish language institutions must work on the language standardization process. One of our biggest obstacles when we want to work on the Kurdish-NLP is the lack of standardization,” he told Kurdistan 24.  

Natural Language Processing (NLP) aims to enable computer systems to process and understand human language.

“My research topic is Part-of-Speech tagging for Northern Kurdish (Kurmanji). Part-of-speech tagging is a sub-task in Natural Language Processing (NLP). NLP is a subfield of artificial intelligence where we aim to teach the machine how to speak and understand human languages. In this case, Northern Kurdish.”

“Part-of-speech tagging is vital for other NLP tasks such as machine translation (Google Translate) or speech recognition (Amazon Alexa). Thus, if you want to make sure that applications like Google Translate or Amazon Alexa work well for Northern Kurdish, you have to do the task of part-of-speech tagging correctly,” he said.

He added that Kurdish is considered a low-resource language, unlike high-resource languages such as English and German, “because we don't have enough resources (annotated and labeled data) to facilitate Kurdish-NLP research. This means that working on any research in the field of NLP for the Kurdish language is very challenging.”
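To illustrate what “annotated and labeled data” means in practice, here is a minimal sketch of a part-of-speech tagger that simply gives each word the tag it carried most often in a hand-labeled corpus. It is not the method used in Morad's thesis. The toy English sentences, the tag names, and the function names `train` and `tag` are all assumptions made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical hand-labeled corpus: each sentence is a list of (word, tag) pairs.
# Toy English data for illustration only, not taken from any Kurdish corpus.
ANNOTATED = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train(corpus):
    """Count how often each (lowercased) word carries each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    return counts


def tag(words, counts, unknown="UNK"):
    """Give each word its most frequent tag; unseen words get `unknown`."""
    result = []
    for word in words:
        seen = counts.get(word.lower())
        label = seen.most_common(1)[0][0] if seen else unknown
        result.append((word, label))
    return result


model = train(ANNOTATED)
print(tag(["The", "cat", "runs"], model))
print(tag(["The", "bird", "sings"], model))
```

Words that never appear in the labeled data come back as `UNK`, which is the practical problem a low-resource language faces: with little annotated text, most words are unseen.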

“Suppose we want to have good translation systems, AI personal assistants like Amazon Alexa, Google Voice, or Apple Siri that can understand and communicate in Kurdish; then, we must work more on the Kurdish language in the field of NLP.”

He also said that, given recent developments in artificial intelligence, machines and the internet will become increasingly integrated into our daily lives.

“In the coming years, talking to every appliance in your house will become very normal (it's already there). Imagine living in Kurdistan, and you have to speak in Arabic, Turkish, Persian, or English with those appliances because they don't understand Kurdish or they can't understand your dialect or accent!” 

“So, if we want to be represented and save the Kurdish language, we have to put more effort into getting the Kurdish language, regardless of dialect, more represented. This asks for coordinated efforts facilitated and funded by our political parties and educational institutions.”