Building an Arabic Transliterator and Annotating Data Morphologically

Delivery time: Available within 14 days

28,90 

Using Weka to Annotate our Arabic Corpus Morphologically and Compare it with the Xerox Arabic Analyser’s Results.

ISBN: 3330847751
ISBN 13: 9783330847750
Author: Al Jumaia, Abdulaziz
Publisher: Noor Publishing
Length: 56 pages
Publication date: 02.02.2017
Edition: 1/2017
Format: 0.4 x 22 x 15
Weight: 102 g
Product form: Paperback
Binding: Paperback
Item number: 1898299

Description

In this book, we first discuss work related to ours. We then create a transliteration program, produce our own corpus, use the Xerox Arabic analyser to morphologically annotate a raw Arabic text, use Weka to train on our transliterated corpus, and compare the Xerox analyser's annotation with Weka's results. The book shows the methods used to create our own transliteration system, which relies on a dictionary that maps Arabic letters to Latin letters. To do that, we transliterate a raw Arabic text taken from a chapter of the book "Al-Bidayah Wan-Nihayah" by Ibn Kathir and store the results for later use. The book then discusses how the same original text, used previously for transliteration, is passed to the Xerox Arabic analyser, which uses a finite-state transducer to annotate the text morphologically. The annotations are then selected manually to form a gold standard, added to our transliterated text, and used to train different algorithms in Weka. Ultimately, Weka's results are compared with the gold-standard annotation.
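The dictionary-based transliteration described above can be illustrated with a minimal sketch. The mapping below is a subset of the widely used Buckwalter scheme, chosen here only as an assumed stand-in; the book's own letter table is not given in this description, and the names `ARABIC_TO_LATIN` and `transliterate` are hypothetical.

```python
# Hypothetical letter table: a subset of the Buckwalter scheme,
# used as an assumption. The book's actual mapping may differ.
ARABIC_TO_LATIN = {
    "ا": "A", "ب": "b", "ت": "t", "ث": "v", "ج": "j", "ح": "H", "خ": "x",
    "د": "d", "ذ": "*", "ر": "r", "ز": "z", "س": "s", "ش": "$", "ص": "S",
    "ض": "D", "ط": "T", "ظ": "Z", "ع": "E", "غ": "g", "ف": "f", "ق": "q",
    "ك": "k", "ل": "l", "م": "m", "ن": "n", "ه": "h", "و": "w", "ي": "y",
    "ء": "'", "ة": "p", "ى": "Y",
}


def transliterate(text: str) -> str:
    """Replace each mapped Arabic letter with its Latin counterpart.

    Characters without an entry (spaces, digits, punctuation, diacritics)
    are passed through unchanged.
    """
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(transliterate("كتاب"))  # ktAb
```

A character-by-character lookup like this keeps the mapping reversible as long as every Arabic letter maps to a distinct Latin symbol, which is one reason such tables are common for preparing Arabic text for tools that expect ASCII input.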

About the author

Abdulaziz bin Yaqoob Yousef Al Jumaia was born in Al Ahsa, Saudi Arabia, in 1991. He has a bachelor's degree in English linguistics and a master's degree in language and information processing. He is interested in machine learning, machine translation, corpus linguistics and natural language processing.

Manufacturer information:


OmniScriptum SRL
Str. Armeneasca 28/1, office 1
2012 Chisinau
MD

Email: info@omniscriptum.com
