Compilation of an Arabic children’s corpus

  • Latifa Al-Sulaiti
  • Noorhan Abbas
  • Claire Brierley
  • Eric Atwell
  • Ayman Alghamdi

Research output: Chapter in Book/Report/Conference proceeding (chapter, peer-reviewed)

Abstract

Inspired by the Oxford Children's Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children's Corpus of 2950 documents and nearly 2 million words has been collected manually from the web during a 3-month project. It is of high quality, and contains a range of different children's genres based on sources located, including classic tales from The Arabian Nights, and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children's texts.
Original language: English
Title of host publication: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 23-28 May 2016, Portorož, Slovenia
Editors: Bente Maegaard, Hélène Mazo, Asunción Moreno, Khalid Choukri, Marko Grobelnik, Joseph Mariani, Nicoletta Calzolari, Thierry Declerck, Sara Goggi, Jan Odijk, Stelios Piperidis
Place of publication: Paris, France
Publisher: European Language Resources Association (ELRA)
Pages: 1808-1812
ISBN (Print): 9782951740891
Publication status: Published online, 2016

Keywords

  • Arabic
  • children's corpus
  • genre classification
