The Persian Learner Corpus: Introducing Design Criteria, Annotation and the Corpus Tools
Paper ID : 1059-ICIL
Authors:
Saeed Safari *
Faculty of Philology, University of Belgrade, Serbia
Abstract:
The Salam Farsi Learner Corpus (SFLC) is the first learner corpus developed for the Persian language. It is designed to systematically collect learners' written, spoken, mixed, and media production while they are learning Farsi. The aim of the corpus is to detect and classify linguistic errors according to metadata such as the learner's first language, age, and gender. The SFLC software is equipped with four main tools that allow it to function as an error-tagged learner corpus and to provide statistical reports: a tool for submitting data and metadata to the corpus database, a computer-aided error editor that facilitates error tagging, filter and search tools, and data statistics tools that present various statistical data about the corpus. Based on the SFLC statistical reports, the frequency and distribution of errors in the corpus can be determined. This paper introduces the proposed design criteria for developing a learner corpus for Persian, the error-tagging system, and the corpus tools.
Keywords:
Learner Corpus, Corpus Linguistics, Error Analysis, Teaching Persian to Non-Persian Speakers
Status : Paper Accepted
10th International Iranian Conference on Linguistics