Database of Arabic Dialects (ضاد)

A website for collaborative collection of Arabic dialect data

The Database of Arabic Dialects (ضاد) is a project to collect the vast amount of published and unpublished data on Arabic dialects in a searchable electronic format. The project is aimed at researchers and students of the Arabic language. This website is the public interface to the database, and focuses on intuitive, controlled data input and powerful data visualization. Data input relies on contributions from scholars in the field; user contributions are listed prominently on the website so that contributors receive credit for their work. The interface will also give contributors control over their data: not all data must be public, so users can take advantage of the website's features even for private data that is part of a work in progress.

Using this website

For a tour of the website's features, you can watch the video here.

In the website's current state, users with no account may use all of the data visualization tools but may not input data. The only current public dataset is a collection of demonstrative data that were originally published in Magidow, Alexander. 2013. Towards a Sociohistorical Reconstruction of Pre-Islamic Arabic Dialect Diversity, Ph.D. dissertation: University of Texas at Austin.

There are four main data visualization tools. In the map and list views, you can view all the publicly available data simply by pressing the search button. For narrower searches, follow the instructions on the page. In the paradigm view, you can select as many dialects as you want in order to visualize entire paradigms. If no data is available for a dialect, the corresponding cells of the paradigm will be empty. In the 'cross-search' view, you can do an 'implicational' search, in which you select dialects that have a particular form and see what other forms are found in those same dialects.
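The logic behind the 'cross-search' view can be sketched in a few lines of Python. This is only an illustration of the idea, not the website's actual implementation: the record layout, the function name `cross_search`, the tag names, and the sample forms and dialect labels are all hypothetical placeholders.

```python
# Hypothetical records: (dialect, form, tags). These are made-up
# placeholders for illustration, not data taken from the database.
RECORDS = [
    ("Dialect A", "hada",  {"demonstrative", "proximal"}),
    ("Dialect A", "hadak", {"demonstrative", "distal"}),
    ("Dialect B", "da",    {"demonstrative", "proximal"}),
    ("Dialect B", "dak",   {"demonstrative", "distal"}),
    ("Dialect C", "hada",  {"demonstrative", "proximal"}),
    ("Dialect C", "hadik", {"demonstrative", "distal"}),
]


def cross_search(records, condition_form, condition_tags, target_tags):
    """Implicational search over (dialect, form, tags) records.

    Step 1: find the dialects that have `condition_form` carrying all
            of `condition_tags`.
    Step 2: within those dialects only, collect the forms that carry
            all of `target_tags`.
    """
    condition_tags = set(condition_tags)
    target_tags = set(target_tags)

    # Step 1: dialects that satisfy the condition.
    matching = {
        dialect
        for dialect, form, tags in records
        if form == condition_form and condition_tags <= set(tags)
    }

    # Step 2: the target forms found in those same dialects.
    results = {}
    for dialect, form, tags in records:
        if dialect in matching and target_tags <= set(tags):
            results.setdefault(dialect, []).append(form)
    return results


# Which distal forms occur in dialects whose proximal form is "hada"?
print(cross_search(RECORDS, "hada", {"proximal"}, {"distal"}))
```

In this toy data set, the search keeps Dialect A and Dialect C (both have the conditioning form) and drops Dialect B, then reports the distal forms of the two remaining dialects side by side.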

If you are interested in obtaining a user account in order to enter your own data, please contact the author.

Contributing

This is an open source project, released under the GPL. The GitHub repository is here. Note that only the source code is open source; the data held in the database is subject to contributor permissions.

There are two primary ways contributors can assist the project at the moment. The first is to submit data, whether via the website or as a CSV file, and to test the data entry and data visualization components. The second is to contribute code, whether Python/Django, HTML/CSS or JavaScript.

Data contributors

We are currently looking for scholars interested in uploading their data to the website to test how well the database can accommodate different data types, as well as to develop more data input views. Interested scholars should contact amagidow AT gmail DOT com, describing their data sets and how/whether they intend to continue entering data.

Code contributors

Potential code contributors should view the readme in the git repository and contact the lead developer with any questions.

How it works

The database itself is designed around the central concept of the 'language datum', a single piece of linguistic data. Each datum can be any kind of string with meaningful, searchable linguistic information, so in theory the database could hold anything from individual phonemes to idioms. Every datum is marked with a set of tags to allow for easy searching. Each datum is linked to a dialect and to a bibliographic entry, and each datum has permission information and is linked to its contributor. Datums may be in relationships with other datums, with the types of relationships expressed via another set of tags.
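The 'language datum' model described above can be sketched with plain Python dataclasses. This is a simplified illustration under stated assumptions, not the project's actual Django models: the class names, field names, the single `public` flag standing in for permission information, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Datum:
    """One 'language datum': a searchable string plus its metadata."""
    form: str                 # the linguistic string itself
    dialect: str              # dialect the datum is linked to
    source: str               # bibliographic entry it is drawn from
    contributor: str          # scholar who entered the datum
    public: bool = True       # simplified stand-in for permission information
    tags: Set[str] = field(default_factory=set)


@dataclass
class Relation:
    """A relationship between two datums, typed by its own set of tags."""
    origin: Datum
    target: Datum
    tags: Set[str] = field(default_factory=set)


def search(data: List[Datum], required_tags, include_private=False):
    """Return the datums that carry every required tag.

    Private datums are skipped unless include_private is True.
    """
    required = set(required_tags)
    return [
        d for d in data
        if required <= d.tags and (d.public or include_private)
    ]


# Hypothetical sample values, for illustration only.
proximal = Datum("hada", "Dialect A", "Example Source 2013", "Scholar X",
                 tags={"demonstrative", "proximal"})
distal = Datum("hadak", "Dialect A", "Example Source 2013", "Scholar X",
               public=False, tags={"demonstrative", "distal"})
link = Relation(proximal, distal, tags={"same-paradigm"})

print([d.form for d in search([proximal, distal], {"demonstrative"})])
```

Because every datum has the same shape regardless of what kind of string it holds, the same tag-based search works whether the data are phonemes, words, or idioms; only the tag vocabulary changes.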

The website is built on the Django framework, with a PostgreSQL database backend. Many pages rely on jQuery, DataTables and Leaflet, and some are rendered using Jinja2 templates. All of these tools are free, open-source software, and the website's own source code is likewise open source in order to contribute back to the community. The design of the database is flexible enough to handle almost any language, and many of the database tools are easily adapted to other languages. The tag ontologies are defined entirely by the users, so the database should be a useful tool for any language with high dialectal diversity dispersed across a large geographical area. The website is currently hosted by PythonAnywhere.

The database structure has been described in the following article: Magidow, Alexander. 2015. A Database Model and Prototype for Storing Diverse Linguistic Data. In: Journal for Language Technology and Computational Linguistics 30:1.

Citing the website and data

The simplest way to cite the website is as follows. Note that as code contributors increase, the attribution may change:

Magidow, Alexander. 2015. Database of Arabic Dialects. site: http://database-of-arabic-dialects.org/

Most data is drawn from published sources, so you should cite that source as well (and refer to it for further information). Every datum has also been contributed by a scholar, so if you find yourself citing many datums from the same scholar, it would be polite to acknowledge them in the acknowledgements or a footnote in your paper.

This website was developed primarily by Alexander Magidow at the University of Rhode Island, where he is an Assistant Professor of Arabic. The current state of the project is the culmination of nearly three years of research and development. He would like to thank Yonatan Belinkov for his help in the design of the database framework, for his suggestion to use Django, and for contributing some code to the project.

This page last updated 5/19/2016

Webmaster and author: Alexander Magidow.