Syrian Tweets Arabic Sentiment Analysis dataset
Version 1.0
17 March 2015
Copyright (C) 2015 National Research Council Canada (NRC)

Contact:
Mohammad Salameh (msalameh@ualberta.ca)
Saif Mohammad (saif.mohammad@nrc-cnrc.gc.ca)
Svetlana Kiritchenko (Svetlana.Kiritchenko@nrc-cnrc.gc.ca)

Terms of use:
1. This dataset can be used freely for research purposes.
2. The papers listed below provide details of the creation and use of the dataset. If you use the dataset, then please cite the associated papers.
3. If you use the dataset in a product or application, then please credit the authors and NRC appropriately. Also, if you send us an email, we will be thrilled to know about how you have used the dataset.
4. National Research Council Canada (NRC) disclaims any responsibility for the use of the dataset and does not provide technical support. However, the contacts listed above will be happy to respond to queries and clarifications.
5. Rather than redistributing the data, please direct interested parties to this page: [ADD LINK]

Please feel free to send us an email:
- with feedback regarding the dataset.
- with information on how you have used the dataset.
- if interested in a collaborative research project.

.......................................................................

Syrian Tweets Arabic Sentiment Analysis dataset
----------------------------------

The Syrian Tweets dataset has 2000 tweets originating from Syria (a country where Levantine dialectal Arabic is commonly spoken). These tweets were collected in May 2014 by polling the Twitter API. The dataset is not provided with manual English translations. We manually annotated the tweets and their automatic English translations for sentiment (positive, negative, or neutral).

.......................................................................

PUBLICATIONS
------------

Details of the Syrian Tweets dataset and its use in an Arabic sentiment analysis system can be found in the following peer-reviewed publications:

--Sentiment After Translation: A Case-Study on Arabic Social Media Posts. Mohammad Salameh, Saif M Mohammad and Svetlana Kiritchenko, In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2015), June 2015, Denver, Colorado.

--How Translation Alters Sentiment. Saif M Mohammad, Mohammad Salameh, and Svetlana Kiritchenko, In Journal of Artificial Intelligence Research, in press.

Links to the papers are available here:
http://saifmohammad.com/WebPages/WebDocs/arabicSA-JAIR.pdf
http://aclweb.org/anthology/N/N15/N15-1078.pdf

.......................................................................

VERSION INFORMATION
-------------------

Version 1.0 is the first version as of 17 March 2015.

.......................................................................

FORMAT
------

The Syrian Tweets dataset has sheets with these titles:
1) ar_manual.sent: the manual sentiment annotations for the Arabic posts
2) en_auto.trans-manl.sent: the automatic translations, manually annotated for sentiment

Annotation categories are: positive, negative, neutral and both.
Each sheet is also provided with the confidence of the annotation calculated by CrowdFlower.

.......................................................................
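
As an illustration of working with the annotations described under FORMAT, the following is a minimal Python sketch that counts the four annotation categories among rows that meet a confidence threshold. It is not part of the dataset distribution. The key names "sentiment" and "confidence" are hypothetical (the actual column headers in the sheets may differ), the confidence is assumed to be a number between 0 and 1, and the sample rows are made up for the example.

```python
from collections import Counter

# The four annotation categories listed in the FORMAT section.
LABELS = ("positive", "negative", "neutral", "both")


def label_distribution(rows, min_confidence=0.0):
    """Count sentiment labels among rows with confidence >= min_confidence.

    Each row is a dict with the hypothetical keys "sentiment" and
    "confidence"; the real column headers in the sheets may differ.
    """
    counts = Counter({label: 0 for label in LABELS})
    for row in rows:
        label = row["sentiment"].strip().lower()
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        if float(row["confidence"]) >= min_confidence:
            counts[label] += 1
    return dict(counts)


# Made-up rows for illustration only; these are not taken from the dataset.
sample_rows = [
    {"sentiment": "positive", "confidence": "1.0"},
    {"sentiment": "negative", "confidence": "0.67"},
    {"sentiment": "neutral", "confidence": "0.52"},
    {"sentiment": "Both", "confidence": "0.91"},
]

print(label_distribution(sample_rows, min_confidence=0.6))
```

If a sheet is exported to CSV, rows of this shape could be produced with csv.DictReader from the Python standard library, after adjusting the two key names to match the exported headers.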