Please join our mailing list for announcements about new data releases and updates.
OPP-115 Corpus (ACL 2016)
The dataset is made available for research, teaching, and scholarship purposes only, with further parameters in the spirit of a Creative Commons Attribution-NonCommercial License. Contact Prof. Norman Sadeh with any questions.
If you use this dataset as part of a publication, you must cite the following paper:
Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kanthashree Mysore Sathyendra, N. Cameron Russell, Thomas B. Norton, Eduard Hovy, Joel Reidenberg, and Norman Sadeh. The Creation and Analysis of a Website Privacy Policy Corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, August 2016.
The above paper is also an essential read for understanding the structure and contents of the corpus.
For information on subsequent releases of the corpus, subscribe to our mailing list.
Download the dataset: OPP-115_v1_0.zip (94.5 MB).
ACL/COLING 2014 Dataset
We created a corpus of 1,010 privacy policies from the top websites ranked on Alexa.com. The privacy policies in the dataset were retrieved in December 2013 and January 2014.
This dataset is made available for research, teaching, and scholarship purposes only, with further parameters in the spirit of a Creative Commons Attribution-NonCommercial License. Contact Prof. Norman Sadeh with any questions.
Download the dataset: acl-coling-2014-corpus.zip (5.5 MB) and supplementary material (pdf).