In linguistic research, it has previously been challenging to test hypotheses about lexical behavior at the scale of the entire lexicon. This effort, led by Aaron Steven White and Kyle Rawlins, aims to collect large-scale annotation sets of the selectional behavior of verbs, based on human acceptability judgments. Such data sets can be used for testing linguistic hypotheses as well as for computational modeling.
The MegaAttitude data set contains ordinal acceptability judgments for approximately 1,000 clause-embedding verbs of English, each in 40 syntactic frames, with 5 observations per verb–frame pair, collected on Amazon Mechanical Turk. It is intended to cover all clause-embedding verbs in English. The data set also introduces a method for avoiding item effects in large-scale acceptability judgment studies: item templates are constructed by replacing verb arguments with indefinites and other semantically ‘bleached’ material. For example, two frames for the verb see yield (grammatical) Someone saw that something happened and (ungrammatical) Someone saw something to happen.
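The bleached-template idea can be sketched as follows. This is a hypothetical illustration, not the project's actual code: the frame labels ("NP _ that S", "NP _ NP to VP") and the function names are assumptions chosen for clarity, and only the two example frames from the text are shown.

```python
# Minimal sketch of bleached item templates (hypothetical, for illustration).
# Arguments are replaced with semantically 'bleached' indefinites
# ("someone", "something"), so the same template works for any verb.
FRAMES = {
    "NP _ that S":   "Someone {verb_past} that something happened.",
    "NP _ NP to VP": "Someone {verb_past} something to happen.",
}

def instantiate(verb_past: str) -> dict:
    """Fill every frame template with the past-tense form of a verb."""
    return {frame: template.format(verb_past=verb_past)
            for frame, template in FRAMES.items()}

items = instantiate("saw")
# items["NP _ that S"]   -> "Someone saw that something happened."
# items["NP _ NP to VP"] -> "Someone saw something to happen."
```

Because every verb is inserted into the same bleached frames, differences in judged acceptability can be attributed to the verb–frame combination rather than to idiosyncratic lexical content in the arguments.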
This work was first presented at SALT 26 in 2016; the SALT paper introduces the data set and discusses some first modeling results aimed at extracting semantic type signatures.
- SALT 26 talk slides [pdf].
- SALT 26 paper, A computational model of S-selection [pdf, prerelease].
- MegaAttitude v1.0: GitHub.