Software and Data (Masters)
Saif Mohammad
Advisor: Dr. Ted Pedersen
WORD SENSE DISAMBIGUATION
A word sense disambiguation system that uses WEKA's C4.5 decision tree learning algorithm with lexical and syntactic features, taken individually and in combination. The system may be used to disambiguate any data in Senseval-2 data format.
SyntaLex README
PART-OF-SPEECH TAGGED, PARSED AND SENSE-TAGGED DATA IN SENSEVAL-2 DATA FORMAT
Senseval-3 Senseval-2 Senseval-1 line hard serve interest
SENSE-TAGGED DATA IN SENSEVAL-2 DATA FORMAT
This is the data used to train and evaluate the word sense disambiguation system.
Senseval-2 (a small sample) and Senseval-1 (a small sample) Lexical Sample Space is available at the Senseval webpage.
Senseval-1 data has certain erroneous sense tags (view sample). The cleaned-up data: Csenseval1 sample README (May 15, 2002)
Corrected Senseval-1 data in Senseval-2 data format: CSenseval1in2 sample README (May 15, 2002)
Packages to convert LINE, HARD, SERVE and INTEREST data to Senseval-1 and Senseval-2 data formats:
LineOneTwo README (Jan, 2003)
HardOneTwo README (Jan, 2003)
ServeOneTwo README (Jan, 2003)
InterestOneTwo README (Jan, 2003)
Line data (a small sample) in: Senseval-1 data format: Line-S1 sample; Senseval-2 data format: Line-S2 sample
Hard data (a small sample) in: Senseval-1 data format: Hard-S1 sample; Senseval-2 data format: Hard-S2 sample
Serve data (a small sample) in: Senseval-1 data format: Serve-S1 sample; Senseval-2 data format: Serve-S2 sample
Interest data (a small sample) in: Senseval-1 data format: Interest-S1 sample; Senseval-2 data format: Interest-S2 sample
PART-OF-SPEECH TAGGING
Parts of speech may be assigned to any data in Senseval-2 data format using the package posSenseval README (Feb 23, 2003).
The Brill Tagger is used to part-of-speech tag the data. When the part of speech of certain words in the data is already known, tagging accuracy may be improved by pre-tagging these words with their correct part of speech, thereby providing anchor points around which the remaining words may be tagged more reliably. A patch to the Brill Tagger that guarantees pre-tagging and also resolves a problem in the existing pre-tagging may be downloaded from here: BrillPatch README (Feb, 2003).
The details of this work can be found in the recently accepted paper, "Guaranteed Pre-Tagging for the Brill Tagger" (ABSTRACT). The paper is to appear in the proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, CICLing-2003, February 2003, in Mexico City.
Part-of-speech tagged Senseval-2 format data (Feb 23, 2003):
Senseval-2 test data: test-S2.pos sample; Senseval-2 training data: train-S2.pos sample
Corrected Senseval-1 test data: test-S1.pos sample; Corrected Senseval-1 training data: train-S1.pos sample
Line data: line.pos sample; Hard data: hard.pos sample; Serve data: serve.pos sample; Interest data: interest.pos sample
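The idea of guaranteed pre-tagging described in this section can be illustrated with a small toy tagger. This is only a sketch and is not the Brill Tagger or the BrillPatch code: the tiny lexicon, the two contextual rules, the function names and the "word//TAG" notation for pre-tagged tokens are all assumptions made for the example. It shows two things: a pre-tagged word is never re-tagged by a contextual rule, and a pre-tagged word can act as an anchor that lets a rule fix its neighbour.

```python
# Toy illustration of guaranteed pre-tagging (NOT the actual Brill Tagger).
# Assumptions for this sketch: pre-tagged tokens are written "word//TAG",
# and LEXICON / RULES are tiny hypothetical stand-ins for learned resources.

LEXICON = {"i": "PRP", "to": "TO", "can": "NN", "fish": "NN"}

# Each rule: change from_tag to to_tag when the previous word has prev_tag.
RULES = [("NN", "VB", "MD"), ("NN", "VB", "TO")]


def split_pretag(token):
    """Return (word, tag) for 'word//TAG', or (token, None) if not pre-tagged."""
    if "//" in token:
        word, tag = token.rsplit("//", 1)
        return word, tag
    return token, None


def tag(sentence):
    """Assign initial tags, then apply contextual rules to unlocked words only."""
    words, tags, locked = [], [], []
    for token in sentence.split():
        word, pretag = split_pretag(token)
        words.append(word)
        if pretag is not None:
            tags.append(pretag)      # trust the supplied tag
            locked.append(True)      # and never let a rule change it
        else:
            tags.append(LEXICON.get(word.lower(), "NN"))  # default guess: NN
            locked.append(False)
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if not locked[i] and tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))


if __name__ == "__main__":
    print(tag("I can fish"))        # no pre-tagging
    print(tag("I can//MD fish"))    # pre-tagged anchor changes the neighbour
    print(tag("to fish//NN"))       # pre-tagged word is not re-tagged
```

In this toy setup, pre-tagging "can" as a modal lets the first rule re-tag "fish" as a verb, while a pre-tagged "fish//NN" keeps its tag even though the rule for words following "to" would otherwise change it.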
Last updated: Feb. 2005