
Predicting the Semantic Orientation of Adjectives

Vasileios Hatzivassiloglou and Kathleen R. McKeown
Department of Computer Science
450 Computer Science Building
Columbia University
New York, N.Y. 10027, USA
{vh, kathy}@cs.columbia.edu

Abstract

We identify and validate from a large corpus constraints from conjunctions on the positive or negative semantic orientation of the conjoined adjectives. A log-linear regression model uses these constraints to predict whether conjoined adjectives are of same or different orientations, achieving 82% accuracy in this task when each conjunction is considered independently.

Combining the constraints across many adjectives, a clustering algorithm separates the adjectives into groups of different orientations, and finally, adjectives are labeled positive or negative. Evaluations on real data and simulation experiments indicate high levels of performance: classification precision is more than 90% for adjectives that occur in a modest number of conjunctions in the corpus.

1 Introduction

The semantic orientation or polarity of a word indicates the direction the word deviates from the norm for its semantic group or lexical field (Lehrer, 1974).

It also constrains the word's usage in the language (Lyons, 1977), due to its evaluative characteristics (Battistella, 1990). For example, some nearly synonymous words differ in orientation because one implies desirability and the other does not (e.g., simple versus simplistic).

In linguistic constructs such as conjunctions, which impose constraints on the semantic orientation of their arguments (Anscombre and Ducrot, 1983; Elhadad and McKeown, 1990), the choices of arguments and connective are mutually constrained, as illustrated by:

    The tax proposal was {simple and well-received | simplistic but well-received | *simplistic and well-received} by the public.

In addition, almost all antonyms have different semantic orientations. (Exceptions include a small number of terms that are both negative from a pragmatic viewpoint and yet stand in an antonymic relationship; such terms frequently lexicalize two unwanted extremes, e.g., verbose-terse.) If we know that two words relate to the same property (for example, members of the same scalar group such as hot and cold) but have different orientations, we can usually infer that they are antonyms. Given that semantically similar words can be identified automatically on the basis of distributional properties and linguistic cues (Brown et al., 1992; Pereira et al., 1993; Hatzivassiloglou and McKeown, 1993), identifying the semantic orientation of words would allow a system to further refine the retrieved semantic similarity relationships, extracting antonyms.

Unfortunately, dictionaries and similar sources (thesauri, WordNet (Miller et al., 1990)) do not include semantic orientation information, except implicitly, in the form of definitions and usage examples. Explicit links between antonyms and synonyms may also be lacking, particularly when they depend on the domain of discourse; for example, the opposition bear-bull appears only in stock market reports, where the two words take specialized meanings.

In this paper, we present and evaluate a method that automatically retrieves semantic orientation information using indirect information collected from a large corpus. Because the method relies on the corpus, it extracts domain-dependent information and automatically adapts to a new domain when the corpus is changed. Our method achieves high precision (more than 90%), and, while our focus to date has been on adjectives, it can be directly applied to other word classes. Ultimately, our goal is to use this method in a larger system to automatically identify antonyms and distinguish near synonyms.

2 Overview of Our Approach

Our approach relies on an analysis of textual corpora that correlates linguistic features, or indicators, with semantic orientation. While no direct indicators of positive or negative semantic orientation have been proposed (certain words inflected with negative affixes such as in- or un- tend to be mostly negative, but this rule applies only to a fraction of the negative words; furthermore, there are words so inflected which have positive orientation, e.g., independent and unbiased), we demonstrate that conjunctions between adjectives provide indirect information about orientation. For most connectives, the conjoined adjectives usually are of the same orientation: compare "fair and legitimate" and "corrupt and brutal", which actually occur in our corpus, with *"fair and brutal" and *"corrupt and legitimate" (or the other cross-products of the above conjunctions), which are semantically anomalous. The situation is reversed for but, which usually connects two adjectives of different orientations.

The system identifies and uses this indirect information in the following stages:

1. All conjunctions of adjectives are extracted from the corpus along with relevant morphological relations.

2. A log-linear regression model combines information from different conjunctions to determine if each two conjoined adjectives are of same or different orientation. The result is a graph with hypothesized same- or different-orientation links between adjectives.

3. A clustering algorithm separates the adjectives into two subsets of different orientation. It places as many words of same orientation as possible into the same subset.

4. The average frequencies in each group are compared and the group with the higher frequency is labeled as positive.

In the following sections, we first present the set of adjectives used for training and evaluation. We next validate our hypothesis that conjunctions constrain the orientation of conjoined adjectives and then describe the remaining three steps of the algorithm. After presenting our results and evaluation, we discuss simulation experiments that show how our method performs under different conditions of sparseness of data.

3 Data Collection

For our experiments, we use the 21 million word 1987 Wall Street Journal corpus (available from the ACL Data Collection Initiative as CD ROM 1), automatically annotated with part-of-speech tags using the PARTS tagger (Church, 1988).

In order to verify our hypothesis about the orientations of conjoined adjectives, and also to train and evaluate our subsequent algorithms, we need a set of adjectives with predetermined orientation labels.

We constructed this set by taking all adjectives appearing in our corpus 20 times or more, then removing adjectives that have no orientation. These are typically members of groups of complementary, qualitative terms (Lyons, 1977), e.g., domestic or medical.

We then assigned an orientation label (either + or -) to each adjective, using an evaluative approach.

The criterion was whether the use of this adjective ascribes in general a positive or negative quality to the modified item, making it better or worse than a similar unmodified item. We were unable to reach a unique label out of context for several adjectives, which we removed from consideration; for example, cheap is positive if it is used as a synonym of inexpensive, but negative if it implies inferior quality.

The operations of selecting adjectives and assigning labels were performed before testing our conjunction hypothesis or implementing any other algorithms, to avoid any influence on our labels. The final set contained 1,336 adjectives (657 positive and 679 negative terms). Figure 1 shows randomly selected terms from this set.

Positive: adequate central clever famous intelligent remarkable reputed sensitive slender thriving

Negative: contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting

Figure 1: Randomly selected adjectives with positive and negative orientations.

To further validate our set of labeled adjectives, we subsequently asked four people to independently label a randomly drawn sample of 500 of these adjectives. They agreed with us that the positive/negative concept applies to 89.15% of these adjectives on average. For the adjectives where a positive or negative label was assigned by both us and the independent evaluators, the average agreement on the label was 97.38%. The average inter-reviewer agreement on labeled adjectives was 96.97%. These results are extremely significant statistically and compare favorably with validation studies performed for other tasks (e.g., sense disambiguation) in the past. They show that positive and negative orientation are objective properties that can be reliably determined by humans.

To extract conjunctions between adjectives, we used a two-level finite-state grammar, which covers complex modification patterns and noun-adjective apposition. Running this parser on the 21 million word corpus, we collected 13,426 conjunctions of adjectives, expanding to a total of 15,431 conjoined adjective pairs. After morphological transformations, the remaining 15,048 conjunction tokens involve 9,296 distinct pairs of conjoined adjectives (types). Each conjunction token is classified by the parser according to three variables: the conjunction used (and, or, but, either-or, or neither-nor), the type of modification (attributive, predicative, appositive, resultative), and the number of the modified noun (singular or plural).
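The two-level finite-state grammar itself is not reproduced in this paper; as a much-simplified stand-in, the core idea of harvesting conjoined adjectives from part-of-speech tagged text can be sketched with a regular expression (the tag pattern, function name, and sample line below are illustrative assumptions, not the actual grammar):

```python
import re

# Much-simplified stand-in for the two-level finite-state grammar: pull out
# adjective pairs conjoined by and/or/but from word/TAG text (Penn-style JJ
# adjective tags and CC conjunction tags are assumed). Complex modification
# patterns and noun-adjective apposition are not handled here.

PATTERN = re.compile(r'([\w-]+)/JJ[RS]?\s+(and|or|but)/CC\s+([\w-]+)/JJ[RS]?')

def extract_conjoined_pairs(tagged_text):
    for adj1, conj, adj2 in PATTERN.findall(tagged_text):
        yield adj1.lower(), adj2.lower(), conj.lower()

sample = "the/DT proposal/NN was/VBD simple/JJ and/CC well-received/JJ"
print(list(extract_conjoined_pairs(sample)))
# [('simple', 'well-received', 'and')]
```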

4 Validation of the Conjunction Hypothesis

Using the three attributes extracted by the parser, we constructed a cross-classification of the conjunctions in a three-way table. We counted types and tokens of each conjoined pair that had both members in the set of pre-selected labeled adjectives discussed above; 2,748 (29.56%) of all conjoined pairs (types) and 4,024 (26.74%) of all conjunction occurrences (tokens) met this criterion. We augmented this table with marginal totals, arriving at 90 categories, each of which represents a triplet of attribute values, possibly with one or more "don't care" elements. We then measured the percentage of conjunctions in each category with adjectives of same or different orientations. Under the null hypothesis of same proportions of adjective pairs (types) of same and different orientation in a given category, the number of same- or different-orientation pairs follows a binomial distribution with p = 0.5 (Conover, 1980).

Conjunction category               Types analyzed   % same-orientation (types)   % same-orientation (tokens)   P-value (for types)
All conjunctions                   2,748            77.84%                       72.39%                        < 1 x 10^-16
All and conjunctions               2,294            81.73%                       78.07%                        < 1 x 10^-16
All or conjunctions                305              77.05%                       60.97%                        < 1 x 10^-16
All but conjunctions               214              30.84%                       25.94%                        2.09 x 10^-10
All attributive and conjunctions   1,077            80.04%                       76.82%                        < 1 x 10^-16
All predicative and conjunctions   860              84.77%                       84.54%                        < 1 x 10^-16
All appositive and conjunctions    30               70.00%                       63.64%                        0.04277

Table 1: Validation of our conjunction hypothesis. The P-value is the probability that similar or more extreme results would have been obtained if same- and different-orientation conjunction types were equally distributed.
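For illustration, the style of P-value reported in Table 1 can be reproduced with a two-sided binomial test under the null hypothesis p = 0.5; the following is a minimal sketch, assuming SciPy is available and rounding the same-orientation count from the reported percentage of the "All conjunctions" row:

```python
# Two-sided binomial test of the conjunction hypothesis for one category:
# under the null, same- and different-orientation types are equally likely.
from scipy.stats import binomtest

n_types = 2748
n_same = round(0.7784 * n_types)    # about 2,139 same-orientation types

result = binomtest(n_same, n=n_types, p=0.5, alternative='two-sided')
print(f"P-value: {result.pvalue:.3g}")   # effectively 0, i.e. < 1e-16
```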

We show in Table 1 the results for several representative categories, and summarize all results below:

• Our conjunction hypothesis is validated overall and for almost all individual cases. The results are extremely significant statistically, except for a few cases where the sample is small.

• Aside from the use of but with adjectives of different orientations, there are, rather surprisingly, small differences in the behavior of conjunctions between linguistic environments (as represented by the three attributes). There are a few exceptions, e.g., appositive and conjunctions modifying plural nouns are evenly split between same and different orientation. But in these exceptional cases the sample is very small, and the observed behavior may be due to chance.

• Further analysis of different-orientation pairs in conjunctions other than but shows that conjoined antonyms are far more frequent than expected by chance, in agreement with (Justeson and Katz, 1991).

5 Prediction of Link Type

The analysis in the previous section suggests a baseline method for classifying links between adjectives.

Since 77.84% of all links from conjunctions indicate same orientation, we can achieve this level of performance by always guessing that a link is of the same-orientation type. However, we can improve performance by noting that conjunctions using but exhibit the opposite pattern, usually involving adjectives of different orientations. Thus, a revised but still simple rule predicts a different-orientation link if the two adjectives have been seen in a but conjunction, and a same-orientation link otherwise, assuming the two adjectives were seen connected by at least one conjunction.
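A minimal sketch of this baseline and the revised but rule follows; the mapping from adjective pairs to observed connectives is a hypothetical data structure, not the system's actual representation:

```python
# Sketch of the baseline and the revised "but rule". `conjunctions` maps an
# unordered adjective pair to the set of connectives observed for it.

def predict_link(pair, conjunctions):
    """Return 'same', 'different', or None if the pair was never conjoined."""
    connectives = conjunctions.get(frozenset(pair), set())
    if not connectives:
        return None
    if 'but' in connectives:
        return 'different'    # but usually joins adjectives of opposite orientation
    return 'same'             # the baseline guess for every other connective

conj = {frozenset({'fair', 'legitimate'}): {'and'},
        frozenset({'simplistic', 'well-received'}): {'but'}}
print(predict_link(('fair', 'legitimate'), conj))             # same
print(predict_link(('simplistic', 'well-received'), conj))    # different
```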

Morphological relationships between adjectives also play a role. Adjectives related in form (e.g., adequate-inadequate or thoughtful-thoughtless) almost always have different semantic orientations. We implemented a morphological analyzer which matches adjectives related in this manner. This process is highly accurate, but unfortunately does not apply to many of the possible pairs: in our set of 1,336 labeled adjectives (891,780 possible pairs), 102 pairs are morphologically related; among them, 99 are of different orientation, yielding 97.06% accuracy for the morphology method. This information is orthogonal to that extracted from conjunctions: only 12 of the 102 morphologically related pairs have been observed in conjunctions in our corpus. Thus, we add to the predictions made from conjunctions the different-orientation links suggested by morphological relationships.

We improve the accuracy of classifying links derived from conjunctions as same or different orientation with a log-linear regression model (Santner and Duffy, 1989), exploiting the differences between the various conjunction categories. This is a generalized linear model (McCullagh and Nelder, 1989) with a linear predictor

\[
\eta = \mathbf{w}^{\mathrm{T}} \mathbf{x}
\]

where x is the vector of the observed counts in the various conjunction categories for the particular adjective pair we try to classify and w is a vector of weights to be learned during training. The response y is non-linearly related to η through the inverse logit function,

\[
y = \frac{e^{\eta}}{1 + e^{\eta}}.
\]

Note that y ∈ (0, 1), with each of these endpoints associated with one of the possible outcomes.
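To make the mapping from counts to a response concrete, here is a small numerical sketch; the nine-element count vector and the weights are toy values, not the fitted model, and the code uses the algebraically equivalent form 1 / (1 + e^(-η)):

```python
import numpy as np

# Numerical sketch of the model above: a linear predictor eta = w . x over the
# conjunction-category counts, mapped to (0, 1) by the inverse logit.

def predict_same_orientation(x, w):
    eta = np.dot(w, x)                    # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))     # inverse logit, equals e^eta / (1 + e^eta)

x = np.array([2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=float)  # toy counts in nine categories
w = np.array([0.9, 0.4, -1.6, 0.7, 0.3, -0.2, 0.5, 0.1, 0.0])
print(f"P(same orientation) = {predict_same_orientation(x, w):.2f}")
```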

We have 90 possible predictor variables, 42 of which are linearly independent. Since using all the 42 independent predictors invites overfitting (Duda and Hart, 1973), we have investigated subsets of the full log-linear model for our data using the method of iterative stepwise refinement: starting with an initial model, variables are added or dropped if their contribution to the reduction or increase of the residual deviance compares favorably to the resulting loss or gain of residual degrees of freedom. This process led to the selection of nine predictor variables.
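The paper does not spell out the stepwise procedure in code; the following is a simplified, forward-only sketch of the idea, assuming statsmodels for GLM fitting and a chi-square threshold for trading residual deviance against the residual degree of freedom each added predictor costs:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def forward_stepwise(X, y, alpha=0.05):
    """Greedily add predictors while the deviance drop beats a chi-square
    threshold for one residual degree of freedom. X: (pairs x categories)
    count matrix, y: 1 for same-orientation pairs, 0 otherwise (toy setup)."""
    selected, remaining = [], list(range(X.shape[1]))
    # intercept-only model as the starting point
    current_dev = sm.GLM(y, np.ones((len(y), 1)),
                         family=sm.families.Binomial()).fit().deviance
    threshold = chi2.ppf(1 - alpha, df=1)
    while remaining:
        best = None
        for j in remaining:
            fit = sm.GLM(y, sm.add_constant(X[:, selected + [j]]),
                         family=sm.families.Binomial()).fit()
            drop = current_dev - fit.deviance
            if drop > threshold and (best is None or drop > best[1]):
                best = (j, drop, fit.deviance)
        if best is None:
            break
        selected.append(best[0])
        remaining.remove(best[0])
        current_dev = best[2]
    return selected
```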

We evaluated the three prediction models discussed above with and without the secondary source of morphology relations. For the log-linear model, we repeatedly partitioned our data into equally sized training and testing sets, estimated the weights on the training set, and scored the model's performance on the testing set, averaging the resulting scores. (When morphology is to be used as a supplementary predictor, we remove the morphologically related pairs from the training and testing sets.) Table 2 shows the results of these analyses. Although the log-linear model offers only a small improvement in pair classification over the simpler but prediction rule, it confers the important advantage of rating each prediction between 0 and 1. We make extensive use of this in the next phase of our algorithm.

Prediction method                 Morphology   Accuracy on reported       Accuracy on reported          Overall
                                  used?        same-orientation links     different-orientation links   accuracy
Always predict same orientation   No           77.84%                     --                            77.84%
Always predict same orientation   Yes          78.18%                     97.06%                        78.86%
But rule                          No           81.81%                     69.16%                        80.82%
But rule                          Yes          82.20%                     78.16%                        81.75%
Log-linear model                  No           81.53%                     73.70%                        80.97%
Log-linear model                  Yes          82.00%                     82.44%                        82.05%

Table 2: Accuracy of several link prediction models.

6 Finding Groups of Same-Oriented Adjectives

The third phase of our method assigns the adjectives into groups, placing adjectives of the same (but unknown) orientation in the same group. Each pair of adjectives has an associated dissimilarity value between 0 and 1; adjectives connected by same-orientation links have low dissimilarities, and conversely, different-orientation links result in high dissimilarities. Adjective pairs with no connecting links are assigned the neutral dissimilarity 0.5.

The baseline and but methods make qualitative distinctions only (i.e., same-orientation, different-orientation, or unknown); for them, we define dissimilarity for same-orientation links as one minus the probability that such a classification is correct and dissimilarity for different-orientation links as the probability that such a classification is correct. These probabilities are estimated from separate training data. Note that for these prediction models, dissimilarities are identical for similarly classified links.

The log-linear model, on the other hand, offers an estimate of how good each prediction is, since it produces a value y between 0 and 1. We construct the model so that 1 corresponds to same-orientation, and define dissimilarity as one minus the produced value.
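A compact sketch of these dissimilarity definitions follows; the two fixed probabilities are placeholders for values that would be estimated from separate training data:

```python
# Sketch of the dissimilarity definitions above. The final branch handles the
# log-linear model's output y, where 1 corresponds to same orientation.

P_SAME_CORRECT = 0.82   # assumed precision of reported same-orientation links
P_DIFF_CORRECT = 0.78   # assumed precision of reported different-orientation links

def dissimilarity(link):
    """link is None (no conjunction seen), 'same', 'different', or a float y."""
    if link is None:
        return 0.5                       # neutral dissimilarity for unlinked pairs
    if link == 'same':
        return 1.0 - P_SAME_CORRECT      # low dissimilarity
    if link == 'different':
        return P_DIFF_CORRECT            # high dissimilarity
    return 1.0 - float(link)             # log-linear response y
```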

Same and different-orientation links between ad- jectives form a graph. To partition the graph nodes into subsets of the same orientation, we employ an iterative optimization procedure on each connected component, based on the exchange method, a non- hierarchical clustering algorithm (Spgth, 1985). We define an objective/unction ~ scoring each possible partition 7 ) of the adjectives into two subgroups C1 and C2 as i=1 x,y E Ci 177 Number of adjectives in test set (\[An\[) 2 730 3 516 4 369 5 236 Number of links in test set (\[L~\[) 2,568 2,159 1,742 1,238 Average number oflinksfor each adjective 7.04 8.37 9.44 10.49 Accuracy 78.08% 82.56% 87.26% 92.37% Ratio of average group frequencies 1.8699 1.9235 L3486 1.4040 Table 3: Evaluation of the adjective classification and labeling methods. where \[Cil stands for the cardinality of cluster i, and d(z, y) is the dissimilarity between adjectives z and y. We want to select the partition :Pmin that min- imizes ~, subject to the additional constraint that for each adjective z in a cluster C, 1 1 ICl- 1 d(=,y) < --IVl d(=, y) (1) where C is the complement of cluster C, i.e., the other member of the partition. This constraint, based on Rousseeuw's (1987) s=lhoue~es, helps cor- rect wrong cluster assignments.

To find P_min, we first construct a random partition of the adjectives, then locate the adjective that will most reduce the objective function if it is moved from its current cluster. We move this adjective and proceed with the next iteration until no movements can improve the objective function. At the final iteration, the cluster assignment of any adjective that violates constraint (1) is changed. This is a steepest-descent hill-climbing method, and thus is guaranteed to converge. However, it will in general find a local minimum rather than the global one; the problem is NP-complete (Garey and Johnson, 1979). We can arbitrarily increase the probability of finding the globally optimal solution by repeatedly running the algorithm with different starting partitions.
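A sketch of this partitioning step on a precomputed dissimilarity matrix follows; it implements the objective Φ and the single-move exchange search described above, and for brevity omits the final check of constraint (1) and the random restarts:

```python
import numpy as np

# Sketch of the partitioning step: minimize
#   Phi(P) = sum_{i=1,2} (1/|C_i|) * sum_{x,y in C_i} d(x, y)
# over two-way partitions by single-node moves (the exchange method), starting
# from a random partition. D is a symmetric matrix of dissimilarities; pairs
# are counted in both orders, a constant factor that does not affect the argmin.

def phi(D, labels):
    total = 0.0
    for c in (0, 1):
        members = np.flatnonzero(labels == c)
        if len(members) > 0:
            total += D[np.ix_(members, members)].sum() / len(members)
    return total

def exchange_partition(D, seed=0):
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    labels = rng.integers(0, 2, size=n)       # random initial partition
    while True:
        current = phi(D, labels)
        best_node, best_score = None, current
        for i in range(n):                    # find the single move that helps most
            trial = labels.copy()
            trial[i] = 1 - trial[i]
            score = phi(D, trial)
            if score < best_score:
                best_node, best_score = i, score
        if best_node is None:                 # no move improves the objective
            return labels
        labels[best_node] = 1 - labels[best_node]
```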

7 Labeling the Clusters as Positive or Negative

The clustering algorithm separates each component of the graph into two groups of adjectives, but does not actually label the adjectives as positive or negative. To accomplish that, we use a simple criterion that applies only to pairs or groups of words of opposite orientation. We have previously shown (Hatzivassiloglou and McKeown, 1995) that in oppositions of gradable adjectives where one member is semantically unmarked, the unmarked member is the most frequent one about 81% of the time. This is relevant to our task because semantic markedness exhibits a strong correlation with orientation, the unmarked member almost always having positive orientation (Lehrer, 1985; Battistella, 1990).

We compute the average frequency of the words in each group, expecting the group with higher average frequency to contain the positive terms. This aggregation operation increases the precision of the labeling dramatically since indicators for many pairs of words are combined, even when some of the words are incorrectly assigned to their group.
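A minimal sketch of this labeling criterion; the groups and corpus frequencies below are purely illustrative:

```python
# Sketch of the labeling step: the group with the higher average corpus
# frequency is labeled positive.

def label_groups(group_a, group_b, freq):
    avg_a = sum(freq[w] for w in group_a) / len(group_a)
    avg_b = sum(freq[w] for w in group_b) / len(group_b)
    # the semantically unmarked (usually positive) member of an opposition
    # tends to be the more frequent one, so the more frequent group wins
    return (group_a, group_b) if avg_a >= avg_b else (group_b, group_a)

positive, negative = label_groups(
    {'adequate', 'clever'}, {'ignorant', 'listless'},
    {'adequate': 570, 'clever': 120, 'ignorant': 88, 'listless': 25})
print(positive)   # {'adequate', 'clever'}
```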

8 Results and Evaluation

Since graph connectivity affects performance, we devised a method of selecting test sets that makes this dependence explicit. Note that the graph density is largely a function of corpus size, and thus can be increased by adding more data. Nevertheless, we report results on sparser test sets to show how our algorithm scales up.

We separated our sets of adjectives A (containing 1,336 adjectives) and conjunction- and morphology-based links L (containing 2,838 links) into training and testing groups by selecting, for several values of the parameter α, the maximal subset of A, A_α, which includes an adjective x if and only if there exist at least α links from L between x and other elements of A_α. This operation in turn defines a subset of L, L_α, which includes all links between members of A_α. We train our log-linear model on L - L_α (excluding links between morphologically related adjectives), compute predictions and dissimilarities for the links in L_α, and use these to classify and label the adjectives in A_α. α must be at least 2, since we need to leave some links for training.
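Constructing A_α amounts to iteratively discarding adjectives with fewer than α links to the remaining adjectives until the set is stable, a k-core-style pruning; a sketch, assuming links are stored as unordered pairs:

```python
# Sketch of selecting the test set A_alpha. `links` is assumed to be a
# collection of unordered adjective pairs (frozensets) drawn from L.

def maximal_subset(adjectives, links, alpha):
    kept = set(adjectives)
    changed = True
    while changed:
        changed = False
        for a in list(kept):
            # count links from a whose other endpoint is still in the subset
            degree = sum(1 for pair in links if a in pair and pair <= kept)
            if degree < alpha:
                kept.discard(a)
                changed = True
    return kept
```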

Table 3 shows the results of these experiments for α = 2 to 5. Our method produced the correct classification from 78% of the time on the sparsest test set up to more than 92% of the time when a higher number of links was present. Moreover, in all cases, the ratio of the two group frequencies correctly identified the positive subgroup. These results are extremely significant statistically (P-value less than 10^-16) when compared with the baseline method of randomly assigning orientations to adjectives, or the baseline method of always predicting the most frequent (for types) category (50.82% of the adjectives in our collection are classified as negative). Figure 2 shows some of the adjectives in set A_4 and their classifications.

α   Number of adjectives    Number of links        Average number of          Accuracy   Ratio of average
    in test set (|A_α|)     in test set (|L_α|)    links for each adjective              group frequencies
2   730                     2,568                  7.04                       78.08%     1.8699
3   516                     2,159                  8.37                       82.56%     1.9235
4   369                     1,742                  9.44                       87.26%     1.3486
5   236                     1,238                  10.49                      92.37%     1.4040

Table 3: Evaluation of the adjective classification and labeling methods.

Classified as positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty

Classified as negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful

Figure 2: Sample retrieved classifications of adjectives from set A_4. Correctly matched adjectives are shown in bold.

9 Graph Connectivity and Performance

A strong point of our method is that decisions on individual words are aggregated to provide decisions on how to group words into a class and whether to label the class as positive or negative. Thus, the overall result can be much more accurate than the individual indicators. To verify this, we ran a series of simulation experiments. Each experiment measures how our algorithm performs for a given level of precision P for identifying links and a given average number of links k for each word. The goal is to show that even when P is low, given enough data (i.e., high k), we can achieve high performance for the grouping.

As we noted earlier, the corpus data is eventually represented in our system as a graph, with the nodes corresponding to adjectives and the links to predictions about whether the two connected adjectives have the same or different orientation. Thus the parameter P in the simulation experiments measures how well we are able to predict each link independently of the others, and the parameter k measures the number of distinct adjectives each adjective appears with in conjunctions. P therefore directly represents the precision of the link classification algorithm, while k indirectly represents the corpus size.

To measure the effect of P and k (which are reflected in the graph topology), we need to carry out a series of experiments where we systematically vary their values. For example, as k (or the amount of data) increases for a given level of precision P for individual links, we want to measure how this affects the overall accuracy of the resulting groups of nodes.

Thus, we need to construct a series of data sets, or graphs, which represent different scenarios corresponding to a given combination of values of P and k. To do this, we construct a random graph by randomly assigning 50 nodes to the two possible orientations. Because we don't have frequency and morphology information on these abstract nodes, we cannot predict whether two nodes are of the same or different orientation. Rather, we randomly assign links between nodes so that, on average, each node participates in k links and 100 x P% of all links connect nodes of the same orientation. Then we consider these links as identified by the link prediction algorithm as connecting two nodes with the same orientation (so that 100 x P% of these predictions will be correct). This is equivalent to the baseline link classification method, and provides a lower bound on the performance of the algorithm actually used in our system (Section 5).
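A sketch of constructing one such random graph under the uniform assumptions just described; the parameter values, seed, and data layout are illustrative choices:

```python
import random

# Sketch of one simulated graph: n abstract nodes with random orientations,
# an average of k links per node, and a fraction P of the links connecting
# nodes of the same orientation. All sampled links are then *reported* as
# same-orientation, so exactly the within-orientation links (a fraction P)
# correspond to correct predictions.

def simulated_graph(n=50, k=7, P=0.8, seed=0):
    rng = random.Random(seed)
    orientation = {node: rng.choice((+1, -1)) for node in range(n)}
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    same_pool = [p for p in pairs if orientation[p[0]] == orientation[p[1]]]
    diff_pool = [p for p in pairs if orientation[p[0]] != orientation[p[1]]]
    n_links = n * k // 2                      # k links per node on average
    n_same = round(P * n_links)
    links = rng.sample(same_pool, n_same) + rng.sample(diff_pool, n_links - n_same)
    return orientation, links
```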

Because of the lack of actual measurements such as frequency on these abstract nodes, we also decouple the partitioning and labeling components of our system and score the partition found under the best matching conditions for the actual labels. Thus the simulation measures only how well the system separates positive from negative adjectives, not how well it determines which is which. However, in all the experiments performed on real corpus data (Section 8), the system correctly found the labels of the groups; any misclassifications came from misplacing an adjective in the wrong group. The whole procedure of constructing the random graph and finding and scoring the groups is repeated 200 times for any given combination of P and k, and the results are averaged, thus avoiding accidentally evaluating our system on a graph that is not truly representative of graphs with the given P and k.

We observe (Figure 3) that even for relatively low P, our ability to correctly classify the nodes approaches very high levels with a modest number of links. For P = 0.8, we need only about 7 links per adjective for classification performance over 90% and only 12 links per adjective for performance over 99%. (Twelve links per adjective for a set of n adjectives requires 6n conjunctions between the n adjectives in the corpus.) The difference between low and high values of P is in the rate at which increasing data increases overall precision. These results are somewhat more optimistic than those obtained with real data (Section 8), a difference which is probably due to the uniform distributional assumptions in the simulation.

Nevertheless, we expect the trends to be similar to the ones shown in Figure 3, and the results of Table 3 on real data support this expectation.

Figure 3: Simulation results obtained on 50 nodes. Each panel plots classification accuracy against the average number of neighbors per node, for (a) P = 0.75, (b) P = 0.8, (c) P = 0.85, and (d) P = 0.9. In each figure, the last x coordinate indicates the (average) maximum possible value of k for this P, and the dotted line shows the performance of a random classifier.

10 Conclusion and Future Work

We have proposed and verified from corpus data constraints on the semantic orientations of conjoined adjectives. We used these constraints to automatically construct a log-linear regression model, which, combined with supplementary morphology rules, predicts whether two conjoined adjectives are of same or different orientation with 82% accuracy. We then classified several sets of adjectives according to the links inferred in this way and labeled them as positive or negative, obtaining 92% accuracy on the classification task for reasonably dense graphs and 100% accuracy on the labeling task. Simulation experiments establish that very high levels of performance can be obtained with a modest number of links per word, even when the links themselves are not always correctly classified.

As part of our clustering algorithm's output, a "goodness-of-fit" measure for each word is computed, based on Rousseeuw's (1987) silhouettes.

This measure ranks the words according to how well they fit in their group, and can thus be used as a quantitative measure of orientation, refining the binary positive-negative distinction. By restricting the labeling decisions to words with high values of this measure we can also increase the precision of our system, at the cost of sacrificing some coverage.

We are currently combining the output of this system with a semantic group finding system so that we can automatically identify antonyms from the corpus, without access to any semantic descriptions.

The learned semantic categorization of the adjectives can also be used in the reverse direction, to help in interpreting the conjunctions they participate in. We will also extend our analyses to nouns and verbs.

Acknowledgements

This work was supported in part by the Office of Naval Research under grant N00014-95-1-0745, jointly by the Office of Naval Research and the Advanced Research Projects Agency under grant N00014-89-J-1782, by the National Science Foundation under grant GER-90-24069, and by the New York State Center for Advanced Technology under contracts NYSSTF-CAT(95)-013 and NYSSTF-CAT(96)-013. We thank Ken Church and the AT&T Bell Laboratories for making the PARTS part-of-speech tagger available to us. We also thank Dragomir Radev, Eric Siegel, and Gregory Sean McKinley who provided models for the categorization of the adjectives in our training and testing sets as positive and negative.

References

Jean-Claude Anscombre and Oswald Ducrot. 1983. L'Argumentation dans la Langue. Philosophie et Langage. Pierre Mardaga, Brussels, Belgium.

Edwin L. Battistella. 1990. Markedness: The Evaluative Superstructure of Language. State University of New York Press, Albany, New York.

Peter F. Brown, Vincent J. della Pietra, Peter V. de Souza, Jennifer C. Lai, and Robert L. Mercer. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467-479.

Kenneth W. Church. 1988. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing (ANLP-88), pages 136-143, Austin, Texas, February. Association for Computational Linguistics.

W. J. Conover. 1980. Practical Nonparametric Statistics. Wiley, New York, 2nd edition.

Richard O. Duda and Peter E. Hart. 1973. Pattern Classification and Scene Analysis. Wiley, New York.

Michael Elhadad and Kathleen R. McKeown. 1990. A procedure for generating connectives. In Proceedings of COLING, Helsinki, Finland, July.

Michael R. Garey and David S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco, California.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1993. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proceedings of the 31st Annual Meeting of the ACL, pages 172-182, Columbus, Ohio, June. Association for Computational Linguistics.

Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1995. A quantitative evaluation of linguistic tests for the automatic prediction of semantic markedness. In Proceedings of the 33rd Annual Meeting of the ACL, pages 197-204, Boston, Massachusetts, June. Association for Computational Linguistics.

John S. Justeson and Slava M. Katz. 1991. Co-occurrences of antonymous adjectives and their contexts. Computational Linguistics, 17(1):1-19.

Adrienne Lehrer. 1974. Semantic Fields and Lexical Structure. North-Holland, Amsterdam and New York.

Adrienne Lehrer. 1985. Markedness and antonymy. Journal of Linguistics, 21(3):397-429, September.

John Lyons. 1977. Semantics, volume 1. Cambridge University Press, Cambridge, England.

Peter McCullagh and John A. Nelder. 1989. Generalized Linear Models. Chapman and Hall, London, 2nd edition.

George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. 1990. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography (special issue), 3(4):235-312.

Fernando Pereira, Naftali Tishby, and Lillian Lee. 1993. Distributional clustering of English words. In Proceedings of the 31st Annual Meeting of the ACL, pages 183-190, Columbus, Ohio, June. Association for Computational Linguistics.

Peter J. Rousseeuw. 1987. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53-65.

Thomas J. Santner and Diane E. Duffy. 1989. The Statistical Analysis of Discrete Data. Springer-Verlag, New York.

Helmuth Späth. 1985. Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples. Ellis Horwood, Chichester, West Sussex, England.