
Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Peter D. Turney
Institute for Information Technology
National Research Council of Canada
Ottawa, Ontario, Canada, K1A 0R6
p [email protected]

Abstract

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

1 Introduction

If you are considering a vacation in Akumal, Mexico, you might go to a search engine and enter the query “Akumal travel review”. However, in this case, Google¹ reports about 5,000 matches. It would be useful to know what fraction of these matches recommend Akumal as a travel destination. With an algorithm for automatically classifying a review as “thumbs up” or “thumbs down”, it would be possible for a search engine to report such summary statistics. This is the motivation for the research described here. Other potential applications include recognizing “flames” (abusive newsgroup messages) (Spertus, 1997) and developing new kinds of search tools (Hearst, 1992).

¹ http://www.google.com

In this paper, I present a simple unsupervised learning algorithm for classifying a review as recommended or not recommended. The algorithm takes a written review as input and produces a classification as output. The first step is to use a part-of-speech tagger to identify phrases in the input text that contain adjectives or adverbs (Brill, 1994). The second step is to estimate the semantic orientation of each extracted phrase (Hatzivassiloglou & McKeown, 1997). A phrase has a positive semantic orientation when it has good associations (e.g., “romantic ambience”) and a negative semantic orientation when it has bad associations (e.g., “horrific events”). The third step is to assign the given review to a class, recommended or not recommended, based on the average semantic orientation of the phrases extracted from the review. If the average is positive, the prediction is that the review recommends the item it discusses. Otherwise, the prediction is that the item is not recommended.

The PMI-IR algorithm is employed to estimate the semantic orientation of a phrase (Turney, 2001). PMI-IR uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words or phrases. The semantic orientation of a given phrase is calculated by comparing its similarity to a positive reference word (“excellent”) with its similarity to a negative reference word (“poor”). More specifically, a phrase is assigned a numerical rating by taking the mutual information between the given phrase and the word “excellent” and subtracting the mutual information between the given phrase and the word “poor”. In addition to determining the direction of the phrase’s semantic orientation (positive or negative, based on the sign of the rating), this numerical rating also indicates the strength of the semantic orientation (based on the magnitude of the number). The algorithm is presented in Section 2.

Hatzivassiloglou and McKeown (1997) have also developed an algorithm for predicting semantic orientation. Their algorithm performs well, but it is designed for isolated adjectives, rather than phrases containing adjectives or adverbs. This is discussed in more detail in Section 3, along with other related work.

The classification algorithm is evaluated on 410 reviews from Epinions², randomly sampled from four different domains: reviews of automobiles, banks, movies, and travel destinations. Reviews at Epinions are not written by professional writers; any person with a Web browser can become a member of Epinions and contribute a review. Each of these 410 reviews was written by a different author. Of these reviews, 170 are not recommended and the remaining 240 are recommended (these classifications are given by the authors). Always guessing the majority class would yield an accuracy of 59%. The algorithm achieves an average accuracy of 74%, ranging from 84% for automobile reviews to 66% for movie reviews. The experimental results are given in Section 4.

The interpretation of the experimental results, the limitations of this work, and future work are discussed in Section 5. Potential applications are outlined in Section 6. Finally, conclusions are presented in Section 7.

² http://www.epinions.com

2 Classifying Reviews

The first step of the algorithm is to extract phrases containing adjectives or adverbs. Past work has demonstrated that adjectives are good indicators of subjective, evaluative sentences (Hatzivassiloglou & Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2001). However, although an isolated adjective may indicate subjectivity, there may be insufficient context to determine semantic orientation. For example, the adjective “unpredictable” may have a negative orientation in an automotive review, in a phrase such as “unpredictable steering”, but it could have a positive orientation in a movie review, in a phrase such as “unpredictable plot”. Therefore the algorithm extracts two consecutive words, where one member of the pair is an adjective or an adverb and the second provides context.

First a part-of-speech tagger is applied to the review (Brill, 1994).³ Two consecutive words are extracted from the review if their tags conform to any of the patterns in Table 1. The JJ tags indicate adjectives, the NN tags are nouns, the RB tags are adverbs, and the VB tags are verbs.⁴ The second pattern, for example, means that two consecutive words are extracted if the first word is an adverb and the second word is an adjective, but the third word (which is not extracted) cannot be a noun. NNP and NNPS (singular and plural proper nouns) are avoided, so that the names of the items in the review cannot influence the classification.
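The extraction rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the experiments: it assumes the review has already been tagged and is available as a list of (word, tag) pairs, the function name `extract_phrases` is hypothetical, and treating the end of the text as an acceptable "third word" is an assumption that the paper does not address.

```python
# Tag groups used by the five patterns of Table 1.
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}  # NNP and NNPS are deliberately left out
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# (tags allowed for word 1, tags allowed for word 2, tags forbidden for word 3)
PATTERNS = [
    (ADJ, NOUN, set()),   # 1. JJ, then NN or NNS, then anything
    (ADV, ADJ, NOUN),     # 2. adverb, then JJ, then not a noun
    (ADJ, ADJ, NOUN),     # 3. JJ, then JJ, then not a noun
    (NOUN, ADJ, NOUN),    # 4. noun, then JJ, then not a noun
    (ADV, VERB, set()),   # 5. adverb, then verb, then anything
]


def extract_phrases(tagged):
    """Return the two-word phrases in `tagged` that match a Table 1 pattern.

    `tagged` is a list of (word, tag) pairs for one review.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Assumption: with no third word, the "not a noun" condition holds.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, forbidden_third in PATTERNS:
            if t1 in first and t2 in second and t3 not in forbidden_third:
                phrases.append(f"{w1} {w2}")
                break
    return phrases
```

For the tagged input `[("very", "RB"), ("handy", "JJ"), ("tool", "NN")]`, the pair "very handy" is rejected by the second pattern because a noun follows it, while "handy tool" is accepted by the first pattern.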

Table 1. Patterns of tags for extracting two-word phrases from reviews.

     First Word          Second Word             Third Word (Not Extracted)
1.   JJ                  NN or NNS               anything
2.   RB, RBR, or RBS     JJ                      not NN nor NNS
3.   JJ                  JJ                      not NN nor NNS
4.   NN or NNS           JJ                      not NN nor NNS
5.   RB, RBR, or RBS     VB, VBD, VBN, or VBG    anything

The second step is to estimate the semantic orientation of the extracted phrases, using the PMI-IR algorithm. This algorithm uses mutual information as a measure of the strength of semantic association between two words (Church & Hanks, 1989).

PMI-IR has been empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL), obtaining a score of 74% (Turney, 2001). For comparison, Latent Semantic Analysis (LSA), another statistical measure of word association, attains a score of 64% on the same 80 TOEFL questions (Landauer & Dumais, 1997).

³ http://www.cs.jhu.edu/~brill/RBT1_14.tar.Z
⁴ See Santorini (1995) for a complete description of the tags.

The Pointwise Mutual Information (PMI) between two words, word1 and word2, is defined as follows (Church & Hanks, 1989):

$$\operatorname{PMI}(\text{word}_1, \text{word}_2) = \log_2 \left[ \frac{p(\text{word}_1 \,\&\, \text{word}_2)}{p(\text{word}_1)\, p(\text{word}_2)} \right] \tag{1}$$

Here, p(word1 & word2) is the probability that word1 and word2 co-occur. If the words are statistically independent, then the probability that they co-occur is given by the product p(word1) p(word2). The ratio between p(word1 & word2) and p(word1) p(word2) is thus a measure of the degree of statistical dependence between the words. The log of this ratio is the amount of information that we acquire about the presence of one of the words when we observe the other.

The Semantic Orientation (SO) of a phrase, phrase, is calculated here as follows:

$$\operatorname{SO}(\text{phrase}) = \operatorname{PMI}(\text{phrase}, \text{“excellent”}) - \operatorname{PMI}(\text{phrase}, \text{“poor”}) \tag{2}$$

The reference words “excellent” and “poor” were chosen because, in the five star review rating system, it is common to define one star as “poor” and five stars as “excellent”. SO is positive when phrase is more strongly associated with “excellent” and negative when phrase is more strongly associated with “poor”.

PMI-IR estimates PMI by issuing queries to a search engine (hence the IR in PMI-IR) and noting the number of hits (matching documents). The following experiments use the AltaVista Advanced Search engine⁵, which indexes approximately 350 million web pages (counting only those pages that are in English). I chose AltaVista because it has a NEAR operator. The AltaVista NEAR operator constrains the search to documents that contain the words within ten words of one another, in either order. Previous work has shown that NEAR performs better than AND when measuring the strength of semantic association between words (Turney, 2001).

⁵ http://www.altavista.com/sites/search/adv

Let hits(query) be the number of hits returned, given the query query. The following estimate of SO can be derived from equations (1) and (2) with some minor algebraic manipulation, if co-occurrence is interpreted as NEAR:

$$\operatorname{SO}(\text{phrase}) = \log_2 \left[ \frac{\operatorname{hits}(\text{phrase NEAR “excellent”})\; \operatorname{hits}(\text{“poor”})}{\operatorname{hits}(\text{phrase NEAR “poor”})\; \operatorname{hits}(\text{“excellent”})} \right] \tag{3}$$

Equation (3) is a log-odds ratio (Agresti, 1996).
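The algebraic manipulation that leads from equations (1) and (2) to equation (3) can be spelled out as follows. This is a sketch under one assumption that the text does not state explicitly: each probability is estimated as a hit count divided by N, the total number of indexed documents. The symbol N is introduced here only for illustration, and the quotation marks around the reference words are dropped for readability.

```latex
\begin{align*}
\operatorname{SO}(\text{phrase})
  &= \log_2 \frac{p(\text{phrase} \,\&\, \text{excellent})}
                 {p(\text{phrase})\, p(\text{excellent})}
   - \log_2 \frac{p(\text{phrase} \,\&\, \text{poor})}
                 {p(\text{phrase})\, p(\text{poor})} \\[4pt]
  &= \log_2 \frac{p(\text{phrase} \,\&\, \text{excellent})\; p(\text{poor})}
                 {p(\text{phrase} \,\&\, \text{poor})\; p(\text{excellent})} \\[4pt]
  &\approx \log_2 \frac{\dfrac{\operatorname{hits}(\text{phrase NEAR excellent})}{N}
                        \cdot \dfrac{\operatorname{hits}(\text{poor})}{N}}
                       {\dfrac{\operatorname{hits}(\text{phrase NEAR poor})}{N}
                        \cdot \dfrac{\operatorname{hits}(\text{excellent})}{N}} \\[4pt]
  &= \log_2 \frac{\operatorname{hits}(\text{phrase NEAR excellent})\;
                  \operatorname{hits}(\text{poor})}
                 {\operatorname{hits}(\text{phrase NEAR poor})\;
                  \operatorname{hits}(\text{excellent})}
\end{align*}
```

The term p(phrase) cancels in the second line, and the factors of N cancel in the last line, so neither the frequency of the phrase itself nor the size of the index needs to be known.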

To avoid division by zero, I added 0.01 to the hits. I also skipped phrase when both hits(phrase NEAR “excellent”) and hits(phrase NEAR “poor”) were (simultaneously) less than four. These numbers (0.01 and 4) were arbitrarily chosen. To eliminate any possible influence from the testing data, I added “AND (NOT host:epinions)” to every query, which tells AltaVista not to include the Epinions Web site in its searches.
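Equation (3), together with the smoothing constant and the skipping rule, might be sketched as below. The hit counts are passed in as plain numbers, so no search engine is queried. The function and parameter names are hypothetical, and adding 0.01 only to the two NEAR counts is one reading of "added 0.01 to the hits", not something the text confirms.

```python
import math

SMOOTHING = 0.01  # added to hit counts to avoid division by zero
MIN_HITS = 4      # a phrase is skipped when both NEAR counts are below this


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor):
    """Estimate SO(phrase) from hit counts, following equation (3).

    near_excellent: hits(phrase NEAR "excellent")
    near_poor:      hits(phrase NEAR "poor")
    hits_excellent: hits("excellent"), assumed to be positive
    hits_poor:      hits("poor"), assumed to be positive
    Returns None when the phrase should be skipped.
    """
    if near_excellent < MIN_HITS and near_poor < MIN_HITS:
        return None
    # Assumption: the smoothing constant is applied to the two NEAR counts.
    numerator = (near_excellent + SMOOTHING) * hits_poor
    denominator = (near_poor + SMOOTHING) * hits_excellent
    return math.log2(numerator / denominator)
```

A phrase that co-occurs equally often with both reference words, when the reference words themselves are equally frequent, receives an orientation of zero; a phrase seen only near “excellent” receives a large positive value.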

The third step is to calculate the average semantic orientation of the phrases in the given review and classify the review as recommended if the average is positive and otherwise not recommended.

Table 2 shows an example for a recommended review and Table 3 shows an example for a not recommended review. Both are reviews of the Bank of America. Both are in the collection of 410 reviews from Epinions that are used in the experiments in Section 4.
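The third step amounts to averaging and checking the sign. The sketch below applies it to the phrase scores reported for the two Bank of America reviews in Tables 2 and 3. The function name `classify_review` is hypothetical, and raising an error for a review with no extracted phrases is an assumption, since the text does not say how such a review is handled.

```python
def classify_review(phrase_scores):
    """Return (average SO, label) for the SO values of one review's phrases."""
    if not phrase_scores:
        # Assumption: a review with no scored phrases cannot be classified.
        raise ValueError("no phrases were extracted from the review")
    average = sum(phrase_scores) / len(phrase_scores)
    label = "recommended" if average > 0 else "not recommended"
    return average, label


# Semantic orientation values copied from Table 2 and Table 3.
table_2_scores = [2.253, 0.333, 0.421, 0.053, 2.780, -0.705,
                  1.288, 0.237, -1.541, -0.850, -0.732]
table_3_scores = [-1.615, -0.040, 0.117, -0.668, -8.484, -6.843,
                  -2.566, -2.748, -1.830, -2.050, -0.850, -0.286,
                  5.771, 1.936, 0.395, 1.349, -2.288]

average_2, label_2 = classify_review(table_2_scores)
average_3, label_3 = classify_review(table_3_scores)
```

Rounded to three decimals, the two averages agree with the values 0.322 and -1.218 given in the last rows of the tables.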

Table 2. An example of the processing of a review that the author has classified as recommended.⁶

Extracted Phrase          Part-of-Speech Tags    Semantic Orientation
online experience         JJ NN                   2.253
low fees                  JJ NNS                  0.333
local branch              JJ NN                   0.421
small part                JJ NN                   0.053
online service            JJ NN                   2.780
printable version         JJ NN                  -0.705
direct deposit            JJ NN                   1.288
well other                RB JJ                   0.237
inconveniently located    RB VBN                 -1.541
other bank                JJ NN                  -0.850
true service              JJ NN                  -0.732
Average Semantic Orientation                      0.322

⁶ The semantic orientation in the following tables is calculated using the natural logarithm (base e), rather than base 2. The natural log is more common in the literature on log-odds ratio. Since all logs are equivalent up to a constant factor, it makes no difference for the algorithm.

Table 3. An example of the processing of a review that the author has classified as not recommended.

Extracted Phrase          Part-of-Speech Tags    Semantic Orientation
little difference         JJ NN                  -1.615
clever tricks             JJ NNS                 -0.040
programs such             NNS JJ                  0.117
possible moment           JJ NN                  -0.668
unethical practices       JJ NNS                 -8.484
low funds                 JJ NNS                 -6.843
old man                   JJ NN                  -2.566
other problems            JJ NNS                 -2.748
probably wondering        RB VBG                 -1.830
virtual monopoly          JJ NN                  -2.050
other bank                JJ NN                  -0.850
extra day                 JJ NN                  -0.286
direct deposits           JJ NNS                  5.771
online web                JJ NN                   1.936
cool thing                JJ NN                   0.395
very handy                RB JJ                   1.349
lesser evil               RBR JJ                 -2.288
Average Semantic Orientation                     -1.218

3 Related Work

This work is most closely related to Hatzivassiloglou and McKeown’s (1997) work on predicting the semantic orientation of adjectives. They note that there are linguistic constraints on the semantic orientations of adjectives in conjunctions. As an example, they present the following three sentences (Hatzivassiloglou & McKeown, 1997):

1. The tax proposal was simple and well-received by the public.

2. The tax proposal was simplistic but well-received by the public.

3. (*) The tax proposal was simplistic and well-received by the public.

The third sentence is incorrect, because we use “and” with adjectives that have the same semantic orientation (“simple” and “well-received” are both positive), but we use “but” with adjectives that have different semantic orientations (“simplistic” is negative).

Hatzivassiloglou and McKeown (1997) use a four-step supervised learning algorithm to infer the semantic orientation of adjectives from constraints on conjunctions:

1. All conjunctions of adjectives are extracted from the given corpus.

2. A supervised learning algorithm combines multiple sources of evidence to label pairs of adjectives as having the same semantic orientation or different semantic orientations. The result is a graph where the nodes are adjectives and links indicate sameness or difference of semantic orientation.

3. A clustering algorithm processes the graph structure to produce two subsets of adjectives, such that links across the two subsets are mainly different-orientation links, and links inside a subset are mainly same-orientation links.

4. Since it is known that positive adjectives tend to be used more frequently than negative adjectives, the cluster with the higher average frequency is classified as having positive semantic orientation.

This algorithm classifies adjectives with accuracies ranging from 78% to 92%, depending on the amount of training data that is available. The algorithm can go beyond a binary positive-negative distinction, because the clustering algorithm (step 3 above) can produce a “goodness-of-fit” measure that indicates how well an adjective fits in its assigned cluster.

Although they do not consider the task of classifying reviews, it seems their algorithm could be plugged into the classification algorithm presented in Section 2, where it would replace PMI-IR and equation (3) in the second step. However, PMI-IR is conceptually simpler, easier to implement, and it can handle phrases and adverbs, in addition to isolated adjectives.

As far as I know, the only prior published work on the task of classifying reviews as thumbs up or down is Tong’s (2001) system for generating sentiment timelines. This system tracks online discussions about movies and displays a plot of the number of positive sentiment and negative sentiment messages over time. Messages are classified by looking for specific phrases that indicate the sentiment of the author towards the movie (e.g., “great acting”, “wonderful visuals”, “terrible score”, “uneven editing”). Each phrase must be manually added to a special lexicon and manually tagged as indicating positive or negative sentiment. The lexicon is specific to the domain (e.g., movies) and must be built anew for each new domain. The company Mindfuleye⁷ offers a technology called Lexant™ that appears similar to Tong’s (2001) system.

Other related work is concerned with determining subjectivity (Hatzivassiloglou & Wiebe, 2000; Wiebe, 2000; Wiebe et al., 2001). The task is to distinguish sentences that present opinions and evaluations from sentences that objectively present factual information (Wiebe, 2000). Wiebe et al. (2001) list a variety of potential applications for automated subjectivity tagging, such as recognizing “flames” (Spertus, 1997), classifying email, recognizing speaker role in radio broadcasts, and mining reviews. In several of these applications, the first step is to recognize that the text is subjective and then the natural second step is to determine the semantic orientation of the subjective text. For example, a flame detector cannot merely detect that a newsgroup message is subjective, it must further detect that the message has a negative semantic orientation; otherwise a message of praise could be classified as a flame.

Hearst (1992) observes that most search engines focus on finding documents on a given topic, but do not allow the user to specify the directionality of the documents (e.g., is the author in favor of, neutral, or opposed to the event or item discussed in the document?). The directionality of a document is determined by its deep argumentative structure, rather than a shallow analysis of its adjectives. Sentences are interpreted metaphorically in terms of agents exerting force, resisting force, and overcoming resistance. It seems likely that there could be some benefit to combining shallow and deep analysis of the text.

4 Experiments

Table 4 describes the 410 reviews from Epinions that were used in the experiments. 170 (41%) of the reviews are not recommended and the remaining 240 (59%) are recommended. Always guessing the majority class would yield an accuracy of 59%. The third column shows the average number of phrases that were extracted from the reviews.

⁷ http://www.mindfuleye.com/

Table 5 shows the experimental results. Except for the travel reviews, there is surprisingly little variation in the accuracy within a domain. In addition to recommended and not recommended, Epinions reviews are classified using the five star rating system. The third column shows the correlation between the average semantic orientation and the number of stars assigned by the author of the review. The results show a strong positive correlation between the average semantic orientation and the author’s rating out of five stars.

Table 4. A summary of the corpus of reviews.

Domain of Review        Number of Reviews    Average Phrases per Review
Automobiles              75                  20.87
  Honda Accord           37                  18.78
  Volkswagen Jetta       38                  22.89
Banks                   120                  18.52
  Bank of America        60                  22.02
  Washington Mutual      60                  15.02
Movies                  120                  29.13
  The Matrix             60                  19.08
  Pearl Harbor           60                  39.17
Travel Destinations      95                  35.54
  Cancun                 59                  30.02
  Puerto Vallarta        36                  44.58
All                     410                  26.00

Table 5. The accuracy of the classification and the correlation of the semantic orientation with the star rating.

Domain of Review        Accuracy    Correlation
Automobiles             84.00%      0.4618
  Honda Accord          83.78%      0.2721
  Volkswagen Jetta      84.21%      0.6299
Banks                   80.00%      0.6167
  Bank of America       78.33%      0.6423
  Washington Mutual     81.67%      0.5896
Movies                  65.83%      0.3608
  The Matrix            66.67%      0.3811
  Pearl Harbor          65.00%      0.2907
Travel Destinations     70.53%      0.4155
  Cancun                64.41%      0.4194
  Puerto Vallarta       80.56%      0.1447
All                     74.39%      0.5174

5 Discussion of Results

A natural question, given the preceding results, is what makes movie reviews hard to classify? Table 6 shows that classification by the average SO tends to err on the side of guessing that a review is not recommended, when it is actually recommended. This suggests the hypothesis that a good movie will often contain unpleasant scenes (e.g., violence, death, mayhem), and a recommended movie review may thus have its average semantic orientation reduced if it contains descriptions of these unpleasant scenes. However, if we add a constant value to the average SO of the movie reviews, to compensate for this bias, the accuracy does not improve. This suggests that, just as positive reviews mention unpleasant things, so negative reviews often mention pleasant scenes.
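The direction of the bias can be read off the confusion matrix in Table 6 with a little arithmetic. The sketch below uses only the four cell percentages reported there; the variable names are chosen for illustration.

```python
# Cell percentages for the 120 movie reviews, copied from Table 6.
# Keys: (sign of the average SO, author's classification).
table_6 = {
    ("positive", "thumbs up"): 28.33,
    ("positive", "thumbs down"): 12.50,
    ("negative", "thumbs up"): 21.67,
    ("negative", "thumbs down"): 37.50,
}

# Correct predictions lie on the diagonal of the matrix.
accuracy = table_6[("positive", "thumbs up")] + table_6[("negative", "thumbs down")]

# The two kinds of error.
recommended_missed = table_6[("negative", "thumbs up")]
not_recommended_missed = table_6[("positive", "thumbs down")]

# Share of each true class that is classified correctly.
thumbs_up_total = table_6[("positive", "thumbs up")] + table_6[("negative", "thumbs up")]
thumbs_down_total = table_6[("positive", "thumbs down")] + table_6[("negative", "thumbs down")]
recall_thumbs_up = table_6[("positive", "thumbs up")] / thumbs_up_total
recall_thumbs_down = table_6[("negative", "thumbs down")] / thumbs_down_total
```

The diagonal sums to 65.83%, matching the movie accuracy in Table 5. Recommended reviews are misclassified more often (21.67% of all movie reviews) than not recommended ones (12.50%), so roughly 57% of thumbs-up reviews are recovered compared with 75% of thumbs-down reviews.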

Table 6. The confusion matrix for movie classifications.

                                 Author’s Classification
Average Semantic Orientation     Thumbs Up    Thumbs Down    Sum
Positive                         28.33%       12.50%          40.83%
Negative                         21.67%       37.50%          59.17%
Sum                              50.00%       50.00%         100.00%

Table 7 shows some examples that lend support to this hypothesis. For example, the phrase “more evil” does have negative connotations, thus an SO of -4.384 is appropriate, but an evil character does not make a bad movie. The difficulty with movie reviews is that there are two aspects to a movie, the events and actors in the movie (the elements of the movie), and the style and art of the movie (the movie as a gestalt; a unified whole). This is likely also the explanation for the lower accuracy of the Cancun reviews: good beaches do not necessarily add up to a good vacation. On the other hand, good automotive parts usually do add up to a good automobile and good banking services add up to a good bank. It is not clear how to address this issue. Future work might look at whether it is possible to tag sentences as discussing elements or wholes.

Another area for future work is to empirically compare PMI-IR and the algorithm of Hatzivassiloglou and McKeown (1997). Although their algorithm does not readily extend to two-word phrases, I have not yet demonstrated that two-word phrases are necessary for accurate classification of reviews. On the other hand, it would be interesting to evaluate PMI-IR on the collection of 1,336 hand-labeled adjectives that were used in the experiments of Hatzivassiloglou and McKeown (1997). A related question for future work is the relationship of accuracy of the estimation of semantic orientation at the level of individual phrases to accuracy of review classification. Since the review classification is based on an average, it might be quite resistant to noise in the SO estimate for individual phrases. But it is possible that a better SO estimator could produce significantly better classifications.

Table 7. Sample phrases from misclassified reviews.

Movie:                     The Matrix
Author’s Rating:           recommended (5 stars)
Average SO:                -0.219 (not recommended)
Sample Phrase:             more evil [RBR JJ]
SO of Sample Phrase:       -4.384
Context of Sample Phrase:  The slow, methodical way he spoke. I loved it! It made him seem more arrogant and even more evil.

Movie:                     Pearl Harbor
Author’s Rating:           recommended (5 stars)
Average SO:                -0.378 (not recommended)
Sample Phrase:             sick feeling [JJ NN]
SO of Sample Phrase:       -8.308
Context of Sample Phrase:  During this period I had a sick feeling, knowing what was coming, knowing what was part of our history.

Movie:                     The Matrix
Author’s Rating:           not recommended (2 stars)
Average SO:                0.177 (recommended)
Sample Phrase:             very talented [RB JJ]
SO of Sample Phrase:       1.992
Context of Sample Phrase:  Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

Movie:                     Pearl Harbor
Author’s Rating:           not recommended (3 stars)
Average SO:                0.015 (recommended)
Sample Phrase:             blue skies [JJ NNS]
SO of Sample Phrase:       1.263
Context of Sample Phrase:  Anyone who saw the trailer in the theater over the course of the last year will never forget the images of Japanese war planes swooping out of the blue skies, flying past the children playing baseball, or the truly remarkable shot of a bomb falling from an enemy plane into the deck of the USS Arizona.

Equation (3) is a very simple estimator of semantic orientation. It might benefit from more sophisticated statistical analysis (Agresti, 1996). One possibility is to apply a statistical significance test to each estimated SO. There is a large statistical literature on the log-odds ratio, which might lead to improved results on this task.

This paper has focused on unsupervised classification, but average semantic orientation could be supplemented by other features, in a supervised classification system. The other features could be based on the presence or absence of specific words, as is common in most text classification work. This could yield higher accuracies, but the intent here was to study this one feature in isolation, to simplify the analysis, before combining it with other features.

Table 5 shows a high correlation between the average semantic orientation and the star rating of a review. I plan to experiment with ordinal classification of reviews in the five star rating system, using the algorithm of Frank and Hall (2001). For ordinal classification, the average semantic orientation would be supplemented with other features in a supervised classification system.
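The approach of Frank and Hall (2001) reduces a k-class ordinal problem to k-1 binary problems, each estimating the probability that the rating exceeds a given level. The Python sketch below shows only the step that recombines those probabilities into a star rating. The function names and the example probabilities are hypothetical, and the binary classifiers themselves (which would take average semantic orientation and other features as input) are assumed to exist already.

```python
def ordinal_class_probabilities(p_greater):
    """Combine P(rating > level) estimates into per-class probabilities.

    p_greater[i] is the estimated probability that the rating exceeds
    level i + 1, so a five star scale needs four such estimates.
    """
    k = len(p_greater) + 1
    probabilities = []
    for i in range(k):
        if i == 0:
            probabilities.append(1.0 - p_greater[0])        # lowest class
        elif i == k - 1:
            probabilities.append(p_greater[-1])              # highest class
        else:
            probabilities.append(p_greater[i - 1] - p_greater[i])
    return probabilities


def predict_stars(p_greater):
    """Return the star rating (1-based) with the highest probability."""
    probabilities = ordinal_class_probabilities(p_greater)
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best_index + 1


if __name__ == "__main__":
    # Hypothetical outputs of four binary classifiers for one review.
    print(predict_stars([0.9, 0.8, 0.3, 0.1]))
```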

A limitation of PMI-IR is the time required to send queries to AltaVista. Inspection of Equation (3) shows that it takes four queries to calculate the semantic orientation of a phrase. However, I cached all query results, and since there is no need to recalculate hits(“poor”) and hits(“excellent”) for every phrase, each phrase requires an average of slightly less than two queries. As a courtesy to AltaVista, I used a five second delay between queries.[8] The 410 reviews yielded 10,658 phrases, so the total time required to process the corpus was roughly 106,580 seconds, or about 30 hours.
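The caching and the time estimate can be illustrated with a short Python sketch. The class name, the `backend` callable standing in for a search engine, and the injectable `sleep` function are all hypothetical; only the five second delay, the two queries per phrase, and the 10,658 phrases come from the text.

```python
import time


class CachedHitCounter:
    """Caches hit counts so that each distinct query is sent only once."""

    def __init__(self, backend, delay_seconds=5.0, sleep=time.sleep):
        self._backend = backend      # hypothetical callable: query -> hit count
        self._delay = delay_seconds
        self._sleep = sleep          # injectable so tests need not wait
        self._cache = {}
        self.backend_calls = 0

    def hits(self, query):
        if query not in self._cache:
            if self.backend_calls > 0:
                self._sleep(self._delay)   # pause between real queries
            self._cache[query] = self._backend(query)
            self.backend_calls += 1
        return self._cache[query]


def estimated_seconds(num_phrases, queries_per_phrase=2, delay_seconds=5):
    """Upper estimate of the total waiting time for a corpus."""
    return num_phrases * queries_per_phrase * delay_seconds


if __name__ == "__main__":
    total = estimated_seconds(10_658)
    print(total, "seconds, about", round(total / 3600, 1), "hours")
```

With the figures from the text, 10,658 phrases at two queries each and five seconds per query give 106,580 seconds, or about 29.6 hours. Because the counts for the two reference words and for repeated phrases are served from the cache, the real number of queries per phrase falls slightly below two.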

This might appear to be a significant limitation, but extrapolation of current trends in computer memory capacity suggests that, in about ten years, the average desktop computer will be able to easily store and search AltaVista’s 350 million Web pages. This will reduce the processing time to less than one second per review.

[8] This line of research depends on the good will of the major search engines. For a discussion of the ethics of Web robots, see http://www.robotstxt.org/wc/robots.html. For query robots, the proposed extended standard for robot exclusion would be useful. See http://www.conman.org/people/spc/robots2.html.

6 Applications

There are a variety of potential applications for automated review rating. As mentioned in the introduction, one application is to provide summary statistics for search engines. Given the query “Akumal travel review”, a search engine could report, “There are 5,000 hits, of which 80% are thumbs up and 20% are thumbs down.” The search results could be sorted by average semantic orientation, so that the user could easily sample the most extreme reviews. Similarly, a search engine could allow the user to specify the topic and the rating of the desired reviews (Hearst, 1992).

Preliminary experiments indicate that semantic orientation is also useful for summarization of reviews. A positive review could be summarized by picking out the sentence with the highest positive semantic orientation and a negative review could be summarized by extracting the sentence with the lowest negative semantic orientation. Epinions asks its reviewers to provide a short description of pros and cons for the reviewed item. A pro/con summarizer could be evaluated by measuring the overlap between the reviewer’s pros and cons and the phrases in the review that have the most extreme semantic orientation.
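The summarization and evaluation ideas above can be sketched in Python as follows. This is an illustration under several assumptions: each sentence or phrase is taken to have a semantic orientation score already, the overlap is measured as a Jaccard ratio over lowercased words (the text does not specify an overlap measure), and all names, phrases, and scores in the example are invented.

```python
def summarize(scored_sentences, positive_review):
    """Pick the most extreme sentence from (sentence, so) pairs."""
    chooser = max if positive_review else min
    return chooser(scored_sentences, key=lambda pair: pair[1])[0]


def extreme_phrases(phrase_so, n):
    """Return the n most positive and n most negative phrases."""
    ranked = sorted(phrase_so.items(), key=lambda item: item[1])
    cons = [phrase for phrase, so in ranked[:n] if so < 0]
    pros = [phrase for phrase, so in reversed(ranked[-n:]) if so > 0]
    return pros, cons


def word_overlap(predicted, reference):
    """Jaccard overlap between the word sets of two phrase lists."""
    predicted_words = {w.lower() for p in predicted for w in p.split()}
    reference_words = {w.lower() for p in reference for w in p.split()}
    if not predicted_words or not reference_words:
        return 0.0
    shared = predicted_words & reference_words
    return len(shared) / len(predicted_words | reference_words)


if __name__ == "__main__":
    # Invented phrases and scores, for illustration only.
    scores = {"low fees": 2.1, "helpful staff": 1.4, "online banking": 0.3,
              "long lines": -1.8, "hidden charges": -2.5}
    pros, cons = extreme_phrases(scores, 2)
    print(pros, cons)
    print(word_overlap(pros, ["low fees", "friendly staff"]))
```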

Another potential application is filtering “flames” for newsgroups (Spertus, 1997). There could be a threshold, such that a newsgroup message is held for verification by the human moderator when the semantic orientation of a phrase drops below the threshold. A related use might be a tool for helping academic referees when reviewing journal and conference papers. Ideally, referees are unbiased and objective, but sometimes their criticism can be unintentionally harsh. It might be possible to highlight passages in a draft referee’s report, where the choice of words should be modified towards a more neutral tone.
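A minimal Python sketch of the threshold idea follows. The threshold value of -2.0 and the function names are hypothetical; a usable threshold would have to be tuned on real messages.

```python
def should_hold_for_moderation(phrase_orientations, threshold=-2.0):
    """Hold a message if any phrase's orientation drops below the threshold."""
    return any(so < threshold for so in phrase_orientations)


def passages_to_soften(scored_passages, threshold=-2.0):
    """Return the passages of a draft whose orientation is below the threshold.

    scored_passages is a list of (text, so) pairs.
    """
    return [text for text, so in scored_passages if so < threshold]


if __name__ == "__main__":
    # Invented scores, for illustration only.
    print(should_hold_for_moderation([0.5, -3.1]))
    print(passages_to_soften([("mildly unclear", -0.4), ("utterly careless", -3.2)]))
```

The same test serves both uses described above: a moderator's queue needs only the yes/no decision, while a referee's tool needs the list of passages to highlight.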

Tong’s (2001) system for detecting and tracking opinions in on-line discussions could benefit from the use of a learning algorithm, instead of (or in addition to) a hand-built lexicon. With automated review rating (opinion rating), advertisers could track advertising campaigns, politicians could track public opinion, reporters could track public response to current events, stock traders could track financial opinions, and trend analyzers could track entertainment and technology trends.

7 Conclusions

This paper introduces a simple unsupervised learning algorithm for rating a review as thumbs up or down. The algorithm has three steps: (1) extract phrases containing adjectives or adverbs, (2) estimate the semantic orientation of each phrase, and (3) classify the review based on the average semantic orientation of the phrases. The core of the algorithm is the second step, which uses PMI-IR to calculate semantic orientation (Turney, 2001).

In experiments with 410 reviews from Epinions, the algorithm attains an average accuracy of 74%. It appears that movie reviews are difficult to classify, because the whole is not necessarily the sum of the parts; thus the accuracy on movie reviews is about 66%. On the other hand, for banks and automobiles, it seems that the whole is the sum of the parts, and the accuracy is 80% to 84%. Travel reviews are an intermediate case.
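The third step of the algorithm is simple enough to sketch directly in Python. The function name is hypothetical, the phrase scores are assumed to come from the first two steps, and treating an average of exactly zero as "not recommended" is an assumption of this sketch.

```python
def classify_review(phrase_orientations):
    """Classify a review from the semantic orientations of its phrases.

    Returns (average_so, label), where the label is "recommended" when
    the average is positive and "not recommended" otherwise.
    """
    if not phrase_orientations:
        raise ValueError("a review needs at least one extracted phrase")
    average_so = sum(phrase_orientations) / len(phrase_orientations)
    label = "recommended" if average_so > 0 else "not recommended"
    return average_so, label


if __name__ == "__main__":
    # Invented phrase scores, for illustration only.
    print(classify_review([1.5, -1.0, -0.2]))
```

Because the label depends only on the average, a single strongly positive phrase in an otherwise critical review can flip the result, which is consistent with the observation that the whole is not always the sum of the parts.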

Previous work on determining the semantic orientation of adjectives has used a complex algorithm that does not readily extend beyond isolated adjectives to adverbs or longer phrases (Hatzivassiloglou and McKeown, 1997). The simplicity of PMI-IR may encourage further work with semantic orientation.

The limitations of this work include the time required for queries and, for some applications, the level of accuracy that was achieved. The former difficulty will be eliminated by progress in hardware. The latter difficulty might be addressed by using semantic orientation combined with other features in a supervised classification algorithm.

Acknowledgements

Thanks to Joel Martin and Michael Littman for helpful comments.

References

Agresti, A. 1996. An introduction to categorical data analysis. New York: Wiley.

Brill, E. 1994. Some advances in transformation-based part of speech tagging. Proceedings of the Twelfth National Conference on Artificial Intelligence (pp. 722-727). Menlo Park, CA: AAAI Press.

Church, K.W., & Hanks, P. 1989. Word association norms, mutual information and lexicography. Proceedings of the 27th Annual Conference of the ACL (pp. 76-83). New Brunswick, NJ: ACL.

Frank, E., & Hall, M. 2001. A simple approach to ordinal classification. Proceedings of the Twelfth European Conference on Machine Learning (pp. 145-156). Berlin: Springer-Verlag.

Hatzivassiloglou, V., & McKeown, K.R. 1997. Predicting the semantic orientation of adjectives. Proceedings of the 35th Annual Meeting of the ACL and the 8th Conference of the European Chapter of the ACL (pp. 174-181). New Brunswick, NJ: ACL.

Hatzivassiloglou, V., & Wiebe, J.M. 2000. Effects of adjective orientation and gradability on sentence subjectivity. Proceedings of 18th International Conference on Computational Linguistics. New Brunswick, NJ: ACL.

Hearst, M.A. 1992. Direction-based text interpretation as an information access refinement. In P. Jacobs (Ed.), Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Mahwah, NJ: Lawrence Erlbaum Associates.

Landauer, T.K., & Dumais, S.T. 1997. A solution to Plato’s problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104, 211-240.

Santorini, B. 1995. Part-of-Speech Tagging Guidelines for the Penn Treebank Project (3rd revision, 2nd printing). Technical Report, Department of Computer and Information Science, University of Pennsylvania.

Spertus, E. 1997. Smokey: Automatic recognition of hostile messages. Proceedings of the Conference on Innovative Applications of Artificial Intelligence (pp. 1058-1065). Menlo Park, CA: AAAI Press.

Tong, R.M. 2001. An operational system for detecting and tracking opinions in on-line discussions. Working Notes of the ACM SIGIR 2001 Workshop on Operational Text Classification (pp. 1-6). New York, NY: ACM.

Turney, P.D. 2001. Mining the Web for synonyms: PMI-IR versus LSA on TOEFL. Proceedings of the Twelfth European Conference on Machine Learning (pp. 491-502). Berlin: Springer-Verlag.

Wiebe, J.M. 2000. Learning subjective adjectives from corpora. Proceedings of the 17th National Conference on Artificial Intelligence. Menlo Park, CA: AAAI Press.

Wiebe, J.M., Bruce, R., Bell, M., Martin, M., & Wilson, T. 2001. A corpus study of evaluative and speculative language. Proceedings of the Second ACL SIG on Dialogue Workshop on Discourse and Dialogue. Aalborg, Denmark.