
Can Computers Think? (CSC 106)
Fall 2022
Programming Project 4
Due: Tuesday, 11/15/2022 (10:00pm)

Sentiment analysis seeks to determine the general attitude of a writer, given some text they have written. For example, given the movie review “The film was a breath of fresh air” a sentiment analysis program should realize that this is a positive statement about the movie, while the review “It made me want to poke out my eyeballs” expresses a negative opinion. In this project you’ll implement a very simple sentiment analysis system for movie reviews.

Learning Objectives:
1. Demonstrate reading input from files.
2. Practice using dictionaries.
3. Explore sentiment analysis.

Reminder: Programming projects should be done individually. You should not look at anyone else’s code, show your code to anyone else, write code for anyone else, or let someone else write code for you. Please see the syllabus or Nexus for what to do when you need help with your work.

Movie review dataset

You can download three files with movie reviews from Nexus. The movie reviews come from Rotten Tomatoes and their sentiment has been manually rated on a scale from 0 to 4.
● 0 - negative
● 1 - somewhat negative
● 2 - neutral
● 3 - somewhat positive
● 4 - positive

Each file is formatted as follows:

0 Devoid of any of the qualities that made the first film so special .

4 The Bai brothers have taken an small slice of history and opened it up for all ...

3 Ramsay and Morton fill this character study with poetic force and buoyant ...

1 You emerge dazed , confused as to whether you ’ve seen pornography or documentary .

There is one review per line. Each line starts with the review’s sentiment score, followed by the words of the review. The words have already been pre-processed so that all words and punctuation symbols are separated by a blank character.
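Because every token is separated by a blank, a single line can be taken apart with one call to split. The following is a minimal sketch of that idea. The helper name parse_review_line is hypothetical and is not part of the starter file.

```python
def parse_review_line(line):
    """Split one line of a review file into (score, list_of_words).

    Hypothetical helper for illustration; not part of the starter file.
    """
    tokens = line.split()   # tokens are separated by blanks
    score = int(tokens[0])  # the first token is the sentiment score (0 to 4)
    words = tokens[1:]      # the rest are words and punctuation symbols
    return score, words


score, words = parse_review_line(
    "0 Devoid of any of the qualities that made the first film so special ."
)
print(score)      # 0
print(words[:3])  # ['Devoid', 'of', 'any']
```

Note that punctuation symbols such as "." come out as separate tokens, just like words.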

You are given the following three files with movie reviews:
● movie_reviews_training.txt contains 6129 movie reviews. Use this to build up a dictionary of sentiment scores for each word. (See further explanation below.)
● movie_reviews_mini.txt contains only the first 15 movie reviews from movie_reviews_training.txt. Use this file to test out your function for building a word sentiment dictionary on a more manageable file.
● movie_reviews_dev.txt contains 800 reviews. Use this to evaluate how well your sentiment analysis system works and to fine-tune it.

General Approach

We can estimate the sentiment of a word by averaging the sentiment scores of the reviews that the word appears in. For example, if the word terribly appears in two reviews with scores of 0 and one review with a score of 3, then the estimated sentiment score for the word terribly would be 1 (the average of 0, 0, and 3).
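The arithmetic in the terribly example can be checked in a couple of lines. This is only an illustration of the averaging step; the variable names are made up.

```python
# Scores of the three reviews in which "terribly" appears (from the example).
review_scores_for_terribly = [0, 0, 3]

# The word's estimated sentiment is the plain average of those scores.
estimate = sum(review_scores_for_terribly) / len(review_scores_for_terribly)
print(estimate)  # 1.0
```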

Once we have sentiment scores for a bunch of words, we can use those scores to predict the sentiment of a new movie review by averaging the sentiment scores of all the words in the movie review.

1 Word Sentiment Dictionary

Download the starter file sentiment_analysis.py and the three files with movie reviews described above from Nexus.

Write a function make_word_sentiment_dictionary. The function should take the name of a file with movie reviews (in the format discussed above) as a parameter, and it should return a word sentiment dictionary. The keys in this dictionary should be individual words, and the values should be (small) dictionaries that assemble the following information about the word:
● the sum of the sentiment scores of the movie reviews in which the word appears (key: “total score”)
● the number of movie reviews in which the word appears (key: “count”)
● the average sentiment score of the movie reviews in which the word appears (key: “average score”)

For example, here is what an excerpt of the word sentiment dictionary might look like:

{'nice': {'total score': 38, 'count': 14, 'average score': 2.7142857142857144},
 'little': {'total score': 312, 'count': 154, 'average score': 2.0259740259740258},
 'story': {'total score': 538, 'count': 251, 'average score': 2.143426294820717},
 'process': {'total score': 41, 'count': 17, 'average score': 2.411764705882353},
 ...}

Note: convert all words to lower case before adding them to the dictionary. You can use the built-in string method .lower() for this purpose.
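One possible shape for such a function is sketched below. It is not the reference solution and rests on two assumptions that the handout does not settle: the review files are read as UTF-8, and a word that occurs several times in the same review is counted only once for that review (because “count” is described as a number of reviews).

```python
def make_word_sentiment_dictionary(filename):
    """Build a word sentiment dictionary from a review file (sketch only)."""
    word_sentiment = {}

    # Assumption: the review files are UTF-8 encoded.
    with open(filename, encoding="utf-8") as review_file:
        for line in review_file:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines

            score = int(tokens[0])

            # Assumption: a word repeated within one review counts once,
            # so the tokens are lowercased and collected into a set.
            for word in {token.lower() for token in tokens[1:]}:
                entry = word_sentiment.setdefault(
                    word, {"total score": 0, "count": 0}
                )
                entry["total score"] += score
                entry["count"] += 1

    # The averages can only be computed once all reviews have been read.
    for entry in word_sentiment.values():
        entry["average score"] = entry["total score"] / entry["count"]

    return word_sentiment
```

Trying the function on movie_reviews_mini.txt first makes it practical to check a few entries by hand.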

2 Predicting Sentiment Scores for Reviews

Now add a function predict_sentiment_score. Given a movie review as a string and a word sentiment dictionary, this function should return an estimated sentiment score for the review. It should estimate the score by averaging the scores of all the words in the review.

For example, given the review "This movie is awesome !" and the word sentiment dictionary created from the data in movie_reviews_training.txt, this function should return roughly 2.41.

A movie review may contain some words that are not in the word sentiment dictionary that you’ve created. For those words, assume that they are neutral (i.e. that they have a score of 2).
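The averaging rule, including the neutral fallback for unknown words, could look roughly like this. The sketch assumes that the review is already tokenized with blanks (as in the example above), that words are lowercased to match the dictionary keys, and that an empty review is treated as neutral. The tiny dictionary is made up for illustration and does not come from the training data.

```python
NEUTRAL_SCORE = 2  # score assumed for words missing from the dictionary


def predict_sentiment_score(review, word_sentiment):
    """Average the per-word scores of a review (sketch only)."""
    words = review.lower().split()
    if not words:
        return NEUTRAL_SCORE  # assumption: an empty review counts as neutral

    total = 0
    for word in words:
        if word in word_sentiment:
            total += word_sentiment[word]["average score"]
        else:
            total += NEUTRAL_SCORE
    return total / len(words)


# Made-up dictionary with a single known word.
toy_dictionary = {
    "awesome": {"total score": 7, "count": 2, "average score": 3.5},
}

# Four unknown tokens score 2 each: (2 + 2 + 2 + 3.5 + 2) / 5 = 2.3
print(predict_sentiment_score("This movie is awesome !", toy_dictionary))  # 2.3
```

With the real training dictionary the same review should come out at roughly 2.41, as stated above, because the other words then have scores of their own.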

3 Classifying Movie Reviews According to Their Sentiment

If you test a few of the reviews from the movie_reviews_dev.txt file, you will see that we are not able to totally accurately predict the sentiment scores. Most predictions your function makes are closer to the average than the actual scores of the reviews. However, if all we are interested in is whether or not a movie is rated positively, that may not matter. We may still be able to use the predicted scores to decide whether we should go see a movie or not.

Add a function is_positive that takes a movie review and a word sentiment dictionary, then classifies the review as either positive (i.e. humans would give it a score of 3 or 4) or not. The function should return a boolean value (i.e. True or False). Try out your classification function on a few movie reviews from the movie_reviews_dev.txt file. The function won’t get it right for all reviews, but it should get it right more often than not.
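A simple way to turn a predicted score into a yes/no answer is to compare it against a cutoff. The sketch below does that. The handout does not prescribe a cutoff, so the extra threshold parameter and its default of 2.0 are assumptions chosen for illustration. The compact predict_sentiment_score is included only to make the example self-contained, and the dictionary is made up.

```python
def predict_sentiment_score(review, word_sentiment, neutral_score=2):
    """Average the per-word scores; unknown words count as neutral."""
    words = review.lower().split()
    if not words:
        return neutral_score
    scores = [
        word_sentiment[word]["average score"] if word in word_sentiment
        else neutral_score
        for word in words
    ]
    return sum(scores) / len(scores)


def is_positive(review, word_sentiment, threshold=2.0):
    """Return True if the predicted score is above the cutoff (sketch only).

    The threshold parameter and its default are assumptions, not part
    of the assignment text.
    """
    return predict_sentiment_score(review, word_sentiment) > threshold


# Made-up dictionary for illustration.
toy_dictionary = {
    "awesome": {"total score": 7, "count": 2, "average score": 3.5},
    "dull": {"total score": 1, "count": 2, "average score": 0.5},
}

print(is_positive("This movie is awesome !", toy_dictionary))  # True
print(is_positive("A dull movie .", toy_dictionary))           # False
```

Because predicted scores tend to cluster near the average, a cutoff of 3 would likely label almost nothing as positive, which is why the sketch uses a cutoff closer to the middle of the scale.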

4 Fine Tune Your System

Now uncomment the definition of the function evaluate as well as the line in the testing area that calls it. This will run your sentiment analysis system on all 800 movie reviews from the file movie_reviews_dev.txt and print out some information on how many of them were classified correctly.

It should be fairly easy to classify more than 60% correctly. With further fine-tuning of is_positive, you may manage to get an accuracy as high as 75%.
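One way to think about fine-tuning is to compare how often different cutoffs give the right positive/not-positive answer. The sketch below illustrates that idea on made-up numbers. The helper classification_accuracy is hypothetical and is not the evaluate function from the starter file, and the scores shown are invented rather than taken from movie_reviews_dev.txt.

```python
def classification_accuracy(predicted_scores, actual_scores, threshold):
    """Fraction of reviews whose positive/not-positive label is correct.

    Hypothetical helper for illustration; not the starter file's evaluate.
    """
    correct = 0
    for predicted, actual in zip(predicted_scores, actual_scores):
        predicted_positive = predicted > threshold
        actually_positive = actual >= 3  # scores 3 and 4 count as positive
        if predicted_positive == actually_positive:
            correct += 1
    return correct / len(actual_scores)


# Invented predicted and human-rated scores for five reviews.
predicted = [2.3, 1.9, 2.1, 2.6, 2.05]
actual = [4, 0, 1, 3, 2]

for threshold in (2.0, 2.2):
    print(threshold, classification_accuracy(predicted, actual, threshold))
# 2.0 0.6
# 2.2 1.0
```

On real data the best cutoff has to be found by experiment, and a value that works well on one set of reviews may not be the best on another.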

Note: the fine-tuning of is_positive is optional, but evaluate and the line that calls it should be uncommented in your submission.

What to Submit

You need to submit the file sentiment_analysis.py. Before submitting, check that your code meets the following requirements:
1. Are your files properly commented? This should include the following:
   a. A header comment, which includes your name and a brief description of the program.
   b. Comments within the body of the code to further clarify how the program works.
2. Are your programs formatted neatly and consistently? Do you use some whitespace to help a human reader understand the logical organization of your code?
   Hint: it may be helpful to think of this as breaking your code up into “paragraphs.”
3. Have you cleaned up any code snippets you no longer need?
   a. The final version of the program should not contain any lines of actual code that are commented out to keep them from running. Such snippets should be removed before submitting.

Once you have checked these things, submit your files to Project 4 on Gradescope.

5 Grading guidelines

● Correctness: programs do what the specifications require.
● Program logic: use loops where appropriate, variables where appropriate, and functions where appropriate. Also, your code doesn’t contain lines of code that don’t contribute to the program.
● Comments: code is properly commented. There should be a header comment that includes the author’s name and describes the program’s purpose. Additional comments in the code should help clarify how the program works.
● Organization: use whitespace to indicate the logical structure of the code.
   ○ Lines of code that logically belong together are close together in the file.
   ○ Import statements are at the very top…
   ○ followed by function definitions…
   ○ then followed by the rest of the code.