
Bachelor of Science in Computer Science

Movie prediction based on movie scripts using Natural Language Processing and Machine Learning Algorithms

Bhargav Chinnapottu

Govardhan Arikatla

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science. The thesis is equivalent to 10 weeks of full-time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Bhargav Chinnapottu
E-mail: bhch20@student.bth.se

Govardhan Arikatla
E-mail: goar20@student.bth.se

Suejb Memeti, Senior Lecturer
Department of Computer Science

Faculty of Computing
Blekinge Institute of Technology
SE-371 79 Karlskrona, Sweden

Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57


Background: Natural Language Processing (NLP) is a field of artificial intelligence that deals with communication between humans and computers. It enables computers to understand human-language text and to run different programs on that data. NLP approaches convert human-language text into numbers that a computer can operate on. NLP techniques can be used in machine learning for prediction, which motivated our thesis: predicting a movie name from a dataset of movie scripts, a task that requires natural language text preprocessing.

Objectives: The objective of this thesis is to implement a model that predicts the name of a movie from input text whose meaning is similar to text in the script data, and to measure the accuracy of the movie name prediction.

Methods: A literature study is used to identify suitable algorithms for training the model, and an experiment is used to find the accuracy of the model in predicting the movie name. The experiment involves gathering a dataset of movie scripts, preprocessing the data, training models on the preprocessed data with the classification algorithms identified in the literature study, and predicting the movie name for random sentences with the developed models.

Results: The algorithms identified in the literature study as usable for movie name prediction are Random Forest, Logistic Regression, Naive Bayes classifier, and Support Vector Machine. Of all the models, those trained with the Naive Bayes classifier and the Support Vector Machine performed well in predicting the movie name from a set of random sentences, while the model trained with the Random Forest classifier performed worse than the remaining models.

Conclusions: Models are trained using the four classification algorithms identified in the literature review: Random Forest, Logistic Regression, Naive Bayes classifier, and Support Vector Machine. All the models are tested on a random sample of paraphrased text inputs with the aim of obtaining the appropriate movie name. Of all the models, those trained with the Naive Bayes classifier and the Support Vector Machine obtained the highest accuracy.

Keywords: Natural Language Processing, Machine Learning, Classification algorithms




We would like to thank our supervisor, Suejb Memeti, for his support and encouragement of our thesis through his guidance and feedback. Finally, we would like to express our gratitude to all our friends and family.

Bhargav Chinnapottu






List of Figures

List of Tables

1 Introduction
    1.1 Aims and Objectives
    1.2 Research Questions
    1.3 Scope of the thesis
    1.4 Overview

2 Background
    2.1 Machine Learning
        2.1.1 Supervised learning
        2.1.2 Unsupervised learning
        2.1.3 Reinforcement learning
    2.2 Classification
    2.3 Algorithms
        2.3.1 Support Vector Machine
        2.3.2 Naive Bayes classifier
        2.3.3 Random Forest algorithm
        2.3.4 Logistic Regression
    2.4 Natural Language Processing

3 Related Work

4 Method
    4.1 Literature Study
    4.2 Experiment
        4.2.1 Working Environment
        4.2.2 Data Collection
        4.2.3 Data Preprocessing
        4.2.4 Training the model
        4.2.5 Performance Metrics
        4.2.6 Testing the model

5 Results
    5.1 Results from the Literature Study
    5.2 Experimental Results
        5.2.1 Logistic Regression results
        5.2.2 Naive Bayes results
        5.2.3 Support Vector Machine results
        5.2.4 Random Forest classifier results
    5.3 Comparison of results

6 Analysis and Discussion

7 Conclusions and Future Work


List of Figures

2.1 Support Vector Machine
2.2 Random Forest Classifier
2.3 Logistic Regression
5.1 Prediction of movie name using Logistic Regression
5.2 Confusion matrix for Logistic Regression
5.3 Prediction of movie name using Naive Bayes classifier
5.4 Confusion matrix for Naive Bayes
5.5 Prediction using Logistic Regression Support Vector Machine
5.6 Confusion matrix for Support Vector Machine
5.7 Prediction of movie name using Random Forest classifier
5.8 Confusion matrix for Random Forest classifier
5.9 Comparison of accuracy plot
6.1 Text Preprocessing


List of Tables

5.1 Results from Literature Study
5.2 Comparison of predictions
5.3 Comparison of accuracy


Chapter 1

Introduction

In the current world, movies are among the most popular sources of entertainment. Beyond entertainment, they are also a major source of commerce and marketing, and they bring benefits in education. With the growth of different technologies, online streaming of movies and TV shows has become widely popular, and there are several streaming platforms such as Netflix, Amazon Prime Video, and YouTube. Audiences are drawn to streaming platforms with the best user interfaces and suggestion systems. Search suggestions are a major factor in holding users' attention, as it is difficult for people to remember every movie name. Therefore, movie suggestion or recommendation systems have become a popular service that lets audiences get movie suggestions without remembering all the movie names.

Several works deal with classifying text using Natural Language Processing (NLP). Some of these works classify websites, books by genre or author [33], and song lyrics into different genres [32]. Classifying movies based on their summary or script involves a lot of work for streaming platforms, as they need to go through the entire movie script or summary manually. Related works include a model that predicts the rating of a movie based on the IMDb website using machine learning techniques [6], a model that predicts the movie genre from plot summaries using machine learning methods [12], and a model that predicts the movie genre from scripts using NLP techniques [5]. All these previous systems were developed using different artificial intelligence technologies such as machine learning classification, machine learning clustering, neural networks, and NLP techniques. To the best of our knowledge, the previous models deal with predicting the rating or the genre of a movie, and no work addresses the prediction of the movie name based on movie scripts.

Predicting the movie name from human input requires a human-computer interaction model. In this thesis, to prepare a model that interprets human language for the computer, we experimented with NLP techniques to extract features from human-language text and classify them into different movie classes using machine learning classification algorithms.


The goal of this thesis is to identify the suitable machine learning classification algorithms that can be used for prediction and to conduct an experiment to select the model with the best prediction accuracy. The experiment includes collecting a dataset, preprocessing the dataset before training using NLP techniques such as tokenization and stemming, preparing a bag-of-words model, training a model on the same dataset of movie scripts with each identified algorithm, and selecting the model with the best prediction accuracy among them.

1.1 Aims and Objectives

The thesis aims to develop a model using Natural Language Processing that gives an appropriate movie suggestion when some plot or scenario of the movie is given as input to the model.

The objectives of the thesis include:

1. Gather and create a dataset with several movie names and their scripts to train the model.

2. Select the machine learning algorithms that can be used for training the model.

3. Train the model with the prepared dataset after preprocessing with NLP and the selected classification algorithms, and validate the results of the model by testing it with any plot from a movie, with the aim of getting the appropriate movie name.

4. Analyze the results of the model with the test dataset.

1.2 Research Questions

To accomplish the aim, this thesis addresses the following research questions:

RQ1: Which classification algorithms can be used in predicting the movie name?
Motivation: This research question motivates the literature study, which identifies suitable classification algorithms that can be used to predict the movie name by training a model on movie script data.

RQ2: How accurate is the model trained using NLP in predicting the movie name?
Motivation: This research question aims to find the accuracy of models that are trained to predict the movie name using NLP text preprocessing techniques and several classification algorithms.

1.3 Scope of the thesis

The focus of this thesis is to develop a model that predicts the name of the movie when a random scenario from the movie is given as input. The model is developed by preprocessing the dataset with NLP techniques and classifying the script dataset using machine learning algorithms. As the same scenario may occur in more than one movie, a limitation of the developed model is that it can sometimes predict the wrong movie name; we are not dealing with multiclass prediction at present.

1.4 Overview

The thesis is divided into chapters, structured as follows. Chapter 2 gives the background of the thesis, with information about the main field of research and its real-world applications. Chapter 3 describes previous and related work in similar fields and the research gap that we focus on in this thesis. Chapter 4 describes the research method that we chose, the implementation of the experiment, and information about the different algorithms. Chapter 5 presents the results of the literature study and the experiment. Chapter 6 analyzes the results of the experiment and discusses the thesis. Chapter 7 gives the conclusions of the thesis and the work that we would like to do in the future.

Chapter 2

Background

2.1 Machine Learning

Machine learning [3] is a subset of artificial intelligence that aims at training computers to learn from data and improve with the knowledge gained from it. Machine learning applications become more efficient when they have more data to learn from, and they improve with use. Machine learning algorithms are used to train models to find patterns and correlations between the features in large datasets and to make predictions based on the knowledge gained in training. The categories of machine learning algorithms are supervised learning, unsupervised learning, and reinforcement learning.

2.1.1 Supervised learning

Supervised learning is a method of training a model with labeled datasets; the trained model is then used to predict test data. In supervised learning, the model is trained by feeding it input data together with the corresponding output data. The model learns from the training data and, after training, predicts new data based on what it learned. The goal of supervised learning is to develop a pattern or procedure that predicts new test data based on the analysis of training data that already has class labels. The labeled input data acts as the reference for predicting the test data correctly [18]. Supervised learning is divided into two types: classification and regression. In classification, the algorithm predicts the class of new test data; in regression, it predicts a real-valued number for new test data. Supervised learning is used in many real-world applications such as image classification, spam detection, and risk assessment [25].
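The difference between the two types can be illustrated with a minimal sketch. It uses a one-nearest-neighbour rule, which is not one of the algorithms studied in this thesis and is chosen here only because it fits in a few lines; the feature values, genre labels, and ratings are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Return the target of the training example closest to `query`."""
    # Squared Euclidean distance between two equal-length feature vectors.
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    best_index = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))
    return train_y[best_index]


# Hypothetical inputs: [hours of action scenes, hours of dialogue].
features = [[1.5, 0.5], [1.2, 0.8], [0.2, 1.8], [0.1, 1.6]]

# Classification: the targets are discrete class labels.
genres = ["action", "action", "drama", "drama"]
predicted_genre = nearest_neighbor_predict(features, genres, [1.4, 0.6])

# Regression: the targets are real numbers (hypothetical ratings).
ratings = [6.0, 6.5, 8.0, 7.5]
predicted_rating = nearest_neighbor_predict(features, ratings, [0.2, 1.7])
```

The same labeled inputs serve both tasks; only the kind of target changes, from a class label to a real number.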

2.1.2 Unsupervised learning

Unsupervised learning [10] is a method of training a model using algorithms that analyze and cluster unlabeled datasets without any reference. It is like the way the human brain learns new things. In unsupervised learning, only input data is provided to the model at training time, not output data. Unsupervised learning aims to learn the structure of the dataset, predict test data based on similarities, and characterize the data in a unique format. It can be used when there is no prior labeled dataset for training and in more complex processing tasks. It requires minimal human supervision compared to supervised learning. Unsupervised learning is categorized into two types: clustering and association. It is used in many areas, including market basket analysis, pattern recognition, identifying accident-prone areas, and many business models [28].
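Clustering, one of the two types named above, can be sketched with a small k-means loop on one-dimensional data. This is an illustrative sketch only: k-means is not used in the thesis experiment, and the runtimes and starting centroids below are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given initial centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical unlabeled data: movie runtimes in minutes.
runtimes = [88, 92, 95, 150, 155, 160]
centers, cluster_ids = k_means_1d(runtimes, centroids=[90, 120])
```

No labels are supplied; the two groups (short and long runtimes) emerge from the similarities in the input alone.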

2.1.3 Reinforcement learning

Reinforcement learning is an approach in which the machine learns by interacting with the environment. The machine performs different operations, and each step yields a reward or a punishment: during training, every appropriate action is rewarded and every inappropriate action is punished. The main goal of reinforcement learning is to find the strategy that maximizes the number of rewards. In reinforcement learning, the machine works on its own without any supervision [35]. Applications of reinforcement learning include industrial automation in robotics, game development, marketing, and advertising [19].

2.2 Classification

Classification is a supervised learning method used to predict discrete data. The machine learns from labeled data and classifies the data into different classes. Classification can be binary or multi-class, depending on the number of classes in the training dataset. The training data is grouped into target classes, and the test data is predicted in terms of those target classes. Classification algorithms are applied in real-world applications such as sentiment analysis, spam classification, and document classification [29].

2.3 Algorithms

The classification algorithms used for training the model in our thesis are Support Vector Machine, Naive Bayes classifier, Random Forest classifier, and Logistic Regression.

2.3.1 Support Vector Machine

The Support Vector Machine (SVM) algorithm is a supervised learning algorithm that can be used for both regression and classification problems. Its main aim is to find the best possible hyperplane that distinctly separates the data points plotted in N-dimensional space. The dimension of the hyperplane depends on the number of independent features in the dataset, and the best hyperplane is the one with the largest margin, or separation, between the data classes. Support Vector Machines can be used for linear and non-linear classification problems by using different kernel functions, and they are efficient in high-dimensional settings. The separation by the hyperplane is shown in Figure 2.1.

Figure 2.1: Support Vector Machine
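For the linearly separable case, the largest-margin idea is commonly written as the optimization problem below. The notation is introduced here for illustration and does not come from the thesis: $x_i$ are the training points, $y_i \in \{-1, +1\}$ their class labels, and $w$ and $b$ the normal vector and offset of the hyperplane $w^\top x + b = 0$.

```latex
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n
```

Under these constraints the margin between the two classes has width $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert$ maximizes the separation. Kernel functions extend the same formulation to non-linear boundaries.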

2.3.2 Naive Bayes classifier

The Naive Bayes classifier is a classification algorithm based on Bayes' theorem with an assumption of independence between the features. It assumes that the presence of one feature is not correlated with the presence of another. There are three kinds of Naive Bayes approaches: Gaussian, Multinomial, and Bernoulli. Gaussian Naive Bayes is applied to classification problems in which the features are assumed to follow a normal distribution, the Multinomial approach is applied to discrete data, and Bernoulli is used when the feature vectors are binary. Naive Bayes techniques are applied in real-world applications such as text classification and recommendation systems. The Gaussian Naive Bayes likelihood function is:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$

where $\sigma_y$ and $\mu_y$, the standard deviation and mean of feature $x_i$ within class $y$, are estimated by maximum likelihood.
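For word-count features such as those from movie scripts, the Multinomial variant is the natural fit. The following is a minimal from-scratch sketch, assuming add-one (Laplace) smoothing and a tiny invented corpus; the function names, movie names, and text lines are hypothetical and are not the thesis implementation or dataset.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, labels):
    """Collect per-class word counts and class frequencies from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(documents, labels):
        word_counts[label].update(tokens)
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary


def predict_multinomial_nb(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, doc_count in class_counts.items():
        # Log prior: fraction of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical two-movie corpus; the lines are invented, not real script text.
docs = [
    "the ship sinks in the cold ocean".split(),
    "the iceberg hits the ship".split(),
    "the robot travels back in time".split(),
    "the machine hunts the boy".split(),
]
names = ["Movie A", "Movie B"]
doc_labels = [names[0], names[0], names[1], names[1]]
model = train_multinomial_nb(docs, doc_labels)
prediction = predict_multinomial_nb(model, "a ship in the ocean".split())
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once documents are as long as a script.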

2.3.3 Random Forest algorithm

Random Forest is one of the most frequently used machine learning approaches. It unites the output of different decision trees into a single output and is easy to apply in classification and regression problems. This approach gives more efficient results when individual trees are not correlated with one another and when the total number of trees in the forest is larger. It relies on ensemble learning, the concept of combining various classifiers to solve a complex problem and to obtain the best accuracy. The Random Forest approach builds several decision trees on random subsets of the data, and the final decision is the average of the individual trees' decisions, which gives good prediction accuracy. The main inputs of the Random Forest algorithm are the number of trees to combine, the number of features to classify, and the size of a node, which need to be specified before training the model. The tree structure of the Random Forest algorithm is shown in Figure 2.2.

Figure 2.2: Random Forest Classifier
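The step that combines the trees can be sketched as follows. This shows only the aggregation, under the assumption that for class labels the "average decision" is taken as a majority vote; the three rules are hand-written, hypothetical stand-ins for trained trees, whereas a real forest would learn each tree from a random subset of the data.

```python
from collections import Counter


def majority_vote(trees, sample):
    """Combine the class predictions of several trees by majority vote."""
    votes = Counter(tree(sample) for tree in trees)
    # most_common(1) returns the (label, count) pair with the highest count.
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained decision trees: each is a one-split rule
# over a feature dict. A real forest would learn these from bootstrap samples.
def tree_by_explosions(sample):
    return "action" if sample["explosions"] > 3 else "drama"


def tree_by_dialogue(sample):
    return "drama" if sample["dialogue_share"] > 0.7 else "action"


def tree_by_runtime(sample):
    return "drama" if sample["runtime_min"] > 140 else "action"


forest = [tree_by_explosions, tree_by_dialogue, tree_by_runtime]
scene = {"explosions": 5, "dialogue_share": 0.8, "runtime_min": 110}
forest_prediction = majority_vote(forest, scene)
```

One tree votes "drama" here, but it is outvoted by the other two, which is how weakly correlated trees can correct each other's individual errors.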

2.3.4 Logistic Regression

The Logistic Regression algorithm is used in classification to find the category of the dependent feature from a set of independent features. It is categorized into binary, multinomial, and ordinal logistic regression, depending on the dependent feature. It predicts the likelihood of the test data and classifies it into one of the classes of the dependent feature. Logistic Regression predicts only discrete data, unlike Linear Regression, which predicts continuous data. Logistic Regression is used in many applications, such as spam classification and predicting the effect of a disease (low/medium/high) [26]. The logistic curve is shown in Figure 2.3.

Figure 2.3: Logistic Regression
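The binary case can be sketched in a few lines: the logistic (sigmoid) curve turns a linear score into a likelihood, and a threshold turns the likelihood into a discrete class. The training loop below is plain batch gradient descent on one feature, assumed here for brevity; the data, learning rate, and epoch count are hypothetical choices, not values from the thesis.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and bias b for one feature by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict_class(w, b, x, threshold=0.5):
    """Turn the predicted probability into a discrete class label (0 or 1)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical data: x = number of spam words in an email, y = 1 if spam.
xs = [0, 1, 2, 3, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic_regression(xs, ys)
```

Calling `predict_class(w, b, 1)` and `predict_class(w, b, 8)` then yields the discrete labels for an email with one and with eight spam words.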


2.4 Natural Language Processing

Natural Language Processing (NLP) is a branch of artificial intelligence that helps computers understand and interpret human language in text or speech, which contains a large amount of structured, semi-structured, and unstructured data. NLP defines the important parts of human language for computers and allows computers to interact with humans in their own language, which involves a lot of processing. NLP can be applied to translation, search engine optimization, and filtering [7]. The search suggestions offered by the built-in search bar in most applications are based on NLP content categorization and topic modeling [22]. Emails are categorized using Bayesian spam filtering, a statistical model that compares the subject of the email with spam words to identify spam mails [27]. Customer feedback and reviews about any application or organization can be assessed using sentiment analysis, which predicts users' feelings about them by extracting information from various sources [40]. NLP techniques include:

Named Entity Recognition

This is a popular NLP technique used in information extraction. The approach takes a sentence of the text as input and identifies the named entities in it, that is, nouns such as the names of people, organizations, and places. It is widely used in applications like news content categorization and in search engines to retrieve information easily [7].


Tokenization

Tokenization is the process of splitting data into tokens such as characters, words, sentences, and numbers. It is used to store data space-efficiently and to reduce search effort, and it is applied in most information retrieval systems [21].
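A minimal sketch of sentence and word tokenization with regular expressions is given below. The splitting rules are simplifying assumptions (a sentence ends at `.`, `!`, or `?` followed by whitespace; a word is a run of letters, digits, or apostrophes), and the example line is invented.

```python
import re


def tokenize_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize_words(text):
    """Lowercase the text and return runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


line = "The ship is sinking. Get to the lifeboats now!"
sentences = tokenize_sentences(line)
words = tokenize_words(sentences[0])
```

Rules this simple mis-split abbreviations such as "Dr." and ignore script-specific structure such as scene headings, so a practical tokenizer needs more cases than shown here.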

Stemming and lemmatization

Stemming is the process of reducing all the words in the input text to their base or root form by removing suffixes, which makes model training easier. Lemmatization obtains the proper vocabulary word for each word in the input text by transforming it to its root form based on the meaning and part of speech of the word.
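Suffix removal can be sketched with a deliberately naive stemmer. The suffix list and minimum stem length below are hypothetical choices made for illustration; this is not the Porter algorithm or any other established stemmer.

```python
# Suffixes are checked longest-first so "ingly" wins over "ly".
SUFFIXES = ("ingly", "edly", "ing", "ed", "ly", "es", "s")


def naive_stem(word, min_stem_length=3):
    """Strip the first matching suffix, keeping at least `min_stem_length` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_length:
            return word[: -len(suffix)]
    return word


stems = [naive_stem(w) for w in ["jumping", "jumped", "jumps", "quickly", "sing"]]
```

The length guard keeps "sing" intact instead of reducing it to "s". Lemmatization is not shown, since it needs a vocabulary and part-of-speech information rather than string rules.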

Bag of Words

The bag-of-words model is used in text preprocessing to extract features from the data for training machine learning models. It counts the occurrences of each word in a sentence, and the counts can be represented as vectors through vectorization. The bag-of-words model is used in several applications, such as email filtering and document classification [41].
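The counting and vectorization steps can be sketched as follows, assuming the text has already been tokenized. The two token lists are invented, and the helper names are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({word for doc in tokenized_docs for word in doc})
    return {word: index for index, word in enumerate(words)}


def vectorize(tokens, vocabulary):
    """Return the count vector of `tokens` over the vocabulary columns."""
    counts = Counter(tokens)
    vector = [0] * len(vocabulary)
    for word, index in vocabulary.items():
        vector[index] = counts[word]
    return vector


docs = [["the", "ship", "hit", "the", "iceberg"], ["the", "robot", "hit", "back"]]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
```

Each document becomes a fixed-length vector over the columns back, hit, iceberg, robot, ship, the. Word order is discarded, which is what the "bag" in the name refers to.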


Sentiment analysis

Sentiment analysis is one of the most used natural language techniques for predicting the emotion or sentiment of a text. It is used to find the emotion of the text in any document, social media post, or news item and to classify it as positive, negative, or neutral. Sentiment analysis works best with subjective data written by humans [40].
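The three-way labeling can be sketched with a word-list approach. The two lexicons below are hypothetical and far too small for real use, and counting lexicon hits is only one simple way to score a text; trained classifiers are the more common choice.

```python
# Hypothetical mini-lexicon; real systems use far larger word lists or trained models.
POSITIVE_WORDS = {"good", "great", "love", "wonderful", "happy"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "boring", "sad"}


def classify_sentiment(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentiments = [
    classify_sentiment("I love this wonderful movie"),
    classify_sentiment("The plot was boring and the ending was bad"),
    classify_sentiment("The movie runs for two hours"),
]
```

The third sentence is factual rather than subjective, so it contains no lexicon words and falls into the neutral class.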

Natural Language Generation

Natural language generation is the process of converting raw data into natural language text. Its techniques are used in organizations that hold a large amount of data, converting that data into natural language and making it easier for the machine to understand the patterns. Applications of natural language generation include document clustering, realization, content determination systems, and way-finding systems [8].

Chapter 3

Related Work

Aziz Rupawala et al. [31] prepared a model to predict the movie genre from plot summaries by comparing various classification algorithms, such as Multinomial Naive Bayes, Random Forest, Logistic Regression, and SGD, to obtain the best prediction results. The authors mainly focused on selecting the most suitable algorithms and comparing the different classifiers, and they chose the algorithm that gave the best prediction output by analyzing each model after the experiment.

Alex Blackstock et al. [5] tried to classify movies by genre from movie script data by developing a Logistic Regression model. Their model considered features extracted from the scripts using NLP techniques. A Naive Bayes classifier and a Markov model classifier were used to analyze the performance metrics in their implementation. For each movie script in the test dataset, the model predicts the probability that the movie belongs to each genre based on the extracted features and uses the k best scores to predict genres, where k is a hyperparameter.

Rishabh Ahuja and his colleagues implemented a movie recommender system using machine learning algorithms such as K-means clustering and K-nearest neighbors. In their research, they studied different types of machine learning algorithms, which gave them a clear picture of each algorithm and where it can be applied. The proposed system predicts movies based on the user's preferences using different parameters [2].

Warda Ruheen Bristi et al. [6] implemented a model that predicts the IMDb rating of a movie using machine learning techniques such as Bagging, Random Forest, Naive Bayes, J48, and IB. Several factors need to be considered when predicting the rating of a movie, and the authors considered factors such as the budget, actors, and director of the movie. They concluded that sanction and budget had a high effect on the movie's success.
The research article [37] shows the implementation of a hybrid movie recommender system that uses sentiment analysis on the Spark platform to improve the accuracy of movie recommendation systems. The authors state that a comprehensive combination of emotions, reviews, and user preferences can help to recommend the best movie. They implemented the model by combining content-based filtering and collaborative filtering into a hybrid movie recommender system and used sentiment analysis to enhance the accuracy of the prediction results.

In [12], the author developed a model for predicting the movie genre based on plot summaries. The article describes the implementation of several machine learning algorithms, such as Naive Bayes, Recurrent Neural Networks, and Word2Vec + XGBoost, for text classification, and of the probability threshold approach and K-binary transformation for genre selection, to predict the movie genre from summaries. The experiment was performed with more than 250,000 movies and concluded that Gated Recurrent Unit (GRU) neural networks with the probability threshold approach reach the best outcome on the test sample.

The article [20] describes the implementation of a conversational movie search system using Conditional Random Fields. The authors developed a model that parses the spoken input into semantic classes using Conditional Random Fields and then searches for the movie in the indexed database with the help of the recognized semantic classes. In their paper, the authors mention the use of topic models in input extension and vocabulary learning, along with different search techniques for efficient searching of the database.

Vasu Jain [15] prepared a model that predicts movie success using sentiment analysis of Twitter data. Data about several aspects that define a movie's popularity was gathered from Twitter, and the training datasets were manually labelled with three classes: hit, flop, and average. The trained model then classified the test samples into the hit, flop, and average classes. The association between tweet time and tweet number was also considered.

In [4], the authors developed a hybrid approach for classifying emotions into the classes happy, fear, anger, surprise, disgust, and sad based on speech and text data. The researchers intend to analyze the speech and the consequent text of the speaker to predict the speaker's emotions. For the implementation of the model, they used NLP techniques such as POS tagging and stopword removal for feature extraction and a Support Vector Machine classifier to classify the emotions based on both speech and text. They aim to improve the efficiency of emotion classification by combining the audio and text feature vectors into a single feature vector, which is then passed to the classifier.

From the works identified above, there are developments in the fields of movie success prediction, movie recommendation systems, movie rating prediction, and some related fields. In this thesis we develop a model for predicting the movie name based on movie scripts, which, to the best of our knowledge, has not been implemented before.
For the implementation of the model, the researchers used NLP techniques such as POS tagging and stop-word removal for feature extraction and a Support Vector Machine classifier to classify the emotions based on both speech and text. They aim to improve the efficiency of the emotion classification by combining the audio and text feature vectors into a single feature vector, which is then passed to the classifier.

From the works identified above, there are developments in the fields of movie success prediction, movie recommendation systems, movie rating prediction, and some related fields. In this thesis, we are developing a model for the prediction of the movie name based on movie scripts, which has not been implemented before.
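The feature-level fusion described above for [4], where audio and text features are joined into a single vector before classification, can be sketched as follows. This is a minimal illustration under stated assumptions: the function `fuse_features` and the example values are hypothetical, and the Support Vector Machine step itself is omitted.

```python
from typing import List, Sequence


def fuse_features(audio_vector: Sequence[float],
                  text_vector: Sequence[float]) -> List[float]:
    """Concatenate audio and text features into one feature vector."""
    return [float(x) for x in audio_vector] + [float(x) for x in text_vector]


# Hypothetical inputs (illustrative values only).
audio_vector = [0.42, 0.10, 0.77]  # assumed acoustic statistics for one utterance
text_vector = [1, 0, 2, 0]         # assumed word counts after stop-word removal

# The fused vector is what a single classifier would receive as input.
fused = fuse_features(audio_vector, text_vector)
print(len(fused))
```

In practice the two parts often have different scales, so each would usually be normalized before concatenation; that step is left out here.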

Chapter 4


Literature study and experiment are the methods chosen to answer the research questions. In the literature study, we studied different machine learning algorithms, identified several classification algorithms that can be used to predict the movie name based on a script, and learned about the NLP techniques needed to implement the data preprocessing before training the model. The data is gathered from the IMSDb source, which contains a movie script database, and is stored in a file to prepare a dataset. The experiment is performed with the gathered dataset and the algorithms identified from the literature study.
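The overall idea of preprocessing script text and then training a classifier that maps a script to a movie name can be sketched as below. This is an illustration only, not the model built in this thesis: Multinomial Naive Bayes is chosen here purely as an example classifier, and the stop-word list, the class `NaiveBayesScriptClassifier`, the script snippets, and the labels "Movie A" and "Movie B" are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import List

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "in", "is", "on", "for"}


def preprocess(text: str) -> List[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


class NaiveBayesScriptClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: List[str], labels: List[str]):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text: str) -> str:
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in preprocess(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.doc_counts):
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical script snippets and movie labels (not taken from IMSDb).
train_texts = [
    "The captain orders the crew to abandon the sinking ship.",
    "Lifeboats are lowered while the ship breaks apart in the water.",
    "The detective searches the office for the missing letter.",
    "A witness tells the detective about the stolen letter.",
]
train_labels = ["Movie A", "Movie A", "Movie B", "Movie B"]

model = NaiveBayesScriptClassifier().fit(train_texts, train_labels)
print(model.predict("The crew rows a lifeboat away from the ship"))
```

Working in log space avoids numeric underflow on long scripts, and the add-one smoothing keeps a single unseen word from zeroing out a movie's score.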

4.1 Literature Study

To identify the classification algorithms that can be used to classify movie script data into different movies, we chose a Systematic Literature Review (SLR) [39] as our research procedure, which helps to find works in the same field and analyze them using standard procedures.

• Identified keywords such as movie prediction, classification algorithms, and Natural Language Processing, which are the main fields of our thesis.

• Using the identified keywords, searched for sources of related works in the IEEE, Google Scholar, Science Direct, and arXiv public repositories.

• From the results of the keyword search, gathered research articles related to Natural Language Processing, movie recommendation or prediction works, and the classification algorithms used for prediction.

• After collecting all the related works, the inclusion and exclusion criteria is