
Exploring CEFR classification for German based on rich linguistic modeling

Julia Hancke, Detmar Meurers

{jhancke,dm}@sfs.uni-tuebingen.de

The issue

The Common European Framework of Reference for Languages (CEFR) has gained a leading role as an instrument of reference for the certification of language proficiency. At the same time, there is increasing interest in a more comprehensive empirical characterization of the relevant linguistic properties of the CEFR levels. The research reported on in this paper approaches this issue by studying which linguistic properties reliably support the classification of short essays in terms of CEFR levels. Complementing the work on English criterial features and learner language characteristics that is starting to emerge (Hawkins & Buttery 2010; Yannakoudakis et al. 2011), we focus on identifying learner language characteristics of different levels of German proficiency.

Corpus used

The empirical basis of our research consists of 1027 professionally rated free text essays from CEFR exams taken by second language learners of German. Each exam level (A1 to C1) is represented by about 200 texts, varying between 8 and 366 words in length (mean length of 121 words). The data has been collected by the project MERLIN - Multilingual Platform for the European Reference Levels: Interlanguage Exploration in Context (http://merlin-platform.eu).

Features explored

We defined a broad set of 3821 features which can be automatically identified using current NLP tools. We primarily use complexity measures from Second Language Acquisition research to model lexical and morphological richness and syntactic sophistication:

At the lexical level, we started by adapting the features discussed for English by Lu (2012) and McCarthy & Jarvis (2010) for German. To measure the depth of lexical knowledge, we implemented a number of features suggested by Crossley et al. (2011). We extracted frequency scores from the lexical database dlexDB (http://dlexdb.de). We computed features of lexical relatedness using GermaNet 7.0 (http://www.sfs.uni-tuebingen.de/lsd), a lexical-semantic resource for German, similar to WordNet (Miller 1995) for English. We added shallow measures of spelling errors in terms of the number of content word types not found in dlexDB and the misspelled words found by Google SpellCheck (version 1.1, https://code.google.com/p/google-api-spelling-java).
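
To make the flavour of these lexical measures concrete, here is a minimal sketch of a moving-average type-token ratio in the spirit of the measures adapted from Lu (2012) and McCarthy & Jarvis (2010). The window size and the simple tokenizer are illustrative assumptions, not the implementation used for the experiments.

import re

def tokenize(text: str) -> list[str]:
    """Very simple tokenizer: lowercased word forms only (an assumption)."""
    return re.findall(r"\w+", text.lower())

def moving_average_ttr(tokens: list[str], window: int = 50) -> float:
    """Average type-token ratio over all windows of a fixed size."""
    if len(tokens) < window:
        # Fall back to a plain type-token ratio for very short essays.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    essay = "Ich wohne in Berlin. Ich lerne Deutsch, weil ich in Deutschland arbeiten möchte."
    toks = tokenize(essay)
    print(f"{len(toks)} tokens, MATTR = {moving_average_ttr(toks, window=10):.3f}")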

Our morphological features for German capture the learner's use of mood, case, and word formation. We automatically extracted tense patterns from the RFTagger (Schmid & Laws 2008) output and included frequency ratios of these patterns as features for our classifier. The tense features might allow more detailed insights into the tenses the learners used at each of the levels.
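
A hypothetical illustration of such tense-pattern features: given fine-grained tags in an RFTagger-like format (the exact tag layout below is an assumption, not the tagset actually used), the ratio of each tense value among the finite verb tags can serve as a feature.

from collections import Counter

def tense_ratios(tags: list[str]) -> dict[str, float]:
    """Relative frequency of each tense value among finite verb tags."""
    tenses = [
        tag.split(".")[4]  # assumed position of the tense attribute
        for tag in tags
        if tag.startswith("VFIN") and len(tag.split(".")) > 4
    ]
    counts = Counter(tenses)
    total = sum(counts.values()) or 1
    return {tense: n / total for tense, n in counts.items()}

if __name__ == "__main__":
    tagged_essay = [
        "PRO.Pers.Subst.1.Nom.Sg", "VFIN.Full.1.Sg.Pres.Ind", "N.Reg.Acc.Sg.Neut",
        "PRO.Pers.Subst.1.Nom.Sg", "VFIN.Full.1.Sg.Past.Ind", "ADV",
    ]
    print(tense_ratios(tagged_essay))  # {'Pres': 0.5, 'Past': 0.5}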

At the syntactic level, our features are mostly inspired by the measures used for the analysis of syntactic complexity in English (Lu 2010). However, German syntactically differs from English in several relevant respects. For example, German allows subjectless sentences. Thus, while in general the intention behind the English SLA complexity measures can be expressed in terms of the German syntactic structure and categories, the process of adapting and defining syntactic complexity features for German is far from trivial. As basic syntactic vocabulary for German, we made use of the Negra treebank annotation scheme (Skut et al. 1997). We added dependency-based features of syntactic complexity that were previously used in second language and readability research (Vor der Brück & Hartrumpf 2007; Vor der Brück et al. 2008; Dell'Orletta et al. 2011). We automatically extracted parse tree rules from the parse trees produced by the Stanford Parser, following Briscoe et al. (2010) and Yannakoudakis et al. (2011), who used a similar feature based on the output of the RASP parser. We used frequency ratios of these parse tree rules as features for our classifier.
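
The parse-rule features can be pictured along the following lines: each non-terminal expansion in a constituency tree counts as one rule, and the rules' relative frequencies become feature values. The sketch below uses NLTK and a hand-written NEGRA-style bracketing purely for illustration; the actual trees in the experiments come from the Stanford Parser.

from collections import Counter
from nltk import Tree

def rule_frequencies(bracketings: list[str]) -> dict[str, float]:
    """Frequency ratio of each phrase-structure rule across a set of parses."""
    counts = Counter()
    for bracketing in bracketings:
        tree = Tree.fromstring(bracketing)
        # every expansion, e.g. "S -> NP VP"; lexical rules are kept for simplicity
        counts.update(str(production) for production in tree.productions())
    total = sum(counts.values()) or 1
    return {rule: n / total for rule, n in counts.items()}

if __name__ == "__main__":
    parse = "(S (NP (ART die) (NN Katze)) (VP (VVFIN schläft)))"
    for rule, ratio in sorted(rule_frequencies([parse]).items()):
        print(f"{ratio:.2f}  {rule}")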

Complementing the linguistic syntactic analysis, we also implemented a number of shallower language features. Unigram, bigram and trigram language model scores provide statistical comparisons to a linguistically simpler model based on a news website for children (http://news4kids.de) and a more complex model based on a news website for adults (http://www.n-tv.de).
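
As a rough picture of how such language-model scores can separate simpler from more complex writing, the toy sketch below trains two add-one-smoothed bigram models standing in for the SRILM models estimated on the children's and adult news corpora; the miniature training data and the smoothing scheme are assumptions for illustration only.

import math
from collections import Counter

def train_bigram_model(sentences: list[list[str]]):
    """Return a per-token log-probability scorer for an add-one-smoothed bigram model."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)

    def logprob(sent: list[str]) -> float:
        toks = ["<s>"] + sent
        score = sum(
            math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
            for prev, cur in zip(toks, toks[1:])
        )
        return score / max(len(sent), 1)

    return logprob

if __name__ == "__main__":
    simple_lm = train_bigram_model([["der", "hund", "ist", "klein"]])
    complex_lm = train_bigram_model([["die", "verhandlungen", "scheiterten", "erneut"]])
    essay_sentence = ["der", "hund", "ist", "klein"]
    # The difference between the two scores could serve as one feature value.
    print(simple_lm(essay_sentence) - complex_lm(essay_sentence))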

NLP tools used

To automatically identify the lexical, morphological, and syntactic features, we employ a range of NLP tools and resources including Apache OpenNLP (http://opennlp.apache.org), RFTagger (Schmid & Laws 2008), the Stanford Parser (Rafferty & Manning 2008) with the standard German model trained on the NEGRA corpus (http://coli.uni-saarland.de/projects/sfb378/negra-corpus), the SRILM Language Modeling Toolkit (Stolcke 2002), and the lexical database dlexDB (http://dlexdb.de). For dependency parsing we used the MATE dependency parser (Bohnet 2010), with the standard model for German (Seeker & Kuhn 2012) trained on the TIGER corpus. Before tagging and parsing, a Java API for Google SpellCheck was used to reduce problems caused by spelling errors.

Experimental setup

On the basis of the 3821 automatically derived features, we trained a classifier using the Sequential Minimal Optimization (SMO) algorithm as implemented in the WEKA toolkit (Hall et al. 2009). We split the dataset into a training and test set by randomly assigning 2/3 of the samples from each class to the training set (721 samples) and 1/3 to the test set (306 samples). As an additional method for evaluation we used ten-fold cross-validation on the whole dataset.
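
For readers who want to reproduce the overall setup outside of WEKA, the following scikit-learn sketch mirrors it with a support vector machine (WEKA's SMO also trains an SVM), a stratified 2/3 vs. 1/3 split, and ten-fold cross-validation. The linear kernel, the random toy data, and the use of scikit-learn rather than WEKA are assumptions for illustration; the feature matrix X and label vector y are taken to be precomputed.

import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

def evaluate(X: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Accuracy on a held-out third plus mean ten-fold cross-validation accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 / 3, stratify=y, random_state=0)
    clf = SVC(kernel="linear")  # kernel choice is an assumption
    clf.fit(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    cv_acc = cross_val_score(SVC(kernel="linear"), X, y, cv=10).mean()
    return test_acc, cv_acc

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))     # toy stand-in for the 3821-dimensional vectors
    y = rng.integers(0, 5, size=200)   # five CEFR levels A1-C1
    print(evaluate(X, y))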

Results

The following table provides an overview of the performance of the classifier for the five-level (A1-C1) CEFR classification task:

                       Accuracy on test set   Cross-validation on all data
Random baseline        20%
Majority baseline      32.9%                  33.0%
SMO (all features)     57.2%                  64.5%
SMO (best features)    62.7%

The classifier trained with all features achieves an accuracy of 57.2% with the separate training and test set and an accuracy of 64.5% when using cross-validation on all data. Compared to a majority baseline of classifying all samples as the largest class, this is an improvement of 24.3% and 31.5% respectively.

Investigating the notable difference between the test set and the cross-validation results, we identified two issues. Looking at the results of each individual cross-validation fold revealed that there is considerable variance in the results (10.7% between the best and worst performing fold). However, the worst cross-validation fold still had a better result than our test set. This could be an effect of the slightly larger amount of training data available in the cross-validation procedure. Another reason for the comparably poor performance on our test set could lie in the uneven distribution of exam types (as opposed to essay grades) across the different CEFR levels.

Examining the performance of individual feature groups with holdout estimation revealed that the lexical (60.5%) and morphological (56.8%) features were the most successful predictors of the CEFR level. The syntactic features and language modeling scores were not very successful predictors taken on their own (53.6% and 50.0%), but the syntactic features clearly improved the classification in combination with other feature groups. Parse rule features and tense features were the least predictive feature groups (49.0% and 38.5%); however, further experiments showed that their indicative power improves when they are encoded as binary instead of as frequency-based features.

The best model was obtained by combining all feature groups and using WEKA's CfsSubsetEval, a correlation-based method for feature selection. It included a set of 34 features consisting of syntactic, lexical, language model and morphological indicators and resulted in a classification accuracy of 62.7% on the test set.
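
The contrast between frequency-based and binary encodings mentioned above can be illustrated as follows; the rule counts are invented, and the snippet only shows the two encodings, not the feature selection procedure or the classifier.

import numpy as np

def frequency_encoding(counts: np.ndarray) -> np.ndarray:
    """Normalize raw rule counts per essay to frequency ratios."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty essays
    return counts / totals

def binary_encoding(counts: np.ndarray) -> np.ndarray:
    """Encode each rule simply as present (1.0) or absent (0.0)."""
    return (counts > 0).astype(float)

if __name__ == "__main__":
    # rows = essays, columns = parse rules or tense patterns
    rule_counts = np.array([[3, 0, 1],
                            [0, 2, 2],
                            [5, 1, 0]])
    print(frequency_encoding(rule_counts))
    print(binary_encoding(rule_counts))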

References

Bohnet, B. (2010). Top Accuracy and Fast Dependency Parsing is not a Contradiction. In Proceedings of the 24th International Conference on Computational Linguistics (COLING). Beijing, China, pp. 89-97.

Briscoe, T., B. Medlock & O. Andersen (2010). Automated assessment of ESOL free text examinations. Tech. rep., University of Cambridge Computer Laboratory.

Crossley, S. A., T. Salsbury, D. S. McNamara & S. Jarvis (2011). Predicting lexical proficiency in language learners using computational indices. Language Testing 28, 561-580.

Dell'Orletta, F., S. Montemagni & G. Venturi (2011). READ-IT: Assessing Readability of Italian Texts with a View to Text Simplification. In Proceedings of the 2nd Workshop on Speech and Language Processing for Assistive Technologies. pp. 73-83.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann & I. H. Witten (2009). The WEKA Data Mining Software: An Update. In The SIGKDD Explorations. vol. 11, pp. 10-18.

Hawkins, J. A. & P. Buttery (2010). Criterial Features in Learner Corpora: Theory and Illustrations. English Profile Journal.

Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics 15(4), 474-496.

Lu, X. (2012). The Relationship of Lexical Richness to the Quality of ESL Learners' Oral Narratives. The Modern Languages Journal pp. 190-208.

McCarthy, P. & S. Jarvis (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods 42(2), 381-392. URL https://serifos.sfs.uni-tuebingen.de/svn/

Miller, G. (1995). WordNet: a lexical database for English. Communications of the ACM 38(11), 39-41. URL http://aclweb.org/anthology/H94-1111.

Rafferty, A. N. & C. D. Manning (2008). Parsing three German treebanks: lexicalized and unlexicalized baselines. In Proceedings of the Workshop on Parsing German. Stroudsburg, PA, USA: Association for Computational Linguistics, PaGe '08, pp. 40-46. URL http://dl.acm.org/citation.cfm?id=1621401.1621407.

Schmid, H. & F. Laws (2008). Estimation of Conditional Probabilities With Decision Trees and an Application to Fine-Grained POS Tagging. In COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics, vol. 1, pp. 777-784. URL http://www.

Seeker, W. & J. Kuhn (2012). Making Ellipses Explicit in Dependency Conversion for a German Treebank. In Proceedings of the 8th International Conference on Language Resources and Evaluation, 3132-3139. Istanbul, Turkey: European Language Resources Association (ELRA).

Skut, W., B. Krenn, T. Brants & H. Uszkoreit (1997). An Annotation Scheme for Free Word Order Languages. In Proceedings of the Fifth Conference on Applied Natural Language Processing. Washington, D.C. URL http://www.coli.uni-sb.

Stolcke, A. (2002). SRILM - an extensible language modeling toolkit. In Proceedings of ICSLP. Denver, USA, vol. 2, pp. 901-904. URL http://www.speech.sri.com/cgi-bin/run-distill?papers/icslp2002-srilm.ps.gz.

Vor der Brück, T. & S. Hartrumpf (2007). A semantically oriented readability checker for German. In Z. Vetulani (ed.), Proceedings of the 3rd Language & Technology Conference. Poznań, Poland: Wydawnictwo Poznańskie, pp. 270-274. URL http://pi7.fernuni-hagen.de/papers/brueck_hartrumpf07_online.pdf.

Vor der Brück, T., S. Hartrumpf & H. Helbig (2008). A Readability Checker with Supervised Learning using Deep Syntactic and Semantic Indicators. Informatica 32(4), 429-435.

Yannakoudakis, H., T. Briscoe & B. Medlock (2011). A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1. Stroudsburg, PA, USA: Association for Computational Linguistics, HLT '11, pp. 180-189. URL http://aclweb.org/anthology/P11-1019.pdf. Corpus available: http://ilexir.co.uk/applications/clc-fce-dataset.
