

Publisher: Routledge

Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Information, Communication & Society

Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rics20

CRITICAL QUESTIONS FOR BIG DATA

danah boyd a & Kate Crawford b

a Microsoft Research, One Memorial Drive, Cambridge, MA, 02142, USA

b Microsoft Research, One Memorial Drive, Cambridge, MA, 02142, USA E-mail:

Available online: 10 May 2012

To cite this article: danah boyd & Kate Crawford (2012): CRITICAL QUESTIONS FOR BIG DATA, Information, Communication & Society, 15:5, 662-679 To link to this article: http://dx.doi.org/10.1080/1369118X.2012.678878


CRITICAL QUESTIONS FOR BIG DATA

Provocations for a cultural, technological, and scholarly phenomenon

The era of Big Data has begun. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? Or will it be used [...] communication and culture, or narrow the palette of research options and alter what 'research' means? Given the rise of Big Data as a socio-technical phenomenon, we argue that it is necessary to critically interrogate its assumptions and biases. In this article, we offer six provocations to spark conversations about the issues of Big Data: a cultural, techno[...] mythology that provokes extensive utopian and dystopian rhetoric.

Keywords: Big Data; analytics; social media; communication studies; social network sites; philosophy of science; epistemology; ethics; Twitter

(Received 10 December 2011; final version received 20 March 2012)

Information, Communication & Society Vol. 15, No. 5, June 2012, pp. 662-679. ISSN 1369-118X print/ISSN 1468-4462 online. © 2012 Microsoft

Technology is neither good nor bad; nor is it neutral ... technology's interaction with the social ecology is such that technical developments frequently have environmental, social, and human consequences that go far beyond the immediate purposes of the technical devices and practices themselves.
(Kranzberg 1986, p. 545)

We need to open a discourse - where there is no effective discourse now - about the varying temporalities, spatialities and materialities that we might represent in our databases, with a view to designing for maximum flexibility and allowing as possible for an emergent polyphony and polychrony. Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care.
(Bowker 2005, pp. 183-184)

The era of Big Data is underway. Computer scientists, physicists, economists, mathematicians, political scientists, bio-informaticists, sociologists, and other scholars are clamoring for access to the massive quantities of information produced by and about people, things, and their interactions. Diverse groups argue about the potential benefits and costs of analyzing genetic sequences, social media interactions, health records, phone logs, government records, and other digital traces left by people. Significant questions emerge. Will large-scale search data help us create better tools, services, and public goods? Or will it usher in a new wave of privacy incursions and invasive marketing? Will data analytics help us understand online communities and political movements? [...] quantities of data transform how we study human communication and culture, or narrow the palette of research options and alter what 'research' means?

Big Data is, in many ways, a poor term. As Manovich (2011) observes, it has been used in the sciences to refer to data sets large enough to require supercomputers, but what once required such machines can now be analyzed on desktop computers with standard software. There is little doubt that the quantities of data now available are often quite large, but that is not the defining characteristic of this new data ecosystem. In fact, some of the data encompassed by Big Data (e.g. all Twitter messages about a particular topic) are not nearly as large as earlier data sets that were not considered Big Data (e.g. census data). Big Data is less about data that is big than it is about a capacity to search, aggregate, and cross-reference large data sets.

We define Big Data [1] as a cultural, technological, and scholarly phenomenon that rests on the interplay of:

(1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.

(2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.

(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.

Like other socio-technical phenomena, Big Data triggers both utopian and dystopian rhetoric. On one hand, Big Data is seen as a powerful tool to address various societal ills, offering the potential of new insights into areas as diverse as cancer research, terrorism, and climate change. On the other, Big Data is seen as a troubling manifestation of Big Brother, enabling invasions of privacy, decreased civil freedoms, and increased state and corporate control. As with all socio-technical phenomena, the currents of hope and fear often obscure the more nuanced and subtle shifts that are underway.

Computerized databases are not new. The US Bureau of the Census deployed the world's first automated processing equipment in 1890 - the punch-card machine (Anderson 1988). Relational databases emerged in the 1960s (Fry & Sibley 1974). Personal computing and the Internet have made it possible for a wider range of people - including scholars, marketers, governmental agencies, educational institutions, and motivated individuals - to produce, share, interact with, and organize data. This has resulted in what Savage and Burrows (2007) describe as a crisis in empirical sociology. Data sets that were once obscure and difficult to manage - and, thus, only of interest to social scientists - are now being aggregated and made easily accessible to anyone who is curious, regardless of their training.

How we handle the emergence of an era of Big Data is critical. While the phenomenon is taking place in an environment of uncertainty and rapid change, current decisions will shape the future. With the increased automation of data collection and analysis - as well as algorithms that can extract and illustrate large-scale patterns in human behavior - it is necessary to ask which systems are driving these practices and which are regulating them. Lessig (1999) argues that social systems are regulated by four forces: market, law, social norms, and architecture - or, in the case of technology, code. When it comes to Big Data, these four forces are frequently at odds. The market sees Big Data as pure opportunity: marketers use it to target advertising, insurance providers use it to optimize their offerings, and Wall Street bankers use it to read the market. Legislation has already been proposed to curb the collection and retention of data, usually over concerns about privacy (e.g. the US Do Not Track Online Act of 2011). Features like personalization allow rapid access to more relevant information, but they present difficult ethical questions and fragment the public in troubling ways (Pariser 2011).

There are some significant and insightful studies currently being done that involve Big Data, but it is still necessary to ask critical questions about what all this data means, who gets access to what data, how data analysis is deployed, and to what ends.
In this article, we offer six provocations to spark conversations about the issues of Big Data. We are social scientists and media studies scholars who are in regular conversation with computer scientists and informatics experts. The questions that we ask are hard ones without easy answers, although [...] often surprising to those from different disciplines. Due to our interest in and experience with social media, our focus here is mainly on Big Data in social media context. That said, we believe that the questions we are asking are also important to those in other fields. We also recognize that the questions we are asking are just the beginning and we hope that this article will spark others to question the assumptions embedded in Big Data. Researchers in all areas - including computer science, business, and medicine - have a stake in the compu[...] potential within multiple disciplines. We believe that it is time to start critically interrogating this phenomenon, its assumptions, and its biases.

1. Big Data changes the definition of knowledge

In the early decades of the twentieth century, Henry Ford devised a manufacturing system of mass production, using specialized machinery and standardized products. It quickly became the dominant vision of technological progress. 'Fordism' meant automation and assembly lines; for decades onward, this became the orthodoxy of manufacturing: out with skilled craftspeople and slow work, in with a new machine-made era (Baca 2004). But it was more than just a new set of tools. The twentieth century was marked by Fordism at a cellular level: it produced a new understanding of labor, the human relationship to work, and society at large.

Big Data not only refers to very large data sets and the tools and procedures used to manipulate and analyze them, but also to a computational turn in thought and research (Burkholder 1992). Just as Ford changed the way we made cars - and then transformed work itself - Big Data has emerged as a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community. 'Change the instruments, and you will change the entire social theory that goes with them', Latour (2009) reminds us (p. 9).

Big Data creates a radical shift in how we think about research. Commenting on computational social science, Lazer et al. (2009) argue that it offers 'the capacity to collect and analyze data with an unprecedented breadth and depth and scale' (p. 722). It is neither just a matter of scale nor is it enough to consider it in terms of proximity, or what Moretti (2007) refers to as distant or close analysis of texts. Rather, it is a profound change at the levels of epistemology and ethics. Big Data reframes key questions about the constitution of knowledge, the processes of research, how we should engage with information, and the nature and the categorization of reality. Just as Du Gay and Pryke (2002) note that 'accounting tools ... do not simply aid the measurement of economic activity, they shape the reality they measure' (pp. 12-13), so Big Data stakes out new terrains of objects, methods of knowing, and definitions of social life.

Speaking in praise of what he terms 'The Petabyte Age', Anderson, Editor-in-Chief of Wired, writes:


This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves. (2008)

Do numbers speak for themselves? We believe the answer is 'no'. Significantly, Anderson's sweeping dismissal of all other theories and disciplines is a tell: it reveals an arrogant undercurrent in many Big Data debates where other forms of analysis are too easily sidelined. Other methods for ascertaining why people do things, write things, or make things are lost in the sheer volume of [...] craft. As Berry (2011, p. 8) writes, Big Data provides 'destablising amounts of knowledge and information that lack the regulating force of philosophy'. Instead of philosophy - which Kant saw as the rational basis for all institutions - 'compu[...] "epoch" as a new historical constellation of intelligibility' (Berry 2011, p. 12).

We must ask difficult questions of Big Data's models of intelligibility before they crystallize into new orthodoxies. If we return to Ford, his innovation was using the assembly line to break down interconnected, holistic tasks into simple, atomized, mechanistic ones. He did this by designing specialized tools that strongly predetermined and limited the action of the worker. Similarly, the specialized tools of Big Data also have their own inbuilt limitations and restrictions. For example, Twitter and Facebook are examples of Big Data sources that offer very poor archiving and search functions. Consequently, researchers are much more likely to focus on something in the present or immediate past - tracking reactions to an election, TV finale, or natural disaster - because of the sheer difficulty or impossibility of accessing older data.

If we are observing the automation of particular kinds of research functions, then we must consider the inbuilt flaws of the machine tools. It is not enough to simply ask, as Anderson has suggested 'what can science learn from Google?', but to ask how the harvesters of Big Data might change the meaning of learning, and what new possibilities and new limitations may come with these systems of knowing.

2. Claims to objectivity and accuracy are misleading

'Numbers, numbers, numbers', writes Latour (2009). 'Sociology has been obsessed by the goal of becoming a quantitative science'. Sociology has never reached this goal, in Latour's view, because of where it draws the line between what is and is not quantifiable knowledge in the social domain.


Big Data offers the humanistic disciplines a new way to claim the status of quantitative science and objective method. It makes many more social spaces quantifiable. In reality, working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth - particularly when considering messages from social media sites. But there remains a mistaken belief that qualitative researchers are in the business of interpreting stories and quantitative researchers are in the business of producing facts. In this way, Big Data risks re-inscribing established divisions in the long running debates about scientific method and the legitimacy of social science and humanistic inquiry.

The notion of objectivity has been a central question for the philosophy of science and early debates about the scientific method (Durkheim 1895). Claims to objectivity suggest an adherence to the sphere of objects, to things as they exist in and for themselves. Subjectivity, on the other hand, is viewed with suspicion, colored as it is with various forms of individual and social conditioning. The scientific method attempts to remove itself from the subjective domain through the application of a dispassionate process whereby hypotheses are proposed and tested, eventually resulting in improvements in knowledge. Nonetheless, claims to objectivity are necessarily made by subjects and are based on subjective observations and choices.

All researchers are interpreters of data. As Gitelman (2011) observes, data need to be imagined as data in the first instance, and this process of the imagination of data entails an interpretative base: 'every discipline and disciplinary institution has its own norms and standards for the imagination of data'. As computational scientists have started engaging in acts of social science, there is a tendency to claim their work as the business of facts and not interpretation. A model may be mathematically sound, an experiment may seem valid, but as soon as a researcher seeks to understand what it means, the process of interpretation has begun.
This is not to say that all interpretations are created equal, but rather that not all numbers are neutral.

The design decisions that determine what will be measured also stem from interpretation. For example, in the case of social media data, there is a 'data cleaning' process: making decisions about what attributes and variables will be counted, and which will be ignored. This process is inherently subjective. As Bollier explains,

As a large mass of raw information, Big Data is not self-explanatory. And yet the specific methodologies for interpreting the data are open to all sorts of philosophical debate. Can the data represent an 'objective truth' or is any interpretation necessarily biased by some subjective filter or the way that data is 'cleaned?'. (2010, p. 13)


In addition to this question, there is the issue of data errors. Large data sets from Internet sources are often unreliable, prone to outages and losses, and these errors and gaps are magnified when multiple data sets are used together. Social scientists have a long history of asking critical questions about the collection of data and trying to account for any biases in their data (Cain & Finch 1981; Clifford & Marcus 1986). This requires understanding the properties and limits of a data set, regardless of its size. A data set may have many millions of pieces of data, but this does not mean it is random or representative. To make statistical claims about a data set, we need to know where data is coming from; it is similarly important to know and account for the weaknesses in that data. Furthermore, researchers must be able to account for the biases in their interpretation of the data. To do so requires recognizing that one's identity and perspective informs one's analysis (Behar & Gordon 1996).

Too often, Big Data enables the practice of apophenia: seeing patterns where none actually exist, simply because enormous quantities of data can offer connections that radiate in all directions. In one notable example, Leinweber (2007) demonstrated that data mining techniques could show a strong but spurious correlation between the changes in the S&P 500 stock index and butter production in Bangladesh.

Interpretation is at the center of data analysis. Regardless of the size of a data set, it is subject to limitation and bias. Without those biases and limitations being understood and outlined, misinterpretation is the result. Data analysis is most effective when researchers take account of the complex methodological processes that underlie the analysis of that data.
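The apophenia risk described above can be made concrete with a small simulation. The sketch below is illustrative only and is not drawn from Leinweber's analysis: it assumes independent Gaussian noise series, and the function names (`pearson`, `strongest_spurious_correlation`) and all parameters are hypothetical. It correlates one random series against many unrelated ones and reports the strongest correlation found, which tends to grow as more candidates are searched even though no real relationship exists.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_series, length, seed=0):
    """Correlate one random target series against n_series unrelated
    random series and return the largest absolute correlation found."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(length)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(length)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Every series is pure noise, so any correlation found is spurious.
few = strongest_spurious_correlation(n_series=10, length=20)
many = strongest_spurious_correlation(n_series=5000, length=20)
print(f"strongest |r| among 10 unrelated series:   {few:.2f}")
print(f"strongest |r| among 5000 unrelated series: {many:.2f}")
```

Because both runs share a seed, the larger search includes the smaller one, so its strongest correlation can only be equal or higher. The point is about search breadth, not about any particular data set.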

3. Bigger data are not always better data

Social scientists have long argued that what makes their work rigorous is rooted in their systematic approach to data collection and analysis (McCloskey 1985). Ethnographers focus on reflexively accounting for bias in their interpretations. Experimentalists control and standardize the design of their experiment. Survey researchers drill down on sampling mechanisms and question bias. Quantitative researchers weigh up statistical significance. These are but a few of the ways in which social scientists try to assess the validity of each other's work. Just because Big Data presents us with large quantities of data does not mean that methodological issues are no longer relevant. Understanding sample, for example, is more important now than ever.

Twitter provides an example in the context of a statistical analysis. Because it is easy to obtain - or scrape - Twitter data, scholars have used Twitter to examine a wide variety of patterns (e.g. mood rhythms (Golder & Macy 2011), media event engagement (Shamma et al. 2010), political uprisings (Lotan et al. 2011), and conversational interactions (Wu et al. 2011)). While many scholars are conscientious about discussing the limitations of Twitter data in their publications, the public discourse around such research tends to focus on the raw number of tweets available. Even news coverage of scholarship tends to focus on how many millions of 'people' were studied (Wang 2011).

Twitter does not represent 'all people', and it is an error to assume 'people' and 'Twitter users' are synonymous: they are a very particular sub-set. Neither is the population using Twitter representative of the global population. Nor can we assume that accounts and users are equivalent. Some users have multiple accounts, while some accounts are used by multiple people. Some people never establish an account, and simply access Twitter via the web. Some accounts are 'bots' that produce automated content without directly involving a person. Furthermore, the notion of an 'active' account is problematic. While some users post content frequently through Twitter, others participate as 'listeners' (Crawford 2009, p. 532). Twitter Inc. has revealed that 40 percent of active users sign in just to listen (Twitter 2011). The very meanings of 'user' and 'participation' and 'active' need to be critically examined.

Big Data and whole data are also not the same. Without taking into account the sample of a data set, the size of the data set is meaningless. For example, a researcher may seek to understand the topical frequency of tweets, yet if Twitter removes all tweets that contain problematic words or content - such as references to pornography or spam - from the stream, the topical frequency would be inaccurate. Regardless of the number of tweets, it is not a representative sample as the data is skewed from the beginning.

It is also hard to understand the sample when the source is uncertain. Twitter Inc. makes a fraction of its material available to the public through its APIs. [2] The 'firehose' theoretically contains all public tweets ever posted and explicitly excludes any tweet that a user chose to make private or 'protected'. Yet, some publicly accessible tweets are also missing from the firehose.
Although a handful of companies have access to the firehose, very few researchers have this level of access. Most either have access to a 'gardenhose' (roughly 10 percent of public tweets), a 'spritzer' (roughly one percent of public tweets), or have used 'white-listed' accounts where they could use the APIs to get access to different subsets of content from the public stream. [3] It is not clear what tweets are included in these different data streams or what sample they represent. It could be that the API pulls a random sample of tweets or that it pulls the first few thousand tweets per hour or that it only pulls tweets from a particular segment of the network graph. Without knowing, it is difficult for researchers to make claims about the quality of the data that they are analyzing. Are the data representative of all tweets? No, because they exclude tweets from protected accounts. [4] But are the data representative of all public tweets? Perhaps, but not necessarily.

Twitter has become a popular source for mining Big Data, but working with Twitter data has serious methodological challenges that are rarely addressed by those who embrace it. When researchers approach a data set, they need to understand - and publicly account for - not only the limits of the data set, but also the limits of which questions they can ask of a data set and what interpretations are appropriate.

This is especially true when researchers combine multiple large data sets. This does not mean that combining data does not offer valuable insights - studies like those by Acquisti and Gross (2009) are powerful, as they reveal how public databases can be combined to produce serious privacy violations, such as revealing an individual's Social Security number. Yet, as Jesper Anderson, co-founder of open financial data store FreeRisk, explains: combining data from multiple sources creates unique challenges. 'Every one of those sources is error-prone ... I think we are just magnifying that problem [when we combine multiple data sets]' (Bollier 2010, p. 13).

Finally, during this computational turn, it is increasingly important to recognize the value of 'small data'. Research insights can be found at any level, including at very modest scales. In some cases, focusing just on a single individual can be extraordinarily valuable. Take, for example, the work of Veinot (2007), who followed one worker - a vault inspector at a hydroelectric utility company - in order to understand the information practices of a blue-collar worker. In doing this unusual study, Veinot reframed the definition of 'information practices' away from the usual focus on early-adopter, white-collar workers, to spaces outside of the offices and urban context. Her work tells a story that could not be discovered by farming millions of Facebook or Twitter accounts, and contributes to the research field in a significant way, despite the smallest possible participant count. The size of data should fit the research question being asked; in some cases, small is best.
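The earlier concern about not knowing how a stream is sampled can be illustrated with a toy simulation. The sketch below is a set of assumptions, not a description of how any real API samples: the synthetic stream, the topic label "finale", the volumes, and the helper names (`make_stream`, `uniform_sample`, `first_n_per_hour`) are all hypothetical. It compares a uniform random sample with a "first N messages per hour" sample of the same stream and shows that the two can give very different estimates of how common a topic is.

```python
import random


def make_stream(seed=0):
    """Synthetic stream of (hour, topic) messages over 24 hours.
    Hour 20 is a high-volume hour dominated by the topic 'finale'."""
    rng = random.Random(seed)
    stream = []
    for hour in range(24):
        volume = 20000 if hour == 20 else 1000
        p_finale = 0.8 if hour == 20 else 0.05
        for _ in range(volume):
            topic = "finale" if rng.random() < p_finale else "other"
            stream.append((hour, topic))
    return stream


def share(messages, topic="finale"):
    """Fraction of messages that carry the given topic."""
    return sum(1 for _, t in messages if t == topic) / len(messages)


def uniform_sample(stream, rate, seed=1):
    """Keep each message independently with probability `rate`."""
    rng = random.Random(seed)
    return [m for m in stream if rng.random() < rate]


def first_n_per_hour(stream, n):
    """Keep only the first n messages seen in each hour."""
    kept, seen = [], {}
    for hour, topic in stream:
        seen[hour] = seen.get(hour, 0) + 1
        if seen[hour] <= n:
            kept.append((hour, topic))
    return kept


stream = make_stream()
true_share = share(stream)
uniform_share = share(uniform_sample(stream, rate=0.10))
capped_share = share(first_n_per_hour(stream, n=500))

print(f"full stream:        {true_share:.3f}")
print(f"uniform 10% sample: {uniform_share:.3f}")
print(f"first 500 per hour: {capped_share:.3f}")
```

Under these assumptions the per-hour cap underweights the busiest hour, so the topic concentrated there looks far rarer than it is. A researcher who cannot tell which scheme produced the data cannot tell which estimate to trust.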

4. Taken out of context, Big Data loses its meaning

Because large data sets can be modeled, data are often reduced to what can fit into a mathematical model. Yet, taken out of context, data lose meaning and value. The rise of social network sites prompted an industry-driven obsession with the 'social graph'. Thousands of researchers have flocked to Twitter and Facebook and other social media to analyze connections between messages and accounts, making claims about social networks. Yet, the relations displayed through social media are not necessarily equivalent to the sociograms and kinship networks that sociologists and anthropologists have been investigating since the 1930s (Radcliffe-Brown 1940; Freeman 2006). The ability to represent relationships between people as a graph does not mean that they convey equivalent information.

Historically, sociologists and anthropologists collected data about people's relationships through surveys, interviews, observations, and experiments. Using this data, they focused on describing people's 'personal networks' - the set of relationships that individuals develop and maintain (Fischer 1982). These connections were evaluated based on a series of measures developed over time to identify personal connections. Big Data introduces two new popular types of social networks derived from data traces: 'articulated networks' and 'behavioral networks'.

Articulated networks are those that result from people specifying their contacts through technical mechanisms like email or cell phone address books, instant messaging buddy lists, 'Friends' lists on social network sites, and 'Follower' lists on other social media genres. The motivations that people have for adding someone to each of these lists vary widely, but the result is that these lists can include friends, colleagues, acquaintances, celebrities, friends-of-friends, public figures, and interesting strangers.

Behavioral networks are derived from communication patterns, cell coordinates, and social media interactions (Onnela et al. 2007; Meiss et al. 2008). These might include people who text message one another, those who are tagged in photos together on Facebook, people who email one another, and people who are physically in the same space, at least according to their cell phone.

Both behavioral and articulated networks have great value to researchers, but they are not equivalent to personal networks. For example, although contested, the concept of 'tie strength' is understood to indicate the importance of individual relationships (Granovetter 1973). When mobile phone data suggest that workers spend more time with colleagues than their spouse, this does not necessarily imply that colleagues are more important than spouses.
Measuring tie strength through frequency or public articulation is a common mistake: tie strength - and many of the theories built around it - is a subtle reckoning in how people understand and value their relationships with other people. Not every connection is equivalent to every other connection, and neither does frequency of contact indicate strength of relationship. Further, the absence of a connection does not necessarily indicate that a relationship should be made.

Data are not generic. There is value to analyzing data abstractions, yet retaining context remains critical, particularly for certain lines of inquiry. Context is hard to interpret at scale and even harder to maintain when data are reduced to fit into a model. Managing context in light of Big Data will be an ongoing challenge.
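The gap between a behavioral network and a personal network can be shown with a deliberately tiny example. Everything in the sketch below is hypothetical toy data invented for illustration: the contact labels, the weekly contact minutes (a stand-in for a behavioral trace such as phone logs), and the self-reported closeness scores (a stand-in for what a survey or interview might record). It ranks the same contacts by each measure and shows that the orderings need not agree.

```python
# Hypothetical behavioral trace: minutes of contact per week, as a phone
# log or co-location record might report it.
contact_minutes = {
    "colleague_a": 2400,
    "colleague_b": 1900,
    "spouse": 900,
    "close_friend": 120,
}

# Hypothetical self-report: how close the person says each tie is
# (1 = distant, 5 = very close), as an interview might record it.
self_reported_closeness = {
    "spouse": 5,
    "close_friend": 4,
    "colleague_a": 2,
    "colleague_b": 1,
}


def rank(scores):
    """Return contact names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


by_frequency = rank(contact_minutes)
by_closeness = rank(self_reported_closeness)

print("ranked by contact frequency: ", by_frequency)
print("ranked by reported closeness:", by_closeness)
print("same ordering:", by_frequency == by_closeness)
```

The frequency ranking puts colleagues first, while the self-report puts the spouse first. Treating the first ranking as a measure of tie strength would invert the person's own account of their relationships.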

5. Just because it is accessible does not make it ethical

In 2006, a Harvard-based research group started gathering the profiles of 1,700 college-based Facebook users to study how their interests and friendships changed over time (Lewis et al. 2008). These supposedly anonymous data were released to the world, allowing other researchers to explore and analyze them. What other researchers quickly discovered was that it was possible to de-anonymize parts of the data set: compromising the privacy of students, none of whom were aware their data were being collected (Zimmer 2008).

The case made headlines and raised difficult issues for scholars: what is the status of so-called 'public' data on social media sites? Can it simply be used, without requesting permission? What constitutes best ethical practice for researchers? Privacy campaigners already see this as a key battleground where better privacy protections are needed. The difficulty is that privacy breaches are hard to make specific - is there damage done at the time? What about 20 years hence? 'Any data on human subjects inevitably raise privacy issues, and the real risks of abuse of such data are difficult to quantify' (Nature, cited in Berry 2011).

Institutional Review Boards (IRBs) - and other research ethics committees - emerged in the 1970s to oversee research on human subjects. While unquestionably problematic in implementation (Schrag 2010), the goal of IRBs is to provide a framework for evaluating the ethics of a particular line of research inquiry and to make certain that checks and balances are put into place to protect subjects. Practices like 'informed consent' and protecting the privacy [...]