Big Data: Uses and Limitations

Nathaniel Schenker

Associate Director for Research and Methodology

National Center for Health Statistics

Centers for Disease Control and Prevention

Presentation for discussion at the meeting of the

NCHS Board of Scientific Counselors

September 19, 2013

CONTENTS

Definitions of Big Data (or lack thereof)

Advantages and disadvantages of Big Data

Skills needed with Big Data

Current and potential uses of Big Data (not including administrative data) in the Federal Statistical System

Robert Groves's COPAFS presentation

Some recent work at NCHS on blending data

Lessons learned from work at NCHS on blending data

Cukier and Mayer-Schoenberger (2013)

Some Questions for Discussion

Definitions of Big Data (or lack thereof)

Wikipedia: "Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."

Horrigan (2013): "I view Big Data as nonsampled data, characterized by the creation of databases from electronic sources whose primary purpose is something other than statistical inference."

Rodriguez (2012): "For years, statisticians have been working with large volumes of data in fields as diverse as astronomy, bioinformatics, and data mining. Big Data is different because it is generated on a massive scale by countless online interactions among people, transactions between people and systems, and sensor-enabled machinery."

Arbesman (2013, "Five myths about big data")

o Myth 1: "'Big data' has a clear definition."

Advantages and disadvantages of Big Data

+ Big
+ Timely
+ Predictive (sometimes)
+ Cheap (?)
- Unknown population representation
- Issues of data quality
- Typically not very multivariate (at the person level)
- Privacy and confidentiality issues
- Difficult to assess accuracy and uncertainty

Skills needed with Big Data

(Rodriguez 2012)

Management and processing of distributed data

New tools for data analysis and visualization

o E.g., unstructured text data

Current and potential uses of Big Data (not including administrative data) in the Federal Statistical System

Current

o Bureau of Labor Statistics (Horrigan 2013)

Web scraping to obtain prices for various goods and services

Use of retail scanner data in research on distributions of items within expenditure classes

Potential

o NCHS: EHRs; pilot tests in National Health Care Surveys (http://www.cdc.gov/nchs/dhcs.htm)

o Bureau of Labor Statistics (Horrigan 2013)

Replacement of traditional data collection from establishments by corporate data from parent company

o Bureau of Economic Analysis (Lohr 2013)

Use of data from Intuit on small businesses for national income accounting

o Census Bureau (Capps and Wright 2013)

Auxiliary data for stratification, improving survey estimates, compensating for nonresponse, small-area estimation, ...

Helping to check estimates

More timely, preliminary estimates (to be revised using survey data)

Robert Groves's COPAFS presentation

(COPAFS 2013)

Two extreme approaches one could take with respect to Big Data:

1. Replace existing measures with Big Data indicators

o Tools becoming available, but there are issues of:

Quality; e.g., coverage error

Inability to examine subgroups due to lack of multivariate nature

2. Assume that the present system will endure and win out over Big Data

o Perhaps optimal now, but what happens when Big Data become so prevalent and are used so widely in business that they cannot be ignored?

Groves's view: Traditional survey data (although challenged) are not going away, and Big Data are too powerful to ignore

o Only choice: pursue path of blending Big Data and survey data

Some recent work at NCHS on blending data

Combining information from complementary surveys (the National Health Interview Survey and the National Nursing Home Survey) to extend coverage (Schenker, Gentleman, Rose, Hing, and Shimizu 2002)

o Big Data analogue: Combining Big Data with data from a smaller survey to adjust for non-coverage in Big Data

Combining information from the Behavioral Risk Factor Surveillance System and the National Health Interview Survey via Bayesian modeling for small-area estimation (http://sae.cancer.gov)

o Big Data analogue: Combining Big Data with data from a smaller survey to adjust for nonresponse and non-coverage in Big Data via modeling

Bridging the transition from single-race reporting to multiple-race reporting in the census using information from the National Health Interview Survey

o Big Data analogue: Adjusting for a change in reporting systems for Big Data using information from a smaller survey with data collected under both reporting systems

Enhancing the scientific value of surveys by linking their data with administrative and other data

o Big Data analogue: Linking survey data with Big Data

Probably more feasible at area level than at person level

New project, joint with the Census Bureau: Using information from the American Community Survey to create predictors in small-area estimation for outcomes measured in NCHS surveys

o Big Data analogue: Using local summaries of Big Data as predictors in small-area estimation
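
As a rough illustration of the first item above (combining estimates from surveys that cover complementary populations), the sketch below forms a population-weighted average of two prevalence estimates and propagates their standard errors. It assumes the two populations do not overlap and the two estimates are independent. The function name and all numbers are hypothetical, and this is not presented as the method used in Schenker et al. (2002).

```python
from math import sqrt


def combine_complementary(p_a, se_a, n_a, p_b, se_b, n_b):
    """Combine prevalence estimates from two surveys of disjoint populations.

    p_a, p_b   : estimated prevalences (proportions) in populations A and B
    se_a, se_b : standard errors of those estimates
    n_a, n_b   : population sizes, treated here as known constants (assumption)

    Returns the combined prevalence and its standard error, assuming the
    two estimates are statistically independent.
    """
    total = n_a + n_b
    if total <= 0:
        raise ValueError("population sizes must sum to a positive number")
    w_a = n_a / total  # share of the combined population covered by survey A
    w_b = n_b / total  # share covered by survey B
    p = w_a * p_a + w_b * p_b
    se = sqrt((w_a * se_a) ** 2 + (w_b * se_b) ** 2)
    return p, se


# Hypothetical inputs: a household population (A) and a much smaller
# institutional population (B) with a far higher prevalence.
p, se = combine_complementary(0.10, 0.01, 900_000, 0.50, 0.05, 100_000)
print(f"combined prevalence = {p:.3f}, standard error = {se:.4f}")
```

With these made-up inputs, the smaller population raises the combined prevalence noticeably above the household-only figure, which is the sense in which blending extends coverage.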

Lessons learned from work at NCHS on blending data

Can yield gains, especially when data systems being blended have complementary strengths

Comparability is key

Methods can become "obsolete" quickly

Need to use care in dealing with different sample designs

Try to find good predictors

Sharing information among multiple organizations can require a lot of work (and cost?)

Important to safeguard privacy and confidentiality

Important to educate secondary users on methods used and limitations of results

Cukier and Mayer-Schoenberger (2013)

With Big Data, "three profound changes in how we approach data":

o "... collect and use a lot of data rather than settle for small amounts or samples ..."

o "... shed our preference for highly curated and pristine data and instead accept messiness: in an increasing number of situations, a bit of inaccuracy can be tolerated, because the benefits of using vastly more data of variable quality outweigh the costs of using smaller amounts of very exact data"

o "... in many instances, we will need to give up our quest to discover the cause of things, in return for accepting correlations. ... Big data helps answer what, not why, and often that's good enough."

[My view: Need to take the above statements with a big grain of salt. But Big Data could indeed be very useful in combination with survey data, e.g., as predictors for small-area estimation.]
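
To make the small-area estimation idea concrete, here is a minimal sketch of a composite (shrinkage) estimator of the kind used in area-level models. A direct survey estimate for an area is blended with a model-based prediction, which in this setting could be driven by local summaries of Big Data. The function name, the inputs, and the assumption that both variances are known are illustrative only and do not come from the presentation.

```python
def composite_estimate(direct, var_direct, synthetic, var_model):
    """Blend a direct survey estimate with a model-based (synthetic) prediction.

    direct     : direct survey estimate for the small area
    var_direct : sampling variance of the direct estimate (assumed known)
    synthetic  : prediction from a model using auxiliary predictors
    var_model  : variance of the area-level model error (assumed known)

    The weight gamma on the direct estimate approaches 1 when the survey
    estimate is precise and approaches 0 when it is noisy.
    """
    if var_direct < 0 or var_model < 0 or (var_direct + var_model) == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gamma = var_model / (var_model + var_direct)
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    return estimate, gamma


# Hypothetical area: a noisy direct estimate and a prediction from auxiliary data.
est, gamma = composite_estimate(direct=0.30, var_direct=0.004,
                                synthetic=0.20, var_model=0.001)
print(f"weight on direct estimate = {gamma:.2f}, composite estimate = {est:.3f}")
```

In practice the model variance and the synthetic prediction would themselves be estimated, so the precision of a real composite estimate depends on how good the auxiliary predictors are.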

Some Questions for Discussion

1. What Big Data sources could provide information useful to NCHS?

a. For enhancing the information we provide

b. For decreasing our costs

2. How could NCHS use the sources identified in Question 1 to improve its work?

3. How could we assess the quality of potential sources of Big Data?

4. In what situations would NCHS be willing to sacrifice accuracy and data quality to obtain much more data?

5. Should we form a Big Data working group?

a. Within NCHS? Interagency? Elsewhere?

References

Arbesman, S. (2013), "Five Myths about Big Data," Washington Post, August 16, 2013.

Capps, C., and Wright, T. (2013), "Toward a Vision: Official Statistics and Big Data," Amstat News, August 2013, 9-13.

COPAFS (2013), Minutes of the March 1, 2013 COPAFS quarterly meeting.

Cukier, K., and Mayer-Schoenberger, V. (2013), "The Rise of Big Data," Foreign Affairs, May/June 2013, 28-40.

Horrigan, M.W. (2013), "Big Data: A Perspective from the BLS," Amstat News, January 2013, 25-27.

Lohr, S. (2013), "More Data Can Mean Less Guessing About the Economy," New York Times, September 7, 2013.

Rodriguez, R.N. (2012), "Big Data and Better Data," Amstat News, June 2012, 3-4.

Schenker, N., Gentleman, J.F., Rose, D., Hing, E., and Shimizu, I.M. (2002), "Combining Estimates from Complementary Surveys: A Case Study Using Prevalence Estimates from National Health Surveys of Households and Nursing Homes," Public Health Reports, 117, 393-407.
