Data Processing Flow - Measuring Broadband America Data April 2012

Replicating the Test Results for the July 2012 Report

The process flow below describes how the raw collected data was processed to produce the Measuring Broadband America 2012 Report. Researchers and developers interested in replicating or extending the Report's results are encouraged to review and comment on this process.

Raw Data:	Raw data for the chosen period is collected from the measurement database. The ISP and product each panelist was subscribed to are exported to a ‘unit profile’ file, and panelists whose ISP or product changed during the period are flagged.
Clean Data:	The data is cleaned. This includes removing measurements from users who changed ISP or product during the period. Anomalies and significant outliers are also removed at this point. See the data cleansing document validated-data-cleansing-april-2012.docx for more information.
Per-Unit Results (CSV):	Per-unit results are generated for each metric. At this point the 7-11pm averages are computed and the trimmed mean is calculated for each metric. The SQL scripts used here are contained in sql-scripts-processing--apr-2012.tar.gz.
SPSS Processing:	The per-unit CSV data is combined with the unit profile data and processed by SPSS scripts (available at https://s3.amazonaws.com/fcc-april-data/SPSS-scripts-20120718.zip). This step removes ISPs/products with low sample sizes and computes statistical averages for the remainder, for use in the report.
Excel Tables & Charts:	Summary data tables and charts in Excel are produced from the statistical averages. These are used directly in the report.
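
For readers who want a feel for the steps above before opening the SQL and SPSS scripts, the following Python sketch walks through the same sequence on in-memory data. It is an illustration under stated assumptions, not the scripts used for the Report: the function names, the record layout (unit id, local timestamp, value), the trim fraction and the minimum number of units per ISP are all hypothetical. The description above also does not say whether the trimmed mean is taken within each unit or across units; this sketch trims within each unit.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Peak window of 7pm to 11pm local time, as described above.
PEAK_START_HOUR = 19
PEAK_END_HOUR = 23  # exclusive


def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from each tail."""
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def per_unit_peak_results(measurements, changed_units, trim_fraction=0.05):
    """Per-unit peak-hour trimmed mean for one metric.

    measurements: iterable of (unit_id, local_datetime, value) tuples.
    changed_units: ids flagged in the unit profile as having changed
    ISP/product during the period; their measurements are skipped.
    trim_fraction: assumed value, not taken from the source.
    """
    by_unit = defaultdict(list)
    for unit_id, timestamp, value in measurements:
        if unit_id in changed_units:
            continue
        if PEAK_START_HOUR <= timestamp.hour < PEAK_END_HOUR:
            by_unit[unit_id].append(value)
    return {unit: trimmed_mean(vals, trim_fraction)
            for unit, vals in by_unit.items()}


def isp_averages(per_unit, unit_profile, min_units=2):
    """Average per-unit results by ISP, dropping ISPs with few units.

    unit_profile: mapping of unit_id -> ISP name.
    min_units: assumed threshold for a 'low sample size'.
    """
    by_isp = defaultdict(list)
    for unit_id, value in per_unit.items():
        by_isp[unit_profile[unit_id]].append(value)
    return {isp: mean(vals)
            for isp, vals in by_isp.items()
            if len(vals) >= min_units}
```

In this sketch the output of per_unit_peak_results plays the role of the per-unit CSV, and isp_averages stands in for the SPSS step. The anomaly and outlier removal described in the data cleansing document is not reproduced here.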

Download

data-processing-flow-apr-2012.docx
alternate mirror

Bureau/Office:

Engineering & Technology