05-436 / 05-836 / 08-534 / 08-734 Usable Privacy and Security
Print your homework out and submit it in person at the start of class (3:00pm) on Thursday, February 26th. Homework will not be accepted after 3:00pm on that day.
- Part 1 (50 points): A handful of papers you have read so far in this class have used statistics to support their arguments. In particular, for this problem, you should consider the following five papers: Woodruff et al. (January 22nd); Shay et al. (January 29th); Harbach et al. (February 12th); Schechter et al. (February 12th); and Bravo-Lillo et al. (February 17th).
Skim back through these five papers and note which statistical tests they use. Then create and submit a table showing how these five studies used different statistical tests. Use the following format for your table, where each row represents an analysis you found in one of the papers:
- Column 1: Which paper? (e.g., Woodruff et al.)
- Column 2: What is the name of the statistical test they used? (e.g., ANOVA)
- Column 3: What is the question they're trying to investigate using statistics? (e.g., Is privacy concern correlated with age?)
- Column 4: What format is their data in? (e.g., survey responses rating agreement on a 7-point Likert scale. They binned these responses into two groups: "agree" and "not agree")
- Column 5: How did they report the results of the test in the paper? (e.g., [χ²=168.07, p<.001])
- Column 6: (If applicable) Do they make any interesting comments about why they chose this test? (e.g., The data are not normally distributed.) Also note whether they performed any kind of correction for multiple testing.
Group the rows of your table based on the type of analysis the statistics are being used to support. It may be easiest to split your table into multiple smaller tables. For example, if one paper uses statistical test Foo to compare Likert-scale survey questions and another paper uses statistical test Foo or statistical test Bar to compare survey questions where the structure of the data or comparison seems similar, group those together.
Note that even if a particular paper uses a particular test multiple times, you only need to report it once in your table. For example, if Shay et al. use the XYZ test 15 times and the ABC test 32 times, you only need to report once in your table that Shay et al. use the XYZ test (pick representative answers for each column of your table) and once that Shay et al. use the ABC test.
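Column 6 asks whether a paper corrected for multiple testing. As a reminder of what such a correction does, here is a minimal Python sketch of two common ones, Bonferroni and Holm-Bonferroni. The function names and the p-values in the usage example are hypothetical and do not come from any of the five papers. Papers may use other corrections (for example, Benjamini-Hochberg), so check what each paper actually says rather than assuming one of these.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans in the original order: True where H0 is rejected.
    """
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained.
            break
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values from four tests reported in one paper.
    ps = [0.010, 0.020, 0.030, 0.005]
    print(bonferroni(ps))       # [True, False, False, True]
    print(holm_bonferroni(ps))  # [True, True, True, True]
```

Both procedures control the family-wise error rate; Holm's version is never less powerful than plain Bonferroni, which is why it rejects more hypotheses in this toy example.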
- Part 2 (50 points): We have provided the following dataset about deaths on the Titanic. The first link is to the raw data. The second explains the dataset; its appendix explains the column values.
- The raw data
- Robert J. MacG. Dawson. The "Unusual Episode" Data Revisited. Journal of Statistics Education, v.3, n.3 (1995).
Using whatever tool you prefer, conduct two different (appropriate) statistical tests to analyze this data.
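When both variables are categorical (for example, sex and survival), one common choice is Pearson's chi-square test of independence. The minimal sketch below assumes you have already read the raw data into (row label, column label) pairs; the helper names `contingency_2x2` and `chi_square_2x2` are hypothetical, and the records in the usage example are toy values, not the Titanic data. It computes the uncorrected Pearson statistic. Some packages (for example, R's `chisq.test`) apply Yates' continuity correction to 2x2 tables by default, so their output may differ slightly. A common rule of thumb is that every expected cell count should be at least 5.

```python
import math


def contingency_2x2(pairs, row_levels, col_levels):
    """Count (row_value, col_value) pairs into a 2x2 table ordered by the given levels."""
    table = [[0, 0], [0, 0]]
    for row_value, col_value in pairs:
        table[row_levels.index(row_value)][col_levels.index(col_value)] += 1
    return table


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction.

    Assumes every row and column total is nonzero. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for r, row in enumerate(table):
        for k, observed in enumerate(row):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[r] * col_totals[k] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom; a chi-square(1) variable is the square
    # of a standard normal, so P(X >= stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical toy records (NOT the Titanic data): (sex, survived) pairs.
    records = ([("female", "yes")] * 10 + [("female", "no")] * 20
               + [("male", "yes")] * 30 + [("male", "no")] * 40)
    table = contingency_2x2(records, ["female", "male"], ["yes", "no"])
    print(table)  # [[10, 20], [30, 40]]
    stat, p = chi_square_2x2(table)
    print(f"chi2(1, N={sum(map(sum, table))}) = {stat:.2f}, p = {p:.3f}")
```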
You should turn in a single paragraph containing all of the following information:
- What were you analyzing?
- What statistical test did you use, and how did you choose it?
- What software did you use to analyze the data?
- Give the result of the test (including p value).
- Briefly interpret your result.
If you don't have prior experience with any statistical software, we highly recommend you go through Blase's R tutorial, which includes lots of sample code.
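If some cells of a 2x2 table are small (for instance, when you restrict the analysis to a small subgroup), Fisher's exact test is a frequently used alternative to the chi-square test. Below is a minimal pure-Python sketch of the two-sided version, which sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. `fisher_exact_2x2` is a hypothetical name, and the 1e-7 relative tolerance for ties is an assumption chosen to absorb floating-point rounding. R's `fisher.test` and SciPy's `scipy.stats.fisher_exact` implement Fisher's exact test as well, and using one of them is usually simpler.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]. Returns the p-value."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    # Range of values the top-left cell can take given the margins.
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-7))


if __name__ == "__main__":
    # Classic "lady tasting tea" table, used here only to illustrate the function.
    print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857
```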
- Part 3 (9-unit students should not do this part. 12-unit students will receive between 0 and 30 points for this part): Write a 3-7 sentence summary and short "highlight" for one optional reading assigned for each of the following classes (2 optional readings total): February 24th and February 26th.