On Friday, I introduced the computation part of our data analysis project. I was very excited about this and created an example using Google spreadsheets. Even though I think another tool would be more powerful, I stuck with spreadsheets since most of the students are completely unfamiliar with anything else.

What we want the students to do is to take a question from the survey we conducted and break it down not just by how many people answered it a certain way, but also by a piece of demographic data. So, they might look at the question of whether people expect to have children and see whether more women or men expect to have children. To do that, you need to make a statement like “if ‘yes’ [to children question] and ‘male’”. And you have to do that for all combinations. I walked through my example in the class and eyes glazed over. Admittedly, I went fairly quickly, but these are mostly seniors, and I would hope they would have some experience with formulas in Excel or Google spreadsheets. But no. Nothing wrong with that, really, but it's something I want to correct going forward. I do know that one of our math teachers teaches some simple formulas during a single class period, but it’s out of context and, as far as I know, they never return to it.
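For readers who think in code, here is a minimal sketch of that “all combinations” count in Python. The survey rows are invented for illustration and the variable names are my own, not anything from our actual spreadsheet; it only shows the shape of the logic behind the “if ‘yes’ and ‘male’” statement.

```python
from collections import Counter

# Hypothetical survey rows: (answer to the children question, gender).
# These values are made up for illustration, not real survey data.
responses = [
    ("yes", "male"),
    ("yes", "female"),
    ("no", "male"),
    ("yes", "female"),
    ("no", "female"),
    ("yes", "male"),
    ("yes", "female"),
]

# Counter tallies every (answer, gender) combination in a single pass,
# which is the same as writing one "if answer AND gender" rule per pair.
counts = Counter(responses)

# Print the breakdown for all four combinations.
for answer in ("yes", "no"):
    for gender in ("female", "male"):
        print(f"{answer:>3} / {gender:<6}: {counts[(answer, gender)]}")
```

In a spreadsheet, each of those four numbers needs its own formula, which is where the students got lost.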

In order for our students to complete this project, they *have* to use formulas. Well, they could do it by hand, but that would be so time-consuming and crazy. So I’m thinking I need to run a workshop for the teachers on ways they can incorporate this skill, and I need to find out more about where it could be used.

I was talking about this with Mr. Geeky, and he pointed out that most people are not good at this kind of analysis. They don’t even think to ask questions that drill down into the data, questions like, “What is the income breakdown? Or gender breakdown? Or racial breakdown?” They don’t know the difference between mean and median and how important looking at both might be. I often use the classic example of a bar where the average (mean) income of the customers is $40k. Bill Gates walks in and now the average income is over $1 million. Now the average income has become meaningless as something that tells you anything about the customers in the bar. One thing that computing offers is ways to slice data quickly so that you can start to see questions to ask and you can start trying to answer them with the data. This makes me even more convinced that this assignment is an important one. I’m looking forward to its outcome.
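The bar example is easy to check with a few lines of Python. This is only a sketch: the ten customer incomes and the newcomer's income are numbers I made up so that the starting mean comes out to $40k.

```python
from statistics import mean, median

# Hypothetical incomes for ten bar customers, chosen so the mean is $40k.
incomes = [30_000, 35_000, 38_000, 40_000, 40_000,
           40_000, 42_000, 45_000, 45_000, 45_000]

print("Before:", mean(incomes), median(incomes))

# A very wealthy customer walks in. The figure is an assumption made
# purely for illustration, not anyone's actual income.
with_newcomer = incomes + [50_000_000]

# The mean jumps into the millions, while the median barely moves.
print("After: ", round(mean(with_newcomer)), median(with_newcomer))
```

With these made-up numbers the median stays at $40k after the newcomer arrives, which is why looking at both figures matters.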

My GRAD students in a data-driven social science often come in with no Excel skills. When I first started here, they knew Excel, but it seems like those skills are no longer used in high school or college, even though our private employers demand them.

I totally agree with Mr. Geeky that most people don’t understand basic statistical analysis like this. This is a totally important assignment — I’m glad you’re doing it!

I’ve been a computer scientist for 4 decades and a bioinformatician for 1.8 of them. My eyes would glaze over on being asked to use a spreadsheet to do that sort of data analysis; it is a terrible tool for the purpose. It would be better to help them learn to use tools that actually are suitable for data analysis (R, Python programming, Matlab, …). Almost anything is better than Excel.

I love that you are doing this with the students. It is the kind of analysis I think everyone should be able to do and that not many can. But, I do think it’s probably a stretch for the kids. You might need to step back to baby steps to get them started.

Regarding spreadsheets as a tool — yes, it’s terrible, but it’s what people have access to. And it’s flexible enough that it can be used to answer a number of similar questions and a number of data sets, say, from the government, and from the bank.

Ditto on the access. We do a fantastic job teaching our students SPSS/STATA, but not all employers use them or are willing to buy licenses. Everyone has Excel.

It is weird that your students don’t know mean/median/mode. I had the odd experience this year of covering (really, reviewing) that in my stats class at the same time my 3rd grader was getting it in Everyday Math.

Yes, I would love to teach data using a better tool, but this is a two-week project in a humanities class. It at least gives them a taste of the kind of data analysis they *can* do. And we’re talking high school here. You don’t throw R at a group of high school kids. My choice would be Python and I do do that in my Computer Science class. If CS were required, then I’d probably go that route for this class, too. In fact, I struggled to find the right formulas to do this simple analysis. One or two lines of code in Python is all that’s needed, but in Excel, it’s a multi-step, multi-formula process.

I think it’s the context that makes learning a concept important. When you learn something like statistics without a context, you often forget it, thus the need to review it when it comes up again. The Pythagorean theorem meant little to me until game programming for collision detection.

In another context, it’s like teaching resume writing before you’re even thinking about applying for a job. The rules and strategies don’t make sense until you need them. We do a generally terrible job in education of showing how things apply, of giving students projects that make them use what they know in meaningful ways. I include myself in that. I’m constantly trying, though.

I agree that R is not a good choice for high school—I didn’t care for it much even in grad courses.

“One or two lines of code in Python is all that’s needed, but in Excel, it’s a multi-step, multi-formula process.” That, in a nutshell, is why I don’t use Excel.

I’ve had to teach resume writing as part of a tech writing course that I taught for about 14 years. I agree that students (even college students) are pretty clueless about it until they are actually trying to get a job. I’ve been working pretty hard at trying to create courses that are focussed on students learning to *do* things, rather than on *preparing* them. (You can see this philosophy in the design of my applied circuits class, which I’ve done over 200 blog posts for: http://gasstationwithoutpumps.wordpress.com/circuits-course-table-of-contents/)