Statistics Consulting

 

Rachel M. MacNair, Ph.D. 

I am currently winding down this service. I will finish the clients I have, but for the most part I’m not accepting new clients.

When I started, I was one of the few people doing this work; now it’s very common, and people can generally find someone closer. It’s been a real pleasure to serve. 

======================================
The PBS program NOVA has an episode entitled:

Prediction by the Numbers

This 53-minute program covers some of the basics – p-value and sampling – along with the history of how the ideas came to be.

=======================================

Let me explain . . .

Sample Size

Why does sample size make a difference?  Suppose you toss a coin ten times, and it comes up 7 heads, 3 tails. Then suppose you toss it 100 times, and it comes up 52 heads, 48 tails. Then you toss it 1,000 times, and it comes up 502 heads, 498 tails. The first time, it was 70% heads, the second 52% heads, and the third time the expected 50% when rounded. Yet in all cases, the number of heads was only two off from an even split!  Being only two off shows up much more at the smaller number. If you had 70% heads after throwing it 100 or 1,000 times, you’d figure the coin must be loaded. You have enough cases to say that it’s not coming out that way due to mere chance with an unloaded coin. But you can’t say that with just ten throws; after all, it’s only two off.
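To put rough numbers on the coin example, here is a minimal sketch in Python that computes the exact chance of a fair coin giving at least 70% heads at each of the three sample sizes. The function name prob_at_least is just an illustrative choice, and the calculation assumes a perfectly fair coin with independent tosses.

```python
from math import comb

def prob_at_least(heads: int, tosses: int) -> float:
    """Exact chance of at least `heads` heads in `tosses` tosses of a fair coin."""
    # Count every outcome with `heads` or more heads, out of 2**tosses equally likely outcomes.
    favorable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favorable / 2 ** tosses

for tosses in (10, 100, 1000):
    heads = tosses * 7 // 10  # 70% heads
    chance = prob_at_least(heads, tosses)
    print(f"{tosses:>5} tosses, at least {heads} heads: {chance:.3g}")
```

With ten tosses the chance is 176 out of 1,024, or about 17%, so 7 heads is nothing unusual. With 100 tosses the chance drops to well under one in ten thousand, and with 1,000 tosses it is vanishingly small.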

 

Standard Deviation

If you take the average of these numbers:

4, 6, 6, 4, 5, 5, 3, 7, 5, 5, 6, 4, 6, 4

you will see that the mean is 5. (I made it simple: each consecutive pair of numbers averages 5.)

But if you have instead these scores:

1, 9, 9, 1, 7, 3, 2, 8, 7, 3, 8, 2

you can see that you still have a mean of 5. But the numbers in the first set were pretty close to 5, whereas the numbers in the second set are all over the place. The mean is the same, but the standard deviation for that second group is going to be much higher than for the first group.

The standard deviation is the measure that tells you how much the numbers vary around the mean. There are times when a high or low standard deviation might itself mean something for what you’re trying to study.
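The two sets of scores above can be checked with a few lines of Python. This is only a sketch; the variable names tight and spread are mine, and it uses the population formula (pstdev), which treats the listed scores as the whole group rather than a sample from a larger one.

```python
from statistics import mean, pstdev

tight = [4, 6, 6, 4, 5, 5, 3, 7, 5, 5, 6, 4, 6, 4]   # first set, close to 5
spread = [1, 9, 9, 1, 7, 3, 2, 8, 7, 3, 8, 2]        # second set, all over the place

for name, scores in (("first set", tight), ("second set", spread)):
    # pstdev divides by the number of scores (population standard deviation).
    print(f"{name}: mean = {mean(scores)}, standard deviation = {pstdev(scores):.2f}")
```

Both means are 5, but the standard deviation is about 1.07 for the first set and about 3.11 for the second. If the scores were a sample from a larger group, statistics.stdev (which divides by one less than the number of scores) would be the usual choice and would give slightly larger values.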

 

Correlations, Causation, and Careful Measurement

 

One of the basic points of statistical reasoning is that if you find two things are correlated with each other, you still don’t know what’s causing what — correlation is not causation. People get into trouble all the time by mixing themselves up on this point.

Remember first the difference between a positive and a negative correlation. When it’s positive, the numbers tend to go up and down together — say, the outdoor temperature and the consumption of iced drinks. When the correlation is negative, then when one goes up, the other tends to go down, and vice-versa — the outdoor temperature and the consumption of hot cocoa.
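As a small illustration of the sign of a correlation, the sketch below computes the Pearson correlation coefficient by hand. The daily figures are invented purely for this example and do not come from any real study; pearson_r is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(spread_x * spread_y)

# Invented figures for six days, for illustration only.
temperature_f = [35, 48, 60, 72, 85, 95]
iced_drinks_sold = [12, 20, 31, 45, 70, 88]
hot_cocoa_sold = [64, 50, 33, 20, 9, 4]

print("temperature vs. iced drinks:", round(pearson_r(temperature_f, iced_drinks_sold), 2))
print("temperature vs. hot cocoa:  ", round(pearson_r(temperature_f, hot_cocoa_sold), 2))
```

The first coefficient comes out positive and the second negative. The coefficient always falls between -1 and +1, with values near zero meaning little or no straight-line relationship.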

Now, let’s say we know that there’s a positive correlation between poverty and crime. The first thing we have to figure out is how we measure poverty and crime.

Poverty would generally be measured by income level. We’d actually be measuring income, and seeing that lower income is associated with higher crime.

This has problems. You could have someone who inherited a big house with plenty of garden space, close enough to places to walk with no car needed. By contrast, someone with the same income who must rent an apartment, scrounge up food and suffer for lack of a car is considerably poorer.

Nevertheless, it would be very difficult to take account of this when making the measurement. Most measurements have these kinds of problems. When measuring large numbers of people, this is probably the best you can do. In social science, you’ll never be perfect.

So what about measuring crime? Do you take arrests, convictions, or police reports from an area? Each one will have a different impact on your final measure. If poor people get more false arrests than affluent people, then measuring arrests will give you a higher correlation than the actual level of crime would. Yet how would you know?

Let’s say we’ll measure crime by convictions. Wanting to avoid mere traffic violations, we’ll make it convictions for murder, assault, shoplifting, burglary, and armed robbery.

Now, we find a positive correlation between this measure of poverty and this measure of crime. Did the poverty cause the crime? Did A cause B? There’s some sense to that, in that people who are more financially stretched might be more likely to take the risks that go with, say, shoplifting. People who can afford the price are more likely to pay it rather than risk arrest, and people so rich that the price is pocket change are even more likely to go ahead and pay for it. This isn’t always the case. Famous movie stars have been caught shoplifting. But of course this is only a correlation we’re talking about, and we wouldn’t expect it to be 1 to 1. Why it might tend to be true is not puzzling.

Yet the causation may also be the other way around. Maybe crime causes poverty: B causes A. Areas that are overrun with crime are less likely to attract stores and businesses that would employ people. Also, stealing a pair of shoes from someone who only has one pair will mean more poverty than stealing a pair from someone who has ten pairs. Thus, crime is causing poverty.

Another possibility is that something else is causing both. This could then make them end up being correlated. In this case, a lack of education, say, may cause both poverty and crime.
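The "something else is causing both" case can be sketched with a small simulation. Everything here is made up for illustration: the variable names, the sample size, and the assumption that each outcome equals minus the education score plus random noise. Neither simulated outcome affects the other, yet they still end up correlated.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )

random.seed(0)  # fixed seed so the run is repeatable
n = 5000
education = [random.gauss(0, 1) for _ in range(n)]

# Assumption: each outcome depends only on education plus its own noise.
# Poverty never enters the crime formula, and crime never enters the poverty formula.
poverty = [-e + random.gauss(0, 1) for e in education]
crime = [-e + random.gauss(0, 1) for e in education]
r_raw = pearson_r(poverty, crime)

# Take out the part of each outcome that education explains, then correlate what is left.
poverty_left = [p + e for p, e in zip(poverty, education)]
crime_left = [c + e for c, e in zip(crime, education)]
r_adjusted = pearson_r(poverty_left, crime_left)

print(f"raw correlation: {r_raw:.2f}, after removing education: {r_adjusted:.2f}")
```

The raw correlation lands near 0.5, and it falls to roughly zero once the shared cause is removed. The adjustment here uses the known simulated effect of education; with real data that effect would have to be estimated, for instance by regression.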

Which one is it? Actually, there’s no reason to pick just one. They could all be true at the same time. Poverty and crime cause each other in a feedback loop, and other factors cause them both as well. Reality is complicated. We have complicated statistics to take all these into account at once, which is why we often go well beyond simple correlations in studying complicated realities.

But there’s an important point here concerning the measure of crime: a bias. While murder and assault cut across class lines, the remaining three (shoplifting, burglary, and armed robbery) are crimes that are simply more likely to be committed by poor people.

Suppose we replace shoplifting with tax evasion, another way of getting something without paying for it. We replace burglary with embezzlement. Then we replace armed robbery with selling vicious and illegal weapons to brutal dictators, or being an owner of factories who ignores safety regulations — both things that do far more damage and injury than individual instances of armed robbery.

 

Suddenly, we would find that the correlation is reversed. There would be a negative correlation between poverty and crime, or a positive correlation between affluence and crime. That’s because we selected the kinds of crime that one has to be affluent to even be able to do.
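A toy calculation shows how the choice of measure can flip the sign. The six areas and all of the counts below are invented for this sketch and stand in for no real place; the only point is that the same poverty figures correlate in opposite directions with two differently built crime measures.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_variation / sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )

# Invented convictions per area, for illustration only:
# (poverty rate %, shoplifting, burglary, armed robbery, tax evasion, embezzlement)
areas = [
    (5, 2, 1, 0, 9, 7),
    (10, 4, 2, 1, 8, 5),
    (15, 5, 4, 1, 6, 4),
    (20, 8, 5, 2, 4, 3),
    (30, 11, 7, 3, 2, 1),
    (40, 14, 9, 4, 1, 0),
]

poverty_rate = [row[0] for row in areas]
measure_one = [row[1] + row[2] + row[3] for row in areas]  # shoplifting, burglary, armed robbery
measure_two = [row[4] + row[5] for row in areas]           # tax evasion, embezzlement

print("poverty vs. first crime measure: ", round(pearson_r(poverty_rate, measure_one), 2))
print("poverty vs. second crime measure:", round(pearson_r(poverty_rate, measure_two), 2))
```

The first correlation is positive and the second negative, even though both columns could be labeled "crime" in a report. The numbers were chosen to make that point, so the size of each coefficient means nothing.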

 

This is important, because while many people think that proposing a link between poverty and crime will give us another reason to get rid of poverty, other people will use the information to justify prejudice against poor people. Prejudice is bad enough by itself, but all the worse when science is used to back it up, and the science is poorly done.

This will also be true, of course, in any kind of study. Whatever the conclusions, it is important to pay attention to how you measured what you said you were measuring. You must watch your reasoning about the direction of causation. Think of alternative ways of explaining your results. Look at how different interpretations could apply.

It’s also important to list the limitations of your study well. This is not a sign of weakness. To the contrary, every study has limitations. The best scholar is the one who can articulate what they are.

Quotations

“There are only two kinds of data. The first kind is Terrible Data: data that are ambiguous, potentially misleading, incomplete, and imprecise. The second kind is No Data. Unfortunately, there is no third kind, anywhere in the world.”

Funder, D. C. (1997). The personality puzzle. New York: W.W. Norton, pp. 32-33

^^^^^

“Statistics is a fascinating study, and the statistician, after mastering a new method for refining data, very naturally is eager to see it used. In view of the crudity which of necessity characterizes most of the instruments used to collect data, statistical refinement to the fourth decimal place may be like putting a razor edge on a hoe, or calculating to the exact second the ending of the Mesozoic era of geological time. It is also possible for psychological sense to be lost in a welter of statistical manipulation. There is no area of psychology where common sense is more needful.”

Clark, W. H. (1958). The Psychology of Religion. New York: MacMillan, p. 44

^^^^^

“All the statistics in the world won’t help you if you asked the wrong question in the first place.”

John Tukey

^^^^^

“If you torture data long enough, it will confess.”

Nobel Prize-winning economist Ronald Coase
In Gordon Tullock, “A Comment on Daniel Klein’s ‘A Plea to Economists Who Favor Liberty'”, Eastern Economic Journal, Spring 2001

=======================================

To contact Rachel M. MacNair, Ph.D.:

Telephone: (816) 753 – 2057               

E-mail:  rachel_macnair @ yahoo.com