Well, cable232 was just plain wrong, and I think you're being a bit too generous to him. He said one can't use math to predict the behavior of a single individual. But he failed to realize precisely what you mentioned: that one can have multiple data points (observations) on a single individual, thus producing a statistically reliable sample (enough degrees of freedom). So I mostly agree with your comments (but see below), and not with his. Quite a few individual criminals have been identified by their pattern of behavior.
However, when homing in on a perpetrator, classical statistics such as regression modeling are not always necessary, so large data samples are not as crucial. In these cases, we are looking for patterns of probabilities (e.g., Bayesian statistics), distances, etc., and not necessarily for a single predictive equation such as a regression model.
But when using something like least-squares regression, the key to successful predictive analytics is to have a high enough ratio of observations to variables: 10:1 at a bare minimum, 20:1 is better, and 50:1 or more is better still.
Note that if the ratio of data points to variables is only 1:1, then essentially any data values, no matter what they are (barring variables that are exact linear combinations of one another), will give a "perfect" regression-model prediction, but it will be wrong and totally useless. For example, if I have 10 people measured on 10 variables, then a regression model will give a perfect fit (R-squared = 1.00), because there are as many coefficients to adjust as there are observations to fit. But it will be totally bogus. I could use variables such as street address, height, social security number, etc., to "predict" or "estimate" IQ. And if I have 10 such randomly selected variables on any 10 people, the regression will return a "perfect" but meaningless result.
But in that case, the problem is *not* a lack of data points per se, but rather too low a *ratio* of data points to variables.
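To make the 1:1 case concrete, here is a rough sketch in Python with NumPy (my choice of tools, nothing from the thread). The function name `r_squared_of_noise`, the seed, and the sample sizes are all made up for illustration, and both the "predictors" and the "IQ" column are pure random noise by construction, so any fit is meaningless.

```python
import numpy as np


def r_squared_of_noise(n_obs, n_vars, seed=0):
    """Fit least squares of random noise on random noise and return R-squared.

    Hypothetical helper for illustration: predictors and outcome are
    independent random draws, so there is no real relationship to find.
    """
    rng = np.random.default_rng(seed)
    predictors = rng.normal(size=(n_obs, n_vars))  # stand-ins for address, height, etc.
    outcome = rng.normal(size=n_obs)               # stand-in for IQ

    # Design matrix with an intercept column.
    design = np.column_stack([np.ones(n_obs), predictors])

    # Ordinary least squares; lstsq also handles the saturated case.
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)

    residuals = outcome - design @ coefs
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((outcome - outcome.mean()) ** 2))
    return 1.0 - ss_res / ss_tot


# 10 people, 10 variables: ratio 1:1, the fit is "perfect" but bogus.
r2_saturated = r_squared_of_noise(n_obs=10, n_vars=10)

# 500 people, same 10 variables: ratio 50:1, the fit collapses toward zero.
r2_healthy = r_squared_of_noise(n_obs=500, n_vars=10)

print(f"ratio 1:1  -> R-squared = {r2_saturated:.4f}")
print(f"ratio 50:1 -> R-squared = {r2_healthy:.4f}")
```

Same number of variables in both runs; only the ratio changes. That is the whole point: the noise variables look like perfect predictors only when there are too few observations per variable.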