I am constantly amused by the truther misuse of statistics, which Pat has commented on before. I took stats in grad school last year, and I don't remember it being done this way. In this example James Redford on Infowars.com looks at the 6 cases of people with Arab names similar to those of the 19 hijackers having connections to US military bases, and calculates the odds of these occurrences being mistaken identities at 1 in 21.7 billion. His misuse of statistics, however, is cringe-inducing.
First, his methodology. I am not going to repeat everything he did, so click on the link to read it if you want all the details. To summarize: he finds a list of the frequencies of Arab surnames and comes up with a rough estimate of the number of Arabs with the 4 surnames in question. He arrives at 570,000 each for the names Ghamdi, Nami, and Atta (although he doesn't actually know, because they aren't listed) and 1,860,000 for the surname Omar (which is listed). He then calculates the percentage of Arabs with these surnames out of the total, which he estimates at 300,000,000 Arabs.
Well, this of course is a very rough estimate, and probably overstating the target population, since these hijackers came from countries such as Egypt and Saudi Arabia, which would be heavily represented in the ranks of foreign officers training in the US. Unlike, say, Iraq or Syria. But this is the least of his problems.
What happens next, though, gets really hilarious:
570000 (number of Arabs in the world with a surname of Ghamdi, Nami, or Atta) / 300000000 (total number of Arabs in the world) = 0.0019 (0.19% of the population of Arabs in the world with a surname of Ghamdi, Nami, or Atta)
1860000 (number of Arabs in the world with the surname of Omar) / 300000000 (total number of Arabs in the world) = 0.0062 (0.62% of the population of Arabs in the world with a surname of Omar)
300000 (total number of Arabs trained on U.S. military bases on U.S. soil) * 0.0019^5 (0.19% of the population of Arabs in the world with a surname of Ghamdi, Nami, or Atta, raised by the power of five, for the five individuals with one of these names) * 0.0062 (0.62% of the population of Arabs in the world with a surname of Omar) = 4.60554414 E-11
1 / 4.60554414 E-11 = ~ 21712960935.8168
For much of this, I am struggling to even figure out what his logic is. For starters, he is ignoring what his base population is. There were not just 6 hijackers, there were 19. Thirteen of the 19 hijackers' names were not found in the population he is comparing to, but he is ignoring this. And out of the 6 names he picks, only 3 were actually found among the names of the military. The other 3 simply had addresses on their driver's licenses, which they themselves picked, that were military related. He cannot compare them to the estimated population of military students, since they were never found in this population to begin with.
Regardless, he is still screwing this up. By raising the percentage of the population to the 5th power (.0019^5) and multiplying by the Omar percentage (.0062), he is not calculating the odds of those names occurring in the population (300,000). He is calculating the odds of those 6 names appearing (in order) in a selected group of 6 people, which is then sampled 300,000 times. The actual method of calculating the odds is rather complicated, and I don't feel like working through the math at the moment, but if .0019 of a population of 300,000 is expected to have a certain name, that means we can expect 570 people in that population to have that name. Not to mention the expected 1,860 people who would have the surname Omar (.0062 * 300000). So you can see that finding 6 out of 19 surnames in that population would not be that difficult, and the odds are certainly not 1 in 21.7 billion.
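For anyone who wants to check the arithmetic, here is a minimal Python sketch. It uses only Redford's own estimates as quoted above (I am not vouching for them), and the variable names are mine. It reproduces his figure and then computes the expected head counts, which is the comparison that actually matters.

```python
# All inputs are Redford's estimates as quoted above, not verified figures.
P_RARE = 0.0019      # his share of Arabs with one of the surnames Ghamdi, Nami, or Atta
P_OMAR = 0.0062      # his share of Arabs with the surname Omar
TRAINEES = 300_000   # his estimate of Arabs trained on U.S. military bases

# His calculation: population times the product of six name shares.
redford_value = TRAINEES * P_RARE**5 * P_OMAR
redford_odds = 1 / redford_value

# Expected number of trainees carrying such a name, using the same shares.
expected_rare = TRAINEES * P_RARE
expected_omar = TRAINEES * P_OMAR

print(f"Redford's figure: 1 in {redford_odds:,.0f}")
print(f"Expected trainees with a .0019 surname: {expected_rare:.0f}")
print(f"Expected trainees named Omar: {expected_omar:.0f}")
```

With hundreds of expected matches per name, a handful of matching names is what you would predict, not a billions-to-one fluke.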
This is actually a common misunderstanding of probability. A famous example of this is the birthday paradox. If you have a room of 40 people and wonder what the odds are of 2 people sharing the same birthday, the instinctive thing is to figure it is the number of people out of the number of days in the year, 40 out of 365 (or 366), or a little over 10%. This is incorrect, though. The correct probability is about 89%, because you are comparing every person not against one other person, but against every other person in the room.
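The birthday figure is easy to check yourself. The sketch below is the standard textbook calculation, under the usual simplifying assumptions that birthdays are spread evenly over 365 days and leap days are ignored; the function name is just mine. It works out the chance that everyone has a different birthday and subtracts that from 1.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday (uniform birthdays assumed)."""
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for room in (10, 23, 40):
    print(f"{room} people: {shared_birthday_probability(room):.1%}")
```

Note that the trick is the same one used for the surname problem: calculate the odds of NO match, then flip it.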
Update: I ended up in an exchange of opinion with the author on 9/11 Blogger on this issue. Unsurprisingly, his response was to claim my logic was wrong, claim I did not understand the situation, and then rant on about how the Turks killed lots of Armenians (I am still trying to figure out the relevance of that). In any case I finally decided to sit down and work through the math of this, which I am reposting below if anyone is interested:
Incidentally, if you want to actually work this out mathematically, you need to take the exact opposite approach. You need to calculate the odds that there is NOT someone with the surname in the target population. The reason you do this is that you only need 1 person to prove the hypothesis false; it doesn’t matter if there is 1 or 500.
So, to use your assumptions (I am not saying they are correct, but I am trying to keep this simple so that you understand): you estimated that the odds of a single Arab having the surname of Omar are .0062. If we reverse this, that means the odds of a single Arab NOT having the surname of Omar are .9938.
So if we then take your population of 300,000 Arabs and line them up, this is how we calculate it. Take Arab #1: the odds are .9938 that he is NOT named Omar. Take Arab #2: the odds are also .9938 that he is not named Omar, so the odds that both are not named Omar are .9938 * .9938. Now go to the third guy, and we see the odds of all three not being named Omar are .9938 ^ 3. So the odds of going through all 300,000 of our Arabs and NOT finding a single guy named Omar are .9938 ^ 300000, which is an insanely small number that I can’t even get Excel to calculate.
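The reason a spreadsheet gives up is that the result is smaller than anything ordinary floating point can hold, so it rounds to zero. Working with logarithms gets around that. Here is a small Python sketch, again using only the .9938 and 300,000 from above; by my working the answer comes out on the order of 10^-810.

```python
import math

P_NOT_OMAR = 0.9938   # odds a single person is NOT named Omar (from the estimate above)
POPULATION = 300_000  # size of the population being checked

# Direct computation underflows: the true value is far below the
# smallest positive double, so this evaluates to 0.0.
direct = P_NOT_OMAR ** POPULATION

# log10(p ** n) == n * log10(p), which stays well within range.
log10_p_none = POPULATION * math.log10(P_NOT_OMAR)

print(f"direct result: {direct}")
print(f"odds of finding no Omar at all: about 10^{log10_p_none:.0f}")
```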
So rather than it being extremely unlikely to find a name, as you calculated, the odds of going through and NOT finding a single person with this name are vanishingly small. In other words, finding at least one is a near certainty.
This is of course a vast oversimplification of the real world, since you are not accounting for first names, and the 300,000 population is probably a little high, but the point still stands that your approach is the complete opposite of the correct way.
Additionally, to really calculate this, you would have to individually calculate the odds for each of the 19 hijackers' names, and then figure out the odds that at least 3 of those 19 names would be found.
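That last step can be sketched in Python as well. This is only an illustration: nobody has real frequencies for all 19 surnames, so the shares below are HYPOTHETICAL placeholders (Omar at .0062 and the other 18 at .0019, borrowing the estimates above), and the function names are mine. It also assumes the names turn up independently of each other. The first function is the "NOT found" trick from above; the second adds up the chances of at least k names being found when each name has its own probability.

```python
from typing import Sequence


def p_name_present(share: float, population: int) -> float:
    """Chance that at least one person in the population carries the name."""
    return 1.0 - (1.0 - share) ** population


def p_at_least(k: int, probs: Sequence[float]) -> float:
    """Chance that at least k of the independent events in `probs` occur."""
    # dist[j] = chance that exactly j of the names processed so far are present
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)   # this name is absent
            new[j + 1] += q * p       # this name is present
        dist = new
    return sum(dist[k:])


# HYPOTHETICAL shares for the 19 surnames; placeholders, not real data.
shares = [0.0062] + [0.0019] * 18
probs = [p_name_present(s, 300_000) for s in shares]
print(f"odds of at least 3 of 19 names being found: {p_at_least(3, probs):.6f}")
```

Swap in better estimates for the shares or the population and the same two functions still apply.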