Thursday, March 29, 2007

Team Science!



So, anybody want to contribute to science and possibly have a decent belly laugh in the process? I may be able to help. At the very least, this will give me a chance to use Blogger 2.0's label feature. It goes something like this.

When I was in college, one of my chums from Campus Crusade made a web page that displays, on command, a quote from a big database of them. You can even, dare I say should even, add your own. At least click through a few of them; otherwise the rest of this entry may not make sense.

So, after spending maybe half an hour cycling through quotes one night this winter break (I got really, really bored over break. Please don't hate me), I got to wondering: just how random is this so-called random quote generator?

There are a couple of ways that we could answer this question. The simplest would probably be to just ask Amos how he coded the page.

Lame!

The alternative would be to determine through experiment and statistical inference whether or not the quote generator is truly random. Much more work, to be sure, but on the other hand it's a good project that I'm pretty sure has never been done before, and eventually I'm going to need a plan B presentation topic...

Ho: The random quote generator is, in fact, random.
Ha: The random quote generator is really only a quasi-random quote generator.

So, this is a theoretically easy thing to measure. Most quotes in the database are fairly short, no more than a few lines of text, so I added the entire Gettysburg Address to the bank; even when a person is cycling through the quotes rather quickly, it will still be easily recognizable. The experiment, then, is simply to refresh the page an exorbitantly large but known number of times and record how many times Mr. Lincoln's speech is observed.

We can then perform a Pearson's chi-square test comparing the number we observe to the number we expect to see, given the null hypothesis of randomness. (The consequence of randomness, of course, is that every quote in the database has a 1/X probability of being displayed, where X is the total number of quotes.) Therefore, the expected number of times we see the Gettysburg Address pop up is the total number of refreshes n across all trials times the probability of occurrence p = 1/X.
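
For the curious, here is a rough sketch of that arithmetic in Python. It computes the expected count n/X and a two-cell Pearson chi-square statistic (Gettysburg versus anything else). The function names and the sample numbers are made up for illustration and are not measured data. The p-value line relies on the fact that a chi-square variable with one degree of freedom is a squared standard normal. Keep in mind that the chi-square approximation gets shaky when the expected count is small.

```python
import math


def expected_hits(n_refreshes, n_quotes):
    """Expected sightings of one quote if all n_quotes are equally likely."""
    return n_refreshes / n_quotes


def pearson_chi_square(observed_hits, n_refreshes, n_quotes):
    """Two-cell Pearson statistic: 'target quote' vs. 'any other quote'."""
    exp_hit = expected_hits(n_refreshes, n_quotes)
    exp_miss = n_refreshes - exp_hit
    obs_miss = n_refreshes - observed_hits
    return ((observed_hits - exp_hit) ** 2 / exp_hit
            + (obs_miss - exp_miss) ** 2 / exp_miss)


def p_value_1df(chi2):
    """Upper-tail probability for a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Illustrative numbers only (hypothetical, not measured data).
n, x, hits = 4520, 452, 18
chi2 = pearson_chi_square(hits, n, x)
print(f"expected = {expected_hits(n, x):.2f}, chi2 = {chi2:.3f}, "
      f"p = {p_value_1df(chi2):.4f}")
```

A p-value under 0.05 would be grounds to reject Ho at the 95% level; anything larger means the data are consistent with randomness, which is not the same as proving it.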

The other hypothesis that I want to test is that the page is programmed to preferentially give the same quote as the previous refresh, but in a different color. I call this effect "color swap déjà vu" (CSDJV). If this turns out to be statistically significant, we can also infer quasi-randomness, because it would mean extra weight is being given to the last quote in the sequence.
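
One way to tally this from a logged sequence is sketched below, assuming each refresh gets written down as a quote ID. The logging format and the function names are hypothetical; the page itself provides nothing of the sort. Under the null hypothesis, each refresh matches the one before it with probability 1/X, so n refreshes give n - 1 comparisons and (n - 1)/X expected repeats.

```python
def count_repeats(quote_ids):
    """Count refreshes that show the same quote as the refresh before."""
    return sum(1 for prev, cur in zip(quote_ids, quote_ids[1:]) if prev == cur)


def expected_repeats(n_refreshes, n_quotes):
    """Expected back-to-back repeats if draws are uniform and independent."""
    return (n_refreshes - 1) / n_quotes


# Hypothetical log of quote IDs seen on successive refreshes.
log = [17, 17, 203, 88, 88, 88, 5]
print(count_repeats(log))               # back-to-back repeats in this log
print(expected_repeats(len(log), 452))  # expected under the null
```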

In this case, knowing how to do stuff is the easy part. Now the trick is to simply start counting up enough refreshes to produce at least 15 hits. I'm going to need a clicker and a lot of time, and, if you feel so inclined, some help gathering the data. I mean, let's face it, you want to know too, and the sooner we get enough hits, the sooner we can find out (with 95% confidence) what is going on in that thing.
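
To get a feel for how much clicking that is, here is a back-of-the-envelope calculation. It assumes the generator really is uniform and that the database holds 452 quotes (taking the q column in the addendum to be the quote count, which is an assumption). The function name is made up for this sketch.

```python
import math


def refreshes_for_expected_hits(target_hits, n_quotes):
    """Smallest n with n / n_quotes >= target_hits under uniform draws."""
    return math.ceil(target_hits * n_quotes)


n_needed = refreshes_for_expected_hits(15, 452)
print(n_needed)        # total refreshes needed to expect 15 hits
print(n_needed / 100)  # the same, in sessions of 100 refreshes
```

This is the number of refreshes at which 15 hits are expected on average, not a guarantee of seeing 15.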
-----------------------------------------------------------------------------
Addendum 3/29/07
I just found some of the old data I took a while back. I swear, it was like 10 days' worth of work, and also I swear I don't always use my time this frivolously. Anyhoo, it might help if you're foolish enough to assist in this project!


Ex    n     q    hit?   CSDJV
 1    100   452   0      0
 2    100   452   0      0
 3    100   452   0      0
 4    100   452   1      0
 5    100   452   0      0
 6    100   452   0      0
 7    100   452   0      1
 8    100   452   0      1
 9    100   452   0      2
10    100   452   1      0
11     40   452   0      2
12    100   453   0      0
13    100   453   0      2
14    100   453   0      0
15    100   453   1      0
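
For anyone who wants to play along, the table can be tallied like so. This is only a sketch. It assumes q is the number of quotes in the database at the time, that hit? counts Gettysburg sightings, that CSDJV counts color swap déjà vu events, and that each experiment starts fresh (so n refreshes give n - 1 back-to-back comparisons). The variable names are made up. It reports totals and the expected counts under Ho, nothing more.

```python
# Rows copied from the table: (experiment, n, q, hits, csdjv).
rows = [
    (1, 100, 452, 0, 0), (2, 100, 452, 0, 0), (3, 100, 452, 0, 0),
    (4, 100, 452, 1, 0), (5, 100, 452, 0, 0), (6, 100, 452, 0, 0),
    (7, 100, 452, 0, 1), (8, 100, 452, 0, 1), (9, 100, 452, 0, 2),
    (10, 100, 452, 1, 0), (11, 40, 452, 0, 2), (12, 100, 453, 0, 0),
    (13, 100, 453, 0, 2), (14, 100, 453, 0, 0), (15, 100, 453, 1, 0),
]

total_refreshes = sum(n for _, n, _, _, _ in rows)
total_hits = sum(hits for _, _, _, hits, _ in rows)
total_csdjv = sum(csdjv for _, _, _, _, csdjv in rows)

# Expected counts under Ho, summed per experiment because q changes.
expected_hits = sum(n / q for _, n, q, _, _ in rows)
# Assumption: only refreshes within an experiment are compared.
expected_csdjv = sum((n - 1) / q for _, n, q, _, _ in rows)

print(f"refreshes: {total_refreshes}")
print(f"hits:  observed {total_hits}, expected {expected_hits:.2f}")
print(f"CSDJV: observed {total_csdjv}, expected {expected_csdjv:.2f}")
```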




5 comments:

Katrina said...

How can you so callously flip through all the quotes without reading them? They're often quite funny!

Anonymous said...

Topher. I have a lot of desire to see the results of your experiment but absolutely no understanding of how you are getting them. I am going on blind faith that you know what you're doing and, at the very least, will present your conclusions with illustrious fanfare and wit.

Unknown said...

My head hurts.

—b

Victoria said...

i'm leaning towards HA.....simply b/c before I read the whole entry, and I flipped through about 10 quotes I got the same one a couple times....

V

Victoria said...

Hey Chris,

I had to borrow your post label of nerdmongering...it is just that good

Vicki