Monday, February 18, 2013

what do you think, sirs?

Never Show a Good Movie in the Middle of Your Crappy Movie: Modeling the Enjoyableness of MST3K Episodes

Bradford Tuckfield, Bridgette Tuckfield, David Tuckfield, Emma Tuckfield, Madeline Tuckfield, Rebecca Tuckfield, Robert Tuckfield, Sophie Tuckfield

We have identified the criteria that predict the enjoyableness of an MST3K episode. In this paper we specify a model of the enjoyableness of MST3K episodes as a linear model: Y = B_0 + B_1 X_1 + B_2 X_2 + … + B_18 X_18 + e. The model can be interpreted as follows: Y represents the final evaluated enjoyableness of a particular episode. Each X variable represents a measure of the degree to which each criterion was met in the episode. Each B variable represents the coefficient corresponding to each criterion, measuring the extent to which a unit increase in the measured criterion corresponds to increased enjoyableness. To evaluate the model’s validity, we will compare our predictions with other statistics indicating enjoyableness, including user ratings and surveys. This model will contribute to better understanding of MST3K, the fans it attracts, and the culture of riffing.

I.                   INTRODUCTION

Mike and the bots groan when Raul Julia’s character, the heroically named Aram Fingal, cues up Casablanca at the beginning of Overdrawn at the Memory Bank.  It’s an ill-advised move.  Despite Memory Bank's best efforts, it does not remotely approach the quality of the older film, as it lacks a compelling story, believable dialogue, quality cinematography, or other common criteria used to judge film.  “Never show a good movie in the middle of your crappy movie,” they say to the screen, and it’s hard to argue.
Despite this, Overdrawn at the Memory Bank is arguably one of the best MST3K episodes.  MST3K is unique - any comparison to other shows or films will inevitably fail. Thus, the quality of a given MST3K episode cannot be judged using the same criteria that are commonly used to judge film.
MST3K episodes are of variable quality; this is certain.  But unlike traditional films such as Casablanca, the quality of a particular episode is more difficult to define.  The base movie has to be bad, yes—but it cannot be too bad.  In addition, it has to be bad in specific ways.
We proposed to determine exactly what characteristics the base movie should have in order to make the MST3K episode enjoyable. To make this determination, we estimated an ordinary least squares regression model with fan rating as the dependent variable, and discovered that there are indeed specific criteria that directly and significantly correlate with the enjoyableness of a particular MST3K episode.

II.                METHODOLOGY

First, a quick definition of terms: Here, “movie” refers to only the base movie that, combined with riffing and host segments, comprises an episode of MST3K. It does not include the riffing and host segments or any other MST3K overlay. “Badness” refers to the movie’s deficiencies when compared to classics such as Casablanca; the specific components of how “bad” a movie is are described below.
In our study, we estimated an ordinary least squares regression model with fan rating as the dependent variable. Our model can be stated as follows:

            Fanrating_i = α + β_1 predictor_1 + … + β_n predictor_n + ε_i

where i indexes all rated movies, error terms ε_i are i.i.d. random draws from a normal distribution, and the predictors are the criteria described below. Because our sample size was relatively small (N=19) and was actually less than our number of criteria (22), we did not estimate a full model with all predictors, but rather (with a few exceptions) estimated single-predictor models for each criterion. We will report point estimates and significance levels for each of these models.
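As a rough illustration of what one of these single-predictor models involves, here is a minimal sketch in plain Python. The function name, the six example ratings, and the two example criteria are all hypothetical and are not the study's data; the sketch only shows the arithmetic behind a point estimate and its t-statistic.

```python
import math


def fit_single_predictor(x, y):
    """Fit y = alpha + beta * x by ordinary least squares.

    Returns (alpha, beta, t_stat), where t_stat is the t-statistic for
    beta with len(x) - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # point estimate (slope)
    alpha = mean_y - beta * mean_x        # intercept
    rss = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = beta / se_beta if se_beta > 0 else math.copysign(math.inf, beta)
    return alpha, beta, t_stat


# Hypothetical scores for six episodes (illustrative only, not the study's data).
fan_rating = [8.9, 8.1, 7.4, 4.2, 3.8, 3.1]
criteria = {
    "monotony": [6, 5, 5, 3, 2, 2],
    "length_minutes": [74, 80, 85, 95, 97, 102],
}

if __name__ == "__main__":
    # One separate regression per criterion, as in the single-predictor approach.
    for name, scores in criteria.items():
        alpha, beta, t_stat = fit_single_predictor(scores, fan_rating)
        print(f"{name}: estimate={beta:.3f}, t={t_stat:.2f}")
```

Fitting each criterion separately sidesteps the problem that a joint model with more predictors than observations cannot be estimated, at the cost of ignoring any overlap between criteria.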
Fan ratings (used as the dependent variable) are from an independent fan survey conducted at the MST3K fan “Discussion Board.”[1] Participants were asked to rate episodes on a scale of one to ten for enjoyableness. The number of votes per episode varied with release status, ranging from six to nineteen and averaging 10.802.
In order to maximize the variance of the dependent variable, we chose ten of the lowest-ranked episodes and nine of the highest-ranked episodes to watch and score according to our criteria. We watched two movies as a group to informally ensure that our ratings were well-calibrated and that we had a high level of inter-rater agreement.

The movies chosen were:

GOOD (Highest-ranked):
Manos: The Hands of Fate
The Violent Years
Beast of Yucca Flats
Night of the Blood Beast
Jack Frost
Girl in Gold Boots
Pod People
Final Sacrifice

BAD (Lowest-Ranked):
Lost Continent
Blood Waters of Dr. Z 
Castle of Fu Manchu
Swamp Diamonds
She Creature
Invasion USA

Gamera vs. Zigra
Mighty Jack
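
The selection step described above (take the lowest-ranked and highest-ranked episodes to spread out the dependent variable) can be sketched as follows. The function name and the example titles and ratings are hypothetical placeholders, not the survey's figures.

```python
def select_extremes(mean_ratings, n_low, n_high):
    """Return (lowest, highest) episode titles ranked by mean fan rating."""
    if n_low + n_high > len(mean_ratings):
        raise ValueError("the two groups would overlap")
    # Sort titles from lowest to highest mean rating.
    ranked = sorted(mean_ratings, key=mean_ratings.get)
    return ranked[:n_low], ranked[len(ranked) - n_high:]


# Hypothetical mean ratings on the survey's 1-10 scale (illustrative only).
example_ratings = {
    "Episode A": 2.0,
    "Episode B": 9.0,
    "Episode C": 5.0,
    "Episode D": 7.5,
    "Episode E": 3.1,
}

if __name__ == "__main__":
    lowest, highest = select_extremes(example_ratings, n_low=2, n_high=2)
    print("lowest-ranked:", lowest)
    print("highest-ranked:", highest)
```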

We created a score sheet to record measurements.  We included subjective, objective, and meta-data criteria on the score sheet, as detailed below.

A.     SUBJECTIVE CRITERIA (graded on a scale of 1-7)

We identified 5 “subjective” criteria, which were matters of opinion based on overall plot and execution.

1)     MONOTONY: depended on scene and character changes and on the visual interest of the film. A movie that scored 1 would have almost no scene or character changes, visual interest, or modulation in soundtrack/dialogue. A movie that scored 7 would have a great deal of all of these, probably to the detriment of any storytelling.

2)     INTERESTING PREMISE OF MOVIE: referred to novelty and interest of plot, conflict, and stakes. In other words, if someone just told you the premise of the film, how engaged or intrigued would you be? Is an interesting story promised by the film? This was ranked on a scale of 1-7, with 1 being cliché or incredibly dull and 7 being fascinating.

3)     DELIVERY OF PREMISE: referred to how well the film acquitted itself in presenting and exploring the conflict inherent in the plot, however engaging or unengaging the premise. 1 meant utter failure to deliver; 7 meant completely successful delivery.

4)     OUTLANDISHNESS (PLAUSIBILITY) of MOVIE: Differing from suspension of disbelief, this related to the overall premise of the movie. 1 meant the movie was completely outlandish and you must suspend all sense of reality to watch; 7 meant high realism.

5)     QUALITY OF FILM: Covered the quality of the film without the riff. It took into account traditional film judgment qualities like finesse of direction, realism in character dialogue and interaction, emotional resonance, and quality of cinematography.

B.     OBJECTIVE CRITERIA (numbered)

            While watching, we kept score of a number of “objective” criteria whose instances could be tallied.

1)     “OUT OF NOWHERE” OR ANTEATER FACTOR: So named for the moment in “Overdrawn at the Memory Bank” where anteaters are disparaged and the riffers go “Phew!  Huge slam on anteaters out of nowhere!”  We defined this as when an underlying movie does something absolutely out of nowhere—like when Puma Man suddenly leaps into the air and flies, although pumas are not known for their flight capabilities.  Or, when the protagonist of “Cave Dwellers” suddenly finds himself in possession of a working and modern hang-glider. 

2)     SUSPENSION OF DISBELIEF: Defined as an instance where the characters do not react in a normal human fashion, or the world does not obey the laws of physics, despite nothing earlier in the movie indicating that it wouldn’t. For example, when a child goes missing, a father might do a cursory examination of the scenery and then move on. Or, in Manos, when the hero gets knocked out and tied up, and does not particularly seem put out or ever really mention it happening to his family.

3)     NUMBER OF FAMOUS PERSON REFERENCES:  Referred to references made to or jokes hinging upon references to a famous person.  This was chosen to determine whether referential humor affected the quality of the episode.

4)     FAILURE TO DELIVER: This was context-based. We tallied the number of times a movie failed to deliver something promised. For example, a death cult that might destroy the world turns out to be five pasty guys with ill-fitting hoods. Or, a film promises that if a particular truck exploded, a whole city would be taken out, and then follows with a very insignificant explosion with a minimum safe distance of maybe ten yards.

5)     FAILED SEXINESS:  Referred to times a movie tried to titillate—failure being endemic in the attempt. 

6)     SUPERFLUOUS TYPES OF REPETITION: Some movies suffered from the particular problem of repeating certain types of scenes over and over. For example, in Plan Nine, at least ten minutes of film time are devoted to watching a very slow police car make its way to the cemetery. Or, in Final Sacrifice, we are treated to innumerable shots of feet getting into vehicles.

7)     MOVIE-MAKING ERRORS:  These were reserved for things not done on purpose—sound equipment or costume zippers visible, severe issues with sound or lighting, etc.


We also tracked whether or not there was a known actor in the film, the film’s genre, the film’s length, the episode’s host, the season the episode aired, and the film’s box office gross, if obtainable.
The quality of host segments was considered to be relatively constant and uncorrelated with other observed variables, and was thus included in the error term of the estimated regression. Our intuition was that with practice and feedback, the riffing quality would increase over time. Informally, we found that episodes from Season One were the least enjoyable. This matches our expectations since the show had just started and the riffing had not yet reached a high level of polish. When combined with the unpleasant riffed movie, this made the whole experience almost unwatchable. Upon further review, this was reflected in the data from the initial survey, where Season One was rated extremely poorly and ratings seemed to swing upward as time went on. We believe this was due to the technicalities of hitting their stride as a show and figuring out what worked and what didn’t, rather than a reflection of the movies chosen or the quality of riffing.
The weighted running averages of these movies are found at a website known as the MST3K “Discussion Board.”[2]

III.             RESULTS

The point estimates and significance levels of our estimated models are shown below. There were significant relationships between fan rating and monotony, good premise, and length. There were marginally significant relationships between fan ratings and suspension of disbelief and failure to deliver.

Point Estimate
Good Premise
Delivering on Premise
“Out of nowhere” factor
Suspension of Disbelief
References to Famous People
Failure to Deliver
Attempts at Sexiness
Superfluous Scenes
Technical Errors
Including a Famous Actor
Genre: Horror
Genre: Fantasy
Genre: Crime
Genre: Period
Genre: Sci-fi
Genre: Thriller
Genre: Western
Length in Minutes
Mike rather than Joel
*p<.1; **p<.05; ***p<.01
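
The star legend above maps a p-value to a significance marker. A minimal sketch of that mapping, assuming a hypothetical helper named `significance_stars` and the strict inequalities shown in the legend:

```python
def significance_stars(p_value):
    """Return the marker from the legend: * p<.1, ** p<.05, *** p<.01."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""  # not significant at any of the listed levels


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    for p in (0.004, 0.03, 0.08, 0.4):
        print(p, significance_stars(p))
```

Under this reading, one star corresponds to what the text calls a marginally significant relationship.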

The figures below show scatterplots and fitted lines for the five significant or marginally significant predictors of fan rating.

Or rather, they would have if I could have gotten the scatterplots and fitted lines to copy onto the blog.  But trust me they were awesome and Bradford did them amazingly.

Based on these data, for an MST3K episode to be most enjoyable, the base movie will 1) not be monotonous; 2) provide an interesting premise; 3) contain more, rather than fewer, instances requiring a suspension of disbelief; 4) implicitly promise, then fail to deliver, important plot points; and 5) be shorter, rather than longer.

IV.              CONCLUSION

In our lives as American consumers, we are served up a lot of crap, through life and what is presented to us as entertainment.  Life is hard, and sometimes the only movie playing at the local theater is Transformers II.  We find ourselves identifying with Mike and Joel and the robots, forced to consume despair-inducing junk because there is no other choice—and, in fact, it is sold to us as quality.  We are told these things are good, and they present themselves as enjoyable and desirable even though we know they are not.

There is a palpable despair in contemplating this cognitive dissonance. Is this all there is? we can feel like asking. Is this supposed to be what resonates? Is this it?
While one cannot outright reject difficulty and despair, one can confront it rather than giving in.  This is done, as the hosts demonstrate, by banding together with others and calling out absurdity through camaraderie and humor.
Most people’s lives have variable and engaging premises with a certain understanding that variable and engaging things will be delivered—the American Dream.  Yet life can fall into monotony that more often than not fails to deliver anything overly spectacular.  It’s in this strange groove that the best MST3K episodes fall—the episodes cannot be too awful, or they merely incite boredom.  No, there needs to be a palpable despair induced by the promise of entertainment and escape, and then falling so spectacularly short of it.
It seems the best MST3Ks are little microcosms of an average human existence. But instead of quietly accepting this and being driven mad by the despair (as the Mads hoped Mike and Joel might be), we can best confront it through humor and unity.
A good MST3K episode can transcend ages. Young kids enjoy it just as much as the adults, so the entire family can enjoy episodes together, and as a family of eight with kids spanning 11 years, we have done just that. Not only does the family enjoy watching MST3K together; it also provides years of continual bonding and a source of inside jokes for the family. Comedy can unite even perfect strangers. For example, two strangers who know the story of “Alice in Wonderland” bond when, after the first tries to get his broken watch working, the second says "but it was the best butter." So too does the family or group build bonds when, in certain situations, someone comes out with an appropriate memorable quote, such as "Huge slam on anteaters out of nowhere!," or "‘Sidehackin’ is the thing to do," or "Don't be splayed, don't be splayed!" or "McCloud!" or even "Rowsdower!" These become inside jokes for all members of the family, lighten the mood, and create more unity, and they are just as poignant, in their own way, as “Here’s looking at you, kid.”
We, the authors, have watched Casablanca as a family and in groups, and it is indeed an amazing film with great personal and human resonance and cultural relevancy.  But like many great films, it is a great personal experience, while a good MST3K is a group experience in overcoming despair, and fostering unity and friendship.
Never show a good movie in the middle of your crappy movie; it will ruin the fun.

