Never Show a Good Movie in the Middle of Your Crappy
Movie: Modeling the Enjoyableness of MST3K Episodes
Bradford Tuckfield, Bridgette Tuckfield,
David Tuckfield, Emma Tuckfield, Madeline Tuckfield, Rebecca Tuckfield, Robert
Tuckfield, Sophie Tuckfield
Contact:
brt@wharton@upenn.edu; tbridgette.day@gmail.com
Abstract
We have identified the criteria that predict the enjoyableness of an MST3K
episode. In this paper we specify a model of the enjoyableness of MST3K
episodes as a linear model: $Y = B_0 + B_1X_1 + B_2X_2 + \dots + B_{18}X_{18} + e$. The model can be
interpreted as follows: Y represents the final evaluated enjoyableness of a
particular episode. Each X variable represents a measure of the degree to
which each criterion was met in the episode. Each B variable represents
the coefficient corresponding to each criterion, measuring the extent to which
a unit increase in the measured criterion corresponds to increased enjoyableness.
To evaluate the model’s validity, we will compare our predictions with other
statistics indicating enjoyableness, including user ratings and surveys. This model will contribute to better
understanding of MST3K, the fans it attracts, and the culture of riffing.
I. INTRODUCTION
Mike and the bots groan when Raul Julia’s character, the
heroically named Aram Fingal, cues up Casablanca
at the beginning of Overdrawn at the
Memory Bank. It’s an ill-advised
move. Despite Memory Bank's best efforts, it does not remotely approach the
quality of the older film, as it lacks a compelling story, believable dialogue,
quality cinematography, or other common criteria used to judge film. “Never
show a good movie in the middle of your crappy movie,” they say to the screen,
and it’s hard to argue.
Despite this, Overdrawn
at the Memory Bank is arguably one of the best MST3K episodes. MST3K
is unique - any comparison to other shows or films will inevitably fail. Thus,
the quality of a given MST3K episode cannot be judged using the same criteria
that are commonly used to judge film.
MST3K episodes are of variable quality; this is certain. But unlike the quality of a traditional film such as Casablanca, the quality of a particular
episode is difficult to define. The
base movie has to be bad, yes, but it cannot be too bad. In addition, it has to be bad in specific
ways.
We proposed to determine exactly what
characteristics the base movie should have in order to make the MST3K episode
enjoyable. To make this determination, we estimated an ordinary least squares regression model with
fan rating as the dependent variable, and discovered that there are indeed
specific criteria that correlate directly and significantly with the
enjoyableness of a particular MST3K episode.
II. METHODOLOGY
First, a quick definition of terms:
Here, “movie” refers to only the base movie that, combined with riffing and
host segments, comprises an episode of MST3K.
It does not include the riffing and host segments or any other MST3K
overlay. “Badness” refers to the movie’s
deficiencies when compared to classics such as Casablanca; the specific components of how “bad” a movie is are described
below.
In our study, we estimated an
ordinary least squares regression model with fan rating as the dependent
variable. Our model can be stated as follows:
$\text{FanRating}_i = \alpha + \beta_1\,\text{predictor}_1 + \dots + \beta_n\,\text{predictor}_n + \varepsilon_i$
where $i$ indexes
all rated movies, error terms $\varepsilon_i$ are i.i.d. random draws from a
normal distribution, and the predictors are the criteria described below.
Because our sample size was relatively small (N=19) and was actually less than
our number of criteria (22), we did not estimate a full model with all
predictors, but rather (with a few exceptions) estimated single-predictor
models for each criterion. We will report point estimates and significance
levels for each of these models.
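As a rough illustration of what one of these single-predictor fits involves, the sketch below estimates a slope, an intercept, the slope's standard error, and its t statistic using only the Python standard library. The function name and the data are hypothetical and made up for illustration; they are not the study's score sheets or its actual software.

```python
import math


def fit_single_predictor(x, y):
    """Fit y = alpha + beta * x by ordinary least squares.

    Returns (alpha, beta, se_beta, t_stat), where se_beta is the standard
    error of the slope and t_stat = beta / se_beta.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    sse = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(sse / (n - 2) / sxx)
    t_stat = beta / se_beta if se_beta > 0 else math.copysign(math.inf, beta)
    return alpha, beta, se_beta, t_stat


# Made-up scores for six imaginary episodes (NOT the study's data).
monotony_scores = [1, 2, 3, 4, 5, 6]
fan_ratings = [4.0, 5.1, 5.4, 6.6, 7.0, 7.9]
alpha, beta, se_beta, t_stat = fit_single_predictor(monotony_scores, fan_ratings)
print(f"slope={beta:.3f}, intercept={alpha:.3f}, t={t_stat:.2f}")
```

Fitting one predictor at a time sidesteps the problem of having more criteria than observations, at the cost of ignoring any overlap between criteria.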
Fan ratings (used as the dependent
variable) come from an independent fan survey conducted at the MST3K fan
“Discussion Board.”[1] Participants were asked to rate episodes on a
scale of one to ten for enjoyableness.
Although the number of votes per episode varied with release status, episodes
averaged 10.802 votes each, with a range of six to nineteen votes.
In order to maximize the variance
of the dependent variable, we chose ten of the lowest-ranked episodes and nine
of the highest-ranked episodes to watch and score according to our
criteria. We watched two movies as a
group to informally ensure that our ratings were well-calibrated and that we
had a high level of inter-rater agreement.
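The selection step described above can be sketched as follows. The function name, the episode labels, and the ratings are hypothetical placeholders; the sketch only assumes that each episode has a single mean fan rating to sort on.

```python
def pick_extremes(ratings, n_low=10, n_high=9):
    """Return (lowest, highest) episode titles ranked by mean fan rating.

    `ratings` maps an episode title to its mean rating. Taking both tails
    of the ranking spreads the dependent variable as widely as possible.
    """
    if n_low + n_high > len(ratings):
        raise ValueError("not enough episodes to fill both groups")
    ranked = sorted(ratings, key=ratings.get)  # ascending by rating
    return ranked[:n_low], ranked[len(ranked) - n_high:]


# Made-up ratings for imaginary episodes (NOT the survey's data).
example = {"Episode A": 3.1, "Episode B": 8.7, "Episode C": 5.5,
           "Episode D": 9.2, "Episode E": 2.4}
lowest, highest = pick_extremes(example, n_low=2, n_high=2)
print(lowest, highest)
```

Sampling only the extremes increases the spread of the fan ratings, but it also means the middle of the rating range is not represented in the fitted models.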
The movies chosen were:
GOOD (Highest-ranked):
Manos: The Hands of Fate
The Violent Years
Beast of Yucca Flats
Night of the Blood Beast
Jack Frost
Girl in The Gold Boots
Pod People
Soultaker
Final Sacrifice
BAD (Lowest-Ranked):
Lost Continent
Blood Waters of Dr. Z
Castle of Fu Manchu
Swamp Diamonds
Gunslinger
She Creature
Hamlet
Invasion USA
Gamera vs. Zigra
Mighty Jack
We created a score sheet to record
measurements. We included subjective,
objective, and meta-data criteria on the score sheet, as detailed below.
A. SUBJECTIVE CRITERIA (graded on a scale of 1-7)
We identified five criteria that were “subjective,” or matters of opinion, based on overall plot and execution.
1) MONOTONY: depended on scene and character changes and on the visual interest of the film. A movie
that scored 1 would have almost no scene or character changes, visual interest,
or modulation in soundtrack/dialogue. A
movie that scored 7 would have a great deal of all of these, probably to the
detriment of any storytelling. A higher score therefore means a less monotonous movie.
2) INTERESTING PREMISE OF MOVIE: referred to the novelty
and interest of the plot, conflict, and stakes. In other words, if someone just
told you the premise of the film, how engaged or intrigued would
you be? Is an interesting story promised
by the film? This was ranked on a scale
of 1-7, with 1 being cliché or incredibly dull and 7 being fascinating.
3) DELIVERY OF PREMISE: referred to how well the
film acquitted itself in presenting and exploring the conflict inherent in the
plot, no matter how engaging the premise. 1 meant utter failure to
deliver; 7 meant completely successful delivery.
4) OUTLANDISHNESS (PLAUSIBILITY) OF MOVIE:
differing from suspension of disbelief, this related to the overall premise of
the movie. 1 meant the movie was
completely outlandish and you must suspend all sense of reality to watch; 7
meant high realism.
5) QUALITY OF FILM:
covered the quality of the film without the riff. It took into account traditional film
judgment qualities such as finesse of direction, realism in character dialogue and
interaction, emotional resonance, and quality of cinematography.
B. OBJECTIVE CRITERIA (numbered)
While watching, we kept score of a number of “objective” criteria whose instances
could be tallied.
1) “OUT OF NOWHERE” OR ANTEATER FACTOR: So named
for the moment in “Overdrawn at the Memory Bank” where anteaters are disparaged
and the riffers go “Phew! Huge slam on
anteaters out of nowhere!” We defined
this as when an underlying movie does something absolutely out of nowhere, like
when Puma Man suddenly leaps into the air and flies, although pumas are not
known for their flight capabilities, or
when the protagonist of “Cave Dwellers” suddenly finds himself in possession of
a working and modern hang-glider.
2) SUSPENSION OF DISBELIEF: Defined as an instance
where the characters do not react in a normal human fashion, or the world does
not obey the laws of physics, despite nothing earlier in the movie
indicating that this would be the case. For
example, when a child goes missing, a father might do a cursory examination of
the scenery and then move on. Or, in Manos, the hero gets knocked out
and tied up, yet does not seem particularly put out and never really mentions it
to his family.
3) NUMBER OF FAMOUS PERSON REFERENCES: Referred to references made to, or jokes
hinging upon, a famous person.
This was chosen to determine whether referential humor affected the
quality of the episode.
4) FAILURE TO DELIVER: This was context-based. We tallied the number of times a movie failed
to deliver something promised. For
example, a death-cult that might destroy the world turns out to be five pasty guys with
ill-fitting hoods. Or a film promises
that if a particular truck explodes, a whole city will be taken
out, and follows up with a very insignificant explosion with a minimum safe distance of
maybe ten yards.
5) FAILED SEXINESS:
Referred to times a movie tried to titillate, failure being endemic in
the attempt.
6) SUPERFLUOUS TYPES OF REPETITION: Some movies suffered from the particular
problem of repeating certain types of scenes over and over. For example, in Plan Nine, at least ten minutes of film
time are devoted to watching a very slow police car make its way to the
cemetery. Or, in Final Sacrifice, we are treated to innumerable shots of feet
getting into vehicles.
7) MOVIE-MAKING ERRORS: These were reserved for things not done on
purpose: sound equipment or costume zippers visible, severe issues with sound or
lighting, etc.
C. META-DATA
We also tracked whether or not
there was a known actor in the film, the film’s genre, the film’s length, the
episode’s host, the season the episode aired, and the film’s box office gross,
if obtainable.
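One way to picture the score sheet is as a small record with one field per criterion. The sketch below is a hypothetical data structure, not the sheet the authors actually used: the class name, the field names, and the choice to encode the host as a 0/1 value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Field names are hypothetical labels for the criteria described above.
SUBJECTIVE = ("monotony", "premise", "delivery", "plausibility", "film_quality")
OBJECTIVE = (
    "out_of_nowhere", "suspension_of_disbelief", "famous_person_references",
    "failure_to_deliver", "failed_sexiness", "superfluous_repetition",
    "movie_making_errors",
)


@dataclass
class ScoreSheet:
    title: str
    subjective: Dict[str, int]   # each criterion graded 1-7
    tallies: Dict[str, int]      # each criterion counted per occurrence
    genre: str
    length_minutes: int
    host: str                    # "Joel" or "Mike"
    season: int
    known_actor: bool
    box_office_gross: Optional[float] = None  # None when not obtainable

    def __post_init__(self) -> None:
        for name in SUBJECTIVE:
            score = self.subjective.get(name)
            if score is None or not 1 <= score <= 7:
                raise ValueError(f"{name} must be graded from 1 to 7")
        for name in OBJECTIVE:
            if self.tallies.get(name, 0) < 0:
                raise ValueError(f"{name} tally cannot be negative")

    def predictors(self) -> Dict[str, float]:
        """Flatten the sheet into numeric predictors (genre is left out)."""
        row = {name: float(self.subjective[name]) for name in SUBJECTIVE}
        row.update({name: float(self.tallies.get(name, 0)) for name in OBJECTIVE})
        row["length_minutes"] = float(self.length_minutes)
        row["mike_host"] = 1.0 if self.host == "Mike" else 0.0
        row["season"] = float(self.season)
        row["known_actor"] = 1.0 if self.known_actor else 0.0
        return row


# Made-up sheet for an imaginary movie (NOT one of the scored episodes).
sheet = ScoreSheet(
    title="Example Movie",
    subjective={name: 4 for name in SUBJECTIVE},
    tallies={"out_of_nowhere": 2, "failure_to_deliver": 3},
    genre="Horror", length_minutes=92, host="Mike", season=8, known_actor=False,
)
print(sheet.predictors())
```

Validating the 1-7 range when a sheet is created catches transcription slips early, which matters when only nineteen sheets feed the analysis.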
The quality of host segments was
considered to be relatively constant and uncorrelated with other observed
variables, and was thus included in the error term of the estimated regression. Our intuition was that with practice and
feedback, the riffing quality would increase over time. Informally, we found
that episodes from Season One were the least enjoyable. This matches our
expectations, since the show had just started and the riffing had not yet
reached a high level of polish. When combined with the unpleasant riffed movie,
this made the whole experience almost unwatchable. Upon further review, this was reflected in
the data from the initial survey, where Season One was extremely poorly rated
and ratings seemed to swing upwards as time went on.
We believe this was due to the show hitting its stride
and figuring out what worked and what didn’t, rather than a reflection
of the movies chosen or the quality of riffing.
The weighted running averages of
these movies are found at a website known as the MST3K “Discussion Board.”[2]
III. RESULTS
The point estimates and significance
levels of our estimated models are shown below. There were significant
relationships between fan rating and monotony, good premise, and length. There
were marginally significant relationships between fan ratings and suspension of
disbelief and failure to deliver.
Predictor                    | Point Estimate
Monotony                     | .423***
Good Premise                 | .146***
Delivering on Premise        | .004
Outlandishness               | -.116
“Out of nowhere” factor      | .157
Suspension of Disbelief      | .095*
References to Famous People  | .033
Failure to Deliver           | .282*
Attempts at Sexiness         | .160
Superfluous Scenes           | -.158
Technical Errors             | -.031
Including a Famous Actor     | -.091
Genre: Horror                | .395
Genre: Fantasy               | .567
Genre: Crime                 | .250
Genre: Period                | -2.723
Genre: Sci-fi                | -.535
Genre: Thriller              | -.946
Genre: Western               | -1.190
Length in Minutes            | -.029**
Mike rather than Joel        | .7403
Season                       | .091
*p<.1; **p<.05; ***p<.01
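To show how star markers of this kind relate to a test statistic, the sketch below maps a slope's t statistic to the footnote's three levels. It assumes a two-sided t-test with 17 degrees of freedom (19 episodes minus an intercept and one slope); the source does not say how its p-values were computed, so treat that as an assumption. The function name is hypothetical, and the critical values are the standard tabulated ones for 17 degrees of freedom.

```python
# Two-sided critical values of Student's t with 17 degrees of freedom,
# paired with the star markers used in the table footnote.
# Assumption: 17 = 19 episodes - 2 estimated parameters (intercept, slope).
CRITICAL_T_DF17 = (
    (2.898, "***"),  # p < .01
    (2.110, "**"),   # p < .05
    (1.740, "*"),    # p < .1
)


def significance_stars(t_stat):
    """Return the star marker for a t statistic, or '' if not significant."""
    for threshold, stars in CRITICAL_T_DF17:
        if abs(t_stat) > threshold:
            return stars
    return ""


# Hypothetical t statistics, for illustration only.
for t in (3.4, -2.3, 1.8, 0.6):
    print(f"t = {t:+.1f} -> '{significance_stars(t)}'")
```

A model with more than one predictor, such as a set of genre indicators fitted together, would have fewer degrees of freedom and slightly larger critical values.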
The figures below show scatterplots
and fitted lines for the five significant or marginally significant predictors
of fan rating.
Or rather, they would have if I could have gotten the scatterplots and fitted lines to copy onto the blog. But trust me they were awesome and Bradford did them amazingly.
Based on this data, for the MST3K
episode to be most enjoyable, the base movie will 1) not be monotonous; 2)
provide an interesting premise; 3) contain more, rather than fewer,
instances requiring a suspension of disbelief; 4) implicitly promise, then
fail to deliver, important plot points; and 5) be shorter, rather than
longer.
IV. CONCLUSION
In our lives as American consumers,
we are served up a lot of crap, through life and what is presented to us as
entertainment. Life is hard, and
sometimes the only movie playing at the local theater is Transformers II. We find
ourselves identifying with Mike and Joel and the robots, forced to consume
despair-inducing junk because there is no other choice—and, in fact, it is sold
to us as quality. We are told these
things are good, and they present themselves as enjoyable and desirable even
though we know they are not.
There is a palpable despair in
contemplating this cognitive dissonance.
“Is this all there is?” we can feel like asking. “Is this supposed to be
what resonates? Is this it?”
While one cannot outright reject
difficulty and despair, one can confront it rather than giving in. This is done, as the hosts demonstrate, by
banding together with others and calling out absurdity through camaraderie and
humor.
Most people’s lives have variable
and engaging premises with a certain understanding that variable and engaging
things will be delivered—the American Dream.
Yet life can fall into monotony that more often than not fails to
deliver anything overly spectacular.
It’s in this strange groove that the best MST3K episodes fall: the
episodes cannot be too awful, or they merely incite boredom. No, there needs to be a palpable despair,
induced by a movie promising entertainment and escape and then falling so
spectacularly short of it.
It seems the best MST3Ks are little
microcosms of an average human existence.
But instead of quietly accepting this and being driven mad by the
despair (as the Mads hoped Mike and Joel might) it is best confronted through
humor and unity.
A good MST3K episode can transcend
ages. Young kids enjoy it just as much as
adults do, so the entire family can enjoy episodes together, and as a family of eight
with kids spanning 11 years, we have done just that. Not only does the
family enjoy watching MST3K together, but the show also provides years of continual bonding
and a source of inside jokes for the family. Comedy can
unite even perfect strangers. For example, two strangers who know
the story of “Alice in Wonderland” bond when, after the first tries to get his
broken watch working, the second says "but it was the best butter."
So too does a family or group build bonds when, in the right
situation, someone comes out with an appropriate memorable quote, such as
"Huge slam on anteaters out
of nowhere!" or "‘Sidehackin’ is the thing to do," or
"Don't be splayed, don't be splayed!" or "McCloud!" or
even "Rowsdower!" These become inside jokes for all members of
the family, lighten the mood, and create more unity, each just as poignant, in its
own way, as “Here’s looking at you, kid.”
We, the authors, have watched
Casablanca as a family and in groups, and it is indeed an amazing film with
great personal and human resonance and cultural relevancy. But like many great films, it is a great
personal experience, while a good MST3K is a group experience in overcoming
despair, and fostering unity and friendship.
Never show a good movie in the
middle of your crappy movie; it will ruin the fun.
Ha! I found it! I was hoping this was online, because I lost my hard copy. (And, I just watched 'Overdrawn at the Memory Bank.)
HOORAY I'm so glad you did!!!