Movie data analysis – movies turning 20

Have you seen the list of 40 movies turning 20 years old? Released in 1993, these movies include Jurassic Park, Robin Hood: Men in Tights, Cool Runnings, Mrs. Doubtfire, and Free Willy. (The full list is linked at the bottom of the post.) Now you may be thinking to yourself: how could spreadsheets possibly be related to movies?

Spreadsheets can help us answer questions about movies, provided we have the kind of data that movies offer us, such as ratings, gross sales, or the number of theaters a movie was shown in, to name a few. With a spreadsheet and some problem-solving skills, we can analyze the data and possibly arrive at some interesting findings.

Let’s say we list the IMDb ratings for the 5 movies mentioned above in a spreadsheet. We can then do some basic statistical analysis:

    • What is the highest rating in this data set?
    • What is the lowest rating?
    • What is the average movie rating?

Let’s check out the functions MAX, MIN, and AVERAGE. For an introduction to functions, please click here. As their names suggest, the MAX and MIN functions return the maximum and minimum values, respectively, within a data set. The AVERAGE function returns the arithmetic mean: the sum of all values divided by the count of values in the data set.
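For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of what these three functions compute. The movie names and ratings are made-up placeholders, not the real IMDb numbers, and the cell range B2:B6 in the comments is only an assumption about where the ratings might sit in a sheet.

```python
# Hypothetical placeholder ratings; these are NOT the real IMDb values.
ratings = {
    "Movie A": 8.0,
    "Movie B": 6.5,
    "Movie C": 7.0,
    "Movie D": 7.0,
    "Movie E": 6.0,
}

values = list(ratings.values())

highest = max(values)                # spreadsheet equivalent: =MAX(B2:B6)
lowest = min(values)                 # spreadsheet equivalent: =MIN(B2:B6)
average = sum(values) / len(values)  # spreadsheet equivalent: =AVERAGE(B2:B6)

print("Highest rating:", highest)
print("Lowest rating:", lowest)
print("Average rating:", round(average, 2))
```

The average line is the definition in action: add up every value, then divide by how many values there are.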

Discover how these functions (highlighted in yellow) are applied in the Google spreadsheet below.

[googleapps domain=”docs” dir=”spreadsheet/pub” query=”key=0ArU-OSCYb_YpdHBxY3FHZVN3OU5DbkxJR052MEJNQUE&output=html&widget=true” width=”624″ height=”300″ /]

Yes, we agree that this example can easily be done by hand or with a calculator. But as your data set grows, these functions become far more useful. For instance, let’s say we wanted to analyze all 40 movies in the list or the top 100 grossing movies of 2012. Besides the maximum, minimum, and average shown above, we can calculate other statistics, such as other measures of central tendency (median, mode), dispersion (standard deviation and variance), and rank (PERCENTRANK). You can also use functions to quickly count the number of data points you have (COUNT, COUNTA).
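To make these extra measures a little more concrete, here is a rough Python sketch using the standard library's statistics module. The ratings are again made-up placeholders, and percent_rank is a hypothetical, simplified helper: it only handles values that appear in the data, and a spreadsheet's PERCENTRANK may treat ties and in-between values differently.

```python
import statistics

# Hypothetical placeholder ratings; these are NOT real movie data.
ratings = [8.0, 6.5, 7.0, 7.0, 6.0]

median = statistics.median(ratings)      # middle value once sorted
mode = statistics.mode(ratings)          # most frequent value
stdev = statistics.stdev(ratings)        # sample standard deviation
variance = statistics.variance(ratings)  # sample variance


def percent_rank(data, value):
    """Simplified percent rank: share of the other values below `value`."""
    below = sum(1 for v in data if v < value)
    return below / (len(data) - 1)


# COUNT tallies numeric cells only; COUNTA tallies every non-empty cell.
cells = [8.0, "n/a", 7.0, None]  # hypothetical column with text and a blank
count = sum(1 for c in cells if isinstance(c, (int, float)))
counta = sum(1 for c in cells if c is not None)

print("Median:", median, "Mode:", mode)
print("Std dev:", round(stdev, 3), "Variance:", round(variance, 3))
print("Percent rank of 6.5:", percent_rank(ratings, 6.5))
print("COUNT:", count, "COUNTA:", counta)
```

The last two lines show why both counting functions exist: a column with a text entry and a blank cell gives different answers depending on which one you use.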

Check all this out in our video below. Of course, if you wish to home in on a particular topic or concept, search for the video tutorial on our YouTube channel or website.

[youtube=http://youtu.be/7PIz1n8zqEc&w=640&h=385]

Resources:

  • IMDb ranking movie data spreadsheet:  click here
  • Top-grossing 2012 movie data spreadsheet:  click here
  • Buzzfeed’s list of 40 movies turning 20:  click here