A good friend introduced me to a site by Payscale that compares the “value” of a college education through a measure called the ROI (Return on Investment), broadly calculated as the difference between the expected future income stream of a college graduate and the actual cost of attending college.
While this measure is one way to quantify “value,” keep it mind it ignores various intangible benefits of attending college (such as social network, happiness or personal fulfillment, etc). That said, our goal today is not critique this measure. Rather, I’d like to pose a challenge to you.
In our previous post, we introduced a spreadsheet analysis of college data: ABCD: Analyzing Big College Data. This data included the college’s name, location, type, graduation rate, and more. From Payscale, we now have new data, including college ROI.
How can we easily merge both datasets, creating one spreadsheet? In other words, how can we create an all-encompassing “master” spreadsheet that includes both our former data (college name, region, tuition, average professor salary, etc) and new data (ROI) for each college?
Let’s take a step back and ask why we should even consider doing this. In short, including all the data onto one spreadsheet facilitates data analysis! Specifically:
- We can more nimbly copy and paste formulas across the columns or rows of one spreadsheet. For instance, once included on the same spreadsheet, we can calculate the correlation between school rank and ROI.
- It is easier to create a pivot table when all the data is on the same spreadsheet.
- Data visualization (i.e. charts and graphs) is typically easier when all the data is on the same spreadsheet.
Great! So how do we go about doing this? One way is to manually look for the college name in both spreadsheets and copy and paste the new data into the original spreadsheet, which would then become our master spreadsheet. This is a perfectly fine solution when you have a small sample of data or an immense amount of time and patience! But what if that’s not the case? Our original spreadsheet consisted of over 200 colleges, so entering in the ROI from the new database for each college would be time-consuming and laborious!
Fortunately, there is a spreadsheet function that could make our life much easier: the VLOOKUP function. The VLOOKUP function is very appropriate given how the data in our spreadsheet is organized. In a nutshell, the VLOOKUP function looks up a value in the first column of a table and returns a value in the same row from another column in the table.
Discover the VLOOKUP function in our 8-minute video tutorial:
Let’s break our goal into small steps. Once again, our mission is to create a master spreadsheet that includes college data from both our original spreadsheet and a new source.
Step 1 would be to transfer the data from the Payscale site into a spreadsheet. We have already done this step for you by copying and pasting the data into Google spreadsheet. You will see two tabs: the “Master” tab, which is where we will use the VLOOKUP function, and the “Payscale” tab, which incorporates the data from the Payscale site.
Click here to access the spreadsheet (read-only). To edit it, first save it on your Google drive: College Data and VLOOKUP
Step 2: Let’s try out the VLOOKUP function. As an example, we show how to use this function to incorporate the “category” data into “Master” tab from the Payscale tab. As a sample, we only show the first 20 or so colleges in our “Master” tab.
Can you use the VLOOKUP function to pull in annual ROI data?
Keep in mind that while the VLOOKUP function is powerful, it is not perfect. This function is most effective when the value or term you are looking up is exactly the same across both data-sets. As such, there may be situations where an “error” occurs because the names do not match perfectly. This can be remedied by copying and pasting the terms so they match in both data-sets.
Related posts: