How to Steal Baseball Signs with Spreadsheets

Back when I was in 11th grade, I was one of the benchwarmers on the high school baseball team. Aside from needing to be ready to go in for pinch running duties, we also got assigned various tasks like statkeeper, 1st base coach, foul ball retriever and batboy (that one was the worst). Everyone left over got assigned the vague job of sign-stealing. Usually our attempts to steal signs were futile – between the wide array of touches, wipes and tugs, there was just too much going on to really keep track. But what if there were some way of using spreadsheets to enhance our sign-stealing efforts?

During a baseball game, signs are happening everywhere, allowing the managers to communicate to each player without verbally shouting or huddling after every pitch.

Baseball Signs GIF

This great graphic from Edward Tufte illustrates the flow of baseball signs:

There are a number of different baseball sign systems – positive indicator, negative indicator, number patterns, more can be found here. However, for our spreadsheet, we’ll focus on the positive indicator system, which in our experience is relatively more common.

Define Spreadsheet inputs:

One entry for every sign sequence given by the third base coach, and whether anything happened on that play

Define Spreadsheet outputs:

The spreadsheet will help point us to which sign is likely to be the indicator, and then for each indicator, it will point us to which signs are likely to be the real signs for each action (bunt/steal)

Step 1: Create a system for entering in the 3rd base coach’s signs:

In cells B4:C20, we created a code for the various body parts that are touched during the signs. In row 23 and beyond, we have individual entries for the signs given during the game. B23:B34 is what happened on the subsequent play, and C23:N34 is the sign sequence that preceded the play. For our example, we are using the RAND() function to generate random signs.

Step 2: Start decoding – finding the indicator

First we try to figure out what the indicator is. We will compare the signs used when something happened (bunt, steal) with the signs used when nothing happened to see if any sign is more much more frequent in the real signs versus the nothing signs. In theory, almost all of the plays resulting in steals/bunts should have the real indicator included in that sign sequence (however, some players might have the green light to steal without the coach’s orders).

We want to count the number of touches for each body part for when nothing happened, and then count the body part touches for when something happened, and then compare the relative frequencies. There are various ways of doing this, but the way we’ll do it is putting a separate to the table to the right, where each entry is the combination of the Result and the Sign:

Note the $ on the $B23 to keep that one referencing the Result column B when we paste it over – you can get a reminder on references in our video: SUMIF, paste, & Reference.

Now for each sign, we use the COUNTIF function to count how many times it showed up in a sign sequence where Nothing happened:

Then calculate the percentage of total signs each sign is used for the nothing signs:

Then we do the same thing for the signs where something happened (bunt or steal), and note the difference.

We added conditional formatting to help us see which body part shows up more in the real signs than the fake signs. Here, it looks like Right Forearm (6.98% difference), Right Thigh (6.39%), Left Forearm, and Left Wrist are the most promising potential indicators. When the right forearm was touched, something happened 12.3% of the time, while when it went untouched, only 5.3% of the time did anything happen, so it might be our indicator. We are using random data here, but if this were real data, we’d imagine the real indicator would show up almost every time when something happened on the play.

Step 3: Use a similar procedure to figure out which sign is bunt and steal

Now for each indicator, we can use a similar procedure to figure out which body parts most likely correspond to the real sign to steal or bunt. We create an input field in cell L2, for the indicator that we are testing for as the real indicator. For each row, we then use the MATCH() function to see whether the indicator shows up in the row – if it doesn’t, the function returns 0, if it does, it returns the number location of the match:

Okay, bear with us for this next formula – we are doing the same thing as we did with the indicator, but here, we only want to display an output for the signs that are delivered after the indicator is signed, otherwise we display a blank. We add in column numbers at the top and only display the Action+sign if it comes after the match() column number for the indicator that we just calculated:

Now we go back to the output table at the top and use the same procedure as the indicator to count how many of each sign was signed after the indicator we are testing (right forearm) was touched. The real steal and bunt signs should show up frequently when a steal or bunt actually happened and infrequently when it doesn’t. The spreadsheet calculates and displays the difference in a way that points us in the right direction. As a reminder, we are just using random data here, but hopefully in real life something would pop out here that would show up as near 100%.

It’s possible that the indicator we are testing isn’t the real indicator. The beauty of spreadsheets is that we wrote all our logic referencing cell L2, and made it flexible enough so that we could easily test another indicator by changing that cell. Let’s try the right thigh (RT):

Maybe right thigh is the indicator, left forearm is the steal sign, and right thigh is the bunt sign!

You can access the spreadsheets here: