Tangotiger Blog

A blog about baseball, hockey, life, and whatever else there is.

Wednesday, April 17, 2024

Math behind Vertical Movement

  • This is a Kodai Senga curveball, thrown in the dirt, at 63 mph.
  • This is a Kodai Senga curveball, thrown in the heart of the plate, at 73 mph.

This chart shows the trajectory of these two pitches, as they travel through SPACE (click to embiggen).

Now, let me show you the height location of these two pitches, as they travel through TIME.

As you can see, these two pitches overlap a great deal.

Let's talk about vertical movement. The traditional way to measure movement is to draw a tangent line from some point (release for example) to the plate, and compare where that line ends up to the pitch's actual location. You can see an example here.

In this particular case, the faster curveball would have its tangent line go from release to about 7 feet above the ground. We compare that to its actual location at the plate (about 2 feet), and that difference (about 5 feet) is its vertical drop. This pitch had a vertical drop of 63 inches.

We can break it up into two components: 

  • the effect of gravity
  • everything else

Gravity is easy enough to figure, if you remember your physics kinematic equation: one-half acceleration time-squared. At 0.514 seconds, we get 0.5 x 32.174 x 0.514 x 0.514 = 4.25 feet, or 51 inches. So, 51 inches due to gravity and 12 inches due to everything else. That 12 inches is what we call the Induced Vertical Break (IVB).  (Though we really should account for the effect of drag too.)

The slower curveball would have its tangent line go also from about 7 feet to just below the ground, for a total vertical drop of 93 inches. How does that split into gravity and everything else? You might think gravity is 51 inches, but no. Gravity is not based on distance, but time. At 0.610 seconds, that means gravity pulled this pitch down by 72 inches. The remaining, the Induced Vertical Break (IVB), is 21 inches.
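The gravity and IVB split for the two pitches can be sketched in a few lines. This is a minimal sketch using only the numbers quoted above (flight times of 0.514 and 0.610 seconds, total drops of 63 and 93 inches); the function names are my own, and drag is ignored, just as in the text.

```python
G_FT_PER_S2 = 32.174  # gravitational acceleration in ft/s^2, the value used in the post


def gravity_drop_inches(flight_time_s: float) -> float:
    """Drop due to gravity alone: one-half * g * t^2, converted from feet to inches."""
    return 0.5 * G_FT_PER_S2 * flight_time_s ** 2 * 12.0


def induced_vertical_break(total_drop_in: float, flight_time_s: float) -> float:
    """Everything-else component: total vertical drop minus the gravity drop."""
    return total_drop_in - gravity_drop_inches(flight_time_s)


# Faster curveball: 63 inches of total drop over 0.514 seconds
faster_ivb = induced_vertical_break(63, 0.514)
# Slower curveball: 93 inches of total drop over 0.610 seconds
slower_ivb = induced_vertical_break(93, 0.610)

print(round(gravity_drop_inches(0.514)), round(faster_ivb))
print(round(gravity_drop_inches(0.610)), round(slower_ivb))
```

Note the sign convention here follows the text: a positive number means extra downward movement beyond gravity.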

Now, how is it possible that two pitches, thrown with similar spin and trajectory, can have such widely different IVB? Time. You see, we are measuring the Induced Vertical Break not with a common time frame, but with a common distance frame. While both pitches travelled the same amount of distance, the slower one took much longer to get there. So, not only does gravity have more time to pull the ball down, but the spin of the ball also has more time to push the ball down.

What if we wanted to separate that everything-else, that IVB, into both a spin component and time component, could we do that? Sure thing. First, we have to figure out how much of a time frame we are interested in. I will suggest, mostly for illustrative purposes, that we'll measure the downward movement over a span of 0.3 seconds.

When we do that, the effect of gravity is now identical for both pitches: 17 inches. We've been able to effectively neutralize gravity. And this is true not only for these two curveballs, but for any pitch, no matter how fast or slow.  Gravity will always pull a ball down by 17 inches over a 300 msec time frame. In addition, the vertical drop of the two Senga pitches is also identical at 22 inches. These two balls spin the same, so given the same amount of TIME (not distance) they will also have the same spin-effect.

This is how both pitches look broken down into components, all numbers in inches:

63: vertical drop over full flight (the faster curveball, 73 mph)

  • 22: vertical drop over 300 msec
      - 17: gravity over 300 msec
      - 5: induced over 300 msec
  • 41: vertical drop over remaining flight
      - 34: gravity over remaining flight
      - 7: induced over remaining flight

93: vertical drop over full flight (the slower curveball, 63 mph)

  • 22: vertical drop over 300 msec
      - 17: gravity over 300 msec
      - 5: induced over 300 msec
  • 71: vertical drop over remaining flight
      - 55: gravity over remaining flight
      - 16: induced over remaining flight

Is there a reason to prefer 300 msec over the entire flight? Imagine for example trying to measure IVB on a curveball thrown from the catcher to 2B. Because of that amount of distance (or time), that throw would have a massive amount of vertical drop. A large portion would be gravity, but a good amount would also be IVB, even though the spin of the ball would be no different than throwing from the mound.  A batter does not need to worry about the entire trajectory of the pitch, just a small portion of the flight, inside the decision-making zone.  To that end, we don't need to worry about measuring everything from release or from a common distance.  The decision-making zone just requires a common TIME frame.

It would therefore become a decision point for the analyst and user as to what you want to do with IVB, how you want to compare two different pitches. Are you trying to isolate the effect of the spin of the ball? Or, are you trying to include the speed of the throw as well? What is it that you are trying to isolate?

From my standpoint, we can actually isolate everything. And then the user is free to include or exclude whatever components they want.

***

A few extra math notes. You can estimate the effect of gravity using pitch speed as (523 / Speed) ^ 2

A 63 mph pitch for example would be (523 / 63) ^ 2 = 69 inches (in the above pitch we calculated 72).

And 73 mph pitch: (523 / 73) ^ 2 = 51 inches (same number as we calculated above).

This shorthand is especially useful for those who don't have the time-to-plate handy, but do have the pitch speed handy.
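Here is that shorthand as a small function. The constant 523 comes straight from the text; the second function is my own back-of-envelope inference, not something stated in the post: if the gravity drop is one-half g t-squared, the shorthand behaves as if time-to-plate were about 37.6 divided by speed in mph.

```python
import math

G_IN_PER_S2 = 32.174 * 12.0  # gravitational acceleration in inches/s^2


def gravity_drop_from_speed(speed_mph: float) -> float:
    """Shorthand from the post: estimated gravity drop in inches is (523 / speed)^2."""
    return (523.0 / speed_mph) ** 2


def implied_time_to_plate(speed_mph: float) -> float:
    """Assumption (not from the post): invert 0.5 * g * t^2 to get the flight time
    that the shorthand implicitly uses for a given pitch speed."""
    return math.sqrt(2.0 * gravity_drop_from_speed(speed_mph) / G_IN_PER_S2)


for mph in (63, 73):
    print(mph, round(gravity_drop_from_speed(mph)), round(implied_time_to_plate(mph), 3))
```

For the 73 mph pitch the implied time lands very close to the measured 0.514 seconds. For the 63 mph pitch it comes in a bit under the measured 0.610, which is why the shorthand gives 69 inches rather than 72.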

***

The total amount of break is proportional to the square of the time. In the above two pitches, one took 610 msec to get to the plate, while the other was 514. Divide the two, and we get 1.19. Square it to get to 1.41. Remember how the total vertical drop for the two pitches was 93 and 63 inches? 93/63 is 1.48. We're off by a few inches, mostly because these two pitches are not identical.

We can also look at the 300 msec compared to the full flight of 514 msec. Since the 514 msec pitch had a total vertical drop of 63 inches, we'd therefore scale by 300/514, or .584, which we square to get .34. And 63 x .34 is a bit over 21 inches, within an inch of the 22 we calculated earlier.

Similarly, the other pitch is 300/610, or .492 squared, or .24. Which we multiply by 93 to give us 22 inches, the same earlier number we calculated.
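The time-squared scaling in these notes can be checked quickly. This is a sketch that assumes constant acceleration over the whole flight (which is what "proportional to the square of the time" amounts to); the function name is mine. Under that assumption the numbers land within an inch or so of the ones quoted above, not exactly on them.

```python
def drop_over_window(total_drop_in: float, flight_time_s: float, window_s: float = 0.3) -> float:
    """Scale a full-flight vertical drop down to a shorter time window,
    assuming drop is proportional to time squared."""
    return total_drop_in * (window_s / flight_time_s) ** 2


# Ratio of flight times, squared, versus ratio of actual total drops
time_ratio_squared = (0.610 / 0.514) ** 2
drop_ratio = 93 / 63

# Drop over the first 300 msec for each pitch
faster_300 = drop_over_window(63, 0.514)
slower_300 = drop_over_window(93, 0.610)

# Gravity alone over 300 msec, in inches: 0.5 * g * t^2
gravity_300 = 0.5 * 32.174 * 0.3 ** 2 * 12.0

print(round(time_ratio_squared, 2), round(drop_ratio, 2))
print(round(faster_300, 1), round(slower_300, 1), round(gravity_300, 1))
```

The two 300 msec drops come out about an inch apart, which fits the note above that these two pitches are similar but not identical.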

***

Anyway, I hope all this shed some light on the Total Vertical Drop and the Induced Vertical Break.


Wednesday, April 10, 2024

Re-introducing WOWY NetGoals and NetShots for NHL

In a series of tweets many years ago, I described how to work with plus/minus.  You can see part 4, part 5, and part 6, especially.  One of the problems we have with plus/minus is the mixing of even-strength and specialty team goals (while PPG scored or allowed aren't included, shorthanded ones are).  Thanks to the terrific site from Natural Stat Trick, we can handle that aspect quite easily.  For my purposes, I selected Even Strength, though of course breaking that down into 5-on-5, 4-on-4, 3-on-3, would be helpful.  It's also not clear how goalie-pulled is handled (presumably even-strength), and that should be broken down as well.  But, no biggie here, we're just laying the groundwork.

The main problem with plus/minus is that it is heavily team dependent.  But, we can do something about it, which is what those three parts I linked above describe.  I'll implore you to read those, but I am going to presume some of you won't and will ALSO have an opinion as to why it doesn't work!  Again, please read those.  

If you really insist on not reading it, I'll give you the briefest of illustrations: if the Habs are +30 EV goals with Saku Koivu on the ice, and +30 with him off the ice, what can we do about it?  Well, first we have to know the ice time.  Let's assume Saku is on the ice one-third of the time (aka, one part on, two parts off).  So, the +30 off-ice needs to pro-rate down to +15.  The +15 is 5 semi-random Habs.  That makes each Hab player +3.  Next, we have Koivu + 4 Habs being +30.  Since we just established that each Hab player is +3, then Koivu + 12 equals 30.  And so Koivu is +18.  So, that's how +30 on, and +30 off converts to +18 for the player.  Again, please read those links.
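The Koivu illustration can be written out as a small function. This is only a sketch of the brief illustration above, not the full method from the linked parts; the function name and parameters are my own, and it assumes five skaters on the ice and that the off-ice teammates are interchangeable.

```python
def wowy_player_rating(on_ice_diff: float, off_ice_diff: float,
                       on_ice_share: float, skaters: int = 5) -> float:
    """With-or-without-you rating for one skater.

    on_ice_diff:  team goal (or shot) differential with the player on the ice
    off_ice_diff: team differential with the player off the ice
    on_ice_share: fraction of the team's ice time the player is on the ice
    """
    # Pro-rate the off-ice differential down to the player's amount of ice time
    off_prorated = off_ice_diff * on_ice_share / (1.0 - on_ice_share)
    # Spread that evenly across the semi-random skaters who are on without him
    per_teammate = off_prorated / skaters
    # On the ice, the player is with (skaters - 1) of those average teammates
    return on_ice_diff - (skaters - 1) * per_teammate


# Koivu: +30 on, +30 off, on the ice one-third of the time
koivu = wowy_player_rating(30, 30, 1 / 3)
print(round(koivu, 1))
```

The same function works for NetShots: just pass shot differentials instead of goal differentials.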

Back in 2006-2008, I worked for a few NHL clubs, via a consulting company.  One of the stats we created was called NetShots.  You may know it by other terms since then, like Corsi, or Fenwick, or SAT or USAT.  They all do the same thing: count the number of shots taken by the team and allowed by the team, with the player on the ice.  It's just a matter of what to call a "shot", whether you limit yourself to shots ON goal (meaning it's either a save or a goal), or shots AT goal (includes wide shots, or shots that hit the crossbar/post), or even blocked shots.  I'm not here to tell you which is the best, or maybe how to even weight them.  Well, I have written about it in a post I called Weighted Shots Differential, but I haven't really tested it, and others are free to do so.

So everything that applies to Net Goals can also be applied to Net Shots.  Shots are often preferred for their volume, and for neutralizing the goalie.  That's fine, I'm not here to choose.  I'll just show both.

My purpose here is just to present the work I did last night, basically spending 5 minutes downloading the data, another 20 minutes making the calculations, and 20 minutes writing this blog post.  My hope is others pick up the many pieces here, read through my entire set of articles, and come up with something more comprehensive.  In the meantime, here you go (click to embiggen).

  • McDavid is at +25 WOWY NetGoals and +156 NetShots.  
  • AM34 is +29, +152.  
  • MacKinnon is +31, +150.
  • Hyman: +28, +154.  We're going to have a tough time separating linemates here.
  • Kucherov: +17, +91.  This is after our best effort at handling the Bolts
  • Barzal: +1, +158.  This is a huge disconnect between actual goals, and actual shots.  What to believe?
  • Brady Tkachuk: +7, +178.  The leader in WOWY NetShots among forwards, but the goals don't match.  What to do?
  • Jack Hughes: -6, +96.  That's a reverse effect
  • Crosby: +10, +92. He's still going strong
  • Giroux: -10, +87.  Like Jack
  • Jordan Staal: -23, +48.  Whoah, that's quite the disconnect
  • Connor Bedard: -13, -37.  Sorry, dude, nothing to hang your hat on here

How about the D?

  • Forsling: +33, +64.  Good with NetShots, and outstanding on NetGoals.  Why?
  • Ekholm: +27, +152
  • Quinn Hughes: +27, +101
  • DeMelo: +27, +10.  Barely above average on NetShots, outstanding on NetGoals.  Again, why?
  • Evan Bouchard: +22, +194, our leader in WOWY NetShots
  • Dahlin: -1, +182.  Second in NetShots, and average in NetGoals, which is a tremendous disconnect.
(6) Comments • 2024/04/11

Tuesday, April 02, 2024

Bayesian inference: How much new information is contained in a streak?

Suppose a team starts the season 3-0, but you had them entering the season to end 81-81.  What is your new forecast?  If that 3-0 record means nothing at all, then you'd assume they play .500 the rest of the way (in 159 games), and you add 3 wins to that, and you get 82.5 wins.  In other words, they gain +1.5 wins in the final season forecast, based strictly on 3-0 being +1.5 wins ahead of 1.5-1.5.

But, the pre-season forecast can't carry a weight of an infinite number of prior games, such that adding 3-0 to that means nothing at all.  Suppose that the pre-season forecast has the weight of three full seasons.  Here's what happens to the final season forecast after streaks of 1 to 20 games, and how much information those extra wins give us.

This three-season weight is an ILLUSTRATION.  We need to figure out that weight.  My expectation is that that weight is going to be somewhere between 1 and 3 seasons of weight.  Aspiring Saberists, Assemble.
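For those who want to play with the weight, here is one way to set it up. This is a sketch under my own assumption about the mechanics, since the calculation behind the chart isn't spelled out here: treat the pre-season forecast as that many prior games at the forecasted win%, add the streak to it, and apply the blended win% to the remaining games. The function name and defaults are hypothetical.

```python
def updated_forecast(wins: int, losses: int,
                     prior_win_pct: float = 0.500,
                     prior_weight_games: float = 3 * 162,
                     season_games: int = 162) -> float:
    """Final-season win forecast after an opening record of wins-losses.

    The pre-season forecast is treated as `prior_weight_games` games played
    at `prior_win_pct` (an assumption about how the weight is applied).
    """
    played = wins + losses
    # Blend the prior "games" with the observed record to get an updated talent estimate
    talent = (prior_win_pct * prior_weight_games + wins) / (prior_weight_games + played)
    # Wins already banked, plus expected wins over the rest of the schedule
    return wins + talent * (season_games - played)


# 3-0 start, forecast carrying the weight of three full seasons
three_season = updated_forecast(3, 0)
# Same start, but the forecast is treated as (nearly) infinitely certain
no_information = updated_forecast(3, 0, prior_weight_games=1e12)

for streak in (1, 3, 5, 10, 20):
    print(streak, round(updated_forecast(streak, 0), 2))
```

With an effectively infinite weight, the 3-0 start is worth only the banked +1.5 wins (82.5). With a three-season weight, it comes out to roughly 83, so about half a win of new information. Change `prior_weight_games` to test weights of one or two seasons.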

(1) Comments • 2024/04/02 • Statistical_Theory

Monday, April 01, 2024

Squeeze play in the ninth while already having an 80% chance of winning?

So, this was a pretty odd play. It's the top of the 9th, tie game. The nominal win% for the home team is .500. (It should be .520, all other things equal, and naturally we need to consider the identity of the players, but let's make it easy, and use the base charts available here.)

The first batter walks. That's normally worth .030 wins, but in this high leverage scenario, it's worth .082 wins. Now the home team chance of winning goes down to .418. The next batter has an almost easy DP, but the 2B misplays it for an error, with runners now at the corners. Terrible fielding by the home team drops their chances of winning all the way down to .194.

Having a runner on 3B with less than two outs is a very powerful thing for the batting team. They are in control. So what do they try? A bunt. And not a safety-squeeze, no. But the other kind. Let's see what the batting team might have been thinking.

Remember, the chance of the fielding/home team winning is down to .194. If they pull off this bunt, we have a runner scoring, and runners on 1B and 2B. That puts the winning % for the home team down to .103. But if the worst happens (as it did), the home team chance of winning goes skyrocketing to .498. In other words, that walk that happened, that error that happened? All of that vanishes, and we are back to where we were when we started the inning.

So, you have a chance to gain +.09 wins with the perfect bunt, and you have a chance to lose .30 wins with a missed bunt. You have to be 77% sure that you will get that bunt.
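The break-even calculation is simple enough to write down. This is a minimal sketch using the two-outcome version above (perfect bunt or missed bunt); the function name is mine, and all inputs are the home team's win probabilities quoted in the text.

```python
def breakeven_success_rate(current: float, after_success: float, after_failure: float) -> float:
    """Success rate the batting team needs for the play to break even.

    All three inputs are the HOME (fielding) team's win probability, so the
    batting team gains when that number goes down and loses when it goes up.
    """
    gain = current - after_success   # what the batting team wins if the bunt works
    loss = after_failure - current   # what the batting team gives back if it fails
    # Break-even p solves: p * gain = (1 - p) * loss
    return loss / (gain + loss)


squeeze = breakeven_success_rate(current=0.194, after_success=0.103, after_failure=0.498)
print(round(squeeze, 2))
```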

And in fact, it should be higher. Because one of the outcomes is to score the run, but give up the out, and that sets the win% for the home team to .142, so a gain of only +.05 wins for the batting team. It's also possible you don't get the worst-case scenario, but just a really bad scenario: you lose the runner, but get the batter on base for a home win% of .421 and so a cost of .23 wins.

All in all, it's probably a play you need to make 85% of the time. Which is why you don't see this play performed. I'd like to say ever? An Aspiring Saberist out there can do the legwork.

See, there are places where guts and instincts have a role. When there's enough uncertainty in the math, if there are enough variables to consider, you can push up an extra .01 or .02 wins here, and push down an extra .02 or .03 wins there. There's uncertainty. But in this particular case, when the breakeven point is already ridiculously high, there's not enough uncertainty in the numbers to allow for guts and instincts to possibly play a role.

There's probably three dozen choices a manager can make throughout the game that could go contrary to the numbers, but be justified based on uncertainty of the numbers, and the guts/instincts of the manager. This play was nowhere close to being one of those that required the manager to step in.


Sunday, March 31, 2024

Extra Innings: whatsup?

Home win% in regulation games is 54%, but falls to 52% in extra innings for totally normal reasons, having nothing at all to do with who bats first or last. 

This chart shows the win% season by season since 1969.  I've included the Random Variation lines, which I have nominally set at 2 standard deviations.  This implies we should expect to see 5% of these 55 data points (aka 3) to land outside these two lines.  We see a lot more than that. Why, I don't know.  I used a flat 52% expected win%, and maybe it should be 51.5% one year and 52.5% another year.  I'll leave that to the aspiring saberists. (click to embiggen)
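For anyone picking this up, the Random Variation lines can be sketched with the usual binomial standard deviation. This is my assumption about how such lines would be drawn, not a description of the chart itself, and the 200 extra-inning games per season is a made-up placeholder, so plug in the real count for each season.

```python
import math


def two_sd_band(expected_win_pct: float, n_games: int, n_sd: float = 2.0) -> tuple[float, float]:
    """Lower and upper Random Variation lines for a win% observed over n_games,
    using the binomial standard deviation sqrt(p * (1 - p) / n)."""
    sd = math.sqrt(expected_win_pct * (1.0 - expected_win_pct) / n_games)
    return expected_win_pct - n_sd * sd, expected_win_pct + n_sd * sd


# Hypothetical season: flat 52% expectation, 200 extra-inning games (placeholder count)
lower, upper = two_sd_band(0.52, 200)

# Number of seasons we'd expect outside the lines by chance alone
seasons = 55
expected_outliers = seasons * 0.05

print(round(lower, 3), round(upper, 3), round(expected_outliers, 2))
```

Grouping seasons into five-year chunks multiplies the game count by about five, which narrows the band by a factor of roughly the square root of five.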

Of course, the most striking thing in the chart is what's happened since 2020, with the extra inning placed runner (XIPR).  Though inconveniently, the pattern started the season prior to that.  Anyway, here's how it looks when we group in chunks of five seasons.  The data point for "2010" you see at the bottom refers to seasons "2010 - 2014".  And "2015" is "2015 - 2019".

We should only see, maybe, one point outside the 2SD lines, but we see three, with the one from 2020-present way outside even this standard.  Something is definitely going on with how teams are approaching playing with the XIPR.  I'm sure an aspiring saberist can look into this.  Are there any teams that have figured it out?  I'll leave that up to y'all to show.

(3) Comments • 2024/04/01 • Playing_Approach

Monday, March 25, 2024

Explaining why Overtime and Extra Innings home-advantage is seemingly less than regulation

Let's create a simple basketball game, where regulation is made up of 100 free throws by each team.  The free throw line is several feet further out than the current rules, such that the average free throw percentage is 50%.  Home teams average 51%, while away teams average 49%.

In regulation play, after 100 throws from each, the home team will win 58.4% of the time, and tie 5.4% of the time.

Let's create an Overtime game that is 10 free throws by each team, to act as the tie-breaker.  With much fewer confrontations, the chance that the home team will win outright is much lower than during regulation.  It will actually be 44.7% to win outright, 37.7% to lose outright, and another 17.5% to get into another Overtime period.  The OT win% for such a home team will end up being 54.3%

And that means that the overall win% for home teams will be 58.4% plus 5.4% times 54.3%, or 61.4%.
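All of these percentages can be reproduced with exact binomial math. This is a sketch of the toy free-throw game as described above; the function names are mine, and I'm assuming each overtime period that ends tied is simply followed by another identical one.

```python
from math import comb


def binom_pmf(n: int, p: float) -> list[float]:
    """Probability of making exactly k of n throws, for k = 0..n."""
    return [comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def win_tie_loss(n: int, p_home: float, p_away: float) -> tuple[float, float, float]:
    """Home team's chance to win, tie, or lose when each side takes n throws."""
    home, away = binom_pmf(n, p_home), binom_pmf(n, p_away)
    win = tie = 0.0
    for h, prob_h in enumerate(home):
        for a, prob_a in enumerate(away):
            if h > a:
                win += prob_h * prob_a
            elif h == a:
                tie += prob_h * prob_a
    return win, tie, 1.0 - win - tie


reg_win, reg_tie, reg_loss = win_tie_loss(100, 0.51, 0.49)   # regulation
ot_win, ot_tie, ot_loss = win_tie_loss(10, 0.51, 0.49)       # one overtime period

# Ties just trigger another overtime, so only the decided periods matter
ot_win_pct = ot_win / (ot_win + ot_loss)
overall_win_pct = reg_win + reg_tie * ot_win_pct

print(round(reg_win, 3), round(reg_tie, 3))
print(round(ot_win, 3), round(ot_loss, 3), round(ot_tie, 3), round(ot_win_pct, 3))
print(round(overall_win_pct, 3))
```

Try changing 100 and 10 to other numbers of confrontations to see how the game-level advantage grows with the number of scoring chances.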

This is how you can take a team that has a seemingly small home-site advantage (making 51% of free throws instead of 50% at the individual play level), and have it balloon all the way to 61.4% at the game-level.  And why it's only 54.3% in Overtime.  Scoring Confrontations.  That's the reason.

And this explains why in MLB, the home team wins 54% of its games, and yet wins only 52% of its Extra Inning games.  In baseball, instead of having 100 scoring confrontations each, they have 9 innings each.  And the home team scores 52% of the runs in each inning.  And when you do that over nine innings, you end up winning 54% of games.  But it's still 52% over a single inning, like an extra inning.

You can apply this concept to hockey and soccer and you will see you can create a simple model to explain it.  Football is different because of the possession rule.


Thursday, March 21, 2024

Goodbye Pythag Wins, Hello Gradient Wins

Whether you use Pythag wins from Bill James or Patriot or the more basic one from Pete Palmer (0.1 wins per run), they all treat runs the same: you add it up at the seasonal level, and proceed.  None of them do it game by game.  

When we use the Palmer method, it works like this, if you try to apply it at the game-level: Win the game by 1 run, and that's worth +.10 wins above average or 0.60 wins.  Similarly, win by outscoring your opponent by 2 runs means that game is worth 0.70 wins.  Winning by 3 means it's worth 0.80 wins, and so on.  Winning by 5 is worth 1.00 wins.  Winning by 10 is worth 1.50 wins.

Losses operate the same way.  Losing by 1 run earns you 0.40 wins, losing by 2 earns you 0.30 wins.  Losing by 5 is 0 wins.  Losing by 10 is negative 0.5 wins.

Add all of these up, and over 162 games, you get your equivalent wins.  Of course, there's no point to doing this game by game, since doing it at the seasonal level as runDiff*0.10 + Games*0.500 gives you the identical answer.

We are of course missing the context of each individual game.  In the Palmer method, winning 9 games by 1 run, and losing one game by 9 runs gives you the same answer: total run differential is 0, and so, converts to the equivalent 5 wins and 5 losses.

But, what if winning by one run is not worth 0.60 wins like Palmer implies, but instead is worth 0.83 wins?  And winning by two runs doesn't add 0.10 wins like Palmer implies (worth 0.70 wins), but 0.09 wins, to give us 0.92 wins?  And winning by 3 runs adds 0.08 wins to give us 1.00 wins?  So, winning by THREE is a full win, while Palmer implies winning by FIVE is a full win.  

Here are the gradient wins each team gets, game by game:

Margin   Win     Lose
1        0.83    0.17
2        0.92    0.08
3        1.00    0.00
4        1.07   -0.07
5        1.13   -0.13
6        1.18   -0.18
7        1.22   -0.22
8        1.25   -0.25
9        1.27   -0.27
10+      1.28   -0.28

When we do that, how does 2023 look?  The Orioles you may remember had a modest run differential of +129 runs implying +13 wins above average or 94 wins.  The Gradient Wins approach gives them 100 wins.  They actually won 101 games.  What we are doing is giving them more credit for their close games, and not giving full credit to every run in a blowout win or loss.

The Marlins allowed 57 more runs than they scored, implying 75 equivalent wins.  The Gradient Wins approach gives them 80 wins.  They actually won 84.

That illustrative team of 9 games winning by 1, and 1 game losing by 9 is 9 actual wins and 1 actual loss, while Palmer said it's 5 wins and 5 losses.  In the Gradient approach, it comes out to 7.2 wins. 
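Here is that illustrative team worked through both ways. This is a sketch: the lookup table is copied from the values above, the function names are mine, and I'm treating the Lose column as one minus the Win column, which matches every row of the table.

```python
# Gradient wins credited to the WINNER, by run margin (10 means 10 or more)
GRADIENT_WIN = {1: 0.83, 2: 0.92, 3: 1.00, 4: 1.07, 5: 1.13,
                6: 1.18, 7: 1.22, 8: 1.25, 9: 1.27, 10: 1.28}


def gradient_wins(run_margin: int) -> float:
    """Gradient wins for one game; positive margin is a win, negative is a loss."""
    if run_margin == 0:
        raise ValueError("a completed game cannot end with a margin of zero")
    winner_credit = GRADIENT_WIN[min(abs(run_margin), 10)]
    # The loser gets whatever is left of the one win at stake
    return winner_credit if run_margin > 0 else 1.0 - winner_credit


def palmer_wins(run_margin: int) -> float:
    """Palmer method applied game by game: .500 plus 0.10 wins per run."""
    return 0.5 + 0.10 * run_margin


# Nine wins by 1 run, one loss by 9 runs
margins = [1] * 9 + [-9]
gradient_total = sum(gradient_wins(m) for m in margins)
palmer_total = sum(palmer_wins(m) for m in margins)
actual_wins = sum(1 for m in margins if m > 0)

print(round(gradient_total, 2), round(palmer_total, 2), actual_wins)
```

To run this on a real season, replace `margins` with each game's run differential from the team's schedule.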

Is this necessarily better than what Palmer or the Pythag methods suggest?  No.  Or at least, I don't know yet.  But it opens the door for better handling blowouts and close games.  We know winning by one run has to be more than 0.6 wins.  We can't treat each run the same.  We also know that winning by one and by ten runs can't be the same, even though you get one actual win.  Does it make more sense to give a one-run win 0.83 wins and a ten-run win 1.28 wins?  I know I like that more than giving it 0.60 and 1.50 wins as Palmer suggests.  And I know I like it more than giving 1 win and 1 win as actual wins says.

Next step is for Aspiring Saberists to take over.


(19) Comments • 2024/03/31 • Run_Win_Expectancy

Wednesday, March 20, 2024

Statcast: Update to Catcher Framing

We made an update in process, with a big payoff at the pitch-level, with an overall modest impact to the catcher framing.  The current method broke up the regions over the plate into 5 regions, with the prominent one being the Shadow-In (80% called strike rate) and Shadow-Out (80% called ball rate), with adjustments for pitcher and venue.  The new method updates the Shadow Zone process so it is a continuous probability from 0 to 100%, using the specific plate location, with adjustments for bat-side and pitch-hand.  Statcast Data Whiz Taylor did the bulk of the work here.

At the aggregated seasonal level, you won't see much difference.  Current Savant and Steamer at Fangraphs, for 2023, have a correlation of r=0.94.  This will increase to r=0.98 with the new model.  The current Savant process would apply adjustments at the aggregated level.  We did this because we never thought that we'd need to show the strike probability on a pitch by pitch basis.  And since Catcher Framing was one of the very first metrics we created, it languished in this regard.  But thanks to Taylor and their team, a process was built to apply adjustments at each pitch.  By doing that, it will allow us to slice/dice the data the way we do with other data, like Catcher Blocking and Throwing, etc.

Here is what the binned data (100 bins) looks like, comparing the predicted strike rate with the actual called rate. (click to embiggen)

Monday, March 18, 2024

NaiveWAR and WAR2.0: Jacob deGrom

As my side-project into NaiveWAR continues, I'd like to also highlight the work of Sean Smith, the progenitor of WAR at Baseball Reference.

I currently have two versions of NaiveWAR.  The first is based solely on a pitcher's Won-Loss record.  And the second is based solely on the pitcher's Runs Allowed and IP.  Whether in my version, or from Sean Smith, we present it in the form of Individualized Won-Loss Records (aka The Indys).  My biggest failing in presenting WAR was not including The Indys.  And based on what Sean is doing, he seems to perhaps agree as well.  

There's a good reason this is needed because the discussion over the replacement level was actually mostly noise to what is actually WAR.  That is my fault, as that conversation got away from me, and I didn't have a way to control that. 

Anyway, you can see my two versions on the left (and since this is deGrom, you'll be able to guess which version is which).  And Sean's version is on the right.  Sean of course is doing a lot more than what my Naive approach is doing.  And, you can see a tremendous amount of overlap.  Which really means that all that tremendous extra work, necessary work, is ALSO noise to the main discussion point of WAR.  Make no mistake about it: not only is Sean right for doing what he is doing, but I will also be doing an enhanced version (eventually, whenever I have the time).  

But more importantly: the Naive approach is necessary to bring everyone to the wading pool, before we jump into the deep end.  WAR has taken on a life of its own, too easy to dismiss because it's too easy not to learn what it is.  That's why the Naive approach is necessary.  We need folks to get into the wading pool, and then into the shallow end, before we get into the deep end.  And what we see with deGrom above is that the difference between the shallow end (Version 2) and the deep end (Sean's version) may not be that big of a dive.

(11) Comments • 2024/03/20 • WAR

Thursday, March 14, 2024

Statcast Lab: Catcher knee height prior to pitch release

Just the first step in looking at this. The left column is the height of the left knee, the top row is the right knee. 

The most common position for the catcher is for the left knee (the glove knee) to be 3-5 inches off the ground, while the right knee is 17 to 20 inches off the ground.

We do see the catcher often enough with their right knee 3-5 inches off the ground, with the left knee 11-20 inches off the ground.  I should probably split this by bat-side (and maybe pitch-hand).

Having both knees up, 14-19 inches off the ground is the least popular of the setups.

I'll be looking as well to see how the called strike rate is affected based on the catcher stance.

(Click to embiggen)

(9) Comments • 2024/03/15 • Statcast Player Tracking

Sunday, March 03, 2024

NaiveWAR and VictoryShares

In my spare time, I'm working on an open-source WAR, that I call NaiveWAR.  Those of you who have been following me know some of the background on NaiveWAR, notably that it is tied (indirectly to start with) to Win/Losses of teams (aka The Individualized Won/Loss Records).  My biggest failing in developing the WAR framework was not also providing the mechanism for W/L at the same time.  That will be rectified.

The most important part of all this is that it's all based on Retrosheet data, and everyone would be able to recreate what I do.  And it would be totally transparent, with plenty of step by step discussion, so everyone can follow along.  I was also thinking of potentially using this as a way to teach coders SQL.  That's way out in the distance, still have to work things out, but just something I've been thinking about as I'm coding this.  I even have the perfect name for this course, which I'll divulge if/when this comes to fruition.

Interestingly, RallyMonkey, who is the progenitor of the WAR you see on Baseball Reference, seems to be embarking on a somewhat similar campaign. You can see a lot of the overlap, with tying things to W/L records, with the emphasis on Retrosheet.  The important part of doing that is we'd be able to do it EACH way, with/without tying it to W/L, so you can see the impact, at the seasonal, and career, level. In some respects, he'll go further than I will with regards to fielding, mostly because I have so little interest in trying to make sense of that historical data, given the level of access Statcast provides me.  But also partly because by me not doing it, it opens the doors for the Aspiring Saberists to make their mark, that somewhere between my presentation and Rally's presentation, they'll find that inspiration.

All to say: I dunno what I'm trying to say!

(17) Comments • 2024/03/06 • WAR

Saturday, February 24, 2024

Complete Historical Run Expectancy Chart

This shows the following, for the entirety of the Retrosheet data, broken up into roughly 15-20 year time periods

  • Run Expectancy
  • Run Frequency
  • State Frequency
  • Run Value of HR

(Click to embiggen)

(18) Comments • 2024/03/04 • Run_Win_Expectancy

Thursday, February 22, 2024

When is Replacement Level not the Replacement Level

The concept of Replacement Level (though I prefer the term Readily Available Talent, which you will see makes more sense) is pretty straightforward.  What kind of contribution can you get for the minimal cost?  If you have no farm system at all, that level is roughly a .300 win% level.  That's the Readily Available Talent.  By spending the absolute minimum on the free agent market, you will field a .300 team.  At least theoretically.  

In reality, all clubs have a minor league system.  And they spend millions of dollars on players and player development and player acquisition.  Because those players are now Readily Available Talent for no ADDITIONAL cost (the money spent is already sunk), suddenly, the baseline level player is not a .300 win% talent, but probably closer to .350 win% talent.  While this player would cost you just the league minimum, it did cost you in terms of your minor league setup.  

This is why it gets tricky when you try to decide what the baseline level is.  Furthermore, if you decide to field an entire team only from the minor leagues, well, not all the players will be .350 win% talent.  After your very top prospects, you will start to go below that .350 win% level quite quickly.  So a team of your best 40 minor league players is likely going to play below a .300 win%, probably even down to .250 or .200 win%.

Therefore, while the concept of Readily Available Talent is real, as it's where all the decision-making happens, the actual level really requires different baselines for different uses.  Sorry to make unclear something that lacked clarity to begin with.

Sunday, February 18, 2024

Attribution of Player Performance within the Context of the game

This is how y'all see it...

Poll 1:

Steve Young goes 15-15 for 180 yards in the 1st half. The score is somehow 0-0.

Joe Montana plays the 2nd half, goes 3-15 for 36 yards, with all 3 completions being TD passes. The final score is 21-0.

Between Montana and Young, who gets more of the share for that 49ers win?

2 to 1 for Montana

Poll 2:

Young plays 1st half of every game completing 75-100% of his passes in each, averaging 12 yds per completion. Yet never throws a TD, his RB never score a TD

Montana plays 2nd halfs completing 20-40% of passes, averaging 9 yds per completion. Season total 32 TD

Who is 49ers MVP?

Almost 2 to 1 for Montana

Poll 3:

Wade Boggs goes for 4-4. Each time, with a runner on 1B. Those runners, plus Boggs, were left stranded at end of each inning

Bottom of 9th, PH Jim Rice leads off inning, hits walkoff HR, to make it 1-0 for Redsox

Between Boggs and Rice, who gets more of share for that Sox win?

Nearly 3 to 1 for Rice

Poll 4:

Wade Boggs has season for the ages, hitting over .400, OBP over .500. Yet somehow scored only 80 runs and drove in only 60, while batting 2nd.

Jim Rice was below average in BA, OBP, SLG, yet somehow managed to score 90 runs and drive in 110, while batting 6th.

Who was Sox MVP?

Overwhelmingly for Boggs

Poll 5:

Joe Carter plays double-header.

Game 1: 1-4 with 0 RBI

Game 2: 1-4 with 3 RBI

Everything else about what Carter did was same in both games

How do we give Carter attribution: with or without caring how it *directly* impacted that game?

Do both games get same value for Carter?

Almost 2 to 1 for Game 2

Poll 6:

Jack Black is pro blackjack player, regularly wins $1000-$3000 each day. But on this particular day, he lost $2000.

Alan is his friend at the next table, hitting when he should stick, sticking when he should hit. And yet on this same day, he ends up winning $3000.

What say you:

Overwhelmingly for Process over Results


Saturday, February 17, 2024

NaiveWAR 2023

Four years ago, in a series of tweets, I introduced NaiveWAR, essentially the simplest uber-metric possible.  I finally coded it up last night. Here are the results for 2023 (click to embiggen).  

True to its name, I used the absolute minimum information I could while still giving plausible, if naive, results.  The data is limited exclusively to the players who participated in: Runs, Outs, Plate Appearances. And that is it.  I couldn't have made it any more naive and still give plausible results. Cashmere Ohtani is that red dot.


Friday, February 16, 2024

Statcast Lab: Do some batters overswing?

On his 30% weakest swings, LHH Luis Garcia (Nationals) generated 2 runs per 100 swings above average.  On his 30% hardest swings, he generated 7 runs per 100 swings below average.  He led MLB in terms of that gap in performance.  Can we say he overswings?  I don't know, we'd have to look at each of his swings to see why the results came out as they did.  But he clearly performed better when his swings were the weakest.

On the flip side are batters whose performance on their hardest swings far, far exceeded their performance on their weakest swings.  Among this group are Ohtani and Yordan Alvarez, who are each around 13 runs per 100 swings above average on their hardest swings and 4 runs per 100 swings below average on their weakest swings.  (League average is +0.5 and -5.0 runs per 100 swings, respectively.)

Of course, you have to be careful here, since a batter will sometimes check his swing (unsuccessfully), and so the swing speed is not necessarily independent of his approach.

Click to embiggen.


UPDATE: Here is the distribution in speed, as well as the run values, for Garcia and Ohtani. Obviously, Ohtani is in blue. At 81+ is when Ohtani is doing the damage. Garcia you can see had some success at under 68. However, given the combo of 67+68 is a net negative, it may very well be that that is just before-the-fact cherry-picking. That said, Garcia at 74+ or 76+ is a net negative, and it may very well be that he overswings.

(2) Comments • 2024/02/16

Statcast Lab: Swing Speed Distributions by Ball-Strike Count

Only showing 0-2 v 3-0, plus overall.  Click to embiggen


Thursday, February 15, 2024

Statcast Lab: Swing Speed Distributions by Pitch Types

(click to embiggen)

(1) Comments • 2024/02/16 • Bat_Tracking

Wednesday, February 14, 2024

Explaining the reasoning behind the construction of OPS+

Suppose the league OBP is .300, and the number of runs scored per game is 4.0

  • If a team's OBP is .330, that is 10% higher than the .300 league, or 110 in OBP+ parlance. So, 10% more runners roughly means 10% more runs. And so, 10% more runs than 4.0 is 4.4. (Assume team SLG matches league SLG.)

Suppose the league SLG is .400, and the number of runs scored per game is 4.0

  • If a team's SLG is .440, that is 10% higher than .400, or 110 in SLG+ parlance. So, 10% more total bases roughly means 10% more runs. And so, 10% more runs than 4.0 is 4.4. (Assume team OBP matches league OBP.)

Now, if you have BOTH 10% more runners AND 10% more total bases, we'll actually end up with roughly 20% more runs.

  • If you do OBP+ plus SLG+ minus 100, you get 110 + 110 - 100 = 120 in OPS+ parlance
  • If you did OPS/lgOPS, you'll get .770/.700 = 1.1 or 110 in OPS+ parlance

What's better/right?

  • To the extent you want to be pedantic, OPS+ in this illustration should be 110. 
  • To the extent what you care about is associating OPS to runs in a 1:1 manner, then OPS+ should be 120.

wRC+ uses the same process in terms of converting wOBA into runs: 200*wOBA/lgWOBA - 100. The only difference is in the name, with wRC+ being clearer as to its intent, and not directly being linked to wOBA by name (even if it is under the hood), other than that lowercase w.
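To make the two OPS+ constructions concrete, here is a minimal sketch using the illustration's numbers (.330/.440 against a .300/.400 league). The function names are mine, made up for illustration, and nothing here applies park or league adjustments.

```python
def ops_plus_additive(obp, slg, lg_obp, lg_slg):
    """OBP+ plus SLG+ minus 100: the version that tracks runs roughly 1:1."""
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


def ops_plus_ratio(obp, slg, lg_obp, lg_slg):
    """Plain OPS / lgOPS, scaled to 100: the pedantic version."""
    return 100 * (obp + slg) / (lg_obp + lg_slg)


def wrc_plus_style(woba, lg_woba):
    """The 200*wOBA/lgwOBA - 100 form described in the text."""
    return 200 * woba / lg_woba - 100


if __name__ == "__main__":
    # Illustration from the text: 10% more runners AND 10% more total bases.
    print(round(ops_plus_additive(0.330, 0.440, 0.300, 0.400)))  # additive form
    print(round(ops_plus_ratio(0.330, 0.440, 0.300, 0.400)))     # ratio form
```

Note that the wRC+-style form is the additive form with the same ratio counted twice: a wOBA that is 10% above league comes out at 120, not 110.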

Choose your path.


Tuesday, February 13, 2024

The Math behind the NFL OT Playoff Rule

I Cut, You Choose?  It's not exactly that, but it's close to that.

I'm going to come up with some random numbers. I don't follow football enough to give you good numbers, so I'll just try some random numbers.

In this iteration, I'll assume the chance of NOT scoring is 60%. And when you score, it's just as likely you will TD as FG.

So, let's start. Team 1 has the ball: 60% of the time they are scoreless, 20% of the time they put 3 points on the board, and 20% of the time they put 7. Now, let's follow each of those three branches, starting with the scoreless one.

  • With the scoreless branch: we'll assume Team 2 is more likely to score, so let's make it scoreless 55%, and scoring 45% (and a win for Team 2). If Team 2 is also scoreless, it goes into sudden death.
  • With the FG branch: we'll assume here Team 2 is more likely to try for the TD. So, scoreless 65% of the time (and a loss), FG 10% (a tie, so sudden death), TD 25% (and a win).
  • Finally the TD branch: Team 2 has to be more aggressive, so chance of scoreless is 70%, with 0% for FG, 15% for 6 points (and a loss) and 15% for 8 points (and a win).

The sudden death calculation is simple. At a 60% scoreless chance for both teams, it's a 62.5% chance for Team 1, who gets the ball first, to win the sudden death.
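For anyone who wants to see where the 62.5% comes from, here is a sketch of the arithmetic. It assumes Team 1 has the first possession in sudden death and that every drive has the same scoreless chance; the symbol q for that chance is mine.

```latex
\[
P(\text{Team 1 wins}) = (1-q) + q^2(1-q) + q^4(1-q) + \cdots
= \frac{1-q}{1-q^2} = \frac{1}{1+q},
\qquad q = 0.60 \;\Rightarrow\; \frac{1}{1.60} = 0.625
\]
```

Each term is Team 1 scoring on its first, second, third, ... drive after both teams have come up empty on every drive before it.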

All of this now becomes a straightforward probability distribution calculation. And in this illustration, the win% is 52% for Team 1.
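Here is a minimal sketch of that calculation for the 60% scoreless illustration. It uses only the made-up branch numbers from above, assumes Team 1 has the ball first in any sudden death, and the names are mine. It does not try to reproduce the 50% or 40% cases, since how everything gets adjusted there is not spelled out.

```python
# Team 1's opening drive (random illustration numbers from the text).
P_SCORELESS, P_FG, P_TD = 0.60, 0.20, 0.20


def sudden_death_first_ball(p_scoreless):
    """Win chance for the team with the ball first, same scoreless chance each drive."""
    return (1 - p_scoreless) / (1 - p_scoreless ** 2)


def team1_win_prob():
    sd = sudden_death_first_ball(P_SCORELESS)

    # Scoreless branch: Team 2 scores 45% (Team 1 loses), else sudden death.
    after_scoreless = 0.55 * sd
    # FG branch: Team 2 scoreless 65% (win), FG 10% (tie, sudden death), TD 25% (loss).
    after_fg = 0.65 + 0.10 * sd
    # TD branch: Team 2 scoreless 70% (win), 6 points 15% (win), 8 points 15% (loss).
    after_td = 0.70 + 0.15

    return P_SCORELESS * after_scoreless + P_FG * after_fg + P_TD * after_td


if __name__ == "__main__":
    print(f"Team 1 win%: {team1_win_prob():.3f}")
```

Laying it out branch by branch makes it easy to swap in your own numbers for any one response and see how little the total moves.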

Now, what happens if I change the chance of scoreless down to 50%, and adjust everything off that? Now the chance of Team 1 winning is 51%.

If chance of scoreless is down to 40% for any drive, then Team 1 winning is 49%.

Indeed, this is how it looks based on the scoreless rate, from 10% to 90%:

So, it is easy enough to see that when you have to input two specific teams, things can change from this baseline, and so what may show here as 47% can in reality be 52%.

That's the baseline. Now, all we need is for someone to come up with something a bit more intricate, and we'll see... probably the same thing.

So, whoever over at NFL ops came up with this scheme likely proposed this setup because it's around 50/50, all depending on whatever actual teams are involved.

(1) Comments • 2024/02/16
