Brandon McCarthy PITCHf/x: Sliders, Curves, and Slurves

April 10, 2009 • Analysis

News broke late this winter that Texas Rangers RHP Brandon McCarthy would be experimenting with a slurve, a pitch half-way between a slider and a curveball. It was later confirmed that this pitch was intended to replace McCarthy's curveball. I had always believed his curveball was a plus pitch, so this news left me confused.

Yesterday afternoon, Brandon McCarthy debuted his new slurve against Cleveland, and PITCHf/x was ready to go. On television, the new pitch didn't look that new, seemingly just a little harder with a slightly sharper break, and more than one person wondered if McCarthy was throwing both a slider and a curveball.

I grabbed the PITCHf/x data from yesterday's game (April 9, 2009), and decided to compare it with a similar outing. I settled on McCarthy's April 9, 2007 start at home against Tampa Bay. In each start, PITCHf/x identified 4 different pitch types: fastball, curveball, slider, and change up. PITCHf/x data is never perfect, but there's still a lot of great information.

Let's first compare his release points from the catcher's perspective.

Brandon McCarthy's April 9, 2007 pitch release points.
Brandon McCarthy's April 9, 2009 pitch release points.

At first glance, it appears that McCarthy's release point has moved about 6 to 10 inches toward third base in the past two years. While definitely interesting, this may or may not actually be the case. In 2007, release points were measured at 55 feet from the back corner of home plate, but the 2009 release points were measured at 50 feet, so the two sets of points are not directly comparable.

Taking a bit of a deeper look reveals that McCarthy's release of his change up is very consistent with that of his fastball with a few stragglers straying up a couple of inches. In 2007, McCarthy's curveball release was a little higher and a little closer to first base, but in 2009, his curveball/slider release is noticeably higher but directly above his fastball release.

Take a look at the pitch movement scatter plots below. Vertical movement is measured relative to the drop that gravity alone would produce, which approximates the Magnus effect. Zero vertical movement means the pitch dropped exactly as much as gravity dictates, a negative number means it dropped more than that, and a positive number means it dropped less.

Brandon McCarthy's April 9, 2007 pitch movement.
Brandon McCarthy's April 9, 2009 pitch movement.

Based on the PITCHf/x data shown in the graph, McCarthy's slurve is measurably different from his 2007 curveball. To further illustrate the difference, I grabbed velocity data for the two pitches as well. His average curveball velocity in the 2007 game was 73.47 mph, and his average slurve velocity in the 2009 game was 79.81 mph.

The most important difference between the old curveball and the new slurve is pretty simple: control. In the 2007 game, McCarthy threw 40% (10/25) of his curveballs for strikes. In the 2009 game, McCarthy threw 75% (15/20) of his slurves for strikes.

Yesterday, McCarthy threw 10 of 13 change ups for strikes. He had outstanding overall command of his off-speed stuff, but he really struggled with his fastball command, throwing only 34 of 60 (56.7%) for strikes.

I've noted this in the past, and it's still a major issue. McCarthy has a tendency to drag his arm behind his body when he throws his fastball. This is usually caused when the front shoulder "flies open" by turning toward home plate before the arm is ready to throw. The pitching arm tries to play catch up, but pitches usually wind up high and a tick or two slower when this happens.

In the 3rd inning, pitching coach Mike Maddux trotted out to chat with McCarthy. When he left, McCarthy's fastball jumped from 86-90 to 89-92 for his last 2.1 innings, and he was throwing it down in the zone. PITCHf/x is missing 5 pitches in this span, but after the visit, McCarthy rattled off 10 strikes on his next 12 fastballs.

Outside of that stretch, McCarthy threw only 50% strikes with his fastball. On the up side, Maddux appears to be on top of this, and I expect improvement in this aspect of McCarthy's game throughout the season.
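The strike rates quoted above are simple ratios, and the 50% figure follows from taking the post-visit stretch out of the fastball totals. A quick Python check, using only the counts given in the text (the helper name is my own):

```python
def strike_rate(strikes, pitches):
    """Return the share of pitches thrown for strikes, as a percentage."""
    return 100.0 * strikes / pitches

# Counts quoted in the text.
curve_2007 = strike_rate(10, 25)     # 2007 curveball
slurve_2009 = strike_rate(15, 20)    # 2009 slurve
fastball_2009 = strike_rate(34, 60)  # all 2009 fastballs

# Fastballs outside the 12-pitch stretch after the mound visit,
# a stretch that produced 10 strikes.
fastball_rest = strike_rate(34 - 10, 60 - 12)

print(f"2007 curveball:        {curve_2007:.1f}%")
print(f"2009 slurve:           {slurve_2009:.1f}%")
print(f"2009 fastball (all):   {fastball_2009:.1f}%")
print(f"2009 fastball (rest):  {fastball_rest:.1f}%")
```

The last line is where the "only 50%" figure comes from: 24 strikes on the 48 fastballs outside the stretch.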

Here are some quick shots:

  • In the 2007 game, McCarthy's fastball was 10" to 15" above gravity, and his curveball was 8" to 13" below gravity. That's a visual 18" to 28" of vertical separation between the two pitches. I don't have a comparison ready, but that's a huge difference.

  • McCarthy's fastball is straighter than ever. He's getting better back-spin, so the ball might appear to rise more, but his fastballs are all clustered around zero horizontal movement. In the 2007 game, he was getting a lot more arm-side movement.

  • McCarthy is a tall guy, but it's pretty crazy that he lets go of the baseball when it's nearly 7 1/2 feet off the ground. A fastball to the bottom of the strike zone travels vertically down nearly 6 feet!

  • Joey Matschulat at Baseball Time in Arlington took a look at McCarthy's PITCHf/x data as well - Profiling Brandon McCarthy: A Pitch F/X Snapshot.


Texas Rangers Win-Curve Part II: Playoff Probability

February 4, 2009 • Analysis

This is Part II of a series that examines the Texas Rangers 2009 revenue outlook in a rough version of the framework laid out by Vince Gennaro in his fantastic book Diamond Dollars. Check out the Offline Reading list for other great reads.

In Part I, Texas Rangers Win-Curve Part I: Wins vs. Attendance, I walked through a model for predicting 2009 home attendance based on the team's on-field success as measured by wins.

Part II aims to add another piece to the puzzle by determining a team's chances of making the playoffs for a given number of wins.

WHAT EVERYONE KNOWS

Two types of teams make it to the playoffs: 3 division champions and 1 wild card team.

The more games a team wins, the better its chances are for making it into the playoffs by either method.

In reality, for a given number of wins, a team will either make it to the playoffs or not. There are only two outcomes: 'yes' and 'no'.

MODELING THE DATA

Because there are only two outcomes, the data can be modeled with a logistic curve. The curve is created by a binomial logistic regression, a type of generalized linear model. Basically, using an independent variable (wins), it estimates the probability that the dependent variable (team makes the post-season) is true.

I gathered 11 years of historical data for the American League in its current alignment - since Tampa Bay's inaugural season in 1998.

I ran regressions for each division and for the American League as a whole.

THE RESULTS

Wins vs Post-season Probability - American League 2009

One hypothesis that I was eager to test was that for teams in smaller divisions, like the 4-team AL West, the odds of winning the division (and therefore the odds of making the playoffs) are greater than for teams in a 6-team division like the NL Central.

I tested this hypothesis by comparing the curves for each of the three divisions against the American League curve. Essentially, all 4 curves are the same but shifted either to the left or to the right.

The AL West curve, surprisingly, is shifted right, meaning it is harder to make the playoffs in the AL West than in the AL as a whole. The AL Central curve is shifted left, and the AL East curve showed a right shift approximately equal to the shift in the AL West curve.

At 92 wins, an AL team has had a 66.96% chance to make the playoffs. The AL West, AL Central, and AL East have had 62.69%, 78.24%, and 61.94% chances, respectively, at the 92-win level.

Since it has been easier to make the playoffs in the 5-team AL Central than it has been in the 4-team AL West, the hypothesis does not hold up. The difference between the AL West and AL East was barely noticeable.

AL West
AL Central
AL East

TEXAS RANGERS POST-SEASON PROBABILITY

The two curves that apply to the Rangers, the AL curve and the AL West curve, are fairly similar. At 80 wins, the AL curve shows a 0.35% chance, and the AL West curve shows a 0.24% chance.

In what looks like it could be a weak division in 2009, 85 wins might be enough to get the Rangers in. Historically, though, 85 wins has produced only a 4.70% chance on the AL curve and an even smaller 3.58% chance on the AL West curve.

If the Rangers make the improbable jump from 79 wins to 95 wins, the AL curve gives them a 90.88% chance of making the post-season, while the AL West curve gives them an 89.59% chance.
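The percentages quoted for each curve hang together as points on a single logistic curve. The Python sketch below is not the regression itself: it back-solves an intercept and slope from the two quoted points at 92 and 95 wins, then checks how closely the resulting curve lands on the quoted 80-win and 85-win figures. The function names are my own, and the recovered coefficients are only approximations of whatever the fitted model produced.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def curve_from_two_points(w1, p1, w2, p2):
    """Return (intercept, slope) of the logistic curve through two points."""
    slope = (logit(p2) - logit(p1)) / (w2 - w1)
    intercept = logit(p1) - slope * w1
    return intercept, slope

def playoff_probability(wins, intercept, slope):
    """Logistic curve: P = 1 / (1 + exp(-(intercept + slope * wins)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * wins)))

# Anchor points quoted in the text at 92 and 95 wins.
al = curve_from_two_points(92, 0.6696, 95, 0.9088)
al_west = curve_from_two_points(92, 0.6269, 95, 0.8959)

for wins in (80, 85):
    print(wins,
          f"AL {playoff_probability(wins, *al):.2%}",
          f"AL West {playoff_probability(wins, *al_west):.2%}")
```

Both back-solved curves should come out very close to the 80-win and 85-win percentages quoted above, which is what one would expect if each set of numbers was read off a single fitted curve.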

Based on the 2009 outlook, if any AL West team can get to 95 wins, it should win the division handily. One team reaching that level would have a fair amount of shock value by itself, but if two teams hit the 95-win mark, it would be absolutely stunning.

APPLYING THE POST-SEASON EFFECT

When a team makes the post-season, the fan response typically includes increases in season ticket sales, television ratings, and merchandise sales. This post-season effect has a tangible benefit on team revenue for current and future seasons.

According to Gennaro's model, a net present value (NPV) is calculated for the post-season effect. For each win total, the NPV is multiplied by the post-season probability for that win total, and the resulting value is added to that point on the win-curve.
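As a rough illustration of that weighting, here is a small Python sketch. The probabilities are the AL curve values quoted earlier in this article. The NPV is a placeholder of 1.0 because no dollar figure has been assigned yet, so the results read as fractions of the full post-season NPV rather than dollars. The names are my own.

```python
# Post-season probabilities quoted in this article for the AL curve.
al_probability = {80: 0.0035, 85: 0.0470, 92: 0.6696, 95: 0.9088}

def postseason_value(npv, probability):
    """Expected post-season value: the NPV weighted by the chance of earning it."""
    return npv * probability

# Placeholder only: the actual NPV is not given in this article.
PLACEHOLDER_NPV = 1.0

added_value = {wins: postseason_value(PLACEHOLDER_NPV, p)
               for wins, p in al_probability.items()}

for wins, value in sorted(added_value.items()):
    print(f"{wins} wins: {value:.4f} of the post-season NPV")
```

Because the probability climbs so steeply between 85 and 95 wins, nearly all of the post-season value is added to the win-curve in that range.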

In Part III, the dual focus will be on turning attendance figures into attendance dollars and assigning a value to the post-season effect.


Eric Hurley: Hamstrings, rotator cuffs, and Mark Connor

January 26, 2009 • Analysis
Eric Hurley. (Source: Jason Cole, LoneStarDugout.com)

It has become somewhat fashionable to blame Mark Connor for Eric Hurley's shoulder problems - a torn rotator cuff and frayed labrum. Without a full-blown analysis of Hurley's arm action, though, it is hard to say with certainty that his mechanics were responsible.

Given the nature of these injuries, though, Hurley's mechanics and hamstring are far more likely than Connor's teachings to be the cause of Hurley's shoulder injuries.

ANATOMY

Like any soft tissue in the body, rotator cuff muscles and tendons are torn gradually over time as stress creates micro-tears that build up and compound. There are exceptions, of course, but most of them involve severe external trauma like violent collisions and power lifting.

In pitching, the rotator cuff contracts most powerfully during the deceleration phase as it tries to keep the humerus from twisting and flying out of socket.  When the arm moves across the body, the head of the humerus becomes an obstacle to this contraction.  This forces the muscles to contract "around a corner" which adds more tension to the muscle than it can create on its own.

A frayed labrum is an early stage SLAP (superior labrum from anterior to posterior) lesion. Later stage SLAP lesions are commonly referred to simply as "torn labrums". The lesions are caused by the compressive force and friction created when the long head of the biceps brachii contracts and pulls directly on the glenoid labrum in an unnatural manner.

Certain arm actions, most notably transverse hyperabduction of the shoulder (scap-loading), can position the head of the humerus as an obstacle to the contraction of the biceps creating extra tension on the labrum where the long head of the biceps attaches.

Since part of the long head of the biceps merges with the labrum, SLAP lesions can sometimes be misdiagnosed as biceps tendinitis. This was the reported initial diagnosis for Hurley's shoulder injury on July 30, just three days after his final start of 2008. On August 1, the Rangers reported that it was, in fact, shoulder soreness.

ATTACK OF THE HAMSTRING

Hurley was cruising along fairly well before he injured the hamstring of his left leg - his landing leg.

The hamstring of the landing leg experiences an eccentric contraction as the upper body moves forward over the waist. A negative change in the muscle's flexibility can decrease the amount of trunk flexion and/or shoulder rotation that occurs during a pitch. Since the body is less engaged in the deceleration of the arm, the shoulder handles more of the load than it would with normal hamstring flexibility.

Limited trunk flexion or shoulder rotation can cause the throwing shoulder's forward movement to stop early, even though the arm tries to continue moving toward the plate. The force of this action slings the arm across the body and moves the head of the humerus into the path of the muscular contraction as described above.

Hamstrings are notoriously slow-healing muscles, and flexibility can be compromised for a long period even after the muscle is fully functional.

Hamstring injuries will not always lead to shoulder injuries, but they represent a huge risk factor for someone already dealing with a weakened shoulder.

WAS HURLEY'S SHOULDER ALREADY WEAKENED?

Terry Clark (left), his mustache, and Michael Schlact. (Source: Jason Cole, LoneStarDugout.com)

The answer to this question simply has to be, "Yes."

In 234 minor league innings over the last two seasons, Hurley worked primarily with Rick Adair, Terry Clark, and Andy Hawkins - one of whom has one of the greatest mustaches in baseball.  All three coaches are extremely well regarded; none of them is Mark Connor.

Mark Connor was Hurley's primary pitching coach for about 32 innings, all in 2008. (Hurley had a 7.1-inning rehab start in Frisco near the end of that 32-inning span.) 32 innings is simply not enough to tear the healthy rotator cuff of a professional pitcher - someone whose rotator cuff should be exceptionally strong and well conditioned.

Barring severe external trauma, his shoulder must have been compromised before reaching the Majors and long before he hurt his hamstring.

SUMMARY

Hurley likely began damaging his shoulder well before his injuries became apparent. One can argue about the inevitability of a major tear, but excluding an external traumatic event, Hurley's mechanics are the most likely cause of the injuries.

When his hamstring started giving him trouble, his body compensated for that injury, effectively placing more (too much) stress on a rotator cuff that was, in all likelihood, already damaged.

Rotator cuffs simply don't tear suddenly enough to blame Connor for the injury.

Why Hurley was allowed to start that last game (and remain in it for as long as he was) is a different matter entirely.


Texas Rangers Win-Curve Part I: Wins vs Attendance

January 23, 2009 • Analysis

Fans respond to winning, but different fan bases respond differently. Fan response is most easily measured by a team's revenue stream, the largest factor of which is home attendance - essentially a measurement of demand. It follows that if one can understand the relationship between wins and attendance, then one can reasonably predict revenue at different win levels.

When plotted on a wins versus revenue graph, the function that predicts these points is called a win-curve. (NOTE: The win-curve is actually discrete. Since modern-era Major League Baseball games do not end in ties, there are no fractional wins. The line itself serves strictly as an illustration because win totals will always be integers.)

The concepts discussed here are elaborated on in far greater detail in Vince Gennaro's Diamond Dollars - a book that I highly recommend if you find that this article sparks your interest (check the Offline Reading list). Mr. Gennaro developed win-curves for all 30 teams as a part of his research into these relationships. Using 37 years of historical data, I attempted to build my own for the Texas Rangers.

This is Part I: Wins vs. Attendance.

PREPARING THE DATA

Each team's win-curve is different. The trick to building an accurate win-curve is identifying trends in the relevant market. In Mr. Gennaro's model, he used a 50-50 weighted average of the previous year's wins and the current year's wins, and he compared that to a per-game average of the current year's home attendance.

He accounted for a "new franchise halo", overall industry growth, new stadium effects, and work stoppages. Due to the small sample size, these effects were identified and generalized for Major League Baseball as a whole, rather than for individual teams.

Based on the assumption that each market behaves differently, it is unreasonable to assume these effects will be the same from market to market, but they do represent a reasonable approximation.

In my attempts to recreate Mr. Gennaro's work, I struggled to capture these effects. Without doing similar studies for the other 29 teams, I tried to find other ways to account for these other attendance factors.

Wins

As Gennaro surely did, I played with several different values for my wins variable. Each was represented as a winning percentage to help adjust for seasons of different lengths. Here is the list of my various definitions for wins:

  • Gennaro's average of previous wins and current wins
  • Separate variables for previous wins and current wins
  • Average wins of the three most recent seasons
  • Separate variables for the wins of the three most recent seasons

For each definition of wins, I tried different weights. In nearly every model, the only significant variable from the group was current wins, though the previous wins variable was significant in a few. In the final model, I chose current winning percentage as my wins variable.
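To make the wins definitions concrete, here is a minimal Python sketch of a winning percentage and a weighted blend of previous and current seasons. The function names and the example inputs are my own illustrations. A weight of 0.5 corresponds to Gennaro's 50-50 average, and a weight of 1.0 reduces to the current winning percentage used in the final model.

```python
def winning_percentage(wins, games):
    """Wins as a share of games played, so seasons of different lengths compare."""
    return wins / games

def blended_wins(previous_pct, current_pct, current_weight=0.5):
    """Weighted average of previous and current winning percentage."""
    return current_weight * current_pct + (1.0 - current_weight) * previous_pct

# Illustrative inputs only: a full 162-game season and a shortened 114-game one.
full_season = winning_percentage(81, 162)
short_season = winning_percentage(57, 114)

print(full_season, short_season)                            # both .500 despite different lengths
print(blended_wins(0.400, 0.600))                           # 50-50 blend
print(blended_wins(0.400, 0.600, current_weight=1.0))       # current season only
```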

"New Franchise Halo"

This effect was minimal at best with the Rangers. In their first two years, 1972 and 1973, the Rangers averaged fewer than 4,000 fans per home game. Early on, the Rangers weren't very good, but after several years in Arlington, attendance climbed to well above 15,000 fans per game. This is counter to the typical new franchise halo, where a team sees an early boom that tapers off after a few seasons.

The Metroplex area has only had one Major League Baseball franchise, so there is only one from which to develop a model, leaving too few samples to effectively quantify the "new franchise halo" for Dallas/Fort Worth.

Industry Growth

To measure industry growth, a simple counting variable was added - a value of 0 in 1972, up to 36 in 2008. Effectively, this functioned the same as a "Years in Area" variable. In the final model, this variable is not significant - p = 0.255 (approx.) - but its inclusion in the model resulted in a smaller standard error and better R-square and Adjusted R-square values than when it was excluded.

New Stadium Effect / Work Stoppages

The Rangers are a unique franchise in this aspect. In 1994, the year they opened the only new stadium in their history, the MLBPA went on strike and the World Series was canceled. The strike continued into 1995, dramatically reducing and effectively negating the new stadium effect that was experienced prior to the strike. As a result, in every regression that was run, 1994 was a positive outlier and 1995 was a negative outlier.

Until 2008, these two seasons were the only outliers in every single model tested.

In 1996, the Rangers began the winningest period in the franchise's history, and in each of the models I ran, the results suggested that this winning period was responsible for the high attendance experienced during that period rather than the new stadium. The coincidence is somewhat striking, and in actuality, it was probably a combination of the two that resulted in the high attendance averages.

I added a value for stadium age (in years) to try to capture the new stadium effect, but even after figuring in Arlington Stadium's prior existence as Turnpike Stadium (making it 7 years old in 1972), the variable failed to be significant.

Other Variables

I tried several other variables to help build a better statistical model. Though none turned out to be significant, the following variables were included at some point during testing:

  • Years since playoff appearance
  • Made playoffs (1 or 0)

I also tried to include TMR's Fan Cost Index, but I was only able to find data for the 18 most recent of the 37 seasons. The lack of sample data for this variable resulted in its exclusion from the tested models.

THE FINAL MODEL

When narrowing my model down to the most relevant sample data, I greatly simplified my thinking. Instead of trying to identify individual factors that affect attendance, I realized that, historically, all of this information already existed as a single variable: attendance. It is definitely not the perfect solution, but last season's attendance is the most significant indicator of attendance for the current season.

None of the models I ran was able to predict the huge drop-off experienced in 2008. Something that I think might be responsible is the price of gasoline and the increased reliance on gas-guzzling vehicles. Not only did families have less disposable income to spend on baseball games, but the trip to the games became more expensive. Without an effective public transportation solution as an alternative means to get to the ballpark, the mostly commuter fan-base spent their money elsewhere.

After including the previous year's attendance in my models, its significance was immediately apparent, but two other variables remained viable: current winning percentage and growth factor (the counting variable discussed above). Logically, this makes sense.

With this three-variable model, three seasons stood out as dramatic outliers: 1994 (new stadium), 1995 (strike), and 2008 (transportation cost).

It is statistically questionable to eliminate outliers, but in this case, I think it makes sense. I understand that this brings the study as a whole into question, but I'm going to run with it anyway.

In 2009, the Rangers will not be moving into a new stadium; there is no labor conflict on the horizon; and for now, the price of gasoline has returned to a reasonable level.

Because another spike in gasoline prices is possible, 2008 was not removed from the data set. 1972, 1981, and 1995 were all removed because of work stoppages, and 1994 was removed because of the new stadium. (The years that followed still used accurate previous year attendance, so the effects of these events were carried forward to future years even though the immediate effects were not a part of the model.)

The final model included data from 1973 through 2008, skipping over 1981, 1994, and 1995. The dependent variable, of course, was current season average home attendance. The independent variables were the previous season's average home attendance, the current season's winning percentage, and the growth factor described above.
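The data preparation described above can be sketched in Python. This is only an illustration of how the rows might be assembled, not the spreadsheet or software actually used. The function name and the tuple layout are assumptions, and the sample numbers are made up purely to exercise the code. Note that a season following an excluded year still takes the real attendance from that excluded year as its previous-attendance value.

```python
EXCLUDED = {1981, 1994, 1995}  # seasons skipped in the final model
FIRST_SEASON = 1972            # the growth factor is 0 in this year

def build_rows(seasons, first_year=1973, last_year=2008):
    """Assemble model rows from a dict of year -> (winning_pct, avg_attendance).

    Each row is (year, previous_attendance, winning_pct, growth, attendance).
    """
    rows = []
    for year in range(first_year, last_year + 1):
        if year in EXCLUDED or year not in seasons or year - 1 not in seasons:
            continue
        win_pct, attendance = seasons[year]
        # Actual prior-year attendance, even when the prior year is excluded.
        previous_attendance = seasons[year - 1][1]
        rows.append((year, previous_attendance, win_pct,
                     year - FIRST_SEASON, attendance))
    return rows

# Made-up numbers for illustration; these are NOT real Rangers figures.
sample = {
    1993: (0.500, 20000.0),
    1994: (0.500, 30000.0),
    1995: (0.500, 10000.0),
    1996: (0.550, 25000.0),
    1997: (0.450, 26000.0),
}
rows = build_rows(sample)
for row in rows:
    print(row)

# Number of seasons that survive the exclusions in the full 1973-2008 span.
modeled_years = [y for y in range(1973, 2009) if y not in EXCLUDED]
print(len(modeled_years))
```

With the full data set, the exclusions leave 33 seasons in the regression.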

CONCLUSIONS

The model says that for 2009, Texas Rangers home attendance can be estimated within 2,646 attendees per game using a chosen win level, the 2008 attendance per game (24,021), and the growth factor (37).

The graph below represents the relationship between wins and attendance for 2009.

Wins vs Attendance - Texas Rangers 2009

The yellow dot marks the 2008 average home attendance, and the red dot marks last season's win total. According to the model, the Rangers should see an increase in attendance over last season for as few as 69 wins.

I think the transportation cost will be a huge factor going into the 2009 season, since the most viable method of getting to the ballpark is to drive.

RELEVANT STATISTICS NOTES

For the final model, the R-square value is 0.904, the adjusted R-square is 0.894, with a standard error of 2,646.
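The two R-square figures can be cross-checked against each other with the standard adjusted R-square formula. The short Python sketch below assumes the model has an intercept plus the three independent variables and counts the seasons from 1973 through 2008 minus the three skipped years. The function name is my own.

```python
def adjusted_r_square(r_square, n_obs, n_predictors):
    """Adjusted R-square for a regression with an intercept."""
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Seasons in the final model: 1973 through 2008, skipping 1981, 1994, and 1995.
n_seasons = len([y for y in range(1973, 2009) if y not in {1981, 1994, 1995}])

adj = adjusted_r_square(0.904, n_seasons, 3)
print(n_seasons, round(adj, 3))  # 33 0.894
```

With 33 seasons and three predictors, an R-square of 0.904 works out to an adjusted R-square of 0.894, matching the reported value.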

The independent variables have the following p-values: previous attendance average < 0.000002, winning percentage < 0.0008, and growth factor approximately 0.255.

As discussed above, the removal of the growth factor variable resulted in smaller R-square values and a higher standard error, so it was left in the final model.

IN PART II

In Part II, I will tackle the topic of post-season probability at different win levels. Combined with this article, it will be possible to start turning these numbers into dollars.