NPKtools White Paper
The Lake Wobegon Salary Survey
Question: We use the market average as our comparison point for market pricing. I have some market data where the market average is above the P75. Is this good data or is something fishy?
The Data Doktor: Your situation sounds suspiciously like Garrison Keillor’s Lake Wobegon, “where all the women are strong, all the men are good looking, and all the children are above average.”
We can laugh along with Keillor, but does the Lake Wobegon phenomenon exist in the realm of pay market data? You might be surprised.
A few years ago I found myself analyzing the data below for one of our clients. The data was provided by a national compensation consulting firm and represents the second rung in a four-position job ladder. The market data represents the all-sample cut for 15 firms in the Minnesota market (although that is where the connections with Prairie Home Companion should logically end).
Minnesota Pay Data – National Survey Firm (pay in $ thousands)
P25 | P50 | P75 | Average
34.24 | 46.72 | 59.20 | 65.82
Supposedly, the average for the distribution is above the P75, meaning that the average is higher than the pay reported by at least 75% of the firms in the sample. Is this mythical data?
To see how this might happen, I constructed a hypothetical market distribution from the summary statistics above, with the requirement that the distribution must conform to the P25, P50, P75, and average reported by the survey vendor. The chart below does not depict the actual company data, since only the vendor has those numbers. Rather, it shows a market distribution that meets the required distribution parameters.
The graphic at left compares ranked pay in thousands for each participating company (dots). The distribution market median (P50) is shown as a purple line; the P25 and P75 are displayed as dashed red lines. Our test distribution has only 15 observations, a small number, but it’s the shape of the distribution that reveals our answer.
- The majority of the observations fall between the dashed P25 and P75 lines
- Most of the observations / participants are below the $65,820 average
- Three high-paying firms are enough to skew the reported market average
- Pay for those three firms ranges between $120,000 and $190,000
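A test distribution of this kind can be sketched in a few lines of Python. This is only an illustration, not the vendor's data or the exact distribution in the chart: the individual pay values are hypothetical, and the quartiles are assumed to use linear interpolation (the "inclusive" method, which matches Excel's PERCENTILE.INC). A vendor using a different percentile convention would need slightly different values.

```python
from statistics import mean, quantiles

# Summary statistics reported by the survey vendor (pay in $ thousands).
REPORTED = {"p25": 34.24, "p50": 46.72, "p75": 59.20, "mean": 65.82}
N_FIRMS = 15

# Hypothetical pay for the 12 lowest-paying firms, chosen so that the
# quartile positions land on the reported P25, P50 and P75.
body = [30.00, 32.00, 33.00, 34.00, 34.48, 40.00, 44.00, 46.72,
        50.00, 54.00, 58.40, 60.00]

# The reported mean fixes the sample total, so the three remaining firms
# must make up the difference between that total and the body.
tail_total = REPORTED["mean"] * N_FIRMS - sum(body)

# Hypothetical split of that total across three high-paying firms.
tail = [125.00, 160.00, round(tail_total - 125.00 - 160.00, 2)]

sample = body + tail
p25, p50, p75 = quantiles(sample, n=4, method="inclusive")
below_mean = sum(1 for pay in sample if pay < mean(sample))

print(f"P25={p25:.2f}  P50={p50:.2f}  P75={p75:.2f}  mean={mean(sample):.2f}")
print(f"{below_mean} of {len(sample)} firms pay below the average")
```

The point of the exercise is that the quartiles pin down only the middle of the sample. Whatever total the average requires beyond that has to come from the top few firms, which is why a handful of high payers can push the average past the P75.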
When viewed as a histogram, it appears that our distribution may be bimodal:
Most of our cases are clustered in the $40,000-$60,000 range, while two are clearly outliers, perhaps indicating another cluster or “hump” in the distribution. This might lead us to suspect that the sample is mixing different jobs or markets that are not comparable. Bimodal distributions can also be an indicator of incorrect job matching.
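A quick text histogram is often enough to spot a second hump. The sketch below uses a hypothetical 15-firm sample that matches the reported summary statistics (it is not the actual survey data), and the $20,000 bin width is an arbitrary choice.

```python
from collections import Counter

# Hypothetical test distribution (pay in $ thousands), not the vendor's data.
sample = [30.00, 32.00, 33.00, 34.00, 34.48, 40.00, 44.00, 46.72,
          50.00, 54.00, 58.40, 60.00, 125.00, 160.00, 185.70]

BIN_WIDTH = 20  # assumed bin width, in $ thousands

# Assign each firm to the bin that starts at the nearest lower multiple
# of BIN_WIDTH, then count the firms per bin.
counts = Counter(int(pay // BIN_WIDTH) * BIN_WIDTH for pay in sample)

# Print one row per bin, including empty bins between the clusters.
for start in range(min(counts), max(counts) + BIN_WIDTH, BIN_WIDTH):
    print(f"{start:>3}-{start + BIN_WIDTH:<3} {'#' * counts[start]}")

# Empty bins between the lowest and highest occupied bins mark gaps
# that separate the main cluster from the high payers.
empty_bins = [start for start in range(min(counts), max(counts), BIN_WIDTH)
              if counts[start] == 0]
print("Empty bins start at:", empty_bins)
```

A run of empty bins between the main cluster and the high payers is the visual cue to ask whether the survey is mixing jobs or markets.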
My conclusion from analyzing the test data above was that my data was indeed fishy! Further investigation suggested the data was misleading because of weaknesses in the survey job matching.
Question: So is this really a big problem or is this some kind of academic exercise?
In our practice over the last two years, we’ve found that reported market averages exceeded P75 or were less than P25 for 2.5% of the 180,000 records we analyzed, or about one out of every 40 market records. However, that incidence increases when assessing particular surveys, especially ones that provide a wide variety of market cuts. For one 2003 national survey, the incidence was 11%, i.e., more than one time out of ten the data suggested a potential bimodal risk.
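Screening a set of market records for this condition is a simple comparison. In the sketch below, only the first record comes from the survey discussed above. The other three records, the field names, and the function name are hypothetical and exist only to illustrate the check.

```python
# Each record holds the summary statistics for one survey job and market cut
# (pay in $ thousands). Only the first record is from the text; the rest
# are made-up illustrations.
records = [
    {"job": "Ladder level 2, Minnesota", "p25": 34.24, "p50": 46.72, "p75": 59.20, "mean": 65.82},
    {"job": "Hypothetical job A", "p25": 41.00, "p50": 48.50, "p75": 57.00, "mean": 49.30},
    {"job": "Hypothetical job B", "p25": 52.00, "p50": 60.00, "p75": 71.00, "mean": 50.10},
    {"job": "Hypothetical job C", "p25": 28.00, "p50": 33.00, "p75": 39.50, "mean": 34.20},
]


def mean_outside_quartiles(record):
    """Return True when the reported average is above P75 or below P25."""
    return record["mean"] > record["p75"] or record["mean"] < record["p25"]


flagged = [r["job"] for r in records if mean_outside_quartiles(r)]
incidence = len(flagged) / len(records)

print("Records to investigate:", flagged)
print(f"Incidence: {incidence:.1%}")
```

A flag is a prompt to investigate, not proof of bad data: a legitimately skewed market can also produce an average outside the quartiles.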
Question: So how should we diagnose this type of potential data problem?
- First draw your own picture to get a sense of the possible sample that corresponds to the reported data.
- Ask the vendor for an explanation including similarities for high payers (e.g. industry segment) and assessment of changes in the job model. Don’t accept the answer, “that’s what the data says.” Mistakes happen more often than you might think.
- Re-calculate the market percentiles without the outliers to assess the change in market results.
- Determine your firm’s position in your test distribution when outliers are included and excluded to gauge the potential impact on pricing.
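The last two steps can be sketched as follows. Everything here is an assumption made for illustration: the sample is a hypothetical test distribution rather than the vendor's data, the firm's own pay of $52,000 is invented, and outliers are identified with a common rule of thumb (anything above P75 plus 1.5 times the interquartile range) because the text does not prescribe a cutoff.

```python
from statistics import mean, quantiles

# Hypothetical test distribution (pay in $ thousands), not the vendor's data.
sample = [30.00, 32.00, 33.00, 34.00, 34.48, 40.00, 44.00, 46.72,
          50.00, 54.00, 58.40, 60.00, 125.00, 160.00, 185.70]
our_pay = 52.00  # hypothetical pay at our own firm


def summarize(values, firm_pay):
    """Quartiles, mean, and the firm's position relative to the sample."""
    p25, p50, p75 = quantiles(values, n=4, method="inclusive")
    avg = mean(values)
    share_paying_less = sum(1 for v in values if v < firm_pay) / len(values)
    return {"p25": p25, "p50": p50, "p75": p75, "mean": avg,
            "ratio_to_mean": firm_pay / avg,
            "share_paying_less": share_paying_less}


full = summarize(sample, our_pay)

# Assumed outlier rule: drop anything above P75 + 1.5 * IQR.
upper_fence = full["p75"] + 1.5 * (full["p75"] - full["p25"])
trimmed_sample = [v for v in sample if v <= upper_fence]
trimmed = summarize(trimmed_sample, our_pay)

for label, stats in (("with outliers", full), ("without outliers", trimmed)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

With these made-up numbers, the same firm looks well below the market average when the outliers are included and well above it when they are excluded, which is the kind of swing that changes a pricing decision.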
Question: If I do have a data problem, how can I manage the problem when market pricing?
You have at least four alternatives in managing the problem data in market pricing.
- Pay or shame the survey vendor to re-calculate the market data excluding the outliers – high cost option
- Use your own test distribution results excluding the outliers as a proxy for reasonable market results
- If you have other comparative market records, give the “problem record” a low weight or exclude it from the analysis.
- The fourth alternative is to do nothing. This is the high risk option because the problem will likely come back to bite you next year. Better to deal with the problem today.
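The third alternative, down-weighting or excluding the problem record, amounts to a weighted average across your market sources. In this sketch the first value is the average from the survey discussed above. The other two averages, the weights, and the function name are hypothetical.

```python
def composite_average(market_averages, weights):
    """Weighted average of market averages; a weight of 0 excludes a record."""
    total_weight = sum(weights)
    weighted_sum = sum(avg * w for avg, w in zip(market_averages, weights))
    return weighted_sum / total_weight


# Market averages in $ thousands. The first is the problem record from the
# text; the other two are made-up comparison surveys.
market_averages = [65.82, 47.50, 49.00]

equal_weight = composite_average(market_averages, [1.0, 1.0, 1.0])
low_weight = composite_average(market_averages, [0.2, 1.0, 1.0])
excluded = composite_average(market_averages, [0.0, 1.0, 1.0])

print(f"Equal weights:      {equal_weight:.2f}")
print(f"Low-weight problem: {low_weight:.2f}")
print(f"Problem excluded:   {excluded:.2f}")
```

How low to set the weight is a judgment call. The value of running all three versions is seeing how much of your composite depends on the one suspect record.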
Understanding the data is crucial to making informed market pricing decisions. You now know that not all market data is 100% grade-A prime. How you use that knowledge for pricing is your decision. The last chapter in this play is yours to write.