Median in SQL
12. May 2007 08:43Median
A median is yet another type of statistic that you could loosely say refers to the middle value in a sample set. To be more precise, the median is the value in a sample set that has the same number of cases below it as above it.
Calculating a Median
Problem
You want to calculate the median bulb life for all bulbs that have been tested.
Solution
To calculate the median of the light-bulb test results, use the following query:
SELECT x.Hours medianFROM BulbLife x, BulbLife yGROUP BY x.HoursHAVINGSUM(CASE WHEN y.Hours <= x.HoursTHEN 1 ELSE 0 END)>=(COUNT(*)+1)/2 ANDSUM(CASE WHEN y.Hours >= x.HoursTHEN 1 ELSE 0 END)>=(COUNT(*)/2)+1
Discussion
This query follows the definition of the median very closely and uses the solution published severalyears ago by David Rozenshtein, Anatoly Abramovich, and Eugene Birger. Their solution is still regarded as one of the classical solutions to the problem of finding the median value in a sample ;To understand their solution, it helps to look at the query in two phases. First, you have a GROUP BY query that returns the number of bulbs for each distinct lifetime. The following is a modified version of the first part of the solution query that returns the occurrence count corresponding to each distinct bulb-life value:
SELECT COUNT(*) occurrences, x.Hours xhoursFROM BulbLife x, BulbLife yGROUP BY x.Hoursoccurrences xhours----------- -----------40 104320 105720 107320 107420 107720 1079ETC...
Because these results represent a self-join of the BulbLife table with itself, each group represents a number of detail rows equivalent to the number of rows in the sample. The two groups of 40 occurrences each exist because the data contains 2 cases with values of 1043 and 2 cases with values of 1085.
To determine the median follow these steps:
- Count the cases where the y.hours value is less than or equal to the x.hours value.
- Count the cases where the x.hours value is less than or equal to the y.hours value.
- Compare the two results.
The simplified version of median calculating :
select
max(income) as median_income
from
(select income,
count(branch_cust_ip) over (partition by migzar_nalan) as sum_income,
RANK( ) OVER (PARTITION BY migzar_nalan ORDER by income) as ranker_income,
(( ranker_income *2)/(sum_income+1)) +1 as R_L_hahn
from dwp_sas.ang_basis_migzar1_c ) a
where R_L_hahn=1
Statistics in SQL

Email 