A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Posts Tagged ‘row_number function’

Stratified sampling in SQL Server

Posted by phillippeng on July 9, 2009

As I know, there’s no well documented way to do stratified sampling in SQL server. Below is my solution to this need:

— randomly sort the table

select * into #temp1
from table_name
order by newid()

— create group_id
select row_number() over(partition by strata_name order by strata_name) as rowid, *
into #temp2
from #temp1

–set the sample size for each strata. If the sample size (set to 5 in this sample code) is greater than the available counts for a specific group, all the available records are pulled.
select strata_name, sampsize = case when counts > 5 then 5 else counts end
into #temp3
from (select strata_name, count(*) as counts
from #temp2 group by strata_name) a
–Pull the sample
select a.*
into table_name_sampled
from #temp2 a, #temp3 b
where a.strata_name = b.strata_name and a.rowid <= b.sampsize
— Clean up the temporary tables
drop table #temp1
drop table #temp2
drop table #temp3

Posted in SQL | Tagged: , , | Leave a Comment »