A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Archive for July, 2009

How To Select All Columns Except One Column From A Table?

Posted by phillippeng on July 31, 2009

If you have a table with more than ten columns and you need to select all columns except one column, you may not want to type the column names one by one. Especially, it becomes very challenging if you have too many columns. Here’s my trick to do this.

1. Select all columns into another table and drop the column you don’t want.
select * into #temp from table_a
alter table #temp
drop column col1              (col1 is the column that you don’ t want to select.)

2. Rename the column and then drop.
When you want to join two tables by matching the ID column in each table, you want to use ‘select * from table_a, table_b by a.id = b.id’. However, an error message will tell you to select only a.id or b.id. It comes back to our old question. But we cannot drop the ID column first because we need to keep it for the Join purpose. The easy answer is to rename the column and then drop it.
sp_rename ‘Table_a.ID’, ‘ID_a’, ‘Column’


Posted in SQL | Tagged: , , | Leave a Comment »

SAS: Verify function to check invalid data

Posted by phillippeng on July 29, 2009

Syntax for verify function:
verify(character_variable, verify_string)

The verify function returns the first position in the character_variable that contains a character that is not in verify_string. If no invalid data exists, the returned value will be 0.

Initial solution:  IF verify(ID, ‘XYZ012345’) then ….;
Suppose ID = ‘X12B44’, the returned value is then 4.

potential problems:
1. varying length of a variable: There are training blanks which are considered as characters and not in the verify_string.
verify(trim(ID), ‘XYZ012345’)
will return 0 for all valid values of ID, even though they are shorter then specified length.
2. Missing value: The expression above will return a 1 for a missing value of ID. Missing values are considered the same as blanks.
Solution: use missing function

Final solution: if verify(trim(ID), ‘XYZ012345’) and not missing(ID) then …;

In SAS 9 or higher version, you can use NOTDIGIT function:
notdigit(character_value) is equivalent to
verify(character_value, ‘0123456789’)

Posted in SAS | Leave a Comment »

SAS Macro – Get missing counts for all character variables

Posted by phillippeng on July 23, 2009

%macro charvar_nmiss(base,dsn, outds);
*Counting number of character variables in the dataset;
proc contents data=&base..&dsn out=dsn_vars(keep=name type)  noprint; run;

data dsn_vars; set dsn_vars; where type = 2; seqno = _n_; run;

proc sql;
select name into: varlist
separated by ‘ ‘
from dsn_vars;

proc sql noprint;
select count(*)into :nvars from dsn_vars;

*Get the missing counts for each character variable in the dataset;
%do i = 1 %to &nvars;
%LET varn=%SCAN(&varlist,&i,%STR( ));
proc sql noprint;
create table varnmiss&i as
select distinct &varn., count(*) as nmiss
from &base..&dsn.
where &varn. = ‘ ‘;

*Concatenating all the datasets;
data final(keep=id nmiss);
set %do i= 1 %to &nvars;

*Clean the space;
proc datasets nolist; %do i = 1 %to &nvars; delete varnmiss&i; %end; run;

*Final dataset with list of variables and missing counts;
proc sql;
create table &outds as
select a.name ,b.nmiss
from dsn_vars as a, final as b
where a.seqno=b.id;


Posted in SAS | Tagged: , , | Leave a Comment »

SAS macro to Load a SAS dataset to SQL Server

Posted by phillippeng on July 22, 2009

%macro dbload(databasename, userid, password, sasdn, sqltable);
proc dbload dbms=odbc data=&sasdn.;
%mend dbload;

You can define a libname using odbc and then load the data through data step. However, there are advantages with PROC DBLOAD over data step:

1. You can re-define the column data type for the loaded table.
2. You can use PROC DBLOAD APPEND to just append new records to a table.

Posted in SAS, SQL | Leave a Comment »

SAS – Create a file list under the same folder

Posted by phillippeng on July 22, 2009

%let mydir=c:\working directory;
/*DOS command to get a list of all data files programs */
filename mylist pipe “%str(dir %”&mydir%”\pre_* /B) ” lrecl=5000;
/*Adding ‘pre_’ gives all the files with a name starting ‘pre_’. To list all the files, just use ‘*’ */

/*put the list of files into a data set for macro processing*/
data mylist;
infile mylist truncover;
input filename $char50.;
put _infile_;

proc sql;
select substr(filename,1,10) into:filelist
separated by ‘ ‘
from mylist;

%put &filelist;

Posted in SAS | Tagged: | Leave a Comment »

Stratified sampling in SQL Server

Posted by phillippeng on July 9, 2009

As I know, there’s no well documented way to do stratified sampling in SQL server. Below is my solution to this need:

— randomly sort the table

select * into #temp1
from table_name
order by newid()

— create group_id
select row_number() over(partition by strata_name order by strata_name) as rowid, *
into #temp2
from #temp1

–set the sample size for each strata. If the sample size (set to 5 in this sample code) is greater than the available counts for a specific group, all the available records are pulled.
select strata_name, sampsize = case when counts > 5 then 5 else counts end
into #temp3
from (select strata_name, count(*) as counts
from #temp2 group by strata_name) a
–Pull the sample
select a.*
into table_name_sampled
from #temp2 a, #temp3 b
where a.strata_name = b.strata_name and a.rowid <= b.sampsize
— Clean up the temporary tables
drop table #temp1
drop table #temp2
drop table #temp3

Posted in SQL | Tagged: , , | Leave a Comment »