A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Deploy your excel macros from a central file

Posted by phillippeng on October 7, 2009

There are some advantages of deploying your excel macros from a central file:

1. You can manage all your macros in one file. Especially, when you need to backup all your handy utility macros, it’s very convenient.
2. You can activate all the macros when you start Excel every time.

It’s quite easy to do so. You just need to save all your macros in a file called personal.xls in the excel start folder. The path should look like ‘C:\Program Files\Microsoft Office\OFFICE11\XLSTART\’.

Next time when you develop a new macro, save the macro in this personal.xls and your new macro will be added to this file.

Posted in Excel, MS Office | Tagged: | Leave a Comment »

Ten Reasons Why Models May Fail

Posted by phillippeng on September 14, 2009

Kent Leahy and Nethra Sambamoorthi list ten most common reasons why predictive models in marketing may fail. These top ten reasons are:

(1) Modeling strategy design. The person who will actually be building the model is not included in the initial discussions or design of the model.
(2) Model overfitting. The model has been “overfit” to the sample at hand ,and, consequently, does not generalize well to the actual mailing population, or is otherwise unreliable.
(3) Population shift due to environment changes. The circumstances surrounding the actual mailing change or the mailing environment turns out to be substantially different from the one on which the model was built.
(4) Model generalization too much. The model is used as though it were ‘generic’ or ‘universally applicable’.
(5) Population shift and model overfitting. Changes in the mailing environment in conjunction with the use of an ‘overfitted’ model.
(6) Model out-of-date. The model contains “post-event” variable(s), or those that occurred after the event you are trying to predict.
(7) Model validation and implementation. Not ‘test-scoring’ the model, or making an error when implementing the model.
(8) Sample selection QC. Failing to run an audit of the file as the first step in the model-building process.
(9) Miss the model expectation. A consensus on just exactly what the model is expected to predict (and for which audience) is not reached and/or well understood.
(10) Poor fanancial Planning. The model performs well but the mailing itself is not a financial ‘success’.

Reference: http://www.crmportals.com/crmnews/2002123.html

Posted in Marketing, Modeling | Leave a Comment »

Resources from SIAM

Posted by phillippeng on September 10, 2009

SIAM stands for Society for Industrial and Applied Mathematics. Invited and Prize Lectures from the 2008 and 2009 Annual Meetings are captured as slides with audio. You can access them through the following link:


The proceedings section includes good coverage on data mining:


Posted in Modeling | Tagged: | Leave a Comment »

Text Editor to Recommend: Notepad++

Posted by phillippeng on September 3, 2009

As I search for a free but powerful text editor tool, Notepad++ just comes on top of the list. Notepad++ is widely used by advanced programmers.

Download Page:  http://download.cnet.com/Notepad/3000-2352_4-10327521.html

Great resource Link:  http://notepad-plus.sourceforge.net/uk/download.php

Notepad++ has built in formatting modules for many programming languages. However, SAS is not one of them.  You can add SAS as a user-defined language following steps below:

1. Download the user-defined language to your computer
2. Click start, run, type (or paste in) %APPDATA%\Notepad++ then click ok
3. Rename userDefineLang_SAS.xml as userDefineLang.xml
4. Copy and paste userDefineLang.xml file into the directory. If you add more than one languages, simply
copy everything from each file starting at <UserLang…> to </UserLang> and paste it at the end of the userDefineLang.xml
right before </NotepadPlus>.
5. Save the newly improved userDefineLang.xml

SAS language formatting file was originally created by Joakim and can be downloaded from SAS. I have made some changes to this file. If you have interests in it, please email me (ppeng08@gmail.com).

I’d like to call out some great features about Notepad++:

1. Highlighting your coding language;
2. Column selection and edition;
method: click the starting position –> alt+shift+click the ending position
You can then edit by column. It works great.
Column edit (alt+C) from the menu does not work the way I want.
3. Able to open an incomplete file. For example, you want to check the SAS log file while you code is still running. Regular text editor just cannot open such a file. Notepad++ can read whatever in the current file. Once there are more information added to the file, a window will pop-up asking if you want to reload/refresh the file in the window.
4. Advanced search and replace. When you search for one word, the same word in the whole file will be highlighted. This helps check your spelling and the keyword referencing in the file.


You will find more great features as you begin to use it.

Posted in Computer, SAS, SQL | Leave a Comment »

How To Select All Columns Except One Column From A Table?

Posted by phillippeng on July 31, 2009

If you have a table with more than ten columns and you need to select all columns except one column, you may not want to type the column names one by one. Especially, it becomes very challenging if you have too many columns. Here’s my trick to do this.

1. Select all columns into another table and drop the column you don’t want.
select * into #temp from table_a
alter table #temp
drop column col1              (col1 is the column that you don’ t want to select.)

2. Rename the column and then drop.
When you want to join two tables by matching the ID column in each table, you want to use ‘select * from table_a, table_b by a.id = b.id’. However, an error message will tell you to select only a.id or b.id. It comes back to our old question. But we cannot drop the ID column first because we need to keep it for the Join purpose. The easy answer is to rename the column and then drop it.
sp_rename ‘Table_a.ID’, ‘ID_a’, ‘Column’

Posted in SQL | Tagged: , , | Leave a Comment »

SAS: Verify function to check invalid data

Posted by phillippeng on July 29, 2009

Syntax for verify function:
verify(character_variable, verify_string)

The verify function returns the first position in the character_variable that contains a character that is not in verify_string. If no invalid data exists, the returned value will be 0.

Initial solution:  IF verify(ID, ‘XYZ012345’) then ….;
Suppose ID = ‘X12B44’, the returned value is then 4.

potential problems:
1. varying length of a variable: There are training blanks which are considered as characters and not in the verify_string.
verify(trim(ID), ‘XYZ012345’)
will return 0 for all valid values of ID, even though they are shorter then specified length.
2. Missing value: The expression above will return a 1 for a missing value of ID. Missing values are considered the same as blanks.
Solution: use missing function

Final solution: if verify(trim(ID), ‘XYZ012345’) and not missing(ID) then …;

In SAS 9 or higher version, you can use NOTDIGIT function:
notdigit(character_value) is equivalent to
verify(character_value, ‘0123456789’)

Posted in SAS | Leave a Comment »

SAS Macro – Get missing counts for all character variables

Posted by phillippeng on July 23, 2009

%macro charvar_nmiss(base,dsn, outds);
*Counting number of character variables in the dataset;
proc contents data=&base..&dsn out=dsn_vars(keep=name type)  noprint; run;

data dsn_vars; set dsn_vars; where type = 2; seqno = _n_; run;

proc sql;
select name into: varlist
separated by ‘ ‘
from dsn_vars;

proc sql noprint;
select count(*)into :nvars from dsn_vars;

*Get the missing counts for each character variable in the dataset;
%do i = 1 %to &nvars;
%LET varn=%SCAN(&varlist,&i,%STR( ));
proc sql noprint;
create table varnmiss&i as
select distinct &varn., count(*) as nmiss
from &base..&dsn.
where &varn. = ‘ ‘;

*Concatenating all the datasets;
data final(keep=id nmiss);
set %do i= 1 %to &nvars;

*Clean the space;
proc datasets nolist; %do i = 1 %to &nvars; delete varnmiss&i; %end; run;

*Final dataset with list of variables and missing counts;
proc sql;
create table &outds as
select a.name ,b.nmiss
from dsn_vars as a, final as b
where a.seqno=b.id;


Posted in SAS | Tagged: , , | Leave a Comment »

SAS macro to Load a SAS dataset to SQL Server

Posted by phillippeng on July 22, 2009

%macro dbload(databasename, userid, password, sasdn, sqltable);
proc dbload dbms=odbc data=&sasdn.;
%mend dbload;

You can define a libname using odbc and then load the data through data step. However, there are advantages with PROC DBLOAD over data step:

1. You can re-define the column data type for the loaded table.
2. You can use PROC DBLOAD APPEND to just append new records to a table.

Posted in SAS, SQL | Leave a Comment »

SAS – Create a file list under the same folder

Posted by phillippeng on July 22, 2009

%let mydir=c:\working directory;
/*DOS command to get a list of all data files programs */
filename mylist pipe “%str(dir %”&mydir%”\pre_* /B) ” lrecl=5000;
/*Adding ‘pre_’ gives all the files with a name starting ‘pre_’. To list all the files, just use ‘*’ */

/*put the list of files into a data set for macro processing*/
data mylist;
infile mylist truncover;
input filename $char50.;
put _infile_;

proc sql;
select substr(filename,1,10) into:filelist
separated by ‘ ‘
from mylist;

%put &filelist;

Posted in SAS | Tagged: | Leave a Comment »

Stratified sampling in SQL Server

Posted by phillippeng on July 9, 2009

As I know, there’s no well documented way to do stratified sampling in SQL server. Below is my solution to this need:

— randomly sort the table

select * into #temp1
from table_name
order by newid()

— create group_id
select row_number() over(partition by strata_name order by strata_name) as rowid, *
into #temp2
from #temp1

–set the sample size for each strata. If the sample size (set to 5 in this sample code) is greater than the available counts for a specific group, all the available records are pulled.
select strata_name, sampsize = case when counts > 5 then 5 else counts end
into #temp3
from (select strata_name, count(*) as counts
from #temp2 group by strata_name) a
–Pull the sample
select a.*
into table_name_sampled
from #temp2 a, #temp3 b
where a.strata_name = b.strata_name and a.rowid <= b.sampsize
— Clean up the temporary tables
drop table #temp1
drop table #temp2
drop table #temp3

Posted in SQL | Tagged: , , | Leave a Comment »