A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Scoring observations using PROC FASTCLUS

Posted by phillippeng on October 6, 2008

PROC FASTCLUS performs k-means clustering of observations. Each observation in the training dataset is assigned to a cluster based on the options given to the procedure and on the observation's variable values. Scoring a validation dataset with PROC FASTCLUS is less obvious: simply re-running the procedure on the new data would recompute the cluster seeds from those observations, so the assignment rules would no longer match the ones derived from the training data.

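Scoring new observations without changing the cluster assignment rules can be achieved by saving the cluster seeds from the original run and passing them back to PROC FASTCLUS through the SEED= option, with MAXITER=0 so that the seeds are not recomputed.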

/* original clustering */

%let indsn = input;  *your input dataset;
%let nclus = maxclus; *number of clusters to request;
%let indvars = varlist; *independent variables to run proc fastclus on;
%let valid = val_data; *validation dataset to score;

/* cluster the training data and save the final seeds */
proc fastclus data=&indsn maxclusters=&nclus outseed=clusterSeeds;
var &indvars;
run;

/* scoring new observations using the seed dataset */
/* maxiter=0 keeps the seeds fixed; each observation goes to its nearest seed */
proc fastclus data=&valid out=&valid._scored seed=clusterSeeds maxclusters=&nclus maxiter=0;
var &indvars;
run;
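
Once the scoring step has run, it can help to look at how the new observations were distributed across the clusters. The sketch below assumes the macro variable &valid defined above and the default names that PROC FASTCLUS gives the variables it adds to the OUT= dataset, CLUSTER and DISTANCE (the distance from the observation to its assigned seed). It is only an illustrative check, not part of the original method.

```sas
/* number of scored observations assigned to each cluster */
proc freq data=&valid._scored;
  tables cluster;
run;

/* distance between each observation and its assigned seed, by cluster */
proc means data=&valid._scored n mean max;
  class cluster;
  var distance;
run;
```

A cluster that receives very few validation observations, or much larger distances than the others, may indicate that the validation data differ from the training data.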

Reference: “Data Preparation for Analytics Using SAS” by Gerhard Svolba, Ph.D.

