A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Use PROC SQL to merge large datasets – Nodupkey

Posted by phillippeng on September 29, 2008

Generally, we have two approaches to merge two datasets: merge statement in a data step and proc sql. Recently, I realized the advantages of using proc sql to merge large datasets.

Merge statement in a data step approach is simple, but it requires you to sort the data first. For a large data set, it may be very time consuming. For example, I have a data set with 33mm records. It took me almost one hour to only sort the data on the server. Using PROC SQL, it only took 25 minutes to finish the whole merge process.

In this post, I am going to summarize the PROC SQL way to achieve the nodupkey option in data step.

proc sql;
create table comb as
select distinct a.*, b.id
from one a right join two b
on a.id= b.id
quit;

If you try above code, you won’t get distinct match from dataset a as you use the nodupkey in the data step. The reason is that distinct option in proc sql is not the same as nodupkey but as nodup. In order to get the same results as nodupkey, we need to use an unknown function “monotonic()”. Here’s how:

proc sql;
create table comb as
select a.*,  b.response, b.id, monotonic() as rowid
from one a right join two b
on a.id = b.id
group by b.id
having rowid=min(rowid);
quit;

What the above code does is to find all the matches from table a for each id in table b and then group them by id. Using function monotonic(), we create a field rowid and then take only the first record (“having rowid = min(rowid)”). This works great when you merge large datasets.

Advertisements

One Response to “Use PROC SQL to merge large datasets – Nodupkey”

  1. Jinhwa said

    Hi, Phillip. ^^
    My name is Jinhwa. I’m from South Korea. When I was looking for using nodupkey example, I found your blog. And your post is very helpful.
    Thank you. I am poor at English. I hope you understand my comment.
    Bye. ^^

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: