A Blog on Analytics and Marketing

SAS, Marketing, Predictive Modeling, Statistics

Archive for the ‘Uncategorized’ Category

SQL: Common Tasks

Posted by phillippeng on February 23, 2009

I use thisĀ  for SQL Server.

1. select a random sample from a table in SQL

select top 100 *
from table_name
order by newid()

Note: the trick is to use newid() function.

2. Rename a table in SQL:
use database_name;
Go
EXEC sp_rename ‘tablename_old’, ‘tablename_new’;
GO

3. Rename a column:
EXEC sp_RENAME ‘TableName.OldColumnName’ , ‘NewColumnName’, ‘COLUMN’

4. Drop a column:
alter table tablename
drop column col1, column2

Advertisements

Posted in SQL, Uncategorized | Tagged: , | Leave a Comment »

SAS: Merge a very large dataset and a small dataset efficiently

Posted by phillippeng on January 24, 2009

I need to append zip penetration data to a campaign file. This is a very typical data appending job, in which the campaign file is a very large file with over 24mm records while the zip penetration file has only 13,000 records. PROC SQL can make it, but it takes more than one hour to finish the process (running on a fast server). Using the hash table is a more efficient way to achieve the goal.

Macro Code:
%macro appnd_zip(dsin, dsout);
data home.&dsout.;
if 0 then set home.penetration_zip5(keep=zip5 zip_penetration);
declare hash zip(dataset: “home.penetration_zip5”);
zip.definekey(“zip5”);
zip.definedata(“zip_penetration”);
zip.definedone();

do until(eof);
set home.&dsin.(rename=(ms403=zip5)) end=eof;
if zip.find(key:zip5) = 0 then output;
else do;
call missing(zip_penetration);
output;
end;
end;
stop;
run;

%mend;

In this macro, I first define a hash table based on the small dataset (penetration at zip5 level). Then I match it against the large dataset based on the matchkey zip5. When there’s a match, append the data. When there’s no any match found, assign a missing value to it. This is equivalent to a left join. If you want to do an inner join, you need to output only when there’s a match found. The processing time is reduced to about 15 minutes.

Posted in SAS, Uncategorized | Leave a Comment »

Time Management

Posted by phillippeng on October 3, 2008

This afternoon I attended the internal training session “Time Management Essentials” instructed by Susan Connors. Susan provided lots of tips and principles of managing times. It’s an informative session. Everybody needs to learn how to manage their time so that they can be more productive.

Time management is not about how much we know about time management, but about how much we put into action. In another word, it’s not about science but behaviour.

Here are some tips/views shared:

1. Manage your email at once. Don’t check your email every 3 to 5 minutes. However, we need to balance the client expectation and email management efficiency.

2. Retool the priority system. If we have the following labels:
Urgent and goal-related = A
Goal-related but not urgent = B
Urgent but not goal-related = C
we need to have have at least one “B” priority in our to-do list every day.

3. Use the primary energy time for priority tasks.

You can find more free resources through

http://site.ebrary.com/pub/mcgraw-hill

Posted in Uncategorized | Tagged: | Leave a Comment »

Use SAS like FTP – PROC DOWNLOAND

Posted by phillippeng on September 29, 2008

Very often we need to work from home through VPN. The speed is fairly OK when you work on sas datasets through server. However, if you tried to move some non-sas files, it’s painfully slow. With PROC DOWNLOAD, you can use SAS as a ftp tool to move non-sas file much easier.

rsubmit;

filename source ‘/sourcefolder/myfile.pdf’;
filename dest ‘/destinationfolder/myfile.pdf’;


proc download infile=source outfile=dest binary; run;

endrsubmit;
signoff;

The BINARY option transfers files without any character translation (for example EBCDIC to ASCII) or insertion of record delimiters.

The simiar proc is PROC UPLOAD.

More information on this can be found from SAS site.

Posted in Uncategorized | Tagged: , , | Leave a Comment »