Common Frequentley Asked Questions     Savvy Forum       


SAS Programming FAQ Contents (Use 'Control F' to search keywords.)



CDISC, Pinnacle 21, SAS Code

Basic Questions

1. In general, what are the rules for lengths for SDTM variables that are kept in ADaMs? - Since the variables were created at the SDTM level, the same length should be preserved.  Only new ADaM variables can have their lengths specified at the ADaM level.  Another point to consider is the dataset requirements based on the version 5 transport file.

2. In general, why and what are the reasons for combining SDTMs and ADaM into classes? - SDTMs and ADaMs are combined based on the type of data stored - LB measurements are findings, AE are events and CM are interventions.

3. When creating SUPPAE dataset from AE dataset, which variable is created to link both datasets? - The AESEQ variable is created and used to link both SUPPAE and AE datasets.  The AESEQ variable will always uniquely identify the record since it is sequence counter based on key variables, which is reset for each patient.  Note that creating the AESEQ variable is always required even if SUPPAE dataset is not needed.

4. In general, if you have additional documentation on data process flow, then is it useful to include in FDA submission? - Yes, if you do not follow CDASH, then in general, FDA welcomes this documentation.  Other places to include this information is in the reviewer's guide.

 Advanced Questions

1. In general, is it possible to create two versions of ADSL and if so, what are the rules? - Yes, if it makes sense, ADSL_XX pr ADBASE dataset may be created to store additional variables not in the ADSL specification.  The general key rule is to keep the ADSL_XX dataset one record per patient.  See SAS paper.

2. Is it possible to compare --DTC CHAR date variables directly with --DT NUM date variables? - In general, it is best to first create temporary --DT NUM variables using the new informats e8601da10. or e8601da19., and then compare with other --DT NUM variables.  Ex. CMSTDT=INPUT(CMSTDTC, e8601da10.);  IF CMSTDT LT TRTSDT THEN PRIORFL='Y';

3. What are the anatomic location variables for TU oncology domain? - TULOC, TULAT, TUDIR, TUPORTOT.

4. Should ADaM variable algorithms be included in define.xml? - Yes, in general this pseudo code algorithm should be included in the 
Computation Description section instead of the comment section of define.xml. 

5. What are useful Oracle Clinical files?  a) CRF annotation rpt - by visit, dataset, OC variable name and SAS variable attributes (name, label, type and length), b) Study DVG rpt - OC variable attributes (name, label, code and decode values), c) View definition - OC variable attributes (name, type, length)

6. How best to convert SDTM duration such as 'PT1M32S' to ADaM duration of 92? - Ex. if EXDUR = ‘PT1M32S’ then TBOXACT=92 with this statement - if exdur > '' then call is8601_convert('du', 'du', exdur, tboxact);

7. What are the methods for converting SDTM to ADaM numeric dates, times and duration? - There are two methods:  

Convert sdtm dtc to adam datetime, date or time. xxxdtm = INPUT(xxDTC, E8601DT.); /* Datetime */ xxxdt = INPUT(xxDTC, E8601DA.); /* Date */ xxxtm = TIMEPART(xxxdtm); /* Time */

Convert sdtm difference in two dates or datetime duration to adam duration. CALL IS8601_CONVERT('dt/dt', 'du', aestdtm, aeendtm, aedur); CALL IS8601_CONVERT('du','du', aedur, hours); 

8. What is a useful procedure for one-proc away for ADaMs? - See Proc Tabulate

9. Is Best Overall Response Rate the same as Objective Response Rate (ORR) which is the presence of at least one confirmed CR or confirmed PR? - Yes.  See SAS paper.

10. What are common broken FDA Validation Rules?

 1789 - Study Tagging Files (STF) or study files do not exit in correct folders

 1734 – TS missing, incorrect study start date or SDTMs missing

 1735 - STF file-tags is incorrect in SDTMs or define.xml

 1736 – For SDTM, DM, ADSL or define.xml is missing

         - For SEND, DM or define.xml is missing 

11. What the methods for creating define.xml? The simplist method to create draft define.xml is from Pinnacle 21 Community version reading SDTM xpts filesSee Pinnacle 21 step-by-steps.

Step 1: Select SDTM IG version number.

Step 2: Create define spec excel file from reading SDTMs in Pinnacle 21 and selecting Define.xml > Create specs. Best to make sure SDTMs have minimum Pinnacle 21 issues first, else clean/update SDTMs first.

Step 3: Manually update define spec excel file with details in each sheet - Codelist, ValueLevel and WhereClauses.  See SAS paper.

Step 4: Create define.xml file from reading define spec excel file in Pinnacle 21 and selecting Define.xml > Generate Define.  See SAS paper.

Step 4: Repeat process for ADaMs.

Another method is to write SAS macros to construct define.xml. Writing SAS macros requires advanced macro programming and following XML tag rules.  Finally, the most professional and current method is to use Pinnacle 21 Enterprise version.

12. What is unique about define.xml 2.0?  Define.xml 2.0 allows for explicit definition of Value level metadata including Where clauses , CRF links, linking codelists to NCI controlled terminology and is based on ODM 1.3.2, SDTM IG 3.1.2, ADaM IG 1.0 and SEND 3.0.

13. What do the custom domains XYZ mean?  X means interventions, Y means events and Z means findings.

14. What is the difference between DSDTC and DSSTDTC? DSDTC is date of collection and DSSTDTC is start date of disposition event and DSSTDY is study day for start of disposition event. Below is the code to create disposition listing.

proc sort data=ds out=ds1 (keep = usubjid ACTARMCD dsstDTC DSSCAT dsdecod); by usubjid ACTARMCD dsstDTC; *where DSSCAT in ('END OF TREATMENT' 'END OF STUDY'); where index(dsdecod, 'CONSENT')=0; run;

* one word for variable name;

data ds1; set ds1; if DSSCAT = 'END OF TREATMENT' then DSSCAT='eot'; if DSSCAT = 'END OF STUDY' then DSSCAT='eos'; if DSSCAT = '' and dsdecod='RANDOMIZED' then dsscat='random'; if DSSCAT = '' and dsdecod='SCREEN FAILURE' then dsscat='screen_failure'; run;

proc transpose data=ds1 out=ds2 (drop = _label_ _name_) prefix=dtc_; by usubjid ACTARMCD dsstDTC; id DSSCAT; var dsstDTC; run;

data ds_eot (drop = dtc_random dtc_screen_failure dtc_eos) ds_eos (drop = dtc_random dtc_screen_failure dtc_eot) ds_random (drop=dtc_screen_failure dtc_eos dtc_eot) ds_screen_failure (drop=dtc_random dtc_eos dtc_eot); set ds2; drop dsstDTC; if dtc_eot > '' then output ds_eot; if dtc_eos > '' then output ds_eos; if dtc_random > '' then output ds_random; if dtc_screen_failure > '' then output ds_screen_failure; run;

proc sql; create table ds_all as SELECT coalesce(a.usubjid, b.usubjid, c.usubjid, d.usubjid) as usubjid , coalesce(a.ACTARMCD, b.ACTARMCD) as ACTARMCD , a.*, b.*, c.*, d.* FROM ds_random as a FULL JOIN ds_screen_failure as b on a.usubjid=b.usubjid FULL JOIN ds_eot as c on a.usubjid=c.usubjid FULL JOIN ds_eos as d on a.usubjid=d.usubjid ; quit; 

15. How do you assign USUBJID for pts continuing into follow up or extended studies? USUBJID is supposed to be unique to a person within a submission, so can provide traceability.  The sponsor has to make sure, however, that they can identify such people.  They will typically ask a question on the Demographics page whether the subject has been in a previous trial. At the time of submission, the sponsor will ensure that the same USUBJID has been assigned to the same person across all the studies. See section in SDTM IG 3.1.2 - 'Additionally, the same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.

16. What is one method for join xx with suppxx?

%macro suppxx(dsupx);

proc transpose data=sdtm.supp&dsupx out=supp&dsupx (drop = _name_ _label_);

by usubjid idvarval; var qval; id qnam; idlabel qlabel; run;

data supp&dsupx; set supp&dsupx; &dsupx.seq = input(idvarval, 4.); drop idvarval; run;

proc sort data=supp&dsupx; by usubjid &dsupx.seq; run;

data supp&dsupx; merge sdtm.&dsupx supp&dsupx; by usubjid &dsupx.seq; run;

%mend suppxx; %suppxx(ae);

17. What is the code to process RELREC? See SAS paper.

* select all AE and CM idvars, ex. XXSPID is equal to XX.SPID;

* relrec.relid links two records by same value - sequence by usubjid;

data ae cm; set lptss.relrec; if idvar='CMSEQ' then output cm; else if idvar='AESEQ' then output ae; run;

proc sort data=cm; by usubjid relid; run; proc sort data=ae; by usubjid relid; run;

* keep all linked ae and cm records;

* rename idvar to xxidvar and idvarval to xxseq;

* keep only matching by relid so only xxidvar such as xxspid;

data relrec_cmae (drop=rdomain); merge cm (in=cm rename=(idvar=cmidvar idvarval=cmspid)) ae (in=ae rename=(idvar=aeidvar idvarval=aespid)); by usubjid relid; if ae and cm; run;

* xxspid is the used over 80% time for relrec records;

* relrec link by ae.aespid=aeseq and cm.cmspid=cmseq;

proc sql; create table relrec_ae_cm (where=(relid > '')) as select r.relid, a.usubjid, a.aestdtc, a.aeendtc, a.aeterm, ae.aerel , b.cmstdtc, b.cmendtc, b.cmtrt from relrec_cmae as r right join as a on r.usubjid = a.usubjid and input(r.aespid, 4.)=a.aespid left join as b on a.usubjid = b.usubjid and input(r.cmspid, 4.)=b.cmspid; quit; 

18. What are effective pk checks? See SAS paper.

Inconsistent value for standard unit, proc freq data=sdtm.pp; tables pptestcd*ppstresu/noprint out=tests; run;

Missing units on value, data rep; set sdtm.pp; if input(pporres, best.) ne . and pporresu eq ''; run;

Missing character result when original result provided, proc freq data=sdtm.pp; tables pptestcd*pporres*ppstresc/list missing; format pporres ppstresc $missf.; run; data rep; set sdtm.pp; if pporres ne '' and ppstresc eq ''; then output; run;

Invalid subject, proc sql; create table bad_ids as select distinct usubjid from sdtm.pp where usubjid not in (select usubjid from; quit; 

19. What are examples of value level metaata?

Model: domain.variable when domain.item comparator 'vlmd'



20. What is the code to read an excel file and compare data set attributes with SDTMs?

 Three method for looping and code-generator options

A. create one list of variable values

 scan list and process each value

B. access set of variables and records in dataset

 loop through datasets

 call execute(SAS statements)

C. code-generator

 create set of macro variable values for each variable

 data _null_ to create .sas file

 put statements to build SAS statements

 %include .sas file 

* SDTM data; libname sdtm 'C:\project\studies\116\sdtms\datasets';

* list all sdtms;

proc sql noprint; select unique memname into :sdtmd separated by '.' from sashelp.vcolumn where upcase(libname) = "SDTM"; quit;

%put &sdtmd;

* loop through all sdtms;

%let sdval = 1;

%macro sdtmv;

%do %while(%length(%scan(&&sdtmd, &sdval, %str('.'))) > 0);

%let sdtvc = %scan(&&sdtmd, &sdval, %str('.'));

* import sdtm specs;

proc import datafile='C:\project\studies\Standards\CDISC SDTM Template Specs IGv3.3_compliance.xls'

out=&sdtvc dbms=xls replace; sheet="&sdtvc"; run;

proc sort data=&sdtvc (keep = variable_name variable_label type core format codelistname);

by variable_name; where variable_name > ''; run;

data &sdtvc; set &sdtvc; if type='text' then type='char'; else if type='integer' then type='num'; run;

* read sdtms as x.1;

proc sql; create table &sdtvc.1 as select name as variable_name, type as type1, length as length1, label as label1, format as format1

from sashelp.vcolumn where upcase(libname) = 'SDTM' and upcase(memname) = "&sdtvc" order by name; quit;

* compare attributes;

data diff; merge &sdtvc.1 (in=a) &sdtvc (in=b);

by variable_name; domain="&sdtvc"; if ^(a=b) or ^(label1=variable_label) or ^(type1=type) then diff=1; run;

proc sort data=diff; by descending diff; run;

* append results; PROC APPEND BASE = diff_all DATA = diff FORCE; RUN;

%let sdval = %eval(&sdval + 1);


%mend sdtmv;

* create shell dataset;

data diff_all; attrib domain format=$10. diff format=8. variable_name format=$10. variable_label format=$40. type format=$10.

core format=$15. format format=$10. codelistname format=$15. type1 format=$10. length1 format=$10. label1 format=$40. format format=$10.; run;


proc print data=diff_all; where domain > '';run;

* Alternative - Create and call for each metadata row data;

data _null_;

set cutoff_dates1;

call execute('%dcut('

|| memname

|| ', ' || name || ')' );


%macro run_lb;

data _null_;

file "/bdm/tbos/STDDEV001/studies/0201/dev4/documents/";

set libpath;

put 'libname '


" ' " strip(%str(_lpath)) " ' access= "





%mend run_lb;


%*include "/bdm/tbos/STDDEV001/studies/0201/dev4/documents/"; 

21. What is one method for confirming invalid values?  This method combines summary function COUNT() with conditional processing.

proc sql; 

create table dca3 as select unique 3 as check_num , a.usubjid , catx(' ', 'Age=', age) as var1 , catx(' ', 'ASA=', asa) as var2 , case when age < 18 or asa ^in ('1', '2', '3') and ACTARMCD ^in ('SCRNFAIL' 'NOTTRT' ' ') then 1 else . end as bad_value from fa3 as a; 

create table dcb3 as select unique 3 as check_num, 'DM, FA' as domain, "Inclusion check for age >= 18 and ASA physical status classification (1-3)" as check , case when count(bad_value) = 0 then 'PASS' else 'FAIL' end as dc_rslt , count(bad_value) as count from dca3; 


22. What is the code to convert dataset xml and datasets? First example creates dataset and second creates xml file.

libname SGFXML xml "C:\Documents\example1.xml"; libname DATA "C:\Documents\Data"; data; set; run;

libname SGFXML xml "C:\Documents\example2.xml" xmltype=oracle; libname DATA "C:\Documents\data"; data sgfxml.example2; set; if usubjid=101 or usubjid=824; run;

libname myxml xml 'U:\XML\subjects.xml'; libname dat 'U:\data\'; proc copy in=myxml out=dat; run;

23. What the decision tree methods for creating SDTMs?

All raw data must be stored in SDTMs: AE, SUPPAE, RELREC, FA, FAAE.

AE - map standard data, horizontal structure

SUPPAE - map non-standard data for single, independent variables, vertical structure, ex. AELOC

RELREC - define relationships between domains, horizontal structure, ex. CM

FA - map non-standard data for group of related measurement variables with dates from many different types of data, vertical structure, ex. FAOBJ='Standard Symptoms', FATESTCD='OCCUR', FATEST='Occurrence', FAORRES='Y', FASTRESC='Y', FADTC='2019-03-12'

FAAE - focused FA for AE that may have events that differ in times with AE, vertical structure, ex. FAOBJ='Standard Symptoms', FATESTCD='OCCUR', FATEST='Occurrence', FAORRES='Y', FASTRESC='Y', FADTC='2019-03-12'

24. What are the population flags in ADSL? What is the difference between SAFFL and ENRLFL?  ITT, PP, ENRLFL = Enrollment flag, SAF = Safety flag (enrlfl=y and trtstdt > .).  See SAS paper1, SAS paper2.

25. What does BDS mean?  Basic Data Structure.  See SAS paper.

26. What is the method to use XML mapper to convert define.xml to a dataset?  See SAS paper.

filename defxml 'c:\define-demo.xml';

filename defmap 'c:\';

libname defxml xmlv2 xmlmap=defmap access=READONLY;

data itemgroupdef; set defxml.itemgroupdef; run;

27. What are methods to confirm Data Transfer Specification Requirements - Files and file formats, Datasets, Variables, DM, Define.xml.  See SAS paper 1, SAS paper 2, SAS paper 3.



1. Converting numeric variable to character variable? - Use COMPRESS(PUT(num_var, numeric_format)) function.  See SAS Blog.

2. Converting character variable to numeric variable? - Use INPUT(char_var, numeric_format) function.  Can use BEST. informat as a good general numeric format. Note that one trick to use INPUT() with numeric variables is to first apply PUT() to convert the numeric to character variable, ex. INPUT(PUT(EXAM_DATE, 8.), YYMMDD8.) AS DATE FORMAT=YYMMDD10. where EXAM_DATE=20150101.  Note that when converting non-numeric character values to numbers with nested INPUT(PUT()), to prevent errors for unexpected character values not defined in the character format, it is useful to include OTHER = '.' to convert the unexpected character value to a missing value instead of passing a character value.  The SAS statement may give an ERROR if it only expects numbers. 

3. Searching for character string content? - Use INDEX(char_var) to identify text string.  Use SCAN(char_var, 2, 'delimiter_characters') to extract the second position text string separated by list of delimiters.

4. Replacing text string?  Use TRANWRD(char_var, 'find_str', 'replace_str') function to search and replace text string.  Note that since this works for only one find_str at at time, one option for other strings to replace is to have a series of TRANWRD() statements or nested calls for each find_str. Note that commas can be within the quote string to replace.

5. Removing text match from character string contents or translate character values? - Use COMPRESS(char_var, 'characters_to_remove') to match text and remove.  Note that the COMPRESS() removes any matching characters_to_remove as well as spaces.  To address this, you may also want to use the COMPBL() function to reduce multiple spaces to single spaces.  The best function for exact text replacement is the TRANWRD() function.  To remove special characters such as '^M', apply this code - AETERM=COMPBL(STRIP(TRANWRD(AETERM, '0D0A'x,'')));.

* Remove non-printable characters such as carriage returns, k means keep and w means writeable and i means to ignore case; comment = compress(strip(Comment), , 'kw');

* Remove carriage return (‘OD’x) and line feed (‘OA’x) hidden characters; comment = COMPRESS(comment, ’0D0A’x); 

* Remove all special characters; comment = tranwrd(comment, '+', '');

The TRANSLATE function replaces characters in strings, but is less "sophisticated" than TRANWRD in that it replaces single characters, not patterns of characters (source, replacement text or list, target text or list). See SAS site for TRANSLATE.

Example of standard and non-standard quotes.  The "K" functions are normally used when dealing with non-English characters in a DBCS environment. However I found that KTRANSLATE was needed in this case if SAS is running with Unicode support. So I think that it's safest to use KTRANSLATE instead of TRANSLATE.  The first pair of arguments converts left quotation marks to straight quotation marks. The second pair of arguments converts right quotation marks to straight quotation marks.  Similar for single quotation marks.  for line feed, use '0D0A'x for windows and '0A'x for unix.   See link for ASCII characters in hexidecimal.

data work.bad;

length name $6 value $42;

infile cards4 truncover;

input name $6. value $char42.;


MOBLFL The value should be “Y” or null.




data work.good;

set work.bad;

value = ktranslate(value, '"', '93'x, '"', '94'x, "'", '91'x, "'", '92'x);


6. Syntax for referencing variable names longer than 8 characters or with blanks - often used in excel files? - Use "My Var Name"n.

7. What format can be used to create leading zeros? - Can use Zw.d format.

8. What is one technique for controlling the sort order of Day XX where XX is a number from 1 to 99, for example?  If VISIT, character or numeric variable type, contains the XX value, then apply this PUT to correctly insert spaces to right justify numbers in the character variable - XVISIT='DAY'|| PUT(VISIT, $2.);.

9. Which function can be used to convert multiple spaces to a single space? - Can use the COMPBL() function.

10. What is one method for creating character variables with an apostrophe as part of the data value?  Use double quotes to save single quotes and single quotes to save double quotes.  DATA MYDATA; MYVAR = "I'll love to join you."; MYVAR1 = 'I"ll love to join you.'; RUN;

11. Is there a character function that does both LEFT(TRIM())? - Yes, the STRIP() function removes leading and trailing blanks.

12. When concatenating variables, what are some ways to remove blanks? - Can use STRIP() on each variable.  Also, can apply PROC FREQ to see the blanks.

13. Which function is useful to convert to proper case text? - PROPCASE().

14. Are there functions to dynamically apply IF-THEN-ELSE logic as a single function? - Yes, use IFC(A, B, C, D) for character variables and IFN(A, B, C, D) for numeric variables.  Where A = expression results in TRUE or FALSE, B = value if TRUE, C = value if FALSE and D = value if the expression is missing which may not be the same as if the a single variable is missing.  Maybe better to use CASE-WHEN to have more control.  


Note that IFC() is also ideal for concatinating strings such as, BMNDOSE=IFC(DOSE600 NE ., STRIP(PUT(DOSE600, 8.1)) ||' mg/', 'NA/') || STRIP(PUT(DOSEEND,8.1))||' mg';

Note that this can also be applied in PROC SQL statements. See SAS paper for DATA Step example.  See SAS paper for Proc SQL example.

15. When adding two variables that may be missing, is there a way to prevent the 'Missing values' message in the SAS log? - Yes, for example IF ATDA > . AND ATSC > . THEN ATNIF=SUM(ATDA, ADSC); ELSE ATNIF=.;

16. Is there a function to return the number of non-missing and missing values from a list of variables? - Yes, the N() function returns the number of non-missing values in a list of values and the NMISS()/CMISS() function returns the number of missing numeric/character values in a list of values.  To make sure at least one non-missing numeric variable then apply IF N() >= 1.  To make sure at least one non-missing character variable then apply IF CMISS() < <max number of variables>.  See SAS paper for details.  See CMISS() syntax.

Use this method to compare both numeric and character variables in one N() function - proc format; value $ctnmiss ' ' = '.' other = '1'; quit; n(lbstresn, input(put(lbstresc, $ctnmiss.), best.)) = 1 then 1

17. What is the INDEX() function useful for? - INDEX() function is used to identify exact match of a character text in a character variable.  A value greater than 0 indicates that a match was found.  Note that IF INDEX() > 0 THEN ... is the same as IF INDEX() THEN ....

18. What is the syntax for applying variable name short-hand reference? - Ex. A=SUM(OF X1-X10);

19. When create a new variable from a variable with a user defined format, are there options to save the actual value stored as well as the formatted label value as the actual value? - Yes, this can be done for both numeric or character variable, to save the actual, value then, NEW_VARIABLE = OLD_VARIABLE; to save the formatted label value as the actual value then, NEW_VARIABLE=PUT(OLD_VARIABLE, USER_DEFINED_FORMAT.);


21. What is one way to always display to three significant digits even if the value is very large or small? - AVAL=INPUT(PUT(AVAL, E9.), E9.);

22. Is there a function that can make it easier to create a new variable by applying a format without using the PUT() function? - Yes, the VVALUE() function can be applied, for example, OLDWAY=PUT(Y,10.2); FORMAT Y 10.2; NEWWAY=VVALUE(Y);  

23. What is an example of nested functions? - PUT( INPUT( INPUT(RESP, $BRESP.), 4.), BRESP. ), the inner INPUT() applies the invalue $BRESP. to convert the character values, ex. 'CR', to numeric values, ex. 1, and then the outer INPUT() converts the character variable containing numeric values to numeric variable and finally the outer PUT() displays a label for the numeric value, ex. 'CR'.

24. Can PUT() function be applied in a WHERE statement to apply conditions? - Yes, proc format; value visitwin (Multilabel) LOW = 'BASELINE' 0 = 'DAY 0' 1 - 2 = 'DAY 1' 2 - 4 = 'DAY 3' 4 - 6 = 'DAY 5' 6 - 8 = 'DAY 7' 12 - 16 = 'WEEK 2' 26 - 30 = 'WEEK 4'; quit; proc freq data=bz1_adam.adbmk; tables usubjid*avisit*ady*visit/list missing nocum; where aval > . and avisit ^=: put(ady, visitwin.); run

25. What is the maximum length of character variable? - 32,767 bytes.

26. What is a good way to search for words or text string with a character variable or another text string? - The INDEX() function works on both character strings or two text strings.  The INDEX() function returns 0 for non-match and > 0 for matches. These are helpful for conditional processing.  INDEXW() function works on word so it treats as a single unit.

27. What is the function to calculate the standard deviation? - STD()

28. What is one method to compare variable names with a constant? - The index function can compare two constants, a macro variable storing a variable name and a suffix value.  Normally the index function compares variable values, INDEX("&VAR', 'END') > 0

29. What is the method for assigning scientific notation values? - SAS recognizes XeD syntax.  M = 1e9 is the same as M  = 1000000000 which is 9 zeros.

30. What functions are useful for fuzzy matches? - COMPARE(), COMPGED(), COMPLEV(), SPEDIS()

31. How can CATX() be used? - CATX() provides the ability to insert separaters (-, to, etc.) between character strings along with replacing TRIM(LEFT()) functions. 

32. What functions are similar to INPUT/PUT but allow the format name to specified during execution time instead of compile time? - INPUTN(), INPUTC(), PUTN() and PUTC() allow format names to be in variables instead of hard coding them.  This technique can be applied for example to convert to standard units for matching raw units.  See example in SAS paper.  Example 

DATA NEW_SCORE2; SET indata; ** figure out the format we need to use to resolve the SCORE **; FACTOR_TYPE = UPCASE(compress('FACTOR_'||sex)); ** apply the format to the SCORE to determine the FACTOR **; FACTOR = INPUTn(age,FACTOR_type);  ** apply the FACTOR to the SCORE **; new_bal = SUM(SCORE*FACTOR); RUN;

33. What are the differences between strip(), compress() and trim ()?  See SAS blog

34. What is a useful function to confirm non-missing values? - Use the NOT MISSING() function.  DATA AGE; SET DOG; TDAY=TODAY(); IF NOT MISSING(BIRTHDT) THEN AGE=ROUND((TDAY-BIRTHDT)/365, 0.1); ELSE AGE=.; RUN;

35. Do summary functions ignore missing values? - Missing values can be generated by a. Performing an illegal math operation, such as dividing by zero, b. SQL JOIN or a data step MERGE, where the value of the key is in one file, but not in another, c. Any of several math functions that create missing values in some circumstance - Yes, in general, all summary functions such as SUM(), MIN(), MAX(), etc. ignore missing values.

36. What is a useful alternative to applying multiple index() functions? - Use PRXMATCH SAS function which finds several words within a text string avoiding the use of multiple indexes.

Example: if prxmatch("m/PARACETEMOL|ASPRIN|LEVOTHYROXINE|COCODAMOL|AMOXICILLIN/oi", pnam ) > 0 then Flag = 1;

The above example look at any of the medications listed from the pnam variable and sets the Flag variable to 1.

  • The 'm' tag at the beginning of the search string tells PRXMATCH that it is doing a matching operation.
  • The 'o' tag at the end tells SAS to compile the parse string once.
  • The 'i' tag at the end forces a case insensitive match so that uppercase text strings is equal to lowercase for the purpose of matching.
  • The pipes (|) separate the search strings. Do not add spaces unless they are part of the matching criteria because every character counts. 




Example of SAS Date Constant (date9. format): '01JAN1960'D, SAS Time: '12:47'T, SAS DateTime '01JAN1960:12:47'DT

When creating date/datetime variables, make sure to combine only non-missing values.  One way to check this for character variables is to assure LENGTH(VARTM) > 1.

1. Creating date and datetime variables from numeric or character variables? - Use MDY(month, day, year) and DHMS(date, hr, mn, sec) functions.  Remember to convert to numeric variables, as needed, first. To create a date variable, since generally, INPUT() works only on character variables, you can use this trick syntax to convert numeric variable containing, 20150101, to a character variable and then apply a date format, INPUT(PUT(EXAM_DATE, 8.), YYMMDD8.) AS DATE FORMAT=YYMMDD10.

2. Creating date variables from character variables for any date format? - Concatenate variables and assign using INPUT() function - input(COMPRESS(DAY || MON || YEAR), date9.) as DATE format=date11. length=4.  Can apply version 9.2 ANYDTDTEW. informat as generic date informat.  Best to confirm dates.

3. Creating datetime variable from date and time variables? - Use DATETIME16. format and one of these three calculations - ex. VARDTTM=DHMS(VARDT, HOUR(VARTM), MINUTE(VARTM), SECOND(VARTM)), VARDTTM = (VARDT * 24 * 60 * 60) + VARTM or VARDTTM=INPUT(PUT(VARDT, DATE7.)||':'||put(VARTM, TIME8.), DATETIME16.);.

4. Creating date and time variables from datetime variables? - Use DATEPART() and TIMEPART() functions.  As a reminder, always create new variables instead of redefining original variables since SAS may give inconsistent results.

5. Creating a time variable from a character variable, ex. '01:05'? - Use INPUT(TIME, TIME5.); 

6. Which informats are useful for reading in a variety of date and time formats? - Use ANYDTDTE. for dates, ANYDTDTM. for datetimes, and ANYDTTME. for times.

7. What is a useful condition to subset and compare two datetime variables that may contain missing time variables? - This syntax will compare non-missing datetime variables or date variables if any of the time variables are missing.  WHERE ( (STARTDTM >= ENDDTM > .) or (STARTDT >= ENDDT and (STARTTM = . or  ENDTM = .) ) );

8. Is there a function to calculate the difference between two dates, times or datetime variables to get the number of days or weeks between, for example? - Yes, use INTCK() function, etc. WEEKS = INTCK('WEEK', START, END);  You can also change to 'DAY', 'MONTH' or 'YEAR'.  The INTNX() function is useful to determine the next day or week for example, NEXTPAYDATE = INTNK('WEEK', LASTPAYDATE, 2, 'SAME').

9. Does SAS automatically correctly compare datetime variables with date variables?  - No, make sure to DATEPART(DATETIME variable) in the comparison with date variables.

10. How would you make a macro variable that contains a date, formatted like “04/13/2013” which is two days in the future? - Use the INDEX() function and the system macro variable with today’s date.

11. What are tips to help assure all time values are displayed as 08:30 instead of 8:30? - Make sure the time variable width is large enough, apply TODw.d to display leading zeros since TIMEw.d does not display leading zeros.

12. When converting from datetime numeric variables to two character variables for dates and times, then make sure to apply DATEPART() and TIMEPART() before applying the PUT(, YYMMDD10.) and PUT(, TIME5.).

13. How do you convert a character --DTC, datetime variable into a numeric --DTM, datetime variable? - Use exstdtm = input(strip(exstdtc), b8601dt19.);

14. How do you convert a character --DTC, date variable into a numeric --DT, date variable? Yes, the SUBSTR() function is useful to select specific values instead of the complete value, setting required are starting and length of the selected value - Use exstdt = input(substr(strip(exstdtc), 1, 10), b8601da10.);

15. Within a WHERE clause, for example, what is one method to compare a time variable with hours? - Apply time constants such as, '2:00't <= EXTMSCHR <= '4:00't.  As an alternative, can first apply PUT() function with TIME2. format to display as hours only, then apply SUBSTR() to keep only the hours and then apply INPUT() to convert from character value to numeric value, WHERE INPUT(SUBSTR(PUT(EXTMSCHR, TIME2.), 1, 2), 3.) <= 4;

16. What is one major difference between INTCK()  and INTNX()? INTCK() counts the number of intervals based on day, week, month or years between two date values and INTNX() advances to a date value by a number of intervals.  For INTNX(), the default starting 1 day, week, month or year, Sunday, 1st and last day of each month. and January. This starting point can be shifted by adding a number after the unit sch as year2 for two year internal.

17. How are AM and PM determined in time variables?  SAS Time variables display time as military time so 1 pm is 13:00:00.  

18.  What is the syntax to create date variables from char date and char datetime variables? - Define attributes; format exstdat1 date9.; length exstdat1 8.; * example of char date – ‘01 JAN 2016’; exstdat1 = input(compress(exstdat1),??date9.); * example of char date – ‘01JAN2016:00:00:00:00’; exstdat1 = input(strip(scan(put(exstdat, datetime19.), 1, ':')), date9.);

The Question Mark (?) format modifier can be used to bypass error processing should invalid data be expected and acceptable to the application - for example, attempting to read a character field into a numeric variable. A double Question Mark (??) conceals every indication of an error - the “Invalid Data” message and the echoing of the input line containing the bad data are omitted from the SASLOG, and the _ERROR_ variable is not set.

19. How do you convert a variable storing number of seconds into a TIME variable? - Since SAS stores TIME variable as number of seconds, you do not need to use informat or INPUT() unless the variable is character so needs to be converted to numeric variable. You next need to apply a format such as TIME5. for 24-hour clock such as 14:05 with a max of 24:00, TIME8. for 24-hour clock such as 14:05:10, TIMEAMPM8. for 12-hour clock such as 02:05:10 PM or MMSS5. for minutes and seconds which will display 67:10 instead of 1:07:10.

20. What happens when a datetime format is applied to a date variable? - The number of days get treated as the number of seconds from 01JAN1960 so display will be incorrect and not display dates.  Best to apply date formats on date variables and datetime formats for datetime variables.

21. What is one method to expand or create each date record based on start and end dates? - Note that this method assumes you have a small dataset since many records can be created very quickly.  data adae_adsl1; set adae_adsl; by usubjid; do i=astdy to aendy; ady=i; output; end; run;

Alternative method is proc sort data=adp.adae (obs=3) out=adae (keep= usubjid astdt aendt aetoxgr aedecod); by usubjid; run; data a1; set adae; where nmiss(astdt, aendt)=0; diff=aendt - astdt; output; do until (i = diff); astdt=astdt + 1; i + 1; output; end; run;

22. What is one method for creating a group counter? Both BY and FIRST. are needed, See SAS Blog - * Sort by counter group variable; proc sort data=sashelp.class out=class; by sex; run; * Create simple counter; data class; * Keep count value to next record; retain count; set class; by sex; * Add 1 to counter; count=sum(count, 1); * Reset counter for each new sex value; if then count=1; run;

23. What is one method for tracking changes in oncology responses (char) grouped over time (num).  The task is to convert multiple records to one record per pt and to create multiple variables which are in order of last ady for each response - ex. 11 (BFBM) 13 (CRh)? - Below combination of steps will display at summary level by pt each max study day and response value.

libname adam 'X:\Client_A\EOP1_2018\Biostatistics\Production\ADAM\Data';

proc sort data=adam.adrs out=adrs; by usubjid phase paramcd ady avalc; run;

data adrs1;

length avalc1 $30.;

set adrs;

* Append retreatment to AVALC;

if substr(left(avisit), 1, 2) = 'Re' then avalc1 = strip(avalc) || 'r';

else avalc1 = avalc;

keep usubjid phase paramcd ady avalc1;

* Keep only post treatment;

if ady > 0; run;

data adrs2;

* Create and retain key group vars;

length usubjid_avalc1h $100.;

retain usubjid_avalc1h;

set adrs1;

* Assign as combo of key group vars;

usubjid_avalc1h = strip(lag(usubjid)) || '_' || strip(lag(avalc1));

* If current is different from previous record then store avalc1 value;

if strip(usubjid) || '_' || strip(avalc1) ^= usubjid_avalc1h then do;


end; run;

* For records marked, keep previous ady value;

data adrs3;

retain usubjid_avalc1h2;

set adrs2;

* Use in second data step since lag was not working in previous step;



* Assign as combo of key group vars;

usubjid_avalc1h2 = strip(usubjid) || '_' || strip(avalc1);

* If current is different from previous record then store avalc1 value;

if usubjid_avalc1h ^= usubjid_avalc1h2 then do;

if last_ady > ady then usubjid=usubjidh;


* keep and assign only if last_avalc1 is non-missing;

if last_avalc1 = '' then last_ady=.; run;

* Create combined variable;

data adrs4;

length avalc2 $30.;

set adrs3;

avalc2 = ' ' || strip(put(last_ady, best.)) || ' (' || strip(last_avalc1) || ')' ;

if last_avalc1 > ''; run;

proc sort data=adrs4;

by usubjid phase paramcd last_ady; run;

* Transpose to get one record per pt;

proc transpose data=adrs4 out=dcc (drop =_name_) prefix=ADY_RESP;

by usubjid phase paramcd;

var avalc2; run;

* Craete excel file; ods csvall file='\\Client\C$\eop1_adrs2.csv'; proc print data=dcc; run; ods csvall close;

24. What is one method to use functions to identify the last day of the month? Use INTNX() function.

If me_dt = INTNX ('MONTH' /* increment = month */ , today() /* start at today – use macro var instead */

, 0 /* move ZERO months */ , 'E') /* return END of current month */ ;



1. What are the two methods for accessing dictionary tables? - Use PROC SQL to access DICTIONARY.X or SASHELP.X datasets or DATA Step to access SASHELP.X datasets.

2. Which procedures are most useful for creating a codebook of all variable names and attributes? Components of a codebook inlcude: metadata attributes of all datasets, variables, freq and means, formats, missing values and outliers in pdf, rtf or html format with links - PROC CONTENTS.

3. What are examples of using dictionary tables and VNAME() function to get the variable name - See SAS blog.

4. What is one method to display the full pathnames of all LIBNAMES? - proc sql; create table libnames as select unique libname, path from sashelp.vlibnam where libname ^in ('SASHELP', 'WORK', 'STPSAMP', 'SASUSER', 'SASDATA', 'MAPS', 'MAPSGFK', 'MAPSSAS' 'LIBRARY'); quit;

5. How can you cancel a LIBNAME? LIBNAME LIBREF CLEAR;

6. What is one method to compare common domains across libnames? - proc sql; create table dc as select memname, libname, crdate, nobs, nvar, filesize from sashelp.vtable as a where upcase(memname) in ("PCR" "CYT" "CSF") order by memname, libname; create table dc1 (where = (upcase(memname) in ("PCR" "CYT" "CSF"))) as select memname, a.libname, b.path, crdate, nobs, nvar, filesize from sashelp.vtable as a left join libnames as b on a.libname=b.libname order by memname, a.libname; quit;

7. What is the method to compare dates across sdtms? - libname dev '\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-103\analysis\dev\data\sdtm'; libname stable '\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-103\analysis\stable\data\sdtm_unfiltered'; libname anal '\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-103\analysis\EOP1_2018\data\sdtm_unfiltered'; * Compare domains across folders; proc sql; create table sdtm_dstamp as select libname, memname, datepart(crdate) as crdate1 format=date9., nobs, nvar, filesize from sashelp.vtable where upcase(libname) in ('DEV' 'STABLE' 'ANAL') order by memname; quit;  proc transpose data=sdtm_dstamp out=sdtm_dstamp1 (drop= _name_ _label_); by memname; id libname; var crdate1; run; proc print;run;

8. What is the method to compare dates across similar SAS programs? - * Source path; %*let dsnme = p_dstamp; %*let pathdir=\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-101\analysis\dev\program\sdtm; * QC path; %*let dsnme = s_dstamp;%*let pathdir=\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-101\analysis\dev\validation\sdtm; %macro dstamp(dsnme=, pathdir=); filename dirRef "&pathdir"; data &dsnme (where =(index(file_name, '.sas') > 0)); dirID=dopen("dirRef");

num_of_files=dnum(dirID); &dsnme._path="&pathdir"; keep &dsnme._path file_name &dsnme._moddate; format &dsnme._moddate date9.; do i=1 to num_of_files; file_name=dread(dirID,i); rc1=filename("fRef",catx("/","&pathdir",file_name)); fid=fopen("fRef");

if fid > 0 then do; full_name=finfo(fid, "FileName"); bytes=finfo(fid, "File Size (bytes)"); &dsnme._moddate=datepart(input(finfo(fid, "Last Modified"), anydtdtm.)); owner=finfo(fid, "Owner Name"); rc2=fclose(fid); output; end; end; fId = fClose( fId ); run; %mend dstamp; %dstamp(dsnme=p_dstamp, pathdir=\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-101\analysis\dev\program\sdtm); %dstamp(dsnme=s_dstamp, pathdir=\\sasprod01\PrisonersBay\stat_prog\KTE-C19\KTE-C19-101\analysis\dev\validation\sdtm);

proc sql; create table b_dstamp as select coalesce(p.file_name, s.file_name) as file_name, p_dstamp.*, s_dstamp.* from p_dstamp as p full join s_dstamp as s on p.file_name = s.file_name; quit;

9. What is one method to determining variables that data sets have in common? - See SAS blog

10. What function returns the name of a dataset? - DSNAME() returns the dataset name associated with a dataset id, %let dsname = %sysfunc(dsname(&dsid));

11. What methods exist to access the global macro variable of the SAS program running?   See SAS tip.

For SAS batch run: %put The current program is %sysfunc(getoption(sysin));

For SAS interactive:

%macro pname; %global pgmname; %let pgmname=;

data _null_; set sashelp.vextfl; if (substr(fileref,1,3)='_LN' or substr

(fileref,1,3)='#LN' or substr(fileref,1,3)='SYS') and index(upcase(xpath),'.SAS')>0 then do;

call symput("pgmname",trim(xpath)); stop; end; run; 

%mend pname; %pname; %put pgmname=&pgmname; 

12. What method exists with SAS Studio to create list of SAS programs or log files in a data set?  Use dopen() function.

%macro dstamp(dsnme=, pathdir=, ext=sas);
* SAS program and log path names;
filename dirRef "&pathdir";
data &dsnme (where =(index(file_name, ".&ext") > 0));
keep &dsnme._path file_name file_name2 &dsnme._dt;
format &dsnme._dt datetime16. file_name2 $50.;
* Loop through each file;
do i=1 to num_of_files;
  file_name=dread(dirID, i);
  rc1=filename("fRef",catx("/","&pathdir", file_name));
  if fid > 0 then do;
    full_name=finfo(fid, "FileName");
    bytes=finfo(fid, "File Size (bytes)");
    &dsnme._dt=input(finfo(fid, "Last Modified"), anydtdtm.);
    owner=finfo(fid, "Owner Name");
      * remove extentions;
     file_name2 = tranwrd(strip(file_name), '.sas', '');
     file_name2 = tranwrd(strip(file_name2), '.log', '');
 fId = fClose( fId );
%mend dstamp;
%dstamp(dsnme=s_sas, pathdir=\\analysis\program\tfl);
%dstamp(dsnme=s_log, pathdir=\\analysis\program\tfl, ext=log);
%dstamp(dsnme=q_sas, pathdir=\\analysis\validation\tfl);
%dstamp(dsnme=q_log, pathdir=\\analysis\validation\tfl, ext=log);
* Combine and create issues flag;
* and program names are combined;
proc sql;
create table b_dstamp as
 select unique coalesce(ss.file_name2, qs.file_name2) as file_name,
 s_sas_dt, q_sas_dt, s_log_dt, q_log_dt
, case when s_log_dt > q_log_dt then 'Source log is after QC log' else ' ' end as issue
 from s_sas as ss full join q_sas as qs on ss.file_name2 = substr(qs.file_name2, 3)
full join s_log as sl on ss.file_name2 = sl.file_name2
full join q_log as ql on ss.file_name2 = substr(ql.file_name2, 3);
proc print data=b_dstamp  noobs;
 where file_name > '' and nmiss(s_sas_dt, q_sas_dt, s_log_dt, q_log_dt)=0;

13. Other than the main dictionary tables for dataset and variable attributes, which additional dictionary tables are useful? Below three tables are useful.

sashelp.vcatalg - complied macros: libname, memname, memtype, objname

sashelp.vlibnam - libnames (name), path

sashelp.vmacro - macro variables: scope, name, value



1. What is the formula for converting date values from Excel to SAS date values? - Use SAS_DATE = EXCEL_DATE -(365.25*60)-1; FORMAT SAS_DATE DATE9.;  Also, it is best to first confirm all dates are in the same format before importing the Excel file into SAS.  As an alternative if the dates are in different formats, you can apply version 9.2 ANYDTDTEW. informat to correctly read in dates.  Make sure to confirm date values.

2. What is the formula for converting time values from Excel, which are stored as fraction of a day such as .5 for 12:00 and 1 for 24:00, to SAS time values? - Use SAS_TIME = INPUT(EXCEL_TIME, 8.6)*(24*60*60); FORMAT SAS_TIME TIME8.;  The 8.6 informat allows for high precision.

3. What is a useful PROC IMPORT option to read more than the default number of rows when determining the variable length for character data?  GUESSINGROWS = 1000.

4. What is one technique for preserving leading zeros in numeric variable formats in SAS datasets when exporting to Excel?  One option is to create a character variable with "'" before the leading zeros.  When Excel imports the file, it will assume that variable is character and keep the leading zeros.  As an alternative, save the file in excel as a numeric value without leading zeros and then read the variable back with an informat such as Zw.d to insert leading zeros in the SAS dataset.  See SAS paper.

5. Other than using ODS, what is an alternative SAS procedure to create excel files? - PROC EXPORT with DBMS=XLS and the OUTFILE= options.  Use the REPLACE option to recreate the excel file or without REPLACE to update the excel file.

6. How are excel file variable names with spaces between characters referenced? - 'variable name'n for example is a variable name.  Both character or numeric variables with or without spaces still require the 'n' at the end.  This is also a valid variable name within macro calls.

7. When reading numbers from excel file, how can you make sure the correct numbers are read?  - See the true value of the cell in the upper left hand corner instead of the formatted value which may be to display to one decimal place. 

8. What is the syntax for accessing XLSX files? - Use DBMS=EXCELCS option. Options.

9. What is one of the easiest methods to create datasets from excel files in Unix? - Create a CSV file and use DATA step to read the CSV file.

10. Missing values displayed as zeros when using SAS Universal Viewer to create excel files. 

11. What is the reason that PROC IMPORT can not import more than 255 variables or columns from an excel file? - It is a Microsoft Excel limitation which is still outstanding.  The Import Data task of SAS Enterprise Guide can be used to import the entire file. Just make sure you don't select the "Import the data using SAS/ACCESS Interface to PC Files whenever possible" checkbox. You can start the wizard using the click sequence: File > Open > Data.  See SAS notes 1, and SAS notes 2. See SAS macro.

12. What are some options when you get this message - ERROR: Connect: Class not registered ERROR: Error in the LIBNAME statement. Connection Failed. See log for details. NOTE: The SAS System stopped processing this step because of errors? - Run PROC SETINIT to confirm SAS Access to PC files exists.  If so, then check with engine should be specified such as PCFILES. 

13. In general, when is it better to use PROC IMPORT over LIBNAME when accessing excel files? - When there are more than 255 columns, it is better to use PROC IMPORT.

14. What is a basic style to create basic excel files? - STYLE=MINIMUM.

15. When using PROC IMPORT to convert excel files to datasets, SAS can recognize if the first row contains variable names and will use them.

16. Character strings can be truncated at 255 or 1024 characters when importing Excel files into SAS - SAS Tip

17. What is the syntax for creating multiple sheets in an excel file? - To create multiple sheets, the excel file can not be closed until all data is displayed.  SHEET_NAME allows for multiple sheets.  ods markup tagset=excelxp file='C:\biomarker\data_checks\dc_7_1.xls'; ods tagsets.excelxp options(sheet_name='SDTM_LBCYT'); proc print data=dc_7_lbcyt; run; ods tagsets.excelxp options(sheet_name='RAW_CYT'); proc print data=dc_7_cyt; run; ods markup close; 

18. What is one advantage of using DDE to read excel files instead of other methods? - DDE allows running in batch mode to select rows and columns and also requires DATA Step with INPUT statement.  Alternative method may first require you to save as CSV file first. 

19. What is the syntax to batch run to first automatically open excel files? - filename ddecmd dde 'excel|system'; data _null_; file ddecmd; put '[FILE-OPEN("c:\wuss\names workbook.xls")]'; run;

20. What are the options to set for default of data filter and fixed column width? - AUTOFILL = "ALL" and ABSOLUTE_COLUMN_WIDTH = '10'.

21. What is the excel file syntax for comparing two columns? - with data in columns A and B, insert this formula in column C, '=IF(A2=B2, "Match", "No Match")'

22. What is the syntax to create a multiple sheet excel file based on a collection of datasets? - ods _all_ close; proc sort data=sashelp.class out=work.class; by sex; run; quit; ods Excel file='C:\temp\sunil.xlsx' style=HTMLBlue; ods Excel options(sheet_name='Gender - #byval(sex)'); proc print data=work.class; by sex; run; quit; ods Excel options(sheet_name='Retail Sales'); proc print data=sashelp.retail; run; quit; ods Excel close; 

Another example to create one sheet per proc is ods excel file = "SHEET_NAME_My_String4.xlsx" ; ods excel options(SHEET_NAME='My_Sheet_Name_1' sheet_interval='proc'); proc print; by region; run; ods excel options(SHEET_NAME='My_Sheet_Name_2' sheet_interval='proc'); proc print; by region; run; ods excel options(SHEET_NAME='My_Sheet_Name_3' sheet_interval='proc'); proc print; by region; run; ods excel close; 

23. Do named ranges have any impact when reading excel files?  Yes, PROC IMPORT ignores named rages by default while XLSX libname does not ignore named ranges.

24. What excel options exist when using ODS Excel to customize excel files? Yes, see SAS paper.

ods excel file = "\filename.xlsx"

options(start_at = "2,2" /*Options */

embedded_titles = "on" embed_footnotes_once = "on"

autofilter = "1-7" frozen_headers = "on" frozen_rowheaders = "2"

absolute_column_width = "15,16,13,10,15,15,14,14"

row_heights = "30,20,50,20,20,25,5" sheet_name = "#byval1" tab_color = "yellow") style=Listing;

25. Since specifications are excel files, what are useful excel functions to check metadata values?

=len() for variable name length

=upper() to convert variable names to upper case

=if(a=b, "Match", "Not Match") to match values

=unique(a1:a10) to get unique list

25. What is the general syntax for PROC Import to read csv files?  See SAS paper.

PROC IMPORT OUT= fromcsv DATAFILE= holdit DBMS=CSV REPLACE; guessingrows=444444; GETNAMES=YES; DATAROW=2; run;


JMP (Top)




1. Identifying the output objects created from SAS Procedures? - Use ODS TRACE ON/OFF;  Add /LISTING to display object names just before it's contents in the list file.

2. Stopping and starting writing to SAS list file? - Use ODS LISTING CLOSE; ODS LISTING;

3. Creating datasets from SAS Procedures? - Use ODS OUTPUT <table_name> = <dataset_name>;

4. Adding PageXofY as footnote? - Use ODS ESCAPECHAR = '^'; footnote1 "Page ^{pageof}";

5. Creating a table of contents file linking results from SAS Procedures? - Use CONTENTS and  TOC_DATA options with STARTPAGE = YES option in the ODS RTF statement.  When opening the word file, right click under the Table of Contents header and select update field option.

6. What is one way to avoid issues in titles and footnotes when applying the BODYTITLE option. - One method is to also apply the option - BODYTITLE_AUX which will allow you to justify titles.  Another method is to apply the BODYTITLE option and manually insert the microsoft word PageXofY syntax - see Proc REPORT for details.  As an alternative, use the RTF tagset.  Note that PAGEOF will not work with the RTF tagset. To display RTF tagset documentation, use the OPTIONS(DOC=”HELP”) on the Tagsets.RTF statement.  See SAS example of BODYTITLE_AUX.

7. When using ODS for example to write to an RTF file, what is the option to prevent the default of new page for each SAS procedure result? - Use STARTPAGE = NO to prevent page breaks.

8. What is the ODS default search path? - ODS PATH SASUSER.TEMPLAT (UPDATE) SASHELP.TMPLMST (READ);

9. What is the option to display the ODS PATH to search for ODS template stores? - ODS PATH SHOW;

10. What is the option for creating a temporary ODS template store, MY_STYLE? - ODS PATH WORK.MY_STYLE(UPDATE) SASHELP.TMPLMST(READ);  It is best practice to always specify SASHELP.TMPLMST(READ) as the last path to prevent errors.

11. What is the syntax to display styles in a template store? - PROC TEMPLATE; LIST STYLES; RUN; proc template; title '1) List a Particular Template Store'; list / store=sasuser.templat; run;

12. How are special characters different from general RTF control words? - Special characters have unicode while RTF control words do not.  Both special characters and RTF control words can be applied the same way - ex. ^R"\u8804\f3 " Footnote with LE symbol';

13. In general, when plotting group means from detail level data, can you apply PROC SGPLOT directly? - Yes, you apply PROC SGPLOT directly to get descriptive stats or first apply PROC MEANS to get mean values and then plot from the dataset saved.  See PROC GCHART example with PROC MEANS.

14. What and how exactly does DDE or Dynamic Data Exchange work?  DDE enables you to create a word file with columns and rows cells with bold text for example.  DDE works by first creating bookmarkes in a word file that identifies where to start inserting values and then referencing the bookmarks from within the SAS program to start populating the word file.  Often used with ARRAYS, DO-LOOPs and PUT statement to populate cells.  

This method works similar to the PROC TEMPLATE approach to create a style definition, however, it allows easier update of the word file based on visual changes instead of ODS style syntax changes.  See SAS paper for basic example.

15. When applying ODS LAYOUT, what are two placement options for displaying a table and plot for example?  One method is applying the grid with the # of columns - ODS LAYOUT
code; ODS LAYOUT END;, and the second method is using absolute positions
5.5 IN; * PROC TABULATE code; ODS REGION X = 5.5 IN Y = .5 IN HEIGHT =
5 IN WIDTH = 5.4 IN; 

16. What is a good SAS 9.2 graph procedure that has useful options to help prevent the need to customize graphs? - PROC SGPLOT

17. In PROC GPLOT, what is the syntax to create scatter plots and group plots? - PLOT X*Y for scatter plots and PLOT X*Y=Z for group plots;

18. For PLOT options, what do HAXIS and VAXIS stand for? - HAXIS for the X or horizonal axis ans VAXIS for Y or vertical axis.

19. What are the advantages of using the ANNO= option?  The ANNO= option allows you to specific a dataset that contains data specific graph symbols.

20. What is the option on the AXIS statement to control the starting, ending and by value for scales? - ORDER=(10 to 20 by 5) 

21. What is the statement to add symbols to each line? - SYMBOL1 or PATTERN1.

22. In general, which of the following three SAS procedures has more options (PROC SGPANEL, PROC SGSCATTER or PROC SGPLOT)? PROC SGPLOT has many options such as SCATTER, SERIES, DOT, and HISTOGRAM.

23. What is one way to change ODS output file to nodate, nonnumber, missing=0 and landscape? - OPTIONS NODATE NONNUMBER MISSING='0' ORIENTATION=LANDSCAPE;

24. In PROC SGPLOT, what is the difference between SCATTER and SERIES statements? SCATTER plots dots only and SERIES connects the dots to form a line.  Note that there can be multiple SCATTER and SERIES statements plotting different variables for greater customization.

25. What options exist for MARKATTRS? - Options include SYMBOLS, COLOR, SIZE.

26. In PROC SGPANEL, what options can be used to control the number of PARAMCD plots per page? - on the PANELBY statement, you can use ROWS=2 for example to display two plots within a panel and have three panels per page.  Another option is SPACING.

27. What PROC SGPLOT statement allows for plotting multiple lines? - each SERIES statement can be used to plot each variable.

28. What are some key features of PROC SGPLOT over GPLOT - Better control of axes in different scales and intervals, Provide flexible ways to control graph appearance, Plots overlay in single-cell or multi-cell graphs, Enhanced annotation and Classification panel of multiple plots with individualized labeling.

29. What options in PROC SGPANEL are useful for comparing plots side-by-side? - COLUMNS= and ROWS= compare x number of plots together as columns or as rows.

30. What option is important to set to create graphs? - ODS GRAPHICS ON;

31. Which statement is useful for inserting text in plots? - Use INSET as in proc sgplot data=sashelp.class noautolegend; title 'Cubic Fit Function'; inset "R(*ESC*){sup '2'} = &r2" "(*ESC*){unicode mu}(*ESC*){unicode hat} = &mean" / position=topleft; reg y=weight x=height / degree=3; run;

32. What is one way to create docx file instead of rt file? - You can use the RTF file syntax but have the DOCX as the file extension. ods tagsets.rtf file="c:\temp.docx" options(doc="help" sect="none" trhdr="\trrh800" trowhdrcell="\cltxbtlr"); proc print data=sashelp.class style(obsheader header)={just=l}; run; ods tagsets.rtf close;  See other SAS paper one and two.

33. When creating patient profiles, what are options to consider? - Depending on the end user, both technical and medical professionals may make critical decisions from reading and reviewing individual patient descriptive and summarized data similar to having direct access to the case report forms.  

Baseline and Disposition, Summary information such as dosing, Graphs and Listings are the main components. ODS/Template/RTF TextProc ReportPROC SGPLOT  9.3 (to display day=1), Domains (DM, AE, LB, CM, etc.), PUT() to map character values to meaningful words in sentences such as his/her.  Other tasks include preparing the input dataset to contain only patient's data in structure, order and variables to run within a macro.  Also, formatting issues such as bold, page breaks and page numbers also need to be programmed. Transpose the data as needed.

34. What is the syntax for creating datasets from SAS procedures? - ODS OUTPUT BASICMEASURES = MEASURE; PROC UNIVARIATE DATA=DEMOG; VAR WEIGHT; RUN; ODS OUTPUT CLOSE;

35. In boxplots from PROC SGPLOT, what is an option to remove the default Mean Marker? - Apply NOMEAN option.  To change mean marker attributes, use the MEANATTRS option.

36. What options exist for creating oncology plots - There are five possible plots- 1) Paneled bar chart showing stratified results, 2) waterfall plot, 3) spider plot showing change in tumor size, 4) response duration timeline plots and 5) survival plots.

37. What is one method to control the x variable order in bar charts using PROC SGPLOT?  See SAS paper.



1. Which SAS procedures are useful for saving the SAS system options to a SAS dataset and reloading SAS system options from a SAS dataset? - PROC OPTSAVE, PROC OPTLOAD.

2. Which SAS procedures are useful to direct the log and list to external files? - Use PROC PRINTTO LOG= for the log file and PROC PRINTTO PRINT= for the list file.

3. When creating transport files from SAS datasets with longer than 8 character variable names, are variable names truncated to 8 characters long? - Yes, be aware of this issue.




1. Is it possible to append more than one dataset at a time? No, repeat PROC APPEND for each new dataset to append.




1. Which option is useful to display variables in the order they were created instead of the default alphabetical order? - Add POSITION to the PROC CONTENTS statement.

2. See PROC DATASETS as an alternative. See CONTENTS option.

proc datasets library=sdtm; contents data=dm details varnum memtype=data; run; quit;




1. What are useful PROC COMPARE options to display all variables for each difference found? - Use LISTALL OUTDIF OUT= option or OUTNOEQUAL option;

2. How can you identify any non-matching key variables from PROC COMPARE? - Use OUT=RESULT OUTNOEQUAL OUTBASE OUTCOMP options .  Since the RESULT dataset is expected to be sorted by BASE and COMPARE repeated, in a DATA step, you can apply the condition IF LAG(_TYPE_) = _TYPE_ to identify the non-matching key variables. You can also add the OUTDIF option to add a new record to show the variable differences for each record.  As an alternative, can use PROC SQL's EXCEPT join on the larger dataset as LEFT selecting the key variables.  Other options include removing one key variable at a time to identify the differences.

3. What are useful PROC COMPARE options to confirm exact matches?


4. What are useful PROC COMPARE options for flexible matches?
- Use METHOD=RELATIVE CRITERION=0.00001 options;

5. What are useful PROC COMPARE options to display a sample of differences? - Use default options.  In general, the MAXPRINT= option is set to 50 which will display the first 50 mis-match records for each variable compared.

6. What options are available to display non-visual or hidden values differences from PROC COMPARE?
- As an option, apply temporary FORMAT BINARY16. format for numeric variables or $BINARY16. format for character variables to display the actual stored value representation.  An alternative is apply $HEXw. format or in DATA step before, X=compress (x, ,’kadp’).  Once identified, one solution for numeric ID variables is to round to one integer for example.

7. By default, if two datasets have different number of variables, will PROC COMPARE automatically compare the common variables and ignore the non-common variables? - Yes, use the LISTALL option to display all non-common variables.

8. If variable names are the same but the types are different, does PROC COMPARE still compare the values? - No, only after the variable types match will there be comparisons performed.

9. Is there a method for displaying extra variables in PROC COMPARE results to help identify is the correct value (base or compare)? - Yes, by adding the extra variable at the end of the ID statement, the common value will be displayed with each base/compare value difference. 

10. Instead of the default of comparing variables of the same name, is there an option to compare variables that are not the same name? - Yes, use VAR and WITH to list the variables in the same order to compare.

1. What is one way to compare and display differences in a derived variable? - Include the source and derived variables in the BY and ID statements such as - PROC SORT DATA=ADSL; BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN; PROC SORT DATA=QC_ADSL; BY SUBJID WEIGHTBL HEIGHTBL BMIBL; RUN; PROC COMPARE BASE=ADSL COMPARE=QC_ADSL LISTVAR LISTOBS LISTEQUALVAR; ID SUBJID WEGITHBL HEIGHTBL BMIBL; RUN;

12. What PROC COMPARE option exists to determine changes that sould be made with UPDATE statement? - Use NOMISSCOMP to print a list of the changes that will be made to a master data set in a DATA step UPDATE statement. data newpay; update payroll increase; by id; run; proc compare base = payroll compare = increase nosummary nomisscomp; id id; run;

13. What is the option to display differences between PAYROLLV02 and PAYROLLV01? - proc compare base = payrollv01 compare = payrollv02 briefsummary listall transpose; id id; run;

14. How exactly does PROC COMPARE compare formats? - PROC COMPARE compares the unformatted values and treats the formats as an attribute.

15. What is a useful alternative to PROC COMPARE when there may be differences in key variable values? - Use DATA Step merge to join the two datasets and rename the second variable names.  data adqs_diff; merge adqs (in=a keep= usubjid paramcd adt qsseq avisit aval) adqs_org (in=b keep= usubjid paramcd adt qsseq avisit aval rename=(avisit=AVISIT_ORG aval=AVAL_ORG)); by usubjid paramcd adt qsseq; if aval > . then avalf='non-missing'; else if aval= . then avalf='missing'; if aval_org > . then aval_orgf='non-missing'; else if aval_org= . then aval_orgf='missing'; if a and b then dfile='both'; else if a and not b then dfile='anotb'; else if b and not a then dfile='bnota'; run; proc freq data=adqs_diff; tables usubjid*avisit*avalf*avisit_org*aval_orgf/list missing nocum; where avisit ^= avisit_org; format usubjid $20. avisit avisit_org $10.; run;

16. What is one alternative for displaying more ID variables when ID variables are shorten? - Use BY and ID statements to compare by section.

17. What is one method for creating a dataset of only differences in variable values? - proc compare data=demog compare=compare outnoequal out=toprint noprint outcomp outbase; id unique; run; proc transpose data=toprint out=transp; by unique _obs_; id _type_; run; proc print data=transp (where=(base^=compare)); run;

18. What is a method to create a Discrepency / Reconcilation Report without using Proc COMPARE?

- proc sql; create table dc as select coalesce(a.usubjid, b.usubjid) as usubjid, coalesce(a.paramcd, b.paramcd) as paramcd, coalesce(a.aperiod, b.aperiod) as aperiod, ifn(a.peakdt = b.peakdt, 1, 0) as df_peakdt, ifn(a.ttpeak = b.ttpeak, 1, 0) as df_ttpeak, ifn(round(a.peakval, 3) = round(b.peakval, 3), 1, 0) as df_peakval, ifn(round(a.aucaval, 3) = round(b.aucaval, 3), 1, 0) as df_aucaval, a.peakdt as a_peakdt label='a_peakdt', b.peakdt as b_peakdt label='b_peakdt', a.ttpeak as a_ttpeak label='a_ttpeak', b.ttpeak as b_ttpeak label='b_ttpeak', a.peakval as a_peakval label='a_peakval', b.peakval as b_peakval label='b_peakval', a.aucaval as a_aucaval label='a_aucaval', b.aucaval as b_aucaval label='b_aucaval' from z6srt.adpcrs as a full join z6ash.adpcrs as b on a.usubjid=b.usubjid and a.paramcd=b.paramcd and a.aperiod=b.aperiod; quit;

proc sql; select name , sex , ifn(sex="F", 1, 0) as female from sashelp.class; quit;

19. Code from SAS Savvy Training Session on Alternative method to proc compare - * Create sample datasets;proc sort data=sashelp.class out=class; by name; run; data class; length Sex 8.; set class (drop=sex); Sex=1; age=10; if _n_ <= 10; keep name sex age; run; * Compare variable attributes between two similar datasets; 

* 1 - physical metadata - compare of dataset file attibutes; proc sql; create table class_1 as select libname, memname, crdate, nobs, nvar, filesize from sashelp.vtable where upcase(libname) in ('SASHELP' 'WORK') and upcase(memname) in ('CLASS') order by memname; quit; proc print;run; proc transpose data=class_1 out=class_2 (drop = _name_ _label_); by memname; id libname; var crdate; run; 

* 2 - variable metadata - compare dataset attributes; * this approach enables var by var comparison instead record by record comparison if single proc sql; proc sql; create table class_sashelp as select name, type, length, label, format from sashelp.vcolumn where upcase(libname) = 'SASHELP' and upcase(memname) = 'CLASS' order by name; quit; 

* domain dictionary table; proc sql; create table class_work as select name, type as b_type, length as b_length, label as b_label, format as b_format from sashelp.vcolumn where libname = 'WORK' and upcase(memname) = 'CLASS' order by name; quit; data class_2; merge class_sashelp (in=a) class_work (in=b); by name; if ^(a=b) or ^(type=b_type and length=b_length and label=b_label and format=b_format) then diff=1; run; proc sort data=class_2; by descending diff; run; proc print;run;

* SDTM data;

libname sdtm 'C:\project\studies\116\sdtms\datasets';

* list all sdtms;

proc sql noprint;

select unique memname into :sdtmd separated by '.' from sashelp.vcolumn where upcase(libname) = "SDTM"; quit; %put &sdtmd;

* loop through all sdtms;

%let sdval = 1;

%macro sdtmv;

%do %while(%length(%scan(&&sdtmd, &sdval, %str('.'))) > 0);

%let sdtvc = %scan(&&sdtmd, &sdval, %str('.'));

* import sdtm specs;

proc import datafile='C:\project\studies\Standards\CDISC SDTM Template Specs IGv3.3_compliance.xls'

out=&sdtvc dbms=xls replace; sheet="&sdtvc"; run;

proc sort data=&sdtvc (keep = variable_name variable_label type core format codelistname);

by variable_name; where variable_name > ''; run;

data &sdtvc; set &sdtvc;

if type='text' then type='char'; else if type='integer' then type='num'; run;

* read sdtms as x.1;

proc sql;

create table &sdtvc.1 as

select name as variable_name, type as type1, length as length1, label as label1, format as format1

from sashelp.vcolumn where upcase(libname) = 'SDTM' and upcase(memname) = "&sdtvc" order by name; quit;

* compare attributes;

data diff; merge &sdtvc.1 (in=a) &sdtvc (in=b); by variable_name;

domain="&sdtvc"; if ^(a=b) or ^(label1=variable_label) or ^(type1=type) then diff=1; run;

proc sort data=diff; by descending diff; run;

* append results; PROC APPEND BASE = diff_all DATA = diff FORCE; RUN;

%let sdval = %eval(&sdval + 1); %end;

%mend sdtmv;

* create shell dataset;

data diff_all; attrib domain format=$10. diff format=8. variable_name format=$10. variable_label format=$40. type format=$10.

core format=$15. format format=$10. codelistname format=$15. type1 format=$10. length1 format=$10.

label1 format=$40. format format=$10.; run;

%sdtmv; proc print data=diff_all; where domain > '';run; 

b. Discrepancy / Reconciliation Report 

proc sql; create table dc as select coalesce(, as name, ifc( > '', 'Y', 'N') as a_usubjid label="a_name", ifc( > '', 'Y', 'N') as b_usubjid label "b_name", a.age as a_age, b.age as b_age, as a_sex, as b_sex from sashelp.class as a full outer join class as b on strip(; quit;

20. What PROC COMPARE options are useful to direct all differences to SAS log - NOVALUES WARNING NOPRINT.

21. What options exist to better view subject numbers?  Apply TRANWRD() function to remove study id.  

data adsl_source; set adam.adsl; usubjid=tranwrd(usubjid, 'TAK-861-1001-', ''); run;



1. In general, what is the difference between VALUE and INVALUE when creating formats? - In general, VALUE is used to display variables and INVALUE is used to read or create variables.  VALUE is ideal with PUT() function to convert NUM to CHAR.  INVALUE is ideal with INPUT() function to convert NUM to NUM or CHAR to NUM.  Note that one trick to apply INPUT() on number variables is to first apply PUT(, best.) to make it character.  See also PUT/INPUT functions.

2. What option is useful to control the sort order instead of applying the default sort alphabetical sort order by formatted values?- Apply PROC FORMAT (NOTSORTED) option.  With PROC MEANS or PROC TABULATE, also apply PRELOADFMT and ORDER=DATA.  With PROC REPORT, also apply one of these options - DATA, FORMATTED, FREQ,  and INTERNAL.

3. Which option is required to access format catalogs not using the reserved word LIBRARY? LIBNAME MYLIB 'C:\MYFILES'; OPTIONS FMTSEARCH=(MYLIB.MYFORMAT); Both the library and format file name should be specified if FORMATS is not the name of the format catalog.  Note that just using PROC FORMAT to display or create a format catalog is not the same as using a format catalog.  Temporary formats created using PROC FORMAT will be available for the session.  

4. Which option is useful to prevent missing format errors when the format catalog is not available?  OPTIONS NOFMTERR; enables viewing the dataset variables as the values are stored.  Another option is to create another dataset without any formats.  DATA TEMP02; SET TEMP01; FORMAT _ALL_; RUN;

5. While PROC FORMAT works on one variable, what options exist when trying to map based on multiple variables? - One technique is to apply lookup tables or a PROC SQL many-to-many join.  Note that in general, there two types of lookup tables: a. adding a new variable from the second dataset which will be populated for matched records, b. flag variable set to Y for matched records and N for unmatched records.  See also DATA Step/Merge.

6. What keywords are useful to identify lowest and highest numeric values? - Use LOW and HIGH in the format.  Can also use OTHER to group all other values.  Any value not in the format without the OTHER keyword are displayed as is.

7. Is it possible to use a character format to convert from one character value to another character value? - Yes, first create a character informat with INVALUE and then apply INPUT(OLD_CHAR_VALUE, $NEW_CHAR_INFORMAT.);  In this example, the informat is treated like a format.

8. Is it possible to use a numeric informat to convert from a character value to a numeric value? - Yes, first create a numeric informat even though it seems to be a character format since 'xx'=#, ex. INVALUE RACEF 'WHITE'=1 'OTHER'=2; and then apply INPUT(RACE, RACEF.);

9. Is it possible to use a numeric informat to convert from a numeric value to another numeric value? - Yes, first create a numeric informat, ex. INVALUE CONVTPCT 1=100 2=75 3=50 4=25 5=0; and then apply INPUT(RAND01, CONVTPCT.);

10. Is it possible to disassociate a format from a variable? - Yes, within a DATA step, specify the variable without a format name as FORMAT X;

11. In general, what columns are needed in an excel file to use as the source for creating a format catalog? - START, END, LABEL, FMTNAME and TYPE.  Note that character formats start with '$' in their names and that both formats and informats can be created.

12. Is there a quick way to create a missing report by counting non-missing values for both numeric and character variables? - Yes, create two formats, one numeric and one character using other and '.' for numeric and '' for character and apply as PROC FORMAT; VALUE MVALUE OTHER='NON-MISSING' .='MISSING'; VALUE $MVALUE OTHER='NON-MISSING' ' '='MISSING' ''='MISSING'; RUN; PROC FREQ; TABLES _ALL_/MISSING; FORMAT _NUMERIC_ MVALUE. _CHARACTER_ $MVALUE.; RUN;  Results will be displayed as total counts for NON-MISSING and MISSING.  See also Characterize Data task.

13. In general, what is the difference between a format and picture? - Formats are used to convert values, both numbers and characters, while pictures are used to convert only numbers but also enable rounding, adding leading zeros, embedding characters withing numbers such as for phone or social security numbers, and adding prefix or suffixes. 

14. What is one technique useful for creating case insensitive informats? - In PROC FORMAT, apply the (UPCASE) option after the informat name and then use upcase codes.  This will apply the UPCASE() function first to the value and then match the codes to identify the label. To use this a VALUE format, apply the INPUT() function.

15. What is the syntax to print a format catalog? - With the FMTLIB option and LIBRARY= to the format catalog library, you can print the format catalog to the SAS list file.  You will need to repeat this for each format catalog.  PROC FORMAT FMTLIB LIBRARY=MYFMTLIB; RUN; 

16. From a format catalog, is it possible to tell which operating system it was created from?  Yes, see SAS Tech Support.

17. Is it possible to create a format that overlaps so that values are counted twice in tables? - Yes, apply the MULTILABEL format option, for example, for year end reports that include by quarter and by semi-year.  This is most helpful in PROC MEANS, PROC TABULATE and PROC REPORT.  Note also note to use the same boundary numbers for example - PROC FORMAT; VALUE TIMEINT (MULTILABEL) LOW - 1.99 = '< 2' 2 - HIGH = '>= 2'; RUN;

18. Does using a numeric format such as XX.X also round to XX.X? - No, FORMAT XX.X only truncates the display to one decimal place, along with FORMAT XX.X, you need to ROUND(VAR1, 0.1).

19. When using PROC FORMAT, what the options to read in and save to a dataset? CNTLIN= reads dataset to create a format catalog and CNTLOUT= create dataset from the format catalog.   A great method to know exactly what values to assign to EEXCL and HLO is to manually define the format and the use CNTLOUT to view the dataset.


VALUE PERSON LOW-2='Baby' 3-<13='Child' 13-19='Teen' 20-64='Adult' 64<-HIGH='Senior'; RUN;

To get this as output dataset which can also be used as input

Fmtname Start End Label Sexcl Eexcl HLO

Person LOW 2 Baby N Y L

Person 2 12 Child N N

Person 13 19 Teen N N

Person 20 64 Adult N N

Person 64 100 Senior Y N H 

20. In PROC FORMAT, what happens if the range end values “touch” each other? - SAS assigns the value to the first range. Ex. value Scores 0 – 30 = ‘Range 1’ 30 – 60 = ‘Range 2’ 60 – 100 = ‘Range 3’; This is better value Scores 0 – 30 = ‘Range 1’ 30 <– 60 = ‘Range 2’ 60 <– 100 = ‘Range 3’;

21. Is it possible to group character strings by keyword?  Yes, with Proc FCMP.  Note that LIKE is not supported to group all values with 'aspirin' to Aspirin and all other text to 'Not Aspirin'.  LIKE performs similar to INDEX() > 0.  This is similar to metadata quality for flaging keywords in string phases.  OTHER is useful to group all text not specified in PROC FORMAT.  '' and . are useful to group non-missing and missing. 

options cmplib=(work.functions); proc fcmp outlib=work.functions.formats; function aspirin_format(drug_name $) $; length drug_type $11; if (indexw(lowcase(drug_name), 'aspirin') gt 0) then drug_type = 'Aspirin'; else drug_type = 'Not Aspirin'; return(drug_type); endsub; run; quit;

data work.drug; length drug1 drug2 $45; infile cards truncover; input drug1 $char45.; drug2 = drug1; cards;

Aspirin aspirin 100mg aspirin 50mg aspirin 10% HCl 50 mg Aspirin 50 mg Aspirin tonight ibuprofen ; run;

proc format; value $aspirin other=[aspirin_format()]; run; quit; proc print data=work.drug label noobs; var drug1 drug2; format drug2 $aspirin.; label drug1 = 'Drug' drug2 = 'Type'; run; 

 proc format; value $aspirin (like ‘%aspirin%’)=‘Aspirin’ other = ‘Not Aspirin’; run;

* AGEGR1=put(AGEGR1N, agef.); proc format; value AGEGR low - < 45 = "<45" 45 - < 65 = "45 - <65" 65 - high = ">=65"; run;

22. What is the difference between INVALUE (INPUT statements or INPUT()) and VALUE?  VALUE statement is a tool to create formats to enhance the display of the variable values whereas INVALUE statement is used to define the informats for reading and converting the raw data values into understandable terms.




1. What are some useful PROC FREQ options to list unique combinations of multiple variables such as VAR1*VAR2*VAR3? - Use LIST and MISSING options to display counts as rows instead of table structure.

2. What is the option for controlling PROC FREQ results by order of frequency? - Use ORDER= FREQ.  Use ORDER=FORMATTED to order by the format created.

3. How can you control the grouping order when crossing several variables?  Specify the order or multiple variables as the most important grouping variable first followed by the next grouping variable etc, ex. GROUP1*GROUP2*GROUP3.  This controls the display order of the rows without changing the counts.

4. What is a key difference between the OUT= option in the TABLE statement and the OUT= option in the OUTPUT statement? - To get frequency counts only, you need to specify OUT= option in the TABLE statement.  In general, the OUT= option in the OUTPUT statement is ideal for saving statistical results which is specified as options in the TABLE statement.

5. Is there a method to display variable labels instead of variable names in cross tabulation results from Proc FREQ? - Yes, one method is to use variable labels as the variable name.  Another method is to modify the Proc FREQ table template.

6. What is one method for identifying and displaying only duplicate records? - Apply PROC FREQ on the combined variables, save dateset with ODS OUTPUT and then apply PROC PRINT with WEHRE FREQUENCY > 1;  Note that this works well to identify duplicate records but does not display all variables and all duplicate records.

7. What is a useful option for counting the number of unique values in all variables? - Yes, use the NLEVELS option.

8. How exactly is the WEIGHT statement used? - If your dataset already has the counts for the two variable combination, for example, then the WEIGHT statement is used to display the counts instead of count again.  Count is based on detail data and WEIGHT is based on summary data.

9. What tip can I use to create a codelist of all unique categorical values and merge with results from PROC CONTENTS? - Use PROC FREQ OUT= option for each categorical variable and output dataset from PROC CONTENTS to save as final dataset to display using PROC REPORT.

10. Is it easy to automatically create plots from PROC FERQ? - Yes, adding the PLOTS statement is needed.  proc freq data=sashelp.Heart order=freq; tables weight_status*smoking_status / plots=freqplot(twoway=stacked orient=horizontal); run;

11. What are the options for zer-fill from proc freq?  Use SPARSE and dummy dataset since PRELOADFMT option does not exist in Proc FREQ.  By default, table percentages are saved in output datasets.  To save row and column percentages, use the OUTPCT option.  See SAS paper to simulate PRELOADFMT in Proc Freq.   See SAS paper.




1. What is one technique to sort the data by one variable yet display values by another variable?  - USE PROC SORT by VISITDTN for example to sort records by visit dates and then PROC GPLOT;  PLOT RESULT * VISITLBL = SUBJID; 

2. What is one technique for controlling the alphapetical order display of values on the x-axis when you want to display for example SCR before Day 1, Day 2, etc.? - Can assign value of ' SCR' with a blank as the first character so that the sort order is preserved.

3. What are all of the different options to customize the VAXIS and HAXIS? - Use MINOR for minor tick marks, ORDER to set starting, ending and by value, VALUE to replace display values, OFFSET to display axis position and REFLABEL to define reference lines.

4. What is the option to display horizontal or vertical lines within a plot? - Use VREF for horizonal lines or HREF for vertical lines.  Correspondingly, use LVREF and LHREF for the line reference number.

5. What is one method to control the sort order of Forest Plots? - Sort the input dataset in descending order of the rows in the Forest Plot. 



1. What are useful PROC MEANS options when specifying multiple CLASS variables? - Use NWAY option to keep only the records containing all possible combination of the CLASS variables.  The NWAY option automatically removed individual or smaller combination of CLASS variables.  The _TYPE_ variable stores the level of each stats.  For example, 0 is the overall grouping, then it increases by 1 for each separate grouping and then to combined grouping.

2. Creating output datasets containing descriptive statistics including medians from Proc Means? - Use OUTPUT OUT = <dataset_name> p50 = <var_name>;  Other methods to get medians - MEDIAN/P50 in Proc Tabulate, MEDIAN().  Note that MEDIAN is not easily calculated using PROC SQL.  It may work for overall but may give inconsistent results with GROUP BY.  See SAS paper for PROC SQL trick.

3. What is an easy way to automatically assign variables names to store descriptive stats for multiple variables? - Use the AUTONAME option to include the original variable_name_stats.

4. In PROC MEANS, what is the difference between 'N Obs' and the N and NMISS options?  N Obs = N + NMISS.  N Obs is the count of all non-missing CLASS variables.  N is the count of all non-missing CLASS and analysis variable.  NMISS is the count of all non-missing CLASS variable and missing analysis variable.  This is why it is always important to include the N option to know the correct value.  Make sure to include MISSING option in the PROC MEANS statement to include missing CLASS variables.

5. What is one of the main differences between PROC MEANS and PROC SUMMARY? - The OUTPUT OUT= option is almost always used with PROC SUMMARY where PROC MEANS is mostly used to display results.

6. What is the main purpose for the ID statement in PROC MEANS? - The ID statement is used to include additional variables in the output dataset.

7. What is the main purpose of the LEVELS and WAYS option in PROC MEANS and PROC SUMMARY? - LEVELS adds the variable _LEVEL_ to the OUT= data table. This numeric variable counts the observations within _TYPE_. This means that when FIRST._TYPE_ is true _LEVEL_ will equal 1.  WAYS adds the variable _WAY_ to the OUT= data table. This numeric variable equals the number of classification variables that were used to calculate each observation e.g. for a three way interaction _WAY_ will equal 3.  

8. Are there options to identify the record associated with the minimum or maximum value? - Yes, use the MINID and MAXID options in the output dataset.  PROC SUMMARY DATA=DEMOG NWAY; CLASS RACE EDU; VAR HT WT; OUTPUT OUT=STATS MEAN= MEANHT MEANWT MAX=MAXHt MAXWT MAXID(HT(SUBJECT) WT(SUBJECT))=MAXHTSUBJECT MAXWTSUBJECT; RUN;



Major sections - Within or Outside of Table: 


 B. BODYTITLE / BODYTITLE_AUX / Header and Footer section

 C. PageXofY / Titles / Footnotes / PROC REPORT COMPUTE Block within table

 D. Proc Template for within table formatting options (bold, italics, etc.)


 F. For Graphs, use Titles and Footnote statements without BODYTITLE to get titles/footnotes in header/footer sections and then GTL (Proc Template and Proc SGRENDER) to get titles/footnotes within the graph body.  Use LAYOUT section to create output that matches Proc SGPLOT, for example, so that Proc SGPLOT graphs can be migrated to GTL.  See mapping table.  See steps to use GTL. See Proc SGPLOT to GTL example.

Structure/Format Task

Syntax/SAS Paper 
Alignment, Decimal Places

Alternate Row Shading

Bold Text

 RTF-Control Word b Paper
 Cell Color and Font change

Columns (Custom Styles)

 ODS Tricks Paper

 Columns Stacking and Indenting

 Proc Report Tutorial Paper

 Data Driven Footnote


 Group and Order variables

Header (Custom)


 Header Background Color change


 Header Font change

 STYLE HEADER FROM HEADER/ FONT=('times', 11pt, bold italics); 

Italics Text and Special Characters such as *

 ODS ESCAPECHAR and COMPUTE BLOCK Paper, Multiple RTF Control Words

Leading or trailing spaces as well as line breaks


 SAS Paper

 Page Breaks

 BREAK AFTER Paper (N/A for RTF)
 Page Margins


COMPUTE BLOCK Paper, Other tags include {sub text}, {dagger}, {thispage}, {lastpage}, {pageof};

Postprocessing of RTFs

 Data _NULL_
 Special Characters

* 1 - Lookup unicode #, ex. 2264 for <= or 2265 for >=,;
* 2 - Insert in titles or char variables;

ods escapechar='^';
title 'Roman Numeral twelve is ^{unicode 2264}';
data test;
 a='^{unicode 2264}' || "20';

 UNICODE function (9.2), Hex# to Decimal# (See e-guide), $UCS2B4. Informat, Unicode Lookup

You can also lookup Unicode # - Start > Programs > Accessories > System Tools > Character Map

Table of Contents,  Sub/Superscripts and Hyperlinks

 RTF Control Word Paper
 Table Frame  
Proc Template Paper

Text before/after table
 (Alternative to titles/footnotes)

Titles/Footnotes (Custom)  

Titles/Footnotes (Custom - RTF control words)



 Proc Template TABLE Template Paper

1. How many general summary table layouts exist? - In general up to 6 table layouts exist - continuous summary table, categorical summary table, data listing, data listing with subtotals, data listing using summary input data.

2. What are options to 'fill-in' missing values when using GROUP or ORDER in the DEFINE statement? - Can use COMPUTE block to save and replace missing values.

3. Does SAS generate an error message if you specify a variable name in the COLUMN statement that does not exist in the dataset?  No, SAS assigns COMPUTE to that variable and assigns it missing values.

4. How does SAS handle user defined formats when DEFINE statements are not specified?  SAS automatically applies the user defined format with the FORMAT= option as default in the DEFINE statement.  You an override this with your own DEFINE statement.

5. How can you control the sort order of a PROC REPORT listing?  One technique is to presort the data set with the BY variables and then list the BY variables first in the COLUMN statement.  In addition, in the DEFINE for each BY variable, specify ORDER option.

6. How can you override the default of calculating numeric variables?  By specifying the CHAR option on the COMPUTE statement, you can concatenate character variables.

7.  How can you prevent variables from being displayed when they are required in COMPUTE blocks?  Keep the required variables in the COLUMN statement but also create DEFINE statement and specify the NOPRINT option.

8.  In general, what is required when computing new variables?   Generally, a descriptive statistics such as AGE.SUM needs to be in the COMPUTE block.  This applies even if age values are listed and not summarized.  The table is only summarzied if there is a SUM in the DEFINE statement.

9. Is it possible to create a character column based on a numeric variable?  Yes, apply the CHAR option in the COMPUTE statement and apply PUT() function to convert variable type.  Remember to also inlcude the numeric variable in the COLUMN statement and apply NOPRINT in the DEFINE statement to prevent displaying the numeric variable.

10. Does SAS automatically create DEFINE statements for each variable if it does not already exist? - Yes.  If not already specified, SAS also automatically assigns default values for the format, width, spacing, alignment and label.  In addition, for computed variables, SAS also automatically assigns COMPUTED in the DEFINE statement.

11. What is one technique for displaying '()' around values?  Use PICTURE with PREFIX option to display '()' such as '(1.5)'.

12. What is one technique for repeating GROUP and ORDER variables across page breaks?  Use the TAGSETS.RTF instead of the RTF destination along with UNIFORM and PAGEPANELS=NONE in the ODS statement and SPANROWS in the PROC REPORT statement.

13. What is one technique for 'counting' the number of observations in a table to help identify page breaks? - Create a PAGEIT variable in advance and apply it in PROC REPORT.

14. What are these options used for - _PAGE_ , _BREAK_, and _RBREAK_? A BREAK statement is required to use the _PAGE_ option. The _BREAK_ variable is only created when writing to a dataset.  These are possible values of _BREAK_ = Blank/space means no break processing was used on this report row, Variable name means that variable appeared on a BREAK statement and the name of the variable was put into _BREAK_ variable, _RBREAK_ means that you had an RBREAK statement in your code, _PAGE_ means that you had the PAGE option on your BREAK statement.  See how to create an output dataset.

15. What are some options when you want to specify two variables for page breaks?  One option is to use the BY statement in PROC REPORT on your main category variable and then the  BREAK /PAGE statement on the page count variable.  Another option is within the DATA step to reset the BREAK variable based on changes in the two page break variables such as SUBJID and _N_.  A third option is to use the COMPUTE BEFORE statement on the SUBJID to display the SUBJID number on the top of each page.  When applying this option, remember to place the page break variable after the SUBJID variable so that page breaks are applied with the main grouping factor of SUBJID.  Note that the third option does not force a page break with new SUBJID values. 

16. When outputting SAS output to Excel with ODS, using Proc Report with BY group processing, the output file has new tab for each BY group. For “BY Manager_Area”, the tab labels, by default, are like “Manager_Area=Area 1” etc.    How do we get rid of the “Manager_Area=” part and just have “Area 1” on the tab label? - Use the NOBYLINE system option to remove the default BY line and use the TITLE statement with #BYVAL(BY_VARIABLE) keyword to display 'BY_VALUE' instead of 'BY_VARIABLE = BY_VALUE'.  See custom TITLES with #BYVAR and #BYVAL keywords and BYLINE system option.

17. Is there a method to consolidate two columns? - Yes, you can reference each column in a COMPUTE BLOCK to display the one column.

18. In general, is PCTN used for calculating percent values? - No, mostly because they may not work well with summary statements such as BREAK and RBREAK. See SAS paper for example.

19. Which are popular PROC REPORT options that are for Listing only? - WIDTH, OL, UL, DOL, and DUL, HEADLINE, HEADSKIP, LS= and SPACING.

20. What is the difference between ORDER and GROUP? - Order displays the first unique value in order at the record level and GROUP combines all common values into one record.

21. Is presorting the dataset still required even if GROUP option is specified? - Yes, since PROC REPORT does not sort datasets, PROC SORT is required in advance.

22. In general, it is easier to calculate percentages using PROC FREQ or PROC TABULATE instead of using PROC REPORT.  You can use PROC FREQ or PROC TABULATE to save percentage results in a dataset to display using PROC REPORT.  The only exception is when calculating simple, one or two variables crossed percentages such as SASHELP.CLASS.SEX or SASHELP.SHOES.REGION with PRODUCT.   See SAS paper for example of crossing two variables.  See SAS example of two grouping variables.

23. Is there a SAS reference that lists outstanding PROC REPORT syntax? - Yes, the SAS blog for complete list of old options. Compare with other SAS Procedures blog.

24. What is the effect of BODYTITLE?  With the BODYTITLE option, the titles and footnotes are placed with the table instead of the default in headers and footnotes sections.

25. What is the difference between ORDER and GROUP options? ORDER controls the record sort order for displaying all records but not repeating the ORDER variable while GROUP summarizes all other variables by the GROUP variables.  GROUP also does not repeat values.

26. Is presorting the dataset required with the GROUP option? Yes, PROC REPORT does not sort datasets.


1. Creating a dataset containing one record per patient from multiple records per patient such as patient, visit and AE events? - First PROC SORT selects only AE event records with WHERE clause, then the second PROC SORT applys NODUPKEY option with BY _ALL_ to prevent any double counting of AEs in the OUT= dataset.

2. Saving duplicate records in another dataset? - Use DUPOUT = <DATASET NAME> option with NODUPKEY or NODUPREC option.  Remember to also apply OUT= <DATASET NAME> to save all unique records to temp dataset.  If not, then the original dataset in the DATA= <DATASET_NAME> will be updated, which may not be wanted.  Note that this saves only one record of the duplicate records.  Can use PROC SQL with the group descriptive statistics to save all duplicate records.

3. Quick check for duplicate records? - Apply NODUPKEY option with BY _ALL_ to save to a temp dataset and then check log for any records deleted.

4. When needing to subset and apply the NODUPKEY option, is there a difference in the order? - In general, apply the subset condition before or during the NODUPKEY option in PROC SORT to assure correct records are selected.  By applying the subset condition after the NODUPKEY option, there is risk of getting incorrect results if the BY statement does not include the same variable used in the subset condition.

5. What is the main difference between NODUPS and NODUPKEY? - NODUPS prevents duplicate consecutive records from all variables from being written to the dataset, while NODUPKEY prevent duplicate key variables written to the dataset so that the NODUPKEY option creates only one record per each unique value of key variable.

6. Other than PROC SORT, is there an alternative to remove duplicate records? - Yes, PROC MEANS with the MAX option can assure you get non-missing values and remove duplicate records.

7. For changing the default ascending order in PROC SORT, do you need to specify DESCENDING before each BY variable to change to descending for each BY variable?  Yes, else insert DESCENDING just before only the BY variable as needed.  One DESCENDING option only impacts the first following variable and not any other variable. 



1. Creating a non-missing variable from multiple datasets? - Use COALESCE() function

2. Creating character string macro variables without any blanks? - Use !! and TRIM() function

3. Creating an empty dataset with minimum code and no uninitialized notes? - Use PROC SQL; CREATE TABLE SHELL (NAME CHAR, AGE NUM); QUIT;

4. What technique is useful for identifying and adding baseline lab values in clinical trials? - Generally, the baseline visit is defined as the last visit before the first dose date.  PROC SQL; CREATE TABLE _BRES AS SELECT LBTESTID, SUBJID, RESULT AS BRESSAFE FROM QC_ALAB_RAW2 WHERE RESULT > . AND VISITDTF < FDOSEDTF GROUP BY SUBJID, LBTESTID HAVING VISITDTF = MAX(VISITDTF); CREATE TABLE QC_ALAB_RAW3 AS SELECT A.*, B.* FROM QC_ALAB_RAW2 AS A FULL JOIN _BRES AS B ON A.SUBJID = B.SUBJID AND A.LBTESTID = B.LBTESTID ; QUIT;

5. In general, when having nested summary functions such as SUM(SUM(VOL1, VOL2)*10) with a GROUP BY clause, what is the difference in the two SUM() functions? - The inner SUM() is first applied to total VOL1 and VOL2 at the record level times 10, then the outer SUM() is applied to total all of the individual records to get one record per GROUP BY variables.

6. When applying a condition to subset the first dataset A used in a LEFT JOIN with a second dataset B, does PROC SQL subset based on the condition after the join or apply the LEFT JOIN to keep all records in dataset A?  PROC SQL keeps all records in dataset A and ignores the condition on dataset A.  As an option, apply a WHERE dataset option to first subset dataset A.

7. Is it possible to select all variables in a dataset as well as apply the COALESCE() function to keep non-missing values in the same SELECT statement when joining two tables?  - In general, yes, make sure to first apply the COALESCE() function before selecting all variables to correctly define the possible non-missing variable.  Switching the order may not work.  SELECT UNIQUE COALESCE(a.dosegrp, c.dosegrp) AS dosegrp, A.* FROM conmeds_all2 AS A LEFT JOIN analyvar AS C ON A.subjid=C.subjid ORDER BY dose, scrid;  As an alternative to preventing the WARNING message, save to a new variable name such as B_DOSEGRP and then apply DROP DOSEGRP and RENAME B-DOSEGRP to DOSEGRP as dataset options.

8. What PROC SQL option can be added to prevent any warnings? - NOWARN option will prevent any warning messages.  This is helpful when using B.* which adds common key variable names, for example.  Make sure confirm results before turning off warning message.

9. Is it possible to control and loop through each record in a dataset as done with the _N_ in the DATA step?  - Yes.  With the WHERE MONOTONIC() = &I clause, you can insert Proc SQL within a do loop to increment the value of I. 

10. In general, do variables listed in the GROUP BY clause need to be included in the SELECT clause? - Yes, in general, you need to include, in the same order the same variables in the SELECT as you specify in the GROUP BY clause.  As an alternative, you can pre-sort the dataset with a variable you do not want in the PROC SQL to control the sort order.  This technique is helpful to sort the dataset based on a detail variable, and then to create one summary record to merge back without the detail variable.  As needed, you can then include new variables in the SELECT clause.

11. When submitting Proc SQL code in batch mode, is there a method to submit a line of syntax greater than 256 characters? - Yes, by including line breaks to break up the 256 character statement. Another option is to save the long Proc SQL line of code in a separate file and
%INCLUDE the code and set the LRECL= option as in this example - FILENAME MYFILE 'PATH TO FILE' LRECL=32767;

12. Is it possible to select more variables when applying EXCEPT or INTERSECT joins? - Yes, since EXCEPT or INTERSECT selects limited key variables to identify specific records, you can apply the EXCEPT or INTERSECT SELECT as a subquery that applies condition when selecting other variables.

13. What are some techniques to prevent this 'NOTE: The query requires remerging summary statistics back with the original data'?  If you use a summary function in a SELECT clause or a HAVING clause without a GROUP BY clause then you will get this note in the SAS log.  This message is only a NOTE which states that PROC SQL must remerge data (or make two passes through the table).  The first pass calculates and returns the summary function values and the second pass retrieves all other variables in the SELECT clause.  

Remerging does not cause incorrect results, just requires additional processing time and is often unavoidable. This is because the summary function creates overall summary stats based on all records and is not by a group variable which is generally used.  Grand percentages, however, are common without GROUP BY clause.  See SAS Paper, SAS Tip, SAS Blog, SAS Tech Note.

14. Is it possible to calculate event percentages such as total # of patients with events/total # of patients using PROC SQL? - Yes, see the e-guide for an example with the DATA step.  Additional useful information is total # of events.

15. What is one advantage that PROC SQL has over DATA step when joining datasets? - PROC SQL allows for joining by different variable names instead of requiring the same variable name.

16. What is the syntax to apply the colon modifier wildcard ':' to match based on values only and exclude blanks? - In PROC SQL, use NAME EQT 'Sunil';

17. What is the PROC SQL syntax to create macro variables? - PROC SQL; SELECT count(DEPT) INTO :DEPTCNT; QUIT;

18. When doing a many-to-many join without a WHERE clause, what is the danger? – Without a WHERE clause causes unrelated records to be linked from both datasets and creates a cartesian product. With the WHERE clause to link related records, each record per key variable is linked with each record in the second dataset. The correct total number of records is the number of records per key variable in the first dataset times the total number of records per key variable in the second dataset. Using the DATA step with or without the WHERE statement may cause incorrect joins and incorrect number of records. The number of records created is based on the maximum number of records per key variable from the datasets.


Below is the code to create flag for matching list of drugs to cm.

* create lookup tables;

data mhlst1;

input drug: $15.;













* join CM with lookup table to create flags;

* cartesian product for many to many merge;

* unique to keep only unique records;

* keep only master variables;

* create lookup flag variable that matches value;

proc sql; create table cm1 as select unique a.usubjid, a.cmstdtc, a.cmendtc, a.cmtrt, a.CMDECOD, case when a.CMTRT = b.drug or a.CMDECOD = b.drug then 1 else . end as mhlst1 label='Match MHLST Lookup Table' from as a, mhlst1 as b; quit; 

19. Is there a method to pull data from multiple datasets and create flag variables for non-missing values to get patient accountability? - Yes, first create a population dataset by creating flag variables using IFN(AGE > ., 1, .) and then LEFT JOIN with the population dataset to create a summary dataset with all selected variables to be used in PROC PRINT.  PROC FREQ is used to display all flag variable combinations and PROC PRINT with WHERE to display detail records.

20. What is the syntax to add variable attributes when creating a new variable? - The attributes can be any order after the variable name.  SELECT ‘my constant’ AS MYNAME LABEL = ‘My Name’ LENGTH= 11 FORMAT=$11.;

21. What is one technique for selecting the first non-missing variable in a list of variables? - You can use the COALESCE() function to do this.

22. Is it possible to specify COUNT(UNIQUE SUBJECT) to count the number of non-missing SUBJECT? - Yes, proc sql; create table avisitcnt as select unique a.usubjid, a.avisit, b.avisitcnt from bz1_adam.adbmk as a left join (select unique usubjid, avisit, count(unique usubjid) as avisitcnt from bz1_adam.adbmk where avisit > '' group by usubjid) as b on a.usubjid=b.usubjid; quit;

23. When displaying records based on a summary condition, which approach is best? - for identifying matches based on existing records, SUBQUERY is better to use since the condition selects existing records.  For identifying non-matches or exclusive based on missing records, EXCEPT is better to use since the condition selects missing or records that do not exist. title 'Check for missing important baseline records'; proc sql; select unique usubjid, paramcd from BZ1_adam.adbmk where paramcd > '' except select unique usubjid, paramcd from BZ1_adam.adbmk where avisit='BASELINE'; quit;  See CDM for edit checks.

24. What is an effective method for adding a flag variable based on group descriptive stats. - proc sql; create table cart3 as select a.*, case when b.pncart = ncart then 'Y' else '' end as peak length=1 format=$1. from cart2 as a left join (select usubjid, max(ncart) as pncart from cart2 where ncart > . group by usubjid) as b on a.usubjid=b.usubjid; quit;

25. What is the syntax for creating a data view? - proc sql; create view v_class as select * from sashelp.class; quit;

26. Does the DISTINCT or UNIQUE option in PROC SQL impact all selected variables? - Yes, DISTINCT or UNIQUE option must be specified before the list of variables and lists all unique combination of all selected variables.

27. What is one method for identifying non-matches between two datasets?  Using DATA Step program, you can use IF (A and not B) or (B and not A).  Below code can be used with Proc SQL.

* diff_cnt1 - Match between adam and metadata;

* diff_cnt2 - Combine adam and metadata;

* diff_cnt3 - Adam or metadata only - potential variables to remove in metadata file;

proc sql;

create table diff_cnt1 as select name, memname from cont intersect select name, memname from adsl;

create table diff_cnt2 (where=(memname='ADSL')) as select unique coalesce(, as name, coalesce(a.memname, b.memname) as memname

from cont as a full join adsl as b on and a.memname=b.memname;

create table diff_cnt3 (where=(name > '')) as select name, memname from diff_cnt2

except select name, memname from diff_cnt1;


28. What is one method to identify first. or last. record? - proc sql; create table first_bygroup_rows as select rating, title, 'FirstRow' as ByGroup from movies M1 where title = (select min(title) from movies M2 where M1.rating = M2.rating) order by rating, title;

29. What is a method for for comparing two variables and joining records by matching values - proc sql; select a.usubjid, a.cmstdtc, a.cmendtc, b.aestdtc, b.aeendtc, a.cmaenum, b.aenum, aedecod, crit5fl, crit5, cmstdy, ntmg from adp.adcm as a full join adp.adae as b on a.usubjid=b.usubjid where cmstdy > 0 and crit5fl='Y' and indexw(cmaenum, aenum, ' ,') > 0 and cmiss(cmaenum, aenum)= 0; quit;

30. Are there macro to calculate dataset vitals such as total number of observations, unique patient counts and if duplicate records exist?  Yes, see big_n collection of macros.  See SAS Paper.

31. Is it possible to insert CASE clause within an ORDER clause? - Yes, this is considered custom ordering, proc sql; create table sortshoes as select * from order by case region when "United States" then 0 when "Canada" then 1 else 2 end, region;quit;  

proc sql; create table sortcars as select make, type, invoice, mpg_city from order by type, case type when "SUV" then mpg_city else invoice end; quit;

32. What is the order of clauses in PROC SQL statements? - Select, From, Where, Group, Having and Order

33. What is the difference between FULL JOIN and FULL OUTER JOIN? - None, they both mean the same thing.  When applying FULL JOIN, it is best practice to use COALESCE(CLASS.NAME, STUDENTS.NAME) AS NAME, for example, to get non-missing common variables.  LEFT and RIGHT are also examples of OUTER JOIN but keep all LEFT or RIGHT records instead of all records as in FULL JOIN. Note also that INNER and WHERE joins are the same meaning they keep only matching records. See Proc SQL Mindmap.

34. Is it possible to custom sorting with the Proc SQL Case Statement? - Yes, proc sql; create table sortshoes as select * from order by case region when "United States" then 0 when "Canada" then 1 else 2 end, region;quit;

35. What is an example of merging datasets by different key variables and dates where one is datetime and the other is char?  The COALESCE() function is used to combine common variables, INPUT() function is used to convert char to num with BEST. format.  DATEPART() function converts datetime to date.  COMPRESS() function is used to remove '-' to make it consistent with DATE9. format.  FULL JOIN is used to pull from both datasets.

proc sql;  create table ncartv as select coalesce(a.subject, b.subjid) as subjid, coalesce(datepart(a.LABDAT), input(compress(b.lbdt, '-'), date9.)) as lbdt format= yymmdd10., a.wbc_raw, a.lympper_raw, a.MONper, input(b.rptresc, best.) as qppb, 1000*(a.WBC)*(a.MONper + input(a.lympper_raw, best.))/100* calculated QPPB/100 as ncart from rdp.lbhema1 as a full join rdp.pcr (where = (lbtestcd='QPPB')) as b on strip(a.subject)=strip(b.subjid) and datepart(a.LABDAT)=input(compress(b.lbdt, '-'), date9.); quit;

36. Is it possible to apply a condition within functions? - Yes, select "Male" as Variable, mean(Male) format percent10.2 as Percent, min(Male) format comma10.0 as Min, max(Male) format comma10.0 as Max, sum(Male=.) format comma10.0 as N_Miss, sum(Male=.) / count(Male) format percent10.2 as Pct_Miss from school_nodups;  Apply this method with caution.

37. Is it possible to apply functions in a CASE block? - Yes, create table &stagen as select unique 'check' as combo, case when count(&keyvr) = 1 then "No Dup " else 'Yes Dup' end as &stagen._dup label="&stagel: &stageds duplicate patient count" from &stagel..&stageds &cond group by &grpvr; quit;

38. Can Proc SQL be used for one to one joins of different dataset records? - No, Proc SQL, can only link datasets of same number of records since it set the number of records to the dataset with the minimum number of records, better to use MERGE in DATA step for one to one join.

39. What examples of adding integrity constraints to PROC SQL? - proc sql; alter table add contraint fred check ((sex='m' and pregnant='n') or sex='f'); quit; proc sql; alter table basefl_PS add constraint primkey primary key(keyvar), cvar_not_null not null (cvar); quit;

40. What is one method to combine overall Q1 and Q3 stats to all records in dataset? - Use proc means to get quartiles, since not in Proc SQL, else can use Proc SQL subquery. proc means data=ad_12mo.adbase noprint; var cd19hscr; where saffl = "Y" and aphasen in (2.1, 2.2); output out=adbaset (drop= _type_ _freq_)q1=q1hscr min=minhscr median=medhscr q3=q3hscr max=maxhscr; run; proc sql; create table adbase2 as select a.*, b.* from ad_12mo.adbase as a, adbaset as b; quit;

41. What is one method to identify differences in variable attributes?  - Use Proc sql.  * compare common variable attributes ex. type; proc sql; * all vars; create table comvar as select name, memname, type, length, label, format from sashelp.vcolumn where libname='SF' group by name having count(name) > 1; * one variable; select name, memname, type, length, label, format from sashelp.vcolumn where libname='SF' and name = 'PATIENT_ID_NUM' group by name having count(name) > 1; quit; 

* select differences in type; proc sort data=comvar; by name memname type; run; data comtype; retain namehld typehld; set comvar; by name memname type; if then do; namehld = name; typehld = type; end; if name = namehld and type ^= typehld; run; 

* merge back to get all records for var type differences; proc sql; create table comtype2 as select name, memname, type, length, label, format from sashelp.vcolumn as a right join (select unique name as b_name from comtype) as b on and libname='SF' group by; quit; 

42. What is an example of proc sql, case when sum?

Proc sql; create table baselinex as select unique trtp, usubjid, ’DERIVED’ as paramtyp, ’ALSGE3BL’ as paramcd, ’Elevated ALT or AST at baseline’ as param, 111 as paramn, case when sum (paramcd in (‘AST’ ‘ALT’) and ablfl=’Y’ and . < aval >= 3*input (anrhi, best.)) > 0 then ‘Y’ else ‘N’ end as avalc, ifn (calculated avalc=’Y’, 1, 0) as aval from lb (where= (dtype is NULL)) group by usubjid; quit; 

43. What is an example of boolean condition in Proc sql? This applies conditions and summarizes in one step. find() syntax See SAS paper example.  

proc sql; select Department, sum((find(Job_Title, "manager", "i") >0 )) as Managers, sum((find(Job_Title, "manager", "i")=0))as Employees, calculated Managers/calculated Employees "M/E Ratio" format=percent8.1 from SGF2020.employee_information group by Department; quit;  

What is similar code for Data Step?  if AEREL ne '' then AERELN=1*(AEREL in ('RELATED' ))+2*(AEREL IN ('NOT RELATED')); same as if AEREL ne '' then do; if AEREL = 'RELATED' then AERELN=1; else if AEREL = 'NOT RELATED' then AERELN=2;

proc sql; create table comp as select subject as patid, project, studyenvsitenumber, sum(ifn(exyn='YES', 1, 0)) as ex_yes, sum(ifn(exyn='NO', 1, 0)) as ex_no, (calculated ex_yes / (calculated ex_yes + calculated ex_no) *100) as ex_comp format=8.1 from EXyn group by subject; quit; 

44. In Proc SQL, what is the syntax for creating a data set and inserting some records like for data entry?  See SAS paper.

PROC SQL; CREATE TABLE table1 ( charvar1 CHAR(3), charvar2 CHAR(1), numvar1 NUM, numvar2 NUM INFORMAT=DATE7.) ;

INSERT INTO table1 VALUES('me1','F',35786,'10oct50'd) VALUES('me3','M',57963,'25jun49'd) VALUES('fg6','M',25754,'17jun47'd)

VALUES('fg7','F',.,'17aug53'd) ;

SELECT * ROM table1; QUIT; 

45. What is one example of using Proc SQL HAVING with a summary function such as SUM()?  See SAS paper.

proc sql; select avg(weight) as Median

from (select e.weight from class e, class d group by e.weight having sum(case when e.weight = d.weight then 1 else 0 end)

>= abs(sum(sign(e.weight - d.weight)))); quit;

46. What is an example of the IN operator in proc sql? Select State, City, Store, Year, Month, VarCost from work.Cost a where a.VarCost in (Select Varcost from work.Cost b having Varcost gt (1.25 * avg(varcost)));



1. In general, how many different layout combinations of summary tables exist? - Up to 10,752 combinations of tables - Seven Table Structures  X  Two Adding Variables  X  Four Ordering Data  X  Four Data Type/Display  X  Four %s  X  Three Missing Values  X  Four Totals/SubTotals.

2. The option for saving the results of PROC TABULATE in a dataset? - Use the OUT= option in the PROC TABULATE statement.

3. What's one advantage of using the COLPCTN over the PCTN<> option to calculate column percentages?  Using the COLPCTN option saves you from having to specify the denominator variable.

4. Other than using PROC FREQ to summarize and count the occurrence of each unqiue value of a character variable, is there an alternative approach using PROC TABULATE? - Yes, by default, in your TABLE statement, use, for example, SEX to display counts for Male and Female.

5. Since the SUM statistic is not valid with character variables, is there an alternative to achieve similar results? - Use ALL keyword to calculate total records of character variables.  Note that all unique character values will also be displayed before the total count.

6. What option is useful to display '0' for missing values instead of blanks? - Use MISSTEXT='0' TABLE option with MISSING PROC TABULATE option.  For missing values in class level variables, you may want to create a format to label missing values and then assign the format to the variable.  Note that to get correct percentages by not including missing counts in the denominator, then do not specify MISSING in the PROC TABULATE statement and PRINTMISS MISSTEXT options.  This works for small tables but not large tables since any record with missing values are excluded for all other variables so use with caution.

7. Do dataset options still work when saving PROC TABULATE results to a dataset? - Apply the dataset option after the dataset name in the OUT= PROC TABULATE statement.

8. What method can be used to control the order of PROC TABULATE columns?  - If the column variable is numeric, then convert to character variable and assign '1. ', '2. ', etc. and then PROC SORT by the character variable to control the column order.

9. For tables with percentages, is it possible to display '%' with the percent values within the cell?  Yes, apply a PICTURE FORMAT to the percent keyword.

10. In general, with PROC TABULATE, is it possible to only display values instead of always summarizing data as N, SUM, or MEAN for example?  No.  It is, however, possible to combine two descriptive statistics values into one column.

11. How is it possible to cross two numeric variables in a table where one numeric variable is used to classify the descriptive statistics of the second numeric variable? - Yes, by specifying the categorical numeric variable in the CLASS statement, the number is treated as a label. The other numeric variable is specified in the VAR statement and treated as the analysis variable.  

12. What is a technique for controlling both the sort order of columns and row labels? - Specify both column and row variables as COLUMN variables with the PRELOADFMT and ORDER=DATA option.  Next, apply the user defined format and specify the (NOTSORTED) option when creating the user defined format.  List the values in the order desired.  This is an alternative to assigning ordered numeric values.  Note that character formats of the same value in the desired order also works.  

13. What is the default format for all table cells? - FORMAT = BEST12.2.  This can be changed in the PROC TABULATE statement or in the TABLE statement.

14. Is the F= option in the TABLE statement only for formatting numeric variables in the VAR statement? - Yes, use the FORMAT statement to format CLASSification variables.

15. Other than using the WHERE statement to exclude records, is there an alternative to remove record counts in the table? - Yes, in the DATA step before PROC TABULATE, change the unwanted value to missing, such as IF ITTFL='N' THEN ITTFL='', and do not specify MISSING to exclude those records.  Not however, the risk for not specifying the MISSING option is that if any variable in the TABLE statement is missing, then that whole record is excluded even if other variables are non-missing.

16. Since RTS and INDENT options work only on the LIST destination, is there an alternative to expand the row title for RTF or PDF destinations? - Yes, first try setting LEFTMARGIN=0.5IN and RIGHTMARGIN=0.5IN.  If you still need more space, then override length in CLASS statement - CLASS GENDER / STYLE={CELLWIDTH=4IN};

17. Is there a technique to keep only one record per patient and still sort by a separate row variable in the summary table? - Yes, first PROC SORT by patient with NODUPKEY and then PROC SORT on the row variable. 

18. What option can be used to assign blanks to statistics labels? - Use the KEYLABEL <STATSITICS>=' '; statement to assign blank to N labels for example.  Make sure to include a space in the quotes and not '' since without a space may not work.

19. Is there an option to apply a constant denominator instead of the normal row or column percentages when calculating percentages? - Yes, apply the PCTN<> for CLASS variable or PCTSUM<> option for VAR variable.  This technique is useful when assigning a constant variable from a macro variable. See SAS paper.

20. What options are best to resolve this ERROR: 'Page size too small to print any data for the table'? This error has nothing to do with the TABLE statement.  Set the following - ODS LISTING CLOSE; OPTIONS LINESIZE=256 PAGESIZE=5000;

21. Is there a way to apply MISSING and calculate percentages correctly by excluding missing values? - Yes, create a dummy variable to track missing values and then use COLPCTSUM on the numeric variable instead of the GENDER variable to get percentages.  data one; set one; if gender=. then gen_cnt=0; else gen_cnt=1; run; proc tabulate data=one missing; class gender; var gen_cnt; table gender all, gen_cnt*(n colpctsum); freq freq; run;

22. Is it possible to have multiple TABLE statements within one PROC TABULATE block?  Yes

23. What are options to expand the column width in tables? - Set a larger width value in the F=d.w option to expand the column width to correctly display all values.

24. Instead of displaying the default variable name or using the LABEL= or QUIZ='My Quiz' for example, is it possible to display the variable label? - No.

25. What is one technique for display results stored in a dataset as is using PROC TABULATE? - Normally, PROC TABULATE is used to create and display summary stats, however, you can pre-process the dataset and save results in the dataset and use the SUM= option to display results as is without requiring summary stats.

26. Is the VAR statement required in PROC TABUALTE? - No, the VAR statement is not required - proc tabulate missing; class region product subsidiary; * Subsidiary nested within region as rows by product counts as columns with subtotals; table (region='Region')*(subsidiary='Subsidiary' all), product='Product'*(n='N'*f=5.0 pctn='%'*f=5.1) all; run;

27. Is it possible to have a TABLE statement without any VAR variables? - Yes, if you only want N or %.  proc tabulate data=bz1_adam.adsl missing; class aphase ia1fl ia2fl trtcutfl; tables aphase all, ia1fl ia2fl trtcutfl all; run;

28. How to add a percent sign to PROC TABULATE table cell percents? - proc format; picture mypct (round) low-high='009.99%'; run; proc tabulate data=sashelp.class; class age sex; var weight height; table age, sex*(weight height)*(pctsum*f=mypct.); run;

29. What are popular Proc Tabulate options? - MISSING, NOSEPS, INDENT=3, MULTILABEL/MLF, NOTSORTED/PRELOADFMT, MISSING to display row for missing values, NOSEPS to remove box-like display, INDENT=3 for values under variable label header for limited output files, MULTILABEL and MLF options to display both individual and grouping of values and NOTSORTED and PRELOADFMT options to control order of values.



1. What are some key statements needed to transpose records to variables? - Going from vertical to horizontal, use BY <by_var_names>; ID <new_var_names>; VAR <data_values_var_name>;

2. What are some key statements needed to transpose variables to records? - Going from horizontal to vertical, use BY <new_by_var_names>; VAR <old_var_names>; 

3. What option can you use transpose a numeric variable to assure non-numeric value as the first character of the new variable name? - Use the PREFIX= option to assign a constant character prefix variable name.  The PREFIX= option is also useful if you have invalid variable names in the ID variable. 

4. What is one technique for keeping constant BY variables when applying PROC TRANSPOSE? - Include constant variables in the BY statement to keep the variables in the output dataset.  The alternative is to use the COPY statement.

5. When transposing variables to rows in a dataset, is it possible to control the record sort order? - Yes, the record sort order is based on the list of variables in the BY statement. 

6. When transposing records to variable, along with the ID statement to specify variable names, is there another statement to specify variable labels? - Yes the IDLABEL statement specifies the variable labels for the new variables created. 

7. When a dataset is said to be narrow, how is that different from a wide dataset? - Narrow or vertical datasets have more observations or generic variable names and wide or horizontal datasets have more variables or specific data content variable names.

8. Is the BY statement required in PROC TRANSPOSE? - Yes, a BY statement is required to define new dataset by records or original dataset by records.

9. Which SAS Procedure can be used as an alternative method to PROC TRANSPOSE? - PROC SUMMARY and a combinations of DATA Steps with two SETs can be used to replicate PROC TRANSPOSE.  This process, however, is a manual process but allows for more control.  Other options include using arrays. 

10. Is the BY statement required and if not, how is the dataset transposed? - In general, the BY statement is used for by-group transpose instead of overall transpose.The BY statement is not required.  Without the BY statement, the whole dataset is transposed so that variables become records and rows become variables.  For an example, a dataset with 7 records and 3 numeric variables will become a dataset with 7 variables and 3 records.

11. What is one method for viewing a series of dates? - For best review of dates, transpose all date variables by patient or by patient/visit and then sort by date to get vertical structure and big picture of key milestones.

12. What is one alternative to PROC TRANSPOSE that can still transpose records but will allow duplicate records by key variables? - Use Data step with SET dataset (WHERE= ()  RENAME= ()) dataset (WHERE= () RENAME= ()) ; to append each set of records for each unique key variable.

13. Can you use Proc Transpose to transpose XXTESTCD values into new variables?

* keep char standard values for both alpha and numeric values;

proc sort out=eg1 (keep = usubjid EGDTC visit egtestcd egstresc egstresu); by usubjid EGDTC; where egtestcd='QTCF’; run;

* transpose with ID, one or more VARs and WHERE;

proc transpose data=eg1 out=eg2 (drop = _label_); by usubjid EGDTC visit; id egtestcd; var egstresc egstresu; run;

* Output based on _name_ values and rename vars;

data eg_dat (drop = _name_) eg_unt (drop = _name_ rename=(QTCF=QTCFU)); set eg2; if _name_ = 'EGSTRESC' then output eg_dat; else if _name_ = 'EGSTRESU' then output eg_unt; run;

proc sql; create table eg_all as SELECT coalesce(a.usubjid, b.usubjid) as usubjid , coalesce(a.egdtc, b.egdtc) as dtc , a.*, b.* FROM eg_dat as a FULL JOIN eg_unt as b on a.usubjid=b.usubjid and a.egdtc=b.egdtc; quit;



1. What are some options to customize PROC UNIVARIATE OUTPUT? - Applying the HISTOGRAM and INSET statements for example will include histograms and descriptive stats within the histogram.




1. What are common syntax for libname, filename and reading datasets?    See R paper.

sdtm <- "//product/study/analysis/data/sdtm" # assign libname to object named sdtm

out <- "//product/study/analysis/data/adam"  # assign out filename to path

library(haven) # required to read SAS datasets

dm <- read_sas(file.path(sdtm,"dm.sas7bdat")) # read sas file as a data frame

#'read_sas' function from the haven package (part of the tidyverse)

taadmin <- read_sas("H:/rproject/project_y_r2/taadmin.sas7bdt")

2. What is R? R is a programming language and free software developed by Ross Ihaka and Robert Gentleman in 1993. R possesses an extensive catalog of statistical and graphical methods.

3. Is there a website to run example R programs?  Yes, see site.

4. What is R? R is a programming language that uses the concept of functions to create objects to be linked together.  There are many rules to understand and follow.

5. Why should you learn R?  Since R is so vast and can be confusing quickly, I suggest you identify what you plan to use R for such as data access, data management, data reporting or data analysis.  In the pharma industry, R got a jump start with graphs that were easy to create.  I suggest you have patients learning R since may require more syntax than you may be expecting. 

6. What are useful methods to learn R? Show and Tell is useful to see and run R syntax on a command by command basis.  Taking your own notes helps to retain and build understaning.  Cheat sheet are only helpful if you recognize the syntax and it's purpose.  Since R is very technical, try to focus on your task and master the those few sets of commands.  How-To step checklist and examples help to remind users how to run R syntax.

7. What is compariable R syntax for Data step KEEP and WHERE statements? 

# complete cases and select

taadmin2 <- taadmin %>% filter(complete.cases(taadmin[["DOSECUN"]])) %>% select(INV, PT, DCMDATE, DOSECUN, DOSETL)

data taadmin2 (keep=inv pt dcmdate dosecun dosetl flag); set taadmin (where=(dosecun is not null)); run;

8. How can you get first. and last. records in R?

taadmin <- taadmin[order(taadmin$PT, taadmin$DCMDATE),]

taadmin <- cbind(taadmin, flagf=0, flagl=0)

taadmin[[6]] <- (!duplicated(taadmin$PT))

taadmin[[7]] <- (!duplicated(taadmin$PT,fromlast=TRUE))

proc sort data = taadmin2; by pt dcmdate; run;

data taadmin3; set taadmin2; by pt dcmdate; flagf=0; flagl=0; if then flagf=1; if then flagl=1; run; 

9. What is an example of a simple R plot? Below syntax will create a cty by hwy plot.

g <- ggplot(data = mpg, aes(x = cty, y = hwy))

10. How can you read csv file in R?

data1 <- read.csv("./data/DiastolicBloodPressure_initial.csv")

11. What are useful R metadata functions to display data frame attributes?

tg <- ToothGrowth # save sample data frame to tg data frame

View(tg) # browse tg

str(tg) # display tg attributes and sample data

attributes(tg) # display tg attributes

head(tg) # display tg sample records

print(tg) # display tg all records

stats <- summary(tg) # create stats object of continuous vars

print(stats) # display tg stats object

freq <- table(tg) # create freq of categorical vars

print(freq) # display tg freq object 

12. What is a simple R function?

info <- function(d) { writeLines("First display the structure of the data frame.")  str(d)

writeLines("Then print the first 6 observations to see the variables and values.") print(head(d)) }

# Call the new ‘info’ function:


13. What are R date formats?

%d for day of month, %a for 3 digit day, %A for full day, %m for short month, %b for 3 digit month, %B for full month, %y for 2 digit year, %Y for 4 digit year

14. What is the difference between package and library? A package is a like a book, a library is like a library; you use library() to check a package out of the library.

15. What is your working directory?  R is always pointed at a directory on your computer. Often this will be your home directory. When you work within a RStudio project, the working directory will be the head of that directory. You can find out which directory by running: getwd.

16. What packages are installed? installed.packages()

17. What are useful R packages and libraries for SAS programmers to install to read SAS datasets, select variables, merge data frames,  export to SAS datasets and read transport files?  





dat1 <- read.sas7bdat(“dm.sas7bdat”)


dm1 <- merge(x=dat2,y=dat3, by=”SUBJECT”,all.x=TRUE)

write.xport(dm_f2,file=paste(getwd(), "dm.xpt", sep="/"),autogen.formats =FALSE


adsl <- sasxport.get("adsl.xpt", lowernames=FALSE)

18. How do write and run R code?  Writing your first R code. Enter code in code editor window. Click Run to execute current line and Source to execute all statements.

19. What are examples of applying proc freq in R?  

table(cars$Type) # proc freq data=cars; tables type; run;

table(cars$Type,cars$Cylinders) # proc freq data=cars; tables cars*Cylinders; run;

20. What are examples of appying proc means in R?

ASL_mean <- ASL %>% group_by(ARMCD) %>% summarise(avg_age = mean(AGE), avg_bmi = mean(BMI))

# %>% is pipe to connect R functions

# proc means data=ASL; by ARMCD; var mean bmi; run; 

21. What are examples of proc sql in R? See R paper.

adsl <- dm %>% select(studyid, subjid, age, sex, height, weight, race, scrfl) %>% mutate(bmi = (weight*703)/height^2 ) %>%

filter(scrfl == “Y”) %>% select(-scrfl) %>% arrange(studyid, subjid)

# apply %>% to connect R functions

# select 8 dm variables

# derive bmi variable from weight and height

# subset for scrfl = ‘Y’

# sort by studyid and subjid 

22. How do you convert excel files to R data frame? library(readxl) # required for read_excel function 

l_cancer <- read_excel(“C:/data/l_cancer.xlsx”)

23. How do you display a list of functions once a package/library is opened?  lsf.str("package:dplyr")

24. What is the syntax for keeping and dropping variables? df = subset(mydata, select = -c(x,z) ) # by variable name

df <- mydata[ -c(1,3:4) ] # by column index number

25. How best can you sort data?  newdata <- mtcars[order(mpg, cyl),] # sort by mpg and cyl

26. What is the syntax to create R arrays?  Array_name = array(data,dim = c(row_size,column_size,matrices), dimnames = list(row_names,column_names,matrices_names))

27. What are useful metadata functions and to check for missing data?  Does the data frame exists? What is the data frame class? Does the data frame contain values?

28. How best to create data frames from vectors?  hospital <- c("New York", "California") patients <- c(150, 350) df <- data.frame(hospital, patients)

29. How do you name values in vectors? Access defined islands named vector  > str(islands) 

Named num [1:48] 11506 5500 16988 2968 16 ...

- attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ..

30. %>% - Next R Function in Sequence by passing the left hand side of the operator to the first argument of the right hand side of the operator, requires installing magrittr package, library(magrittr) iris %>% head()

31. What is R Shiny?  R Shiny is a web application framework for R and R Studio's Shiny server which makes shiny applications available over the web.  See R Studio blog on R ShinySee how to create a simple R Shiny app.

32. Can you write macros in R?  With R, you can create custom and complex functions that are similar to macros.  This give you flexibility in multiple methods for calling the R function. See Doing Macros in R blogSee Macros In R blog.

33. What options are there to export R?  Export to CSV, Excel and SAS. See export to RTF.  See other examples.




1. What is a unique and effective task to quickly get both summary (descriptive stats) and detail (unique values) views of your dataset? - Apply the Characterize Data task.  Note that if the limit is set to less than the total number of records, then not all unique values may be displayed.  This can be deceiving if the frequency list is assumed to be sorted alphabetically when it is not.  This is comparable to PROC FREQ DATA=; TABLES _CHARACTER_/LIST MISSING; RUN; PROC MEANS DATA=; VAR _NUMERIC_; RUN;  Can also include MINID and MAXID options to identify cases with minimum and maximum values.  Include a BY statement if it makes sense to review results BY groups.  See also Missing Report.

2. What are some useful automatic global macro variables when running a project? - &_CLIENTTASKLABEL stores the name of the project file without the extention and &_CLIENTPROJECTPATH stores the full path name and file name of the project.

3. What are the two options for reading excel files? - Enterprise Guide's Import Data task writes the DATA step code to access the excel file contents.  As an alternative, writing PROC IMPORT code requires SAS/ACCESS to PC Files on Windows.

4. What option is useful to display the full path names of opened files? - Can use Tools -> Options -> Project Views -> Project Tree or Project Flow.  If this does not work, then once the object is selected, then click on 'change' in the filename path to view the path.

5. What option is useful to run a series of programs? - Can use New -> New Ordered List option to add programs that have already been included in the project file.  In general, make sure the individual programs do not depend on, for example, proc import to convert an excel file to a dataset as a prerequisite to running the program.

6. What changes, if any, are required to convert existing SAS programs to be SAS Enterprise Compatable?  Generally minimum if any are needed. May need to change file references if moving from PC SAS to Server SAS? For SAS programs creating txt files, you may need to set the default of Report General option to 'Text Output' instead of the SAS Report.

7. Are embedded SAS programs more portable than linked SAS programs? - Yes, embedded SAS programs are fully contained within the SAS EG project file instead of referenced in another folder.

8. When using the Query Builder, what is the best way to select all variables from the table? - Select the first variable, hold the shift key down and then select the last variable in the table to highlight all variables.  Alternative methods include selecting/dragging the table name/icon over to the "select" area or set the options to change the behavior to assign all variables by default.

9. When using the query builder, what are the three main functions? - 1. Select variables, 2. Subset records, 3. Sort records.

10. By default, what is the maximum number of intermediate datasets that can be viewed within SAS EG? - You can reset this with 'Tools -> Options -> Results'. 

11. Can you select from a drop down list of valid values several values with the IN operator? - Yes, use 'In a List' and 'Ad Values' to select values.

12. Is it possible to apply more complex queries including functions? - Yes, by using the Advanced Edit opton, you can select variable names, functions, values and operators to create more complex expressions.  Since Proc SQL code is constructed behind the scence, make sure your expression is a valid Proc SQL WHERE clause such as  LOWCASE(PGNAME) EQT 'LDEMOG'. See SAS paper for more info.

13. What are some ways to identify the full path name of pointed SAS programs files and SAS data sets?  Sometimes, I tend to forgot them after creating the project file.  In SAS EG 4.3, this information is stored in the properties of each item.  So if you do a right-click on the item in your SAS Enterprise Guide Session and select PROPERTIES, you can typically get to this information.  SAS Note.

14. Is it possible to include total page counts, page numbers and dates automatically in reports? Yes, see the SAS Note.

15. Is it possible for SAS EG to always run a process flow every time you open a project file? - Yes, rename a project flow to AUTOEXEC for SAS EG to ask you to run this process flow when you first open the project.  Note that any query datasets created in the AUTOEXEC process flow can be copied, the icon, in other process flow within the same project file.

16. Which task is useful for viewing variable attributes? - DATA > DATA ATTRIBUTES.

17. What are the SAS Studio Installation Options? - SAS-U on Amazon AWS, SAS-U in VirtualBox, VMware Locally-installed, SAS Remote SAS server. For checking for SAS/ACCESS for PC Files licence using - PROC SETINIT NOALIAS; RUN; You can add SASAUTOS in the same way in BASE SAS.

18. When creating excel files using ODS, what are the options? Many options exist - ODS Excel Filetype (Standard) vs ODS Tagset (Custom), Input (Raw data) vs Output Excel files (Reports), Single vs Multiple sheets, Libnames vs SAS Procedures, Excel file vs SAS Data Management, Create vs Update Excel file, Standard Defaults vs Custom Options vs Styles, Standard vs Custom Excel Properties, Data values vs Graphs, Static vs Traffic Lighting Conditions vs Hyperlinking, All vs Selected ODS Output Objects.

19. Is it possible to use an autoexec with SAS Enterprise Guide? - Yes, see Autoexecs—the SAS Enterprise Guide advantage blog

20. What are some useful tips when coding in SAS EG? - Ten SAS Enterprise Guide program editor tricks

21. Can SAS Studio read SAS Enterprise Guide projects? Yes, however, it might be better to upload and run SAS programs to recreate output files within SAS Studio instead of using the project file.  See SAS paper.

22. How can you open excel within SAS Studio? First you need to upload the excel file. Next you click on it for SAS Studio to open window and then click on run to import excel file into dataset such as work.import. The code created is below.  See Video.


23. What is the autoexec file in SAS Studio?  See SAS documentation.  Open SAS Studio, Options > Edit Autoexec file.

24. What is a good way to apply filters within SAS Studio?  When a dataset is open, right click on variable to subset, select add filter option, select value with formatted codelist.

25. What option is useful to display macro libraries used? MAUTOLOCDISPLAY.




Order for Defensive Programming: SAS Library/Path exists? Dataset exists? # OBS > 0? Key Vars exist? Key Vars non-missing? Run macro

Metadata: Sample records, range, descriptive stats, no duplicates, cutoff alerts, # OBS, # PTS (unique records), list of vars (create, process), Mindmap of all datasets, codelist, Variable Definition Table, file datetime stamp, disposition counts, scan log

1. Checking if a macro variable exists as defensive programming? - %if %length(&macvn) > 0 %then %do;  This is a more robust method to view blank macro variables and not process them - %let num = %length(&name); %put # %if &num > 0 %then %do; ... %end;

2. Creating macro variable to check if dataset or variable exists as defensive programming? - Yes, = 1, use %let dsetyn = %sysfunc(exist(<dataset_name>));  This checks if a variable exists by > 0 - data _null_; call symputx("vr_exist", varnum(open('demog', ‘i'), 'gender')); run;

%if %sysfunc(exist(&sasdsname)) = 0 %then %do;

/* Insert code that needs to be executed if the data set &sasdsname does not exist. */



%let dsid = %sysfunc(open((&sasdsname));

%let numobs = %sysfunc(attrn(&dsid, nobs));

%let rc = %sysfunc(close(&dsid));

%if &numobs = 0 %then %do;

/* Insert code that needs to be executed if the datset &sasdsname has zero records. */


%let dsid = %sysfunc(open(&sasdsname));

%let varnum = %sysfunc(varnum(&dsid, &sasvarname));

%let varchar = %sysfunc(FUNCTION(&dsid, &varnum));

%let rc = %sysfunc(close(&dsid));

%if &varnum = 0 %then %do;

/* Insert code that needs to be executed if the variable &sasvarname does not exist. */


3. Determining the number of records in a dataset as defensive programming? - Can use &SQLOBS from most recent PROC SQL SELECT statement.  Another method is %let DSID = %sysfunc(open(DEMOG, IS)); %let countds = %sysfunc(attrn(&DSID, NLOBS));  Another option is to create a macro variable that stores the number of records after a subset condition is applied and then use that in conditional execution. proc sql; select count(usubjid) into: aedscnt from ae1 where usubjid > .; quit;  %if %eval(&aedscnt) > 0 %then %do; ... %end;  You can add code to abort if needed. %if %eval(&countds) = 0 %then %do; %put "*** ABORT program due to zero records in dataset: &dsnam ***"; DATA _NULL_; ABORT; RUN; %end;

4. Conditional processing if macro variable is populated as defensive programming? - Can use %if %length(<macro_variable>) > 0 %then %do; %end;

5. A simple technique for scanning and creating a macro variable from a list of dataset names one at a time? - Can use this code
%do dupi = 1 %to 3; %let dupdsn = %scan(DATASET1 DATASET2 DATASET3, &dupi); %end;

6. Which automatic macro variable can be used to identify the full path name of the SAS program?

- Can use this macro
%macro prgpath ;
%qsubstr(%sysget(SAS_EXECFILEPATH), 1, %length(%sysget(SAS_EXECFILEPATH))-%length(%sysget(SAS_EXECFILEname)))
%mend prgpath;
%let libpath = %qsubstr(%prgpath,1,55)&pgm._toc.rtf;
%put &libpath;

7. Transferring control to another section of the program? - Can use %GOTO THERE; to direct program to %THERE: position.

8. What are useful defensive programming techniques for checking the existance of external files, for example? - %IF %SYSFUNC(FILEEXIST(&OBJECT_NAME)) = 1 %THEN %DO;  See SAS Paper One and, Two.

9. What is a useful SAS Version 9 function to create macro variables of numeric values using DATA step? - SYMPUTX(dsvar1, dsvar2) automatically applies TRIM(LEFT(PUT())) to the numeric value.

10. Which macro function is useful to dynamically create SAS statements using data set values and then automatically executing the code? CALL EXECUTE() also is useful to conditionally execute macro definitions. 

11. In general, what are the three types of double ampersands (&&) or more for indirect references of macro variables?  Use &&&XX to reference one macro variable.  Use &&XX& to reference a list of root word multiple variables such as &dsn1, &dsn2 and &dsn3.  Use &&&XX& to reference a list of any multiple variables such as &engine1, &engine2, &mpg1 and &mpg2. 

12. What is one technique for accessing the dataset creation date which is preserved instead of using the dataset file date which could be changed when the dataset is copied? %let download=%substr(%sysfunc(attrn(&dsid,crdte),datetime.),1,7);

13. What options exist to demacrotize the SAS program to actual SAS statements? - MFILE option.  See also MTRACE.

14. Are there useful macros to clean up after SAS programs runs such as clear libnames and delete temp formats, etc?  Yes, you can apply LIBNAME, PROC CATALOG for example.   Clean SAS programming environment.  PROC DATASETS is useful just after PROC APPEND to delete the temporary dataset recently added. This helps to prevent adding extra records.  PROC DATASETS LIB=WORK NOLIST; DELETE AE EX; QUIT; RUN;

15. What is the syntax for saving and compiling stored macro facility? - Apply the STORE and SOURCE options when creating the macro and the MSTORED option to access the macro.  See maco documentation for more info (search for store macro facility, select Storing and Reusing Macros).

16. What are some of the key differences between the methods to create macro variables (%LET, SYMPUT and INTO)? See SAS paper

17. What statement is used to reset the SASMSTORE option? - %SASMSTORECLEAR;

18. Is there a limit to the number of characters in a macro call string? - No, can include line breaks to break up the 256 character macro call. Another method is to save the macro call in a separate file and INCLUDE the macro call program.  See SAS reference.  See MVARSIZE system option. 

19. Is there a useful general macro function that can accept most any DATA step function? - Yes, the %SYSFUNC() macro function can accept most any DATA step function.  This is ideal for non-macro functions such as %SYSFUNC(STRIP(X)).  You can also use %sysfunc() to convert data types.  See SAS paper.



20. What is the syntax for saving today's date as a date constant variable? - %let dtcutoff=&sysdate; %put Todays Date = &dtcutoff; data visit_qc; set dbs.visit;
dtcutoff= "&dtcutoff"d; format dtcutoff date9.; if visendt > . and dtcutoff > . then studydy= dtcutoff - datepart(visendt); else studydy=.; run;  An alternative is %
let ANCUTOF = %sysfunc(today(), date9.); %put &ANCUTOF;

21. In general, what are differences in when to apply %IF/%THEN and IF/THEN conditions? - %IF/%THEN conditions become true only when user or system macro variables equal a constant, for example, while IF/THEN conditions become true when both user/system macro variables or dataset variables equal a constant.  This means that %IF/%THEN condition determines which SAS block of code gets compiled while IF/THEN condition determines which SAS block of code is executed.  All of the IF/THEN block of code is complied.  Note that numeric and character values are treated the same, quotes are not applied.  See SAS Paper.

22. Is it possible to use the IN operator with macro conditions? - Yes, with the MINOPERATOR and MINDELIMITER options.  Note that there are no quotes with IN. 

23. Are the corresponding four types of DO LOOP also available in macro programming? - Yes, all four types are available - %DO; %END;, %DO %TO %BY; %END;, %DO %WHILE;, %DO %UNTIL;

24. What is an example of SAS macro programming without using any macro syntax? - Within a DATA step, using arrays and do-loops to repeat a block of SAS code for different variables or values resembles SAS macro programming.

25. What is the difference between %EVAL() and %SYSEVALF()? Both %EVAL and %SYSEVALF() evaluate the math expression before assigning it as a macro value.  %EVAL() is for integers and %SYSEVALF() is for continuous values.  ex. %LET RESULT = %EVAL(&NUMER/&DENOM);  

26. What is the difference between positional and keyword macro parameters? - Keyword parameters are ideal for setting default values, additional flexibility in specifying macro variables in any order and better documentation.  Positional parameters must be called in the exact same order as defined and be specified before keyword paramaters, ex. %codev(sdtm, tr, trgrid, runnum=1). 

27. What are useful system options for debugging macro programs? - MPRINT (log shows the code actually created by macro), MLOGIC (log shows flow of macro conditional execution), MPRINTNEST (shows macro nesting such as MPRINT(OUTER.INNER.INRMOST): 'This is the text of the PUT statement') MERROR and SERROR to display warnings, and SYMBOLGEN (log shows macro variable value).  MPRINT and MFILE options can be used to create a macro free version of the code.  In addition, the automatic macro variables _ALL_, _LOCAL_, _GLOBAL_, _USER_, or _AUTOMATIC_ can be displayed.      

28. What are examples of macro statements that can be applied anywhere in the program or open code? - %LET, %PUT along with many macro functions such as %INDEX(), %SCAN() and %LEFT().   

29. If a macro variable contains a comma and is used as a parameter in a macro call, then what is the correct method to prevent an error? - Use the the %BQUOTE() to mask the commas as in %TEST(%BQUOTE(&VAR), B, C);

30. Can you create a macro function to extract the number of observations in a dataset? - Yes,  see the code to create this macro as well as a macro to get the program name, %GETATTR(DSN = SASHELP.CLASS)

31. When working with a series of variables, is there a method for removing duplicate records and assuring keeping only non-missing variables by a group of variables? - Yes, by first applying PROC MEANS with MAX and AUTONAME options and then using PROC SQL to save original and PROC MEANS variable names and then a RENAME with DO Loop to convert back to the original dataset. See the code below.

* a. Better method to get non-missing data;

proc means data=adbmk7 nway noprint; 

 class studyid usubjid categc categn param paramcd; 

 var basedt -- ttrtd0;

 output out=adbmk8 (drop= _type_ _freq_) max=/autoname; run;

* b. Save as macro variables - original and means names;

proc sql;

select unique name into: onamlst separated by '.' from sashelp.vcolumn where libname='WORK' and memname='ADBMK7' and name ^in ('STUDYID' 'USUBJID' 'CATEGC' 'CATEGN' 'PARAM' 'PARAMCD');

select unique name into: mnamlst separated by '.' from sashelp.vcolumn where libname='WORK' and memname='ADBMK8' and name ^in ('STUDYID' 'USUBJID' 'CATEGC' 'CATEGN' 'PARAM' 'PARAMCD');


%put &onamlst &mnamlst;

* c. Rename variables back to original var names;

%let rnval=1;

%macro rnvar;

data nci_adam.adbmks;

set adbmk8;

 %do %while(%length(%scan(&onamlst, &rnval, %str('.'))) > 0);

  rename %scan(&mnamlst, &rnval, %str('.')) = %scan(&onamlst, &rnval, %str('.'));

  %let rnval = %eval(&rnval + 1);



%mend rnvar;


32. In general, what are the methods for cycling through a list of macro values? - One option is to have a list of values separated by '.' stored in one macro variable. Another option is to create a collection of macro variables with suffix and cycle through.  The one macro variable is ideal for short list of values and the multiple macro variables is ideal for many values or many different sets of values.

33. What is one method to store the content of a file into a macro variable - See SAS Blog tip below. 

FILENAME msghtml "path-to-text-file" ; data _null_; length text $32767; retain text ''; infile msghtml flowover dlmstr='//' end=last; input; text=cats(text,_infile_); if last then call symput('MSGBODY',text); run; /* file contents is now in &MSGBODY macro var */

34. What is one method to simulate the IN operator in macro programming? - Use the MINOPERATOR option, %macro filter(expdcnt)/minoperator; %if &expdcnt in 0 1 %then %do; ... %end;  %mend filter;  An alternative is a custom macro %INFCN().

options MINOPERATOR; /* enable macro IN operator for character values */

%macro intest1(macvar1=);

/* Does macro variable MACVAR1 have the value aa, bb, or cc */

%if &macvar1 in aa bb cc

35. What is one method for creating a macro variable from another macro variable? - %if &sysscp = WIN %then %let dlm = \ ; %else %if &sysscp = UNIX %then %let dlm = / ;

36. What is an example of using the SCAN function to select individual macro values? - 

* Create new macro variable from &sg_class = x y, a.x=b.x and a.y=b.y;

%if %length(&sg_class) > 0 and &nm_class = 1 %then %do;

%let sql_class = a.%scan(&sg_class, 1, %str(' ')) = b.%scan(&sg_class, 1, %str(' ')); %put &sql_class; %end;

%else if %length(&sg_class) > 0 and &nm_class = 2 %then %do;

%let sql_class = a.%scan(&sg_class, 1, %str(' ')) = b.%scan(&sg_class, 1, %str(' ')) and a.%scan(&sg_class, 2, %str(' ')) = b.%scan(&sg_class, 2, %str(' ')); %put &sql_class; %end;

%else if %length(&sg_class) > 0 and &nm_class = 3 %then %do;

%let sql_class = a.%scan(&sg_class, 1, %str(' ')) = b.%scan(&sg_class, 1, %str(' ')) and a.%scan(&sg_class, 2, %str(' ')) = b.%scan(&sg_class, 2, %str(' ')) and a.%scan(&sg_class, 3, %str(' ')) = b.%scan(&sg_class, 3, %str(' '));

%put &sql_class; %end;

37. What option is most useful for creating a non-macro version program? - MTRACE is useful to create a non-macro version of a macro calling program.  See also MFILE.

38. What method can be used to standardized variables attributes? - Modifying variable attributes in all datasets of a SAS library

39. An easy way to make a "Top 10" table and bar chart in SAS blog

40. Using %IF-%THEN-%ELSE in SAS programs blog

41. Is there a function to determine valid variable names? - Yes, the NVALID()  function checks the validity of a character string to be used as a variable name. The function returns 1 if the string is a valid variable name, and a 0, otherwise.


43. How can you get a list of all macros stored (time, date and description)? Use Proc SQL to get information about all compiled macros. PROC SQL; SELECT * FROM DICTIONARY.CATALOGS WHERE MEMNAME IN ('SASMACR'); Quit;

44. What are options for protecting your stored compiled macro so that the syntax is not displayed?  Apply the SECURE optionSee SAS paper for example.

45. What are examples of using macro options PARAMBUFF and SYSPBUFF? See SAS paper

46. What are the two methods for accessing SAS macros?  See SAS paper.

 libname macros 'c:\sas\macros\sasautos\catalogs'; * location of source catalog;

 filename catmacro catalog 'macros.macrocat'; * specific macro catalog;

 filename othmacro 'c:\sas\macros\sasautos\project'; * dir of macro programs - common method;

 options mautosource sasautos = (catmacro othmacro "c:\sas\macros\sasautos" SASAUTOS);

* mautosource is required for sasautos option to work;

47. What options exist for multiple WHERE statements in DATA Step and SAS Procedures?  You can have multiple WHERE statements for cummulative subsetting.  Multiple WHERE statements can be conditionally applied using macro logic.  You can apply WHERE ALSO to future subset the data.MRECALL Turns on the capability to search stored macro programs when a macro is not found.

48. What does the MRECALL option do?  It turns on the capability to search stored macro programs when a macro is not found.

49. What is the syntax for creating a format catalog?  OPTIONS MSTORED SASMSTORE MSTORED SASMSTORE MSTORED SASMSTORE=StoreMac; LIBNAME StoreMac ‘/user/id/sasmac/stored/’; %MACRO ANALYSIS (DATA=, DEPVAR=, etc.) / STORE; PROC GLM DATA=; . . . RUN; /* More code, etc. */ %MEND ANALYSIS;

50. What is the method for checking return codes when assigning LIBNAME statements? %macro libmake; *** create the library if it doesn’t exist ***; libname lneeded "C:\necessary\"; %if &syslibrc ne 0 %then %do; x "mkdir c:\necessary"; libname lneeded "C:\necessary\"; %end; %mend;   &sysfilrc is the return code for FILENAME statements.

51. How can you save a list of files within a directory?  Use %SYSFUNC.  See SAS PaperSee SAS Code.

%macro listfiles(dir); %local filrf rc did fnm i fid; %* Assigns a fileref to the directory and opens the directory *; %let rc=%sysfunc(filename(filrf,&dir)); %let did=%sysfunc(dopen(&filrf)); %* Loops through entire directory *; %do i=1 %to %sysfunc(dnum(&did)); %* Retrieve name of each file *; %let fnm=%qsysfunc(dread(&did,&i)); %let fid=%qsysfunc(mopen(&did,&fnm)); %if %qscan(&fnm,2,.) ne %str() and &fid ne 0 %then %do; fileloc="%qsysfunc(finfo(&fid,Filename))"; modifyDT="%qsysfunc(finfo(&fid,Last Modified))"; output; %end; %* If directory name call macro again *; %else %if %qscan(&fnm,2,.) = %str() %then %do; subfolder = 1; %listfiles(&dir&mslash%unquote(&fnm)); %end; %end; %* Closes the directory and clear the fileref *; %let rc=%sysfunc(dclose(&did)); %let rc=%sysfunc(filename(filrf)); %mend listfiles;

data files1; /*process new folders*/ length fileloc ModifyDT $2000 root $1000; root = strip(lowcase("&newdir")); %listfiles(&newdir); run;




1. What statements are useful to execute DOS based commands? - X for unconditional execution and CALL SYSTEM commands within a DATA _NULL_ step for conditionally applied.

2. Is there an easy way to create a list of all SAS program names, date of last update and number of bytes per file? - Yes, DATA_NULL_; CALL SYSTEM('DIR > SASFILES.TXT *.SAS); RUN; DATA FILES; INFILE 'SASFILES.TXT'; INPUT FNAME $ 1-8 FEXT $ 10-12 BYTES 13-22 DATE MMDDYY8.; RUN;

3. Are there alternatives to using the X command? - Yes, use the collection of dataset functions such as DOPEN.




1. How can you create symbolic links? - ln -s source_file myfile, where source_file is the name of the original, main file and myfile is the name of the symbolic link file that will point to the source_file, if in the folder where the link will be saved, without the myfile, the link will be the same name.


2. What is the syntax to identify the job number of a SAS batch run? - PS -EF|GREP SGUPTA

3. How do you convert a text file to a unix batch file? When creating file, add as first line - '#!/bin/bash'.  In unix, type 'bash'.




1. Correcting a conditional statement? - Confirm that the 'IF condition' catches records and the 'THEN statement' changes records.  For character variables, best to standardize by using UPCASE() and COMPRESS() functions, for example, to upper case and remove spaces or if possible user numeric variable in condition.

2. What is the syntax to grouping observations by using formatted values? - See SAS blog.

3. What is a useful function to confirm data values? - Use the VERIFY(TRIM(string),'abcde') function to return the position number of the first NON-MATCHING value.  If 0 then all data values in the string are valid, i.e. string only has one of these values 'abcde'.  The condition IF VERIFY(TRIM(string),'.0123456789') = 0 THEN can be used to identify only numeric values.  Note that VERIFY() is exact opposite to INDEX().  See also ANYALPHA() to confirm only numeric values and ANYALNUM() to confirm only alpha character values.if ANYALPHA()= 0 then only numeric values.  if ANYALNUM()= 0 then only non-numeric values or alpha characters. See COMPRESS() alternative to remove all char or all num values. 

4. What are the key differences between the DO LOOPs? DO TO increments by 1, by # or by the list of values, DO WHILE is a top evaluation so exists loop if condition is not met, and DO UNTIL is bottom evaluation so exists after the loop if condition is not met.

5. How many different types of datasets are created when merging two datasets? - Up to five types are created - ALLAB, ALLA, ALLB, ANOTB, BNOTA.

6. What is an effective approach to collapse categories? - Use PROC FORMAT and FORMAT statement with the SAS Procedure.

7. What four ways to control the variable order when creating a dataset? - Use RETAIN, LENGTH, or ATTRIB before the SET statement or PROC SQL selecting all variables.

8. What options exist to zero-fill data from SAS procedures, ie. create the row or column with zero values? - Use PROC SQL's COALESCE() and FULL OUTER JOIN with complete dataset, PROC FREQ's SPARSE option, or PROC MEANS's COMPLETETYPES, PRELOADFMT or a. PROC FORMAT, b. FORMAT, c. PROC TABULATE's CLASSDATA= with EXCLUSIVE, PRINTMISS and MISSTEXT options. Another SAS paper example.

See PROC TABULATE CLASSDATA= reference.  See Proc SQL method.

9. What is an effective windowing technique to select the first before visit closest to the target?  As an alternative to selecting one visit, an average may be required.
(1) Assign STUDY_DAY based on VISITDT and DOSEDT, TARGET based on VISIT window ranges, DIFF between STUDY_DAY and TARGET, and ABSDIFF of DIFF variables
(2) Proc Sort by TARGET, ABSDIFF and DIFF (Add DESCENDING for after visit closest)
(4) as an alternative to 3), use PROC MEANS; BY PATIENT TARGET; VAR ABSDIFF; OUTPUT MINID(ABSDIFF(VISITDT)) = VISITN MIN=MIN MAX=MAX;  Most any variable in the dataset can be used as the ID variable. 

PROC MEANS DETAILS: 1. Sample dataset has 2 patients and 2 lab tests.  Assumes dose date variable exists.  2. Calculate difference between lab date and dose date.  3. Assign target values based on visit windows. 4. Calculate absolute difference in days. 5. For each patient and lab test identify minimum lab date within each visit window.  Expect to identify one visit record per multiple records. 6. Merge back proc means dataset to get visit.  See SAS paper.

10. What is an effective windowing technique to summarize by target to get one record? - Can apply PROC SQL to get MEAN(HEIGHT) as HEIGHT ORDER BY SUBJID, TARGET.

11. Does the not operator with the IN (^in) work to exclude data values when applied in a WHERE statement? Yes

12. What is a simple technique for adding records to calculate Totals? - Apply the following statements in the DATA Step - IF DOSEGRP ^=: 'Placebo' THEN DO; OUTPUT; DOSEGRP='Total Drug'; DOSE = 999; OUTPUT; END; ELSE OUTPUT;

13. What is a technique for creating a dataset with no variables and no observations? DATA ZERO_VARS; IF _N_ = 0; RUN;

14. Treatment Emergent Adverse Events are generally defined as adverse events after (datetime) the first dose date and before (datetime) the followup completion date. Also, in general, the Adverse Events dataset should be sorted by patient aedate and aeterm to count each new aeterm occurence. Be careful not to double count patients when summarizing Adverse Event occurrences. Also, be aware of uncoded Adverse Events.  

In general, there are two models: 1. By system organ class and preferred term, 2. By system organ class, preferred term and severity.

One technique to get patient counts is to first apply NODUPKEY by SUBJID AETERM and then PROC TABULATE WITH WHERE AESTDT > FDOSEDT > .  Also, to get patient event counts, next you will need to apply NODUPKEY by SUBJID AESTDT AETERM.  As needed, other variables may be added such as AESTTM, AEENDDT, AEENTTM or AESEV.  For tables such as Drug related or Serious, before the NODUPKEY, apply a WHERE condition to only select those records.  For most frequent AEs, subset based on the >= 10% of the total column.

15. What technique is most effective for many-to-many joins? - Proc SQL.

16. What is one technique for preserving the previous value in the current record? - Use the LAG() function.  See example below:

proc sort data=qc_apkplasma1;
 by usubjid pkdtf;

data qc_apkplasma2;
 retain usubjidh ddayfh ddayfp dosedayp;
 set qc_apkplasma1;
 by usubjid pkdtf;

 if ddayf > . then ddayfp = lag(ddayf);
 if doseday > . then dosedayp = daynom;

 if first.usubjid then do;
  usubjidh = usubjid;
  ddayfp = .;
  dosedayp = .;

 if ddayf > . then ddayfh = ddayf;
 if usubjid = usubjidh and ddayf = . then ddayf = ddayfh;

17. What is the default temporary working library name? - WORK.<Dataset_Name>.  To automatically save temporary datasets without specifying a LIBNAME, specify the LIBNAME USER ''; statement or OPTIONS USERS= <LIBNAME>; statement.

18. What techniques and operators exist to apply non-exact matches? - Any of these techniques can be applied - LIKEBETWEENCOLON, SOUNDEX(),  SPEDIS(),  SCAN(), or INDEX().

19. When using the RETAIN statement, in general, what additional statement is required to assure non-missing values? - Apply the IF-THEN statement to assign and preserve only non-missing values. EX. IF AGE > . then AGE_HOLD = AGE;

20. What is one technique for identifying and keeping the minimum record by a group of variables? - One approach is to create an output dataset from PROC TRANSPOSE and merge back to the original dataset and then apply IF VARIABLE ^= MIN(COL1, COL2, COL3) THEN DELETE; to keep the minimum value record by a group of variables.

21.  Is WHERE SEX > ' ' the same as WHERE SEX IS NOT NULL? Yes, it is also the same as WHERE SEX IS NOT MISSING.

22. What are the BY group keywords used in titles and footnotes? #BYVAL(VARIABLE NAME) to display formatted value, #BYVAR(VARIABLE NAME) to display variable label, and #BYLINE(VARIABLE NAME) to display formatted BY line. Selected procedures, such as PROC SGPLOT, automatically use them unless they are turned off with the NOBYLINE system option, so it is not required to be in titles for example.

23. What is the shorthand notation for IF-THEN logic? - IFN(<IF CONDITION IS TRUE>, <THEN RESULT>, <ELSE RESULT>);  Most any valid expression can be used as the condition.  The THEN and ELSE RESULTs should be one value.  This shorthand works for character or numeric variables.

24. What statements will reset titles and footnotes? - TITLE; FOOTNOTE;

25. Is it possible to apply both the IN operator and the colon modifier ':' to match based on values only and exclude blanks? - Yes, IN: works as IF NAME IN: ('Sunil' 'Tim');  The simple comparison version is IF NAME =: 'Sunil';  In PROC SQL, use NAME EQT 'Sunil';

26. Is the input buffer created when reading a SAS data set or only when reading raw data? - Input buffer is created only when reading raw data.

27. What is the shorthand syntax for variable lists? - VAR ROOT1 - ROOT5; for a list of ROOT 1 to ROOT5 vairables or VAR ROOT:;  for all ROOTXXX variables.  Note that there are some limitations applying shorthand syntax in Proc SQL.  Note that DROP and KEEP are always applied before RENAME statements so make sure not to include any dropped variables in the RENAME statement.

See SAS blog - 6 easy ways to specify a list of variables in SAS

28. Is it possible to send emails from within a SAS program based on conditions? - Yes, see SAS paper with DATA Step technique.

FILENAME Mailbox EMAIL '' Subject='Test Mail message';

DATA _NULL_;  FILE Mailbox; PUT "Hello"; PUT "This is a message from the DATA step"; RUN; 



29. What is one technique for creating a record with missing values in a dataset that is defined but without any records? - DATA PD; OUTPUT; SET DBS.PD; RUN;  To create a record with 'None Reported' message do - DATA PD; pt_cnt=&pt_cnt; KEEP subject pd pdn pt_cnt; pd='None Reported'; OUTPUT; SET DBS.PD; RUN;

30. What are some rules for common non-by variables when merging two datasets? - Attributes are defined from the left/first dataset and the value is kept from the right/second dataset. Use MSGLEVEL=I to display message that values are kept from second dataset.

31. Is there an alternative to merging datasets using DATA step and BY statements? - Yes, you can apply multiple SET statements for each dataset.  This method, however, automatically applies the IF A and B condition to keep only the records that exist in both datasets. 

32. What some issues to be aware of with the PDV when merging two datasets and redefining an input variable? - Yes, it is best to first rename an input variable instead of redefining it to prevent incorrectly from retaining the value for the next record.  See SAS paper.

33. Can SAS array names have numbers or special characters? - No, only alpha values are allowed for SAS array names. 

34. As an alternative to using the SET statement to read SAS datasets, what option exists to read external files and write to external files? - The FILENAME statement points to full path files names, the INFILE points to the text file to read and the FILE points to the text file to create.  Note that different options exist for INFILE from the INPUT statement.

35. Which SET option is useful for pointing to selected records instead of having SAS sequentially read each record? - SET XXX POINT=ID.

36. What are useful DATA Set options for protecting it from accidental replacement by an empty new dataset? - (REPLACE=YES REEMPTY=NO)  Other options include (ENCRYPT=YES PWREQ=YES READ=READPWD WRITE=WRITEPWD)

37. What are some differences between IF 0 THEN SET XXX NOBS=X; and IF _N_ = 1 THEN DO; END;? - IF 0 THEN SET XXX NOBS=X; enables, at compile time, loading of the dataset into PDV and the total number of records in the X variable.  IF _N_ = 1 THEN DO; END; enables at execution time, on the first iteration, the ability to run a block of code to initialize variables, for example.  See SAS paper on IF 0 THEN.  See SAS paper on IF _N_ = 1 THEN.

38. Is there an easy way to assign missing values? - Yes, CALL MISSING(NUM1, CHAR2) assigns missing values to both numeric and character variables.  Before CALL MISSING() statement, it is best to assign LENGTH for each variable to define as character or numeric and length.

39. Is there a method to use INFILE options without requiring the data to be read from an external file? - Yes, DATALINES option on the INFILE statement allows for using INFILE options without requiring data to be read from external file. Data rows start after the DATALINES; row.

40. What is one method for creating variables to store the previous, or lags, and next record values or leads? - Use a combination of LAG() and second SET statement to pull the values.  Options include with a BY statement.  See paper for example.

41. What the multiple methods for using a colon wildcards? - Keep all the variables start with 'X', Subset data, Use in IN Operator, Use in GT LT (> <) Operators, Functions and Arrays. 

42. What are some options when you get the Warning: BY Variable with Multiple Lengths? - To prevent possible merge issues due to truncation, specify the dataset with the longest lengths first in the MERGE statement or include a LENGTH statement with the maximum value for that variable before the MERGE statement.

Below are the four step to address differences in variable attributes: NAME, TYPE (Char/Num), LENGTH (Prevent char truncation), LABEL or FORMAT (Update).  data test2 (rename=(Variable2 = Variable 'Input Variables2'n = 'Input Variables'n ));    length variable2 $8. 'Input Variables2'n  $8.;    set test1;    variable2 = Variable;    'Input Variables2'n = 'Input Variables'n;    drop Variable 'Input Variables'n;   run;

43. What is one method to get 10% random sample records? - In the DATA step, IF RANDUNI(-1) LE 0.1;

44. What is one useful method for selecting records based matching a collection of values? - Use the IN operator to create a flag variable and set to 1 and then use this flag variable as needed instead of repeating the matching collection of values.  One short hand notation is IF LBDAY IN (18:24) THEN DO; will select 18 <= to 24 <=.

45. What is one method for adding a collection of variables with constant values in one dataset to all records in a second dataset? - Yes, assure the second dataset only contains the key variable and a constant merging variable.  Then merge many to one.  Then append the new dataset to the second dataset to all one set of records with constant set of variable values. - proc sort data=adcoc1a; by visit; run; proc sort data=adcoc1b; by visit; run; data adcoc1b1; merge adcoc1a (keep= USUBJID visit where=(visit=:'BASELINE')) adcoc1b; by visit; if USUBJID > ''; run; data adcoc2; set adcoc1a adcoc1b1; run; 

46. What are the main reasons for merging two or more datasets? In general, it best to first merge all one to one datasets and then merge one to many dataset to assure correct merging, however, merging one to one and one to many datasets at the same time also works. Merging is done to add variables to original dataset or replace the value of original variables.  In general, these new variables are required to create new derived variables. 

47. In general, what are some options for grouping variables? - Numeric sequence based on suffix values, ex. A1 - A10, order saved in dataset, ex. A -- D, wildcard, ex. A:, hard code in arrays to be referenced in do-loop, ex. array varlst(3) A1 A3 A5;.

48. What is syntax to check for empty dataset? - data _null_;  if eof then do;  call symput('dsempty',1);  put 'NOTE: EOF - no records in data!';  end;  stop; set class end=eof; run; 

49. In general, when assign variables, is IF-THEN-ELSE IF-THEN logic applied? - Yes, in general, mutually exclusive conditions are applied to assign the variable and then exist the process.  To have more flexibility, IF-THEN-IF-THEN can be applied to control and better assign variable to more non-missing values.  See DATA Step statements below:

* Apply detail conditions since not mutually exclusive conditions;

* RAVISIT1 retains assigned AVISIT to next record;

* Two types of overlap days - at the end of previous and start of next;

* For end of previous overlap days, do not use ADY for AVISIT if previous AVISIT is already assigned, ex. &stdy=2 and RAVISIT1 ^="Day 1";

* For start of next overlap days, do not apply if that ADY is already assiged AVISIT, ex. &stdy=2 and AVISIT1 ^="Day 1";

if &stdy ne . and aperiod=1 then do;

 if ADT gt CC01DT and &stdy le 0 then do; AVISIT1="Day 0"; target=0; avisitn1=0; end;

 if (&stdy in (1) or (&stdy=2 and RAVISIT1 ^="Day 1")) then do; AVISIT1="Day 1"; RAVISIT1="Day 1"; target=1; avisitn1=1; end;

 if (&stdy in (3) or (&stdy=2 and AVISIT1 ^="Day 1") or (&stdy=4 and RAVISIT1 ^="Day 3")) then do; AVISIT1="Day 3"; RAVISIT1="Day 3"; target=3; avisitn1=3; end;

 if (&stdy in (5) or (&stdy=4 and AVISIT1 ^="Day 3") or (&stdy=6 and RAVISIT1 ^="Day 5")) then do; AVISIT1="Day 5"; RAVISIT1="Day 5"; target=5; avisitn1=5; end;

 if (&stdy in (7:18) or (&stdy=6 and AVISIT1 ^="Day 5")) then do; AVISIT1="Day 7"; RAVISIT1="Day 7"; target=7; avisitn1=7; end;

 if &stdy in (19:59) then do; AVISIT1="Week 4"; target=28; avisitn1=28; end;

 if &stdy in (60:120) then do; AVISIT1="Month 3"; target=90; avisitn1=90; end;


50. How to reorder the variables in a SASĀ® data set? - There are several options - ATTRIB, ARRAY, FORMAT, INFORMAT, LENGTH, RETAIN

51. Selecting the top n% and bottom n% of observations from a data set

52. What is one DO LOOP technique for iterating the observartion between start day to end day to create one record per study day? -   data adae_adsl1; set adae_adsl; by usubjid; do i=astdy to aendy; ady=i; output; end; run;

ISS/ISE Programming

1. What are first steps for creating ISS?  For ISS, apply most recent MedDRA/Who dictionary codes and standard visit windows. 

2. What are first steps for creating ISE? For ISE, pool primary and secondary end points.




1. What is the PROC SORT option to make sorts case insensitive? - SORTSEQ= option

2. Which function is useful for counting words in a string? - COUNTW()

3. Which function is useful for locating a word and word number? - FINDW()

4. Which function is useful for extracting the first character in a string? - FIRST()

5. Does PROC SQL support dash notation when creating macro variables? - Yes, PROC SQL; SELECT DISTINCT DEPT INTO :D01-:D04;




1. Which SAS procedures are useful for calculating geometric mean values? - Use PROC MEANS or PROC TTEST.  See SAS paper.

2. Which SAS procedure and option will provide standard errors of mean values? - Use the STDERR option in Proc Tabulate.  Add + STDERR for upper limit and - STDERR for lower limit.

3. When plotting to semi-log scale, do you apply the LOG() function and then plot? - No, the y-axis values are not transformed.  The original values are plotted on the LOG y-axis.  the semi-log scale graph is useful to view large scale range of values such as from 1 to 10,000.

4. What is a useful SG procedure for creating series plots by patients? - PROC SGPLOT; SERIES X=VISIT Y=AVAL / GROUP=USUBJID; RUN;

5. What are some techniques for handling missing data? From working with statisticians, I have learned that no one technique is best. Options include ignoring missing values, using the last observation carried forward method or various other imputation methods.  My suggestion is to test various methods using simple and real data examples to confirm your assumptions before finalizing a method.  SAS's procedure PROC CALIS is designed to handle missing data. Some of the imputation method options in PROC CALIS are multiple imputation (MI) and full information maximum likelihood (FIML).

6. At a very high level, what are two general types of statistical analysis? - A. Descriptive statistics helps to understand group characteristics.  B. Inferential statistics helps to understand group(s) differences or constant differences.

7. How do histograms differ from scatter plots?  - On the y-axis, histograms plot the frequency counts and is useful to determine if normal plot or not.  On scatter plots, however,  the actual y value is plotted on the y-axis. 

8. Is the ANOVA or Analysis of Variance, comparing three or more means of variances? - ANOVA is an extension of the T-Test and compares three or more means.

9. Which SAS procedure can be used to calculate a custom percentile such as 2.5 and 97.5? - PROC UNIVARIATE's options such as pctlpts=2.5 97.5 pctlpre=P can be specified.

10. In PROC LIFETEST, what are the options to create the survival and logrank plots? - PLOTS=(SURVIVAL(ATRISK) LOGSURV)

11. In survival analysis, what does censor mean? - This depends on the end point.  If the endpoint is death, then a patient is censored if the patient did not die.  Examples of being censored will be reasons for no death such as completed study, lost to follow-up or drop of study due to adverse event.  Generally, censored patients have CENSOR=0 and non censored patients have CENSOR=1.  See UCLA Proc Lifetest example.

12. Is PROC LIFETEST the only procedure used for survival analysis? - No, PROC PHREG and be used directly and PROC SGPLOT can be used indirectly with output dataset from PROC LIFETEST and the STEP and SCATTER statements.

13. Is it required to always output the PROC LIFETEST to include the number at risk, for example, at each time points? - No, options such as ATRISK are now available to include number at risk at each time points.

14. What is the PROC LIFETEST statement to compare two or more survival plots?  - Use the STRATA statement to plot multiple survival plots.

15. What is the difference between PROC LIFETEST and PROC PHREG? - Both PROC LIFETEST and PROC PHREG create survival plots. One of the key differences is that PROC LIFETEST uses the Kaplan Meier method estimates while PROC PHREG uses the Cox model based on the proportional hazards model to resolve covariates issue.  

16. In an adverse events (AE) table, what do N and % represent? - N counts the number of unique patients with adverse events (proc sort nodupkey; by subject; run;) and % counts the number of unique patients with the adverse event divided by the total number of patients. (proc sort data=ae out=ae_sev; by subject soc aedecod descending aesev; run; proc sort data=ae_sev nodupkey; by subject soc aedecod;run;)  

Another method: * USIBJID maybe counted twice;

proc freq data=adp.adae; tables aphase /list missing; table aphase*trtemfl*atoxgrn/list missing;

where aphase in ('Phase2 Cohort1' 'Phase2 Cohort2') and trtemfl = 'Y'; run;

* USIBJID counted only once by CTCT - This is a useful method to do frequency counts for codelists;  

proc sql; select unique aphase, trtemfl, atoxgrn, count(distinct usubjid) as count from adp.adae where aphase in ('Phase2 Cohort1' 'Phase2 Cohort2') and trtemfl = 'Y' group by aphase, atoxgrn;

* Get worst; select aphase, max(atoxgrn) as worst_ae, count(distinct usubjid) as count from adp.adae where aphase in ('Phase2 Cohort1' 'Phase2 Cohort2') and trtemfl = 'Y' group by aphase; quit;

* A - Unique USUBJID;

data oneae ;*oneaetox; set ae1; by usubjid atoxgrn; if last.usubjid then output oneae; 


* B - Unique USUBJID and Worst TOX is a three step process;

* a. Sort by USUBJID AESOC and Worst TOX;

proc sort data=ae1 out=ae1t; by usubjid aesoc descending ATOXGRn; run;

* b. First record by USUBJID AESOC to get only one AESOC per USUBJID;

proc sort data=ae1t nodupkey; by usubjid aesoc; run;

* c. First record by USUBJID ATOXGRn to get only one ATOXGRn per USUBJID;

proc sort data=ae1t nodupkey; by usubjid ATOXGRn; run;

17. Is there a LOG calculate site to confirm LOG base 2, for example? - Yes

18. For snapshots which are about 1 week after data cutoff, generally data cutoff or datacuts are required to filter the data.  What some things to consider when applying data cutoff? - Generally an endpoint is important which defines which patients or the first # dosed are included.  Of these selected patients, visit dates, for example are applied a condition such as IF <visit date> > '16JUN2016'D THEN DELETE;  May also consider imputing, such as first of the month or first of the year, for partial raw dates.  See CDM.

19. In general, what options exist for imputing values? - Factors include a) Individual/Group and b) Previous/Next Visit.  Options include MIN (Minimum), MAX (Maximum), AVG (Average), BC (Best Case), SLOCF/BLOCF (Screening/Baseline Observation Carried Forward), LOCF (Last Observation Carried Forward), WOCF (Worst Observation Carried Forward) and PROC MI for statistical imputation.  DATA Step with FIRST. or LAST. LAG(), Leading and Retaining Values, LEADS/LAGS or Proc SQL GROUP BY maybe helpful.  

In general, the following non-missing/missing factors are important to impute lab values - DATE, TIME and VALUE a) for any duplicate records by DATE and missing TIME, we keep the non-missing value if one is non-missing and the other is missing, b) for any duplicate records by DATE and non-missing and non-duplicate TIME and both non-missing values, we take the most recent TIME value, c) for duplicate records by DATE and missing or duplicate TIME and non-missing values, we take the average value.

20. What are options for identifying outliers? Options include values outside of the box and wisker's plot.  See New Clinical Programmer for more information.

21. What are general best practices when displaying number of decimal places?  To one decimal place for mean and standard deviation and original decimal places for median, min and max.

22. What are options for using weight variables in analysis? - How to understand weight variables in statistical analyses blog

Data Science Questions

1. Why are these components essential for Data Science - PROC SQL, PROC COMPARE, Basic Stats and Stats procedures, business knowledge and principles, data cleaning, metadata and visuals? - PROC SQL to build and query datasets, PROC COMPARE to compare datasets, basic stats since real improvements and value addition is from statistical principles, business knowledge to understand the problem and apply solutions, data cleaning to realize that data is not always correct and visuals to see patterns and trends in volumes of data.

PROC SQL Tool - join, subset datasets

PROC COMPARE Tool - assure no difference or update

Basic Stats and Stats Procedures - assure no difference or improvement

Business Knowledge and Principles - Pareto Analysis, Six Sigma, Project Management, Inventory/Cost Management, Sales and Service support 

Data Cleaning - duplicates, missing data, etc. 

Metadata - standards, automate

Visuals - assure no difference or improvement




1. How can you surpress notes and error messages? - Use OPTIONS NONOTES ERRORS=0;  Warnings can not be surpressed.  Remember to reset option to display notes. (1) (2)

2.  What are some useful options for modifying the configuration file? - WORK, REGISTER, ALTLOG, SORTSIZE, SET SASAUTOS, SASINITIALFOLDER 

3. Does SASAUTOS automatically search subdirectories when a path is specified? - No, each subdirectory must be included in SASAUTOS to retrieve any macros in subdirectories.

4. When submitting code in batch, is there a method to submit a line of syntax greater than 256 characters? - Yes, by including line breaks to break up the 256 character statement. Another option is to save the long SAS line of code in a separate file and %INCLUDE the code and set the LRECL= option as in this example - FILENAME MYFILE 'PATH TO FILE' LRECL=32767;

5. Should an index be used to increase efficiency when the percentage of matches is about 15%? - Yes, but no when matches are greater than 25% of the total records.

6. When executing a %INCLUDE 'INIT.SAS' file, for example, what is an option to display INIT.SAS in the log? - Yes, %INCLUDE 'INIT.SAS' / SOURCE; OPTIONS SOURCE2; before the %INCLUDE statement.

7. When using SAS indexes, are PROC SORTs required before merging two datasets?  No, although not required, PROC SORT with indexes sorts faster.  In fact, without PROC SORT, sorts take longer and when not indexed.

8. Are hash tables faster than SAS indexes? - Yes, in general, hash tables are up to 10% faster.

9. In general, what are the main benefits of using SAS indexes? - When processing large datasets on repetitive or transactional type processing, sorting for merging or subsetting.

10. How are SAS indexes similar to SET with KEY= option? - SAS indexes also use direct access method but BY KEY variable value instead of BY RECORD value.  SAS indexes use KEY= option which is similar to using the POINT= option to specify record numbers.  Both simple and complex indexes can use KEY= option.

11. What is an automatic method for storing the current program name? - %put The current program is %sysfunc(getoption(sysin));

12. Are using SAS indexes always an advantage? – No, because of the restrictions and maintenance.

13. Which SAS procedures can be used to create indexes? – Proc sql, Proc Datasets, Data Step

14. Can a dataset have more than one index? – Yes, it can have several simple and complex indexes

15. What is the difference between simple and complex index? – simple is one variable and complex is multiple variables

16. In general, where are SAS indexes stored? - The .SAS7BNDX file is stored in the same folder as the dataset.

17. When ever indexed datasets get updated the index must also be updated or basically recreated? - Yes, that is true.

18. Is there a system option to display if SAS uses indexes or not?  Along with MSGLEVEL=I sysytem option, you can use PROC CONTENTS to display if indexes were created.  If a data set has an index, then the CONTENTS procedure prints the Alphabetical Listing of Indexes and Attributes at the bottom of the listing.

19. Which statement is used to delete index files? - In PROC DATASETS, INDEX DELETE bigindex1 bigindex2; In PROC SQL; DROP INDEX STATE FROM INDEXLIB.PRODINDX; QUIT;

20. Which procedure is useful for copying datasets? - PROC DATASETS and PROC COPY, use DATECOPY to preserve the original creation and update dates.

21. What option is useful to inform SAS of already presorted data so SAS does not have to sort it again? - Apply the PRESORTED option in PROC SORT.  This prevents SAS from sorting the dataset again when saving to another dataset.  If dataset is presorted, then you can apply the SORTEDBY= dataset option in the DATA Step to store the BY variable name so that SAS does not resort the dataset for the same BY variable.

22. What is the syntax for deleting datasets in a libary? - PROC DATASETS LIBRARY= BDM MEMTYPE=DATA KILL; QUIT; RUN;

23. What are methods to clean up options, macro variables and work datasets for example?  See SAS paper on housekeeping.

24. What is one method to reset or redirect a libname to another libname? See SAS paper.

LIBNAME DATAONE 'C:\mydata\raw'; *Contains files labeled demog.sd2, lab.sd2, random.sd2, CRF.sd2; 

LIBNAME DATATWO 'C:\mydata\final'; *Contains files labeled demog.sd2, newlab.sd2,newCRF.sd2;


25. What is the method to display the full path name of libnames?  Use sashelp.vlibnam.  See SAS paper





1. When parallel programming to validate lists, tables or graphs, what is a useful strategy to compare results? - For validation purpose, all non-formated output should be in the same group and order as the production output file.

2. In the SAS Log file, other than ERRORS and WARNINGS, which are the top three important NOTES that need attention? - Errors require action to correct, Warnings need to confirmed or require action and Notes are generally FYI, 1. Variable Uninitialized,  2. Variable not found and 3. Missing values are generated.

3. What are useful techniques to better understand the data (Count, Mean, Min, Max and NMissing)?  Apply PROC FREQ DATA=; TABLES _CHARACTER_/LIST MISSING; RUN; PROC MEANS DATA=; VAR _NUMERIC_; RUN;

4. When validating AE listings or tables with Most Frequently Events (>=5%), how exactly is this done? - First a count based on SUBJID and AEDECOD should be done using PROC SORT and NODUPKEY.  Since one SUBJID can have multiple unique and non-unique AEDECODs, this will assure only count per SUBJID.  The saved output dataset should then get PROC FREQ by AEDECOD to get COUNTs of each AEDECOD.  If the COUNT equals the number of patients in the treatment group then that means 100% of the patients had that AEDECOD.  For >=5% cutoff, at least 5% of the patients had that AEDECOD.  

Next a percent value needs to be calculated to select only those >= 5%.  The percent value is determined by the COUNTS from PROC FREQ divided by the number of patients in each treatment group. This will then determine for that treatment group, which AEs had the highest frequencies.  Once the AEs are identified, then a subset condition is applied on AE to select all SUBJID records that match those AEs.  For several treatment groups, generally, the overall or sum of all treatment groups is used to determine >=5%. 

5. For AE tables, how are patient event percentages calculated? - In general, percentages are based on number of patients with adverse events divided by the total number of patients in each treatment group.  For any system body class and preferred term combination, there should not be more patients than in the treatment group.  One of the best way to assure this is to apply PROC SORT NODUPKEY on SUBJECT SYSTEM_BODY_CLASS PREFERRED_TERM.   

6. What are the three main categories of SAS bugs and a list of common SAS bugs? - Syntax, Data and Logic; missing semicolons, misspelled or missing keyword, incorrect SAS statements with the SAS procedure, uninitialized variables within a DAT Step, variable not found within a SAS Procedure, dropping variables when required later, missing values where generated, issues with missing data, numeric and character conversions, invalid data when reading data, character field truncation, illegal mathematical operation, incorrect processing of date or datetime variables, unbalanced quotes, SELECT/DO-END or IF/THEN blocks, by grouping processing without first presorting the dataset, unexpected number of records or variables, and logic errors which are difficult to detect since errors are not created.  See Convert variable values from character to numeric or from numeric to character tip.

7. What are some tips for checking the SAS program? - Syntax only can be checked with OPTIONS OBS=0;  PUT statements within DATA Steps help display data issues based on conditions and options (_INFILE_, _ALL_).  Add comments to describe complex steps.

8. What is a useful SAS EG task to help better understand datasets? - The SAS EG charactertize data task calls both PROC FREQ and PROC MEANS of all variables.

9. What is one way to confirm consistency between two numeric variables to be both missing or non-missing? - This is one useful technique using NMISS().  PROC FREQ DATA=NCI_ADAM.ADBMKS; TABLES USUBJID*PARAM*PEAK3DT*PEAK4DT*PEAK3VALN*PEAK4VALN/LIST MISSING; WHERE NMISS(PEAK3DT, PEAK3VALN) ^IN (0, 2) OR NMISS(PEAK4DT, PEAK4VALN) ^IN (0, 2); RUN;

10. Acceptable proc compare ERROR due to mismatch of ID records.  qc_ad_ate.log, ERROR: Data set WORK.FINAL contains 1 observations not in OVAD.ATE.  Proc compare still lists differences.

11. Acceptable invalid argument to INPUT function NOTE (Dates).  Very common to have difference in date informat and date value, ex. year only in DTC = 2018 which results in missing dates.  

Code: NALTDT = input(XPSTDTC,yymmdd10.); Solution to prevent NOTE: NALTDT = input(XPSTDTC, ?? yymmdd10.);

12. Acceptable invalid argument to INPUT function note (formats). Other example are due to differences in format and data type (num/char) or value which results in missing data.

Code: avalc_order = input(put(avalc, $respo.), best.);  avalc_format = put(avalc, $respf.); 

13. What is one method to append multiple rtf files using Word?  Select insert and then text from files, select rtf files.

  • Home
  • Common Questions
Powered by Wild Apricot. Try our all-in-one platform for easy membership management