Revolutionizing SDTM Programming in Pharma with ChatGPT
Revolutionizing SDTM Programming in Pharma with ChatGPT
By Sarath
Introduction
In the pharmaceutical industry, standardizing…
Revolutionizing SDTM Programming in Pharma with ChatGPT
Revolutionizing SDTM Programming in Pharma with ChatGPT
By Sarath
Introduction
In the pharmaceutical industry, standardizing…
The PROC DATASETS
procedure is a versatile and efficient tool within SAS for managing datasets. Often described as the “Swiss Army Knife” of SAS procedures, it allows users to perform a variety of tasks such as renaming, deleting, modifying attributes, appending datasets, and much more, all while consuming fewer system resources compared to traditional data steps. In this article, we’ll explore key use cases, functionality, and examples of PROC DATASETS
, illustrating why it should be part of every SAS programmer’s toolkit.
Unlike procedures like PROC APPEND
, PROC CONTENTS
, and PROC COPY
, which focus on specific tasks, PROC DATASETS
integrates the functionalities of these procedures and more. By using PROC DATASETS
, you avoid the need for multiple procedures, saving both time and system resources since it only updates metadata instead of reading and rewriting the entire dataset.
The basic structure of PROC DATASETS
is as follows:
PROC DATASETS LIBRARY=;
;
RUN; QUIT;
Here, you specify the library containing the datasets you want to modify. Commands such as CHANGE
, DELETE
, APPEND
, MODIFY
, and RENAME
follow within the procedure.
Renaming datasets and variables is a simple yet powerful capability of PROC DATASETS
. Here’s an example of how you can rename a dataset:
PROC DATASETS LIBRARY=mylib;
CHANGE old_data=new_data;
RUN; QUIT;
To rename a variable within a dataset:
PROC DATASETS LIBRARY=mylib;
MODIFY dataset_name;
RENAME old_var=new_var;
RUN; QUIT;
The APPEND
statement is a highly efficient alternative to using SET
in a data step because it only reads the dataset being appended (the DATA=
dataset), instead of reading both datasets.
PROC DATASETS LIBRARY=mylib;
APPEND BASE=master_data DATA=new_data;
RUN; QUIT;
Deleting datasets or members within a library is simple with PROC DATASETS
. You can delete individual datasets or use the KILL
option to remove all members of a library:
PROC DATASETS LIBRARY=mylib;
DELETE dataset_name;
RUN; QUIT;
PROC DATASETS LIBRARY=mylib KILL;
RUN; QUIT;
You can modify variable attributes such as labels, formats, and informats without rewriting the entire dataset:
PROC DATASETS LIBRARY=mylib;
MODIFY dataset_name;
LABEL var_name='New Label';
FORMAT var_name 8.2;
RUN; QUIT;
You can use PROC DATASETS
to manage audit trails, which track changes made to datasets. For instance, the following code creates an audit trail for a dataset:
PROC DATASETS LIBRARY=mylib;
AUDIT dataset_name;
INITIATE;
RUN; QUIT;
Indexes help retrieve subsets of data efficiently. You can create or delete indexes with PROC DATASETS
:
PROC DATASETS LIBRARY=mylib;
MODIFY dataset_name;
INDEX CREATE var_name;
RUN; QUIT;
Another useful feature is the AGE
command, which renames a set of files in sequence:
PROC DATASETS LIBRARY=mylib;
AGE file1-file5;
RUN; QUIT;
PROC DATASETS
is an indispensable tool for SAS programmers. Its efficiency and versatility allow you to manage datasets with ease, from renaming and deleting to appending and modifying attributes. By leveraging its full potential, you can streamline your SAS workflows and significantly reduce processing times.
Advanced SDTM Mapping Pitfalls and How to Avoid Them
Advanced SDTM Mapping Pitfalls and How to Avoid Them
Introduction
Mapping clinical data to SDTM domains is a complex process involving many technical and logical chall…
As an experienced SAS programmer working with the Study Data Tabulation Model (SDTM), it’s crucial to stay updated with the latest programming techniques. Whether you’re tasked with building SDTM domains from scratch or optimizing existing code, there are several advanced concepts that can improve your workflows and output. In this post, we’ll explore some techniques that can help you overcome common SDTM challenges and boost efficiency in handling clinical trial data.
When dealing with large clinical datasets, speed and efficiency are key. One method to optimize SDTM domain generation is to reduce the data footprint by eliminating unnecessary variables and duplicative observations. Consider the following approaches:
Duplicate records can slow down the processing of datasets and cause inaccuracies in reporting. To remove duplicates, you can use the PROC SQL
, DATA STEP
, or PROC SORT
methods. Here’s a quick example using PROC SORT
:
proc sort data=mydata nodupkey;
by usubjid visitnum;
run;
This example ensures that only unique records for each usubjid
and visitnum
combination are retained, eliminating potential redundancy in your dataset.
Utilizing macro variables efficiently can significantly improve your code’s flexibility, especially when working across multiple SDTM domains. Macro variables allow you to automate repetitive tasks and reduce the risk of human error. Here’s an example using macro variables to generate domain-specific reports:
%macro create_sdtm_report(domain);
data &domain._report;
set sdtm.&domain.;
/* Apply domain-specific transformations */
run;
%mend;
%create_sdtm_report(DM);
%create_sdtm_report(LB);
In this case, the macro dynamically generates SDTM reports for any domain by passing the domain name as a parameter, minimizing manual interventions.
The Demographics as Collected (DC)
domain often presents unique challenges, particularly when distinguishing it from the standard Demographics (DM) domain. While DM represents standardized data, DC focuses on the raw, collected demographic details. Here’s an approach to manage these domains efficiently:
data dc_domain;
set raw_data;
/* Capture specific collected demographic data */
where not missing(collected_age) and not missing(collected_gender);
run;
In this case, the code filters out any missing collected data to ensure the DC
domain contains only records with complete demographic information.
Efficient debugging is crucial, especially when dealing with complex SDTM transformations. The PUTLOG
statement in SAS is a simple yet powerful tool for tracking errors and debugging data issues.
data check;
set sdtm.dm;
if missing(usubjid) then putlog "ERROR: Missing USUBJID at obs " _n_;
run;
In this example, the PUTLOG
statement flags records where the USUBJID
variable is missing, making it easier to spot and address data quality issues during SDTM creation.
In certain domains like Vital Signs (VS) or Lab (LB), handling repeated measures for patients across multiple visits is common. Using arrays can help streamline this process. Here’s a basic example of using an array to process repeated lab measurements:
data lab_repeated;
set sdtm.lb;
array lb_vals{3} lbtestcd1-lbtestcd3;
do i=1 to dim(lb_vals);
if lb_vals{i} = . then putlog "WARNING: Missing value for LBTESTCD at " _n_;
end;
run;
This code uses an array to loop through repeated lab test results, ensuring that missing values are flagged for review. Such array-based techniques are essential when processing large, multidimensional datasets in SDTM programming.
To ensure SDTM datasets are CDISC compliant, it’s vital to validate your datasets using tools like Pinnacle 21 or OpenCDISC. These tools check compliance against SDTM standards, flagging any inconsistencies or issues.
Make sure to incorporate validation steps into your workflows regularly. This can be done by running the validation tool after each major dataset creation and by including clear annotations in your programming code to ensure traceability for audits.
Advanced SDTM programming requires a mix of technical expertise and strategic thinking. Whether you are optimizing large datasets, automating repetitive tasks with macros, or ensuring CDISC compliance, staying updated with these advanced techniques will enhance your efficiency and ensure high-quality deliverables in clinical trials. Remember, SDTM programming is not just about writing code—it’s about delivering accurate, compliant, and reliable data for critical decision-making.
For more tips and tutorials, check out other PharmaSUG resources or SAS support!
PROC COMPARE
Using the DROP
Dataset OptionWhen comparing two datasets using PROC COMPARE
in SAS, there may be cases where you want to exclude specific variables from the comparison. One efficient way to do this is by using the DROP
dataset option. This option allows you to remove certain variables from consideration during the comparison process.
DROP
Dataset Option in PROC COMPARE
The DROP
dataset option is applied directly to the dataset references in the BASE
and COMPARE
options of PROC COMPARE
. When you use DROP
, the specified variables are excluded from the comparison.
Here is the syntax for using the DROP
option:
proc compare base=dataset1(drop=var_to_exclude)
compare=dataset2(drop=var_to_exclude);
run;
In this example, var_to_exclude
is the name of the variable you want to exclude from both datasets (dataset1
and dataset2
) before the comparison is made.
Let’s say you have two datasets, sales_2023
and sales_2024
, and you want to compare them, but you want to exclude a variable called region
from the comparison. Here is how you can do that:
proc compare base=sales_2023(drop=region)
compare=sales_2024(drop=region);
run;
This code ensures that the variable region
will not be included in the comparison, while all other variables in the two datasets will be compared.
You can also exclude multiple variables by listing them inside the DROP
option, separated by spaces:
proc compare base=sales_2023(drop=region quarter)
compare=sales_2024(drop=region quarter);
run;
In this case, both region
and quarter
will be excluded from the comparison.
The DROP
dataset option is a powerful and simple tool for excluding variables from comparison in PROC COMPARE
. It provides flexibility by allowing you to exclude one or more variables without needing to manually specify which variables should be included in the comparison.
By leveraging the DROP
option, you can ensure that only relevant variables are considered when comparing datasets, making your SAS programming more efficient and focused.
PROC SQL in SAS is a powerful tool, especially when you need to add new columns to existing tables. Whether you’re updating tables with new data or making calculated columns, following best practices ensures your code is efficient, maintainable, and easy to understand.
ALTER TABLE
to Add ColumnsThe most straightforward way to add a column to a table is by using the ALTER TABLE
statement. It allows you to add one or more columns without disrupting the existing data structure.
proc sql; alter table your_table_name add new_column1 num format=8. label='New Column 1', new_column2 char(20) label='New Column 2'; quit;
num
for numeric, char(n)
for character) and add useful formats and labels to keep your columns clear and organized.
Before adding columns, check if a column with the same name already exists. SAS will throw an error if you try to add a column that’s already in the table.
proc sql; describe table your_table_name; quit;
Use the DESCRIBE
statement to view the structure and ensure no duplicate column names exist.
After adding a column, you may need to initialize it with default values. You can do this in a single step using the UPDATE
statement to ensure all rows are filled efficiently:
proc sql; alter table your_table_name add new_column num format=8. label='New Numeric Column'; update your_table_name set new_column = 0; /* Default value */ quit;
CREATE TABLE
for Complex Column AdditionIf you need to add columns based on existing data or calculations, it’s often better to create a new table. This prevents any unintentional changes to the original table and allows for cleaner manipulation of data:
proc sql; create table new_table as select existing_column, calculated(new_column) as new_column format=8. label='Calculated Column' from your_table_name; quit;
calculated
to create columns based on other columns in the same query.
If the new column will be used in sorting or joins, adding an index can significantly boost performance. Here’s how to create an index on your new column:
proc sql; create index idx_new_column on your_table_name(new_column); quit;
Always document your SQL code! Adding comments helps future developers (or your future self) understand the purpose of each column addition:
proc sql; alter table your_table_name add new_column char(10) label='New Character Column'; /* Added for analysis purposes */ quit;
By following these best practices for adding columns via PROC SQL, you can keep your SAS programming efficient and clear. Whether you’re maintaining legacy systems or building new reports, these tips will help you seamlessly integrate new data into your tables.
How to Address PROC COMPARE Reporting Same Values as Different in SAS
How to Address PROC COMPARE Reporting Same Values as Different in SAS
Working with large datasets in SAS often requires comparing data between two tables….
5 Approaches to Identify Blank Columns in SAS Datasets
5 Approaches to Identify Blank Columns in SAS Datasets
Author: Sarath
Date: September 5, 2024
When working with large datasets in SAS, it’s crucial to e…