Dynamic macro creation is a powerful technique in SAS that allows you to generate macro variables and macros based on data content or logic at runtime. This not only simplifies repetitive tasks but also provides a way to dynamically control program flow. In this article, we’ll cover various scenarios and provide multiple examples where dynamic macro creation can be beneficial.
Imagine a scenario where you have a dataset and need to dynamically create macros to store variable names or their values. This is useful for automating variable processing tasks, such as generating reports, manipulating data, or performing analyses.
/* Create a macro variable for each variable in the dataset */
proc sql noprint;
select name
into :var1-   /* open-ended range; requires SAS 9.3+ */
from dictionary.columns
where libname='SASHELP' and memname='CLASS';
quit;
%macro display_vars;
%do i=1 %to &sqlobs;
%put &&var&i;
%end;
%mend display_vars;
%display_vars;
Explanation: This code dynamically retrieves the variable names of sashelp.class from the DICTIONARY.COLUMNS metadata table and stores them in numbered macro variables. The macro display_vars prints out each variable name, allowing flexible processing of variables without knowing them in advance.
Let’s say you have multiple categories or groups within your data, and you need to run a set of analyses or create reports for each unique group. You can dynamically generate macros for each category, making the process scalable and automated.
/* Create macros for each unique 'sex' category in the dataset */
proc sql noprint;
select distinct sex
into :sex1-   /* open-ended range; requires SAS 9.3+ */
from sashelp.class;
quit;
%macro analyze_sex;
%do i=1 %to &sqlobs;
proc print data=sashelp.class;
where sex = "&&sex&i";
title "Listing for Sex = &&sex&i";
run;
%end;
%mend analyze_sex;
%analyze_sex;
Explanation: This example dynamically creates a macro variable for each unique sex value and applies a filter in PROC PRINT for each group, generating a report for each distinct value in the dataset.
In some cases, you need to execute different code based on the values of certain variables or the content of a dataset. Dynamic macro creation helps generate conditional logic on the fly.
/* Identify numeric variables in the dataset and generate macro code */
proc sql noprint;
select name
into :numvar1-   /* open-ended range; requires SAS 9.3+ */
from dictionary.columns
where libname='SASHELP' and memname='CLASS' and type='num';
quit;
%macro analyze_numeric_vars;
%do i=1 %to &sqlobs;
proc means data=sashelp.class;
var &&numvar&i;
title "Analysis of &&numvar&i";
run;
%end;
%mend analyze_numeric_vars;
%analyze_numeric_vars;
Explanation: This code identifies the numeric variables in the dataset and dynamically creates a macro variable for each. The macro analyze_numeric_vars runs PROC MEANS for each numeric variable, adjusting automatically to any changes in the dataset structure.
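To make the "conditional logic" point concrete, here is a minimal sketch that branches with %IF on a variable's type; the describe_var macro and its parameters are illustrative, not part of the examples above:

%macro describe_var(var=, type=);
%if &type = num %then %do;
proc means data=sashelp.class n mean min max;
var &var;
run;
%end;
%else %do;
proc freq data=sashelp.class;
tables &var;
run;
%end;
%mend describe_var;

%describe_var(var=age, type=num);  /* numeric: summary statistics */
%describe_var(var=sex, type=char); /* character: frequency table */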
Dynamic macros are helpful in generating dynamic reports or exporting data where the structure or content changes frequently. You can use dynamic macros to control file names, report titles, or export paths.
/* Generate macros for each unique 'name' value and create reports */
proc sql noprint;
select distinct name
into :name1-   /* open-ended range; requires SAS 9.3+ */
from sashelp.class;
quit;
%macro create_reports;
%do i=1 %to &sqlobs;
proc print data=sashelp.class;
where name = "&&name&i";
title "Report for &&name&i";
run;
%end;
%mend create_reports;
%create_reports;
Explanation: This code dynamically creates a report for each unique name in the dataset, with the report's title reflecting the name being processed. The code adapts to changes in the dataset, automating the report generation process.
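The same pattern extends to file exports. Below is a minimal sketch of macro-driven export paths; the output directory is an assumption, so substitute your own:

proc sql noprint;
select distinct sex
into :sex1-   /* open-ended range; requires SAS 9.3+ */
from sashelp.class;
quit;
%let nsex = &sqlobs;   /* preserve the count before any further SQL runs */

%macro export_by_sex;
%do i = 1 %to &nsex;
proc export data=sashelp.class(where=(sex="&&sex&i"))
outfile="/tmp/class_&&sex&i..csv"   /* hypothetical output path */
dbms=csv replace;
run;
%end;
%mend export_by_sex;
%export_by_sex;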
Key points to remember:
- PROC SQL with INTO clauses: This is the most efficient way to generate dynamic macro variables from dataset content.
- &sqlobs: The &sqlobs automatic macro variable is useful for counting the number of records or unique values, ensuring the loop runs the correct number of times.

Dynamic macro creation in SAS provides a robust and flexible way to automate repetitive tasks, process data efficiently, and adjust code dynamically based on dataset content. By generating macros based on variables or values within a dataset, you can create dynamic, scalable solutions for various SAS programming challenges.
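One closing tip on &sqlobs: because every PROC SQL step resets it, it is safer to copy it into your own macro variable immediately after QUIT. A minimal sketch:

proc sql noprint;
select distinct name
into :var1-
from sashelp.class;
quit;
%let nvals = &sqlobs;   /* survives any later PROC SQL steps */
%put NOTE: &nvals distinct values captured.;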
Comparing VISIT and VISITNUM Values from SDTM Datasets
Author: [Your Name]
Date: [Creation Date]
In clinical trials, the VISIT
and VISITNUM
variables are key identifiers for subject visits. Ensuring that all datasets have consistent visit data and that it aligns with the planned visits recorded in the TV
(Trial Visits) domain is crucial for accurate data analysis. This post presents a SAS macro that automates the extraction of unique VISIT
and VISITNUM
values across all SDTM datasets in a library and compares them to those found in the TV
domain.
The SAS macro program:
- Extracts unique VISIT and VISITNUM values from all SDTM datasets in the specified library.
- Compares them with the planned visits recorded in the TV domain.
- Displays the comparison so that planned visits missing from the data stand out against the TV domain.

Here's the macro that performs the task:
%macro compare_visit(libname=);
/* Step 1: Get the unique VISIT and VISITNUM values from the TV domain */
proc sql;
create table tv_visit as
select distinct VISIT, VISITNUM
from &libname..TV
where VISIT is not missing and VISITNUM is not missing;
quit;
/* Step 2: Get the list of datasets in the library containing both VISIT and VISITNUM */
proc sql noprint;
select memname
into :dslist separated by ' '
from sashelp.vcolumn
where libname = upcase("&libname")
and name in ('VISIT', 'VISITNUM')
group by memname
having count(distinct name) = 2; /* Ensure both VISIT and VISITNUM are present */
quit;
/* Step 3: Check if any datasets were found */
%if &sqlobs = 0 %then %do;
%put No datasets in &libname contain both VISIT and VISITNUM variables.;
%end;
%else %do;
%put The following datasets contain both VISIT and VISITNUM variables: &dslist;
/* Initialize an empty dataset for combined VISIT and VISITNUM values */
data combined_visits;
length Dataset $32 VISIT $200 VISITNUM 8;
stop;
run;
/* Step 4: Loop through each dataset */
%let ds_count = %sysfunc(countw(&dslist));
%do i = 1 %to &ds_count;
%let dsname = %scan(&dslist, &i);
/* Extract unique VISIT and VISITNUM values, excluding UNSCHEDULED visits */
proc sql;
create table visit_&dsname as
select distinct "&&dsname" as Dataset, VISIT, VISITNUM
from &libname..&&dsname
where VISIT is not missing and VISITNUM is not missing
and VISIT not like 'UNSCH%'; /* Exclude UNSCHEDULED visits */
quit;
/* Append to the combined dataset */
proc append base=combined_visits data=visit_&dsname force;
run;
%end;
/* Step 5: Compare combined VISIT/VISITNUM with TV domain */
proc sql;
create table visit_comparison as
select a.*, b.Dataset as In_SDTM_Dataset
from tv_visit a
left join combined_visits b
on a.VISIT = b.VISIT and a.VISITNUM = b.VISITNUM
order by VISITNUM, VISIT;
quit;
/* Step 6: Display the comparison results */
proc print data=visit_comparison;
title "Comparison of VISIT/VISITNUM between TV and SDTM Datasets (Excluding Unscheduled Visits)";
run;
%end;
%mend compare_visit;
/* Run the macro by specifying your SDTM library name */
%compare_visit(libname=sdtm);
This macro performs the following steps:
1. Extracts the unique VISIT and VISITNUM values from the TV domain.
2. Identifies the datasets that contain both VISIT and VISITNUM variables by querying the metadata table SASHELP.VCOLUMN.
3. Extracts each dataset's unique VISIT and VISITNUM values and appends them into a consolidated dataset.
4. Compares the consolidated values with the TV domain and displays any discrepancies.

This macro is especially useful when checking that all actual visits recorded in the SDTM datasets align with the planned visits documented in the TV domain. Ensuring consistency between these values is essential for accurate clinical trial reporting and analysis.
%compare_visit(libname=sdtm);
In this example, the macro searches for VISIT and VISITNUM variables in the SDTM datasets located in the sdtm library and compares them with the values in the TV domain.
By automating the process of extracting and comparing VISIT and VISITNUM values, this macro simplifies what could otherwise be a tedious and error-prone task. It ensures that all visit data is consistent and complete, aligning the planned and actual visits in the SDTM datasets.
Feel free to adapt this macro to meet your specific needs in clinical trials data management!
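For example, one easy adaptation (assuming the visit_comparison table produced by the macro above) is to list only the planned visits that never appear in any SDTM dataset:

proc sql;
select VISITNUM, VISIT
from visit_comparison
where In_SDTM_Dataset is missing;   /* planned in TV but absent from the data */
quit;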
Finding EPOCH Values in SDTM Datasets using a SAS Macro
Author: Sarath
Date: 05SEP2024
The EPOCH
variable is essential in many SDTM datasets as it helps describe the period during which an event, observation, or assessment occurs. In clinical trials, correctly capturing and analyzing the EPOCH
variable across datasets is crucial. This post walks through a SAS macro program that automates the process of finding all EPOCH
values from any dataset within an entire library of SDTM datasets.
This macro program loops through all the datasets in a specified library, checks for the presence of the EPOCH
variable, and extracts the unique values of EPOCH
from each dataset. It then consolidates the results and displays them for review.
The macro:
- Identifies every dataset in the library that contains the EPOCH variable.
- Extracts the unique values of the EPOCH variable for each dataset.

Here's the macro that performs the task:
%macro find_epoch(libname=);
/* Get a list of all datasets in the library */
proc sql noprint;
select memname
into :dslist separated by ' '
from sashelp.vcolumn
where libname = upcase("&libname")
and name = 'EPOCH';
quit;
/* Check if any dataset contains the EPOCH variable */
%if &sqlobs = 0 %then %do;
%put No datasets in &libname contain the variable EPOCH.;
%end;
%else %do;
%put The following datasets contain the EPOCH variable: &dslist;
/* Loop through each dataset and extract unique EPOCH values */
%let ds_count = %sysfunc(countw(&dslist));
%do i = 1 %to &ds_count;
%let dsname = %scan(&dslist, &i);
/* Extract unique values of EPOCH */
proc sql;
create table epoch_&dsname as
select distinct "&dsname" as Dataset length=32, EPOCH   /* double quotes so the macro variable resolves */
from &libname..&dsname
where EPOCH is not missing;
quit;
%end;
/* Combine the results from all datasets */
data all_epochs;
set epoch_:;
run;
/* Display the results */
proc print data=all_epochs;
title "Unique EPOCH values across datasets in &libname";
run;
%end;
%mend find_epoch;
/* Run the macro by specifying your SDTM library name */
%find_epoch(libname=sdtm);
The macro works by querying the SASHELP.VCOLUMN metadata table to check for the presence of the EPOCH variable in any dataset. It loops through the datasets that contain the variable, extracts distinct values, and aggregates the results into a single dataset. In short:
- It checks each dataset in the library for the EPOCH variable.
- If a dataset contains EPOCH, it extracts the unique values.

Imagine you have a large collection of SDTM datasets and need to quickly check which datasets contain the EPOCH variable and what unique values it holds. Running this macro allows you to do this across your entire library with minimal effort.
%find_epoch(libname=sdtm);
In this example, the macro will search for the EPOCH variable in the SDTM datasets stored in the library named SDTM. It will then display the unique values of EPOCH found in those datasets.
This macro simplifies the task of analyzing the EPOCH variable across multiple datasets in a library, saving time and reducing manual effort. By leveraging the power of PROC SQL and macros, you can automate this otherwise tedious process.
Feel free to adapt and expand this macro to suit your specific needs! Happy coding!
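As one possible extension (assuming the all_epochs table created by the macro above), a quick cross-tabulation shows which EPOCH values appear in which dataset:

proc freq data=all_epochs;
tables Dataset*EPOCH / list nocum nopercent;   /* one row per dataset/EPOCH pair */
run;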
Question: You are given a raw dataset from a clinical trial. How would you approach creating an SDTM domain?
Answer: First, I would familiarize myself with the SDTM Implementation Guide to understand the specific structure and variables required for the domain. I would then map the raw data to the corresponding SDTM variables, ensuring to follow CDISC standards. This involves creating a specification document that outlines the mapping rules and any necessary derivations. Finally, I would validate the domain using tools like Pinnacle 21 to ensure compliance.
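As an illustration, here is a minimal sketch of one such mapping step; the libraries (raw, sdtm), the raw dataset demog, and its variables subjid and gender are hypothetical:

/* Map a raw demographics dataset to a few core DM variables */
data sdtm.dm;
set raw.demog;   /* hypothetical raw dataset */
length STUDYID $20 DOMAIN $2 USUBJID $40 SEX $1;
STUDYID = "ABC-123";                  /* assumed study identifier */
DOMAIN = "DM";
USUBJID = catx("-", STUDYID, subjid); /* subjid assumed in raw data */
SEX = upcase(gender);                 /* gender assumed coded M/F in raw data */
keep STUDYID DOMAIN USUBJID SEX;
run;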
Question: How do you handle missing data in your analysis datasets?
Answer: Handling missing data depends on the type of analysis. Common methods include imputation, where missing values are replaced with the mean, median, or mode of the dataset, or using a placeholder like “999” for numeric or “UNK” for character variables. The choice of method depends on the nature of the data and the analysis requirements. I would document the method used for handling missing data in the analysis dataset metadata.
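For instance, mean imputation of a numeric variable can be done with PROC STDIZE, which requires SAS/STAT; the dataset and variable names below are hypothetical:

/* Replace only the missing values of a numeric variable with its mean */
proc stdize data=work.labs out=work.labs_imputed
method=mean reponly;   /* REPONLY touches missing values only */
var lbstresn;
run;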
Question: You’ve run Pinnacle 21 validation and received multiple warnings and errors. How do you address these?
Answer: I would prioritize the errors, as these typically indicate critical issues that could prevent submission. I would review the Pinnacle 21 documentation to understand the nature of each error and make the necessary corrections in the datasets. Warnings, while less critical, should also be addressed if they impact the integrity or clarity of the data. After making the corrections, I would rerun Pinnacle 21 to ensure all issues are resolved.
Question: How would you approach creating a Define.XML for a study with multiple domains?
Answer: Creating a Define.XML involves several steps:
Question: What steps do you take to create a mapping specification document for SDTM conversion?
Answer:
Question: If a study requires a custom domain not defined in the SDTM Implementation Guide, how would you create it?
Answer:
Question: How would you optimize a SAS program to handle very large datasets?
Answer:
Question: What is your process for annotating aCRFs?
Answer:
Question: You are tasked with creating an Adverse Events (AE) domain. What steps would you follow?
Answer:
Question: Describe how you would clean a dataset that has inconsistent date formats and missing values.
Answer:
Question: How do you ensure that the Define.XML you generate is fully compliant with CDISC standards?
Answer: I would follow these steps:
Question: What steps would you take to validate SDTM mapping for a clinical trial dataset?
Answer:
Question: You receive an ad-hoc request to provide summary statistics for a particular dataset that hasn’t been prepared yet. How do you handle this request?
Answer: I would:
Question: How would you merge multiple datasets with different structures in SAS to create a comprehensive analysis dataset?
Answer:
Question: You discover data integrity issues during your analysis, such as duplicate records or outliers. How do you address these?
Answer:
Question: How would you generate a safety report for a clinical trial using SAS?
Answer:
Question: How do you ensure your SAS programming complies with CDISC standards?
Answer:
Question: Describe how you manage SDTM mappings for multiple studies with varying data structures.
Answer:
Question: How would you create a report summarizing serious adverse events (SAEs) for a clinical trial?
Answer:
Question: You discover discrepancies between the raw data and the SDTM datasets. How do you address this?
Answer:
Answer: SDTM (Standard Data Tabulation Model) is a standard structure for study data tabulations that are submitted as part of a product application to a regulatory authority such as the FDA. SDTM plays a crucial role in ensuring that data is consistently structured, making it easier to review and analyze clinical trial data.
Answer: A Mapping Document in SAS programming typically includes:
Answer: Pinnacle 21 is a software tool used to validate datasets against CDISC standards, including SDTM. It checks for compliance with CDISC rules, identifies errors, and generates reports to help programmers correct any issues before submission to regulatory authorities.
Answer: An annotated CRF (aCRF) is a version of the Case Report Form (CRF) that includes annotations mapping each field to the corresponding SDTM variables. It serves as a reference for how the collected data should be represented in the SDTM datasets.
Answer: CDISC (Clinical Data Interchange Standards Consortium) is an organization that develops standards to streamline the clinical research process. CDISC standards, such as SDTM and ADaM, ensure that data is consistently structured, improving the efficiency of data sharing, analysis, and regulatory review.
Answer: Define.XML is a machine-readable metadata file that describes the structure and content of clinical trial datasets, such as SDTM and ADaM. It is an essential component of regulatory submissions, providing transparency and traceability of the data.
Answer: The cSDRG (Clinical Study Data Reviewer’s Guide) is a document that accompanies Define.XML and provides context to the submitted datasets. It explains the study design, data collection, and any decisions made during the mapping process, helping reviewers understand the data and its lineage.
Answer: To validate SDTM datasets using Pinnacle 21, you load the datasets into the software and run a compliance check. Pinnacle 21 then generates a report highlighting any issues, such as missing variables, incorrect formats, or non-compliance with CDISC standards. You would then address these issues and rerun the validation until the datasets pass all checks.
Answer: SDTM datasets are designed to represent the raw data collected during a clinical trial, organized in a standard format. ADaM datasets, on the other hand, are derived from SDTM datasets and are used for statistical analysis. ADaM datasets include additional variables and structure to support the specific analyses described in the study’s statistical analysis plan (SAP).
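To make the distinction concrete, here is a minimal sketch of deriving an ADaM-style ADSL from SDTM DM; the derived variable names follow common ADaM conventions, but treat the details as assumptions rather than a study-specific specification:

data adam.adsl;
set sdtm.dm;   /* one record per subject, as in DM */
length TRT01P $40 AGEGR1 $8;
TRT01P = ARM;   /* planned treatment, carried from DM */
if not missing(AGE) then AGEGR1 = ifc(AGE >= 65, ">=65", "<65");   /* hypothetical analysis age group */
run;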
Answer: Common challenges when mapping data to SDTM standards include:
Answer: Ensuring the accuracy of Define.XML involves meticulous mapping of each dataset variable, validation using tools like Pinnacle 21, and thorough review of the metadata descriptions. It is essential to cross-check Define.XML against the SDTM datasets, annotated CRF, and mapping specifications to ensure consistency.
Answer: Controlled terminology in CDISC standards refers to the standardized set of terms and codes used across datasets to ensure consistency and interoperability. It is crucial for maintaining data quality and facilitating accurate data analysis and reporting, especially in regulatory submissions.
Answer: Common errors identified by Pinnacle 21 in SDTM datasets include:
Answer: Discrepancies between the aCRF and SDTM datasets are handled by reviewing the mapping logic and ensuring that the SDTM datasets accurately reflect the data collected in the CRF. If necessary, updates to the mapping document or annotations on the aCRF are made to resolve inconsistencies.
Answer: The process for creating a cSDRG involves documenting the study design, data collection processes, and any decisions made during data mapping. This includes explaining any deviations from standard CDISC practices, justifications for custom domains, and providing details on data derivations. The cSDRG is typically created alongside Define.XML and reviewed as part of the submission package.
Answer: Key elements of a successful CDISC implementation include:
Answer: Ensuring data traceability from source to submission in SDTM datasets involves:
Answer: The Study Data Tabulation Model (SDTM) plays a critical role in regulatory submissions by providing a standardized format for organizing and presenting clinical trial data. This standardization facilitates the efficient review and analysis of data by regulatory authorities, such as the FDA, and ensures consistency across submissions.
Answer: Managing updates to SDTM and ADaM standards in ongoing studies involves:
Answer: Best practices for creating Define.XML files include:
Answer: Validation of aCRF and Define.XML involves cross-referencing the annotations and metadata with the SDTM datasets to ensure accuracy. Tools like Pinnacle 21 are used to check for compliance with CDISC standards, and any discrepancies are addressed through revisions to the documents.
Answer: Creating a custom domain in SDTM involves:
Answer: Maintaining consistency between aCRF, SDTM datasets, and Define.XML is crucial for ensuring that the data submission is clear, accurate, and compliant with regulatory requirements. Consistency helps avoid discrepancies that could lead to questions from regulatory reviewers, delays in the review process, or even rejections of the submission.
Answer: To ensure that the SDTM mapping document is comprehensive and accurate, you should:
Answer: When discrepancies are found during the validation of SDTM datasets, the following steps are taken:
Answer: Common challenges when creating SDTM datasets include:
Answer: Documenting the SDTM mapping process involves:
Answer: Controlled terminology ensures that data is consistently coded across datasets, which is essential for accurate data analysis and regulatory review. It helps maintain consistency and facilitates data integration across studies and submissions.
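A simple way to enforce this in practice is a codelist check; the sketch below flags SEX values outside the CDISC codelist (the sdtm library name is an assumption):

proc sql;
select USUBJID, SEX
from sdtm.dm
where SEX not in ('M', 'F', 'U', 'UNDIFFERENTIATED');   /* CDISC SEX codelist */
quit;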
Answer: Creating the cSDRG involves:
Answer: Ensuring the accuracy and completeness of Define.XML involves:
Answer: The aCRF (annotated CRF) plays a crucial role in the context of SDTM and Define.XML by providing a visual representation of how the collected data is mapped to the SDTM domains. It serves as a reference for both the SDTM mapping and the Define.XML, ensuring consistency and traceability throughout the submission process.
Answer: Managing the integration of external data sources into SDTM datasets involves:
Answer: Common pitfalls to avoid when creating Define.XML files include:
Answer: Handling updates to the SDTM Implementation Guide during an ongoing study involves:
Answer: The RELREC (Related Records) domain is used to link related records across different SDTM domains, while the SUPPQUAL (Supplemental Qualifiers) domain is used to capture additional information not included in the standard SDTM variables. Both domains play a crucial role in ensuring that all relevant data is captured and can be analyzed together, even if it doesn’t fit neatly into the predefined SDTM structure.
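For reference, a minimal sketch of a single SUPPAE record (all values hypothetical) shows how SUPPQUAL points back to its parent AE record:

data suppae;
length STUDYID $20 RDOMAIN $2 USUBJID $40 IDVAR $8 IDVARVAL $20
QNAM $8 QLABEL $40 QVAL $200;
STUDYID = "ABC-123";
RDOMAIN = "AE";           /* parent domain */
USUBJID = "ABC-123-001";
IDVAR = "AESEQ";          /* key variable identifying the parent record */
IDVARVAL = "1";
QNAM = "AETRTEM";         /* supplemental qualifier name */
QLABEL = "Treatment Emergent Flag";
QVAL = "Y";
run;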
Answer: Ensuring consistency between SDTM and ADaM datasets involves:
Answer: Validating custom domains in SDTM involves:
Answer: Metadata plays a critical role in Define.XML and cSDRG by providing detailed information about the structure, content, and meaning of the datasets. In Define.XML, metadata describes each dataset, variable, and code list, while in the cSDRG, it helps explain the study design, data collection processes, and any deviations from standard practices. Metadata ensures that the data is well-documented, transparent, and traceable, facilitating regulatory review and analysis.
Answer: Ensuring that SDTM datasets are submission-ready involves:
Answer: Common challenges in implementing CDISC standards in clinical trials include:
Answer: The creation and validation of aCRF involve:
Answer: The SUPPQUAL (Supplemental Qualifiers) domain in SDTM is used to capture additional information that does not fit into the standard SDTM variables. It allows for flexibility in representing data that may be unique to a specific study or does not have a predefined place in the existing SDTM domains. SUPPQUAL ensures that all relevant data is included in the submission, even if it requires customization.
Answer: Managing updates to controlled terminology in an ongoing clinical trial involves: