Presentations by users are the heart of a SAS users group meeting. MWSUG 2025 will feature a variety of papers and presentations organized into several academic sections covering many different topics and experience levels.
Note: Content and schedule are subject to change. Last updated 09-Sep-2025.
- AI and Emerging Technology
- Analysis & Advanced Analytics
- Anything Data
- Banking and Finance
- Beyond the Basics
- Pharma and Life Sciences
- Posters
- Visualization and Reporting
AI and Emerging Technology
Paper No. | Author(s) | Paper Title (click for abstract) |
AE-001 | David Corliss | Designing Against Bias: Identifying and Mitigating Bias in Machine Learning and AI |
AE-006 | Kirk Paul Lafler et al. | Code Smarter, Not Harder: The 5 C's of ChatGPT for the SASsy Professional |
AE-007 | Kirk Paul Lafler et al. | Benefits, Challenges, and Opportunities with Open Source Technologies in the 21st Century |
AE-013 | Ming Yan et al. | The Dependency Whisperer: AI That Sees What You Might Miss |
AE-041 | Koren Roland | Predicting Farm Equipment Purchases and Trades |
AE-061 | Urvi Mehta | Leveraging Self-supervised Learning to Synthesize Data |
AE-062 | Caleb Petterson | SAS Trustworthy AI Examples |
AE-064 | Jim Box | Responsible use of AI Systems |
AE-076 | Ryan Paul Lafler et al. | Charting Your AI Journey: A Roadmap for Supervised, Unsupervised, and Generative Learning through Machine Learning and Deep Learning |
AE-078 | Ryan Paul Lafler et al. | Enhancing Your SAS Viya Workflows with Python: Integrating Python's Open-Source Libraries with SAS using PROC PYTHON |
Analysis & Advanced Analytics
Paper No. | Author(s) | Paper Title (click for abstract) |
AL-015 | Jayanth Iyengar | Conducting Survival Analysis in SAS using Medicare Claims as a Real-world data source |
AL-031 | Brandy Sinco et al. | Logistic Regression Odyssey on a Small Sample Due to Rare Cancer |
AL-037 | Troy Hughes | GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS PROC GINSIDE |
AL-052 | Ross Bettinger | Feature Selection and Classification Using Fuzzy Logic |
AL-056 | Danny Modlin | PROC BGLIMM: The Smooth Transition to Bayesian Analysis |
AL-057 | Danny Modlin | How to Modify SAS9 Statistics Programs to Run in SAS Viya |
AL-058 | Danny Modlin | Large-Scale Time Series Forecasting in Model Studio |
AL-059 | Hangcen Zou | Kernel Bandwidth Selection for Maximum Mean Discrepancy |
AL-079 | Ryan Paul Lafler | Modern Deep Learning Architectures and Transfer Learning for Tabular, Unstructured, Sequential, and Time-Dependent Data |
Anything Data
Paper No. | Author(s) | Paper Title (click for abstract) |
AD-004 | Kirk Paul Lafler | Data Literacy 101: Understanding Data and the Extraction of Insights |
AD-019 | Ricky Norman | SAS Macro Using Hash Object for Lookup |
AD-020 | Ricky Norman | SAS Macro Using Hash Objects for Summarization |
AD-021 | Ricky Norman | SAS Hash Objects Flattening a Company's Employee Reporting Structure |
AD-026 | Ryan Paul Lafler et al. | Building Better Data Science Workflows: Core Practices with Git, GitHub, and Data Version Control (DVC) for Effective Collaboration |
AD-027 | Jane Eslinger | Making a Readable PROC COMPARE Report in Excel |
AD-028 | Jane Eslinger | Highlighting the Differences: PROC COMPARE in Excel |
AD-035 | Troy Hughes | Make You Holla' Tikka Masala: Creating User-Defined Informats Using the PROC FORMAT OTHER Option To Call User-Defined FCMP Functions That Facilitate Data Ingestion Data Quality |
AD-047 | Melinda Macdougall | Automated double data entry comparisons |
AD-071 | Josh Horstman | Dating for SAS Programmers |
AD-072 | Josh Horstman | So You Want To Be An Independent Consultant: 2025 Edition |
AD-077 | Ryan Paul Lafler | A Python Roadmap for Accessing & Leveraging Big Environmental Data Repositories in the Cloud |
AD-080 | Josh Horstman et al. | PROC COMPARE: More Than Just the Bottom Line |
AD-085 | Daniel Tuyisenge et al. | Multivariate Ratio Edits Based on Parametric and Nonparametric Tolerance Intervals |
Banking and Finance
Paper No. | Author(s) | Paper Title (click for abstract) |
BF-051 | David Corliss | Analysis of Economic Turbulence and Disruptive Events in Finance |
BF-083 | Rex Pruitt | Follow the Blueprint for Success when Migrating Banking and Finance Customers to SAS Viya |
BF-087 | Rex Pruitt | Valuable Lessons Learned from Migrating Banking and Finance Use Cases to Viya |
Beyond the Basics
Paper No. | Author(s) | Paper Title (click for abstract) |
BB-002 | Kirk Paul Lafler | SAS Macro Programming: The Basics and Beyond |
BB-003 | Kirk Paul Lafler | Under the Hood: The Mechanics of SQL Query Optimization Techniques |
BB-008 | Kirk Paul Lafler | Creating Custom Excel Spreadsheets with Built-in Autofilters Using PROC REPORT and ODS EXCEL |
BB-009 | Louise Hadden | You've Got Options: Ten Five-Star System Option Hacks |
BB-016 | Jayanth Iyengar | Validate the Code, not just the Data: A System for SAS program evaluation |
BB-023 | Jayanth Iyengar | From %let To %local; Methods, Use, And Scope Of Macro Variables In SAS Programming |
BB-024 | Kim Wilson | Taking the Mystery Out of and Debugging PROC HTTP |
BB-029 | Stephen Sloan | Efficiency Techniques in SAS 9 |
BB-038 | Troy Hughes | Undo SAS Fetters with Getters and Setters: Supplanting Macro Variables with More Flexible, Robust PROC FCMP User-Defined Functions That Perform In-Memory Lookup and Initialization Operations |
BB-039 | Troy Hughes | SAS Data-Driven Software Design: How to Develop More Modular, Maintainable, Fixable, Flexible, Configurable, Compatible, Reusable, Readable Software through Independent Control Tables and Other Control Data |
BB-042 | Vijayasarathy Govindarajan | Going from PROC SQL to PROC FedSQL for CAS Processing: Common mistakes to avoid |
BB-043 | Vijayasarathy Govindarajan | Accelerating Your SAS Data Step: Tips and Best Practices for SAS Viya Migration |
BB-044 | Vijayasarathy Govindarajan | SAS Macros and PROC FCMP: A Comparative Inquiry into Reusability and Logic Design |
BB-049 | LeRoy Bessler | Use ODS Excel, ODS PDF, ODS HTML5, ODS LAYOUT |
BB-050 | David Corliss | Cutting Edge Regression Methods: Ridge, LASSO, LOESS, and GAM |
BB-063 | Jim Box | What is Machine Learning, Anyway |
BB-065 | Jim Box | Enhance your Coding Experience with the SAS Extension for VS Code |
BB-066 | Jim Box | How did that Python code get in my SAS program? |
BB-067 | Josh Horstman et al. | From Muggles to Macros: Transfiguring Your SAS Programs with Dynamic, Data-Driven Wizardry |
BB-068 | Josh Horstman et al. | More Muggles, More Macros: Adding Advanced Data-Driven Wizardry to Your SAS Programs |
BB-070 | Josh Horstman | Fifteen Functions to Supercharge Your SAS Code |
BB-073 | Brian Knepple | Enhancing Your PROC REPORT Output: Top Tips |
Pharma and Life Sciences
Posters
Paper No. | Author(s) | Paper Title (click for abstract) |
PO-011 | Louise Hadden | ExCITE-ing! Build Your Paper's Reference Section Programmatically Using Lex Jansen's Website and SAS |
PO-012 | Louise Hadden | The World is Not Enough: Base SAS Visualizations and Geolocations |
PO-036 | Troy Hughes | Who's Bringing That Big Data Energy? A 48-Year Longitudinal Analysis of 30,000 Presentations in the SAS User Community To Elucidate Top Contributors and Rising Stars |
PO-060 | Jimin Lee | Market Making Control Problem with Inventory Risk |
PO-086 | Connor Ayscue | SAS Viya Voyage - Onboarding, Learning, and Thriving in the Cloud |
Visualization and Reporting
Paper No. | Author(s) | Paper Title (click for abstract) |
VR-005 | Kirk Paul Lafler | Dashboards Made Easy Using SAS Software |
VR-010 | Louise Hadden | The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets |
VR-034 | Troy Hughes | From Word Clouds to Phrase Clouds to Amaze Clouds: A Data-Driven Python Programming Solution To Building Configurable Taxonomies That Standardize, Categorize, and Visualize Phrase Frequency |
VR-045 | Melinda Macdougall | Maps, maps, and more maps using SAS PROC SGMAP! |
VR-048 | LeRoy Bessler | Wise Graphic Design & Color Use for Data Graphics Easily, Quickly, Correctly Understood |
VR-084 | Justin Bates | Progressively Building a Dynamic Report |
Abstracts
AI and Emerging Technology
AE-001 : Designing Against Bias: Identifying and Mitigating Bias in Machine Learning and AI
David Corliss, Peace-Work
Bias in machine learning algorithms is one of the most important ethical and operational issues in statistical practice today. This paper describes common sources of bias and how to develop study designs to measure and minimize it. Analysis of disparate impact is used to quantify bias in existing and new applications. New open-source packages such as Fairlearn and the AI Fairness 360 Toolkit quantify bias by automating the measurement of disparate impact on marginalized groups, offering great promise to advance the mitigation of bias. These design strategies are described in detail with examples. Also, a comparison algorithm can be developed that is designed to be fully transparent and without features subject to bias. Comparison to this bias-minimized model can identify areas of bias in other algorithms.
AE-006 : Code Smarter, Not Harder: The 5 C's of ChatGPT for the SASsy Professional
Kirk Paul Lafler, SasNerd
Charu Shankar, SAS Institute
In today's fast-paced analytics world, efficiency is everything. This paper explores how ChatGPT and SAS complement each other across the 5 C's to enhance productivity. Communicate: Automate structured email reports with SAS, or use ChatGPT for dynamic, polished messaging. Learn when to use each for maximum impact. Code: Speed up SAS programming with ChatGPT's syntax suggestions, debugging tips, and code generation, all while maintaining best practices. Collaborate: Use SAS for version control and shared repositories, while ChatGPT streamlines teamwork with summaries, documentation, and code explanations. Customize: Enhance efficiency with SAS macros and fine-tune ChatGPT prompts for tailored reporting and automation. Create: Harness AI-driven insights for problem-solving while ensuring accuracy and compliance in SAS analytics. Featuring hands-on demos and real-world examples, this session will equip you with practical strategies to code smarter, not harder.
AE-007 : Benefits, Challenges, and Opportunities with Open Source Technologies in the 21st Century
Kirk Paul Lafler, SasNerd
Ryan Paul Lafler, Premier Analytics Consulting, LLC
Joshua Cook, University of West Florida (UWF)
Stephen Sloan, Dawson D R
Anna Wade, Emanate Biostats
Organizations around the globe are truly facing a paradigm shift with the type of software, the quantity and availability of software technologies, including open source, and the creative ways the many technologies live, play, and thrive in the same sand box together. We'll explore the many benefits, challenges, and opportunities with open source technologies in the 21st century. We'll also describe the challenges facing user communities as they find ways to integrate open source software and technologies, handle compatibility and vulnerability issues, address security limitations, manage intellectual property and warranty issues, and address inconsistent development practices. Plan to join us for an informative presentation about the benefits, challenges, and opportunities confronting open source user communities around the world, including the application and current state of Python, R, SQL, database systems, cloud computing, software standards, and the collaborative nature of community in the 21st century.
AE-013 : The Dependency Whisperer: AI That Sees What You Might Miss
Ming Yan, Eli Lilly
Xingxing Wu, Eli Lilly
Vina Ro, Eli Lilly
Chih-Chan Lan, University of Southern California
An ongoing challenge in clinical programming is ensuring that downstream analyses are accurately refreshed following updates to SDTM or ADaM datasets. Traditional tools like AstroGrep can locate references to specific datasets or variables, but they lack the ability to distinguish between active code and commented text. Moreover, identifying indirect dependencies often requires multiple searches and manual effort to document all affected areas, which is time-consuming and increases the risk of missing updates potentially leading to incorrect outputs being shared with external parties. This paper introduces an AI-powered tool designed to automatically identify and visualize all datasets and variables impacted by upstream changes. By parsing SDTM, ADaM, and TFL specifications or programs, the tool learns the structure of dependencies across the analysis pipeline. When a user specifies an updated variable and dataset, the tool generates a graphical report highlighting all affected elements. This capability streamlines the refresh process, reduces manual effort, and ensures the accuracy and completeness of deliverables.
AE-041 : Predicting Farm Equipment Purchases and Trades
Koren Roland, Kansas State University
Farmers experience profitability issues every year due to factors such as weather, global politics, and farm management. Timing equipment purchases and trades, especially large equipment like combines, can impact both crop production and depreciation costs. This project aims to analyze historical depreciation and analysis data from the Kansas Farm Management Association to identify patterns and predict optimal timing for large equipment purchases. This research uses SAS Viya 4 to preprocess and clean the data, ensuring its reliability. Clustering techniques are used to segment the farms into categories based on shared features, and predictive analysis is used to determine the factors that impact the purchase of large farm equipment. Data visualization is used to review the segments and prediction results. The findings from this study allow farm managers and farm economists to make more effective business decisions about equipment purchases.
AE-061 : Leveraging Self-supervised Learning to Synthesize Data
Urvi Mehta, University of Michigan
While traditional supervised learning has achieved significant success, it is hindered by two primary limitations: the constant need for labeled data and reliance on manually predefined labels. These constraints can be impractical when dealing with large real-world datasets. This is where self-supervised learning takes center stage, especially when harnessing Generative Adversarial Networks (GANs). Self-supervised learning makes use of the structure and correlations within the data to generate its own labels, empowering GANs to gather insights from unstructured data and effectively diminishing the reliance on manually annotated information. GANs undergo training through two opposing neural networks: a generator that improves its capacity to produce synthetic data that can't be distinguished from actual data, and a discriminator that improves its capacity to tell the difference between the two. As we mark remarkable progress in technology, machine learning, and data-driven decision-making, we face critical challenges, with the scarcity of labeled data being a significant hurdle in building robust machine learning models. This scarcity often stems from privacy concerns, resource limitations, and the labor-intensive process of data labeling. Self-supervised learning, powered by GANs for synthetic data generation, offers a promising direction to address these challenges and advance the field of machine learning.
AE-062 : SAS Trustworthy AI Examples
Caleb Petterson, University of Minnesota Twin Cities
This project delivers a collection of reusable, end-to-end code examples that illustrate key Trustworthy AI (TAI) principles in realistic scenarios. Implemented in both Python and SAS, this repository spans multiple sensitive industries, including healthcare, finance, education, and the public sector. The repository sets out to provide three key components: a clear, end-to-end framework for building TAI systems; techniques for models to handle shifts, anomalies, and intentional manipulation without breaking; and shareable, easy-to-follow notebooks that walk through and explain every step of the TAI process.
AE-064 : Responsible use of AI Systems
Jim Box, SAS Institute
AI is everywhere these days, impacting our lives in some obvious and not-so-obvious ways. It's vital to understand how to use these systems responsibly. We'll look at the ways AI systems work, and learn the important questions to ask about how these systems were trained and how they are used.
AE-076 : Charting Your AI Journey: A Roadmap for Supervised, Unsupervised, and Generative Learning through Machine Learning and Deep Learning
Ryan Paul Lafler, Premier Analytics Consulting, LLC
Miguel Bravo Martinez Del Valle, Premier Analytics LLC
Machine learning (ML) continues to reshape business, technology, science, and research across all industries, with its adoption enabling systems to learn from data, automate decisions, and generate insights. This paper presents a structured roadmap through three core domains of machine learning that are increasingly adopted by organizations: supervised, unsupervised, and generative learning. Along this roadmap, readers will identify key algorithms and architectures within each domain and understand the role of parameters and hyperparameters in mitigating overfitting and underfitting. The discussion includes examples of predictive modeling on labeled data using supervised algorithms, knowledge discovery from unlabeled data using unsupervised algorithms, and the extension of these capabilities through generative learning, which enables systems to extract insights and produce new content or data representations. The paper concludes by introducing three generative architectures that define the state of AI in 2025: encoder models (BERT), decoder models (LLMs), and encoder-decoder models (T5), and describes how each supports advanced AI tasks including representation learning, language generation, natural language processing (NLP), text summarization, and translation.
AE-078 : Enhancing Your SAS Viya Workflows with Python: Integrating Python's Open-Source Libraries with SAS using PROC PYTHON
Ryan Paul Lafler, Premier Analytics Consulting, LLC
Miguel Bravo Martinez Del Valle, Premier Analytics LLC
Data scientists, statistical programmers, machine learning engineers, and researchers are increasingly leveraging a growing number of open-source tools, libraries, and programming languages that can enhance and seamlessly integrate with their existing data workflows. One of these integrations, built into SAS Viya, is its pre-configured Python runtime integration, PROC PYTHON, which gives SAS programmers access to Python's open-source data science libraries for wrangling and modeling structured and unstructured data alongside the validated procedures provided in SAS. This paper demonstrates how to install and import external Python libraries into SAS Viya sessions; generate Python scripts containing methods that can import, process, visualize, and analyze data; and execute those Python methods and scripts using SAS Viya's PYTHON procedure. By integrating the added functionalities of Python's libraries for data processing and modeling with SAS procedures, SAS programmers can enhance their existing data workflows with Python's open-source data solutions.
Analysis & Advanced Analytics
AL-015 : Conducting Survival Analysis in SAS using Medicare Claims as a Real-world data source
Jayanth Iyengar, Data Systems Consultants LLC
Applications of survival analysis as a statistical technique extend to longitudinal studies and other studies in health research. The SAS/STAT package contains multiple procedures for performing survival analysis; the most well-known of these are PROC LIFETEST and PROC PHREG. As a data source, Medicare claims are often used in real-world evidence studies and observational research. In this paper, survival analysis and the SAS procedures for performing it will be explored, and survival analyses will be conducted using Medicare claims data sets to assess patients' prognosis among Medicare beneficiaries.
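For readers new to these procedures, the following is a minimal, illustrative sketch (not code from the paper); the data set WORK.BENE and its variables FOLLOWUP_MONTHS, DEATH (1=event, 0=censored), and AGE_GROUP are hypothetical:

    /* Kaplan-Meier curves and log-rank test by group */
    proc lifetest data=work.bene plots=survival(atrisk);
       time followup_months * death(0);   /* 0 indicates censored observations */
       strata age_group;
    run;

    /* Cox proportional hazards regression for the same event */
    proc phreg data=work.bene;
       class age_group / param=ref;
       model followup_months * death(0) = age_group;
    run;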
AL-031 : Logistic Regression Odyssey on a Small Sample Due to Rare Cancer
Brandy Sinco, University of Michigan
Jessie Dalman, University of Michigan
Tasha Hughes MD, University of Michigan
Christina Angeles MD, University of Michigan
SAS Products: SAS/STAT 9.4; Procs Power, Freq, Logistic, GenMod, BGLIMM. Skill Level: Statistician, bachelor's degree and above. Intended Audience: Statisticians, anyone involved in data analysis. Background: Because Leiomyosarcoma (LMS) is a rare cancer, a recurrence study had a small N. Based on prior medical experience, surgical oncologists hypothesized that recurrence would be higher among patients with the sub-cutaneous LMS subtype than with cutaneous LMS. Because of a dataset with a small N and a prior hypothesis, we expected Bayesian logistic regression to be key to finding a credible interval for recurrence. Methods: First, power was calculated by using SAS Proc Power. As the bio-statistician suspected, the power to detect a 10% difference in recurrence between patients with cutaneous and sub-cutaneous LMS was <80%. The analysis began by looking at individual predictors of recurrence via logistic regression and then by using a multi-variable logistic regression. For variables with small cell sizes x recurrence, the Firth adjustment for computing maximum likelihood estimates was used. Before finalizing the multi-variable logistic regression model, multi-collinearity was evaluated with the variance inflation factor. Next, the results of the classical logistic regression model were confirmed with Bayesian logistic regression. We turned to Bayesian logistic regression because the Bayesian algorithm does not rely on asymptotic statistics from a large sample. The numbers of burn-in and Monte Carlo repetitions were selected to generate Geweke diagnostics indicating similar means in the beginning and end of the Markov chain, and for the proportion of variance due to Monte Carlo simulation to be 2.5%. The thinning parameter was chosen to obtain an auto-correlation time of <3 lags. These diagnostics were to be confirmed with stable trace, auto-correlation, and posterior density plots. Results: The original dataset contained N = 116 LMS patients. Seventy-two patients had cutaneous LMS and 44 patients had sub-cutaneous LMS. In this initial dataset, 3 (4.2%) of the cutaneous patients experienced recurrence, compared to 9 (7.8%) of sub-cutaneous patients, corresponding to p = .081 with the Fisher exact test. From logistic regression, patients with sub-cutaneous extension had an odds ratio (OR) = 3.63 (0.86, 15.35); p = .080 and an adjusted odds ratio (AOR) = 4.49 (0.89, 22.74); p = 0.070 for recurrence, compared to patients with cutaneous tumor only. In the Bayesian analysis, we reported both of the 95% credible intervals, using the equal-tailed and highest posterior density methods. Both credible intervals indicated over 95% probability of tumor sub-type being a key predictor of tumor recurrence. Tumor sub-type was the only predictor that produced credible intervals for the odds ratios with all sides above 1. The equal-tailed 95% credible interval was 5.48 (1.10, 29.74) and the highest posterior density (hpd) credible interval was 5.48 (1.06, 28.20). As the analysis progressed, a surgical oncologist determined that one patient needed to be excluded and another patient had been mis-classified with cutaneous LMS. This final sample had N = 115. While the analyst expected re-running the analyses to be simple and straightforward, there were some surprises. First, the comparisons of recurrence became statistically significant at p<.05. The recurrence rates became 1 (1.4%) among cutaneous patients, compared to 7 (15.6%) among sub-cutaneous patients, p = 0.006 with the Fisher exact test. Second, classical logistic regression produced a p-value < .05, but with a wide confidence interval. Third, Bayesian logistic regression generated 95% credible intervals that contained odds ratios > 1, although the intervals were much wider than before. Conclusion: Both classical logistic regression and Bayesian logistic regression indicated that patients with the sub-cutaneous subtype had higher odds of recurrence, compared to patients with the cutaneous sub-type. In situations where the sample size is small due to a rare disease, both the Firth option in classical logistic regression and Bayesian logistic regression are useful tools to confirm that a variable is an important predictor.
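For orientation only (this is not the authors' code), a hedged sketch of the two approaches named above, assuming a hypothetical data set WORK.LMS with a binary RECURRENCE outcome and a SUBTYPE predictor, might look like this:

    /* Classical logistic regression with the Firth penalized likelihood */
    proc logistic data=work.lms;
       class subtype / param=ref;
       model recurrence(event='1') = subtype / firth clodds=pl;
    run;

    /* Bayesian logistic regression via the BAYES statement in PROC GENMOD */
    proc genmod data=work.lms descending;
       class subtype / param=ref;
       model recurrence = subtype / dist=binomial link=logit;
       bayes seed=27513 nbi=5000 nmc=100000 thinning=5
             diagnostics=(geweke autocorr) plots=all;
    run;

The burn-in (NBI=), Monte Carlo (NMC=), and THINNING= values shown are placeholders; in practice they would be tuned against the Geweke and autocorrelation diagnostics as described in the abstract.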
AL-037 : GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS PROC GINSIDE
Troy Hughes, Data Llama Analytics
The GINSIDE procedure represents the SAS solution for point-in-polygon determination; that is, given some point on earth, does it fall inside or outside of one or more bounded regions? Natural disasters typify geospatial data (the coordinates of a lightning strike, the epicenter of an earthquake, or the jagged boundary of an encroaching wildfire), yet observing nature seldom yields more than latitude and longitude coordinates. Thus, when the United States Forestry Service needs to determine in what zip code a fire is burning, or when the United States Geological Survey (USGS) must ascertain the state, county, and city in which an earthquake was centered, a point-in-polygon analysis is inherently required. It determines within what boundaries (e.g., nation, state, county, federal park, tribal lands) the event occurred, and confers boundary attributes (e.g., boundary name, area, population) to that event. Geographic information systems (GIS) that process raw geospatial data can struggle with this time-consuming yet necessary analytic endeavor: the attribution of points to regions. This text demonstrates the tremendous inefficiency of the GINSIDE procedure, and promotes GeoWaffles as a far faster alternative that comprises a mesh of rectangles draped over polygon boundaries. This facilitates memoization by running point-in-polygon analysis only once, after which the results are saved to a hash object for later reuse. GeoWaffles debuted in the 2013 white paper Winning the War on Terror with Waffles: Maximizing GINSIDE Efficiency for Blue Force Tracking Big Data (Hughes, 2013), and this text represents an in-memory, hash-based refactoring. All examples showcase USGS tremor data as GeoWaffles tastefully blow GINSIDE off the breakfast buffet, processing coordinates more than 25 times faster than the out-of-the-box SAS solution!
AL-052 : Feature Selection and Classification Using Fuzzy Logic
Ross Bettinger, Consultant
We investigate the use of fuzzy logic as applied to feature selection and classification. Fuzzy logic, a generalization of Aristotelian logic, can be useful in situations where there is imprecision or vagueness in the problem domain. Fuzzy logic is applied to transform input data into fuzzy sets that are then suitable for processing by a feature selection algorithm. A fuzzy entropy measure is used to perform classification using a similarity classifier. SAS/IML was used to perform all computations.
AL-056 : PROC BGLIMM: The Smooth Transition to Bayesian Analysis
Danny Modlin, SAS
Many analysts are interested in taking models they currently have and transitioning them to the Bayesian realm. Most leap from their favorite classical analysis procedure directly to PROC MCMC, the general-purpose Bayesian procedure. This presentation will feature the BGLIMM procedure available since SAS/STAT 15.1. This will allow the participant to model non-normal responses and include random effects within their Bayesian approach. Discussion will include options of priors and availability of statements. Examples will include models originally written in PROCs REG, GLM, GLMSELECT, GENMOD, MIXED, and GLIMMIX.
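As a hedged illustration of the transition described above (not code from the presentation), the same random-intercept logistic model might be written in both procedures as follows; the data set WORK.TRIAL, the response Y, the fixed effect TRT, and the clustering variable CENTER are hypothetical:

    /* Frequentist generalized linear mixed model */
    proc glimmix data=work.trial method=laplace;
       class trt center;
       model y(event='1') = trt / dist=binary link=logit;
       random intercept / subject=center;
    run;

    /* The corresponding Bayesian fit with PROC BGLIMM */
    proc bglimm data=work.trial seed=12345 nbi=2500 nmc=20000;
       class trt center;
       model y(event='1') = trt / dist=binary link=logit;
       random intercept / subject=center;
    run;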
AL-057 : How to Modify SAS9 Statistics Programs to Run in SAS Viya
Danny Modlin, SAS
How can existing SAS 9 programs be modified to execute in SAS Viya? Code can either run as is on the SAS Compute Server, or it can be modernized to process data in memory and in parallel on the SAS Cloud Analytic Services (CAS) server. This presentation is perfect for programmers who are new to SAS Viya and want to continue performing their statistical analyses there. We will address questions that are typically asked: Will existing SAS 9 code work in Viya? How must my programs change to take advantage of the new features in Viya?
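To make the "lift versus modernize" idea concrete, here is a minimal sketch (illustrative only, assuming a Viya environment where a CAS server is available and the CASUSER caslib exists):

    /* Start a CAS session and assign librefs for the available caslibs */
    cas mysess;
    caslib _all_ assign;

    /* Load a SAS data set into CAS memory */
    data casuser.heart;
       set sashelp.heart;
    run;

    /* A CAS-enabled procedure runs in parallel on the CAS server */
    proc mdsummary data=casuser.heart;
       var cholesterol;
       output out=casuser.chol_summary;
    run;

    cas mysess terminate;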
AL-058 : Large-Scale Time Series Forecasting in Model Studio
Danny Modlin, SAS
In this workshop, you learn to build time series models for large-scale time series problems with many hierarchically related series. You will experience the capability of Model Studio to diagnose, fit, and assess models for many time series at once. Use the new Hierarchical Modeling Node to create time series models at each of the levels of the hierarchy. Need to extract your reconciled predictions from each level of the hierarchy? No problem. Within the Hierarchical Modeling Node, you can dive into each level of the hierarchy and export these desired predictions.
AL-059 : Kernel Bandwidth Selection for Maximum Mean Discrepancy
Hangcen Zou, Washington University in St. Louis
Distributional shifts between training and testing data can severely affect the performance of machine learning models, making their detection a critical task. The kernel two-sample test based on maximum mean discrepancy (MMD) is a widely adopted approach for this purpose. However, its effectiveness depends heavily on the choice of kernel bandwidth. This project investigates the influence of bandwidth on MMD performance in streaming data contexts and offers practical guidance for efficient bandwidth selection. Through extensive simulations, we assess the robustness and sensitivity of both the standard MMD and a Mahalanobis-aggregated variant under a variety of data-generating conditions. Multiple bandwidth selection strategies are evaluated and compared to inform best practices in real-time detection tasks.
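For readers unfamiliar with the quantities involved, the standard textbook definitions behind this abstract (general background, not results from the project) are

$$k_\sigma(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right), \qquad \mathrm{MMD}^2(P, Q) = \mathbb{E}\,k_\sigma(X, X') - 2\,\mathbb{E}\,k_\sigma(X, Y) + \mathbb{E}\,k_\sigma(Y, Y'),$$

where X, X' are drawn independently from P and Y, Y' from Q. The bandwidth sigma controls the scale at which differences between the two distributions are visible; a common heuristic sets sigma to the median pairwise distance among the pooled samples, which is one of the baselines such bandwidth-selection studies typically compare against.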
AL-079 : Modern Deep Learning Architectures and Transfer Learning for Tabular, Unstructured, Sequential, and Time-Dependent Data
Ryan Paul Lafler, Premier Analytics Consulting, LLC
Deep learning (DL) offers powerful architectures for extracting patterns and representations from diverse data types, including tabular datasets, images, text, audio, and time-series. This presentation provides a high-level survey of modern deep learning architectures developed with Python's Keras API for TensorFlow, including Artificial Neural Networks (ANNs) for structured data, Convolutional Neural Networks (CNNs) for image and spatial data, and sequence modeling architectures such as recurrent neural networks (RNNs), gated recurrent units (GRUs), long short-term memory networks (LSTMs), and transformer-based models. The focus is on how these architectures address different problem domains and data modalities, with minimal coverage of foundational theory. A key part of this discussion highlights transfer learning as a strategy for leveraging pre-trained models and fine-tuning them for new tasks, enabling faster development and improved performance across applications. By combining the right architecture with transfer learning techniques, AI engineers can accelerate solutions for classification, regression, forecasting, and generative tasks.
Anything Data
AD-004 : Data Literacy 101: Understanding Data and the Extraction of Insights
Kirk Paul Lafler, SasNerd
Data is ubiquitous and growing at extraordinary rates, so a solid foundation with data essentials is needed. Topics include the fundamentals of data literacy, how to derive insights from data, and how data can help with decision-making tasks. Attendees learn about the types of data - nominal, ordinal, interval, and ratio; how to assess the quality of data; explore data using visualization techniques; and use selected statistical methods to describe and analyze data.
AD-019 : SAS Macro Using Hash Object for Lookup
Ricky Norman, Self-Employed
This is a presentation of a collection of SAS macros I developed and have been using for over 15 years. It includes an example of using SAS hash object code to look up values. Years ago, I was very excited to see SAS Institute continue to provide innovative solutions and provide an incredibly powerful and versatile alternative to the array! With the availability of several gigabytes of memory, SAS can load up to a billion records into memory for fast and efficient look-up of data, like VLOOKUP in Microsoft Excel but greatly exceeding its one-million-row limit.
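For context, the canonical hash-lookup pattern the abstract alludes to looks roughly like the following hedged sketch (the data sets WORK.RATES, WORK.TRANSACTIONS, and their variables are hypothetical, not taken from the presentation):

    /* Load a lookup table into a hash object once, then find values by key */
    data work.matched;
       if 0 then set work.rates;              /* defines host variables for the hash */
       if _n_ = 1 then do;
          declare hash h(dataset: 'work.rates');
          h.defineKey('state');
          h.defineData('tax_rate');
          h.defineDone();
       end;
       set work.transactions;
       if h.find() = 0 then output;           /* keep rows whose STATE exists in the lookup */
    run;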
AD-020 : SAS Macro Using Hash Objects for Summarization
Ricky Norman, Self-Employed
This is a presentation of the use of SAS hash objects I developed and have been using for over 15 years. Years ago, I was very excited to see SAS Institute continue to provide innovative solutions, providing an incredibly powerful and versatile alternative to the array! SQL provides summary output one query at a time, but with the SAS DATA step and hash objects, the number of summaries that can be output simultaneously is virtually unlimited.
AD-021 : SAS Hash Objects Flattening a Company's Employee Reporting Structure
Ricky Norman, Self-Employed
This is a presentation of SAS Hash Objects I have been using for over 15 years. Years ago, I was very excited to see SAS Institute continue to provide innovative solutions, providing an incredibly powerful and versatile alternative to the array! The purpose of this SAS code is to help with the understanding and visibility of a company's employee reporting structure. I also show a comparison to results provided by SQL code.
AD-026 : Building Better Data Science Workflows: Core Practices with Git, GitHub, and Data Version Control (DVC) for Effective Collaboration
Ryan Paul Lafler, Premier Analytics Consulting, LLC
Miguel Bravo Martinez Del Valle, Premier Analytics LLC
Supercharge your data science workflow with Git, GitHub, and Data Version Control (DVC)! This practical session dives into essential version control tools every data team should master, featuring hands-on tips, real-world examples, and integration strategies to efficiently track changes to both code and data. Discover how Git enables clean branching, purposeful commits, and streamlined collaboration. Push those local commits to GitHub to unlock team-based workflows with pull requests, protected branches, and remote repository management. DVC then extends Git by tracking large datasets and machine learning (ML) models stored in local systems or external servers and cloud storage providers, without bloating your Git repository. From making meaningful and informative commits to safely stashing changes and managing parallel branches, this session delivers actionable tips, tricks, and techniques to help your team version smarter, work in parallel, reduce merge conflicts, and collaborate more effectively across the stack.
AD-027 : Making a Readable PROC COMPARE Report in Excel
Jane Eslinger, Eslinger Enterprises
When using PROC COMPARE to examine two data sets for differences, the default output is a verbose and segmented report optimized for the ODS Listing destination. When sent to Excel, the report, with one variable holding all the information, becomes hard to comprehend. This paper demonstrates how to capture and manipulate key ODS OUTPUT data sets generated by PROC COMPARE, and then how to use those data sets with PROC REPORT to create a cleaner, more readable report in Excel, perfect for review, documentation, or delivery.
AD-028 : Highlighting the Differences: PROC COMPARE in Excel
Jane Eslinger, Eslinger Enterprises
PROC COMPARE is great for examining the differences across two data sets, but the default output doesn't always paint the full picture. Though the printed report is informational, its wordy presentation does not facilitate the identification of patterns in the differences within and across variables. This paper shows how to turn raw comparison data from the OUT= options in PROC COMPARE into a polished spreadsheet that uses color strategically to highlight the details and make variable-level differences easy for the reviewer to spot.
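A hedged sketch of the general pattern (illustrative only; the data sets WORK.BASE and WORK.COMP, the key SUBJECT_ID, and the highlight color are placeholders, not the paper's code):

    /* Capture row-level differences instead of relying on the printed report */
    proc compare base=work.base compare=work.comp out=work.diffs
                 outnoequal outbase outcomp outdif noprint;
       id subject_id;
    run;

    /* Send the captured rows to Excel; _TYPE_ identifies BASE, COMPARE, and DIF rows */
    ods excel file='compare_differences.xlsx' options(autofilter='all');
    proc report data=work.diffs;
       define _type_ / order 'Row Source';
       compute _type_;
          if _type_ = 'DIF' then
             call define(_row_, 'style', 'style={background=lightyellow}');
       endcomp;
    run;
    ods excel close;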
AD-035 : Make You Holla' Tikka Masala: Creating User-Defined Informats Using the PROC FORMAT OTHER Option To Call User-Defined FCMP Functions That Facilitate Data Ingestion Data Quality
Troy Hughes, Data Llama Analytics
You can't just let any old thing inside your tikka masala; you need to carefully curate the ingredients of this savory, salty, sometimes spicy delicacy! Thus, when reviewing a data set that contains potential tikka masala ingredients, an initial data quality evaluation should differentiate approved from unapproved ingredients. Cumin, yes please; chicken, the more meat the merrier; coriander, of course; turmeric, naturally; yeast, are you out of your naan-loving mind?! Too often, SAS practitioners first ingest a data set in one DATA step, and rely on subsequent DATA steps to clean, standardize, and format those data. This text demonstrates how user-defined informats can be designed to ingest, validate, clean, and standardize data in a single DATA step. Moreover, it demonstrates how the FORMAT procedure can leverage the OTHER option to create a user-defined informat that calls a user-defined FCMP function to perform complex data evaluation and transformation. Control what you put inside your tikka masala with this straightforward solution that epitomizes data-driven software design while leveraging the flexibility of the FORMAT and FCMP procedures!
AD-047 : Automated double data entry comparisons
Melinda Macdougall, Cincinnati Children's Hospital Medical Center
Clinical research studies largely collect data electronically these days. However, there are still many cases where data is originally collected on paper. This data is then entered into an electronic source to be available for analysis. The data should be entered by two different people and then compared for data quality control. PROC COMPARE is great for initial comparisons. However, its options are limited, and it can be difficult to create easy-to-read reports. In this presentation, I will show examples of how to create custom, automated reports for data entry comparisons in SAS.
AD-071 : Dating for SAS Programmers
Josh Horstman, PharmaStat LLC
Every SAS programmer needs to know how to get a date... no, not that kind of date. This paper will cover the fundamentals of working with SAS date values, time values, and date/time values. Topics will include constructing date and time values from their individual pieces, extracting their constituent elements, and converting between various types of dates. We'll also explore the extensive library of built-in SAS functions, formats, and informats for working with dates and times using in-depth examples. Finally, you'll learn how to answer that age-old question... when is Easter next year?
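A quick, hedged taste of the building blocks described above (the specific values are illustrative only):

    data _null_;
       /* Build a date value from its pieces, and take it apart again */
       d = mdy(10, 15, 2025);
       put d= date9.;                          /* formatted date */
       put d=;                                 /* raw value: days since 01JAN1960 */
       y = year(d);  m = month(d);  dy = day(d);
       put y= m= dy=;

       /* Convert between date and datetime values */
       dt = dhms(d, 14, 30, 0);                /* 2:30 PM on that date */
       put dt= datetime20.;
       d2 = datepart(dt);
       put d2= date9.;

       /* Interval arithmetic, and the age-old Easter question */
       next_month = intnx('month', d, 1, 'beginning');
       easter_next_year = holiday('easter', year(d) + 1);
       put next_month= date9. easter_next_year= date9.;
    run;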
AD-072 : So You Want To Be An Independent Consultant: 2025 Edition
Josh Horstman, PharmaStat LLC
While many statisticians and programmers are content with a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. While this can be a tremendous benefit, there are many details to consider. This paper will provide an overview of consulting as a statistician or programmer. We'll discuss the advantages and disadvantages of consulting, getting started, finding work, operating your business, and various legal, financial, and logistical issues. The paper has been recently updated to reflect the new realities of independent consulting in 2025 and beyond.
AD-077 : A Python Roadmap for Accessing & Leveraging Big Environmental Data Repositories in the Cloud
Ryan Paul Lafler, Premier Analytics Consulting, LLC
The democratization, growth, and widespread adoption of the Python programming language and its open-source libraries are empowering professionals, researchers, educators, and students to work directly with large environmental data repositories hosted in the cloud. Paired with Open Data initiatives supported by major cloud providers like Amazon, Microsoft, and Google, this shift is making environmental data more accessible than ever, with increasingly useful applications across industries. This demo highlights how environmental data can support a wide range of industries and introduces the main file types and data structures commonly used in these datasets. It walks through how any user can initialize a Python connection with a cloud object storage provider, build a data retrieval pipeline, and lazily load terabytes of environmental data directly from Amazon S3, Google Cloud Storage (GCS), and Microsoft Azure into a Python session using libraries including Dask, fsspec, and Xarray.
AD-080 : PROC COMPARE: More Than Just the Bottom Line
Josh Horstman, PharmaStat LLC
John LaBore, SAS Institute
Users of PROC COMPARE are usually satisfied with finding the following NOTE at the end of their output: "NOTE: No unequal values were found. All values compared are exactly equal." However, this bottom line needs to be fully understood. In four specific cases, users may not realize the conditions/assumptions being made by PROC COMPARE, which could possibly lead to discrepancies turning up. This presentation will describe and provide example code/log/output that highlights each of these four cases. In these cases, adding only one word to the PROC COMPARE statement can help ensure that users safely navigate through cases involving these conditions/assumptions.
AD-085 : Multivariate Ratio Edits Based on Parametric and Nonparametric Tolerance Intervals
Daniel Tuyisenge, University of Kentucky
Derek Young, University of Kentucky
Ensuring data quality is essential for producing accurate and reliable federal statistics. A common approach is to use ratio edits, which are widely employed in economic and establishment surveys to identify records with implausible relationships between variables. In multivariate settings, traditional methods often rely on Mahalanobis distance; however, this approach can perform poorly in high dimensions and does not provide interpretable bounds for each variable. To address these limitations, this work presents a framework for multivariate ratio edits based on parametric and nonparametric tolerance intervals (TIs), which ensure a specified proportion of inliers with a given confidence level. Parametric methods construct rectangular and simultaneous multivariate TIs under the multivariate normal model, using trimming to reduce the influence of outliers. In contrast, nonparametric methods employ Tukey depth and Statistically Equivalent Blocks to obtain distribution-free tolerance regions. Monte Carlo simulations are conducted to evaluate Type I/II error rates and volume efficiency across contamination regimes, dimensions, and TI parameters. Results indicate that rectangular central TIs excel under mild contamination, while simultaneous central TIs improve robustness under heavier contamination. Overall, these TI-based edits are interpretable, reproducible, computationally efficient, and readily integrated into federal data processing pipelines, thereby enhancing quality assurance.
Banking and Finance
BF-051 : Analysis of Economic Turbulence and Disruptive Events in Finance
David Corliss, Peace-Work
2025 has been a year of considerable economic disruption, resulting in widespread financial uncertainty. This paper presents methods for analyzing these events. Drawing on the mathematics of Chaos Theory, the paper describes the SAS procedures and options suited to predicting the results of rare, extreme events. Methods include Unobserved Components models, logistic maps, and data visualizations to describe, understand, and predict, to the degree possible, the outcomes of disruptive events in economics and finance.
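For orientation, unobserved components models are typically fit in SAS with PROC UCM (SAS/ETS); a minimal hedged sketch, using a hypothetical monthly series WORK.MARKET with variables DATE and RETURN_INDEX, might look like this:

    /* Unobserved components model: level + slope + irregular, with a 12-month forecast */
    proc ucm data=work.market;
       id date interval=month;
       model return_index;
       level;
       slope;
       irregular;
       estimate;
       forecast lead=12;
    run;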
BF-083 : Follow the Blueprint for Success when Migrating Banking and Finance Customers to SAS Viya
Rex Pruitt, SAS
Migrating financial services organizations from legacy SAS 9.4 environments or open-source platforms like Python/R to SAS Viya requires a strategic, well-orchestrated approach. This session presents a proven blueprint for success, highlighting key considerations, tools, and best practices to ensure a smooth transition. Attendees will learn how to assess and migrate SAS Enterprise Guide projects into SAS Studio Flows, adapt open-source scripts for Viya compatibility, and leverage SAS tools such as Content Assessment and Enterprise Session Monitor to streamline migration efforts. The presentation outlines a step-by-step checklist for both SAS and open-source migrations, emphasizing environment preparation, code adaptation, validation, and operationalization. Additionally, it explores common pitfalls, such as misaligned usage education and inadequate partner support, and how to avoid them. With a focus on enabling full-scope capability adoption, this session equips pre-sales and technical teams with the insights needed to deliver value quickly and confidently in Viya. Whether you're migrating EG projects or SageMaker pipelines, this blueprint ensures your path to success is clear, efficient, and scalable.
BF-087 : Valuable Lessons Learned from Migrating Banking and Finance Use Cases to Viya
Rex Pruitt, SAS
Migrating banking and finance analytics workloads to SAS Viya introduces a range of technical and operational challenges for SAS Administrators and SAS Users. This paper highlights key lessons learned from real-world migrations, emphasizing the importance of early asset cleanup, infrastructure sizing, and security alignment for administrators. For users, challenges such as legacy code incompatibility, workflow confusion, and training gaps underscore the need for clear documentation, parallel testing, and role-based enablement. By addressing these pain points proactively, organizations can streamline their transition to Viya and unlock its full potential for scalable, modern analytics
Beyond the Basics
BB-002 : SAS Macro Programming: The Basics and Beyond
Kirk Paul Lafler, SasNerd
The SAS Macro Language is a powerful feature for extending the capabilities of the SAS System. This paper highlights a collection of techniques for constructing reusable and effective macro tools. Attendees are introduced to the techniques associated with building functional macros that process statements containing SAS code; design reusable macro techniques; create macros containing keyword and positional parameters; utilize defensive programming tactics and techniques; build a library of macro utilities; interface the macro language with the SQL procedure; and develop efficient and portable macro language code.
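As a small hedged illustration of keyword and positional parameters (the macro name and defaults are hypothetical, not from the paper):

    /* A reusable macro with one positional and two keyword parameters */
    %macro freqreport(dsn, var=, out=work.freq_out);
       proc freq data=&dsn;
          tables &var / missing out=&out;
       run;
    %mend freqreport;

    /* Positional argument first, keyword arguments by name */
    %freqreport(sashelp.class, var=sex)
    %freqreport(sashelp.cars, var=origin, out=work.origin_freq)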
BB-003 : Under the Hood: The Mechanics of SQL Query Optimization Techniques
Kirk Paul Lafler, SasNerd
The SAS software and SQL procedure provide powerful features and options for users to gain a better understanding of what's taking place during query processing. This presentation explores the fully supported SAS MSGLEVEL=I system option and PROC SQL _METHOD option to display valuable informational messages on the SAS log about the SQL optimizer's execution plan as it relates to processing SQL queries, along with an assortment of query optimization techniques.
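A minimal hedged sketch of the two options named above (the join itself is just a placeholder example):

    /* Ask SAS to report index usage and the SQL optimizer's execution plan */
    options msglevel=i;

    proc sql _method;
       create table work.joined as
       select a.name, a.age, b.height
       from sashelp.class as a
            inner join sashelp.classfit as b
            on a.name = b.name;
    quit;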
BB-008 : Creating Custom Excel Spreadsheets with Built-in Autofilters Using PROC REPORT and ODS EXCEL
Kirk Paul Lafler, SasNerd
Spreadsheets have become the most popular and successful data tool ever conceived. Current estimates show that there are more than 750 million Excel users worldwide. A spreadsheet's simplicity and ease of use are two reasons for the growth and widespread use of Excel around the globe. Additional value-added features have also helped to expand the spreadsheet usefulness among a growing number of users including its collaborative capabilities, being customizable, ability to manipulate data, application of data visualization techniques, mobile device usage, automation of repetitive tasks, integration with other software, data analysis, and filtering capabilities using autofilters. This last value-added feature, filtering with autofilters, is the theme for this paper. An example application will be illustrated that creates a custom Excel spreadsheet with built-in autofilters, or filters that provide users with the ability to make choices from a list of text, numeric, or date values to find data of interest quickly, using the SAS Output Delivery System (ODS) Excel destination and the REPORT procedure.
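To show the shape of such an application (a hedged sketch, not the paper's example; file name and columns are placeholders), the AUTOFILTER= suboption of the ODS EXCEL destination adds the dropdown filters to the report columns:

    /* One worksheet with autofilter dropdowns on every report column */
    ods excel file='cars_report.xlsx'
        options(sheet_name='Cars' autofilter='all' frozen_headers='on');
    proc report data=sashelp.cars;
       column make model type msrp;
       define make  / display 'Make';
       define model / display 'Model';
       define type  / display 'Type';
       define msrp  / analysis format=dollar10. 'MSRP';
    run;
    ods excel close;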
BB-009 : You've Got Options: Ten Five-Star System Option Hacks
Louise Hadden, Cormac Corporation
SAS provides myriad opportunities for customizing programs and processes, including a wide variety of system options that can control and enhance SAS code from start to finish. This paper and presentation demonstrate methods of obtaining information on SAS system options and move on to fully explicate ten SAS system option hacks, from COMPRESS to VALIDVARNAME. System options are highly dependent on platforms, security concerns, SAS versions, and products: dependencies and defaults will be discussed. SAS practitioners will gain a deeper understanding of the powerful SAS system options they have seen, used, and automatically included in their code. This presentation is suitable for all skill and experience levels; platform and implementation differences are part of the discussion.
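One common way to obtain that information before changing anything, shown as a brief hedged sketch:

    /* Where is COMPRESS set, what does it do, and what is its current value? */
    proc options option=compress define value;
    run;

    /* Then adjust options for the session */
    options compress=yes validvarname=v7;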
BB-016 : Validate the Code, not just the Data : A System for SAS program evaluation
Jayanth Iyengar, Data Systems Consultants LLC
Regardless of the industry they work in, SAS programmers are focused on validating data, and devote a considerable amount of attention to the quality of data, whether it's raw source data, submitted SAS data sets, or SAS output, including figures and listings. No less important is the validity of the code and the SAS programs which extract, manipulate, and analyze data. Although code validity can be assessed through the SAS log, there are other ways to produce metrics on code validity. This paper introduces a system for SAS program validation which produces useful information on lines of code, number of DATA steps, total run and CPU time, and other metrics for project-related SAS programs.
BB-023 : From %let To %local; Methods, Use, And Scope Of Macro Variables In SAS Programming
Jayanth Iyengar, Data Systems Consultants LLC
Macro variables are one of the powerful capabilities of the SAS system. Utilizing them makes your SAS code more dynamic. There are multiple ways to define and reference macro variables in your SAS code, from %LET and CALL SYMPUT to PROC SQL INTO. There are also several kinds of macro variables, differing in scope and in other ways. Not every SAS programmer is knowledgeable about the nuances of macro variables. In this paper, I explore the methods for defining and using macro variables. I also discuss the nuances of macro variable scope, and the kinds of macro variables from user-defined to automatic.
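A hedged sketch of the three creation methods named above, plus %LOCAL scope (the variable names are illustrative only):

    /* 1. %LET creates a macro variable directly */
    %let cutoff = 2025;

    /* 2. CALL SYMPUTX creates one from DATA step values */
    data _null_;
       set sashelp.class end=last;
       if last then call symputx('nstudents', _n_);
    run;

    /* 3. PROC SQL INTO creates one (or a list) from query results */
    proc sql noprint;
       select mean(age) format=5.2 into :avg_age trimmed
       from sashelp.class;
    quit;

    /* Scope: %LOCAL keeps a macro's working variables out of the global symbol table */
    %macro demo;
       %local i;
       %do i = 1 %to 3;
          %put Iteration &i: &nstudents students, average age &avg_age, cutoff &cutoff;
       %end;
    %mend demo;
    %demo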
BB-024 : Taking the Mystery Out of and Debugging PROC HTTP
Kim Wilson, SAS
Several great papers have been written about how to get started with PROC HTTP, which includes accessing Microsoft 365 applications, modifying various options for desired results, and more. As a SAS Technical Support Engineer, I often assist SAS customers who are not receiving the expected resource, or they are seeing a return code that is not a 200 OK. This paper describes common errors that you might encounter regarding certificates, authentication, and general errors, as well as overall debugging techniques and suggestions. This paper also helps you gather pertinent information that SAS Technical Support will need when helping to solve the problems occurring with or around PROC HTTP. This is applicable for SAS 9.4 and above.
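For readers who have not yet used the procedure, a minimal hedged starting point (the URL is a placeholder) combines the DEBUG statement with the automatic status macro variables:

    filename resp temp;

    proc http url="https://example.com/api/resource"   /* placeholder URL */
              method="GET"
              out=resp;
       debug level=1;    /* writes request and response headers to the SAS log */
    run;

    /* PROC HTTP populates these automatic macro variables after each call */
    %put Status code:   &=SYS_PROCHTTP_STATUS_CODE;
    %put Status phrase: &=SYS_PROCHTTP_STATUS_PHRASE;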
BB-029 : Efficiency Techniques in SAS 9
Stephen Sloan, Dawson D R
Using space and time efficiently has always been important to organizations and to programmers in general, and to SAS programmers in particular. We want to be able to use our available space without having to obtain new servers or other hardware resources, and without having to delete variables or observations to make the SAS data sets fit into the available space. We also want our jobs to run more quickly, both to reduce waiting times and to ensure that scheduled job streams finish on time and that successor jobs are not unnecessarily delayed. Internal mainframe billing algorithms have always rewarded efficiency. As we move toward cloud computing, efficiency will become even more important because the billing algorithms in cloud environments charge for every byte and every CPU second, putting an additional financial premium on efficiency. Sometimes we are in a hurry to get our jobs done on time, so we don't initially pay attention to efficiency; sometimes we don't know at the start of a project how much time and space our jobs will use (and the important time is the time allocated to our assignment); and sometimes we're asked to go into existing jobs and make changes that are seemingly incremental but can cause large increases in the amount of space and/or time that is required. Finally, there can be jobs that have been running for a long time and we take the attitude "if it ain't broke, don't fix it" because we don't want to cause the programs to stop working, especially if they're not well-documented. With a reasonably good knowledge of Base SAS, there are things we can do that can help our organizations optimize the use of space and time and run more quickly without causing any loss of observations or variables and without changing the results of the programs.
BB-038 : Undo SAS Fetters with Getters and Setters: Supplanting Macro Variables with More Flexible, Robust PROC FCMP User-Defined Functions That Perform In-Memory Lookup and Initialization Operations
Troy Hughes, Data Llama Analytics
Getters and setters are common in some object-oriented programming (OOP) languages such as C++ and Java, where "getter" functions retrieve values and "setter" functions initialize (or modify) variables. In Java, for example, getters and setters are constructed as conduits to private class members, and facilitate data encapsulation by restricting variable access. Conversely, the SAS language lacks classes, so SAS global macro variables are typically utilized to maintain and access data across multiple DATA steps and procedures. Unlike an OOP program that can categorize variables across multiple user-defined classes, however, SAS maintains only one global symbol table in which global macro variables can be maintained. Additionally, maintaining and accessing macro variables can be difficult when quotation marks, ampersands, percentage signs, and other special characters exist in the data. This text introduces user-defined getter functions and setter subroutines designed using the FCMP procedure, which enable data lookup and initialization operations to be performed within DATA steps. Among other benefits, user-defined getters and setters can facilitate the evaluation of complex Boolean logic expressions that leverage data stored across multiple data sets, all concisely performed in a single SAS statement! Getters and setters are thoroughly demonstrated in the author's text, PROC FCMP User-Defined Functions: An Introduction to the SAS Function Compiler (Hughes, 2024).
BB-039 : SAS Data-Driven Software Design: How to Develop More Modular, Maintainable, Fixable, Flexible, Configurable, Compatible, Reusable, Readable Software through Independent Control Tables and Other Control Data
Troy Hughes, Data Llama Analytics
Data-driven design describes software design in which the control logic, program flow, business rules, data models, data mappings, and other dynamic and configurable elements are abstracted to control data that are interpreted by (rather than contained within) code. Thus, data-driven design leverages parameterization and external data structures (including configuration files, control tables, decision tables, data dictionaries, business rules repositories, and other control files) to produce dynamic software functionality. This hands-on workshop introduces real-world scenarios in which the flexibility, configurability, reusability, and maintainability of SAS software are improved through data-driven design methods, as introduced in the author's 2022 book: SAS Data-Driven Development: From Abstract Design to Dynamic Functionality, Second Edition. Scenarios install the attendee as the newest intern at Scranton, Pennsylvania's most infamous paper supply company, tasked with converting legacy, hardcoded SAS programs into flexible, configurable data-driven design. Come help Jim, Dwight, Stanley, and Phyllis sell more paper--and learn data-driven best practices in the process!
BB-042 : Going from PROC SQL to PROC FedSQL for CAS Processing: Common mistakes to avoid
Vijayasarathy Govindarajan, SAS Institute
SAS 9 customers are increasingly looking at moving to SAS Viya to harness the power of the new distributed, in-memory Cloud Analytic Services (CAS) engine. This often helps to speed up existing processes many times over and run analytics on huge datasets faster. One of the key areas of this migration involves updating SAS 9 PROC SQL code to take advantage of the processing capabilities of CAS. This is made possible by a new(er) procedure in the SAS arsenal: PROC FedSQL. There are many differences between PROC SQL and PROC FedSQL for CAS, from supported data types and available functions to applying formats, quoting strings, and referencing macro variables. In my experience, users new to SAS Viya often make mistakes while migrating code to FedSQL which arise from a few basic misconceptions. This paper aims to clarify the key differences between PROC SQL and PROC FedSQL for CAS. It will also highlight common mistakes when adapting SQL code for CAS, offering guidance on how to avoid them. The goal is to help users leverage the power of CAS effectively without getting bogged down by a lengthy process of fixing small, easily preventable errors when converting code to FedSQL.
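As a point of reference only (not the paper's examples), a minimal FedSQL-for-CAS query follows the pattern below; the session name and table are placeholders:

    cas mysess;
    caslib _all_ assign;

    data casuser.cars;
       set sashelp.cars;
    run;

    /* FedSQL for CAS: note SESSREF= on the PROC statement; also remember that,
       per the ANSI standard, double quotes delimit identifiers and single quotes
       delimit character literals */
    proc fedsql sessref=mysess;
       create table casuser.expensive as
       select make, model, msrp
       from casuser.cars
       where msrp > 50000;
    quit;

    cas mysess terminate;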
BB-043 : Accelerating Your SAS Data Step: Tips and Best Practices for SAS Viya Migration
Vijayasarathy Govindarajan, SAS Institute
The SAS Viya platform provides cloud-enabled, in-memory, and parallel processing capabilities for advanced analytics. While a "lift and shift" approach allows SAS 9 Data Step code to run in SAS Viya, realizing significant performance improvements requires targeted code adjustments. For a successful migration, SAS programmers need to understand the SAS Viya architecture, its similarities and differences with SAS 9, and effective conversion guidelines. This presentation offers practical tips for migrating Data Step code, including optimizing in-memory processing, handling BY-group processing, addressing function and format nuances, incorporating open-source language support, and utilizing performance options. These insights aim to help programmers leverage the full potential of SAS Viya while avoiding common migration challenges. This presentation hopes to equip attendees with actionable insights to transition their SAS 9 Data Step code and ensure they harness Viya's advanced capabilities to achieve optimal performance outcomes.
BB-044 : SAS Macros and PROC FCMP: A Comparative Inquiry into Reusability and Logic Design
Vijayasarathy Govindarajan, SAS Institute
SAS provides two powerful tools for enhancing code reuse and maintainability: the macro language (%MACRO) for code generation, and PROC FCMP for encapsulating reusable logic. While they may appear to offer overlapping functionality at first glance, they differ fundamentally in both purpose and execution timing. This session presents a comparative analysis of %MACRO and PROC FCMP, focusing on their distinct roles in modular program design. A clear understanding of these differences helps avoid the common misuse of macros for logic that is more appropriately handled by FCMP functions. The session will review typical approaches to creating and using macros, including function-style macros, and demonstrate how PROC FCMP functions and subroutines are defined and executed. Examples will illustrate differences in execution timing and clarify misconceptions about functional overlap. The presentation aims to provide a solid understanding of when and how to use SAS Macros and PROC FCMP, leading to cleaner, more modular, and maintainable SAS programs.
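A brief hedged sketch of the contrast described above (the conversion function is a made-up example, not from the session): the macro resolves to text before the DATA step compiles, while the FCMP function executes at run time like any built-in function.

    /* Function-style macro: expands to an expression in the generated code */
    %macro f_to_c(f);
       (((&f) - 32) * 5 / 9)
    %mend f_to_c;

    /* FCMP function: compiled once, callable wherever DATA step functions are allowed */
    proc fcmp outlib=work.funcs.temperature;
       function f_to_c_fn(f);
          return ((f - 32) * 5 / 9);
       endsub;
    run;

    options cmplib=work.funcs;

    data _null_;
       temp_f = 98.6;
       c1 = %f_to_c(temp_f);    /* macro text substitution */
       c2 = f_to_c_fn(temp_f);  /* true function call at run time */
       put c1= c2=;
    run;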
BB-049 : Use ODS Excel, ODS PDF, ODS HTML5, ODS LAYOUT
LeRoy Bessler, Bessler Consulting and Research
The three most frequently used ODS "destinations" are ODS Excel, ODS PDF, and ODS HTML5 (the current successor to, and better than, ODS HTML). The fourth most frequently used should be, or could be, ODS LAYOUT (within ODS HTML5 or ODS PDF). With it, Anything Anywhere All At Once is possible for composites of tables, graphs, and text. Get acquainted with and get started with any destination you don't already know, or learn some tips to get more value from what you do already know. This is a tour and an introduction.
BB-050 : Cutting Edge Regression Methods: Ridge, LASSO, LOESS, and GAM
David Corliss, Peace-Work
This paper presents a brief introduction to recent advances in regression methods. Techniques demonstrated include ridge regression, LASSO, local polynomial regression (LOESS), and generalized additive models (GAM). Each method is presented separately, with a description of the SAS procedure used to implement it and recommendations for applying the method in practical situations. A quick introduction to each method is followed by two worked examples, with discussion of use cases, options for the SAS procedures, and producing graphical output.
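For orientation, these are the procedures typically used for each technique, shown as a hedged sketch on SASHELP.CARS (illustrative only; variable choices and tuning values are placeholders, not the paper's examples):

    /* Ridge regression: RIDGE= on the PROC REG statement writes estimates to OUTEST= */
    proc reg data=sashelp.cars outest=work.ridge_est ridge=0 to 0.1 by 0.01;
       model msrp = horsepower weight enginesize;
    run;

    /* LASSO via PROC GLMSELECT */
    proc glmselect data=sashelp.cars;
       model msrp = horsepower weight enginesize mpg_city mpg_highway
             / selection=lasso(choose=cv);
    run;

    /* Local polynomial regression (LOESS) */
    proc loess data=sashelp.cars;
       model msrp = horsepower;
    run;

    /* Generalized additive model with penalized likelihood */
    proc gampl data=sashelp.cars;
       model msrp = spline(horsepower) spline(weight);
    run;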
BB-063 : What is Machine Learning, Anyway
Jim Box, SAS Institute
Machine Learning models are the backbones of Artificial Intelligence systems, and as such are being talked about everywhere, but do you know how they work? In this session, we'll look at the process of using machine learning models, cover some of the terminology, and discuss how they differ from the standard statistical models you might be more familiar with.
BB-065 : Enhance your Coding Experience with the SAS Extension for VS Code
Jim Box, SAS Institute
Visual Studio Code (VS Code) from Microsoft is an open-source code editor that is very popular among developers for its ease of use across many programming languages, driven by a robust extension ecosystem. The SAS VS Code extension is an open-source, freely available add-on that allows you to use VS Code to connect to any modern SAS environment, from SAS 9.4 on your local machine to SAS Viya in the cloud. The key features include Syntax Highlighting, Code Completion, Syntax Help, a Data Viewer, and my favorite, SAS Notebooks, which offer an exciting way to share content and comments. We'll look at the extension, how to use it, and explore how you can get involved in shaping how the product evolves.
BB-066 : How did that Python code get in my SAS program?
Jim Box, SAS Institute
Did you know that you can execute Python code inside a SAS Program? With the SAS Viya Platform, you can call PROC PYTHON and pass variables and datasets easily between a Python call and a SAS program. In this paper, we will look at ways to integrate Python in your SAS Programs.
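A minimal sketch of the round trip (assuming a SAS Viya release where PROC PYTHON and its SAS callback object are available) looks something like this:

proc python;
submit;
import pandas as pd

# Bring a SAS data set into a pandas DataFrame
df = SAS.sd2df('sashelp.class')

# Work on it with ordinary Python/pandas code
df['bmi'] = 703 * df['Weight'] / df['Height']**2

# Send the result back to the SAS session as WORK.CLASS_BMI
SAS.df2sd(df, 'work.class_bmi')
endsubmit;
run;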
BB-067 : From Muggles to Macros: Transfiguring Your SAS Programs with Dynamic, Data-Driven Wizardry
Josh Horstman, PharmaStat LLC
Richann Watson, DataRich Consulting
The SAS macro facility is an amazing tool for creating dynamic, flexible, reusable programs that can automatically adapt to change. This presentation uses a series of examples to demonstrate how to transform static "muggle" code full of hardcodes and data dependencies by adding macro language magic to create data-driven programming logic. Don't hardcode data values into your programs. Cast a vanishing spell on data dependencies and let the macro facility write your SAS code for you!
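One common flavor of this data-driven style (an illustration, not necessarily the authors' example) is to let PROC SQL read values out of the data and hand them to the macro facility instead of hardcoding them:

/* Hardcoded ("muggle") version:
   where origin in ('Asia', 'Europe', 'USA');          */

/* Data-driven version: build the list from the data itself */
proc sql noprint;
   select distinct quote(strip(origin))
      into :origin_list separated by ', '
      from sashelp.cars;
quit;

data subset;
   set sashelp.cars;
   where origin in (&origin_list);
run;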
BB-068 : More Muggles, More Macros: Adding Advanced Data-Driven Wizardry to Your SAS Programs
Josh Horstman, PharmaStat LLC
Richann Watson, DataRich Consulting
In their popular 2024 presentation "From Muggles to Macros", Horstman and Watson delivered a spell book full of macro magic to enhance SAS programs with data-driven wizardry. This exciting sequel to that enchanting performance adds to the list of incantations for creating dynamic, flexible, reusable programs that can automatically adapt to change. New charms include the use of control tables, the CALL EXECUTE routine, and of course, more macro language techniques. Don't hardcode data values into your programs. Cast a vanishing spell on data dependencies and let the macro facility write your SAS code for you!
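As an illustration of the control-table idea (with a hypothetical control data set and a hypothetical reporting macro, not the authors' own incantations), CALL EXECUTE lets the data generate one macro call per row:

/* A control table: one row per report to produce */
data control;
   input dsname :$32. byvar :$32.;
   datalines;
sashelp.class sex
sashelp.cars origin
;
run;

%macro freq_report(dsname=, byvar=);
   proc freq data=&dsname;
      tables &byvar;
   run;
%mend freq_report;

/* Generate one macro call per control-table row */
data _null_;
   set control;
   call execute(cats('%nrstr(%freq_report)(dsname=', dsname,
                     ', byvar=', byvar, ')'));
run;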
BB-070 : Fifteen Functions to Supercharge Your SAS Code
Josh Horstman, PharmaStat LLC
The number of functions included in SAS software has exploded in recent versions, but many of the most amazing and useful functions remain relatively unknown. This paper will discuss such functions and provide examples of their use. Both new and experienced SAS programmers should find something new to add to their toolboxes.
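The abstract does not name the fifteen, but a few examples of the kind of lesser-known functions the paper has in mind (illustrative picks, not necessarily the author's list) include:

data demo;
   set sashelp.class;
   length fullname $60 clean_name $60 grade $13;
   fullname   = catx(', ', name, put(age, 2.));       /* delimited concatenation   */
   third_best = largest(3, height, weight, age);      /* k-th largest of a list    */
   pos        = whichc(sex, 'M', 'F');                /* position in a value list  */
   clean_name = prxchange('s/\s+/ /', -1, name);      /* regex search-and-replace  */
   grade      = ifc(age >= 14, 'High School', 'Middle School');  /* inline if/else */
run;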
BB-073 : Enhancing Your PROC REPORT Output: Top Tips
Brian Knepple, J & J MedTech
To get the best results with SAS PROC REPORT, start by choosing the variables you want to display and decide how to organize your data through grouping or sorting. Use DEFINE statements to assign labels, formats, and statistics for columns. Apply BREAK and RBREAK statements to add summaries and totals for better clarity. Use COMPUTE blocks to perform custom calculations or to highlight specific data points. Additionally, the STYLE= option can help you enhance the appearance of your report. By combining these features, you can create clear, visually appealing reports that effectively communicate your data.
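A compact sketch pulling these statements together on a SASHELP table (an illustrative example, not the author's) might be:

proc report data=sashelp.cars nowd;
   column type msrp;
   define type / group 'Vehicle Type';
   define msrp / analysis mean format=dollar10. 'Average MSRP';
   rbreak after / summarize style=[font_weight=bold];
   compute msrp;
      /* highlight vehicle types with a high average MSRP */
      if msrp.mean > 40000 then
         call define(_col_, 'style', 'style=[background=cxFFFFCC]');
   endcomp;
run;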
BB-074 : Leveraging SQL and SAS for Analysis-Ready Datasets
Shavonne Standifer, Statistical Business Analyst
Data professionals often use a combination of various technologies. Effective data management is essential for ensuring high-quality analysis and decision-making. SQL is a powerful language for querying and manipulating relational databases, while SAS offers a suite of advanced tools for data preparation, statistical analysis, and reporting. By integrating these technologies, organizations can improve data management, enhance data integrity, and foster collaboration across teams. This paper provides a general guide to utilizing SQL and SAS programming to efficiently create, manage, and maintain analysis-ready datasets. Through step-by-step instructions and real-world examples, readers will gain the skills needed to harness the power of structured query language (SQL) and SAS for streamlining data processes within their organizations.
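A small illustration of the idea (with hypothetical source tables, not the paper's own examples) lets PROC SQL do the joining and filtering and then hands the result to SAS procedures:

proc sql;
   create table work.analysis_ready as
   select d.patient_id,
          d.age,
          d.sex,
          v.visit_date,
          v.systolic_bp
   from work.demographics as d
        inner join work.visits as v
        on d.patient_id = v.patient_id
   where v.visit_date >= '01JAN2024'd;
quit;

proc means data=work.analysis_ready mean std maxdec=1;
   class sex;
   var systolic_bp;
run;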
BB-075 : Automate in a Dash with SAS Time-Saving Techniques for Building Quality Improvement Dashboards
Shavonne Standifer, Statistical Business Analyst
Building programs that leverage the analytic and reporting powers of SAS reduces the time to solution for critical tasks. This paper discusses, through example, how Base SAS tools such as FILENAME statements, macros, and ODS, combined with the built-in scheduler housed in SAS Enterprise Guide, can be used to automate the process from raw data to dashboard view quickly and efficiently. The paper is divided into two main parts. The first part discusses how to use scripting language to bring data from a file location into the SAS environment, how to build programs that clean and subset the data, and how to transform the process with a macro. The second part discusses how to use SAS procedures and ODS to turn the resulting data into a quality improvement dashboard view that can be set to run automatically and be sent to team members at a scheduled time.
Pharma and Life Sciences
PL-014 : Applications of PROC COMPARE to Parallel Programming and other projects
Jayanth Iyengar, Data Systems Consultants LLC
PROC COMPARE is a valuable BASE SAS procedure which is used heavily in the Pharma industry and other areas. By default, the capability of PROC COMPARE is to reconcile two data sets to determine if they have equivalent sets of records and sets of variables. In the clinical field and elsewhere, PROC COMPARE is often used to validate data sets in projects which involve parallel programming, where programmers independently perform the same tasks. In this paper, I will discuss the role PROC COMPARE plays in different SAS tasks, including DATA STEP merges, parallel programming, generation data sets, and more.
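For readers new to the procedure, a minimal double-programming comparison (with hypothetical production and QC data sets, both sorted by USUBJID) looks like:

proc compare base=prod.adsl compare=qc.adsl listall criterion=1e-8;
   id usubjid;   /* match records by subject rather than by position */
run;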
PL-017 : Have a Date with ISO? Using PROC FCMP to Convert Dates to ISO 8601
Richann Watson, DataRich Consulting
Programmers frequently have to deal with dates and date formats. At times, determining whether a date is in a day-month or month-day format can leave us confounded. Clinical Data Interchange Standards Consortium (CDISC) has implemented the use of the International Organization for Standardization (ISO) format, ISO 8601, for datetimes in SDTM domains, to alleviate the confusion. However, converting "datetimes" from the raw data source to the ISO 8601 format is no picnic. While SAS has many different functions and CALL subroutines, there is not a single magic function to take raw datetimes and convert them to ISO 8601. Fortunately, SAS allows us to create our own custom functions and subroutines. This paper illustrates the process of building a custom function with custom subroutines that takes raw datetimes in various states of completeness and converts them to the proper ISO 8601 format.
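As a greatly simplified sketch of the idea (handling only a complete, unambiguous raw datetime, not the partial and ambiguous values the paper tackles), a custom FCMP function can wrap the conversion:

proc fcmp outlib=work.funcs.dates;
   function raw2iso(rawdt $) $ 19;
      /* read a raw datetime string, then render it in ISO 8601 */
      dt = inputn(rawdt, 'anydtdtm19.');
      if missing(dt) then return ('');
      return (put(dt, e8601dt19.));
   endsub;
run;

options cmplib=work.funcs;

data test;
   raw = '25DEC2023 14:30:00';
   iso = raw2iso(raw);    /* 2023-12-25T14:30:00 */
run;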
PL-018 : Worried about that Second Date with ISO? Using PROC FCMP to Convert and Impute ISO 8601 Dates to Numeric Dates
Richann Watson, DataRich Consulting
Within the life sciences, programmers often find themselves doing a lot of dating: matching, converting between character and numeric values, and imputing missing components. Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) domains have implemented the use of the International Organization for Standardization (ISO) format, ISO 8601, for datetimes. These dates are stored as character strings with missing components denoted using a single hyphen. Although this format helps to standardize how dates and times are captured so that there is no confusion as to what the date represents, it leaves a longing for something more compatible for analysis purposes: determining durations and number of days from a reference point. The conversion of ISO dates to a numeric format requires a serious commitment, especially when partial ISO dates require imputations. Although SAS offers a variety of built-in formats and functions that can appease both sides on a date, i.e., converting complete ISO dates to numeric values or numeric dates to ISO dates, there is no SAS-provided function that will help with the required conversion and imputation of a partial date. Fortunately, with the use of the FCMP procedure, we can create our own custom functions to help achieve our desired goal. This paper illustrates the process of building a custom function that will take a date that is captured in the appropriate ISO format in SDTM (--DTC) and convert that date to a numeric format while also giving partial dates the extra attention to impute missing components. Additionally, this custom function sets the correct date imputation variable (--DTF), so you always know just how much of a blind date your derived value really is.
PL-022 : Mind the Gaps: Automating Multiple Imputation in Clinical Trial Workflows
Crisa Chen, Eli Lilly and Company
Jiangang Cai, Eli Lilly and Company
Handling missing data is a persistent challenge in clinical trials, particularly when it occurs across multiple endpoints, timepoints, or treatment periods. Beyond the standard assumption of Missing At Random (MAR), real-world trial data often include sporadically missing values, structurally missing patterns, or missingness driven by intercurrent events (ICE), such as treatment discontinuation, rescue medication use, or major protocol deviations. These situations require careful pre-imputation processing to avoid biased estimates or invalid imputations. Without a systematic and scalable approach, implementing Multiple Imputation (MI) across diverse datasets can become error-prone, labor-intensive, and difficult to reproduce. To address these challenges, we developed a modular and parameter-driven SAS macro pipeline that automates the MI process within the ADaM data workflow. This solution standardizes the handling of intercurrent events, supports flexible imputation strategies, and streamlines the generation of analysis-ready outputs. By embedding MI into a structured pipeline, the approach enhances consistency, scalability, and reproducibility across studies, while reducing manual effort and programming variability.
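The pipeline itself is macro-driven, but at its core sits the MI procedure; a bare-bones call (with hypothetical ADaM-style variable names, not the authors' actual specification) might look like:

proc mi data=work.adqs nimpute=25 seed=20250901 out=work.adqs_mi;
   by trtp;                          /* impute within treatment group */
   var base chg_wk4 chg_wk8 chg_wk12;
run;

/* Each of the 25 imputed data sets is then analyzed, and the results
   are combined, for example with PROC MIANALYZE. */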
PL-032 : Last Observation Carried Forward (LOCF) in Longitudinal Clinical Studies: Adopting a Functional Approach to Imputing Missing Values Using PROC FCMP, the SAS Function Compiler
Troy Hughes, Data Llama Analytics
Last observation carried forward (LOCF) is a ubiquitous method of imputing missing values in longitudinal studies, and is commonly implemented when a subject (i.e., a patient or participant) misses a scheduled visit, and data cannot be collected (or generated). In general, the last "valid" value from a previous visit is retained for the later visit on which the data could not be obtained, and this conservative estimation succeeds in cases where the actual value would have been little changed. Nuanced criteria may stipulate which prior values count as "valid" (e.g., after the start of treatment) as well as for how long (e.g., how many days, visit weeks, consecutive missed visits) a value can be used to impute other values. Given these complexities, LOCF solutions implemented in SAS historically adopt a procedural approach, and often require multiple DATA steps and/or procedures to impute data both across observations and within subjects. Conceptually, however, a functional approach can be envisioned in which LOCF could be calculated using a function call; that is, delivering the same functionality through a single line of code while hiding (abstracting) the complexity of the calculation inside the function's definition. The FCMP procedure can deliver this functionality, enabling SAS practitioners to build user-defined functions, even those that perform inter-observation calculations, and this text demonstrates a user-defined subroutine that dynamically calculates LOCF while relying on CDISC and ADaM standards, data structures, and nomenclature. The software design concepts herein are adapted from the renowned SAS Press book: PROC FCMP User-Defined Functions: An Introduction to the SAS Function Compiler. (Hughes, 2024)
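For contrast with the functional approach the paper proposes, the conventional procedural LOCF idiom is a RETAIN-based DATA step over an ADaM-style structure (the variable names here are illustrative, not the author's):

proc sort data=work.adlb;
   by usubjid paramcd avisitn;
run;

data work.adlb_locf;
   set work.adlb;
   by usubjid paramcd;
   retain last_aval;
   if first.paramcd then last_aval = .;         /* reset per subject and parameter */
   if not missing(aval) then last_aval = aval;  /* remember the last valid value   */
   else if not missing(last_aval) then do;      /* carry it forward when missing   */
      aval  = last_aval;
      dtype = 'LOCF';
   end;
   drop last_aval;
run;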
PL-033 : Geocoding with the Google Maps API: Using PROC FCMP To Call User-Defined SAS and Python Functions That Geocode Coordinates into Addresses, Calculate Routes, and More!
Troy Hughes, Data Llama Analytics
Software interoperability describes the ability of software systems, components, and languages to communicate effectively with each other, and must be prioritized in today's multilingual and open-source development environments. PROC FCMP, the SAS Function Compiler, enables Python functions to be wrapped in (and called from) SAS user-defined functions (and subroutines), and the full panoply of SAS environments supports FCMP including SAS Display Manager, SAS Enterprise Guide, SAS Studio, SAS Viya, and the latest Cary show pony, SAS Workbench. Productivity and the pace of development are maximized when existing open-source code such as Python user-defined functions can be run natively from a Python interactive development environment (IDE) rather than having to be needlessly recoded into the Base SAS language. This text demonstrates SAS and Python user-defined functions that collaboratively call the Google Maps Platform APIs to geocode street addresses into latitude/longitude coordinates, and to calculate driving distances between locations. The scenarios in this text demonstrate the application of the Google Maps Platform for clinical trials research, and the technical concepts are adapted from Chapter 8 of the renowned SAS Press book: PROC FCMP User-Defined Functions: An Introduction to the SAS Function Compiler. (Hughes, 2024)
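Separate from the Google Maps specifics (which require an API key), the mechanics of wrapping Python inside FCMP follow the documented Python-object pattern; a minimal sketch, assuming the SAS session is configured for Python, is:

proc fcmp;
   declare object py(python);

   submit into py;
def miles_to_km(miles):
    "Output: km"
    return miles * 1.609344
   endsubmit;

   rc = py.publish();                  /* compile the Python function     */
   rc = py.call('miles_to_km', 26.2);  /* call it from the SAS side       */
   km = py.results['km'];              /* retrieve the named output value */
   put km=;
run;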
PL-046 : COVID-19 Explored Using SAS and ODS Graphics: InfoGeographic and Data Graphic Analysis & Pictures
LeRoy Bessler, Bessler Consulting and Research
See the COVID-19 data for 2020 through 2023 analyzed and displayed in ways available nowhere else. Where was it? How severe? What might have contributed to it? Did hot spots cause it to proliferate? Come and see what the data and its visualization reveal.
PL-069 : Jazz Up Your Profile: Perfect Patient Profiles in SAS using ODS Statistical Graphics
Josh Horstman, PharmaStat LLC
Richann Watson, DataRich Consulting
Patient profiles are often used to monitor the conduct of a clinical trial, detect safety signals, identify data entry errors, and catch protocol deviations. Each profile combines key data collected regarding a single subject: everything from dosing to adverse events to lab results. In this presentation, two experienced statistical programmers share how to leverage the SAS Macro Language, the Output Delivery System (ODS), the REPORT procedure, and ODS Statistical Graphics to blend both tabular and graphical elements. The result is beautiful, highly customized, information-rich patient profiles that meet the requirements for managing a modern clinical trial.
PL-081 : The Rare Disease Clinical Research Network (RDCRN) works to advance medical research on rare diseases by providing support for clinical studies and facilitating collaboration, study enrollment and data sharing.
Kelly Olano, Cincinnati Children's Hospital Medical Center
Pierce Kuhnell, Cincinnati Children's Hospital Medical Center
Laurie Smith, Cincinnati Children's Hospital Medical Center
Rare diseases, defined in the United States as conditions affecting fewer than 200,000 individuals, collectively impact approximately 30 million Americans and 350 million people worldwide. Despite their prevalence, research in this domain faces significant challenges, including limited patient populations, underdiagnosis, insufficient funding, and regulatory hurdles. The Rare Disease Clinical Research Network (RDCRN), established under the Rare Diseases Act and funded by the NIH, addresses these challenges by fostering collaboration across 20 research teams conducting over 125 active studies on more than 200 rare diseases. The Data Management Coordinating Center (DMCC) at Cincinnati Children's Hospital plays a pivotal role in standardizing data collection, ensuring compliance, and supporting analysis and reporting. Key initiatives include migrating legacy data into REDCap and developing tools for data quality and regulatory compliance. Current applications focus on leveraging SAS Viya to provide standardized and custom reports and analysis datasets, with continuing advancements toward automated reporting and interactive dashboards that enhance data accessibility and usability. These efforts aim to accelerate research, improve patient outcomes, and strengthen the infrastructure for rare disease studies.
PL-088 : An analysis of Emergency Department Visits for the 10 Health Division Regions across the US during the period 2018-2025
Baoyuan Zhou, University of Illinois Urbana-Champaign
Extreme heat and cold events are among the most important causes of climate-related deaths worldwide. According to Chen et al. (2024), five million deaths were attributed to extreme heat and cold globally between 2000 and 2019. Future climate projections indicate that heat-related deaths will increase, and cold-related deaths will decrease, under warmer climates. Understanding the impacts of extreme heat on health and on health services demand would provide a better estimation of the future climate-related illness burden. In this project we use data from the Heat and Health Tracker on the Centers for Disease Control and Prevention (CDC) website and other related data sources to investigate the impact of heat waves and extreme heat events on Emergency Department Visits (EDVs) associated with heat-related illnesses after standardizing by population. Data were available on a daily basis at a regional level and were spatially aggregated to the 10 Health Department Regions (HDRs) across the US using an area-weighted average. Maximum daily temperature and daily heat index, as calculated using the National Weather Service approach, were analyzed to understand their seasonal cycle and variability across the HDRs. The potential association between extreme heat events, as determined by the peak times in maximum temperature and heat index, and the EDV time series was investigated and used in the analysis. Log-linear mixed-effect models were fitted to the EDV time series accounting for seasonal effects, the impact of the climate variables, and vulnerability factors related to socio-economic conditions. Variability in the dependence between the response and predictors among regions was accounted for with random effects in the proposed models. Prediction errors and goodness of fit were also assessed to evaluate model performance.
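One plausible SAS formulation of such a log-linear mixed-effect model (a sketch with hypothetical variable names; the authors' actual specification may differ) uses PROC GLIMMIX:

proc glimmix data=work.edv_daily;
   class hdr_region season;
   model edv_count = heat_index season svi_score
         / dist=poisson link=log offset=log_population solution;
   random intercept heat_index / subject=hdr_region type=vc;
run;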
Posters
PO-011 : ExCITE-ing! Build Your Paper's Reference Section Programmatically Using Lex Jansen's Website and SAS
Louise Hadden, Cormac Corporation
One challenge in writing a SAS white paper is creating the perfect reference section, properly acknowledging those who have inspired and paved the way. Luckily, with clever use of such tools as Lex Jansen's website, SAS's ability to read in and manipulate varied data sources, and Microsoft Word's citation manager, every author can succeed in properly referencing their white papers. This paper and ePoster will demonstrate how to accomplish this goal.
PO-012 : The World is Not Enough: Base SAS Visualizations and Geolocations
Louise Hadden, Cormac Corporation
Geographic processing in SAS has recently undergone some major changes: as of Version 9.4 Maintenance Release M5, many procedures formerly part of SAS/GRAPH are now available in BASE SAS. At the same time, SAS has added new graphics procedures, such as PROC SGMAP in BASE SAS, that build on the functionality of SAS/GRAPH's PROC GMAP and incorporate ODS Graphics techniques including attribute maps and image annotation. This paper and poster will use PROC SGMAP to replicate a world map the author originally created with SAS/GRAPH's PROC GMAP and the annotate facility, mapping three different metrics. New SAS mapping and SG procedure techniques will be demonstrated, following Agent 007's adventures across the globe.
PO-036 : Who's Bringing That Big Data Energy? A 48-Year Longitudinal Analysis of 30,000 Presentations in the SAS User Community To Elucidate Top Contributors and Rising Stars
Troy Hughes, Data Llama Analytics
This analysis examines presentations at SAS user group conferences between 1976 and 2023. It includes presentations referenced on www.LexJansen.com (aka "the LEX") during this timeframe, which are drawn from multiple conferences, including: SAS User Group International (SUGI, may she rest in peace), SAS Global Forum (SGF, may she be revived), SAS Explore, Western Users of SAS Software (WUSS), Midwest SAS Users Group (MWSUG), South Central SAS Users Group (SCSUG), Southeast SAS Users Group (SESUG), Northeast SAS Users Group (NESUG), Pacific Northwest SAS Users Group (PNWSUG), and Pharmaceutical Software Users Group (PharmaSUG). This analysis identifies top contributors, including authors who have presented most abundantly at specific conferences, as well as across all conferences. For example, the SAS superstars and most prolific presenters of all time are ranked and recognized for bringing that big data energy (BDE): Kirk Paul Lafler, Arthur L. Carpenter, Louise Hadden, Charlie Shipp, and Ronald J. Fehd! Rising stars, who may be new to the conference scene, yet are contributing significantly, are also identified. In addition to quantifying and extolling the contributions of these authors, this analysis aims to assist the leaders of future conferences in identifying key speakers to invite. Threats to data quality are also discussed: the reality of third-party data, as conferences often feed the LEX with raw presentation metadata that either are unstandardized or have patently wrong information (like incorrect author names or missing coauthors). Finally, and perhaps with some irony, Python 3.11.5 (and Pandas 2.1.0) was used exclusively to extract, clean, transform, and analyze all data.
PO-060 : Market Making Control Problem with Inventory Risk
Jimin Lee, Washington University in St. Louis
A high-frequency market maker reaps profit from the bid-ask spread, rapidly transacting their buy and sell limit orders. The optimal control problem concerns the placement of buy and sell limit order quotes by the market maker to maximize their total profit. It also concerns minimizing the market maker's inventory of shares, which incurs a high liquidation cost at the end of the day. There have been various proposed solutions, with some taking place in discrete time and others in continuous time. Most solutions, however, assume that limit orders have a size of 1, when in real life this is not the case. In this project, we aim to incorporate the size of limit orders placed by market makers in solving the optimal control problem and implement it in Python. We will calibrate the model parameters and test the performance of the proposed market making strategy against real-world data.
PO-086 : SAS Viya Voyage - Onboarding, Learning, and Thriving in the Cloud
Connor Ayscue, SAS
Embarking on the journey to SAS Viya can feel like navigating uncharted waters. But with the right guide, it becomes a voyage of opportunity. This poster presents the SAS Viya Adoption Guide, a customer-facing resource designed to help users and administrators explore, build, and accelerate their analytics capabilities in the cloud. From detailing the onboarding process to utilizing the SAS "Use and Optimize Your Software" site and SAS community engagement, this guide offers a comprehensive toolkit for mastering SAS Viya. Whether you're transitioning from SAS 9 or starting fresh, this resource aims to assist in making your Viya Adoption journey a smooth, informed, and empowering experience across the analytics lifecycle.
Visualization and Reporting
VR-005 : Dashboards Made Easy Using SAS Software
Kirk Paul Lafler, SasNerd
Organizations around the world develop business intelligence and analytics dashboards, sometimes referred to as enterprise dashboards, to display the status of "point-in-time" metrics and key performance indicators. Effectively designed dashboards extract real-time data from multiple sources for the purpose of highlighting important information, numbers, tables, statistics, metrics, performance scorecards and other essential content. This paper explores essential rules for "good" dashboard design, the metrics frequently used in dashboards, and the use of best practice programming techniques in the design of quick and easy dashboards using SAS software. Learn essential programming techniques to create real-world dashboards using Base-SAS software including PROC SQL, macro, Output Delivery System (ODS), ODS HTML, ODS Excel, ODS Layout, ODS Statistical Graphics, PROC SGPLOT, PROC SGPIE, and other technologies.
VR-010 : The (ODS) Output of Your Desires: Creating Designer Reports and Data Sets
Louise Hadden, Cormac Corporation
SAS procedures can convey an enormous amount of information, sometimes more information than is needed. Most SAS procedures generate ODS objects behind the scenes. SAS uses these objects with style templates that have custom buckets for certain types of output to produce the output that we see in all destinations (including the SAS listing). By tracing output objects and ODS templates using ODS TRACE (DOM) and by manipulating procedural output and ODS OUTPUT objects, we can pick and choose just the information that we want to see. We can then harness the power of SAS data management and reporting procedures to coalesce the information collected and present the information accurately and attractively.
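A tiny sketch of the trace-then-capture workflow (using PROC MEANS as a stand-in for any procedure) looks like:

/* Step 1: discover the names of the output objects a procedure creates */
ods trace on;
proc means data=sashelp.class;
   var height weight;
run;
ods trace off;      /* the log reveals an output object named "Summary" */

/* Step 2: capture just the object you want as a data set */
ods output Summary=work.class_summary;
proc means data=sashelp.class;
   var height weight;
run;

proc print data=work.class_summary noobs;
run;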
VR-034 : From Word Clouds to Phrase Clouds to Amaze Clouds: A Data-Driven Python Programming Solution To Building Configurable Taxonomies That Standardize, Categorize, and Visualize Phrase Frequency
Troy Hughes, Data Llama Analytics
Word clouds visualize the relative frequencies of words in some body of text, such as a website, white paper, blog, or book. They are useful in identifying contextual focus and keywords; however, word clouds as commonly defined and implemented suffer numerous limitations. First, multi-word phrases such as "data set" or "Base SAS" are unfortunately segmented into single words "data," "set," "Base," and "SAS." Second, desired capitalization often cannot be specified, such as visualizing "PROC PRINT" even when its lowercase "proc print" is observed in text or code. Third, spelling variations (e.g., single and plural nouns, various verb conjugations, abbreviations and acronyms) are not mapped to each other. Similarly, and fourth, comparable words or phrases (e.g., "PROC PRINT" and "PRINT procedure") are not mapped to each other, representing a further lack of entity resolution. This text and its Python Pandas solution seek to overcome the data quality, data integrity, and data standardization issues that plague word clouds by defining and applying configurable taxonomies: data models that can impart more meaning and precision to the ultimate word/phrase cloud visualizations. The result is a phrase cloud that amazes: an amaze cloud!
VR-045 : Maps, maps, and more maps using SAS PROC SGMAP!
Melinda Macdougall, Cincinnati Children's Hospital Medical Center
SAS expanded its mapping capability with the introduction of PROC SGMAP in the SAS 9.4M5 release. This procedure expands upon the GMAP procedure and allows for more customization. The data sets included in the MAPSGFK library provide detailed boundary location data points for locations across the world. These data sets provide easy access to plotting maps at various levels, including continents, countries, states, counties, and cities. These maps can be combined with publicly available data from sources such as data.census.gov to create many useful plots. In this presentation, I will walk through examples of maps at multiple levels: US, state (Ohio), metropolitan area (Greater Cincinnati), and county (Hamilton County, Ohio).
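A small sketch of the choropleth pattern (with a placeholder state-level metric, not one of the presentation's examples) might be:

/* A response data set with one row per state */
data work.state_rate;
   set mapsgfk.us_attr(keep=statecode);
   rate = ranuni(2025) * 100;    /* placeholder metric */
run;

proc sgmap mapdata=mapsgfk.us maprespdata=work.state_rate;
   choromap rate / mapid=statecode;
run;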
VR-048 : Wise Graphic Design & Color Use for Data Graphics Easily, Quickly, Correctly Understood
LeRoy Bessler, Bessler Consulting and Research
Let me take you straight past the defaults to The Best. Learn principles and methods to make data graphics that are easily, quickly, correctly understood, and how to avoid color decisions that obstruct graphic communication. See widely usable graphic designs that reveal what the data can tell your viewer or you. The ideas are software-independent, but are demonstrated with SAS ODS Graphics.
VR-084 : Progressively Building a Dynamic Report
Justin Bates, Cincinnati Children's Hospital
Beginning SAS programmers will be walked through the basics of using SAS to build simple reports. The built-in data sets in SASHELP will be discussed. Exporting SAS data to Excel will be reviewed. A dynamic report will be built using macros.
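In the spirit of the talk, a minimal sketch (an illustration, not taken from the presentation itself) that combines a SASHELP data set, ODS EXCEL, and a small macro might be:

%macro class_report(sexval=);
   ods excel file="class_&sexval..xlsx";
   proc report data=sashelp.class nowd;
      where sex = "&sexval";
      column name age height weight;
   run;
   ods excel close;
%mend class_report;

%class_report(sexval=F)
%class_report(sexval=M)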