MWSUG XX Paper Presentations

Paper presentations are the heart of the MWSUG conference. The conference will feature papers presented by users just like you on an array of SAS®-related topics including material of interest to beginners and experts alike. Paper presentations are organized into several academic sections.

The sections below list the paper titles and authors for the MWSUG XX conference; the corresponding abstracts follow in the second half of this document. Note that this list is subject to change.



Applications Development

Day/Time | Author | Paper Title
Mon., 8:00-8:50am | Eberhardt, Peter | A Cup of Coffee and PROC FCMP: I Cannot Function Without Them
Mon., 9:00-9:50am | Fallon, Jeffery A. | Using AJAX and SAS Stored Processes to Create Dynamic Search Suggest Functionality Similar to Google's
Mon., 10:00-10:50am | Cochran, Ben T. | Using PROC OLAP to Build Cubes with Non-Additive Measures
Mon., 11:00-11:50am | Varney, Brian K. | The Super Genius Guide to Generating Dummy Data
Mon., 2:00-2:50pm | Brobst, Stephen A. | High Performance Analytics with In-Database Processing
Mon., 3:00-3:50pm | Ellwood, Joanne | No More Blue Screens - Running SAS® on Windows Servers
Tues., 8:30-9:20am | Ziton, Karen | Demonstration of Organic Growth Modeling for B to B Marketing
Tues., 9:30-10:20am | Frick, Michael C. | Turn-Key Performance Metrics using Base SAS® and Excel VBA
Tues., 10:30-11:20am | Kincaid, Charles D. | Revolutionary BI: A Vision for Business Intelligence
Tues., 1:00-1:50pm | Lewerenz, Eric | An Example of Website “Screen Scraping”
Tues., 2:00-2:50pm | Morris, Rich | Using SAS® as an Archival Repository for DB2 under z/OS (or other DBMS)
Tues., 3:00-3:50pm | Fallon, Jeffery A. | When the List Grows Too Long: A Strategy to Utilize Freeform User Input in Your SAS® Stored Process Web Applications
Tues., 4:00-4:50pm | Murico, Carmen | SAS® 9.2 Enterprise BI Framework


Data Visualization

Day/Time | Author | Paper Title
Mon., 8:00-8:50am | Matange, Sanjay | A Guided Tour of ODS Graphics
Tues., 8:30-9:20am | Bessler, LeRoy | Visual + Detail = Effective Communication: Web-enabled Graph + Spreadsheet, Using SAS/GRAPH®, ODS, PROC PRINT, and Excel
Tues., 9:30-10:20am | Penix, D. J. | Seamlessly Delivering Web Based Information to an Organization
Tues., 10:30-11:20am | Yang, Dongsheng & Tang, Anne S. | Using Graph Template Language to Customize ODS Statistical Graphs
Tues., 1:00-1:50pm | Varney, Brian K. | Visualizing Key Performance Indicators Using the GKPI Procedure
Tues., 2:00-3:20pm | Bessler, LeRoy | Communication-Effective Reporting with Email/BlackBerry/iPhone, PowerPoint, Web/HTML, PDF, RTF/Word, Animation, Images, Audio, Video, 3D, Hardcopy, or Excel


JMP®

Day/Time | Author | Paper Title
Mon., 8:00-8:50am | Norris, Roger | Application of JMP Custom Design Platform to Optimize a Crystallization Process for Competing Responses
Mon., 9:00-9:50am | Sweeney, Mike & Gardner, Sam | Retention Modeling and Understanding the Lifetime Value of Your Insurance Customers
Mon., 10:00-10:50am | Gardner, Sam | Improving Insurance Loss Ratios: Using JMP® and SAS® to See the Solution
Mon., 11:00-11:50am | Sall, John | Choice Experiments for Market Research and Other Features in JMP® 8
Mon., 2:00-2:50pm | Weisz, Jon | Data-Driven Story-Telling: Showcasing Visualization and Analytic Techniques with SAS® and JMP®
Mon., 3:00-3:50pm | Plaumann, Heinz | Drive Better Decisions with Market Information: Technology Forecasting


Pharmaceutical and Healthcare Applications

Day/Time | Author | Paper Title
Tues., 9:00-9:20am | Ramos, Pedro Gregorio | Relationship between Digestive and Psychological Disorders
Tues., 9:30-9:50am | Ugiliweneza, Beatrice | Analysis of Breast Cancer and Surgery as Treatment Options
Tues., 10:00-10:20am | Ramoju, Sireesha | Gout Analysis
Tues., 10:30-11:20am | Zelenskiy, Svetlana | High Dietary Glycemic Load is Associated with Increased Risk of Colon Cancer
Tues., 1:00-1:50pm | Gao, Yubo | Obtaining the Patient Most Recent Time-Stamped Measurements
Tues., 2:00-2:50pm | Mo, Lijia | The Impact of the Food Safety Information on U.S. Poultry Demands
Tues., 3:00-3:20pm | Wenerstrom, Brent | Analysis of Emergency Room Waiting Time in SAS®


SAS® 101

Day/Time | Author | Paper Title
Mon., 9:00-9:50am | Derby, Nathaniel | A Little Stats Won't Hurt You
Mon., 11:00-11:50am | Lafler, Kirk Paul | SAS® Programming Tips, Tricks and Techniques
Mon., 2:00-2:50pm | MacDougall, Mary | Handy SAS® Procedures to Expand your Analytics Skill Set
Mon., 3:00-3:50pm | Reilly, Randall | Fun with SAS® Date/Time Formats and Informats
Mon., 4:00-4:50pm | Wright, Philip A. | Eliminating Redundant Custom Formats (or How to Really Take Advantage of PROC SQL, PROC CATALOG, and the DATA Step)


SAS Presents

Day/Time | Author | Paper Title
Mon., 8:00-8:50am | Stokes, Maura & Rodriguez, Bob & Balan, Tonya | Methods, Models, and More: New Analyses Available with SAS/STAT® 9.2
Mon., 8:00-8:50am | Zender, Cynthia | CSSSTYLE: Stylish Output with ODS and SAS® 9.2
Mon., 9:00-9:50am | Hatcher, Diane & McNeil, Sandy | Getting from SAS® 9.1.3 to SAS® 9.2: Migration or Promotion
Mon., 10:00-10:50am | Crevar, Margaret | How to Maintain Happy SAS® Users
Mon., 11:00-11:50am | Rodriguez, Bob | SAS® IML Studio and R Integration
Tues., 8:30-9:20am | Jolley, Linda & Stroupe, Jane | Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding
Tues., 8:30-9:20am | Yang, Yuan | Group Sequential Analysis Using the New SEQDESIGN and SEQTEST Procedures
Tues., 9:30-10:20am | Crevar, Margaret & Ihnen, Leigh | Best Practices for Configuring Your I/O Subsystem for Your SAS®9 Applications
Tues., 10:30-11:20am | Foley, Richard & Kent, Paul | The XML Super Hero: An Advanced Understanding of Manipulating XML with SAS®
Tues., 1:00-3:50pm | Derr, Bob | Introduction to Logistic Regression Using SAS® Software


Statistics and Data Mining

Day/Time | Author | Paper Title
Mon., 9:00-9:50am | Kennedy, Kevin & Pencina, Michael | A SAS® Macro to Compute Added Predictive Ability of New Markers in Logistic Regression
Mon., 10:00-10:50am | Corliss, David J. | ARIMA in Time Series
Mon., 11:00-11:50am | Gu, Fei & Little, Todd & Kingston, Neal M. | Using PROC CALIS and PROC CORR to Compare Structural Equation Modeling Based Reliability Estimates and Coefficient Alpha When Assumptions are Violated
Mon., 2:00-2:50pm | Zhao, James | Effective Use of RETAIN Statement in SAS® Programming
Mon., 3:00-3:20pm | Wang, Xiao | Outcome Research for Diabetic Inpatients by SAS® Enterprise Miner™ 5.2
Mon., 4:00-4:50pm | Stout, Michael | Mail Merge using SAS®
Tues., 9:30-10:20am | Chen, Xianzhe | Comparison of Decision Tree to Logistic Regression Model: An Application in Transportation
Tues., 10:30-11:20am | Thompson, Doug | Methods for Ranking Predictors in Logistic Regression
Tues., 1:00-1:50pm | Wright, Philip A. | Using the DATA Step's ATTRIB Statement to both Manage and Document Variables in a SAS® Dataset (lightly)
Tues., 2:00-2:50pm | Liu, Wensui & Vu, Chuck & Kharidhi, Sandeep | A Class of Predictive Models for Multi-Level Risks


Tutorials and Solutions

Day/Time | Author | Paper Title
Mon., 8:00-8:50am | Davis, Scott | Getting By with a Little Help from My Regular Expressions
Mon., 10:00-10:20am | Polus, David | How to Recruit SAS® Programmers
Mon., 2:00-2:50pm | Pruitt, Rex | Using Base SAS® and SAS® Enterprise Miner™ to Develop Customer Retention Modeling
Mon., 3:00-3:50pm | Lafler, Kirk Paul & Shipp, Charles Edwin | Connect with SAS® Professionals Around the World with LinkedIn and sasCommunity.org
Mon., 4:00-4:50pm | Frick, Michael C. | DOs and DON’Ts of Generating Performance Metrics
Tues., 8:30-9:20am | Mina, Michael | Making Your LinkedIn Profile Effective
Tues., 9:30-10:20am | Derby, Nathaniel | Getting Correct Results from PROC REG
Tues., 10:30-11:20am | Davis, Scott | Where Does This WHERE Go?
Tues., 1:00-1:50pm | Eberhardt, Peter | Things Dr. Johnson Did Not Tell Me: An Introduction to SAS® Dictionary Tables
Tues., 2:00-2:50pm | Eberhardt, Peter | The SAS® DATA Step: Where Your Input Matters
Tues., 3:00-3:50pm | Czyzyk, Joseph | Decision Making with Uncertain Data Using PROC OPTMODEL



MWSUG XX Paper Abstracts

Applications Development

A01 : High Performance Analytics with In-Database Processing
Stephen Brobst, Teradata Corporation (presenter)
Keith Collins, SAS Institute
Paul Kent, SAS Institute
Michael Watzke, Teradata Corporation
Monday, 2:00-2:50pm, Salon H


A new era of high performance analytics has emerged. In-database processing for deep analytics enables much faster turnaround for developing new analytic models and dramatically reduces the cost associated with data replication into "shadow" file systems. This talk examines these trends and quantifies the impacts. We also provide best practices for execution of a phased deployment strategy of these capabilities.


A02 : Using PROC OLAP to Build Cubes with Non-Additive Measures
Ben T. Cochran, The Bedford Group
Monday, 10:00-10:50am, Salon H


Most of the time, OLAP cubes are built from data that is additive, meaning that as you drill down, the sum of all the lower levels adds up to the value at the highest level of the hierarchy. This is not always the case. Sometimes applications need drill-down capabilities on data that is non-additive. And sometimes data is additive in one dimension, but not another. Take, for example, a car leasing company that has 2,000 cars to lease. They want to build a cube with two dimensions: Time and Geography. Across the Geography dimension, the number of cars is additive. Say the levels in the Geography dimension are Company, Region, State, and City. At the Company level, the number of cars is 2,000. When we drill down to the Region level, the total across all the regions adds up to 2,000. When we drill down to the next level, the total number of cars across all the states adds up to 2,000, and so on. This same measure (total number of cars) is NOT additive in the Time dimension. Say the levels of the Time dimension are Year, Quarter, and Month. If we take the number of cars leased each month, they could add up to more than 2,000. Likewise, if we add up all the cars leased each quarter, they could add up to more than 2,000. But still this company wants to build a cube with this data. This paper looks at strategies and methods for building a cube with non-additive data. Then a step-by-step approach is taken to actually build the cube.


A03 : A Cup of Coffee and PROC FCMP: I Cannot Function Without Them
Peter Eberhardt, Fernwood Consulting Group Inc.
Monday, 8:00-8:50am, Salon H


How many times have you tried to simplify your code with LINK/RETURN statements? How much grief have you put yourself through trying to create macro functions to encapsulate business logic? How many times have you uttered "If only I could call this DATA Step as a function"? If any of these statements describe you, then the new features of PROC FCMP are for you. If none of these statements describe you, then you really need the new features of PROC FCMP. This paper will get you started with everything you need to write, test, and distribute your own "data step" functions with the new (SAS® 9.2) PROC FCMP. This paper is intended for beginner to intermediate programmers, although anyone wanting to learn about PROC FCMP can benefit.
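[Editor's note: as a minimal sketch of the kind of thing the paper teaches — not code from the paper, and with an illustrative function name and conversion — defining and calling a DATA step function with SAS 9.2 PROC FCMP looks like this:]

    proc fcmp outlib=work.funcs.demo;
       function fahr_to_cels(f);
          return ((f - 32) * 5 / 9);   /* convert Fahrenheit to Celsius */
       endsub;
    run;

    /* point the DATA step at the compiled function library */
    options cmplib=work.funcs;

    data _null_;
       c = fahr_to_cels(98.6);
       put c= 8.2;
    run;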


A04 : No More Blue Screens - Running SAS® on Windows Servers
Joanne Ellwood, Progressive Insurance
Monday, 3:00-3:50pm, Salon H


Running SAS Foundation products on Windows Servers with many users can be very challenging. Users can experience slowdowns and suspended blue screens when trying to log onto the server. Runaway SAS sessions can consume large amounts of CPU while doing no real work. This presentation will show how SAS server administrators and users can manage these problems using standard Windows utilities, successfully avoiding the dreaded non-scheduled reboot of a SAS application server. Strategies discussed in the presentation may have application to other SAS installations such as BI Server.


A05 : Using AJAX and SAS Stored Processes to Create Dynamic Search Suggest Functionality Similar to Google's
Jeffery Fallon, Cardinal Health
Monday, 9:00-9:50am, Salon H


This paper shows SAS application developers the steps involved in using SAS Stored Processes to develop a web application with search suggest functionality. After a brief introduction to AJAX, the reader is given a guided look at how to call a SAS Stored Process from JavaScript using the XMLHttpRequest object. The paper then discusses the Stored Process code behind the search suggest functionality, and concludes with a discussion of performance and scalability.


A06 : When the List Grows Too Long: A Strategy to Utilize Freeform User Input in Your SAS® Stored Process Web Applications
Jeffery Fallon, Cardinal Health
Tuesday, 3:00-3:50pm, Salon H


In web based reporting applications, we often allow users to select the values for which they want to see results from a list of values. With this approach there are two problems that immediately surface. The first is that lists become unwieldy beyond around 1000 values. The second is that some users have a large number of values that they want to select. Freeform text input is a viable option, but application developers are often reluctant to yield the control that a list gives them over their application input. In addition, developers are often fearful of what users might throw at their applications. This paper shows developers how to use SAS text processing functions to powerfully transform even the scariest freeform text into application-ready input.
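[Editor's note: one possible approach along these lines — a sketch with hypothetical input, not necessarily the paper's own functions or patterns — normalizes delimiters and strips unexpected characters with PRXCHANGE before building a quoted IN-list:]

    data _null_;
       length raw clean inlist $32767;
       raw = '12345; 678-90,  ABC123   99999';             /* hypothetical user paste */
       clean = prxchange('s/[;\s]+/,/', -1, strip(raw));   /* normalize delimiters to commas */
       clean = prxchange('s/[^A-Za-z0-9,]//', -1, clean);  /* drop unexpected characters */
       clean = prxchange('s/,+/,/', -1, clean);            /* collapse repeated commas */
       inlist = cats("('", tranwrd(clean, ',', "','"), "')"); /* quoted list for an IN clause */
       put inlist=;
    run;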


A07 : Turn-Key Performance Metrics using Base SAS® and Excel VBA
Michael Frick
Tuesday, 9:30-10:20am, Salon H


In today’s competitive environment, everyone is being asked to do more with less. Here I describe a turn-key process which automatically produces weekly and monthly performance metrics with virtually no manual intervention. At the push of a button, base SAS is used to assemble the data and then dynamically generate custom VBA source code to produce approximately 200 Excel reports. As there is some initial set-up time, the methodology would be most useful to those who need to repetitively generate lots of tables and charts that have similar structure with a minimum of ongoing manual effort.


A08 : Revolutionary BI: A Vision for Business Intelligence
Charles Kincaid, COMSYS
Tuesday, 10:30-11:20am, Salon H


Delivering BI reports has many similarities with delivering other information over the web (e.g. consumer products or search results). Many companies are successful delivering that information. How do they do it? We’ll discuss the techniques that have succeeded in the Web 2.0 world, e.g. Google, Amazon, Apple. How can these ideas be applied to our own internal web interactions? This session is intended to get people to think in a new way about business intelligence reporting and business analytics that increases the probability of BI success.


A09 : An Example of Website “Screen Scraping”
Eric Lewerenz, My InnerView
Tuesday, 1:00-1:50pm, Salon H


Have you ever needed to collect information from a website without having to tediously cut-and-paste from several different web pages? This paper highlights a cobbled-together method the author used in solving a specific business problem. For beginner and intermediate SAS programmers, this paper may serve as an introduction to a wide range of different SAS functionality, including macros, regular expressions, the URL access method, the DO/%DO loop, PROC TRANSPOSE, and the INDEX and SUBSTR functions.
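[Editor's note: the core of such an approach is typically the URL access method; a minimal sketch (with a placeholder address) that reads a web page into a SAS data set, one observation per source line:]

    filename src url "http://www.example.com/index.html";

    data rawhtml;
       infile src length=reclen lrecl=32767 truncover;
       input line $varying32767. reclen;
    run;

    filename src clear;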


A10 : Using SAS® as an Archival Repository for DB2 under z/OS (or other DBMS)
Rich Morris, Progressive Insurance
Tuesday, 2:00-2:50pm, Salon H


Your DBA says that the older data has to be removed from the database in order to make room for new data, but the Legal Department says that you still need to be able to access the older data. What do you do? Copying the data from DB2 to a SAS data set is easy enough – but how do you keep track of where it went? And how do you get the data back again? What happens when you need to combine data from the live database with data from the archives? This paper discusses the reasoning and code that went into building an in-house DBMS archival/retrieval process at Progressive Insurance. This process makes extensive use of macros, and the general methodology is currently being used to archive data from at least five DB2 data warehouses (and one SQL Server database).
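[Editor's note: the paper's process is macro-driven; as a simplified illustration of the basic move only — db2lib is assumed to be a preassigned SAS/ACCESS libref to DB2, and claims/claim_date are hypothetical names — copying a year of data into a SAS archive library might look like:]

    libname arch "/sas/archive/claims";    /* permanent SAS archive location */

    proc sql;
       create table arch.claims_2004 as
       select *
       from db2lib.claims                  /* preassigned SAS/ACCESS libref to DB2 */
       where year(claim_date) = 2004;
    quit;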


A14 : The Super Genius Guide to Generating Dummy Data
Brian Varney, COMSYS
Monday, 11:00-11:50am, Salon H


A common necessity in development and programming is having representative data to program and develop against. Situations that can hamper this include data that has not been collected yet, data that is too sensitive to share, or a lack of resources to prepare and provide the data from the source tables. This paper provides methods to generate representative data, whether one has the project data or only metadata, in such a way that sensitive data is not revealed.
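[Editor's note: as a flavor of the metadata-only case — a generic sketch, not the paper's methods — fabricating representative records with the RAND function:]

    data dummy(drop=i);
       call streaminit(20091011);                     /* reproducible random stream */
       length gender $1;
       do i = 1 to 500;
          id     = i;
          age    = 18 + floor(62 * rand('uniform'));  /* ages 18-79 */
          gender = ifc(rand('uniform') < 0.5, 'F', 'M');
          spend  = round(rand('normal', 250, 75), 0.01);
          output;
       end;
    run;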


A15 : Demonstration of Organic Growth Modeling for B to B Marketing
Karen Ziton, Elite Technology Solutions
Tuesday, 8:30-9:20am, Salon H


Objective: A large B-to-B conglomerate distributor wants to rank customers for internal organic growth and cross-selling opportunities among products and groups of products.

Data available: Two years of summed sales on all products and product groups within one corporate division. About 10 business demographic variables from a national source were also appended to most existing customer sales records.

Process: Initial data exploration on products and product groups, looking at frequencies and volume distributions, the most frequent product combinations, and analysis of external variables. Propensity-to-buy models were created for top products and/or product groups using a binary dependent variable and logistic regression in SAS/STAT®. A score was created and ranked by those most likely to buy. The distribution was analyzed along with some error statistics. For customers that purchased in the same product groups, a volume-based decision tree model was created using SAS Enterprise Miner to predict expected dollar purchases. The expected purchase amounts were compared to actual purchase amounts to create a ranking by potential volume amount.

Deliverable: Besides some distribution charts and scoring explanations, the client was delivered a list of customers most likely to buy the targeted products in order of likelihood and a list of which current customers of those products had the highest potential to buy more of that product.


A16 : SAS® 9.2 Enterprise BI Framework
Carmen Murico, SAS Institute Inc.
Monday, 4:00-4:50pm, Salon H


The SAS demonstration will highlight the following:


Data Visualization

G01 : Visual + Detail = Effective Communication: Web-enabled Graph + Spreadsheet, Using SAS/GRAPH®, ODS, PROC PRINT, and Excel
LeRoy Bessler, Ph.D., Assurant Health (invited speaker)
Tuesday, 8:30-9:20am, 401


Graph = quick, easy inference
Precise Numbers = correct inference
Excel = data recipient self-directed post-processing

You can tap the power of SAS/GRAPH and ODS to deliver your SAS data in a web-enabled graph linked to a SAS-created spreadsheet. The web-enabled trend plot includes a hyperlink to the spreadsheet. Furthermore, the web-enabled graph provides pop-up text so that the user can read off EXACT values for plot points, rather than estimate them, without being forced to jump over to the spreadsheet. Finally, the spreadsheet is hyperlinked back to the graph.

The step-by-step presentation of the coding required assumes no prior experience with SAS/GRAPH or ODS. This tutorial is suitable for users of either Version 9.1.3 or Version 9.2 of SAS/GRAPH and ODS.


G02 : Communication-Effective Reporting with Email/BlackBerry/iPhone, PowerPoint, Web/HTML, PDF, RTF/Word, Animation, Images, Audio, Video, 3D, Hardcopy, or Excel
LeRoy Bessler, Ph.D., Assurant Health (invited speaker)
Tuesday, 2:00-3:20pm, 401


This presentation explores all the channels one can use to deliver information. It emphasizes communication-effective design and construction. It is intended for new or experienced users of Version 9.1.3 or 9.2 of SAS®, SAS/GRAPH®, and ODS.


G04 : Visualizing Key Performance Indicators Using the GKPI Procedure
Brian Varney, COMSYS
Tuesday, 1:00-1:50pm, 401


The GKPI procedure is new in SAS/GRAPH® in SAS 9.2. This new procedure can be used to create graphical key performance indicator (KPI) charts, which include sliders, bullet graphs, dials, speedometers, and traffic lights. This paper is intended to serve as an introduction to the GKPI procedure by discussing the syntax and demonstrating examples. In addition, this paper will discuss how results from the GKPI procedure can be integrated into existing SAS environments.
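[Editor's note: as a taste of the syntax the paper covers — a sketch with illustrative values, and assuming a Java-based image device as GKPI requires — a simple dial chart:]

    goptions reset=all device=javaimg xpixels=260 ypixels=220;

    proc gkpi mode=raised;
       dial actual=0.67 bounds=(0 0.25 0.50 0.75 1);  /* four colored ranges */
    run;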


G05 : Using Graph Template Language to Customize ODS Statistical Graphs
Dongsheng Yang, Cleveland Clinic Foundation
Anne Tang, Cleveland Clinic Foundation
Tuesday, 10:30-11:20am, 401


Unlike traditional SAS/GRAPH, SAS 9.2 ODS statistical graphics provide an easy and quick way to explore and visualize data. However, if higher-quality plots for reports and publications are desired, Graph Template Language is a powerful tool for customizing and building stand-alone ODS statistical graphs. This paper describes how to output the SAS code and data of an ODS statistical graph from a statistical procedure. Graph Template Language can then be used to customize graph templates and style templates. The example shows how to customize Receiver Operating Characteristic (ROC) curves step by step.
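[Editor's note: a from-scratch GTL sketch of the define-and-render pattern the paper builds on (the template name is arbitrary):]

    proc template;
       define statgraph hw_scatter;
          begingraph;
             entrytitle 'Weight by Height';
             layout overlay;
                scatterplot x=height y=weight;
                regressionplot x=height y=weight;   /* overlay a fit line */
             endlayout;
          endgraph;
       end;
    run;

    proc sgrender data=sashelp.class template=hw_scatter;
    run;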


G06 : Seamlessly Delivering Web Based Information to an Organization
D.J. Penix, Pinnacle Solutions, Inc.
Tuesday, 9:30-10:20am, 401


The SAS® Enterprise BI Server and Futrix platforms provide a complete portfolio of business intelligence capabilities and apply the power of SAS analytics and data integration to create easy-to-use solutions. This presentation will demonstrate how organizations can quickly build, query, and view reports in a web-based environment. The applications can also integrate with JMP®, a data visualization tool from SAS. You will also see how metadata within the system can be leveraged to deliver advanced security, audit tracking, and other optimization features that can greatly assist organizations faced with regulatory or other external challenges.


G07 : A Guided Tour of ODS Graphics
Sanjay Matange, SAS Institute Inc.
Monday, 8:00-8:50am, Room 401


You may be asking "What is ODS Graphics, and what's in it for me?" First, ODS Graphics means easy, high-quality graphics for all SAS users. With SAS 9.2, many SAS/STAT®, SAS/QC®, and Base SAS® procedures produce relevant, high-quality graphs with one simple option. You can make minor edits and enhancements to these graphs by using the interactive ODS Graphics Editor. No knowledge of SAS/GRAPH syntax is required so far.
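[Editor's note: the "one simple option" is the ODS GRAPHICS statement; for example (a generic sketch, not from the presentation), a fit plot and diagnostics appear automatically:]

    ods graphics on;

    proc reg data=sashelp.class;
       model weight = height;   /* graphs produced automatically */
    run;
    quit;

    ods graphics off;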

For the SAS/GRAPH® user, you can use the Statistical Graphics (SG) procedures to create graphs for the pre-analysis stage, or to visualize the results of your own custom analysis. For users who prefer an interactive tool to create graphs, ODS Graphics Designer is the application of choice. You can create your graph interactively for immediate use or to run in a batch process with variable substitution. Advanced graph users who need control over all aspects of the graph can use the Graph Template Language (GTL) directly. GTL forms the underpinnings of all the graphs mentioned above.

This presentation is a guided tour of this exciting arena with examples and code. It will provide you the information you need to decide which tool is right for you.


JMP®

J01 : Improving Insurance Loss Ratios: Using JMP and SAS to See the Solution
Sam Gardner, SAS/JMP
Monday, 10:00-10:50am, Huron


This presentation will explore an insurance industry case study where unexplained variability in loss ratios across an insurance company's branches has led to reduced profitability. JMP and SAS are used together to collect the data and explore the causes of this variability. Solutions to improve profitability are found using visually interactive graphs and solid statistical analysis. A routine reporting tool developed in SAS and JMP tracks the impact of these solutions.


J02 : Application of JMP Custom Design Platform to Optimize a Crystallization Process for Competing Responses
Roger Norris, Eli Lilly and Company
Monday, 8:00-8:50am, Huron


Elanco’s manufacturing plant at Clinton, Indiana, produces a variety of finished products that are used as feed additives in the Poultry, Beef and Pork industries. During 2008, a specific concentrated product began to experience rapidly increasing demand. This product has a variety of forms with differing quality requirements. The process has both mechanical and chemical features likely to produce changes in quality and throughput. Optimization of both quality and throughput at the same level of input conditions was quite unlikely. To understand and find a region to control this process allowing for rapid production with excellent quality control, two experiments were utilized. The first experiment utilized the JMP custom design to produce an efficient, yet powerful experiment to stretch variables in a pilot facility and map potentially competing responses (throughput and quality predictors). The goal was to find a region where both requirements could be satisfied. The second experiment employed a more traditional design within the existing operating instructions to confirm the pilot results and test for robustness. The result was the implementation of a revised set of operating instructions for this specific form of product. The revised settings maintained a highly capable product with respect to quality specifications at substantially increased throughput.


J03 : Drive Better Decisions with Market Information: Technology Forecasting
Heinz Plaumann, BASF
Monday, 3:00-3:50pm, Huron


We are often involved in forecasting and trend analyses when planning business growth. This includes such elements as market growth, planning for capital investments, acquisitions and divestitures, and examining probabilities for our profitability and return on investment. Often, technology forecasting is neglected in this analysis. Such forecasting may be based strongly on life cycle analysis: When will our current product offering become obsolete? When is the correct time to undertake an R&D project and bring something new to market? Strategically, do we want to be first to market or a “fast follower”, or perhaps buy technology rather than develop it? The Fisher-Pry model for forecasting has proven useful to us in a number of areas. A simple explanation of the model is given, along with examples of general public interest as well as some specific to our business interest in the chemical industry. The model is fairly simple to use, especially in its linearized form, and gives us information about the rate at which a new product or technology is replacing the old, as well as an estimated time for “half replacement”, when the new has successfully replaced half of the old. Problems and limitations of the model are also given.


J04 : Choice Experiments for Market Research and Other Features in JMP® 8
John Sall, SAS Institute Inc.
Monday, 11:00-11:50am, Huron


Choice (or Conjoint) experiments are one of the principal tools of market research. Now that you can easily implement them on the Web, they should be more accessible for testing even routine engineering decisions. Yet, most change initiatives for product innovation like Design for Six Sigma never get as far as testing feature decisions on customers. Design considerations such as polar factor constraints, factor change limits, and multiple-branch surveys are shown. Analytic features such as Firth estimates, segmentation, subject-side effects, conditional profile optimization, and simulation are also covered.


J05 : Data-Driven Story-Telling: Showcasing Visualization and Analytic Techniques with SAS® and JMP®
Jon Weisz, SAS
Monday, 2:00-2:50pm, Huron


When faced with data that is complex due to many records, columns, or both, the tendency is to use statistical techniques to reduce data to models and statistics. This makes the results hard for non-statisticians to understand and can also lead to missing “opportunities” in the data. This talk will focus on a data story using US housing value data from Freddie Mac to show how JMP® and SAS® can work well together to move from data to information to insights to communication of results to the non-statistical world.


J06 : Retention Modeling and Understanding the Lifetime Value of Your Insurance Customers
Mike Sweeney, Elite Technology Solutions
Sam Gardner, SAS Institute – JMP Team
Monday, 9:00-9:50am, Huron


This discussion will focus on the importance of understanding the retention potential of your customers and resulting Lifetime Value to your company. Accurate predictions of retention, at inception, can help guide refined pricing decisions, balance expense allocation, and assist in targeted marketing campaigns. Further, being able to project which types of customers are likely to leave in the near future allows for focused attention on critical problem areas. Discussion will include business issues and tools available in SAS and JMP to help structure solutions to this universal business problem.


Pharmaceutical and Healthcare Applications

P02 : Obtaining the Patient Most Recent Time-Stamped Measurements
Yubo Gao, University of Iowa Hospitals and Clinics
Tuesday, 1:00-1:50pm, Salon A


Each time a patient visits the clinic, the clinic takes several measurements, and the measurements usually vary from visit to visit. Researchers are sometimes interested in the most recent time-stamped measurements even though they were taken at different times. Previous papers have tried to obtain the most recent measurements, but failed to report the associated times. This paper corrects that shortcoming by adding the corresponding date to each most recent measurement, which conveys more information than the measurements alone.
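[Editor's note: a common core of such a solution — a generic sketch with hypothetical data set and variable names, not necessarily the paper's code — sorts by patient and date and keeps the last record per patient, so the date travels with the measurement:]

    proc sort data=visits;
       by patient_id measure_date;
    run;

    data latest;
       set visits;
       by patient_id;
       if last.patient_id;   /* most recent record, with its date, per patient */
    run;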


P03 : The Impact of the Food Safety Information on U.S. Poultry Demands
Lijia Mo, Kansas State University
Tuesday, 2:00-2:50pm, Salon A


The impact of poultry product recall events on consumer demand in the USA was tested empirically for the four major categories of poultry: broiler (young chicken), eggs, turkey, and other chicken (mature or non-broiler chicken). FSIS recalls and media-reported recalls impacted only turkey demand, demonstrating that turkey consumers are a distinct behavioral group among poultry consumers.


P05 : Gout Analysis
Sireesha Ramoju, University of Louisville
Tuesday, 10:00-10:20am, Salon A


The purpose of this project is to examine the metabolic disorder called gout. This disease is caused by a buildup of uric acid crystals deposited on the articular cartilage of joints, tendons, and surrounding tissues. Medical treatment for gout usually involves short-term treatment and long-term treatment. Kernel density estimation is used to examine the distributions of the disease, medication, and cost. Logistic regression and linear regression analyses are used to find the demographic factors that affect gout. The data are from MEPS (http://www.meps.ahrq.gov). SAS software is used for the kernel density estimation, logistic regression, and linear regression analyses. The study shows that gout mostly affects men in the 40-70 age group and women after menopause; that the majority of gout patients use Allopurinol as a long-term treatment; and that those diagnosed with digestive disorders have a greater chance of having gout.


P06 : Relationship between Digestive and Psychological Disorders
Pedro Gregorio Ramos, University of Louisville
Tuesday, 3:30-3:50pm, Salon A


Objective: to find relationships between diagnoses related to the digestive system and psychological conditions in order to determine patient profiles that could lead to better treatments and service utilization. Method: The data set for the year 2005 was obtained from the Nationwide Inpatient Sample (NIS), the largest all-payer inpatient care database in the United States, which contains data on 7,995,048 hospital stays from approximately one thousand hospitals. A random ten percent sample was drawn with a sampling module from SAS Enterprise Miner. This sample was filtered to the records of patients having at least one digestive condition using a SAS query module on the variable DRG (Diagnosis-Related Group). Another SAS code module was used to create the binary variable GI to distinguish these records from those not related to digestive conditions. Then, this subset was filtered to identify records containing a psychological condition as a diagnosis; such records were marked with another binary variable, PSY. SAS Enterprise Guide was used to visualize the data. SAS kernel density estimation was used to compute the distributions of length of stay and total charges by the presence of psychological conditions, given that a digestive condition was the reason for the hospital visit. Finally, SAS code was used to run log-linear regression on these data sets. Results: the proportion of patients with a psychological condition given that a digestive condition was first diagnosed is 8%; the proportion of patients with a digestive condition given that a psychological disorder was first diagnosed is 10%. Similarly, the proportion of patients having both conditions is 10%. Conclusion: the proportion of patients having both kinds of conditions is relatively small, indicating that further studies with a more precise focus should be conducted to determine if a relationship exists between digestive and psychological disorders.


P08 : Analysis of Breast Cancer and Surgery as Treatment Options
Beatrice Ugiliweneza, University of Louisville
Tuesday, 3:00-3:20pm, Salon A


In this paper, we analyze breast cancer cases from the MEPS, the NIS, and the Thomson Medstat MarketScan® data using SAS and Enterprise Guide 4. First, we find breast cancer cases using ICD9 codes. We are interested in the age distribution, the total charges of the entire treatment sequence, and the length of stay at the hospital during treatment. Then, we study two major surgery treatments: mastectomy and lumpectomy. For each, we analyze the total charges and the length of stay. Then we compare these two treatments in terms of total charges, length of stay, and the association of the choice of treatment with age. Finally, we analyze other treatment options. The objective is to understand the methods used to obtain useful information about breast cancer and also to explore how SAS and Enterprise Guide 4 can be used to examine specific healthcare problems.


P09 : Analysis of Emergency Room Waiting Time in SAS®
Brent Wenerstrom, University of Louisville
Tuesday, 9:30-9:50am, Salon A


Background: Life and death may be on the line for patients visiting an emergency department (ED). The time it takes a patient to see a doctor can be critical. This is made more difficult by the fact that waiting times have been increasing over the past ten years. Objective: We would like to determine what factors impact current waiting times in emergency departments through the use of Enterprise Guide®. Methods: We used survey data from the 2006 Ambulatory Health Care Data survey conducted by the National Center for Health Statistics (NCHS), containing about 36,000 data records. We used linear regression to model our data with the help of PROC GLM. Results: We found that self-reported pain levels did not correlate with waiting time, but that ED prioritization, time of day of the visit, arrival by ambulance, and previous waiting times from the same emergency department did correlate with waiting time.


P10 : High Dietary Glycemic Load is Associated with Increased Risk of Colon Cancer
Svetlana Zelenskiy, Case Western Reserve University
Tuesday, 10:30-11:20am, Salon A


High dietary glycemic load (GL) has been inconsistently associated with the risk of colon cancer in epidemiologic studies. We seek to further clarify this relationship in a population-based incident case-control study. The study sample consisted of 360 incident colon cancer cases and 420 population controls. Cases were recruited through the Kentucky Cancer Registry, and controls were recruited via random digit dialing. Glycemic load was assessed based on a self-administered food frequency questionnaire. On average, the cases had a significantly higher GL (mean = 149.0g, SD = 102.5g) than the controls (mean = 132.3g, SD = 79.1g) (p = 0.0119). In a multivariate unconditional logistic regression model adjusted for age, gender, race, body mass index (BMI), family history of colorectal cancer, and total caloric intake, the odds ratios (ORs) for the second through fifth quintiles of GL, as compared to the bottom quintile of GL intake, were 1.23 (95% CI: 0.78, 1.97), 1.01 (95% CI: 0.63, 1.63), 1.21 (95% CI: 0.75, 1.95), and 1.74 (95% CI: 1.06, 2.85), respectively (p for trend = 0.0618). Our results support the hypothesis that a diet with a high glycemic load increases the risk of colon cancer.


SAS® 101

B01 : A Little Stats Won't Hurt You
Nathaniel Derby, Statis Pro Data Analytics
Monday, 9:00-9:50am, 401


This paper gives an introduction to some basic but critically important concepts of statistics and data analysis for the SAS programmer who pulls or manipulates data, but who might not understand what goes into a proper data analysis. We first introduce some basic ideas of descriptive statistics for one-variable data, and then expand those ideas into many variables. We then introduce the idea of statistical significance, and then conclude with how all these ideas can be used to answer questions about the data. Examples and SAS® code are provided.
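[Editor's note: in the spirit of the paper — a generic sketch using a SASHELP data set, not the paper's own examples — one-variable descriptive statistics and a two-variable association take only a few lines:]

    proc means data=sashelp.heart n mean std min max;
       var cholesterol;          /* one-variable descriptive statistics */
    run;

    proc corr data=sashelp.heart;
       var weight cholesterol;   /* two-variable association */
    run;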


B02 : SAS® Programming Tips, Tricks and Techniques
Kirk Paul Lafler, Software Intelligence Corporation
Monday, 11:00-11:50am, 401


The Base SAS system offers users a comprehensive DATA step programming language, an assortment of powerful PROCs, a macro language that extends the capabilities of the SAS system, and user-friendly interfaces such as SAS Display Manager and Enterprise Guide. This presentation explores a collection of proven tips, tricks and techniques related to effectively using the SAS system and its many features. Attendees will examine keyboard shortcuts to aid in improved productivity; the use of subroutines and copy libraries in the DATA step to standardize and manage code inventories; data summarization techniques; the application of simple reusable coding techniques using the macro language; troubleshooting and code debugging techniques; along with other topics.


B03 : Handy SAS® Procedures to Expand your Analytics Skill Set
Mary MacDougall, National City Corporation, now part of PNC
Monday, 2:00-2:50pm, 401


A good handyman can accomplish a lot with a general-purpose tool like a hammer or screwdriver, but for some projects, it’s critical to have a special-purpose tool like a pipe wrench in your toolbox. PROC REPORT is great for everyday reporting tasks, but a senior analyst should also be familiar with procedures that are effective for specific data preparation and analysis tasks. This paper gives an introduction to my favorite Base SAS and SAS/STAT procedures: UNIVARIATE, RANK, TRANSPOSE, FORMAT, and SURVEYSELECT, with examples from the field of direct marketing. So the next time you need to Decile, Flatten, Bin, Sample or Check for Outliers, you will know which Proc to start with.
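[Editor's note: for instance, the "Decile" task maps directly onto PROC RANK — a sketch with hypothetical data set and variable names:]

    proc rank data=customers out=customers_ranked groups=10 descending;
       var total_spend;
       ranks spend_decile;   /* 0 = top decile with DESCENDING */
    run;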


B04 : Fun with SAS® Date/Time Formats and Informats
Randall Reilly, Covance Clinical Pharmacology
Monday, 3:00-3:50pm, 401


This paper was written as a tutorial for programmers new to SAS who encounter dates and times in a variety of formats. It was also written as a reminder to the more experienced SAS programmer on various ways to convert dates and times to SAS format.
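[Editor's note: the basic conversion pattern the tutorial revolves around is INPUT with an informat to read text into a SAS date, and PUT with a format to write it back out (a minimal, generic sketch):]

    data _null_;
       txt = '10/19/2009';
       dt  = input(txt, mmddyy10.);    /* text -> SAS date value */
       put dt= date9. dt= worddate20.; /* SAS date -> formatted text */
    run;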


B07 : Eliminating Redundant Custom Formats (or How to Really Take Advantage of PROC SQL, PROC CATALOG, and the DATA Step)
Philip A. Wright, University of Michigan
Monday, 4:00-4:50pm, 401


Custom formats are an invaluable asset to the SAS programmer. Their functionality provides for much more than simply a mechanism for explicitly labeling values in a dataset. There can be, however, a major limitation--the data step can accommodate only 4,096 formats at a time. It is unlikely that a SAS programmer would generate this many formats in code, but this is not the only method that generates formats. PROC IMPORT and other third-party data conversion programs may well generate a distinct custom format for every variable in a dataset, and datasets with more than 4,096 variables are not uncommon. Oftentimes, however, these formats can be quite redundant--the same coding scheme was used for many similar variables. Eliminating redundant custom formats may well get you below the 4,096 limit. PROC SQL, PROC CATALOG and the DATA step are the tools that can be used to eliminate the redundant formats.
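[Editor's note: one way to find the redundancy — a sketch of the general idea, not the paper's exact code, assuming custom formats exist in the WORK library — is to dump format definitions with CNTLOUT and build a signature per format; formats sharing a signature are candidates for elimination:]

    proc format library=work cntlout=fmtinfo;  /* one row per format range */
    run;

    proc sort data=fmtinfo;
       by fmtname start;
    run;

    data signatures(keep=fmtname signature);
       set fmtinfo;
       by fmtname;
       length signature $32767;
       retain signature;
       if first.fmtname then signature = '';
       signature = catx('|', signature, catx(':', start, end, label));
       if last.fmtname then output;            /* one signature per format */
    run;

    proc sort data=signatures;
       by signature fmtname;                   /* duplicate signatures sort together */
    run;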


SAS Presents

S01 : Best Practices for Configuring Your I/O Subsystem for Your SAS®9 Applications
Margaret Crevar, SAS Institute Inc.
Leigh Ihnen, SAS Institute Inc.
Tuesday, 9:30-10:20am, Salon G


The increased power of SAS®9 applications allows information and knowledge creation from very large amounts of data. Analyses that used to involve tens to hundreds of gigabytes of supporting data have rapidly grown into the tens of terabytes. This data expansion has resulted in more and larger SAS® data stores. Setting up file systems to support these large volumes of data, as well as ensuring adequate storage space for SAS temporary files, can be very challenging. This paper will present some best practices for configuring the I/O subsystem for your SAS®9 applications, ensuring adequate capacity, bandwidth, and performance to keep your SAS®9 users moving.


S02 : How to Maintain Happy SAS® Users
Margaret Crevar, SAS Institute Inc.
Monday, 10:00-10:50am, Salon G


Today's SAS® environment has high numbers of concurrent SAS processes and ever-growing data volumes. It is imperative to proactively manage system resources and performance to keep your SAS community productive and happy. We have found that ensuring your SAS applications have the proper computer resources is the best way to make sure your SAS users remain happy.


S03 : Introduction to Logistic Regression Using SAS® Software
Bob Derr, SAS Institute Inc.
Tuesday, 1:00-3:50pm, Salon G


Logistic regression is one of the basic modeling tools for a statistician or data analyst. This tutorial focuses on the basic methodology behind logistic regression and discusses parameterization, testing goodness of fit, and model evaluation using the LOGISTIC procedure. The tutorial concentrates on binary response models, but direction for handling ordinal responses is also provided. This tutorial discusses numerous ODS graphics now available with the LOGISTIC procedure, as well as newer features of SAS 9.2 such as ROC comparisons and odds ratios with interactions. The tutorial includes numerous examples.
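[Editor's note: a minimal sketch of the kind of model the tutorial covers — hypothetical data set and variables — with ODS graphics requested for the ROC curve:]

    ods graphics on;

    proc logistic data=trial plots(only)=roc;
       class treatment (param=ref ref='Placebo');
       model response(event='Yes') = treatment age;
    run;

    ods graphics off;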


S04 : The XML Super Hero: An Advanced Understanding of Manipulating XML with SAS®
Richard Foley, SAS Institute Inc.
Paul Kent, SAS Institute Inc.
Tuesday, 10:30-11:20am, Salon G


SAS® provides multiple ways of working with XML data. Understanding the various techniques of using SAS to read and write XML will make you the champion of your team. Such techniques include the use of ODS tagsets, XQuery, XSL, and various taxonomies. This paper will provide examples and look into how SAS uses these tools to help you become an XML superhero.


S05 : Getting from SAS® 9.1.3 to SAS® 9.2: Migration or Promotion
Diane Hatcher, SAS Institute Inc.
Sandy McNeil, SAS Institute Inc.
Monday, 9:00-9:50am, Salon G


If you are running a metadata server in your SAS® 9.1.3 environment, you must upgrade your metadata when you move to SAS® 9.2. There are two possible approaches to take. The first approach is a migration, which will essentially copy your SAS 9.1.3 environment over to SAS 9.2 as part of your SAS 9.2 installation process. The second approach is to make a fresh start by installing SAS 9.2 first, then promoting specific content from SAS 9.1.3 afterwards. This paper will describe the tools, processes, and considerations for each approach. The goal is to help you make an educated assessment to determine which approach will work best for you.


S06 : Dear Miss SASAnswers: A Guide to Efficient PROC SQL Coding
Linda Jolley, SAS Institute Inc.
Jane Stroupe, SAS Institute Inc.
Tuesday, 8:30-9:20am, Salon G


She's back! Advice columnist Ms. SAS® Answers has received more questions from curious users--questions such as "What used all that CPU time?", "Which should I use, MERGE or JOIN?", "Should I use a subquery?", and "Why should I learn SQL anyway?" She will share her answers to help you harness the potential of Structured Query Language.


S07 : Methods, Models, and More: New Analyses Available with SAS/STAT® 9.2
Maura Stokes, SAS Institute Inc.
Robert Rodriguez, SAS Institute Inc.
Tonya Balan, SAS Institute Inc.
Monday, 8:00-8:50am, Salon F


The new release of SAS/STAT® software contains numerous updates and five new experimental procedures. This talk briefly outlines the wealth of enhancements, ranging from dozens of new statistical graphics to new exact tests to a new version of the CALIS procedure, and then focuses on specific examples. Learn about fitting zero-inflated Poisson models with the GENMOD procedure, performing AB/BA crossover design analysis with the TTEST procedure, performing ML estimation for generalized linear mixed models and incorporating the new EFFECT statement in PROC GLIMMIX, and generating and comparing ROC curves with the LOGISTIC procedure. In addition, this talk demonstrates the use of the experimental MCMC procedure for Bayesian analysis and the experimental SEQTEST and SEQDESIGN procedures for group sequential analysis. This talk also describes the many resources available in the documentation and on the Web to get you started with this exciting new release.


S08 : R U There? (Interface to R in IML Studio)
Robert Rodriguez, SAS Institute Inc.
Monday, 11:00-11:50am, Salon G


Learn about the brand new interface to the R language in SAS/IML® Studio.


S09 : CSSSTYLE: Stylish Output with ODS and SAS® 9.2
Cynthia Zender, SAS Institute Inc.
Monday, 8:00-8:50am, Salon G


It has been the standard for most company Web sites to use cascading style sheets (CSS) to standardize Web site look and feel. Now, with ODS and SAS® 9.2, you can design a single CSS file to be used with ODS RTF, PDF, and HTML. In the past, CSS usage was limited to files created with ODS HTML. This paper provides an introduction to the use of the new CSSSTYLE option in SAS 9.2. This option allows you to use CSS style specifications for RTF and PDF files, in addition to HTML files. This paper will include a brief introduction to CSS syntax and some of the features, such as @media CSS sections, that are particularly useful when creating ODS output. Then, all the methods of using CSS with ODS are discussed, with an emphasis on the new CSSSTYLE option. In addition to these topics, a job aid will be provided that outlines the most commonly used CSS properties and property values.
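[Editor's note: usage is a single option on the destination statement; a minimal sketch, where corporate.css is a placeholder file name:]

    ods html file='report.html' cssstyle='corporate.css';
    ods pdf  file='report.pdf'  cssstyle='corporate.css';

    proc print data=sashelp.class noobs;
    run;

    ods _all_ close;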


S10 : Group Sequential Analysis Using the New SEQDESIGN and SEQTEST Procedures
Yuan Yang, SAS Institute Inc.
Tuesday, 8:30-9:20am, Huron


In a fixed-sample clinical trial, data on all individuals are analyzed at the end of the study. In contrast, a group sequential trial provides for interim analyses before completion of the trial. Thus, a group sequential trial is useful for preventing unnecessary exposure of patients to an unsafe new drug or to a placebo treatment if a new drug shows significant improvement. This paper reviews basic concepts of group sequential analysis and introduces two SAS/STAT® procedures: the SEQDESIGN and SEQTEST procedures. Both procedures are experimental in SAS® 9.2. The SEQDESIGN procedure creates group sequential designs by computing boundary values with a variety of methods, including the O'Brien-Fleming, Whitehead, and error spending methods; it also provides required sample size. The SEQTEST procedure compares the test statistic with the boundary values at each stage so that the trial can be stopped to reject or accept the hypothesis; it also computes parameter estimates, confidence limits, and p-values after the trial stops.
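[Editor's note: a flavor of the syntax — a sketch under assumed design parameters, not from the paper — for a four-stage O'Brien-Fleming design with a two-sample-mean sample-size computation:]

    proc seqdesign altref=0.25;
       OBrienFleming: design nstages=4
                             method=obf
                             alt=twosided
                             stop=reject;
       samplesize model=twosamplemean(stddev=1.2);
    run;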


Statistics and Data Mining

D02 : Comparison of Decision Tree to Logistic Regression Model: An Application in Transportation
Xianzhe Chen, North Dakota State University
Tuesday, 9:30-10:20am, Huron


This paper applies a decision tree model and a logistic regression model to a real transportation problem, compares the results of the two methods, and discusses practical model-building and validation procedures. Eight kinds of commodities need to be shipped from elevators located in North Dakota to six different locations in Minnesota by either rail or truck. A decision tree model and a logistic regression model are built for this data set. The response variable is the transportation mode choice, i.e., rail or truck. The decision tree model is implemented in SAS Enterprise Miner, and the logistic model is built with PROC GENMOD. In order to develop the tree model, the data set is partitioned into three data sets for training, validation, and test purposes. The decision tree model is used to score a prospect data set that does not have values of the target variable, predicting the probability of response for every record in the prospect data set. The logistic regression model is also used to score the prospect data set. The results from the decision tree and logistic regression models are compared, and the differences and advantages of each model in application are discussed.


D03 : ARIMA in Time Series
David Corliss, Marketing Associates
Monday, 10:00-10:50am, Salon F


Statistical models using an Auto-Regressive Integrated Moving Average (ARIMA) have found very wide application, especially in time series analysis. However, a time series is not required: only equally spaced intervals for the independent variable. This can be achieved by binning data into standard ranges, such as income in $10,000 intervals. ARIMA models often work well when regression fails: when the data is highly skewed, autocorrelated, or both. In this paper, several NT-ARIMA models are given, including medical, financial, and scientific applications. A macro is given that allows SAS's PROC ARIMA to operate on non-time-based data.
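[Editor's note: the standard identify/estimate/forecast flow in PROC ARIMA looks like this (a generic sketch with hypothetical names); the paper's macro adapts the same flow to non-time-based intervals:]

    proc arima data=monthly;
       identify var=sales(1);    /* first difference; inspect ACF/PACF */
       estimate p=1 q=1;         /* fit an ARIMA(1,1,1) model */
       forecast lead=12 interval=month id=date out=fcst;
    run;
    quit;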


D04 : Using PROC CALIS and PROC CORR to Compare Structural Equation Modeling Based Reliability Estimates and Coefficient Alpha When Assumptions are Violated
Fei Gu, University of Kansas (Lawrence KS)
Todd Little, University of Kansas (Lawrence KS)
Neal M. Kingston, University of Kansas (Lawrence KS)
Monday, 11:00-11:50am, Salon H


PROC CORR is widely used to calculate Cronbach's alpha, which has been described as a lower bound for test reliability. However, previous research has shown that when certain assumptions are violated, coefficient alpha can overestimate or underestimate reliability. Raykov has shown that structural equation modeling can be used to estimate reliability. This research illustrates Raykov's SEM method in PROC CALIS and shows that under certain violations of assumptions, coefficient alpha estimates can show a substantial positive bias in some extreme circumstances, and that the magnitude of bias of coefficient alpha estimates is larger than that of structural equation modeling-based reliability estimates.
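[Editor's note: the coefficient alpha side of the comparison is the familiar one-liner — a generic sketch with hypothetical item names; the SEM-based estimate requires a PROC CALIS model specification beyond this sketch:]

    proc corr data=survey alpha nomiss;
       var item1-item10;   /* Cronbach's alpha for a 10-item scale */
    run;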


D05 : A SAS® Macro to Compute Added Predictive Ability of New Markers in Logistic Regression
Kevin Kennedy, St. Luke's Hospital
Michael Pencina, St. Luke's Hospital
Monday, 9:00-9:50am, Salon F


In many applications it is imperative to assess whether a variable (or several variables) improves the performance of a logistic regression model. Statistical significance does not imply clinical significance or meaningful improvement in model performance, and thus cannot suffice. Traditionally, comparison of areas under the receiver operating characteristic curves (AUCs) has been used to evaluate how a new marker improves model performance. However, this metric is not very meaningful, and very large independent associations of the new marker with the outcome may be needed to produce a meaningful change in AUC. To address this issue, Pencina and D’Agostino in their 2008 Statistics in Medicine paper propose two new statistics that can be used to evaluate the importance of new markers. The Integrated Discrimination Improvement, or IDI, is a measure of the new model’s improvement in average sensitivity without sacrificing average specificity. The Net Reclassification Improvement, or NRI, measures the correctness of reclassification of people based on their predicted probabilities of events using the new model, with the option of imposing meaningful risk categories. A related approach based on the concept of net benefit has been proposed by Vickers et al. and allows for a graphical representation of the gain incurred using the new model. This paper discusses these new measures and then presents an easy-to-use SAS macro that outputs the traditional comparison (AUC) and the new measures (IDI, NRI, and Vickers’ Decision Curve).
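[Editor's note: for the traditional AUC comparison — one piece of what the macro reports — SAS 9.2's ROC and ROCCONTRAST statements in PROC LOGISTIC can be used directly; a sketch with hypothetical variables:]

    proc logistic data=study;
       model event(event='1') = old1 old2 newmarker / nofit;
       roc 'Base model'       old1 old2;
       roc 'Base + newmarker' old1 old2 newmarker;
       roccontrast reference('Base model');   /* test the change in AUC */
    run;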


D09 : Mail Merge using SAS®
Michael Stout, DePuy Orthopaedics, Inc.
Monday, 4:00-4:50pm, Salon F


Have you ever wanted to add a formatted letter to the front of your SAS output? Using the SAS® ODS facility is an excellent method for combining letters with SAS® output. Creating letters in a word processor and then collating them for small to medium mailings can be tedious and time consuming. SAS® ODS, PROC SQL, PROC IMPORT, inline styles, and Microsoft Excel® come to the rescue and make it fairly easy to create standardized letters on the fly. This paper shows one method that may be used to dynamically generate letters while saving time and improving the quality and presentation of the reports being prepared for distribution.
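[Editor's note: the paper's method is ODS-based; as a simpler, plain-text illustration of the per-record idea (recipients and name are hypothetical), the FILEVAR= option writes one letter file per record:]

    data _null_;
       set recipients;
       length letterfile $200;
       letterfile = cats('letters/letter_', put(_n_, z4.), '.txt');
       file out filevar=letterfile;   /* opens a new file for each record */
       put 'Dear ' name +(-1) ',';
       put ;
       put 'Thank you for your continued business.';
    run;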


D10 : Methods for Ranking Predictors in Logistic Regression
Doug Thompson, Assurant Health
Tuesday, 10:30-11:20am, Huron


Logistic regression is often used to develop predictive models of binary outcomes, e.g., buy/no buy, lapse/renew. One of the first things that clients typically want to know about such models is which variables in the predictor set were most strongly associated with the predicted outcome. To answer this question, it is necessary to rank the predictors by some measure of strength of association with the predicted outcome. There is little consensus on how best to rank predictors in logistic regression. This presentation provides several options: Ranking by standardized coefficients, odds ratios, individual c-statistics, adequacy, and pseudo-partial correlations. The methods are illustrated with examples using SAS PROC LOGISTIC. The interpretation of each ranking method is outlined, and upsides and downsides of each method are described.
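[Editor's note: the first of those rankings is directly available from PROC LOGISTIC's STB option — a sketch with hypothetical variables:]

    proc logistic data=buyers;
       model buy(event='1') = income age tenure / stb;  /* standardized estimates */
    run;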


D11 : A Class of Predictive Models for Multi-Level Risks
Wensui Liu, J.P. Morgan Chase
Chuck Vu, Acxiom
Sandeep Kharidhi, Acxiom
Tuesday, 2:00-2:50pm, Huron


In the financial services industry, discriminant analysis and its variants based upon a binary outcome, such as logistic regression or neural networks, are widely used to develop predictive models. However, the two-state assumption of such models over-simplifies customers’ behavioral outcomes and ignores the existence of multi-level risk. In many situations, the financial impact of a given customer is directly related to the frequency and the severity of his or her adverse behaviors. Therefore, it is of interest to model and predict such multi-level risks. Several modeling techniques, from Poisson to ordered logit models, have been widely discussed in the research literature on predicting multi-level risks. This paper is a contribution to that end. Several modeling strategies, together with their SAS implementations and a related scoring scheme, are illustrated. Our purpose is to demonstrate an application of these complex statistical models with a business touch, and how to implement them in a production environment.


D12 : Outcome Research for Diabetic Inpatients by SAS® Enterprise Miner™ 5.2
Xiao Wang, University of Louisville
Monday, 3:00-3:20pm, Salon F


The main purpose of this paper is to evaluate and predict diabetic inpatient outcomes in Medicare. In the study, we used data sets of inpatient claim and beneficiary demographic information for the year 2004, both of which come from the Chronic Condition Data Warehouse. We used the Text Miner node to generate procedure and diagnosis clusters, preparing for kernel density estimation of the total charges, association analysis of the various procedures, and prediction of the outcomes. We also used the link graphs and the rules table in the association analysis, and different kinds of predictive models to analyze the outcomes. We utilized the CATX function to put all possible diagnosis or procedure codes into one text string. We also used the RXMATCH function, random sampling, SAS SQL, and Base SAS. Results show that many organ diseases and neurological disorders raise the costs of inpatient care. Although the expenditures on kidney disease are unexpectedly lower than those spent on diabetes itself, kidney disease has an important effect on the total charges, especially beyond 40,000 dollars. Procedures such as hemodialysis and angiocardiography are frequently used; most procedures related to cardiac diseases are utilized along with other procedures. Another discovery is that only procedures and diagnoses are important in the prediction of mortality and total charges. The utilization day count is highly related to the total charges, and vice versa.


D13 : Using the DATA Step's ATTRIB Statement to both Manage and Document Variables in a SAS® Dataset (lightly)
Philip Wright, University of Michigan
Tuesday, 1:00-1:50pm, Huron


There are several different ways to order variables in a SAS data set. Some of them are quite similar, one is dangerous, and each must be used prior to a SET, MERGE, or UPDATE statement. One of these statements, the ATTRIB statement, can serve the role of several other statements, is not dangerous, and can even serve as light documentation in DATA step code. Generating a complete ATTRIB statement for a large data set, however, can be a daunting task. A macro that generates a complete ATTRIB statement from a previously generated data set can be of great use in managing the data set while the DATA step is further developed. Using the macro to generate a complete ATTRIB statement, and subsequently using the generated statement to order/re-order variables and aid the conversion of variable types, will be demonstrated.
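
A minimal sketch of the technique (variable names and attributes are hypothetical): placing an ATTRIB statement before the SET statement both documents the variables and fixes their order in the output data set.

   data work.clean;
      attrib subjid  length=$10 label='Subject ID'
             visitdt length=8   label='Visit Date'  format=date9.
             weight  length=8   label='Weight (kg)' format=8.1;
      set work.raw;   /* ATTRIB precedes SET, so it controls variable order */
   run;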


D14 : Effective Use of RETAIN Statement in SAS® Programming
James Zhao, Merck
Monday, 2:00-2:50pm, Salon F


The RETAIN statement allows values to be kept across observations, enabling complex data manipulation. In the SAS DATA step it is quite straightforward to compare variables or perform calculations within an observation; sometimes, however, it is necessary to perform calculations across observations. The RETAIN statement keeps a data value from the current iteration of the DATA step to the next; otherwise, SAS automatically sets such values to missing before each iteration. The RETAIN statement is, however, one of those tricky SAS statements that, if not used wisely, can result in unexpected, and often unnoticed, data processing errors. This paper presents four major areas of usage for the RETAIN statement: (1) carrying values over from one observation to another; (2) comparing values across observations; (3) assigning initial values to variables; and (4) arranging the variable order in the output data set. Sample code and tips are provided and discussed.
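
As a hedged illustration of usage areas (1) and (3) above (data set and variable names are hypothetical), a running total by group might look like this:

   data work.totals;
      set work.sales;          /* assumed already sorted by region */
      by region;
      retain run_total 0;      /* value carried across iterations; initialized once */
      if first.region then run_total = 0;
      run_total = run_total + amount;
   run;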


Tutorials and Solutions

T01 : Getting By with a Little Help from My Regular Expressions
Scott Davis, COMSYS
Monday, 8:00-8:50am, Salon A


Much has been written about the power and flexibility of regular expressions. Regular expressions represent an entirely new drawer in the programmer's toolbox, giving you yet another way of attacking a problem and developing a solution. Imagine a rather common scenario in which, during a testing phase, output from the current run is compared to output from the previous run. If updates to the program do not pertain to a particular table, then the output from both runs should be the same. But what about the logs? Excluding things like dates, what happens when changes in the program result in vastly differing logs? Compare utilities cannot adequately match up these logs with the additional information. Enter regular expressions, along with metadata, to help solve the comparison problem. The goal of this paper is to explore this particular application of regular expressions (the comparison of vastly differing logs); it assumes a basic understanding of them.
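
For a flavor of the approach (the file name and pattern are hypothetical, not the paper's metadata-driven solution), run-specific text such as timestamps can be masked before comparison:

   data work.log_norm;
      infile 'run1.log' truncover;
      input line $char256.;
      /* replace every hh:mm:ss timestamp so the two logs line up */
      line = prxchange('s/\d{2}:\d{2}:\d{2}/<TIME>/', -1, line);
   run;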


T02 : Connect with SAS® Professionals Around the World with LinkedIn and sasCommunity.org
Kirk Paul Lafler, Software Intelligence Corporation
Charles Edwin Shipp, Shipp Consulting
Monday, 3:00-3:50pm, Salon A


Accelerate your career and professional development with LinkedIn and sasCommunity.org. Establish and manage a professional network of trusted contacts, colleagues, and experts. These exciting social networking and collaborative online communities enable users to connect with millions of SAS users worldwide, anytime and anywhere. This paper explores exciting features found in both virtual communities. Topics include creating a profile and social network content, developing a network of friends and colleagues, joining special-interest groups, accessing a Wiki-based web site where anyone can add or change content on any page, sharing biographical information between both communities using a built-in widget, exchanging ideas in Bloggers Corner, viewing scheduled and unscheduled events, using a built-in search facility to find desired wiki content, collaborating on projects and file sharing, reading and responding to specific forum topics, and much more.


T03 : Using Base SAS® and SAS® Enterprise Miner™ to Develop Customer Retention Modeling
Rex Pruitt, PREMIER Bankcard, LLC
Monday, 2:00-2:50pm, Salon A


In this paper I describe how to develop the components necessary, using SAS tools and business analytics, to effectively identify a “Good Customer.” Objective (target): this “Good Customer Score” will be used in modeling exercises designed to improve the cost effectiveness and development of retention efforts at PREMIER. Estimated opportunity value: reducing the attrition of PREMIER’s “Top Good Customers” with two or more years on book is worth an estimated $15+ million annually (see Table 2). Recommendation: add the “Good Customer Score” (see the definition matrix in Table 1) to the data warehouse and begin using it to develop and implement specific targeted retention strategies. Portfolio scoring and ranking: the accuracy of the new “Good Customer Score” is supported by its statistical correlation with the Behavior Score (a third-party score), as well as other scores, in identifying those customers who will perform in the top 25% of the portfolio when ranked by Good Customer Score. Interestingly, the Thindex score (another third-party score) has the strongest correlation, as noted in the chi-square table (see Table 3); this was not an expected outcome at first glance. Additionally, the score comparison exercise, performed using models built in SAS Enterprise Miner, was validated with a KS statistic greater than 46 and a prediction accuracy above 81%.


T04 : DOs and DON’Ts of Generating Performance Metrics
Michael Frick
Monday, 4:00-4:50pm, Salon A


The author draws on 10 years of working experience with both executive leadership and operational groups to provide a practical list of do’s and don’ts for generating performance metrics that will drive the desired improvements. Topics covered include (1) the need to separate process improvement metrics (owned by planning) from process execution metrics (owned by operations), (2) the need for alignment of metrics with organizational boundaries and operational job responsibilities, (3) the balancing act between too many and too few metrics on a scorecard, (4) the dangers of moving to an automated electronic scorecard system, and (5) why medians are a “don’t.”


T05 : How to Recruit SAS® Programmers
David Polus, COMSYS Clinical
Monday, 10:00-10:20am, Room 401


Consulting in the Clinical Programming world is more competitive than ever. The demand for a good programmer with clinical knowledge is always increasing. At times, it seems the ideal candidate for a position doesn’t exist anywhere. What makes clinical programming so different, anyway? This paper will review how to recruit, identify and hire the best candidates for the position. Once they’re in the fold, mentoring and training become the keys to long term retention. Personal growth and a sense of belonging ensure a happy consultant, and a happier client. Finally, there are times when you have to perform damage control with a particular consultant. A well designed and executed Action Plan can help strengthen your relationship with both the client and the consultant.


T06 : Making Your LinkedIn Profile Effective
Michael Mina
Tuesday, 8:30-9:20am, Salon F


LinkedIn is the primary social networking tool for career-driven professionals. In today’s knowledge economy, it is an essential part of any personal branding strategy. Professionals use LinkedIn to network with each other and with recruiters, to find job openings only available to LinkedIn members, and to share their knowledge with others, enhancing their own reputations. This presentation will focus on those features available in a free LinkedIn membership. In this presentation, participants will learn how to use LinkedIn as part of a comprehensive career-driven social networking strategy.


T07 : Getting Correct Results from PROC REG
Nathaniel Derby, Statis Pro Data Analytics
Tuesday, 9:30-10:20am, Salon F


PROC REG, SAS®'s implementation of linear regression, is often used to fit a line without checking the underlying assumptions of the model or understanding the output. As a result, we can sometimes fit a line that is not appropriate for the data and get erroneous results. This paper gives a brief introduction to fitting a line with PROC REG, including assessing model assumptions and the output that tells us whether our results are valid. We then illustrate how one kind of data (time series data) can sometimes give us misleading results even when the model diagnostics appear to indicate that the results are correct. A simple method is proposed to avoid this.
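
One such diagnostic for time series data (a sketch with hypothetical data set and variable names, not necessarily the paper's proposed method) is the Durbin-Watson test for autocorrelated residuals:

   proc reg data=work.ts;
      model sales = adspend / dw;      /* DW statistic flags autocorrelation */
      output out=work.diag r=resid;    /* residuals for further checks */
   run;
   quit;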


T08 : Things Dr. Johnson Did Not Tell Me: An Introduction to SAS® Dictionary Tables
Peter Eberhardt, Fernwood Consulting Group
Tuesday, 1:00-1:50pm, Salon F


SAS maintains a wealth of information about the active SAS session, including information on libraries, tables, files, and system options; this information is contained in the Dictionary Tables. Understanding and using these tables will help you build interactive and dynamic applications. Unfortunately, the Dictionary Tables are often considered an ‘advanced’ topic for SAS programmers. This paper will help novice and intermediate SAS programmers get started on their mastery of the Dictionary Tables.
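
For example, a quick PROC SQL query of one dictionary table (the selected columns are illustrative):

   proc sql;
      select memname, nobs, nvar
         from dictionary.tables
         where libname = 'WORK';   /* libname values are stored in uppercase */
   quit;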


T09 : Where Does This WHERE Go?
Scott Davis, COMSYS
Tuesday, 10:30-11:20am, Salon F


The clever folks at SAS® have given the user community a variety of ways to accomplish tasks. Being the creative bunch that we are, SAS® programmers have yet to tire of discovering and using those different paths to get to the same end. In that spirit, this paper focuses on a very popular SAS® statement, the WHERE statement, and the areas that can pose a problem when it is combined with PROC SORT. In a typical clinical study (read: all of them) there will be adverse events, and programmers will often find themselves validating an adverse event table that either they created or someone else did. We all know the power of the NODUPKEY option in PROC SORT, and countless experts have forewarned about the dangers that come with its use. For those of us who have heeded those warnings and proceeded with care, it turns out that another danger awaits when using a subsetting WHERE clause in a PROC SORT. This paper explores the placement of a WHERE statement in a PROC SORT on input and on output, and how very different numbers can result depending on your data.
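
A minimal sketch of the contrast (data set and variable names are hypothetical, loosely following CDISC-style adverse event data):

   /* WHERE= on the INPUT data set: only serious AEs are deduplicated */
   proc sort data=ae (where=(aeser='Y')) out=work.pre nodupkey;
      by usubjid aedecod;
   run;

   /* WHERE= on the OUTPUT data set: the full data set is deduplicated first,
      and the surviving record per key may be non-serious and get filtered out */
   proc sort data=ae out=work.post (where=(aeser='Y')) nodupkey;
      by usubjid aedecod;
   run;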


T10 : The SAS® DATA Step: Where Your Input Matters
Peter Eberhardt, Fernwood Consulting Group
Tuesday, 2:00-2:50pm, Salon F


Before the warehouse is stocked, before the stats are computed and the reports run, before all the fun things we do with SAS® can be done, the data need to be read into SAS. A simple statement, INPUT, and its close cousins FILENAME and INFILE, do a lot. This paper will show you how to define your input file and how to read through it, whether you have a simple flat file or a more complex formatted file.
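
A minimal sketch of the three statements working together (the file name and layout are hypothetical):

   filename rawdat 'patients.csv';                    /* point to the external file */

   data work.patients;
      infile rawdat dlm=',' firstobs=2 truncover;     /* skip header, tolerate short lines */
      input subjid $ visitdt :date9. weight;
      format visitdt date9.;
   run;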


T11 : Decision Making with Uncertain Data Using PROC OPTMODEL
Joseph Czyzyk, Central Michigan University Research Corporation
Tuesday, 3:00-3:50pm, Salon F


How do you plan for an uncertain future? A common approach to this problem is to generate a set of possible outcomes for the future, commonly called scenarios. Each scenario has some probability of occurrence, and the decision-maker is responsible for using the scenarios to make a wise decision in the current period in order to be ready for the future. If the decision problem can be formulated as a linear program, the uncertainty can be captured in the model using a stochastic programming formulation. We will present how to use PROC OPTMODEL to write the linear programming formulation of a planning problem and show how scenarios can be written to capture the uncertainty in future periods of the planning horizon. The stochastic program will be formulated and solved using PROC OPTMODEL. We will show how optimizing over all of the future scenarios at once, by minimizing the expected cost across scenarios, provides a better solution than optimizing over each scenario individually. The stochastic solution is, in some sense, the best "hedge" against future uncertainties.
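
As a hedged sketch of the formulation style (a toy newsvendor-type problem with hypothetical numbers, not the paper's planning model), a two-stage stochastic LP in PROC OPTMODEL might look like this:

   proc optmodel;
      set SCEN = 1..3;                     /* three demand scenarios */
      num prob{SCEN}   = [0.3 0.5 0.2];    /* scenario probabilities */
      num demand{SCEN} = [80 100 120];
      num cost = 5, price = 8, salvage = 2;

      var Order >= 0;                      /* first-stage decision, made now */
      var Sell{SCEN} >= 0;                 /* per-scenario recourse */

      con Capacity{s in SCEN}: Sell[s] <= Order;
      con DemandCon{s in SCEN}: Sell[s] <= demand[s];

      /* maximize expected profit across all scenarios at once */
      max ExpProfit = sum{s in SCEN} prob[s] *
                      (price*Sell[s] + salvage*(Order - Sell[s]))
                      - cost*Order;
      solve;
      print Order Sell;
   quit;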