Use the COMPARE= option to specify the name of the second dataset.
Finish and execute the procedure with the RUN statement.

This is what the steps above look like in SAS code:

    proc compare base=base-dataset compare=comparison-dataset; run;

Below, we give an example of how to use PROC COMPARE and explain the information it generates. For this purpose, we create two datasets, work.my_first_dataset and work.my_second_dataset.

PROC COMPARE creates a report with six sections. We will discuss each section in further detail.

Remember, PROC COMPARE compares the values of matching observations. Because we didn't specify how to match observations, SAS matches observations by rows. In other words, it compares the values in the first row of the base dataset with the values in the first row of the comparison dataset. Then, it compares the values of the second row in both datasets, etc. (Later, we will discuss how to specify matching observations.)

Data Set Summary

The Data Set Summary shows you which two datasets were taken into account and compares their metadata. It summarizes when the datasets were created and modified, the number of variables, the number of observations, and the dataset labels.

The image above shows that work.my_first_dataset has four observations, whereas work.my_second_dataset has five. Because the number of observations in both datasets differs, SAS only compares the values of the common variables in the first four rows. There are two rows that are identical, taking into account only the common variables. These are the first two rows of our datasets. The other two rows have one or more variables with unequal values.

The Value Comparison Summary summarizes the comparison of values for all observations. In other words, it compares the values of all observations for each variable in both datasets. It tells you both the number of variables with all values equal and the number of variables with one or more differences. The summary also specifies the number of differences found per variable (Ndif). If the variable is numeric, then this summary also shows the biggest difference between a pair of compared values (MaxDif). This summary also reports the total number of compared values with differences.

By default, PROC COMPARE compares the values in the base dataset and the comparison dataset based on the order in which they appear. That is to say, SAS compares the first row of the base dataset with the first row of the comparison dataset, then the second row of the base dataset with the second row of the comparison dataset, and so on.

On many occasions, the datasets you want to compare have a common variable that acts as an ID, and you want to compare observations with the same ID. Examples are a customer ID, a product number, or a date. However, if your datasets aren't ordered, or if not all IDs are present in both datasets, then a row-by-row comparison will generate incorrect results. So, how can you compare datasets based on a common ID?

With the PROC COMPARE procedure and the ID statement, SAS compares observations from two datasets based on one or more common variables. These common variables act as an ID for the observations. This way, it is not necessary to merge the two datasets before you can compare them.

Below we show an example of how to use PROC COMPARE in combination with the ID statement. Remember that the variables that act as the ID variables must have the same name in both datasets. We have created two tables and we will use the variable my_ID in the ID statement.

How to Use the METHOD and CRITERION Options in PROC COMPARE

In the example above, a couple of differences were very small. By default, SAS shows all differences, but sometimes you are only interested in differences that surpass a certain threshold. So, how can you set a threshold in PROC COMPARE when you check for differences?
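As a rough illustration of the techniques this article discusses (matching observations with the ID statement and setting a threshold with the METHOD= and CRITERION= options), here is a minimal sketch. The contents of the two datasets are hypothetical and are not the article's original data: the variable my_value and all values are assumptions made for illustration. Only the dataset names, the variable my_ID, and the observation counts (four and five) come from the text.

```sas
/* Hypothetical sample data; my_value and all values are illustrative only */
data work.my_first_dataset;
    input my_ID my_value;
    datalines;
1 10.00
2 20.00
3 30.00
4 40.00
;
run;

data work.my_second_dataset;
    input my_ID my_value;
    datalines;
1 10.00
2 20.001
4 45.00
3 30.00
5 50.00
;
run;

/* The ID statement generally expects both datasets to be sorted by
   (or indexed on) the ID variable, so sort them first */
proc sort data=work.my_first_dataset;
    by my_ID;
run;

proc sort data=work.my_second_dataset;
    by my_ID;
run;

/* Match observations on my_ID instead of on row position */
proc compare base=work.my_first_dataset
             compare=work.my_second_dataset;
    id my_ID;
run;

/* Same comparison, but treat values as unequal only when their
   absolute difference exceeds 0.01 */
proc compare base=work.my_first_dataset
             compare=work.my_second_dataset
             method=absolute criterion=0.01;
    id my_ID;
run;
```

With METHOD=ABSOLUTE, two numeric values are judged unequal only when the absolute value of their difference is greater than the CRITERION= value. In this sketch, that should suppress the small difference for my_ID 2 while still reporting the larger one for my_ID 4. The threshold of 0.01 is an arbitrary choice for this example; pick one that suits the precision of your own data.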