Use arrays to fill in missing values
Suppose we know (or suspect) that an age variable for some observations may be missing. We could use the following code to “fill in” the missing years.
The first statement uses the NMISS and DIM functions to check whether the array contains any missing values. NMISS returns the number of missing values among its arguments, while DIM returns the number of elements in the array. If there are no missing values, there is nothing to do; if every value is missing, there is nothing to fill from. Otherwise, we fill in the missing values.
The variables lastgood and nextgood hold the positions of the last and next non-missing values in the array. The “do i=first to last” loop processes the entire array (first and last are assumed to hold the array's lower and upper bounds, set elsewhere in the data step). As we check each value in the array, if the age is not missing, we assign the variable lastgood the value of i. This marks the position in the array containing the last non-missing value. If the first element or elements of the array are missing (lastgood equals 0), we continue checking the elements of the array until we find a non-missing value, then work back toward the beginning of the array and replace the missing values.
To do this, we use another index variable, k, to keep track of our place in the array. When we find the first non-missing value, we assign that index value to the nextgood variable. We then start with the previous array element and go backwards, setting each missing value to the following year's age minus 1.
If the missing value is not at the beginning of the array, we simply add 1 to the age for the previous year.
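The check described above can be tried on its own. The following is a minimal sketch, assuming a shortened array (age75-age80) and one made-up record; the data set name counts and the variables n_missing and n_elements are hypothetical and used only for illustration.

```sas
data counts;
   /* Hypothetical record: six ages, two of them missing */
   input age75-age80;
   array age{75:80} age75-age80;

   n_missing  = nmiss(of age75-age80);  /* number of missing values */
   n_elements = dim(age);               /* number of array elements */

   /* Same test as in the text: some, but not all, values are missing */
   if (n_missing ne 0) and (n_missing ne n_elements) then
      put 'Some values need filling in: ' n_missing= n_elements=;
   datalines;
30 . 32 . 34 35
;
run;
```

For this record NMISS counts 2 missing values and DIM reports 6 elements, so the condition is true and a note is written to the log.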
if (nmiss(of age75-age90) ne 0) and
   (nmiss(of age75-age90) ne dim(age)) then do;
   lastgood = 0;
   nextgood = 0;
   do i = first to last;
      if (age{i} ne .) then lastgood = i;
      else do;
         if lastgood eq 0 then do;
            k = i + 1;
            do until (nextgood ne 0);
               if age{k} eq . then k = k + 1;
               else do;
                  nextgood = k;
                  do fix = nextgood-1 to first by -1;
                     age{fix} = age{fix+1} - 1;
                  end;   /* DO loop to fill in missing values */
               end;      /* ELSE age{k} not missing */
            end;         /* DO UNTIL */
         end;            /* lastgood eq 0, i.e. first value missing */
         else age{i} = age{i-1} + 1;
      end;               /* ELSE age{i} missing */
   end;                  /* DO first to last */
end;                     /* Missing and non-missing values found in array */
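To experiment with the same fill-in rules on sample data, a shorter two-pass variant may be easier to follow. This is a sketch under stated assumptions, not the code above: it uses a shortened array (age75-age80), made-up records, a hypothetical output data set ages_filled, and sets first and last with the LBOUND and HBOUND functions. A forward pass adds 1 to the previous year's age, and a backward pass then fills any leading gaps by subtracting 1 from the following year's age.

```sas
data ages_filled;
   input id age75-age80;
   array age{75:80} age75-age80;

   first = lbound(age);   /* lower bound of the array, 75 here */
   last  = hbound(age);   /* upper bound of the array, 80 here */

   /* Only act when some, but not all, values are missing */
   if 0 < nmiss(of age{*}) < dim(age) then do;
      /* Forward pass: missing age after a known age = previous age + 1 */
      do i = first + 1 to last;
         if age{i} eq . and age{i-1} ne . then age{i} = age{i-1} + 1;
      end;
      /* Backward pass: leading missing ages = following age - 1 */
      do i = last - 1 to first by -1;
         if age{i} eq . and age{i+1} ne . then age{i} = age{i+1} - 1;
      end;
   end;

   drop i first last;     /* helper variables are not needed afterwards */
   datalines;
1 . . 32 . 34 .
2 40 41 . . . 45
;
run;
```

With these two records, the first row becomes 30 31 32 33 34 35 and the second becomes 40 41 42 43 44 45.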
Another example:
The next piece of code works with the fuel array.
array fuel{75:90} fuel75-fuel90;
If there are missing values in this array, they will be replaced according to these rules:
1. If the missing values are at the beginning of the array, they will be assigned the first non-missing value.
2. If the missing values are at the end of the array, they will be assigned the last non-missing value.
3. If the missing values are preceded and followed by non-missing values, they will be replaced by the average of the preceding and following values.
The processing for cases 1 and 2 is similar to that for the age array. We start at the beginning and/or end and check the elements of the array until we find a non-missing value. At that point, we use the index variable that has been marking our position in the array to control a loop which fills in the missing values.
To process any missing values in case 3, we keep the index of the last element that was not missing, proceed across the array until the next non-missing element is found, average the two values, and then use the two index positions to replace the missing values between them.
Please note that DO loops can be nested within DO loops when processing arrays. However, it is very important that each loop use its own distinct index variable. In addition, a DROP statement should be used in the data step so that variables used only to keep track of positions in the array, or to hold intermediate calculations, are not kept in the data set once the data step is completed.
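As a small illustration of these two points, the sketch below nests one DO loop inside another with distinct index variables (i and j) and drops both at the end of the step. The data set name totals, the arrays q and r, and the sample values are all hypothetical.

```sas
data totals;
   input q1-q3 r1-r3;
   array q{3} q1-q3;
   array r{3} r1-r3;

   total = 0;
   do i = 1 to dim(q);          /* outer loop uses index i */
      do j = 1 to dim(r);       /* inner loop uses a different index, j */
         total = total + q{i} * r{j};
      end;
   end;

   drop i j;                    /* keep the loop indexes out of the output */
   datalines;
1 2 3 4 5 6
;
run;
```

If the inner loop reused i, it would overwrite the outer loop's position and the outer loop would not visit every element. Here total ends up as 90, and the output data set contains q1-q3, r1-r3, and total but neither index variable.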
if (nmiss(of fuel75-fuel90) ne 0) and
   (nmiss(of fuel75-fuel90) ne dim(fuel)) then do;

   /* Case 1: missing values at the beginning of the array */
   lastval = 0;
   nextval = 0;
   if fuel{first} eq . then do;
      i = first + 1;
      do while (nextval = 0);
         if fuel{i} ne . then do;
            nextval = i;
            do j = i-1 to first by -1;
               fuel{j} = fuel{j+1};
            end;            /* DO j loop */
         end;               /* fuel{i} ne . */
         else i = i + 1;
      end;                  /* DO WHILE */
   end;                     /* First element missing */

   /* Case 2: missing values at the end of the array */
   lastval = 0;
   nextval = 0;
   if fuel{last} eq . then do;
      i = last - 1;
      do while (lastval = 0);
         if fuel{i} ne . then do;
            lastval = i;
            do j = i+1 to last by 1;
               fuel{j} = fuel{j-1};
            end;            /* DO j loop */
         end;               /* fuel{i} ne . */
         else i = i - 1;
      end;                  /* DO WHILE */
   end;                     /* Last element missing */

   /* Case 3: missing values between two non-missing values */
   lastval = 0;
   nextval = 0;
   do i = first to last;
      if fuel{i} ne . then lastval = i;
      else do;
         k = i + 1;
         filled = 0;        /* flag, so an average of exactly 0 cannot loop forever */
         do until (filled);
            if fuel{k} eq . then k = k + 1;
            else do;
               average = (fuel{lastval} + fuel{k}) / 2;
               do l = lastval+1 to k-1;
                  fuel{l} = average;
               end;         /* Average -> missing */
               filled = 1;
            end;            /* ELSE (fuel not missing) */
         end;               /* DO UNTIL */
      end;                  /* array element missing */
   end;                     /* DO first to last */
end;                        /* Missing values in array */
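To see the three rules applied to actual values, the following self-contained sketch may help. It is a compact variant written for illustration rather than the code above, and it assumes a shortened array (fuel75-fuel80), made-up records, and a hypothetical output data set fuel_filled; the helper variables lo and hi are likewise introduced only for this example. It first locates the first and last non-missing elements, fills the two ends, and then averages across each interior gap.

```sas
data fuel_filled;
   input id fuel75-fuel80;
   array fuel{75:80} fuel75-fuel80;

   first = lbound(fuel);
   last  = hbound(fuel);

   /* Only act when some, but not all, values are missing */
   if 0 < nmiss(of fuel{*}) < dim(fuel) then do;
      /* Positions of the first and last non-missing elements */
      lo = first;
      do while (fuel{lo} eq .);
         lo = lo + 1;
      end;
      hi = last;
      do while (fuel{hi} eq .);
         hi = hi - 1;
      end;

      /* Rule 1: leading missing values take the first non-missing value */
      do j = first to lo - 1;
         fuel{j} = fuel{lo};
      end;
      /* Rule 2: trailing missing values take the last non-missing value */
      do j = hi + 1 to last;
         fuel{j} = fuel{hi};
      end;
      /* Rule 3: interior gaps take the average of their two neighbours */
      lastval = lo;
      do i = lo + 1 to hi;
         if fuel{i} ne . then do;
            if i - lastval > 1 then do;
               average = (fuel{lastval} + fuel{i}) / 2;
               do j = lastval + 1 to i - 1;
                  fuel{j} = average;
               end;
            end;
            lastval = i;
         end;
      end;
   end;

   drop first last lo hi i j lastval average;
   datalines;
1 . . 10 . 20 .
2 5 . . 9 . .
;
run;
```

With these records, the first row becomes 10 10 10 15 20 20 and the second becomes 5 7 7 9 9 9. The two DO WHILE searches are safe only because the outer IF guarantees at least one non-missing value.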