Using SAS and Python to normalize database

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.

Many occasions, data needs to be normalized to speed up query operations before entering a database. Interestingly, SAS’s data step array cannot accept mixed data types including both numeric and character ones, which makes it unusable for such a purpose. However, SAS’s procedures provide handy tools to automate this process without referring to any loop. Large text files have to depend on Python, given its’ excellent memory management.

The examples below show how to transform a tiny SASHELP.CLASS dataset into a normalized and digestible format, by either SAS or Python.

****************(1) USING PROCEDURES FROM SAS***************************;
proc import datafile = "c:\tmp\class.csv" out = class replace dbms = csv;
run;

proc transpose data = class out = normalized;
   by name;
   var sex -- weight;
run;

proc export data = normalized outfile = "c:\tmp\normalized_sas.txt" 
   dbms = dlm replace;
   putnames = no; 
   delimiter = " ";
run;

################(2) USING LOOPS FROM PYTHON###############################
new_file = {}
with open('c:/tmp/class.csv', 'r') as myfile:
  varname = myfile.readline().strip().split(',')
  varname.pop(0)
  name = []
  for each_line in myfile:
    row = each_line.strip().split(',')
    this_name = row.pop(0)
    inner_dict = {}
    for i in range(len(row)):
      inner_dict[varname[i]] = row[i]
  new_file[this_name] = inner_dict
  name.append(this_name)
myfile.close()

out_file = open('c:/tmp/normalized_python.txt', 'w')
for that_name in name:
    for that_var in varname:
        print(that_name, that_var, new_file[that_name][that_var], 
        file = out_file)
out_file.close()
 


This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.