Folks, I was going through a few visualizations on Tableau's website today and came across "Exploring the SSA Baby Names Dataset" by one of the acclaimed Tableau professionals. It got me thinking: how many of those SSA baby names are of Indian American (Desi) descent?
I found a website with a list of popular Indian baby names, read the data into SAS, and made a Tableau Story out of it. Please take a few moments to play with this interesting viz. I hope you like it!
Here's the SAS code I used to prepare the data:
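/* Read Indian baby names by parsing the HTTP URL.
   The %readnames macro, its npages= parameter and the fileref webpg are
   hypothetical names for the loop around this step, and the HTML tag
   literals below are assumptions about the page's markup; check them
   against the page source before running. */
libname sharad "C:\Users\Sharad\Desktop\namesbystate";

%macro readnames(type=, npages=);
  %do i = 1 %to &npages;
    proc sql; drop table name; quit;

    filename webpg url
      "http://www.modernindianbabynames.com/modern_baby_name/starting_with/ANY/MF/Sikh/1560/&i.";

    data name;
      infile webpg length=len lrecl=32767;
      retain start recind recst recend hier;
      length SN Name Meaning Gender Origin $ 100;
      retain SN Name Meaning Gender Origin;
      input record $varying200. len;
      put record $varying200. len;              /* echo each raw line to the log */
      if index(record,'<table') then start=1;   /* entering the names table */
      if index(record,'</table>') then start=0; /* leaving it */
      if index(record,'<th') then delete;       /* skip header cells */
      if index(record,'<td>') then do; recvalst=1; hier+1; end;
      if index(record,'</td>') then do; recvalend=1; delete; end;
      if index(record,'<tr') then do; recst=1; hier=0; delete; end;
      if index(record,'</tr>') then do; recst=0; hier=0; end;
      /* hier = position of the cell in the row: 1=SN, 2=Name, 3=Meaning, 4=Gender, 5=Origin */
      if hier=1 then do; record=tranwrd(record,'<td>',''); SN=strip(record); end;
      else if hier=2 then do; record=tranwrd(record,'<td>',''); Name=strip(record); end;
      else if hier=3 then do; record=tranwrd(record,'<td>',''); Meaning=strip(record); end;
      else if hier=4 then do; record=tranwrd(record,'<td>',''); Gender=strip(record); end;
      else if hier=5 then do; record=tranwrd(record,'<td>',''); Origin=strip(record); end;
      record=tranwrd(record,'</a>','');
      record=tranwrd(record,'</td>','');
      /* end of a row inside the table: write out the five retained values */
      if index(record,'</tr>') and start then do; recend=1; hier=0; output; end;
      keep SN Name Meaning Gender Origin;
    run;

    proc append data=name base=sharad.&type force; run;
  %end;
%mend readnames;

/* e.g. %readnames(type=sikh, npages=<number of result pages>); */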
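The TRANWRD calls above remove one known tag at a time. As an alternative (a swapped-in technique, not what the original step does), PRXCHANGE with a regular expression can strip every remaining tag in one pass. A minimal sketch on two made-up lines of table markup:

```sas
/* Strip all HTML tags from each line with a single regular expression.
   The two input lines are made-up sample markup. */
data work.tag_demo;
  length raw clean $ 200;
  infile datalines truncover;
  input raw $char200.;
  /* s/<[^>]+>// matches "<", then anything except ">", then ">"; -1 = replace every match */
  clean = strip(prxchange('s/<[^>]+>//', -1, raw));
datalines;
<td>Aarav</td>
<td><a href="x">Peaceful</a></td>
;
run;

proc print data=work.tag_demo;
run;
```

In work.tag_demo, clean should hold Aarav and Peaceful. The pattern is deliberately simple and assumes no > character appears inside an attribute value.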
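/* Make a hand-picked list of names that definitely sound Indian (Y) or likely sound Indian (P) */
data Sharad.Def_IndiaNames;
  infile cards4 dlm='09'x missover;    /* tab-delimited: Name, IndianorNot */
  length Name $ 100 IndianorNot $ 1;
  input Name IndianorNot;
  Name=strip(propcase(Name));
cards4;
...and 1000's of other records...
;;;;
run;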
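SAS character comparisons are case-sensitive, so the later A.name=B.name join only matches names with identical capitalization. The PROPCASE/STRIP line normalizes the hand-typed names to the initial-capital style the SSA files use. A quick check on three made-up spellings:

```sas
/* Three made-up spellings of the same name; each should normalize to Aarav */
data _null_;
  length raw clean $ 20;
  do raw = 'AARAV', 'aarav', '  aaRav ';
    clean = strip(propcase(raw));   /* lowercase, capitalize each word, trim blanks */
    put raw= clean=;
  end;
run;
```

The log should show clean=Aarav on all three lines.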
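/* Combine all available Indian names into one table */
data Sharad.ALLNames;
  set sharad.bengali sharad.hindi sharad.sikh;
  Name=translate(Name,' ',"'");          /* turn apostrophes into blanks */
  if compress(Name)=' ' then delete;     /* drop names that are now empty */
run;
proc sort data=Sharad.ALLNames noduprecs; by Name; run;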
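NODUPRECS removes a record only when it is identical, in every variable, to the record just before it, so the same name with a different meaning, gender or origin survives the sort. That is why the next step still has to roll the rows up per name. A small comparison with NODUPKEY on made-up rows:

```sas
/* Made-up rows: one exact duplicate and one name with a second meaning */
data work.dup_demo;
  length Name $ 20 Meaning $ 30;
  input Name $ Meaning $;
datalines;
Asha Hope
Asha Hope
Asha Life
;
run;

/* NODUPRECS drops a record only when every variable matches the previous record */
proc sort data=work.dup_demo out=work.by_rec noduprecs;
  by Name;
run;

/* NODUPKEY keeps just the first record for each BY value */
proc sort data=work.dup_demo out=work.by_key nodupkey;
  by Name;
run;
```

work.by_rec keeps two rows for Asha (Hope and Life); work.by_key keeps only the first, Asha Hope.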
/* Re-purpose the data a bit: one row per name, combining all of its meanings, genders and origins */
data Sharad.IndianNames(rename=(dMeaning=Meaning dGender=IGender dOrigin=Origin));
  set Sharad.ALLNames;
  by Name;
  length dMeaning $ 100 dGender $ 15 dOrigin $ 100;
  retain dMeaning dGender dOrigin;
  if first.Name then call missing(dMeaning, dGender, dOrigin);   /* start each name fresh */
  if index(strip(dMeaning),strip(Meaning)) eq 0 then dMeaning=catx(' OR ',strip(dMeaning),strip(Meaning));
  if index(strip(dOrigin),strip(Origin)) eq 0 then dOrigin=catx(', ',strip(dOrigin),strip(Origin));
  if index(strip(dGender),strip(Gender)) eq 0 then dGender=catx(' OR ',strip(dGender),strip(Gender));
  if dGender in ("Boy OR Girl","Girl OR Boy") then dGender="Boy OR Girl";
  if last.Name then output;                                        /* write one row per name */
  keep Name dMeaning dGender dOrigin;
run;
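The roll-up depends on BY-group processing: RETAIN carries the combined text from one row to the next, FIRST.Name clears it when a new name starts, and LAST.Name writes a single row per name. A self-contained sketch of that pattern on made-up rows, assuming the input is already sorted by Name (the allMeaning and allGender variable names are illustrative):

```sas
/* Made-up rows, already sorted by Name */
data work.raw_names;
  length Name $ 20 Gender $ 4 Meaning $ 30;
  infile datalines dlm=',';
  input Name $ Gender $ Meaning $;
datalines;
Asha,Girl,Hope
Asha,Girl,Life
Kiran,Boy,Ray of light
Kiran,Girl,Ray of light
;
run;

data work.rolled;
  set work.raw_names;
  by Name;
  length allMeaning $ 100 allGender $ 15;
  retain allMeaning allGender;
  /* a new name starts: clear the values carried over from the previous name */
  if first.Name then call missing(allMeaning, allGender);
  /* append a value only if it is not already part of the combined text */
  if index(allMeaning, strip(Meaning)) = 0 then allMeaning = catx(' OR ', allMeaning, Meaning);
  if index(allGender, strip(Gender)) = 0 then allGender = catx(' OR ', allGender, Gender);
  /* last row of this name: write the combined result */
  if last.Name then output;
  keep Name allMeaning allGender;
run;

proc print data=work.rolled;
run;
```

work.rolled should have two rows: Asha with "Hope OR Life" and "Girl", and Kiran with "Ray of light" and "Boy OR Girl". Because INDEX matches substrings, a value that happens to appear inside an earlier one is treated as already present.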
/* Read the US Gov SSA baby names (names by state) data fields */
filename allst "C:\Users\Sharad\Desktop\namesbystate\all\allstates.txt";
data sharad.SSANames;    /* hypothetical name for the SSA table */
  infile allst dlm=',' dsd missover firstobs=2;
  length State $ 2 Gender $ 1 Year $ 4 Name $ 50;
  input State Gender Year Name Occurences;
run;
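If the per-state files from the SSA namesbystate download have not been combined into allstates.txt yet, a FILENAME with a wildcard lets one DATA step read them all. A sketch, assuming the files were unzipped into a hypothetical folder C:\data\namesbystate; the state files carry no header row, so FIRSTOBS is not needed:

```sas
/* Hypothetical folder holding the unzipped per-state files (AK.TXT, AL.TXT, ...) */
filename ssastate "C:\data\namesbystate\*.TXT";

data work.ssa_by_state;
  /* each record: State,Gender,Year,Name,Count (comma-separated, no header) */
  infile ssastate dlm=',' dsd truncover;
  length State $ 2 Gender $ 1 Year $ 4 Name $ 50;
  input State Gender Year Name Occurences;
run;
```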
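/* Merge the US Gov SSA baby names data with the Indian names data */
proc sql;
  create table sharad.IndNames as
  select A.*, IGender, Meaning, Origin
  from sharad.SSANames A    /* SSA table read above (hypothetical name) */
  left join Sharad.IndianNames B
  on A.Name = B.Name;
quit;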
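/* Merge the US Gov SSA baby names data with the hand-picked Indian names */
proc sql;
  create table sharad.DefinitelyIndian as
  select A.*,
         case
           when A.name=B.name and IndianorNot='Y' then 'Indian Name'
           when A.name=B.name and IndianorNot='P' then 'Likely an Indian Name'
         end as IndianDescent length=21   /* fits 'Likely an Indian Name' */
  from sharad.SSANames A    /* SSA table read above (hypothetical name) */
  left join Sharad.Def_IndiaNames B
  on A.Name = B.Name;
quit;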
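To put a number on the original question, the merged tables can be summarized directly. A sketch, assuming sharad.IndNames and sharad.DefinitelyIndian as created above; the first query treats a non-blank Origin as a match, which assumes every scraped name came with an origin:

```sas
/* Share of SSA babies per year whose name matched the scraped Indian names */
proc sql;
  create table work.indian_share as
  select Year,
         sum(Occurences) as TotalBabies,
         sum(case when not missing(Origin) then Occurences else 0 end) as IndianNameBabies,
         calculated IndianNameBabies / calculated TotalBabies as Share format=percent8.2
  from sharad.IndNames
  group by Year
  order by Year;
quit;

/* Babies per hand-picked category; a blank IndianDescent means no match */
proc freq data=sharad.DefinitelyIndian;
  tables IndianDescent / missing;
  weight Occurences;
run;
```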