World Statistics, FTW!

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

Yesterday, I was in the #raganSAS audience as David Pogue told me What’s New and What’s Next in the world of technology.
David is a great presenter, and he really had the audience engaged as he talked about augmented reality,
his world according to Twitter, and an iPhone app that comes pretty close to teaching the world
to sing in perfect harmony
(plus a cheater app that helps the world to sing like T-Pain).

On the world-harmony-for-profit theme, he shared information about web sites such as Kiva.org that
facilitate microfinancing around the world. There are other microfinance sites that help people
closer to home (for us in the USA), but as Pogue said, only Kiva.org can give you that “rosy glow” when you
know you’re helping people in developing countries.

Kiva.org opens financial doors for people who might not have another source of funding; but it also presents a
platform rich in data for analysis and reporting. The folks at Kiva.org support web services that allow
you to build applications that reference the data that they collect. They also offer “data snapshots”: downloadable
versions of all of the data they have on the loans, loan recipients, and the lenders who participate.

If you could get this data into SAS, what insights could you glean? What cool stats could you produce?
What stories could you tell with charts and plots?

So, now we come to your homework assignment…if you choose to accept it. I’ve already done the grunt work
of writing a SAS program that transforms the raw data (from its XML format) into SAS data sets. I’ve even
written a sample step that produces a simple chart based on the current data.

My plot with SGPANEL

What can you do with this data using SAS? There are two data sets: lenders (over 400,000 records)
and loans (over 165,000 records). They contain columns
relating to geography (location of lenders and loan recipients), quantity (how many loans, what amounts),
categories (loan purpose/industry, gender of recipient), and time (when the loan was granted/funded). You can
read about the data on Kiva.org, and then create interesting reports using SAS.
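
As a starting point for a report, here is a minimal sketch that totals loans by category and charts the result. The data set name (work.loans) and the column names (sector, funded_amount) are my assumptions for illustration; check the actual names in the data sets that the program creates before you run it.

```sas
/* Assumed names: work.loans, sector, and funded_amount are hypothetical. */
/* Substitute the data set and column names that your import produces.    */
proc sql;
  create table work.loans_by_sector as
  select sector,
         count(*)           as loan_count,
         sum(funded_amount) as total_funded format=dollar16.
  from work.loans
  group by sector
  order by total_funded desc;
quit;

/* Horizontal bar chart of the total funded amount for each sector */
proc sgplot data=work.loans_by_sector;
  hbar sector / response=total_funded;
run;
```

The same pattern works for the other dimensions mentioned above: swap the grouping column for a geography or time column to get a different view of the data.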

Bonus assignment: can you improve my SAS program that pulls the data into SAS? I promise you: there is a lot of
room for optimization. (If I had held off on this post until I perfected it, it would be
ready for World Statistics Day 2011.)
My implementation uses the XML libname engine, DATA step, and PROC SQL. It could be
more automated (download the zip file with FILENAME URL, extract and process) and more efficient (faster
appends, perhaps joining and summarizing for easier analysis). The program encounters a
few errors when it runs, probably due to character encoding in the XML data. What would you do differently?
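
To give a feel for the approach, here is a rough sketch of reading a series of XML files with the XML libname engine and an XMLMap, appending each one to a single data set. This is not the code from kivaProgram.sas. The folder layout, the map file name (loans.map), the table name (loans), and the macro variable names are all assumptions made for illustration.

```sas
/* All names below are hypothetical, not taken from kivaProgram.sas */
%let kivaRoot      = C:\public\Kiva;  /* Kiva "root" folder             */
%let loanFileCount = 3;               /* number of loan XML files       */

/* XMLMap that tells the XML engine how to turn elements into columns */
filename loanmap "&kivaRoot.\loans.map";

%macro readLoans;
  %local i;
  %do i = 1 %to &loanFileCount;
    /* Assumed file naming: loans\1.xml, loans\2.xml, and so on */
    filename loanxml "&kivaRoot.\loans\&i..xml";

    /* The libref matches the fileref, so the engine reads that file */
    libname loanxml xml xmlmap=loanmap access=readonly;

    /* Add this file's rows to the cumulative data set */
    proc append base=work.loans data=loanxml.loans force;
    run;

    libname loanxml clear;
  %end;
  filename loanxml clear;
%mend readLoans;

%readLoans
```

PROC APPEND avoids rewriting the base data set on every pass, which is one of the "faster appends" ideas mentioned above.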

Here’s how you can get started:

  • Download my SAS program and XML map files from this ZIP file (small, just about 3K).
  • Extract the ZIP file to a new folder that your SAS session can access
    as the Kiva “root” folder (example: “C:\public\Kiva” or “/u/userid/Kiva”).

  • Download the data snapshot from Kiva.org (big, about 150MB ZIP file). You need the XML format (not the JSON format).
  • Extract the data snapshot files into your Kiva “root” folder.
  • Modify my kivaProgram.sas file to set the Kiva data root folder, and set the number of
    loan XML files and lender XML files (as described in the comments in the program).

(By the way, I wrote this program entirely using SAS Enterprise Guide 4.3. So I know that you can run it from there,
or within whatever SAS 9.2 environment you have access to.)

What better way to celebrate World Statistics Day than to compute some statistics for the world? Post
your experiences back here in the comments, or use sasCommunity.org to share more details and post the link.
