Posts Tagged ‘python’

Two alternative ways to query large dataset in SAS

June 19, 2015

I really appreciate the wonderful comments readers have left on my SAS posts (1, 2, 3). They gave me a lot of inspiration. Because of the inherent limitations of SAS and SQL, I have recently found it difficult to deal with some extremely large SAS datasets (that is, I have exhausted all the traditional approaches). Here I...
Read more »

Posted in SAS

saslib: a simple Python tool to lookup SAS metadata

June 3, 2015

saslib is an HTML report generator that looks up the metadata (the header information) of SAS datasets, much like PROC CONTENTS in SAS.
  • Reads sas7bdat files directly and quickly, and does not need SAS installed.
  • Emulates PROC CONTENTS with jQuery and DataTables.
  • Extracts the metadata from all sas7bdat files under the specified directory.
  • Supports IE (>=10), Firefox, Chrome and any other...
Read more »

Posted in SAS

Deploy a minimal Spark cluster

March 20, 2015

Requirements

Since Spark is evolving rapidly, I need to deploy and maintain a minimal Spark cluster for testing and prototyping. A public cloud is the best fit for my current needs.
  1. Intranet speed
    The cluster should easily copy the data from one server to another. MapReduce always shuffles a large chunk of data...
    Read more »

Posted in SAS

Deploy a MongoDB powered Flask app in 5 minutes

February 2, 2015

This is a quick tutorial on deploying a web service (a social network) on the LNMP (Linux, Nginx, MongoDB, Python) stack on any IaaS cloud. The GitHub repo is at https://github.com/dapangmao/minitwit-mongo-ubuntu.

Stack

The stack is built on the following tools from the Python ecosystem.

Tool | Name | Advantage
Cloud | DigitalOcean | Cheap but fast
Server...
Read more »

Posted in SAS

Spark practice(4): malicious web attack

January 8, 2015

Suppose a website tracks user activity to guard against robotic attacks from the Internet. Design an algorithm to identify the user IDs that have more than 500 clicks within any given 10 minutes.
Sample.txt: anonymousUserID timeStamp clickCount
123    9:45am    10
234 9:46am ...
Read more »
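The problem statement above could be approached with a per-user sliding window. What follows is a minimal sketch in plain Python rather than Spark, and it is not the post's own solution. It assumes three whitespace-separated fields per line as in Sample.txt, timestamps that all fall within a single day, and a window that drops any event 10 or more minutes older than the current one. The names `to_minutes` and `flag_heavy_clickers` are hypothetical.

```python
from collections import defaultdict, deque
from datetime import datetime

WINDOW_MINUTES = 10   # assumed window length, from the problem statement
CLICK_LIMIT = 500     # flag users whose clicks exceed this within a window


def to_minutes(stamp):
    """Convert a stamp such as '9:45am' to minutes since midnight."""
    t = datetime.strptime(stamp, "%I:%M%p")
    return t.hour * 60 + t.minute


def flag_heavy_clickers(lines, window=WINDOW_MINUTES, limit=CLICK_LIMIT):
    """Return the set of user IDs whose clicks exceed `limit` in any window."""
    events = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed rows
        user_id, stamp, clicks = parts
        events[user_id].append((to_minutes(stamp), int(clicks)))

    flagged = set()
    for user_id, rows in events.items():
        rows.sort()            # process each user's events in time order
        recent = deque()       # events currently inside the window
        total = 0              # running click total for the window
        for minute, clicks in rows:
            recent.append((minute, clicks))
            total += clicks
            # Drop events that are `window` or more minutes old.
            while minute - recent[0][0] >= window:
                total -= recent.popleft()[1]
            if total > limit:
                flagged.add(user_id)
                break
    return flagged
```

In Spark, the grouping step would roughly correspond to grouping records by user ID before applying the same window logic to each group.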

Posted in SAS

Spark practice (3): clean and sort Social Security numbers

December 24, 2014

Requirements:
1. Separate valid SSNs from invalid SSNs
2. Count the number of valid SSNs

Sample.txt
402-94-7709 
283-90-3049
124-01-2425
1231232
088-57-9593
905-60-3585
44-82-8341
257581087
327-84-0220
402-94-7709

Thoughts

SSN-indexed data is common and is stored in many file systems. The trick to speeding things up on Spark is to build a numerical key and use the sortByKey operator. In addition, the accumulator provides a...
Read more »
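The numerical-key idea above can be illustrated without a cluster. The sketch below is plain Python, with `sorted`-style key sorting standing in for Spark's sortByKey. It assumes that a valid SSN strictly matches the dashed pattern ddd-dd-dddd, which the excerpt does not spell out. The function name `split_and_sort` is hypothetical.

```python
import re

# Assumption: "valid" means exactly three digits, two digits, four digits,
# separated by dashes. The post may apply a different rule.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def split_and_sort(lines):
    """Split lines into (valid, invalid) SSNs; valid ones are sorted numerically."""
    valid, invalid = [], []
    for raw in lines:
        ssn = raw.strip()
        if not ssn:
            continue  # ignore blank lines
        if SSN_PATTERN.fullmatch(ssn):
            valid.append(ssn)
        else:
            invalid.append(ssn)
    # Build a numerical key by dropping the dashes, then sort on it.
    valid.sort(key=lambda s: int(s.replace("-", "")))
    return valid, invalid
```

Under that assumption, the ten sample lines yield seven valid SSNs (the duplicate 402-94-7709 is counted twice) and three invalid ones.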

Posted in SAS

Spark practice (2): query text using SQL

December 12, 2014

Given a class of a few children, use SQL to find those who are male and weigh over 100.
class.txt (columns: Name, Sex, Age, Height, Weight)
Alfred M 14 69.0 112.5 
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0...
Read more »
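The query itself can be sketched with Python's built-in sqlite3 module standing in for Spark SQL, so this is not the post's Spark code. It uses only the six rows visible in the excerpt and assumes the five whitespace-separated columns listed above. The table name `class` and the function name `heavy_boys` are hypothetical.

```python
import sqlite3

# Only the rows visible in the excerpt; the full class.txt has more.
SAMPLE = """Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0"""


def heavy_boys(text, min_weight=100):
    """Return names of males weighing more than `min_weight`, sorted by name."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip malformed rows
        name, sex, age, height, weight = fields
        records.append((name, sex, int(age), float(height), float(weight)))

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE class "
        "(name TEXT, sex TEXT, age INTEGER, height REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO class VALUES (?, ?, ?, ?, ?)", records)
    rows = conn.execute(
        "SELECT name FROM class WHERE sex = 'M' AND weight > ? ORDER BY name",
        (min_weight,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

On the six visible rows, `heavy_boys(SAMPLE)` returns Alfred and Henry.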

Posted in SAS

Spark practice (1): find the stranger that shares the most friends with me

December 7, 2014

Given the friend pairs in the sample text below (each line contains two people who are friends), find the stranger that shares the most friends with me.
sample.txt
me Alice
Henry me
Henry Alice
me Jane
Alice John
Jane John
Judy Alice
me Mary
Mary Joyce
Joyce Henry
Judy me
Judy Jane
John Carol
Carol me
Mary Henry
Louise Ronald
Ronald Thomas
William Thomas

Thoughts

This scenario is common for a social network...
Read more »
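One way to reason about the problem is to build an adjacency map and count, for every non-friend, how many of my friends they are also friends with. The sketch below does this in plain Python rather than Spark and is not the post's own solution. It assumes friendship is symmetric and that a "stranger" is anyone who is not me and not already my friend. The function name `top_stranger` is hypothetical.

```python
from collections import Counter, defaultdict

SAMPLE = """me Alice
Henry me
Henry Alice
me Jane
Alice John
Jane John
Judy Alice
me Mary
Mary Joyce
Joyce Henry
Judy me
Judy Jane
John Carol
Carol me
Mary Henry
Louise Ronald
Ronald Thomas
William Thomas"""


def top_stranger(text, me="me"):
    """Return (stranger, shared_friend_count) or None if there is no candidate."""
    friends = defaultdict(set)
    for line in text.splitlines():
        pair = line.split()
        if len(pair) != 2:
            continue  # skip malformed rows
        a, b = pair
        friends[a].add(b)   # friendship is assumed to be symmetric
        friends[b].add(a)

    mine = friends[me]
    shared = Counter()
    for friend in mine:
        for candidate in friends[friend]:
            # Count only people who are neither me nor already my friend.
            if candidate != me and candidate not in mine:
                shared[candidate] += 1
    return shared.most_common(1)[0] if shared else None
```

On the sample above, `top_stranger(SAMPLE)` returns `('John', 3)`: John shares Alice, Jane and Carol with me, while Joyce shares only Mary and Henry.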

Posted in SAS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.