3 Ways to Convert SAS Datasets to Plain Codes

1. If you have SAS Enterprise Guide installed Try one of Chris Hemedinger’s EG plug-ins, “Data Set to DATA Step”. Chris also wrote a post for it, Turn your data set into a DATA step program. 2. If you need to embed the dataset to SQL Clauses Use one of Eric Gebhart’s ODS tagsets, “SQL […]

Use max/min functions to avoid conditions

The functions such as max() and min() play a role such as
if a < b:
a = b
Using them in programing will bring flexibility and simply coding.

1. Best Time to Buy and Sell Stock Total

Say you have an array for which the ith element is the price of a given stock on day i.
If you were only permitted to complete at most one transaction (ie, buy one and sell one share of the stock), design an algorithm to find the maximum profit.
The stock prices fluctuate, which results into profitable gaps(see the graph). The first question only wants to capture the possible profit given only one transaction. There are two other variants originating from it: unlimited transactions and two transactions.
from ggplot import *
import pandas as pd

# If the stock could be traded just once
from sys import maxint
def stock1(a):
min_price = maxint
max_profit = 0
for x in a:
min_price = min(x, min_price)
max_profit = max(x - min_price, max_profit)
return max_profit

def stock1A(a):
max_price = 0
max_profit = 0
for x in reversed(a):
max_price = max(x, max_price)
max_profit = max(max_price - x, max_profit)
return max_profit

#------------------------EXTENSION--------------------------------------------
# If the stock could be traded multiple times and you can only have one stock all time
def stock2(a):
max_profit = 0
for i in range(len(a)):
profit = max(a[i] - a[i-1], 0)
max_profit += profit
return max_profit

# If the stock could be traded at most twice and you can only have one stock all time
from collections import defaultdict
def stock3(a):
min_price = maxint
fst_max_profit = 0
h = defaultdict(list)
for i in range(len(a)):
min_price = min(a[i], min_price)
fst_max_profit = max(a[i] - min_price, fst_max_profit)
h[i].append(fst_max_profit)
max_price = 0
snd_max_profit = 0
for i in reversed(range(len(a))):
max_price = max(a[i], max_price)
snd_max_profit = max(max_price - a[i], snd_max_profit)
h[i].append(snd_max_profit)
return max([sum(h[i]) for i in h.keys()])

if __name__ == "__main__":
a = [1, 3, 6, 3, 1, 2, 4, 4, 2, 5]
print stock1(a)
print stock1A(a)
print stock2(a)
print stock3(a)

# Draw the plot of stock price
df = pd.DataFrame()
df['price'] = a
df['day'] = df.index + 1
print ggplot(aes(x='day', y='price'), data = df) + geom_line()

2. Minimum Window Substring

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).
For example,
S = “ADOBECODEBANC”
T = “ABC”
Minimum window is “BANC”.
def min_wds(s, t):
if not isinstance(s, str) or not isinstance(t, str):
raise ValueError("Must be strings")
if len(s) == 0 or len(t) == 0:
return ""
a = {}
for i in xrange(len(s)):
try:
a[s[i]] = i
except:
pass
start = len(s)
end = 0
cnt = 0
for x in t:
if a.has_key(x):
start = min(start, a[x])
end = max(end, a[x])
cnt += 1
if cnt == len(t):
return s[start:end+1]
return ""

if __name__ == '__main__':
print min_wds("ADOBECODEBANC", "ABC")

3. Maximum Subarray

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.
For example, given the array [−2,1,−3,4,−1,2,1,−5,4],
the contiguous subarray [4,−1,2,1] has the largest sum = 6.
This question only asks about the maximum length for the non-repetitive subarray. Thus, two max functions will solve it quick.
from sys import maxint
def find_max(A):
overall_max = -maxint
running_max = 0
for x in A:
running_max += x
running_max = max(0, running_max)
overall_max = max(running_max, overall_max)
return overall_max

if __name__ == "__main__":
a = [-2,1,-3,4,-1,2,1,-5,4]
print find_max(a)

Use Python generator to solve the ‘find-all’ problems

The typical find-all questions are widely seen in daily coding. Time complexity and space complexity are not the topmost concerns for these questions. The biggest challenge is that the element number of the result is largely unknown and therefore the upper/lower limits are hard to be sought for the iteration. Thus, the recursion is ideal to replace the iteration, while the generator function in Python supplies a great vehicle to hold the output data.
In summary, the strategy of recursion + generator can be realized with the recursion part and the following driver to wrap it up.

1. Combination Sum

Given a set of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sums to T.
The same repeated number may be chosen from C unlimited number of times.
All numbers (including target) will be positive integers.
For example, given candidate set 2,3,6,7 and target 7,
A solution set is:
[7]
[2, 2, 3]
The first yield operator within the exit condition specifies the type of the returned object, while the two yield operators should keep the same type.
The while loop is used to add more conditions to accommodate the 2sum-like subtraction-oriented searching.
def cmb_sum_rcs(arr,  target, start = 0):
"""
A function as the recursion piece
"""

if target == 0:
yield [] # the list as the final format to return
# Use 'while' instead of 'for' to enter iteration given a boundary
while start < len(arr) and arr[start] <= target:
# 'target' instead of 'start' is the driver to jump out the iteration
for p in cmb_sum_rcs(arr, target - arr[start], start):
# Concatenate the elements as a list
yield p + [arr[start]]
start += 1

def cmb_sum(arr, target):
"""
A function to wrap up the recursion piece
"""

return list(cmb_sum_rcs(sorted(arr), target)) # sorting realizes all combinations

if __name__ == "__main__":
print cmb_sum([2, 3, 6, 7, 1], 7)

2. Letter Combinations of a Phone Number

Given a digit string, return all possible letter combinations that the number could represent.
Input:Digit string “23”
Output: [“ad”, “ae”, “af”, “bd”, “be”, “bf”, “cd”, “ce”, “cf”].
This questions is similar to the flatten function at the previous post: the loop occurs at the very beginning.
def phone_nums_rcs(a, arr, start = 0):
"""
A function as the recursion piece
"""

# Retreive the corresponding key-board values from the lookup array
str = a[int(arr[start])]
for x in str:
# Jump out once the last char is reached
if start >= len(arr) - 1:
yield x
else:
for p in phone_nums_rcs(a, arr, start + 1):
yield x + p # concatenate the strings

def phone_nums(arr):
"""
A function to wrap up the recursion piece
"""

a = ["","","abc","def","ghi","jkl","mno","pqrs","tuv","wxyz"]
return list(phone_nums_rcs(a, arr))

if __name__ == "__main__":
print phone_nums("239")

3. Permutations

Given a collection of numbers, return all possible permutations.
Compared with the previous implementation of the permutation function, the coding style of generator is easier to understand and maintain the logic.
For example,
[1,2,3] have the following permutations:
[1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], and [3,2,1].
def permu_rcs(arr, start = 0):
"""
A function as the recursion piece
"""

if start >= len(arr):
# The copy of the inital input array is passed instead of the reference
yield arr[:]
for i in range(start, len(arr)):
arr[start], arr[i] = arr[i], arr[start]
for x in permu_rcs(arr, start + 1):
yield x
arr[start], arr[i] = arr[i], arr[start]

def permu(arr):
"""
A function to wrap up the recursion piece
"""

return list(permu_rcs(arr))

if __name__ == "__main__":
print permu([0, 1, 2])

How analytics is taking over business

I had the chance to interview Natalie Kortum and Jack Chen of Dell at the Analytics 2013 conference in Orlando about how analytics is taking over business today and in the future. They both think we’ll see some big changes in the next five years. One…