Category: SAS

SAS Visual Analytics Guest Access with IWA Fallback

Yesterday I wrote a post about configuring a SAS® 9.4 M2 installation on Linux for Integrated Windows Authentication (IWA) with mid-tier fallback form-based authentication to handle situations where IWA was not available or was disabled. I also repeated this configuration with a SAS Visual Analytics 7.1 installation (based on SAS 9.4 M2). This means that […]

IWA with SAS 9.4 M2 on Linux

I’ve just finished a challenging but very rewarding experience configuring a SAS 9.4 M2 platform on Linux to use Integrated Windows Authentication (IWA), for both server and mid-tiers ….. without using Quest Authentication Services. The SAS platform has supported IWA on Linux since SAS 9.3 but until recently has only supported it when you “purchase, […]

How to split one data set into many

Back in the day when the prison system forced inmates to perform “hard labor”, folks would say (of someone in prison): “He’s busy making little ones out of big ones.” This evokes the cliché image of inmates who are chained together, forced to swing a chisel to break large rocks […]

The post How to split one data set into many appeared first on The SAS Dummy.

SAS employees are now easier to spot on Communities

Since the launch of Communities on SAS, hundreds of SAS employees have been among you. Some SAS employees made themselves known by selecting a telling user name (such as Cynthia@SAS), but others remained camouflaged or incognito, keeping their secret identities like the SAS superheroes they are. That’s about to change. […]

The post SAS employees are now easier to spot on Communities appeared first on The SAS Dummy.

Spark practice(4): malicious web attack

Suppose there is a website tracking user activities to prevent robotic attack on the Internet. Please design an algorithm to identify user IDs that have more than 500 clicks within any given 10 minutes.
Sample.txt: anonymousUserID timeStamp clickCount
123    9:45am    10
234 9:46am 12
234 9:50am 20
456 9:53am 100
123 9:55am 33
456 9:56am 312
123 10:03am 110
123 10:16am 312
234 10:20am 201
456 10:23am 180
123 10:25am 393
456 10:27am 112
999 12:21pm 888

Thought

This is a typical example of stream processing. The key is to build a fixed-length window to slide through all data, count data within and return the possible malicious IDs.

Single machine solution

Two data structures are used: a queue and a hash table. The queue is scanning the data and only keeps the data within a 10-minute window. Once a new data entry is filled, the old ones out of the window are popped out. The hash table counts the data in the queue and will be updated with the changing queue. Any ID with more than 500 clicks will be added to a set.
from datetime import datetime
import time
from collections import deque

def get_minute(s, fmt = '%I:%M%p'):
return time.mktime(datetime.strptime(s, fmt).timetuple())

def get_diff(s1, s2):
return int(get_minute(s2) - get_minute(s1)) / 60

def find_ids(infile, duration, maxcnt):
queue, htable, ans = deque(), {}, set()
with open(infile, 'rt') as _infile:
for l in _infile:
line = l.split()
line[2] = int(line[2])
current_id, current_time, current_clk = line
if current_id not in htable:
htable[current_id] = current_clk
else:
htable[current_id] += current_clk
queue.append(line)
while queue and get_diff(queue[0][1], current_time) > duration:
past_id, _, past_clk = queue.popleft()
htable[past_id] -= past_clk
if htable[current_id] > maxcnt:
ans.add(current_id)
return ans

if __name__ == "__main__":
print find_ids('sample.txt', 10, 500)

Cluster solution

The newest Spark (version 1.2.0) starts to support Python streaming. However, the document is still scarce — wait to see if this problem can be done by the new API.
To be continued

Example 2015.1: Time to refinance?

In the US, it’s typical to borrow a fairly substantial portion of the cost of a new house from a bank. The cost of these loans, the mortgage rate, varies over time depending on what the financial wizards see in their crystal balls. What this means over time is that when the mortgage rates go down, the cost of living in your own house magically decreases–you take a new loan at the lower rate and pay off your old loan with it– then you only have to pay off the new loan at the lower rate. You can find mortgage rate calculators on the web very easily– if you don’t mind their collecting your data and being bombarded with ads if you let their cookies trace you.

Instead, you can use SAS or R to calculate what you might pay for a new loan with various posted rates. There are some sophisticated tools available for either package if you’re interested in the remaining principal or the proportion of each payment that’s principal. Here, we just want to check the monthly payment.

R
We’ll begin by writing a little function to calculate the monthly payment from the principal, interest rate (in per cent), and term (in years) of the loan. This is basic stuff, but the code here is adapted from a function written by Thomas Girke of UC Riverside.


mortgage J N M monthPay return(monthPay)
}

To compare the monthly costs for a series of loans offered by a local bank, we’ll input the bank’s loans as a data frame. To save typing, we’ll use the rep() function to generate the term of the loan and the points.


offers = data.frame(
principal = rep(275000, times=9),
term = rep(c(30,20,15), each=3),
points = rep(c(0,1,2), times=3),
rate = c(3.875, 3.75, 3.5, 3.625, 3.5, 3.375, 3, 2.875, 2.75))

> offers

principal term points rate
1 275000 30 0 3.875
2 275000 30 1 3.750
3 275000 30 2 3.500
4 275000 20 0 3.625
5 275000 20 1 3.500
6 275000 20 2 3.375
7 275000 15 0 3.000
8 275000 15 1 2.875
9 275000 15 2 2.750

(Points are an up-front cost a borrower can pay to lower the mortgage rate for the loan.) With the data and function in hand, it’s easy to add the monthly cost to the data frame:


offers$monthly = with(offers, mortgage(rate=rate, term=term, principal=principal))

> offers

principal term points rate monthly
1 275000 30 0 3.875 1293.152
2 275000 30 1 3.750 1273.568
3 275000 30 2 3.500 1234.873
4 275000 20 0 3.625 1612.610
5 275000 20 1 3.500 1594.889
6 275000 20 2 3.375 1577.282
7 275000 15 0 3.000 1899.100
8 275000 15 1 2.875 1882.611
9 275000 15 2 2.750 1866.210

In theory, each of these costs are fair, and the borrower should choose based on monthly costs they can afford, as well as whether they see a better value in having money in hand to spend on a better quality of life or to invest it in savings or in paying off their house sooner. Financial professionals often discuss things like the total dollars spent or the total spent on interest vs. principal, as well.

SAS
The SAS/ETS package provides the LOAN procedure, which can calculate the detailed analyses mentioned above. For simple calculations like this one, we can use the mort function in the data step. It will find and return the missing one of the four parameters– principal, payment, rate, and term. To enter the data in a manner similar to R, we’ll use array statements and do loops.


data t;
principal = 275000;
array te [3] (30,20,15);
array po [3] (0,1,2);
array ra [9] (.03875, .0375, .035, .03625, .035,
.03375, .03, .02875, .0275);
do i = 1 to 3;
do j = 1 to 3;
term = te[i];
points = po[j];
rate = ra[ 3 * (i-1) +j];
monthly = mort(principal,.,rate/12, term*12);
output;
end;
end;
run;

proc print noobs data = t;
var principal term points rate monthly; run;

principal term points rate monthly

275000 30 0 0.03875 1293.15
275000 30 1 0.03750 1273.57
275000 30 2 0.03500 1234.87
275000 20 0 0.03625 1612.61
275000 20 1 0.03500 1594.89
275000 20 2 0.03375 1577.28
275000 15 0 0.03000 1899.10
275000 15 1 0.02875 1882.61
275000 15 2 0.02750 1866.21

An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission. If you read this on another aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page other than as noted above, the aggregator is violating the terms by which we publish our work.->

SAS is #1…In Plans to Discontinue Use

I’ve been tracking The Popularity of Data Analysis Software for many years now, and a clear trend is the decline of the market share of the bigger analytics firms, notably SAS and SPSS. Many people have interpreted my comments as implying … Continue reading