Yesterday I wrote a post about configuring a SAS® 9.4 M2 installation on Linux for Integrated Windows Authentication (IWA) with mid-tier fallback form-based authentication to handle situations where IWA was not available or was disabled. I also repeated this configuration with a SAS Visual Analytics 7.1 installation (based on SAS 9.4 M2). This means that […]
Category: SAS
IWA with SAS 9.4 M2 on Linux
I’ve just finished a challenging but very rewarding experience configuring a SAS 9.4 M2 platform on Linux to use Integrated Windows Authentication (IWA), for both server and mid-tiers ….. without using Quest Authentication Services. The SAS platform has supported IWA on Linux since SAS 9.3 but until recently has only supported it when you “purchase, […]
How to perform a fuzzy match using SAS functions
Click Here
How to split one data set into many
Back in the day when the prison system forced inmates to perform “hard labor”, folks would say (of someone in prison): “He’s busy making little ones out of big ones.” This evokes the cliché image of inmates who are chained together, forced to swing a chisel to break large rocks […]
The post How to split one data set into many appeared first on The SAS Dummy.
SAS employees are now easier to spot on Communities
Since the launch of Communities on SAS, hundreds of SAS employees have been among you. Some SAS employees made themselves known by selecting a telling user name (such as Cynthia@SAS), but others remained camouflaged or incognito, keeping their secret identities like the SAS superheroes they are. That’s about to change. […]
The post SAS employees are now easier to spot on Communities appeared first on The SAS Dummy.
Spark practice(4): malicious web attack
Suppose there is a website tracking user activities to prevent robotic attack on the Internet. Please design an algorithm to identify user IDs that have more than 500 clicks within any given 10 minutes.Sample.txt: anonymousUserID timeStamp clickCount123 9:45am 10
234 9:46am 12
234 9:50am 20
456 9:53am 100
123 9:55am 33
456 9:56am 312
123 10:03am 110
123 10:16am 312
234 10:20am 201
456 10:23am 180
123 10:25am 393
456 10:27am 112
999 12:21pm 888
Thought
Single machine solution
from datetime import datetime
import time
from collections import deque
def get_minute(s, fmt = '%I:%M%p'):
return time.mktime(datetime.strptime(s, fmt).timetuple())
def get_diff(s1, s2):
return int(get_minute(s2) - get_minute(s1)) / 60
def find_ids(infile, duration, maxcnt):
queue, htable, ans = deque(), {}, set()
with open(infile, 'rt') as _infile:
for l in _infile:
line = l.split()
line[2] = int(line[2])
current_id, current_time, current_clk = line
if current_id not in htable:
htable[current_id] = current_clk
else:
htable[current_id] += current_clk
queue.append(line)
while queue and get_diff(queue[0][1], current_time) > duration:
past_id, _, past_clk = queue.popleft()
htable[past_id] -= past_clk
if htable[current_id] > maxcnt:
ans.add(current_id)
return ans
if __name__ == "__main__":
print find_ids('sample.txt', 10, 500)
Cluster solution
To be continued
Example 2015.1: Time to refinance?
In the US, it’s typical to borrow a fairly substantial portion of the cost of a new house from a bank. The cost of these loans, the mortgage rate, varies over time depending on what the financial wizards see in their crystal balls. What this means over time is that when the mortgage rates go down, the cost of living in your own house magically decreases–you take a new loan at the lower rate and pay off your old loan with it– then you only have to pay off the new loan at the lower rate. You can find mortgage rate calculators on the web very easily– if you don’t mind their collecting your data and being bombarded with ads if you let their cookies trace you.
Instead, you can use SAS or R to calculate what you might pay for a new loan with various posted rates. There are some sophisticated tools available for either package if you’re interested in the remaining principal or the proportion of each payment that’s principal. Here, we just want to check the monthly payment.
R
We’ll begin by writing a little function to calculate the monthly payment from the principal, interest rate (in per cent), and term (in years) of the loan. This is basic stuff, but the code here is adapted from a function written by Thomas Girke of UC Riverside.
mortgage J N M monthPay return(monthPay)
}
To compare the monthly costs for a series of loans offered by a local bank, we’ll input the bank’s loans as a data frame. To save typing, we’ll use the rep() function to generate the term of the loan and the points.
offers = data.frame(
principal = rep(275000, times=9),
term = rep(c(30,20,15), each=3),
points = rep(c(0,1,2), times=3),
rate = c(3.875, 3.75, 3.5, 3.625, 3.5, 3.375, 3, 2.875, 2.75))
> offers
principal term points rate
1 275000 30 0 3.875
2 275000 30 1 3.750
3 275000 30 2 3.500
4 275000 20 0 3.625
5 275000 20 1 3.500
6 275000 20 2 3.375
7 275000 15 0 3.000
8 275000 15 1 2.875
9 275000 15 2 2.750
(Points are an up-front cost a borrower can pay to lower the mortgage rate for the loan.) With the data and function in hand, it’s easy to add the monthly cost to the data frame:
offers$monthly = with(offers, mortgage(rate=rate, term=term, principal=principal))
> offers
principal term points rate monthly
1 275000 30 0 3.875 1293.152
2 275000 30 1 3.750 1273.568
3 275000 30 2 3.500 1234.873
4 275000 20 0 3.625 1612.610
5 275000 20 1 3.500 1594.889
6 275000 20 2 3.375 1577.282
7 275000 15 0 3.000 1899.100
8 275000 15 1 2.875 1882.611
9 275000 15 2 2.750 1866.210
In theory, each of these costs are fair, and the borrower should choose based on monthly costs they can afford, as well as whether they see a better value in having money in hand to spend on a better quality of life or to invest it in savings or in paying off their house sooner. Financial professionals often discuss things like the total dollars spent or the total spent on interest vs. principal, as well.
SAS
The SAS/ETS package provides the LOAN procedure, which can calculate the detailed analyses mentioned above. For simple calculations like this one, we can use the mort function in the data step. It will find and return the missing one of the four parameters– principal, payment, rate, and term. To enter the data in a manner similar to R, we’ll use array statements and do loops.
data t;
principal = 275000;
array te [3] (30,20,15);
array po [3] (0,1,2);
array ra [9] (.03875, .0375, .035, .03625, .035,
.03375, .03, .02875, .0275);
do i = 1 to 3;
do j = 1 to 3;
term = te[i];
points = po[j];
rate = ra[ 3 * (i-1) +j];
monthly = mort(principal,.,rate/12, term*12);
output;
end;
end;
run;
proc print noobs data = t;
var principal term points rate monthly; run;
principal term points rate monthly
275000 30 0 0.03875 1293.15
275000 30 1 0.03750 1273.57
275000 30 2 0.03500 1234.87
275000 20 0 0.03625 1612.61
275000 20 1 0.03500 1594.89
275000 20 2 0.03375 1577.28
275000 15 0 0.03000 1899.10
275000 15 1 0.02875 1882.61
275000 15 2 0.02750 1866.21
An unrelated note about aggregators:We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission. If you read this on another aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page other than as noted above, the aggregator is violating the terms by which we publish our work.->
SAS is #1…In Plans to Discontinue Use
I’ve been tracking The Popularity of Data Analysis Software for many years now, and a clear trend is the decline of the market share of the bigger analytics firms, notably SAS and SPSS. Many people have interpreted my comments as implying … Continue reading →![]()