Does your data just look like a bunch of dots? Would you like to bring those dots to life so they can tell their story? This example shows how to make some voter-registration dots a bit more lifelike, so you can see what they’re saying. You can apply t…
Category: SAS
2 Easy Steps to Write a Great Book Outline
Editor’s note: This series of blogs addresses the questions we are most frequently asked at SAS Press! It worth spending some time on this. Arguably, this is one of the most important parts of the book. The table of contents and outline provide the blue print of your book – […]
The post 2 Easy Steps to Write a Great Book Outline appeared first on SAS Learning Post.
Gartner’s 2017 Take on Data Science Software
In my ongoing quest to track The Popularity of Data Analysis Software, I’ve finally decided to change the title to use the newer term “data science”. The 2017 version of Gartner’s Magic Quadrant for Data Science Platforms was just published, … Continue reading →
How to add roads to your SAS maps
Have you been using the SAS/Graph Gmap procedure to plot your data on maps for years, but never knew you could add roads to your maps?!? Follow along in this blog post, and I’ll teach you how… But before we get started, here’s a picture of a nice aer…
7 Simple Steps from Proposal to Print: Development of a SAS Press Book
Editor’s note: This series of blogs addresses the questions we are most frequently asked at SAS Press! In our last post about how to write a good outline, we discussed the importance of developing an outline for your proposed SAS Press book and gave so…
The Other 27 SAS Numeric Missing Values
What?!? You mean a period (.) isn’t the only SAS numeric missing value? Well, there are 27 others: .A .B, to .Z and ._ (period underscore). Your first question might be: “Why would you need more than one missing value?” One situation where multiple missing values are useful involves survey data. Suppose […]
The post The Other 27 SAS Numeric Missing Values appeared first on SAS Learning Post.
Google’s switch to G Suite
My paymenet information for this custom URL is expiring and Google told me to update the payment information. That’s OK, I will do it. However, by switching to G Suite, Google makes paying this $10.00 so difficult that I decided it is not worthy to c…
Good math, bad engineering
As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s ability to find the abstraction and the engineer’s ability to find the implementation.
For a typical engineering problem, the steps are usually –
– 1. Abstract the problem with a formula or some pseudocodes
– 2. Solve the problem with the formula
– 3. Iterate the initial solution until it achieves the optimal time complexity and space complexity
I feel that a mathematician would like dynamic programming or DP questions most, because they are too similar to the typical deduction question in math. An engineer will feel it challenging, since it needs the imagination and some sense of math.
The formula is the most important: without it, try-and-error or debugging does not help. Once the the formula is figured out, the rest becomes a piece of the cake. However, sometimes things are not that straightforward. Good mathematics does not always lead to good engineering.
Let’s see one question from Leetcode.
You are given a list of non-negative integers, a1, a2, ..., an, and a target, S. Now you have 2 symbols + and -. For each integer, you should choose one from + and - as its new symbol.
Find out how many ways to assign symbols to make sum of integers equal to target S.
Example 1:
Input: nums is [1, 1, 1, 1, 1], S is 3.
Output: 5
Explanation:
-1+1+1+1+1 = 3
+1-1+1+1+1 = 3
+1+1-1+1+1 = 3
+1+1+1-1+1 = 3
+1+1+1+1-1 = 3
There are 5 ways to assign symbols to make the sum of nums be target 3.
1. The quick solution
For each of the element of a list, it has two options: plus or minus. So the question asks how many ways to get a special number by all possible paths. Of course, if the sum of numbers is unrealistic, we just need to return 0.
Sounds exactly like a DP question. If we have a pencil and a paper, we can start to explore the relationship between dp(n)
and dp(n-1)
. For example, our goal is to get a sum of 5, and we are given a list of [1, 1, 1, 1, 1]. If th a smaller tuple/list is (1, 1, 1, 1) and some paths get 4, that is exactly what we want since it adds 1 and becomes 5. Similarly, if they could get 6, that is fine as well. We add simply both paths together, since there are only two paths.
The formula is is dp(n, s) = dp(n-1, s-x) + dp(n-1, s+x
), where n
is the size of the list, s
is the sum of the numbers and x
is the one that adds to the previous list. OK, the second step is easy.
def findTargetSumWays_1(nums, S):
"""
:type nums: Tuple[int]
:type S: int
:rtype: int
"""
if not nums:
if S == 0:
return 1
else:
return 0
return findTargetSumWays_1(nums[1:], S+nums[0]) + findTargetSumWays_1(nums[1:], S-nums[0])
small_test_nums = (1, 1, 1, 1, 1)
small_test_S = 3
%time findTargetSumWays_1(small_test_nums, small_test_S)
It is theoretically correct and works perfectly with small test cases. But we know that it is going to a nightmare for an engineering application, because it has a hefty time complexity of O(2^N). So math part is done, and We have to move to the third step.
2. The third step that is hard
So we need to find a data structure to record all the paths. If it is the Fibonacci number problem, a simple linear data structure like a list will slash O(2^N) to O(N).
But the hard part is: what data structure is going to be used here. Since the get
operation in Hashtable is O(1), a rolling dictionary will help record the previous states. However, Python’s dictionary does not support change/add ops while it is in a loop, then we have to manually replace it. The overall path will be like a tree structure. So the ideal solution will be like –
def findTargetSumWays_2(nums, S):
if not nums:
return 0
dic = {nums[0]: 1, -nums[0]: 1} if nums[0] != 0 else {0: 2}
for i in range(1, len(nums)):
tdic = {}
for d in dic:
tdic[d + nums[i]] = tdic.get(d + nums[i], 0) + dic.get(d, 0)
tdic[d - nums[i]] = tdic.get(d - nums[i], 0) + dic.get(d, 0)
dic = tdic
return dic.get(S, 0)
big_test_nums = tuple(range(100))
big_test_S = sum(range(88))
%time findTargetSumWays_2(big_test_nums, big_test_S)
The time is exactly what we need. However, the codes are not elegant and hard to understand.
CPU times: user 189 ms, sys: 4.77 ms, total: 194 ms
Wall time: 192 ms
3. Finally the easy solution
If we don’t want things to get complicated. Here we just want a cache and Python 3 provides a lru_cache
decorator. Then adding one line to the first solution will quickly solve the problem.
@lru_cache(10000000)
def findTargetSumWays_3(nums, S):
if not nums:
if S == 0:
return 1
else:
return 0
return findTargetSumWays_3(nums[1:], S+nums[0]) + findTargetSumWays_3(nums[1:], S-nums[0])
%time findTargetSumWays_3(big_test_nums, big_test_S)
CPU times: user 658 ms, sys: 19.7 ms, total: 677 ms
Wall time: 680 ms
Conclusion
Good math cannot solve all the engineering problems. It has to combine with the details of the languange, the application and the system to avoid bad engineering implementation.
The Jupyter notebook is at Github. If you have any comment, please email me wm@sasanalysis.com.