Tag: python

SAS vs. Python for data analysis

To perform data analysis efficiently, I need a full-stack programming language rather than frequently switching from one language to another. That means the language can hold a large quantity of data, manipulate data promptly and easily (e.g. if-then-else; iteration), connect to various data sources such as relational databases and Hadoop, apply statistical models, and report results as graphs, tables or web pages. SAS is famous for its capacity to realize such a data cycle, as long as you are willing to pay the annual license fee.
SAS’s long-standing competitor, R, keeps growing. However, in the past few years the Python community has launched an energetic movement to port R’s jewels and ideas to Python, which has resulted in a few solid packages such as pandas and ggplot. With the rapid accumulation of data-related tools in Python, I feel more comfortable working with data in Python than in R, because I have a bias that Python’s interpreter is steadier than R’s when dealing with data, and sometimes I just want to escape from R’s idiosyncratic syntax such as x<-4 or foo.bar.2000=10.

Actually there is no competition between SAS and R at all: the two dwell in parallel universes and rely on distinct ecosystems. SAS, Python, Bash and Perl process data row-wise, which means they read and write data line by line. R, Matlab, SAS/IML, Python/pandas and SQL manipulate data column-wise. The data size for row-wise packages such as SAS is bound by the hard disk, at the cost of the low speed of disk access. In contrast, column-wise packages including R are bound by memory, which is much faster.
Let’s go back to the comparison between SAS and Python. For most of the SAS components I am familiar with, I can find an equivalent module in Python. The table below lists the similar components of SAS and Python.
SAS                                 | Python
DATA step                           | core Python
SAS/STAT                            | StatsModels
SAS/Graph                           | matplotlib
SAS Statistical Graphics            | ggplot
PROC SQL                            | sqlite3
SAS/IML                             | NumPy
SAS Windowing Environment           | Qt Console for IPython
SAS Studio                          | IPython notebook
SAS In-Memory Analytics for Hadoop  | Spark with Python
This week SAS announced some promising products. Interestingly, they can be traced to similar implementations in Python. For example, SAS Studio, a fancy web-based IDE with code completion, starts an HTML server on the local machine and uses a browser for coding, which is amazingly similar to the IPython notebook. Another example is SAS In-Memory Analytics for Hadoop. Given that the old MapReduce path for data analysis is painfully time-consuming and complicated, aggregating memory instead of hard disk across many nodes of a Hadoop cluster is certainly faster and more interactive. Based on the same idea, Apache Spark, which fully supports Python scripting, has just been released to CDH 5.0. It will be interesting to compare the in-memory abilities of Python and SAS for data analysis at the level of Hadoop.
Until there is a new killer app for R, at least for now, Python steals R’s thunder as an open source alternative to SAS.

Use a list as stack/queue in Python

Python’s core built-in types do not include a dedicated stack or queue, although either could be implemented manually if needed. However, a list in Python is an easy alternative for both. A stack and a queue share the same method append(); to remove an element, the stack uses pop() while the queue uses pop(0). Note that pop(0) shifts every remaining element, so it costs O(n) per call.
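
The two usages can be sketched as below. This is a minimal illustration written for Python 3 (the posts themselves use Python 2 syntax); the variable names are made up for the example, and collections.deque from the standard library is shown only as a possible alternative when the left-side pop of a list becomes a bottleneck.

```python
from collections import deque

# A list used as a stack: push and pop at the right end (LIFO).
stack = []
for item in (1, 2, 3):
    stack.append(item)
popped_from_stack = stack.pop()      # last in, first out

# A list used as a queue: append at the right, remove from the left (FIFO).
queue = []
for item in (1, 2, 3):
    queue.append(item)
popped_from_queue = queue.pop(0)     # first in, first out; O(n) per call

# collections.deque gives the same FIFO behaviour with an O(1) popleft().
dq = deque([1, 2, 3])
popped_from_deque = dq.popleft()

print(popped_from_stack, popped_from_queue, popped_from_deque)
```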

1. Valid Parentheses

Given a string containing just the characters ‘(’, ‘)’, ‘{’, ‘}’, ‘[’ and ‘]’, determine if the input string is valid.
The brackets must close in the correct order: “()” and “()[]{}” are valid, but “(]” and “([)]” are not.
This question follows the principle of try-match-else-insert: if the incoming character closes the bracket on top of the stack, pop the stack; otherwise push the character. Since the question only needs to return True/False, the logic is pretty straightforward.
# -*- coding: utf-8 -*-
def isValid(s):
    a = []  # use a list as stack
    for x in s:
        if len(a) == 0:
            a.append(x)
        # Locate the three correct cases
        elif (x == ']' and a[-1] == '[') or (x == '}' and a[-1] == '{') or (x == ')' and a[-1] == '('):
            a.pop()
        else:
            a.append(x)
    # Any leftover character means an unmatched bracket
    return len(a) == 0

if __name__ == "__main__":
    testcases = ['[', '((', '([)]', '()[]{}', '([])']
    for case in testcases:
        print(isValid(case))
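
As a variation on the same try-match-else-insert idea, the three matching cases can be kept in a dictionary. The sketch below is written for Python 3; the name is_valid and the pairs lookup are illustrative choices, not part of the original solution. Unlike the version above, it returns as soon as a closing bracket fails to match.

```python
def is_valid(s):
    """Return True if every bracket in s is closed in the correct order."""
    pairs = {')': '(', ']': '[', '}': '{'}  # closer -> matching opener
    stack = []
    for ch in s:
        if ch in pairs:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)
    # Leftover openers mean the string is not valid.
    return not stack

if __name__ == "__main__":
    for case in ['[', '((', '([)]', '()[]{}', '([])']:
        print(case, is_valid(case))
```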

2. Longest Valid Parentheses

Given a string containing just the characters ‘(’ and ‘)’, find the length of the longest valid (well-formed) parentheses substring.
For “(()”, the longest valid parentheses substring is “()”, which has length = 2.
Another example is “)()())”, where the longest valid parentheses substring is “()()”, which has length = 4.
This question is a harder version of the one above, since it asks for the maximum length of a valid substring. This time the object pushed onto the stack is the index of the character instead of the character itself.
# -*- coding: utf-8 -*-
def LVP2(a):
    # Test for None first; len(None) would raise a TypeError
    if a is None or len(a) == 0:
        return 0
    last = -1
    maxLen = 0
    b = [] # use a buffer list as stack
    for i in range(len(a)):
        if a[i] == '(':
            b.append(i)
        else:
            # If buffer is empty
            if not b:
                # Record the position of the unmatched right parenthesis
                last = i
            else:
                b.pop()
                # If buffer is empty again
                if not b:
                    maxLen = max(i-last, maxLen)
                else:
                    # Measure from the top element of the buffer
                    maxLen = max(i-b[-1], maxLen)
    return maxLen

if __name__ == "__main__":
    testcases = [")()())", '()()()()()))))))((((((', "()(()))", "()(()", "))))((()(("]
    for case in testcases:
        print(LVP2(case))

3. Evaluate Reverse Polish Notation

Evaluate the value of an arithmetic expression in Reverse Polish Notation.
Valid operators are +, -, *, /. Each operand may be an integer or another expression.
Some examples:
[“2”, “1”, “+”, “3”, “*”] -> ((2 + 1) * 3) -> 9
[“4”, “13”, “5”, “/”, “+”] -> (4 + (13 / 5)) -> 6
This question implements the principle of try-number-else-calculate. The tricky part is that the stack has to pop twice for every operator, and the first value popped is the second operand.
# -*- coding: utf-8 -*-
def polish(s):
    stack = []
    for x in s:
        # Use the int() function to make decision instead of isdigit()
        try:
            stack.append(int(x))
        except ValueError:
            # Compute with floats so that int() below truncates toward zero
            e2 = float(stack.pop()) # the first pop-out is the second number
            e1 = float(stack.pop())
            if x == '+':
                result = e1 + e2
            elif x == '-':
                result = e1 - e2
            elif x == '*':
                result = e1 * e2
            elif x == '/':
                if e2 == 0:
                    raise ValueError("No zero for denominator")
                result = e1 / e2
            else:
                raise ValueError("Incorrect operator")
            stack.append(int(result))
    return stack[0] # transform single element list to number

if __name__ == "__main__":
    testcases = [
        ["2", "1", "+", "3", "*"],
        ["4", "13", "5", "/", "+"],
        ["3", "-4", "+"],
        ["18"],
        ["10","6","9","3","+","-11","*","/","*","17","+","5","+"],
    ]
    for case in testcases:
        print(polish(case))
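
The chain of if/elif branches can also be expressed as a lookup table of functions. The following Python 3 sketch is one possible arrangement; eval_rpn and OPS are hypothetical names, and the division is assumed to truncate toward zero, as the int() call does in the solution above. Dividing by zero raises ZeroDivisionError here rather than ValueError.

```python
import operator

# Map each operator token to a two-argument function.
OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    # Assumption: truncate the quotient toward zero.
    '/': lambda a, b: int(a / b),
}

def eval_rpn(tokens):
    """Evaluate a list of RPN tokens and return an integer."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # popped first, but it is the second operand
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack.pop()

if __name__ == "__main__":
    print(eval_rpn(["2", "1", "+", "3", "*"]))
    print(eval_rpn(["4", "13", "5", "/", "+"]))
```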

Sorting in Python

#-------------------------------------------------------------------------------
# Name: Methods of sorting
# Purpose: implements the sortings mentioned by Robert Sedgewick and
# Kevin Wayne, Algorithms 4ed
#
#-------------------------------------------------------------------------------

def selection_sort(a):
    for i in range(len(a)):
        smallest = i  # index of the smallest remaining element
        for j in range(i+1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]

def insertion_sort(a):
    for i in range(len(a)):
        j = i
        # Stop as soon as a[j] is in place; the left part is already sorted
        while j > 0 and a[j] < a[j-1]:
            a[j], a[j-1] = a[j-1], a[j]
            j -= 1

def shell_sort(a):
    h = 1
    while h <= len(a)//3:
        h = 3*h + 1 # for the 10-element test the largest increment is 4
    while h >= 1:
        for i in range(len(a)):
            j = i
            while j >= h and a[j] < a[j-h]:
                a[j], a[j-h] = a[j-h], a[j]
                j -= h
        h //= 3 # integer division keeps h usable as an index offset

def merge_sort(x):
    result = []
    if len(x) < 2:
        return x
    mid = len(x)//2
    y = merge_sort(x[:mid])
    z = merge_sort(x[mid:])
    i = 0
    j = 0
    while i < len(y) and j < len(z):
        if y[i] > z[j]:
            result.append(z[j])
            j += 1
        else:
            result.append(y[i])
            i += 1
    result += y[i:]
    result += z[j:]
    return result

def quick_sort(a):
    if len(a) <= 1:
        return a
    else:
        return quick_sort([x for x in a[1:] if x < a[0]]) + [a[0]] \
            + quick_sort([x for x in a[1:] if x >= a[0]])

if __name__ == '__main__':
    a = [7, 10, 1, 1, 3, 4, 5, 9, 2, 8]
    b = {}
    for i in range(1, 6):
        b['test'+str(i)] = a[:]
    # Test the three simple sortings, which sort in place
    insertion_sort(b['test1']) #1
    selection_sort(b['test2']) #2
    shell_sort(b['test3']) #3
    print(b)
    # Test the sortings that require recursion and return a new list
    print(merge_sort(b['test4'])) #4
    print(quick_sort(b['test5'])) #5
    # Timsort that is native in Python
    a.sort() #6
    print(a)

Use hash to decrease complexity

  • Unique String
#-------------------------------------------------------------------------------
# Name: Unique String
# Purpose: Find whether all characters in a string are unique
#
#-------------------------------------------------------------------------------

# Solution 1
def is_unique(s):
    a = [False]*256 # assumes 8-bit characters
    for n in s:
        # Use the ord() function to find the ASCII value of a character
        if a[ord(n)]:
            return False # jump out of the loop
        a[ord(n)] = True
    return True

# Solution 2
def is_unique2(s):
    a = {}
    for x in s:
        # A key that is already present means a repeated character
        if x in a:
            return False
        a[x] = 1
    return True

if __name__ == "__main__":
    testcases = ["abcdefg", "abcdefga", "5523231"]
    for case in testcases:
        print(is_unique(case))
        print(is_unique2(case))
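
For comparison, a hash set gives the same answer in one line. This is a small Python 3 sketch under the assumption that scanning the whole string is acceptable (it does not stop at the first repeated character); is_unique_set is a name chosen for the example.

```python
def is_unique_set(s):
    """Return True if no character occurs more than once in s."""
    # A set drops duplicates, so the sizes differ only if a character repeats.
    return len(set(s)) == len(s)

if __name__ == "__main__":
    for case in ["abcdefg", "abcdefga", "5523231"]:
        print(case, is_unique_set(case))
```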
  • Two Sum
#-------------------------------------------------------------------------------
# Name: Two Sum
# Purpose: Given an array of integers, find two numbers
# such that they add up to a specific target number
#
#-------------------------------------------------------------------------------

# Solution 1: compare every pair, O(n^2)
def two_sum(a, target):
    res = []
    for i in range(len(a)):
        for j in range(i+1, len(a)):
            if a[i] + a[j] == target:
                res.append([a[i], a[j]])
    return res

# Solution 2: remember the complement of every number in a hash table, O(n)
def two_sum2(a, target):
    h = {}
    res = []
    for x in a:
        if x in h:
            res.append([target-x, x])
        h[target - x] = x
    return res

if __name__ == "__main__":
    testcase = [1, 2, 4, 3, 10, 9]
    target = 5
    print(two_sum(testcase, target))
    print(two_sum2(testcase, target))
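
A common variant of this question asks for the positions of the two numbers rather than their values. The sketch below (Python 3) adapts the hash idea of Solution 2 to that variant; two_sum_indices is a hypothetical name, and returning only the first pair found is an assumption made for brevity.

```python
def two_sum_indices(nums, target):
    """Return the index pair (i, j), i < j, of the first match found, or None."""
    seen = {}  # value -> index where it was seen
    for j, x in enumerate(nums):
        i = seen.get(target - x)
        if i is not None:
            return (i, j)
        seen[x] = j
    return None

if __name__ == "__main__":
    print(two_sum_indices([1, 2, 4, 3, 10, 9], 5))
```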
  • Longest Consecutive Sequence
#-------------------------------------------------------------------------------
# Name: Longest Consecutive Sequence
# Purpose: Given an unsorted array of integers, find the length of the longest consecutive elements sequence.
#
# For example,
# Given [100, 4, 200, 1, 3, 2],
# The longest consecutive elements sequence is [1, 2, 3, 4]. Return its length: 4.
#
# Your algorithm should run in O(n) complexity.
#-------------------------------------------------------------------------------

def find_lgst(a):
    h = {} # use a hash table
    st = set(a) # use a hash set
    cnt = 0
    for x in a: # iteration is O(n)
        h[x] = 1
        right = x + 1
        left = x - 1
        while right in st: # search in a hash set is O(1)
            st.discard(right) # reduce the size and increase speed
            h[x] += 1
            right += 1
        while left in st:
            st.discard(left) # reduce the size and increase speed
            h[x] += 1
            left -= 1
        cnt = max(cnt, h[x])
    return cnt # the running max avoids a second pass such as max(h.values())

if __name__ == '__main__':
    a = [100, 4, 200, 1, 3, 2, 5, 2321, 6, 9, 10, 42343, 10, 7, 32424, 8]
    print(find_lgst(a))
  • Longest Common Prefix
#-------------------------------------------------------------------------------
# Name: Longest Common Prefix
# Purpose: Write a function to find the longest common prefix string amongst an array of strings.
#
#-------------------------------------------------------------------------------

def find_prefix(a):
    h = {}
    prefix = ''
    # Count how many strings have character x at position i
    for s in a:
        for i, x in enumerate(s):
            key = str(i) + x
            try:
                h[key] += 1
            except KeyError:
                h[key] = 1
    # Recover the character and the order from the key
    for key in sorted(h, key=lambda x: int(x[0:-1])):
        if h[key] != len(a):
            return prefix
        prefix += key[-1]
    return prefix

if __name__ == "__main__":
    a = ['ab', 'abc', 'abaffsfas']
    b = ["a"]
    c = ["c", "c"]
    d = []
    e = ["aca","cba"]
    g = ["abab","aba","abc"]
    print(find_prefix(a))
    print(find_prefix(b))
    print(find_prefix(c))
    print(find_prefix(d))
    print(find_prefix(e))
    print(find_prefix(g))
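
The same result can be reached without a hash table by walking the strings position by position. The Python 3 sketch below relies on the built-in zip(), which stops at the shortest string; common_prefix is a name made up for this example and is not the author's solution.

```python
def common_prefix(strings):
    """Return the longest prefix shared by every string in the list."""
    prefix = []
    # zip(*strings) yields one tuple of characters per position.
    for chars in zip(*strings):
        if len(set(chars)) != 1:
            break                 # the strings disagree at this position
        prefix.append(chars[0])
    return ''.join(prefix)

if __name__ == "__main__":
    print(common_prefix(['ab', 'abc', 'abaffsfas']))
    print(common_prefix(["abab", "aba", "abc"]))
```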
  • First Missing Positive
#-------------------------------------------------------------------------------
# Name: First Missing Positive
# Purpose: Given an unsorted integer array, find the first missing positive integer.
#
# For example,
# Given [1,2,0] return 3,
# and [3,4,-1,1] return 2.
#
# Your algorithm should run in O(n) time and uses constant space.
#
#-------------------------------------------------------------------------------
def fst_mis(a):
    if len(a) == 0:
        return 1
    # Note: the set takes O(n) extra space, so this solution is O(n) in time
    # but does not meet the constant-space requirement
    st = set(range(1, len(a)+1))
    for x in a:
        if x in st:
            st.discard(x)
    if len(st) == 0:
        return max(a) + 1
    return min(st) # set.pop() would return an arbitrary element

if __name__ == "__main__":
    test1 = []
    test2 = [1,2,0]
    test3 = [3,4,-1,1]
    print(fst_mis(test1))
    print(fst_mis(test2))
    print(fst_mis(test3))
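
The constant-space requirement in the problem statement can be met by using the input array itself as the hash table: each value v in 1..n is swapped into slot v-1. The Python 3 sketch below shows one way to do this; first_missing_positive is a hypothetical name, and note that it reorders its argument in place.

```python
def first_missing_positive(nums):
    """Return the smallest positive integer absent from nums (reorders nums)."""
    n = len(nums)
    for i in range(n):
        # Keep swapping until slot i holds i + 1 or a value that has no slot.
        while 1 <= nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            target = nums[i] - 1
            nums[i], nums[target] = nums[target], nums[i]
    # The first slot whose value is out of place reveals the answer.
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1

if __name__ == "__main__":
    print(first_missing_positive([1, 2, 0]))
    print(first_missing_positive([3, 4, -1, 1]))
```

Each swap puts at least one value into its final slot, so the total number of swaps is bounded by n even though the loops are nested.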

Use Python to solve math questions (1)

  1. Translate the numbers to an array of integers to avoid overflow
  2. Use while to control the flow when the number of iterations is unknown
  3. Apply % and integer division (/ in Python 2, // in Python 3) to retrieve digits from an integer
  4. Be careful of the direction of the iteration

  • Reverse Integer
Reverse digits of an integer.
Example1: x = 123, return 321
Example2: x = -123, return -321
def rev_int(a):
    negative = False
    result = 0
    if a == 0:
        return a
    if a < 0:
        negative = True
        a = -a
    while a > 0:
        # Move the last digit of a to the end of result
        result = result*10 + a%10
        a //= 10
    if negative:
        return -result
    return result

if __name__ == "__main__":
    print(rev_int(123))
    print(rev_int(-4124324))
  • Multiply Strings
Given two numbers represented as strings, return multiplication of the numbers as a string.
Note: The numbers can be arbitrarily large and are non-negative.
class multiply:
    def mul(self, int1, int2):
        a = self.toArray(int1)
        b = self.toArray(int2)
        c = [0]*(len(a) + len(b))
        d = [0]*(len(a) + len(b))
        # Accumulate the digit-by-digit products without carrying
        for i in range(len(a)):
            for j in range(len(b)):
                p = a[i] * b[j]
                c[i+j+1] += p
        # Propagate the carries from the right to the left
        carry = 0
        for i in reversed(range(len(c))):
            c[i] += carry
            d[i] = c[i] % 10    # digit kept at this position
            carry = c[i] // 10  # carried to the next position on the left
        return self.toInt(d)

    def toArray(self, n):
        result = []
        while n > 0:
            a = n % 10
            result.append(int(a)) # a is possibly a long type given the value of n
            n = n // 10
        return [x for x in reversed(result)]

    def toInt(self, a):
        y = 0
        for i in range(len(a)):
            mul = 10**( len(a)-1-i )
            y += a[i]*mul
        return y

if __name__ == "__main__":
    a = multiply()
    # Testcase1
    print(a.mul(9934248, 983204442))
    print(9934248*983204442)
    # Testcase2
    print(a.mul(1, 2))
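
The class above takes integers, while the problem statement speaks of numbers represented as strings. A string-in, string-out variant of the same digit-array idea could look like the Python 3 sketch below; multiply_strings is a name introduced for the example, and the inputs are assumed to contain decimal digits only.

```python
def multiply_strings(num1, num2):
    """Multiply two non-negative decimal strings digit by digit."""
    a = [int(ch) for ch in num1]
    b = [int(ch) for ch in num2]
    # The product of an m-digit and an n-digit number has at most m + n digits.
    c = [0] * (len(a) + len(b))
    for i in range(len(a)):
        for j in range(len(b)):
            c[i + j + 1] += a[i] * b[j]
    # Propagate carries from the least significant position to the left.
    carry = 0
    for k in reversed(range(len(c))):
        total = c[k] + carry
        c[k] = total % 10
        carry = total // 10
    # Drop leading zeros but keep a single "0" for a zero product.
    return ''.join(str(d) for d in c).lstrip('0') or '0'

if __name__ == "__main__":
    print(multiply_strings("9934248", "983204442"))
    print(multiply_strings("1", "2"))
```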
  • Valid Number
Validate if a given string is numeric.
Some examples:
“0” => true
“ 0.1 ” => true
“abc” => false
“1 a” => false
“2e10” => true
def valid_num(a):
    # Let float() do the parsing; it also accepts strings such as 'inf' and 'nan'
    try:
        float(a)
    except (TypeError, ValueError):
        return False
    return True
  • Add Binary
Given two binary strings, return their sum (also a binary string).
For example,
a = “11”
b = “1”
Return “100”.
def bin_add(a, b):
    if len(a) < len(b):
        a, b = b, a
    c = [0]*len(a)
    b = '0'*(len(a)-len(b)) + b

    carry = 0
    for i in range(len(a)):
        index = -i - 1 # from the right to the left
        c[index] = carry + int(a[index]) + int(b[index])
        carry = 0
        if c[index] >= 2: # the sum is 2 or 3 once a carry comes in
            c[index] -= 2
            carry = 1
    # A carry left after the leftmost digit adds one more digit
    if carry:
        c.insert(0, 1)
    return ''.join([str(x) for x in c])

if __name__ == "__main__":
    print(bin_add('1', '1001'))
    print(bin_add('11', '1'))
    print(bin_add('0', '0'))
  • Plus one
def plus_one(a):
    # Error case and bad case
    if not isinstance(a, list) or len(a) == 0:
        raise ValueError
    # Normal case
    for i in reversed(range(len(a))):
        if a[i] == 9:
            a[i] = 0
        else:
            a[i] += 1
            return a
    # Special case
    a.insert(0, 1)
    return a

if __name__ == "__main__":
    testcase1 = [1, 2, 3, 4]
    print(plus_one(testcase1))
    testcase2 = [8, 9, 9, 9]
    print(plus_one(testcase2))
    testcase3 = [9, 9, 9, 9]
    print(plus_one(testcase3))
    testcase4 = 0
    print(plus_one(testcase4)) # raises ValueError: not a list
  • Count and Say
The count-and-say sequence is the sequence of integers beginning as follows:
1, 11, 21, 1211, 111221, …
1 is read off as “one 1” or 11.
11 is read off as “two 1s” or 21.
21 is read off as “one 2, then one 1” or 1211.
Given an integer n, generate the nth sequence.
def read(a):
    # The array for counter and value; 10 is a dummy value that no digit equals
    current = [0, 10]
    result = []
    for i in range(len(a)+1):
        if i == len(a):
            result += current
            return result[2:] # drop the initial dummy pair [0, 10]
        if a[i] == current[1]:
            current[0] += 1
        else:
            result += current
            current[1] = a[i]
            current[0] = 1

def cnt_say(n):
    # Set the initial seed
    result = [1]
    for i in range(n):
        result = read(result)
    # Return the result as a string
    return ''.join(str(x) for x in result)

if __name__ == "__main__":
    for i in range(10):
        print(cnt_say(i))
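
The run-counting step can also be delegated to itertools.groupby from the standard library. The Python 3 sketch below is an alternative formulation, not the author's code; look_and_say and count_and_say are hypothetical names, and the terms are assumed to be numbered from 1, whereas cnt_say above starts from 0.

```python
from itertools import groupby

def look_and_say(term):
    """Read off a term such as '1211' and return the next one, '111221'."""
    # groupby() splits the string into runs of identical adjacent digits.
    return ''.join(str(len(list(run))) + digit for digit, run in groupby(term))

def count_and_say(n):
    """Return the n-th term, counting from 1 (so count_and_say(1) == '1')."""
    term = '1'
    for _ in range(n - 1):
        term = look_and_say(term)
    return term

if __name__ == "__main__":
    for i in range(1, 6):
        print(count_and_say(i))
```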
  • Read Numbers
1 -> one
100 -> one hundred
500234 -> five hundred thousand two hundred thirty four
1232232 -> 1 million two hundred ….
def read_piece(s):
    # Only a full three-digit group has a hundreds digit
    if len(s) < 3:
        return s
    return s[0] + ' hundred ' + s[1:]

def read(s):
    alen = len(s)
    read2 = {0: '', 1: 'thousand', 2: 'million', 3: 'billion', 4: 'trillion'}
    b = []
    result = []
    # Cut the digit string into groups of three, starting from the right
    while alen > 0:
        low = max(0, alen-3)
        high = alen
        b.append(s[low: high])
        alen -= 3

    # Read the groups from the most significant one and attach the scale word
    for i, x in enumerate(reversed(b)):
        index = len(b) - i - 1
        result += [read_piece(x), read2[index]]
    return ' '.join(n for n in result if n)
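
The snippet above groups the digits and attaches the scale words but leaves the digits themselves unspelled. A fuller sketch in Python 3, assuming plain English number words without "and" or hyphens (as in the examples), might look as follows; the lookup tables and the names group_to_words and number_to_words are introduced here for illustration.

```python
ONES = ['', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight',
        'nine', 'ten', 'eleven', 'twelve', 'thirteen', 'fourteen', 'fifteen',
        'sixteen', 'seventeen', 'eighteen', 'nineteen']
TENS = ['', '', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'seventy',
        'eighty', 'ninety']
SCALES = ['', 'thousand', 'million', 'billion', 'trillion']

def group_to_words(n):
    """Spell out an integer in the range 0..999 (empty list for 0)."""
    words = []
    if n >= 100:
        words += [ONES[n // 100], 'hundred']
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n:
        words.append(ONES[n])
    return words

def number_to_words(n):
    """Spell out a non-negative integer below 10**15."""
    if n == 0:
        return 'zero'
    groups = []
    while n > 0:                       # least significant group first
        n, group = divmod(n, 1000)
        groups.append(group)
    words = []
    for scale in reversed(range(len(groups))):
        if groups[scale]:              # skip empty groups such as 000
            words += group_to_words(groups[scale])
            if SCALES[scale]:
                words.append(SCALES[scale])
    return ' '.join(words)

if __name__ == "__main__":
    for n in (1, 100, 500234):
        print(n, '->', number_to_words(n))
```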

Use max/min functions to avoid conditions

Functions such as max() and min() can replace a conditional assignment such as
if a < b:
    a = b
which is equivalent to a = max(a, b). Using them in programming brings flexibility and simplifies the code.

1. Best Time to Buy and Sell Stock Total

Say you have an array for which the ith element is the price of a given stock on day i.
If you were only permitted to complete at most one transaction (ie, buy one and sell one share of the stock), design an algorithm to find the maximum profit.
The stock prices fluctuate, which results in profitable gaps (see the graph). The first question only wants to capture the largest possible profit given a single transaction. There are two other variants originating from it: unlimited transactions and at most two transactions.
from ggplot import * # third-party package, needed only for the plot at the end
import pandas as pd

# If the stock could be traded just once
def stock1(a):
    min_price = float('inf')
    max_profit = 0
    for x in a:
        min_price = min(x, min_price)
        max_profit = max(x - min_price, max_profit)
    return max_profit

def stock1A(a):
    max_price = 0
    max_profit = 0
    for x in reversed(a):
        max_price = max(x, max_price)
        max_profit = max(max_price - x, max_profit)
    return max_profit

#------------------------EXTENSION--------------------------------------------
# If the stock could be traded multiple times and you can only have one stock all time
def stock2(a):
    max_profit = 0
    # Start from day 1; a[0] - a[-1] would compare the first day with the last
    for i in range(1, len(a)):
        profit = max(a[i] - a[i-1], 0)
        max_profit += profit
    return max_profit

# If the stock could be traded at most twice and you can only have one stock all time
from collections import defaultdict
def stock3(a):
    min_price = float('inf')
    fst_max_profit = 0
    h = defaultdict(list)
    # Best single-trade profit within days 0..i
    for i in range(len(a)):
        min_price = min(a[i], min_price)
        fst_max_profit = max(a[i] - min_price, fst_max_profit)
        h[i].append(fst_max_profit)
    max_price = 0
    snd_max_profit = 0
    # Best single-trade profit within days i..end
    for i in reversed(range(len(a))):
        max_price = max(a[i], max_price)
        snd_max_profit = max(max_price - a[i], snd_max_profit)
        h[i].append(snd_max_profit)
    return max([sum(h[i]) for i in h.keys()])

if __name__ == "__main__":
    a = [1, 3, 6, 3, 1, 2, 4, 4, 2, 5]
    print(stock1(a))
    print(stock1A(a))
    print(stock2(a))
    print(stock3(a))

    # Draw the plot of stock price
    df = pd.DataFrame()
    df['price'] = a
    df['day'] = df.index + 1
    print(ggplot(aes(x='day', y='price'), data = df) + geom_line())

2. Minimum Window Substring

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).
For example,
S = “ADOBECODEBANC”
T = “ABC”
Minimum window is “BANC”.
def min_wds(s, t):
    if not isinstance(s, str) or not isinstance(t, str):
        raise ValueError("Must be strings")
    if len(s) == 0 or len(t) == 0:
        return ""
    # Record the last position of every character in s
    a = {}
    for i in range(len(s)):
        a[s[i]] = i
    start = len(s)
    end = 0
    cnt = 0
    for x in t:
        if x in a:
            start = min(start, a[x])
            end = max(end, a[x])
            cnt += 1
    # Caution: only the last occurrence of each character is considered, so
    # the window is not guaranteed to be the shortest, and repeated
    # characters in t are not counted
    if cnt == len(t):
        return s[start:end+1]
    return ""

if __name__ == '__main__':
    print(min_wds("ADOBECODEBANC", "ABC"))
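
Because min_wds only looks at the last occurrence of each character, it can miss a shorter window that ends earlier in S (for example S = "ABCXXA", T = "ABC"). A widely used alternative is a sliding window with a counter of the characters still needed. The Python 3 sketch below is one such formulation, not the author's method; min_window is a name chosen for the example.

```python
from collections import Counter

def min_window(s, t):
    """Return the shortest substring of s containing every character of t."""
    if not s or not t:
        return ""
    need = Counter(t)          # how many of each character are still required
    missing = len(t)           # total characters of t not yet covered
    best_start, best_end = 0, 0
    left = 0
    for right, ch in enumerate(s, 1):   # the window is s[left:right]
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # Shrink from the left while the leftmost character is surplus.
            while need[s[left]] < 0:
                need[s[left]] += 1
                left += 1
            if best_end == 0 or right - left < best_end - best_start:
                best_start, best_end = left, right
            # Give up the leftmost required character and keep searching.
            need[s[left]] += 1
            missing += 1
            left += 1
    return s[best_start:best_end]

if __name__ == "__main__":
    print(min_window("ADOBECODEBANC", "ABC"))
```

Both ends of the window only move forward, so the scan is linear in the length of S.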

3. Maximum Subarray

Find the contiguous subarray within an array (containing at least one number) which has the largest sum.
For example, given the array [−2,1,−3,4,−1,2,1,−5,4],
the contiguous subarray [4,−1,2,1] has the largest sum = 6.
This question asks for the maximum sum of a contiguous subarray. Two max functions solve it quickly: one keeps the running sum from dropping below zero, and the other tracks the best sum seen so far.
def find_max(A):
    overall_max = float('-inf')
    running_max = 0
    for x in A:
        running_max += x
        # Record the best sum before resetting, so that an all-negative
        # array returns its largest element instead of 0
        overall_max = max(running_max, overall_max)
        running_max = max(0, running_max)
    return overall_max

if __name__ == "__main__":
    a = [-2,1,-3,4,-1,2,1,-5,4]
    print(find_max(a))

Use Python generator to solve the ‘find-all’ problems

The typical find-all questions are widely seen in daily coding. Time and space complexity are not the top concerns for these questions. The biggest challenge is that the number of elements in the result is largely unknown, and therefore the upper/lower limits of the iteration are hard to determine. Thus, recursion is an ideal replacement for iteration, while the generator function in Python supplies a great vehicle to hold the output data.
In summary, the recursion + generator strategy consists of a recursive generator and a driver function that wraps it up.
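
As a warm-up for the pattern, here is a minimal Python 3 sketch that enumerates all subsets of a list. The problem and the names subsets_rcs and subsets are illustrative additions that mirror the naming of the solutions in this post; the point is only to show the recursive generator and its driver side by side.

```python
def subsets_rcs(arr, start=0):
    """Recursion piece: yield every subset of arr[start:] as a list."""
    if start >= len(arr):
        yield []                      # the exit condition fixes the yielded type
        return
    for rest in subsets_rcs(arr, start + 1):
        yield rest                    # subsets without arr[start]
        yield [arr[start]] + rest     # subsets with arr[start]

def subsets(arr):
    """Driver: collect everything the generator produces."""
    return list(subsets_rcs(arr))

if __name__ == "__main__":
    print(subsets([1, 2, 3]))
```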

1. Combination Sum

Given a set of candidate numbers (C) and a target number (T), find all unique combinations in C where the candidate numbers sum to T.
The same repeated number may be chosen from C unlimited number of times.
All numbers (including target) will be positive integers.
For example, given candidate set 2,3,6,7 and target 7,
A solution set is:
[7]
[2, 2, 3]
The yield inside the exit condition determines the type of the returned objects (here a list), and both yield statements must produce the same type.
The while loop adds an upper bound to the iteration, which accommodates the 2sum-like search by subtraction.
def cmb_sum_rcs(arr, target, start = 0):
    """
    A function as the recursion piece
    """

    if target == 0:
        yield [] # the list as the final format to return
    # Use 'while' instead of 'for' to enter iteration given a boundary
    while start < len(arr) and arr[start] <= target:
        # 'target' instead of 'start' is the driver to jump out the iteration
        for p in cmb_sum_rcs(arr, target - arr[start], start):
            # Concatenate the elements as a list
            yield p + [arr[start]]
        start += 1

def cmb_sum(arr, target):
    """
    A function to wrap up the recursion piece
    """

    # Sorting lets the while loop stop at the first candidate above the target
    return list(cmb_sum_rcs(sorted(arr), target))

if __name__ == "__main__":
    print(cmb_sum([2, 3, 6, 7, 1], 7))

2. Letter Combinations of a Phone Number

Given a digit string, return all possible letter combinations that the number could represent.
Input:Digit string “23”
Output: [“ad”, “ae”, “af”, “bd”, “be”, “bf”, “cd”, “ce”, “cf”].
This question is similar to the flatten function in the previous post: the loop occurs at the very beginning.
def phone_nums_rcs(a, arr, start = 0):
    """
    A function as the recursion piece
    """

    # Retrieve the corresponding keyboard letters from the lookup array
    letters = a[int(arr[start])]
    for x in letters:
        # Jump out once the last char is reached
        if start >= len(arr) - 1:
            yield x
        else:
            for p in phone_nums_rcs(a, arr, start + 1):
                yield x + p # concatenate the strings

def phone_nums(arr):
    """
    A function to wrap up the recursion piece
    """

    a = ["","","abc","def","ghi","jkl","mno","pqrs","tuv","wxyz"]
    return list(phone_nums_rcs(a, arr))

if __name__ == "__main__":
    print(phone_nums("239"))

3. Permutations

Given a collection of numbers, return all possible permutations.
Compared with the previous implementation of the permutation function, the generator coding style makes the logic easier to understand and maintain.
For example,
[1,2,3] has the following permutations:
[1,2,3], [1,3,2], [2,1,3], [2,3,1], [3,1,2], and [3,2,1].
def permu_rcs(arr, start = 0):
    """
    A function as the recursion piece
    """

    if start >= len(arr):
        # The copy of the initial input array is passed instead of the reference
        yield arr[:]
    for i in range(start, len(arr)):
        # Put arr[i] at position start, recurse, then swap back
        arr[start], arr[i] = arr[i], arr[start]
        for x in permu_rcs(arr, start + 1):
            yield x
        arr[start], arr[i] = arr[i], arr[start]

def permu(arr):
    """
    A function to wrap up the recursion piece
    """

    return list(permu_rcs(arr))

if __name__ == "__main__":
    print(permu([0, 1, 2]))

Some differences of the data frames between R and Pandas

Pandas is an emerging open source framework for Python and a substitute for R. Both use a data structure called DataFrame. Although their data frames look quite similar, there are some cautions for an R programmer who would like to try Pandas. A …