With SAS Global Forum 2015 just a few weeks away, I’m spending most of my time at the moment working on a demo version of our next Metacoda Plug-ins release. We’ll be showing this upcoming version at our Metacoda booth in The Quad (previously known as the SAS Support and Demo Area). The next Metacoda […]
Category: SAS
Who paid $500k for a US visa? Over 10,000 people!
Having spent many years in graduate school, and living in the Research Triangle Park (RTP) in North Carolina, I have a lot of friends from other countries. Therefore when I recently saw some stories & graphs about EB-5 visas (where you invest a cool half-million US $ to bypass the […]
The post Who paid $500k for a US visa? Over 10,000 people! appeared first on The SAS Training Post.
Google Scholar Finds Far More SPSS Articles; Analytics Forecast Updated
Only last August I wrote that among scholars, the use of R had probably exceeded that of SPSS to become their most widely used software for analytics. That forecast was based on Google Scholar searches focused on one year at a … Continue reading →
6 Tips for Data Visualization from a Floral Designer
You never know where you will find inspiration. This past weekend I attended the NC Museum of Art Art in Bloom Festival. The idea is that local floral designers use a museum masterpiece to draw inspiration for a floral design. [More: WRAL video about event] It was incredible to see how someone could paint with flowers. Turns out there are many Art in Bloom events like …
Post 6 Tips for Data Visualization from a Floral Designer appeared first on BI Notes for SAS® Users. Go to BI Notes for SAS® Users to subscribe.
Spotting a misleading chart
Everyone loves a good conspiracy theory – hopefully you’ll enjoy mine about the number of US E1 visas! I was perusing some of the US government charts, and found one on US immigration visas that caught my attention. It was a 3D bar chart, and since I always mistrust 3D […]
The post Spotting a misleading chart appeared first on The SAS Training Post.
When art and analytics collide
The best graphs are both beautiful and informative – a smooth blend of art and analytics. But more often than not, the two collide rather than blending smoothly… Here is a link to an artistic infographic I recently saw posted by Vendavo on Twitter. Their message (80% of your profit is generated […]
The post When art and analytics collide appeared first on The SAS Training Post.
Deploy a minimal Spark cluster
Requirements
- Intranet speed: The cluster should be able to copy data from one server to another quickly, because MapReduce shuffles large chunks of data across HDFS. SSDs are preferable to spinning disks.
- Elasticity and scalability: Before scaling the cluster out to more machines, the cloud should offer some elasticity to size up or size down.
- Locality of Hadoop: Most importantly, the Hadoop cluster and the Spark cluster should map one-to-one, as in the table below. Computation and storage should always sit on the same machines.
| Hadoop | Cluster Manager | Spark | MapReduce |
|---|---|---|---|
| Name Node | Master | Driver | Job Tracker |
| Data Node | Slave | Executor | Task Tracker |
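The locality requirement can be checked mechanically. The following is a minimal sketch, not part of the original setup: the role names come from the table, while `colocation_gaps` and the host names are hypothetical and only illustrate the idea that every Data Node host should also run an Executor, and vice versa.

```python
# Role names per layer, copied from the table above.
ROLES = {
    "Hadoop": {"coordinator": "Name Node", "worker": "Data Node"},
    "Cluster Manager": {"coordinator": "Master", "worker": "Slave"},
    "Spark": {"coordinator": "Driver", "worker": "Executor"},
    "MapReduce": {"coordinator": "Job Tracker", "worker": "Task Tracker"},
}


def colocation_gaps(data_nodes, executors):
    """Return (hosts with data but no executor, hosts with an executor but no data)."""
    data_nodes, executors = set(data_nodes), set(executors)
    return sorted(data_nodes - executors), sorted(executors - data_nodes)


if __name__ == "__main__":
    # Hypothetical host names for a cluster with three workers.
    hdfs_hosts = ["worker1", "worker2", "worker3"]
    spark_hosts = ["worker1", "worker2", "worker4"]
    missing_executor, missing_data = colocation_gaps(hdfs_hosts, spark_hosts)
    print("Data Node without Executor: {}".format(missing_executor))
    print("Executor without Data Node: {}".format(missing_data))
```

Two empty lists mean storage and computation sit on the same machines; anything else points at a host that will read its data over the network.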
Choice of public cloud:
- From storage to computation: Amazon S3 is a good place to keep data and load it into the Spark/EC2 cluster. Alternatively, the Spark cluster on EC2 can read an S3 bucket directly through a URI such as s3n://file (the speed is still acceptable). On DigitalOcean, I have to upload the data from my local machine to the cluster's HDFS.
- DevOps tools:
  - AWS: spark-ec2.py
    - With the default settings, running it gives you:
      - 2 HDFSs: one persistent and one ephemeral
      - Spark 1.3 or any earlier version
      - Spark's stand-alone cluster manager
    - A minimal cluster with 1 master and 3 slaves consists of 4 m1.xlarge EC2 instances
      - Pros: large memory, with 15 GB per node
      - Cons: no SSD; expensive ($0.35 × 4 = $1.40 per hour)
  - DigitalOcean: https://digitalocean.mesosphere.com/
    - With the default settings, running it gives you:
      - HDFS
      - no Spark
      - Mesos
      - OpenVPN
    - A minimal cluster with 1 master and 3 slaves consists of 4 droplets, each with 2 GB of memory and 2 CPUs
      - Pros: as low as $0.12 per hour; Mesos provides fine-grained control over the cluster (down to 0.1 CPU and 16 MB of memory); the VPN is a nice security measure
      - Cons: small memory (2 GB per droplet); Spark has to be installed manually
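Reading straight from S3, as mentioned under "From storage to computation", might look like the sketch below. It assumes an already running `SparkContext` (passed in as `sc`) and AWS credentials that are already configured for the s3n filesystem. The helper names, bucket, and key are hypothetical; `textFile` and `count` are the standard PySpark calls.

```python
def s3n_uri(bucket, key):
    """Build an s3n:// URI from a bucket name and an object key."""
    return "s3n://{}/{}".format(bucket, key.lstrip("/"))


def count_lines(sc, uri):
    """Count the lines of a text file through an existing SparkContext."""
    return sc.textFile(uri).count()


if __name__ == "__main__":
    # Hypothetical bucket and key, shown only to illustrate the URI shape.
    print(s3n_uri("my-bucket", "/logs/part-00000"))
```

Inside the `pyspark` shell, where `sc` is predefined, `count_lines(sc, s3n_uri("my-bucket", "logs/part-00000"))` would trigger the actual read.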
- Add Spark to the DigitalOcean cluster

The whole deployment onto DigitalOcean is then a single command:
# 10.1.2.3 is the internal IP address of the master
fab -H 10.1.2.3 deploy_spark
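If the deployment needs to be triggered from a script rather than typed by hand, the command can be wrapped as in this small sketch. The `deploy_spark` task name and the `fab -H` form come from the command above; the fabfile that defines the task is not shown here, and `fab_command` and `deploy` are hypothetical helper names.

```python
import subprocess


def fab_command(master_ip, task="deploy_spark"):
    """Build the argument list for the fab invocation shown above."""
    return ["fab", "-H", master_ip, task]


def deploy(master_ip, dry_run=True):
    """Return the command line as a string, or run it when dry_run is False."""
    cmd = fab_command(master_ip)
    if dry_run:
        return " ".join(cmd)
    # Requires Fabric to be installed and a fabfile defining the task.
    return subprocess.call(cmd)


if __name__ == "__main__":
    # 10.1.2.3 is the internal IP address of the master, as in the text.
    print(deploy("10.1.2.3"))
```

Passing the arguments as a list avoids shell quoting problems, and the dry run makes it easy to check the command before anything touches the cluster.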
Euro vs Dollar exchange rate: An historic event?
I recently read a Washington Post article about the euro versus the dollar, and I wanted to analyze the data myself to see whether the article was simply stating the facts, or “sensationalizing” things. The washingtonpost.com article started with the headline, “This is historic: The dollar will soon be worth […]
The post Euro vs Dollar exchange rate: An historic event? appeared first on The SAS Training Post.