Spark SQL on a JSON dataset and the Spark Web UI

To use Spark SQL, we first need to import SQLContext.


from pyspark.sql import SQLContext

Create the SQL context from the existing Spark context (sc). From here on we are working in the SQL domain.


sqlContext=SQLContext(sc)

Load a JSON file describing a bunch of people.
The people.json file looks like this:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}


users=sqlContext.jsonFile("people.json")

Register the dataset as a temporary table named users.


users.registerTempTable("users")

Select the name and age of the users who are over 21. Nothing runs yet because the query is evaluated lazily.

over21=sqlContext.sql("SELECT name, age FROM users WHERE age >21")

Collect the over21 rows, which triggers the query. It returns one person: Andy, age 30.
over21.collect()
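
To see what the query selects, the same filter can be sketched in plain Python without Spark. This is only an illustration of the query's logic, not how Spark SQL executes it, and the variable names are made up for the example. It assumes a missing age behaves like a SQL NULL, which is why Michael is dropped by the WHERE clause.

```python
import json

# The three records from people.json, one JSON object per line.
people_json = """{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}"""

# Parse each line into a dict, as a JSON-lines reader would.
users = [json.loads(line) for line in people_json.splitlines()]

# Equivalent of: SELECT name, age FROM users WHERE age > 21
# A missing age is treated like NULL and never satisfies the comparison.
over21 = [
    (u["name"], u["age"])
    for u in users
    if u.get("age") is not None and u["age"] > 21
]

print(over21)  # [('Andy', 30)]
```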

Spark Web UI

http://localhost:4040

[Screenshot: Apache Spark jobs list at localhost:4040]

The user interface is similar to the one for traditional MapReduce. The jobs list is shown here. By drilling in, you get more and more information; for example, you can see how long each node takes to execute.

Find URL links and link text in an HTML file using Python regular expressions

First, import the regular expression library.

import re


I stored an HTML file on my G: drive, so the file is opened with this command:


fp=open("G://pashabd.html")

The file contains this HTML:

<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body><ul>
<li><a href="http://pashabd.com/mapreduce-of-local-text-file-using-apache-spark-pyspark/">MapReduce of local text file using Apache Spark– pyspark</a></li>
<li><a href="http://pashabd.com/how-to-install-spark-in-windows-8/">How to install apache spark in Windows 8?</a></li>
<li><a href="http://pashabd.com/how-to-install-spark-in-ubuntu-14-04/">How to install spark in ubuntu 14.04?</a></li>

</ul>

</body>
</html>

The file pointer fp reads the file into the content variable:


content=fp.read()

The findall function is used to find each hyperlink and its link text:

match = re.findall(r'<a href="(.*?)".*>(.*)</a>', content)

match holds the results. The if statement checks that it is not empty, and the for loop prints each link with its title.

if match:
    for link, title in match:
        print("link %s -> %s" % (link, title))

The output will look like this:

[Screenshot: regular expression output for the HTML content in Python]
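
The greedy .* parts of the pattern work here because every link sits on its own line, but they could swallow too much if two links shared a line. A slightly stricter variant, which is my own assumption and not the pattern from the post, could look like the sketch below. It uses a shortened inline copy of the HTML instead of the file, and link_pattern and matches are names chosen only for this example.

```python
import re

# A shortened inline copy of the HTML list (instead of reading the file).
content = """<ul>
<li><a href="http://pashabd.com/how-to-install-spark-in-windows-8/">How to install apache spark in Windows 8?</a></li>
<li><a href="http://pashabd.com/how-to-install-spark-in-ubuntu-14-04/">How to install spark in ubuntu 14.04?</a></li>
</ul>"""

# [^>]* skips any other attributes inside the opening tag,
# and the non-greedy (.*?) stops at the first closing </a>.
link_pattern = re.compile(r'<a href="(.*?)"[^>]*>(.*?)</a>')

matches = link_pattern.findall(content)
for link, title in matches:
    print("link %s -> %s" % (link, title))
```

For anything beyond simple, regular markup like this list, an HTML parser is usually a safer choice than a regular expression.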

MapReduce of local text file using Apache Spark– pyspark

To run pyspark on an RDD built from a local text file, we first need to create a text file using gedit or any other editor.

My file contains this text:

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

I saved the file as textData.txt.

To create an RDD from the local text file, we need to write the command

textData=sc.textFile("textData.txt")

Here the Spark context (sc) loads the file as a Resilient Distributed Dataset (RDD). To view the content of the RDD, we need to write the command

for line in textData.collect():
...     print(line)
...

Be careful about indentation if you are new to Python. The output will look like this:

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

To lazily filter the lines that contain the word “orange”:

orangeLines=textData.filter(lambda line: "orange" in line)

To show the orange lines.

for line in orangeLines.collect():
...     print(line)
...

To convert all the letters in orangeLines to upper case:

>>> caps=orangeLines.map(lambda line: line.upper())
>>> for line in caps.collect():
...     print(line)
...
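
The lazy behaviour of filter and map can be imitated in plain Python with generator expressions. This is only a rough analogy under my own assumptions, not how Spark works internally: the generators stand in for the RDD transformations, and building a list stands in for collect(). The names orange_lines, caps and result are made up for the sketch.

```python
# The six lines of textData.txt, held in a list instead of an RDD.
lines = [
    "Hello, My name is Kamal.",
    "I live in Bangladesh.",
    "My language is Bangla.",
    "My favorite color is orange.",
    "I can ride bicyle.",
    "If I eat something, I would eat an orange.",
]

# Generator expressions are lazy: nothing is evaluated yet,
# similar to how filter() and map() on an RDD only record the steps.
orange_lines = (line for line in lines if "orange" in line)
caps = (line.upper() for line in orange_lines)

# Building a list forces the work, playing the role of collect().
result = list(caps)
for line in result:
    print(line)
```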

For a word count program, we first need to split each line into words. The flatMap transformation does this: it breaks every line into individual words and flattens the results into a single RDD of words.

>>> words=textData.flatMap(lambda line: line.split(" "))

Then map every single word to a (word, 1) pair and pass the pairs to the reduceByKey method. Calling the methods back to back with the period sign is called chaining. The map emits a one for every word, and x+y sums those ones to count how many times each word occurs.

>>> result=words.map(lambda x: (x,1)).reduceByKey(lambda x,y: x+y)

To show the output:
>>> for line in result.collect():
...     print(line)

The output will look like this:

[Screenshot: word count output of the words map]
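
The three steps of the word count (flatMap, map, reduceByKey) can be traced in plain Python on the same six lines. This is a minimal sketch of the logic only, with made-up variable names; it does not use Spark and says nothing about how Spark distributes the work.

```python
from collections import defaultdict

# The six lines of textData.txt.
lines = [
    "Hello, My name is Kamal.",
    "I live in Bangladesh.",
    "My language is Bangla.",
    "My favorite color is orange.",
    "I can ride bicyle.",
    "If I eat something, I would eat an orange.",
]

# flatMap step: split every line and flatten into one list of words.
words = [word for line in lines for word in line.split(" ")]

# map step: pair every word with a 1.
pairs = [(word, 1) for word in words]

# reduceByKey step: add up the ones for each distinct word (x + y).
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

for item in sorted(counts.items()):
    print(item)
```

Because the split is on single spaces only, punctuation stays attached to the words, so "orange." is counted with its trailing period.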

 

How to install Apache Spark in Windows 8?

First, you need to download the Spark library from the Apache Spark website. The website is


http://spark.apache.org/downloads.html

[Screenshot: Apache Spark download page for Windows]

After the download finishes, you will have the Spark archive file.

To unzip the file, you need 7-Zip. You can download it from


http://www.7-zip.org/download.html

With 7-Zip you can easily unzip the files. After unzipping, open the command prompt and go to the Spark folder like this:

[Screenshot: Apache Spark folder in the command prompt]

Then you need to write the command


bin\spark-shell

The output will look like this:

[Screenshot: Apache Spark logo]

A Scala-based prompt will come up. You can read README.md programmatically by writing the following command.

val textFile = sc.textFile("README.md")

To count the number of lines in the text file, you need to write the command

textFile.count()

It shows the number of lines, which is 95 here.

To exit the Scala shell, you need to type the command

exit()

You will then be back at the command prompt.

All free things for web

From this site: http://freebie.supply/all

FREE WEBSITE + LOGO + HOSTING + INVOICING

FREE BUSINESS / PROJECT NAME GENERATORS

WRITING / BLOGGING

  • Hemingway: Hemingway App makes your writing bold and clear.
  • Grammarly: Finds & corrects mistakes of your writing.
  • Medium: Everyone’s stories and ideas.
  • ZenPen: The minimal writing tool of web.
  • Liberio: Simple eBook creation and publishing right from Google Drive.
  • Editorial Calendar: See all your posts, drag & drop to manage your blog.
  • Story Wars: Writing stories together.
  • Headline Analyzer: Emotional marketing value headline analyzer.
  • WP Hide Post: Control the visibility of items on your blog.
  • Social Locker: Ask visitors “to pay” for your content with a tweet, etc.
  • Egg Timer: Set a time and bookmark it for repeated use.

FIND (TRENDING) CONTENT (IDEAS)

 

FREE SEO + WEBSITE ANALYZERS

FREE IMAGE OPTIMIZERS

FREE IMAGE EDITORS

  • Canva: Amazingly simple graphic design for bloggers.
  • Pixlr: Pixlr Editor is a robust browser photo editor.
  • Skitch: Get your point across with fewer words.
  • Easel.ly: Empowers anyone to create & share powerful visuals.
  • Social Image Resizer Tool: Create optimized images for social media.
  • Placeit: Free product mockups & templates.
  • Recite: Turn a quote into a visual masterpiece.
  • Meme Generator: The first online meme generator.

COLLECT & SEND EMAILS FOR FREE

FREE SOCIAL MEDIA + COMMUNITY MANAGEMENT + SURVEYS

A/B TESTS & GROWTH HACKING

  • Petit Hacks: Acquisition, retention, & revenue hacks used by companies.
  • Optimizely: One optimization platform for websites and mobile apps.
  • Hello Bar: Tool for A/B testing different CTAs & power words.
  • GrowthHackers: Unlocking growth. Together.

 

FREE DESIGN RESOURCES

COLOR PICKERS

INSPIRATION

  • MaterialUp: Daily material design inspiration.
  • FLTDSGN: Daily showcase of the best flat UI design websites and apps.
  • Site Inspire: Web design inspiration.
  • UI Cloud: The largest user interface design database in the world.
  • Moodboard: Build a beautiful moodboard and share the result.
  • Crayon: The most comprehensive collection of marketing designs.
  • Land-Book: Product landing pages gallery.
  • Ocean: A community of designers sharing feedback.
  • Dribbble: Show and tell for designers.
  • Behance: Showcase & discover creative work.
  • Pttrns: Mobile user interface patterns.
  • Flat UI Design: Useful board I discovered thanks to Erik.
  • Awwwards: The awards for design, creativity and innovation.
  • The Starter Kit: Curated resources for developers and designers.
  • One Page Love: Resource for one page website inspiration.
  • UI Parade: User interface design tools and design inspiration.
  • The Best Designs: The best of web design.
  • Agile Designers: Best resources for designers & developers.
  • Niice: A search engine with taste.

FREE STOCK PHOTOGRAPHY

FREE TYPOGRAPHY

FREE ICONS

FREE USEFUL STUFF

 

BACKGROUND SOUND TO FOCUS

  • Noisli: Background noise & color generator.
  • Noizio: Ambient sound equalizer for relax or productivity.
  • Defonic: Combine the sounds of the world into a melody.
  • Designers.mx: Curated playlists by designers, for designers.
  • Coffitivity: Stream the sounds of a coffee shop at work.

AVOID DISTRACTION

  • Self Control: Mac: free application to help you avoid distracting websites.
  • Cold Turkey: Windows: temporarily block yourself off of distracting websites.

ORGANIZE & COLLABORATE

  • Trello: Keeps track of everything.
  • Evernote: The workspace for your life’s work.
  • Dropbox: Free space up to 2GB.
  • Yanado: Tasks management inside Gmail.
  • Wetransfer: Free transfer up to 2GB.
  • Drp.io: Free, fast, private and easy image and file hosting.
  • Pocket: View later, put it in Pocket.
  • Mailtoself: An iOS extension to mail notes to yourself from any app.
  • List.ly: Discover and create great lists.
  • Markticle: Mark your reading progress in articles for later.

DIGITAL NOMADS & REMOTE WORKING

 

DISCOVER TOOLS & STARTUPS

BUILD TOGETHER

LEARN

NEWSLETTERS THAT DON’T SUCK

USEFUL
