How to install MPICH (MPI) and run a hello world program on a Linux cluster with Slurm scripts?

First, you need to download the MPICH source archive from the MPICH website. After copying the download link, we need to run the following command at the Linux command prompt:

wget http://www.mpich.org/static/downloads/3.3.2/mpich-3.3.2.tar.gz

To extract the archive, we need to use the tar command:

tar -xvf mpich-3.3.2.tar.gz
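
The extracted sources then have to be configured, built, and installed. A typical sequence is sketched below; the --prefix path is only an example and should match the install directory whose bin folder is added to PATH in the next step:

cd mpich-3.3.2
./configure --prefix=/home/username/mpi/mpich_install
make
make install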

Then we need to export the PATH environment variable so the shell can find the MPICH binaries, like the command below.


export PATH=$PATH:/home/username/mpi/mpich_install/bin

Running echo $PATH will then show the path values, like this:


/cm/shared/apps/mpich/ge/gcc/64/3.3/bin:/cm/shared/apps/slurm/18.08.9/sbin:/cm/shared/apps/slurm/18.08.9/bin:/cm/local/apps/gcc/8.2.0/bin:/usr/lib64/qt-3.3/bin:/share/apps/rc/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/ibutils/bin:/opt/dell/srvadmin/bin:/home/username/.local/bin:/home/username/bin:/home/username/mpi/mpich_install/bin

Now we need to write the hello world program.
The hello_world.c program is below:


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{

int my_rank; int size;

MPI_Init(&argc, &argv); /*START MPI */

/*DETERMINE RANK OF THIS PROCESSOR*/
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

/*DETERMINE TOTAL NUMBER OF PROCESSORS*/
MPI_Comm_size(MPI_COMM_WORLD, &size);

printf("Hello world! I'm rank (processor number) %d of %d processor \n", my_rank, size);

MPI_Finalize(); /* EXIT MPI */
return 0;
}

Next, we need to compile the program. The command is below:


mpicc hello_world.c -o hello_world.exe

The executable file will be hello_world.exe. We can run the MPI program with the command

mpirun ./hello_world.exe

The output will be
Hello world! I'm rank (processor number) 0 of 1 processor

If we run the command with several tasks (here -n 4), it will show output like this:

$ mpirun -n 4 ./hello_world.exe
Hello world! I'm rank (processor number) 3 of 4 processor
Hello world! I'm rank (processor number) 0 of 4 processor
Hello world! I'm rank (processor number) 2 of 4 processor
Hello world! I'm rank (processor number) 1 of 4 processor

Now we need to run the MPI program from a Slurm script. Using vim, we can create the batch script file in the folder:

vim submit_helloworld.sh

This opens the vim editor. To insert text, press i; after editing, press the Escape key (Esc) and type :wq to save and exit.
We can add the following text to the file:


#!/bin/bash -x

#SBATCH --job-name=hello_world
#SBATCH --output=out_hello_world.%j
#SBATCH --error=err_hello_world.%j

#SBATCH --partition=express
#SBATCH --nodes=1
#SBATCH --ntasks=4

module load mpich/ge/gcc/64/3.3

#mpiexec ./hello_world.exe
mpirun ./hello_world.exe


Here, the job name is hello_world, the output file is out_hello_world.%j, and the error file is err_hello_world.%j, where %j stands for the job id.
The number of tasks (ntasks) is 4, and the number of nodes is 1.
We also need to load the mpich module with the module load command.

With the cat command we can see the file contents:


cat submit_helloworld.sh

We can submit the script to the cluster with the command below, and it will show us the job id:


sbatch submit_helloworld.sh
Submitted batch job 6266850

Here, the job id is 6266850.

To see the job status, we can use the squeue command:


squeue --user=userid

After the job finishes, squeue will no longer show the task in the queue.

By running the ls command, we can see the out_hello_world.6266850 file in the directory.

With cat out_hello_world.6266850 we can see the output:


cat out_hello_world.6266850
Hello world! I'm rank (processor number) 0 of 4 processor
Hello world! I'm rank (processor number) 1 of 4 processor
Hello world! I'm rank (processor number) 2 of 4 processor
Hello world! I'm rank (processor number) 3 of 4 processor

How to search for MPI modules available to load on a Linux cluster?

To find a module to load on a Linux cluster, we need to type

module spider module_name_key

Here, module_name_key is a partial name that we can use for searching. The command lists the modules which are available for us to load.

For example, I wanted to load an OpenMPI module on the Linux cluster, so I needed to know the exact name of the MPI module on that cluster. I searched for it by

module spider openmpi

It showed me the OpenMPI modules which are available for loading:


-------------------------------------------------------------------------------------------------------------------------------------------------------------------
OpenMPI:
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description:
The Open MPI Project is an open source MPI-3 implementation.

Versions:
OpenMPI/1.6.4-GCC-4.7.2
OpenMPI/1.6.5-GCC-4.8.1
OpenMPI/1.6.5-GCC-4.8.2
OpenMPI/1.8.4-GCC-4.8.4
OpenMPI/1.8.8-GNU-4.9.3-2.25
OpenMPI/1.8.8-iccifort-2015.3.187-GNU-4.9.3-2.25
OpenMPI/1.10.2-GCC-4.9.3-2.25
OpenMPI/1.10.3-GCC-5.4.0-2.26
OpenMPI/1.10.4-intel-2016a
OpenMPI/1.10.7-GCC-6.1.0-2.27
OpenMPI/1.10.7-intel-2017a
OpenMPI/2.0.2-GCC-6.3.0-2.27
OpenMPI/2.0.2-gcccuda-2017.01
OpenMPI/2.1.1-GCC-6.4.0-2.28
OpenMPI/2.1.2-GCC-6.4.0-2.28
OpenMPI/2.1.2-gcccuda-2018a
OpenMPI/3.0.0-iccifort-2017.1.132-GCC-6.3.0-2.27
OpenMPI/3.1.1-GCC-7.3.0-2.30
OpenMPI/3.1.1-gcccuda-2018b
OpenMPI/3.1.2-gcccuda-2018b
OpenMPI/3.1.3-GCC-8.2.0-2.31.1
OpenMPI/4.0.1-GCC-8.3.0-2.32

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
For detailed information about a specific "OpenMPI" module (including how to load the modules) use the module's full name.


Suppose we want to load the module OpenMPI/1.6.4-GCC-4.7.2. We load it with this command:

module load OpenMPI/1.6.4-GCC-4.7.2

By typing the following command, I can check the list of modules loaded on the cluster.


module list

It will show the list of modules currently loaded for you on the cluster. For example:


$ module list

Currently Loaded Modules:
1) shared 2) rc-base 3) DefaultModules 4) gcc/8.2.0 5) slurm/18.08.9 6) mpich/ge/gcc/64/3.3

$ module load OpenMPI/1.6.4-GCC-4.7.2
$ module list

Currently Loaded Modules:
1) shared 3) DefaultModules 5) slurm/18.08.9 7) GCC/4.7.2 9) OpenMPI/1.6.4-GCC-4.7.2
2) rc-base 4) gcc/8.2.0 6) mpich/ge/gcc/64/3.3 8) hwloc/1.6.2-GCC-4.7.2

How to overcome the shell debugging message on a Linux cluster?

When I was working with bash scripts, Lmod showed me this type of message:


Shell debugging temporarily silenced: export LMOD_SH_DBG_ON=1 for this output (/usr/share/lmod/lmod/init/bash)
Shell debugging restarted

To overcome this message, we need to type

export LMOD_SH_DBG_ON=1

Finding out odd and even numbers in different ways

We can use several different ways to find out whether a number is odd or even.

Modulo 2 operation

It is easy to find out whether a number is even or odd with the modulus operator. n % 2 gives the remainder after dividing by 2. If the remainder is 0, the number is even; otherwise it is odd.

if(n%2)
printf("odd\n");
else
printf("even\n");

From the above code, we can easily understand that when n % 2 is 1 the number is odd, otherwise it is even. Note that we write (n%2) rather than (n%2==1), because the if condition only needs a true or false value: 0 means false and any non-zero value means true. For an odd number the remainder is 1, while for an even number it is 0, so the relational == operator is not needed and we can skip it.
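
For reference, here is a minimal complete program built around this check; the variable name n and the scanf call are only for illustration.

#include <stdio.h>

int main(void)
{
    int n;

    /* read one integer from standard input */
    if (scanf("%d", &n) != 1)
        return 1;

    /* n % 2 is non-zero (true) for odd numbers and 0 (false) for even numbers */
    if (n % 2)
        printf("odd\n");
    else
        printf("even\n");

    return 0;
}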

Bitwise and (&) operator

We can also check odd or even using the bitwise & operator. In the code below, n&1 returns 1 when n is odd and 0 otherwise. If the number is 7, its binary form is 111, and the bitwise AND with 1 returns 1.

  0111   <-- binary of 7
& 0001   <-- binary of 1
------
  0001   <-- result, which is 1 in decimal

If the number is 4, its binary form is 100, and the bitwise AND with 1 returns 0.

  0100   <-- binary of 4
& 0001   <-- binary of 1
------
  0000   <-- result, which is 0 in decimal

 

if(n&1)
printf("odd\n");
else
printf("even\n");

Without using modulo (%) operator

We can also find out odd or even without using the % (modulus) operator. Because integer division discards the remainder, n - (n/2)*2 gives 1 for an odd number and 0 for an even number.

For example, 7 - (7/2)*2 = 7 - 3*2 = 7 - 6 = 1, while 6 - (6/2)*2 = 6 - 3*2 = 6 - 6 = 0.

if(n-(n/2)*2)
printf("odd\n");
else
printf("even\n");

Shifting right by one bit divides the number by 2 (discarding the remainder), whereas shifting left by one bit multiplies it by 2. So the above code can be rewritten as below.

if(n-((n>>1)<<1))
printf("odd\n");
else
printf("even\n");

If you do not understand the shifting operation, the explanation is given in the later part.

Conditional Operator

By using the conditional (ternary) operator, we can replace the if-else statement and write less code. The conditional operator has three expressions.

The first expression is the condition, which gives us true or false; the second is the expression used if the condition is true; and the last is the expression used if the condition is false. A '?' sign is placed after the condition, and a ':' sign is used as the separator between the true and false expressions.

if(n%2==1)
printf("odd\n");
else
printf("even\n");

The above code can be rewritten by the conditional (ternary) operator.

char *output;
output=(n%2==0)?"even\n":"odd\n";
printf(output);

In the above code, output is a character pointer, which here points to a string (a collection of characters). It has to be declared at the beginning of the block, where the other variables are declared.

In the conditional operator, if the number modulo 2 is 0 then the number is even, otherwise it is odd.

We can rewrite the above code even more compactly, in the following way:

printf((n%2==0)?"even\n":"odd\n");

The printf function prints the chosen string (a collection of characters) directly, so we do not need the pointer output as an extra variable.

Right shift and left shift

We can also rewrite the modulo operation with the shift operators.

The modulo 2 check can be replaced by a right shift followed by a left shift. A right shift by one bit divides the number by 2, which means the last bit of the binary representation is dropped. For example, with n = 7:

n >> 1 means (7 >> 1)
= (111 >> 1)   <-- binary form
= (011)   <-- binary form
= 3   <-- decimal form

When we do a left shift, the number is multiplied by 2, which means an extra 0 is appended at the end (right side). For example, with n = 7, n << 1 means 7 << 1:

= (111 << 1)   <-- binary form
= (1110)   <-- binary form
= 14   <-- decimal form

If we apply both operations to a number, the result is not equal to the original number when the number is odd, but it is equal when the number is even.

x=n;
n=n>>1;
n=n<<1;
printf((n==x)?"even\n":"odd\n");

Suppose n = 7, so x = n = 7:

n = n >> 1   <-- now n is 3
n = n << 1   <-- now n is 6

Here n and x are not equal, so the number is odd.

Again, suppose n = 8, so x = n = 8:

n = n >> 1   <-- now n is 4
n = n << 1   <-- now n is 8

Here n and x are equal, so the number is even.

In the above code we use x as an extra variable. We can avoid the extra variable and write the check like this:

n==(n>>1)<<1

The above expression is true if the number is even and false otherwise, so we can put the expression directly inside the printf call.

printf((n==(n>>1)<<1)?"even\n":"odd\n");

 

Big number odd or even

For big numbers, we can handle the value as a string, declared for example as char n[10000]. We can get the position of the last digit from the length of the string minus 1, and the parity of the whole number is decided by that last digit alone.

l=strlen(n);
last_digit=(int) (n[l-1]-'0');

printf((last_digit==(last_digit>>1)<<1)?"even\n":"odd\n");
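
Putting it together, a minimal complete sketch of this approach could look like the code below; reading the number with scanf as a string of digits is just one possible way to get the input.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char n[10000];   /* the big number stored as a string of digits */
    int l, last_digit;

    /* read the number as a string of digits */
    if (scanf("%9999s", n) != 1)
        return 1;

    l = (int) strlen(n);
    last_digit = (int) (n[l - 1] - '0');

    /* the parity of the whole number is the parity of its last digit */
    printf((last_digit == (last_digit >> 1) << 1) ? "even\n" : "odd\n");

    return 0;
}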

Spark SQL on a JSON dataset in Spark, and the Spark Web UI

To create a sqlContext, we first need to import SQLContext:


from pyspark.sql import SQLContext

Create the SQL context; now we are in the SQL domain:


sqlContext=SQLContext(sc)

As input we use a JSON file describing a bunch of people.
The people.json file looks like this:

{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}


users=sqlContext.jsonFile("people.json")

Register the result as a temporary table named users:


users.registerTempTable("users")

Select the name and age of the users who are over 21. Nothing happens yet, because the query is evaluated lazily:

over21=sqlContext.sql("SELECT name, age FROM users WHERE age >21")

Collect the over21 data. It shows one person, Andy, who is 30 years old:
over21.collect()

Spark Web UI

http://localhost:4040

(Screenshot: Apache Spark localhost:4040 jobs list)

This is a user interface similar to the one for traditional MapReduce. The jobs are listed here. By drilling in, you will get more and more information; you can see how long each node executes.

Find URL links and content from an HTML file using Python regular expressions

First, you need to import the regular expression library:

import re


I stored an HTML file on my G drive, so the HTML file is loaded with the command:


fp=open("G://pashabd.html")

The HTML file contains this markup:

<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<title></title>
</head>
<body><ul>
<li><a href="http://pashabd.com/mapreduce-of-local-text-file-using-apache-spark-pyspark/">MapReduce of local text file using Apache Spark– pyspark</a></li>
<li><a href="http://pashabd.com/how-to-install-spark-in-windows-8/">How to install apache spark in Windows 8?</a></li>
<li><a href="http://pashabd.com/how-to-install-spark-in-ubuntu-14-04/">How to install spark in ubuntu 14.04?</a></li>

</ul>

</body>
</html>

The file object fp reads the file into the content variable:


content=fp.read()

To find the links and their text content, the findall function is used:

match = re.findall(r'<a href="(.*?)".*>(.*)</a>', content)

match holds the extracted contents. We check it with an if statement and print each link with its title using a for loop:

if match:
    for link, title in match:
        print "link %s -> %s" % (link, title)

The output will be like this.

(Screenshot: regular expression output for the HTML links in Python)

MapReduce of local text file using Apache Spark– pyspark

To create an RDD from a local text file in pyspark, we first need to create a text file using gedit or any other editor.

In my file I have inserted the text

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

I saved the file as textData.txt.

To create an RDD from the local text file, we need to write the command:

textData=sc.textFile("textData.txt")

Here the Spark context (sc) loads the file as a Resilient Distributed Dataset (RDD). To view the contents of the RDD, we need to write the command:

for line in textData.collect():
... print line
...

You need to be careful about indentation if you are new to Python. The output will be like this:

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

To lazily filter the lines that contain the word “orange”:

orangeLines=textData.filter(lambda line: "orange" in line)

To show the orange lines.

for line in orangeLines.collect():
... print line
...

To make all the letters in orangeLines capital

>>> caps=orangeLines.map(lambda line: line.upper())
>>> for line in caps.collect():
... print line
...

For the word count program, we first need to split each line into words. For that we use the flatMap transformation from lines to words, which breaks each line up into individual words:

>>> words=textData.flatMap(lambda line: line.split(" "))

Then we map every single word to a (word, 1) pair and pass the result to the reduceByKey method, calling one method after another; this chaining is done with the period sign. The map emits a 1 for every single word, and x+y sums up how many times each word occurs:

>>> result=words.map(lambda x: (x,1)).reduceByKey(lambda x,y: x+y)

To show output
>>> for line in result.collect():
… print line

The output will be like this:

(Screenshot: word count map output)

 

How to install apache spark in Windows 8?

First, you need to download the Spark package from the Apache Spark website:


http://spark.apache.org/downloads.html

(Screenshot: Apache Spark download page for Windows)

After the download, you will have the Spark archive file.

To unzip the file, you need to have the 7-Zip program. You can download it from


http://www.7-zip.org/download.html

With 7-Zip you can easily extract the files. After extracting, go to the command prompt and change into the Spark folder, like this:

(Screenshot: Spark folder in the Windows command prompt)

Then you need to write the command


bin\spark-shell

The output will look like this:

(Screenshot: Apache Spark logo shown by spark-shell)

A Scala-based prompt will come up. You can read README.md from this shell; you need to write the following command:

val textFile = sc.textFile("README.md")

To count the number of lines in the text file, you need to write the command:

textFile.count()

It will show the output which is 95.

To exit from the Scala shell, you need to type the command:

exit()

Then you will be back at the Windows command prompt.

How to install spark in ubuntu 14.04?

This post describes the step-by-step procedure of a Spark installation. To install Spark on an Ubuntu machine, you need to have Java installed on your computer. The following commands install Java on an Ubuntu machine:

$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer

To check that the Java installation was successful:

$ java -version

It shows the installed Java version:

java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) Server VM (build 24.80-b11, mixed mode)

To install Spark on your machine, you need to know which directory you are in on the command line. Write this command in the shell:

$ pwd

It will print the working directory in the shell.

Apache Spark will be downloaded into this directory. Then you need to use the wget command to download Spark:

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.6.0-bin-hadoop2.4.tgz

The Spark link used in the wget command is found at this URL:

https://spark.apache.org/downloads.html


(Screenshot: Apache Spark download website)

You need to select Pre-built for Hadoop 2.4 and later, choose Direct Download as the download type, then click the Download Spark link and copy the link address.

It takes some time to download the Spark archive; it is about 284.9 MB.

After the download, you can check the directory contents with the command

ls

You need to untar the file, so the command will look like this:

tar xvf spark-1.6.0-bin-hadoop2.4.tgz

Then change into the directory with the command

cd spark-1.6.0-bin-hadoop2.4/

Now you can run Spark. First we will try the Scala shell and later the pyspark mode.

To run in Scala mode, you need to run the command

bin/spark-shell

It will show the Spark command line like the picture below.

 

(Screenshot: Spark Scala shell)

You can read README.md from this shell; you need to write the following command.

 

val textFile = sc.textFile("README.md")

 

To count the number of lines in the text file, you need to write the command:

textFile.count()

It will show the output which is 95.

To exit from the Scala shell, you need to type the command:

exit()

Then you will be back in the shell.

Now we will try to run Spark with the pyspark library. To run pyspark, type the command

 

bin/pyspark

It will show the command prompt like the below picture

(Screenshot: Spark pyspark shell)

To create an RDD from a local text file in pyspark, we first need to create a text file using gedit or any other editor.

In my file I have inserted the text

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

I saved the file as textData.txt.

To create an RDD from the local text file, we need to write the command:

textData=sc.textFile("textData.txt")

Here the Spark context (sc) loads the file as a Resilient Distributed Dataset (RDD). To view the contents of the RDD, we need to write the command:

for line in textData.collect():
... print line
...

You need to be careful about indentation if you are new to Python. The output will be like this:

Hello, My name is Kamal.
I live in Bangladesh.
My language is Bangla.
My favorite color is orange.
I can ride bicyle.
If I eat something, I would eat an orange.

To lazily filter the lines that contain the word “orange”:

orangeLines=textData.filter(lambda line: "orange" in line)

To show the orange lines.

for line in orangeLines.collect():
... print line
...

To make all the letters in orangeLines capital

>>> caps=orangeLines.map(lambda line: line.upper())
>>> for line in caps.collect():
... print line
...

For the word count program, we first need to split each line into words. For that we use the flatMap transformation from lines to words, which breaks each line up into individual words:

>>> words=textData.flatMap(lambda line: line.split(" "))

Then we map every single word to a (word, 1) pair and pass the result to the reduceByKey method, calling one method after another; this chaining is done with the period sign. The map emits a 1 for every single word, and x+y sums up how many times each word occurs:

>>> result=words.map(lambda x: (x,1)).reduceByKey(lambda x,y: x+y)

To show output
>>> for line in result.collect():
… print line

The output will be like this:

(Screenshot: word count map output)

For the sqlContext, we again need to import SQLContext:

from pyspark.sql import SQLContext

sqlContext=SQLContext(sc)
users=sqlContext.jsonFile("people.json")
users.registerTempTable("users")
over21=sqlContext.sql("SELECT name, age FROM users WHERE age >21")
over21.collect()

Spark Web UI

http://localhost:4040

(Screenshot: Apache Spark localhost:4040 jobs list)

This is a user interface similar to the one for traditional MapReduce. The jobs are listed here. By drilling in, you will get more and more information; you can see how long each node executes.