How to set up a three-machine Riak cluster


This article refers to Riak 0.7.1

I had a set of three physical dedicated Ubuntu machines which I wanted to turn into a Riak cluster. They were all sitting idle, had remote (ssh) root access, and each had a fixed IP address on my local network.

First I wanted to find out the specification of each machine. The machines are on my local network with the IP addresses:

192.168.1.6
192.168.1.7
192.168.1.8

I am not going to use machine names for this article, as then we would have to worry about name resolution, which can have its own problems. So for each of the machines I do the following:

ssh 192.168.1.6
lshw
cat /etc/issue

: which returns a lot (several pages) of information about the machine. Once I had waded through these pages, the following specifications resulted:

192.168.1.6
    Acer Aspire L5100
    Ubuntu 8.04.3
    AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
    3GB RAM
    Corsair 64 GB SSD with EXT3
    Radeon 690 video card

192.168.1.7
    Acer Aspire L5100
    Ubuntu 8.04.3
    AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
    3GB RAM
    Corsair 64 GB SSD with EXT3
    Radeon RS690 video card

192.168.1.8
    Acer Aspire L5100
    Ubuntu 9.10
    AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
    3GB RAM
    Intel X25M 80 GB SSD with EXT4
    Radeon RS690 video card

: You may notice that I haven't written much about the processor speed. That is because I couldn't figure out exactly how fast these processors go based on the information in lshw. I also haven't described how to set up the machines so that you can ssh to them. I think it is enough to say that installing OpenSSH on each server should allow you to do this.
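On the processor speed: Linux also exposes per-core clock figures in /proc/cpuinfo, which is shorter to read than the lshw output. A minimal sketch, assuming an x86 machine where the field is labelled "cpu MHz" (the helper name cpu_mhz is made up for illustration, and the value is the current clock, which may be scaled down on an idle machine):

```shell
# Print the "cpu MHz" value for each core from cpuinfo-style text on stdin.
# Assumes lines of the form "cpu MHz<tabs>: 2600.000", as found on x86 Linux.
cpu_mhz() {
    awk -F': *' '/^cpu MHz/ { print $2 }'
}

# Usage against one of the servers (commented out so the sketch has no side effects):
# ssh root@192.168.1.6 cat /proc/cpuinfo | cpu_mhz
```

On a dual-core machine this prints two lines, one per core.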

The next thing is to check that the machines can talk to each other. This is very important, as there are many things that can prevent machines on the same network from reaching each other, and unless the machines can talk to each other there is no point in continuing. We will just try a basic test: connect to each machine and see whether it can reach the other machines by using ping:


users-Mac-Pro:~ root# ssh 192.168.1.6
root@192.168.1.6's password:
Last login: Mon Feb 15 11:19:30 2010 from users-mac-pro.local
Linux 192 2.6.24-24-generic #1 SMP Fri Sep 18 16:16:18 UTC 2009 x86_64


The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.


Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.


To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
root@192:~# ping 192.168.1.7
PING 192.168.1.7 (192.168.1.7) 56(84) bytes of data.
64 bytes from 192.168.1.7: icmp_seq=1 ttl=64 time=0.322 ms
64 bytes from 192.168.1.7: icmp_seq=2 ttl=64 time=0.288 ms

--- 192.168.1.7 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.288/0.305/0.322/0.017 ms
root@192:~# ping 192.168.1.8
PING 192.168.1.8 (192.168.1.8) 56(84) bytes of data.
64 bytes from 192.168.1.8: icmp_seq=1 ttl=64 time=0.336 ms
64 bytes from 192.168.1.8: icmp_seq=2 ttl=64 time=0.284 ms

--- 192.168.1.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.284/0.310/0.336/0.026 ms
root@192:~# exit
logout
Connection to 192.168.1.6 closed.
users-Mac-Pro:~ root#


: So in the above example you can see that I connected to the first machine and tested the connection to both of the other machines in the cluster. But why did I write out the whole damn command dialogue, you may ask? Because this is a step you must get right, and you must perform it on all three machines in the cluster; otherwise you can end up spending hours later on trying to figure out why things don't work, without any meaningful error messages. If the ping commands time out then I refer you to your network administrator, or serverfault.com, as it could be one of many things that needs fixing, such as firewalls, network segment issues, and blocked ports, amongst other things.
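The same check can be repeated from all three machines with a small loop instead of typing it out three times. This is only a sketch under the assumptions of this article (root ssh access and the three IP addresses above); the function names peers_of and check_all are made up for illustration:

```shell
# Hosts from the article; adjust for your own network.
HOSTS="192.168.1.6 192.168.1.7 192.168.1.8"

# Print every host in $HOSTS except the one given as $1.
peers_of() {
    out=""
    for h in $HOSTS; do
        [ "$h" = "$1" ] || out="$out $h"
    done
    echo $out   # unquoted on purpose: drops the leading space
}

# Log in to each host in turn and ping each of its peers twice.
check_all() {
    for src in $HOSTS; do
        for dst in $(peers_of "$src"); do
            if ssh "root@$src" "ping -c 2 $dst" >/dev/null 2>&1; then
                echo "ok:   $src -> $dst"
            else
                echo "FAIL: $src -> $dst"
            fi
        done
    done
}
```

Running check_all prints one line per direction (six in total for three machines); any FAIL line is the pair to investigate before going further.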

Ok, so now we are at the stage where we have a cluster of machines, we are able to access them, and they can all talk to each other.

So the next stage is to set up Riak on each machine. In the Riak For Ubuntu Installation Guide there is a script to do this. So do the following on each machine (assuming you have root access), which will install the correct version of Erlang and Riak:

ssh 192.168.1.6

apt-get clean 
apt-get remove erlang-base
apt-get remove erlang

apt-get install build-essential libncurses5-dev m4 
apt-get install openssl libssl-dev

cd /
wget http://erlang.org/download/otp_src_R13B03.tar.gz 
tar zxf otp_src_R13B03.tar.gz 
cd otp_src_R13B03/ 
./configure 
make
make install

cd /
wget http://hg.basho.com/riak/get/riak-0.7.1.tar.gz
tar xzf riak-0.7.1.tar.gz 
cd riak 
make all rel
export RIAK=`pwd`

cd rel/riak
bin/riak start

sleep 5
bin/riak-admin test 



The final output should be something like :

=INFO REPORT==== 10-Feb-2010::11:00:30 === Successfully completed 1 read/write cycle to 'riak@127.0.0.1'


: If the output is something different then refer to the Riak For Ubuntu Installation Guide. Once you have executed the above script on all three machines, Riak is installed on each of them. However, we do not have a cluster yet: we have three separate Riak databases running, one on each server. We should test this, so assuming that you have Riak installed on your local machine, connect to each instance and store something unique in it. This is how I do this from my Mac Pro (note that RIAK is an environment variable that defines where I have installed Riak):

cd $RIAK/rel/riak/
erts-5.7.4/bin/erl -name riaktest -setcookie riak

: This will put you into the Erlang shell :

{ok, C} = riak:client_connect('riak@192.168.1.6').

: but this results in :
=ERROR REPORT==== 15-Feb-2010::11:47:56 ===
Error in process <0.40.0> on node 'riaktest@users-Mac-Pro.local' with exit value: {badarg,[{erlang,list_to_existing_atom,["riak@127.0.0.1"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}

** exception error: no match of right hand side value {error,{could_not_reach_node,'riak@192.168.1.6'}}


: What happened? This error message is saying that Erlang couldn't reach the remote Erlang node / VM which is running Riak under the name we asked for. We used the correct cookie, riak, so what else could be wrong? Look at the error report: the remote node is announcing itself as riak@127.0.0.1, not riak@192.168.1.6. You may remember that in another article we talked about how the Riak server sets up Erlang VM parameters in a configuration file called vm.args, and that is where the node name comes from. So do the following:

ssh 192.168.1.6
cd /riak/rel/riak
bin/riak stop

: We just stopped the Riak node running on the remote server, because a change to a Riak configuration parameter only takes effect the next time that node is started. So then we do :

emacs etc/vm.args

: and change the following lines:

## Name of the riak node
-name riak@127.0.0.1

: to :

## Name of the riak node
-name riakserver@192.168.1.6

: and then change the IP address that the Riak web interface binds to:

emacs etc/app.config

: and change :

%% riak_web_ip is the IP address that Riak's HTTP interface will bind to.
%%  If this is undefined, the HTTP interface will not run.
{riak_web_ip, "127.0.0.1"},

: to :

%% riak_web_ip is the IP address that Riak's HTTP interface will bind to.
%%  If this is undefined, the HTTP interface will not run.
{riak_web_ip, "192.168.1.6"},
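The two edits above can also be scripted, which helps when repeating them on the other machines. A rough sketch, assuming the files look exactly like the excerpts shown (the function name configure_node is made up for illustration, and the node should be stopped first, as above):

```shell
# Rewrite the node name and HTTP bind address in a Riak release directory.
# $1 = release directory (e.g. /riak/rel/riak), $2 = this machine's IP address.
configure_node() {
    rel="$1"; ip="$2"

    # etc/vm.args: replace the whole "-name ..." line.
    sed "s/^-name .*/-name riakserver@$ip/" "$rel/etc/vm.args" \
        > "$rel/etc/vm.args.new" &&
        mv "$rel/etc/vm.args.new" "$rel/etc/vm.args"

    # etc/app.config: replace only the quoted address inside the tuple,
    # leaving the comment lines and the trailing comma untouched.
    sed "s/{riak_web_ip, *\"[^\"]*\"}/{riak_web_ip, \"$ip\"}/" "$rel/etc/app.config" \
        > "$rel/etc/app.config.new" &&
        mv "$rel/etc/app.config.new" "$rel/etc/app.config"
}

# Usage on the first server (commented out so the sketch has no side effects):
# configure_node /riak/rel/riak 192.168.1.6
```

It writes to a temporary file and renames it rather than using sed -i, since in-place editing behaves differently between GNU and BSD sed.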

: then clear the ring state:

rm data/ring/*

: And finally check that port 8098 is not in use, with the following on Ubuntu:

netstat -anp --tcp --udp | grep LISTEN

: which shows something like :


tcp   0  0 127.0.0.1:8098   0.0.0.0:*   LISTEN   10531/beam.smp 
tcp   0  0 0.0.0.0:8099     0.0.0.0:*   LISTEN   10531/beam.smp 
tcp   0  0 0.0.0.0:55756    0.0.0.0:*   LISTEN   10531/beam.smp 
tcp   0  0 0.0.0.0:80       0.0.0.0:*   LISTEN   5198/nginx     
tcp   0  0 0.0.0.0:4369     0.0.0.0:*   LISTEN   5894/epmd      
tcp   0  0 127.0.0.1:631    0.0.0.0:*   LISTEN   5182/cupsd     
tcp6  0  0 :::22            :::*        LISTEN   5114/sshd 

: and then kill the process that is holding port 8098 (beam.smp, PID 10531 in the output above):

kill 10531
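Picking the PID out of the netstat listing by eye is error-prone, so here is a small filter that does it. It is a sketch that assumes the column layout shown above (local address in column 4, state in column 6, PID/program in column 7); the function name pid_on_port is made up for illustration:

```shell
# Read `netstat -anp` style lines on stdin and print the PID of whatever
# is listening on TCP port $1 (prints nothing if the port is free).
pid_on_port() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, addr, ":")      # local address, e.g. 127.0.0.1:8098
            if (addr[n] == port) {
                split($7, owner, "/")     # PID/program, e.g. 10531/beam.smp
                print owner[1]
            }
        }'
}

# Usage on the server (commented out so the sketch has no side effects):
# netstat -anp --tcp | pid_on_port 8098
```

If it prints nothing, the port is free and there is nothing to kill.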

: If your servers happen to be on Mac OS X, which is not so common but does happen, then use :

lsof -i 4tcp

: and on OS X you will see rather less friendly output, something like:

Skype     2866 root   48u  IPv4 0x2dc5af28      0t0  TCP 192.168.1.4:60805->163-211.static.quiettouch.com:12350 (ESTABLISHED)
beam.smp  5335 root   16u  IPv4 0x35ee4680      0t0  TCP *:51247 (LISTEN)
beam.smp  5335 root   17u  IPv4 0x2eaf4304      0t0  TCP localhost:51248->localhost:epmd (ESTABLISHED)
beam.smp  5335 root   40u  IPv4 0x2dc5a304      0t0  TCP *:8099 (LISTEN)
beam.smp  5335 root   41u  IPv4 0x34a26ef8      0t0  TCP 192.168.1.4:8098 (LISTEN)
beam.smp  5335 root  106u  IPv4 0x2e00ee98      0t0  TCP 192.168.1.4:51247->192.168.1.4:51261 (ESTABLISHED)
beam.smp  5389 root    9u  IPv4 0x35ed62d4      0t0  TCP *:51257 (LISTEN)


: and kill the offending processes using kill.

Then restart the Riak node :

bin/riak start
bin/riak-admin test

: and you should get :

=INFO REPORT==== 15-Feb-2010::12:21:44 ===
Successfully completed 1 read/write cycle to 'riakserver@192.168.1.6'

: and then exit the ssh session with:

exit

: and start Erlang again on your local machine with:

erts-5.7.4/bin/erl -name riaktest -setcookie riak

: Then, try to connect again to the remote Erlang node. Remember that we changed the node name from riak to riakserver in the etc/vm.args file earlier:

{ok, C6} = riak:client_connect('riakserver@192.168.1.6').

: This should result in something like:

{ok,{riak_client,'riakserver@192.168.1.6',<<4,22,140,118>>}}

: Then try this from the Erlang command line :

C6:list_buckets().

: This should show the buckets, or sections available in the Riak database at 192.168.1.6:

{ok,[<<"__riak_client_test__">>]}

: Then populate the database like this:


Data6 = riak_object:new(<<"Bucket_on_server_192.168.1.6">>, <<"Key">>, ["Value"]).


: to which you will get a really ugly response from the command line like:


{r_object,<<"Bucket_on_server_192.168.1.6">>,<<"Key">>,
          [{r_content,{dict,0,16,16,8,80,48,
                            {[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
                            {{[],[],[],[],[],[],[],[],[],[],[],[],...}}},
                      ["Value"]}],
          [],
          {dict,1,16,16,8,80,48,
                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
                {{[],[],[],[],[],[],[],[],[],[],[],[],[],...}}},
          undefined}


: and then:


C6:put(Data6,1).

: which should result in :

ok

: then see which buckets are available:

C6:list_buckets().

:and you should see:

{ok,[<<"Bucket_on_server_192.168.1.6">>,
<<"__riak_client_test__">>]}

: You may notice that in addition to the bucket that we created there is also a bucket called <<"__riak_client_test__">>. This bucket was created when we ran bin/riak-admin test after installing Riak on this node.

So now we have one machine in the Riak cluster set up, we must set up the second and third servers found at 192.168.1.7 and 192.168.1.8 in the same way, using the variables C7, C8, Data7, and Data8 for the Erlang connections to the other Riak machines.

From your local Erlang client you should now execute:

nodes().

and you should see:

['riakserver@192.168.1.6','riakserver@192.168.1.7',
 'riakserver@192.168.1.8']

: which means that the local Erlang node can see all of the Riak nodes in the cluster. But for Riak to cluster the nodes, an additional command must be sent. First, from your Erlang client, execute this:

C6:list_buckets().

: which should result in the buckets for 192.168.1.6:

{ok,[<<"Bucket_on_server_192.168.1.6">>,
<<"__riak_client_test__">>]}

: then ssh to the first server:

ssh 192.168.1.6
cd /riak/rel/riak
bin/riak-admin join riakserver@192.168.1.7

:and you should get something back like:

Sent join request to riakserver@192.168.1.7

: then from your Erlang command line again type:

C6:list_buckets().

: and you will see:

{ok,[<<"Bucket_on_server_192.168.1.6">>,
     <<"Bucket_on_server_192.168.1.7">>,
     <<"__riak_client_test__">>,<<"groceries">>]}

: Great, so now your first two machines are clustered as 192.168.1.6 can see 192.168.1.7's buckets! Just to check do this:

C7:list_buckets().

:and you should see:

{ok,[<<"Bucket_on_server_192.168.1.6">>,
     <<"Bucket_on_server_192.168.1.7">>,
     <<"__riak_client_test__">>,<<"groceries">>]}

Yep, and 192.168.1.7 can see 192.168.1.6's buckets too!
What about the final server? From the Erlang command line type:

C8:list_buckets().

: to see which buckets the third Riak server (192.168.1.8) has access to:

{ok,[<<"Bucket_on_server_192.168.1.8">>,
     <<"__riak_client_test__">>,<<"groceries">>]}

So the third Riak server 192.168.1.8 still can't see the other two Riak nodes. So do :

ssh 192.168.1.8
cd /riak/rel/riak
bin/riak-admin join riakserver@192.168.1.6

and you should see:

Sent join request to riakserver@192.168.1.6

and then in the Erlang command line once again type:

C8:list_buckets().

: and you should get :

{ok,[<<"Bucket_on_server_192.168.1.6">>,
<<"Bucket_on_server_192.168.1.7">>,
<<"Bucket_on_server_192.168.1.8">>,
<<"__riak_client_test__">>]}

: So now all your Riak nodes can see all the other Riak nodes. You now have a working Riak cluster!
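For reference, the join steps can be written down as a short plan. The sketch below only prints the commands rather than running them, and it assumes the install path /riak/rel/riak and the node name riakserver used in this article. It points every later node at the first one, which is a slight variation on the order used above (where 192.168.1.6 joined 192.168.1.7); whether the direction matters is an assumption I have not verified. The function name join_plan is made up for illustration:

```shell
# Print (do not run) one join command for every node after the first,
# each pointing at the first node in the argument list.
join_plan() {
    first="$1"; shift
    for ip in "$@"; do
        echo "ssh root@$ip 'cd /riak/rel/riak && bin/riak-admin join riakserver@$first'"
    done
}

join_plan 192.168.1.6 192.168.1.7 192.168.1.8
```

Review the printed lines, then paste them into a terminal one at a time, checking list_buckets() from the Erlang shell after each join as above.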
