Vijay Bellur

Life, Technology and a little more…

Upgrading to GlusterFS 3.4.0

Now that GlusterFS 3.4.0 is out, here are some mechanisms to upgrade from earlier installed versions of GlusterFS.

Upgrade from GlusterFS 3.3.x:

GlusterFS 3.4.0 is compatible with 3.3.x (yes, you read it right!). You can upgrade your deployment by following one of the two procedures below.

a) Scheduling a downtime (Recommended)

For this approach, schedule a downtime and prevent all your clients from accessing the servers.

  • Stop all glusterd, glusterfsd and glusterfs processes on your server.
  • Install GlusterFS 3.4.0.
  • Start glusterd.
  • Ensure that all started volumes have processes online in “gluster volume status”.

You will need to repeat these four steps on every server in your trusted storage pool.
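
In command form, one pass over a server might look roughly like this (a sketch only; the package and service names are assumptions for an RPM-based install and will differ on other distributions):

  # stop all gluster processes on this server
  killall glusterd glusterfsd glusterfs

  # install the 3.4.0 packages from a repository that carries them
  yum update glusterfs glusterfs-server glusterfs-fuse

  # bring the management daemon back up and check the volumes
  service glusterd start
  gluster volume status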

After upgrading the servers, it is recommended to upgrade all client installations to 3.4.0.

b) Rolling upgrades with no downtime

If you have replicated or distributed replicated volumes with bricks placed such that replicas of the same data reside on different servers, have no data pending self-heal, and feel adventurous, you can perform a rolling upgrade through the following procedure:

  • Stop all glusterd, glusterfs and glusterfsd processes on your server.
  • Install GlusterFS 3.4.0.
  • Start glusterd.
  • Run “gluster volume heal <volname> info” on all volumes and ensure that there is nothing left to be self-healed on every volume. If you have pending data for self-heal, run “gluster volume heal <volname>” and wait for self-heal to complete.
  • Ensure that all started volumes have processes online in “gluster volume status”.

Repeat the above steps on all servers that are part of your trusted storage pool.
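
As a sketch, each server pass of the rolling upgrade might look like this (same package-name assumptions as the sketch above; replace <volname> with each of your volumes):

  killall glusterd glusterfsd glusterfs
  yum update glusterfs glusterfs-server glusterfs-fuse
  service glusterd start

  # nothing should be listed as pending self-heal on any volume;
  # if entries appear, trigger a heal and re-check before moving on
  gluster volume heal <volname> info
  gluster volume heal <volname>

  gluster volume status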

Again, after upgrading the servers, it is recommended to upgrade all client installations to 3.4.0.

Upgrade from GlusterFS 3.1.x and 3.2.x:

You can follow the procedure listed here, replacing 3.3.0 with 3.4.0 throughout.

Do report your findings on 3.4.0 on the gluster-users mailing list, in #gluster on Freenode, or on Bugzilla.

Please note that this may not work for all installations and upgrades. If you notice anything amiss and would like to see it covered here, please leave a comment and I will try to incorporate your findings.

July 15, 2013 | Uncategorized

15 Comments

  1. Excellent howto. Congratulations.

    Comment by Liyuan | July 19, 2013 | Reply

  2. Hello, I performed the rolling upgrade option on a 3-brick replica, and after each node was upgraded the self-heal daemon stopped running:
    Status: self-heal-daemon is not running on 68c14….

    The last few lines of the glustershd.log are not that helpful, but indicate that it couldn’t connect to 127.0.0.1:24007.

    There’s no firewall, no selinux, and no DNS names here… Any thoughts?

    By the way, the Gluster volume is fully operational except for this issue.

    Comment by Jesse | July 23, 2013 | Reply

    • Thanks for the feedback. Have you noticed anything in the glusterd log to see why the connection from glustershd failed?

      Comment by Vijay Bellur | July 23, 2013 | Reply

      • Hello again. The short answer is that no, there was nothing illuminating in the logs. I worked with several people on #gluster for a couple of days trying to figure this out, but with no success. I’m not sure if this is related to the fact that this was initially a 3.2 volume that I upgraded to 3.3 and then to 3.4.

        The good news is that I have it fixed now, by blowing out all the old configuration and recreating the volume. Now everything looks as I would expect on the volume.

        Comment by Jesse | August 9, 2013

  3. The official docs lack a guide like this for 3.4 – it would be easier to find if you linked here from there.

    Comment by Marcus Bointon | July 23, 2013 | Reply

    • Thanks. Have added a link from gluster.org.

      Comment by Vijay Bellur | July 23, 2013 | Reply

  4. I tried a rolling upgrade from 3.3.2qa1 to 3.4.0, but the client mounts do not pick up the new brick port numbers after the upgrade (they change from 24009+ to 49152+). Therefore, as bricks are upgraded to 3.4.0, the clients no longer talk to them. I’m assuming that remounting the client would address this, but then it’s not really a rolling update, is it? It was especially complicated for me since I have a 2-node replica where both nodes are the actual clients as well. I couldn’t upgrade the servers and the clients independently. I had to just take a downtime and upgrade both nodes at the same time. Given the behavior of the clients not finding the new brick port numbers, I don’t see how a rolling upgrade is even possible.
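
    One quick way to confirm which ports the upgraded bricks are actually listening on (volume name hypothetical):

    gluster volume status myvol
    # or, directly on a server:
    netstat -ltnp | grep gluster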

    Comment by Todd Stansell | July 23, 2013 | Reply

  5. My setup is the same as Todd’s.

    After upgrading from 3.3.0 (Ubuntu packaged version) to 3.4 (source), I ran into this problem: http://lists.gnu.org/archive/html/gluster-devel/2013-01/msg00011.html
    There were old libs in /usr/lib, new ones in /usr/local/lib. Despite gluster being configured, built and run from /usr/local, the init script pointing there, and ldconfig including /usr/local/lib too, it was still looking in /usr for libs. I fixed it thus:
    rm /usr/lib/libglusterfs.so.0*
    ln -s /usr/local/lib/libglusterfs.* /usr/lib/
    rm /usr/lib/libgfrpc.so.0*
    ln -s /usr/local/lib/libgfrpc.so.0* /usr/lib/
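
    To confirm which copies of the libraries a gluster binary is actually loading (the path assumes a default source install under /usr/local):

    ldd /usr/local/sbin/glusterfsd | grep -E 'libglusterfs|libgfrpc'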

    I ran into several other things which broke, fortunately most were known problems and had solutions in search results.

    Another important thing – NFS clients now connect to port 49152, which is a change from 3.3.

    Overall I found it less difficult to copy everything off my old volume, uninstall and delete every last trace of gluster, reinstall from scratch, and copy everything back on.

    Todd – how do you mount your volumes on boot? I can’t make it work from localhost.

    Comment by Marcus Bointon | July 25, 2013 | Reply

    • We couldn’t get the init stuff to work reliably for us the way we wanted (processes wouldn’t get shut down properly on reboot, mounts wouldn’t mount at the right time in the boot sequence, etc), so we created our own init scripts to do what we needed. We have a custom init script that mounts all glusterfs filesystems via fuse in the fstab at S21, right after glusterd starts at S20. This is because most of our services on these boxes depend on the gluster mount itself, so it has to start very early. Our fstab entries have noauto so that none of the system pieces will try mounting it. In our fstab entry, we reference the local name defined in gluster, rather than localhost (e.g. admin01 mounts via admin01:/, admin02 mounts via admin02:/).
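
      For illustration, a noauto fstab entry along those lines might look like this (hostname, volume name and mount point are all hypothetical):

      admin01:/gv0  /mnt/gluster  glusterfs  defaults,noauto,_netdev  0 0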

      Comment by Todd Stansell | July 25, 2013 | Reply

  6. I ran into a weird issue when performing a rolling upgrade on 3 servers in a replica 3 volume.

    I am running ubuntu from the ppa.

    On each server, one at a time, I ran `service glusterfs-server stop` and installed 3.4 from apt, which then started the service again.

    Now my clients that were not already connected could not connect anymore! They were able to connect, download the volfile and determine how to access the other servers, but attempts to connect to those individual servers always returned connection refused.

    From the logs on the server

    [2013-07-29 18:00:01.849880] I [socket.c:2236:socket_event_handler] 0-transport: disconnecting now
    [2013-07-29 18:00:01.851216] I [glusterd-utils.c:954:glusterd_volume_brickinfo_get] 0-management: Found brick
    [2013-07-29 18:00:04.848353] E [socket.c:2788:socket_connect] 0-management: connection attempt failed (Connection refused)
    [2013-07-29 18:00:04.850322] I [glusterd-utils.c:954:glusterd_volume_brickinfo_get] 0-management: Found brick
    [2013-07-29 18:00:04.850352] I [socket.c:2236:socket_event_handler] 0-transport: disconnecting now

    Turns out, “service glusterfs-server stop” does not actually stop or kill any of the glusterfsd or glusterfs processes. I tried to kill -HUP them, which did not work, then tried kill -15, which also did not work, and finally did a kill -9, followed by service glusterfs-server start.

    Once I did that, the clients were able to connect, and everything is working.

    Needless to say, the logging is terrible, and it took a couple of hours with a guy on IRC to even guess at what to try. It would be nice if gluster volume info or peer status had reported a problem; they both thought the world was great.

    I am thinking this would have worked as advertised if the service script actually killed processes that are not responding, similar to how the Apache service manager works.
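
    A rough sketch of the kind of check that would have caught this, run after the stop and before starting 3.4 (use with care, since it targets every gluster process on the box):

    pgrep -lf gluster                  # anything still running after the stop?
    pkill glusterfsd; pkill glusterfs  # ask brick and client processes to exit
    sleep 5
    pkill -9 -f gluster                # last resort, equivalent to the kill -9 above
    service glusterfs-server start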

    Comment by thadeusb | July 30, 2013 | Reply

  7. […] on my storage / compute cluster at ILRI, Kenya. I referenced Vijay Bellur’s blog post about upgrading to 3.4, then added my own bits using Ansible for my infrastructure (I gave an overview of my Ansible setup […]

    Pingback by Mjanja Tech | Update GlusterFS 3.3.1 -> 3.4.0 on CentOS 6.4 cluster | September 16, 2013 | Reply

  8. Hello,

    When you upgrade from 3.3 to 3.4, do you automatically take advantage of the new features, or do you need to change/enable any options on the volumes?

    Thanks!

    Comment by Fedes | November 8, 2013 | Reply

  9. Hi! Thanks for your blog, I have one question: with a striped-distributed-replicated volume, can we upgrade the clients first and the servers afterwards? Is it 100% compatible?

    I have done some tests and ran into “get stripped size” problems from a client on 3.4.1 against servers on 3.3.0… 😦

    Thanks a lot!

    Comment by Emilio | December 10, 2013 | Reply

  10. After I upgraded all replication nodes from 3.3 to 3.4, it worked very well. But when I try to add a new 3.4 node, it always fails:

    peer status says “rejected”

    glustershd.log

    [2014-03-20 22:47:34.105886] I [client-handshake.c:1614:select_server_supported_programs] 0-puppet-bucket-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
    [2014-03-20 22:47:34.106238] I [client-handshake.c:1411:client_setvolume_cbk] 0-puppet-bucket-client-1: Connected to 10.136.200.16:49154, attached to remote volume ‘/opt/gluster-data/puppet/bucket’.
    [2014-03-20 22:47:34.106249] I [client-handshake.c:1423:client_setvolume_cbk] 0-puppet-bucket-client-1: Server and Client lk-version numbers are not same, reopening the fds

    So I have to install 3.3, add it to the pool, and then upgrade. It is really painful.

    Comment by Yang | March 21, 2014 | Reply

    • On upgrade from 3.3 to 3.4 you will need to perform a dummy operation on all the volumes to make them update the version strings in the volume files, for example “gluster volume set volumename brick-log-level INFO”. See http://review.gluster.org/#/c/7729/
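
      To apply that dummy operation to every volume in one pass, a small loop like this should work (a sketch; it assumes the “gluster volume list” subcommand, which is available in 3.4):

      for vol in $(gluster volume list); do
          gluster volume set "$vol" brick-log-level INFO
      done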

      Comment by partner | December 30, 2014 | Reply

