DevOps tooling

HA Rancher server with HAProxy

Docker Compose plus components to set up an HA Rancher server with a MySQL backend (the external Galera cluster is not part of the provisioning), with SSL termination, failover to a different active Rancher server, and SSL encryption for the MySQL connection.

Demo platform for testing Rancher with Cattle on OpenStack

  • OSS based stack (deployed via Terraform)


  • 2x Docker host for Cattle
    • demo-rancher-minimal-server-se401-[oss_login]
    • demo-rancher-minimal-host-se40x-[oss_login]


  • OpenStack login/pass and key_pair
  • Latest stable Docker
  • The Dockerfile contains all needed tooling (terraform, ansible, rancher-compose, …)

Start instances for Rancher in OpenStack

  • If you are on Windows, some steps need to be done in a different way (commands, paths, …) to run the Docker container
## Clone repository:
[root@workstation ]$ git clone

# Run Docker
[root@workstation ]$ docker build -t rancher-minimal .
[root@workstation ]$ cp ~/.ssh/<your_oss_ssh_key> terraform.pem
[root@workstation ]$ docker run -v `pwd`:/code -ti rancher-minimal /bin/bash

## Rename sample variables and add your credentials
[root@container /]$ cd /code
[root@container code]$ mv variables.sample variables.tf  # setup your credentials to OpenStack (target file name may differ in your repo)
[root@container code]$ terraform plan
[root@container code]$ terraform apply
[root@container code]$ terraform output  # to see IP addresses of instances
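If you want to feed one of those addresses into a follow-up script, the `terraform output` text can be parsed with standard tools. A minimal sketch; the output name `rancher_server_ip` and the address are made up here, check the repo's real output names:

```shell
# Parse one "name = value" line as printed by `terraform output`.
# "rancher_server_ip" is an assumed output name, not from the repo.
sample='rancher_server_ip = 10.0.0.5'
ip=$(echo "$sample" | awk -F' = ' '{print $2}')
echo "$ip"
```

The same pattern works for any output listed by `terraform output`.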

Setup rancher and hosts via Ansible

[root@container code]$ ansible-playbook -i ./ rancher-provision.yml -b

Setup rancher manually

Login to UI

  • get the IPADDRESS of demo-rancher-minimal-server-se401-[oss_login] and open http://IPADDRESS in your browser
    • you should be able to see the UI and, after a few seconds, the registered host(s)


# You need the Environment API key and URL exported; create the keys at http://IPADDRESS/env/1a5/api/keys

export RANCHER_URL=http://<server_ip>/v2-beta/projects/1a5
export RANCHER_ACCESS_KEY=xxxxxxx
export RANCHER_SECRET_KEY=xxxxxxxxxxxxxxxxx

cd wordpress
rancher-compose up
#... some changes ...
rancher-compose up --force-upgrade
rancher-compose down
rancher-compose rm
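rancher-compose reads the three RANCHER_* variables from the environment, so a small guard before running it helps catch a missing export. This is a sketch, not part of the repo:

```shell
# Sketch: fail fast when a rancher-compose credential is missing.
check_rancher_env() {
  local v
  for v in RANCHER_URL RANCHER_ACCESS_KEY RANCHER_SECRET_KEY; do
    [ -n "${!v}" ] || { echo "missing $v" >&2; return 1; }
  done
  echo "rancher-compose environment OK"
}

# Example with placeholder values (use your real exported keys):
RANCHER_URL=http://10.0.0.5/v2-beta/projects/1a5 \
RANCHER_ACCESS_KEY=xxxxxxx RANCHER_SECRET_KEY=xxxxxxxxxxxxxxxxx \
check_rancher_env
```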


Graphite and Grafana in Docker
 docker-compose up
 docker-compose down

Graphite + Carbon-cache

  • 8888 the graphite web interface admin/admin
  • 2003 the carbon-cache line receiver (the standard graphite protocol)
  • 2004 the carbon-cache pickle receiver
  • 7002 the carbon-cache query port (used by the web interface)


  • 3000 Grafana admin/admin
  • add new data sources

Send some fake stats to Graphite

while true; do
  echo "local.random.diceroll $(((RANDOM % 10) + 1)) `date +%s`" | nc -c localhost 2003
  sleep 1
done
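Each line pushed to port 2003 follows Graphite's plaintext protocol: `metric.path value timestamp`. The formatting can be checked without a running carbon-cache:

```shell
# Build one plaintext-protocol sample line (no network involved)
line="local.random.diceroll $(((RANDOM % 10) + 1)) $(date +%s)"

# Split the three fields with parameter expansion
metric=${line%% *}   # metric path
rest=${line#* }
value=${rest%% *}    # dice roll, 1..10
ts=${rest#* }        # unix timestamp
echo "metric=$metric value=$value ts=$ts"
```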

Create new dashboard with graph from new datasource

  • local.random.diceroll

ElasticSearch v5 + Kibana


Repository structure overview

  • set of README files:
    • README – this document
    • README_api_usage – basic handling of ELK via curl
    • README_mapping – ELK mapping example
    • README_query_cheatsheet – examples of query language
    • README_watcher – how to setup simple alerting
    • README_workshop – labs + basic terms
    • README_geoip – info about GeoIP coordinates handling
    • README_s3 – experimental backup to S3 bucket simulated via Riak CS
  • Terraform – just for the workshop; spins up several servers with preinstalled Docker (see the README in that directory)

ELK Stack components


  • 1x Fluentd – NEEDS TO BE RUN FIRST (see how to run this stack)
    • containers output logger (back to ELK)
    • index: platform*


  • 3x server (data/client/master role) – you can start just one server (elasticsearchdataone) if you don't have the HW resources, or limit resources via Docker CPU/Mem quotas; see the comments in common-services.yml
  • x-pack installed
  • exposed ports:
    • 920[1-3] / 930[1-3]



  • 1x Logstash – used for easy sample data upload
    • exposed ports:
    • 5000 – json filter
    • 5001 – raw, no filters
    • you can use .raw field for not_analyzed data
    • index: logstash*
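Once the stack is up, both listeners can be fed with netcat. The payload below is only an illustration; the field names are not prescribed by the stack:

```shell
# Build a sample JSON event for the port-5000 (json filter) listener
payload=$(printf '{"message":"%s","level":"%s"}' "deploy finished" "info")
echo "$payload"

# Uncomment when the stack is running:
# echo "$payload" | nc -w 1 localhost 5000          # json filter
# echo "plain text line" | nc -w 1 localhost 5001   # raw, no filters
```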

Riak CS

  • used for AWS S3 simulation
    • exposed ports:
    • 8080 – API
  • not logged via Fluentd to ELK (API key created during start)


  • Mattermost server – running outside demo stack as a simple container with open access and webhook created
  • IP of server is setup in .env

Stack handling

Start stack

  • Start the stack (do not use docker-compose up directly, as there are some prerequisites to starting the stack)
  • This short script prepares temporary data volumes for the ES servers and starts the fluentd container first
  • Download git repo:
$ git clone 
$ cd elk-stack-v5-xpack
$ ./_start
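A rough sketch of what a `_start` script like this typically does; the directory and service names below are assumptions, not taken from the repo:

```shell
# Prepare throw-away data volumes for the ES servers (paths assumed)
for i in 1 2 3; do
  mkdir -p /tmp/es-data-$i
done
ls -d /tmp/es-data-*

# Then start fluentd first, and only afterwards the rest of the stack:
# docker-compose up -d fluentd
# sleep 5
# docker-compose up -d
```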

Stop stack

  • just stops the containers; to remove networks/artifacts use the docker-compose command
$ ./_stop

Used tools

You can use the logstash or kibana containers (everything is mounted as /code/) or install the tools on your system (macOS/Linux).

  • netcat – for feeding logstash
  • jq – for pretty outputs
  • curl – for shell work with Elastic
  • curator

Not covered


Traffic loss simulation

Example behaviour of a typical 3G network:

300 Kbit/s bandwidth, 100–200 ms delay, 2% average packet loss

Setup script –

#!/bin/bash -x
# Usage: ./ \""{{ DELAY }}";"{{ SPREAD }}"\" \""{{ SPEED }}";"{{ CEIL }}";"{{ BURST }}"\" "{{ LOSS }}"


# Parse positional arguments
_DELAY=$1
_SPEED=$2
loss=$3

delay=$(echo ${_DELAY} | cut -f 1 -d ";")
spread=$(echo ${_DELAY} | cut -f 2 -d ";")
speed=$(echo ${_SPEED} | cut -f 1 -d ";")
ceil=$(echo ${_SPEED} | cut -f 2 -d ";")
burst=$(echo ${_SPEED} | cut -f 3 -d ";")

# Reset settings on eth0
tc qdisc del dev eth0 root

# Basic setup
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit burst 30k

# Use parameters for TC filter setup
tc class add dev eth0 parent 1:1 classid 1:10 htb rate ${speed}kbit ceil ${ceil}kbit burst ${burst}k
tc qdisc add dev eth0 parent 1:10 handle 10: netem delay ${delay}ms ${spread}ms loss ${loss}% distribution normal

# Enable filter
tc filter add dev eth0 protocol ip parent 1:10 prio 3 u32 match ip dst flowid 1:1
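The packed `"delay;spread" "speed;ceil;burst" loss` arguments can be sanity-checked in isolation, without touching `tc`:

```shell
# Simulate the script's three arguments and the cut-based unpacking
_DELAY="200;100"      # delay;spread  (ms)
_SPEED="300;300;30"   # speed;ceil;burst
loss="2"

delay=$(echo ${_DELAY} | cut -f 1 -d ";")
spread=$(echo ${_DELAY} | cut -f 2 -d ";")
speed=$(echo ${_SPEED} | cut -f 1 -d ";")
ceil=$(echo ${_SPEED} | cut -f 2 -d ";")
burst=$(echo ${_SPEED} | cut -f 3 -d ";")

# Print what would be handed to tc
echo "netem: delay ${delay}ms ${spread}ms loss ${loss}%"
echo "htb:   rate ${speed}kbit ceil ${ceil}kbit burst ${burst}k"
```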

Simple Ansible playbook

- hosts: all
  gather_facts: no
  sudo: yes
  vars_files:
    - vars.yml

  tasks:
    - name: "Reset TC settings, safe to ignore errors if there is no previous setup"
      shell: /sbin/tc qdisc del dev eth0 root
      register: resettc_result
      tags: reset
      ignore_errors: yes

    - name: Copy shaping script to node and run
      script: './ \""{{ DELAY }}";"{{ SPREAD }}"\" \""{{ SPEED }}";"{{ CEIL }}";"{{ BURST }}"\" "{{ LOSS }}"'
      register: shaper_result
      tags: setup

    - name: Show TC settings
      shell: /sbin/tc -s qdisc ls dev eth0
      register: tcresult
      ignore_errors: yes
      tags:
        - always

    - name: Debug shaper output
      debug: msg="{{ shaper_result.stdout }}"
      tags: setup

    - name: Debug TC shaper setup
      debug: msg="{{ tcresult.stdout }}"
      tags:
        - always


GitLab HA solution (slow but reliable)


Continuous delivery? Jenkins/cron-driven tasks, Ansible playbooks pulled from Git repos? So what happens when GitLab goes down?

Example setup

We deploy and install three servers for all the important GitLab components. For simplicity, start with clean CentOS 6 64-bit machines, root access and an SSH key from your workstation, and Galera Cluster Control.

  • 3x balanced application servers with nginx + ssl
  • 3x database backend (Galera with S9s Cluster Control)
  • 3x redis HA with Sentinel
  • GlusterFS for repositories, keys and satellites




The recommended way is to use the SeveralNines solution for a full MySQL master-master cluster deployment; they provide an easy web UI for creating the installation pack:


Use Redis version > 2.6


/etc/redis.conf

bind 192.168.X.11
slaveof <redis01_ip> 6379  # redis02/redis03 only

/etc/redis-sentinel.conf (redis01/redis02/redis03)

sentinel monitor redis01 6379 2
sentinel down-after-milliseconds redis01 60000
sentinel failover-timeout redis01 180000
sentinel parallel-syncs redis01 1

Restart the redis and redis-sentinel processes.

GitLab APP



global
group haproxy
maxconn 10000
stats socket /var/lib/haproxy/stats mode 660 level admin
user haproxy

defaults
log global
maxconn 8000
option redispatch
retries 3
stats enable
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout check 10s

userlist STATSUSERS
group admin users admin
user admin insecure-password XXXXXX
user stats insecure-password MONITOR

listen admin_page
mode http
acl AuthOkay_ReadOnly http_auth(STATSUSERS)
acl AuthOkay_Admin http_auth_group(STATSUSERS) admin
stats enable
stats refresh 60s
stats uri /
stats http-request auth realm admin_page unless AuthOkay_ReadOnly
stats admin if AuthOkay_Admin

# Specifies TCP timeouts on connect for use by the frontend ft_redis
# Set the max time to wait for a connection attempt to a server to succeed
# The server and client sides are expected to acknowledge or send data.
defaults REDIS
mode tcp
timeout connect 3s
timeout server 6s
timeout client 6s

# Specifies listening socket for accepting client connections using the default
# REDIS TCP timeout and backend bk_redis TCP health check.
frontend ft_redis
bind *:6378 name redis
default_backend bk_redis

# Specifies the backend Redis proxy server TCP health settings
# Ensure it only forwards incoming connections to a reachable master.
backend bk_redis
option tcp-check
tcp-check connect
tcp-check send PING\r\n
tcp-check expect string +PONG
tcp-check send info\ replication\r\n
tcp-check expect string role:master
tcp-check send QUIT\r\n
tcp-check expect string +OK
server redis01 check inter 1s
server redis02 check inter 1s
server redis03 check inter 1s

listen mysql
mode tcp
balance leastconn
default-server port 9200 inter 15s downinter 10s rise 3 fall 3 slowstart 5s maxconn 4000 maxqueue 512 weight 100
option redispatch
option httpchk
timeout client 120000ms
timeout server 120000ms
server database01 check weight 100
server database02 check weight 100
server database03 check weight 100

Check via IP:9600 whether the database and the Redis master node are online.


We set up three replicated disks for repositories, the auth file (SSH keys) and uploads. Install GlusterFS on the GitLab app servers and then:

# On all gitlabX servers
mkfs.xfs -i size=512 /dev/sdb1
mkfs.xfs -i size=512 /dev/sdb2
mkfs.xfs -i size=512 /dev/sdb3
mkdir -p /data/brick1 /data/brick2 /data/brick3
echo '/dev/sdb1 /data/brick1 xfs defaults 1 2' >> /etc/fstab
echo '/dev/sdb2 /data/brick2 xfs defaults 1 2' >> /etc/fstab
echo '/dev/sdb3 /data/brick3 xfs defaults 1 2' >> /etc/fstab

mount -a && mount
# On one of gitappX server:
gluster peer probe gitapp01
gluster peer probe gitapp02
gluster peer probe gitapp03

# On all gitlabX servers
mkdir /data/brick1/gv0
mkdir /data/brick2/gv1
mkdir /data/brick3/gv2

# On one of gitappX server:
gluster volume create gv0 replica 3 gitapp01:/data/brick1/gv0 gitapp02:/data/brick1/gv0 gitapp03:/data/brick1/gv0
gluster volume start gv0
gluster volume create gv1 replica 3 gitapp01:/data/brick2/gv1 gitapp02:/data/brick2/gv1 gitapp03:/data/brick2/gv1
gluster volume start gv1
gluster volume create gv2 replica 3 gitapp01:/data/brick3/gv2 gitapp02:/data/brick3/gv2 gitapp03:/data/brick3/gv2
gluster volume start gv2
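The three volume-create commands follow a strict brickN/gvN pairing; generating them in a loop makes that pairing explicit (a convenience sketch, hostnames as above):

```shell
# Generate the gluster volume-create commands: gv0->brick1, gv1->brick2, gv2->brick3
cmds=$(for i in 0 1 2; do
  b=$((i + 1))
  echo "gluster volume create gv$i replica 3 gitapp01:/data/brick$b/gv$i gitapp02:/data/brick$b/gv$i gitapp03:/data/brick$b/gv$i"
done)
echo "$cmds"
```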

Check the status via “gluster volume info”; if everything is OK, then create the git user and the mount points (on all gitlab0X servers):

useradd -m git -d /opt/git
mkdir -p /opt/git/{uploads,repositories,.ssh}
mount -t glusterfs gitapp01:/gv0 /opt/git/uploads
mount -t glusterfs gitapp01:/gv1 /opt/git/repositories
mount -t glusterfs gitapp01:/gv2 /opt/git/.ssh

GitLab installation and setup

Follow this perfect step-by-step manual

Performance testing

Without GlusterFS (local disks) vs. GFS: a clone (repository with 10K files and 20 branches) and a push (one 10 KB text file). GFS is approx. 6x slower (all servers running on VMware, in three locations with ~10 ms network latency).

