Difference between revisions of "Management"

From CSLabsWiki
(added some more stuff about the progress of the software and hardware solutions)

Revision as of 03:45, 8 February 2016

IP Address(es):
Contact Person: Jared Dunbar
Last Update: January 2016
Services: server status indicator

COSI Management
Hostname: management
Operating system: Debian 8.2
LDAP Support: none
Development Status: ready for deployment! (still in beta server side, but client side is mostly done)
Status: running

Management is a VM on Felix that will be used for monitoring the status of VMs on other machines and the status of the hardware in the server room, i.e. checking CPU, RAM, and hard drive stats.

Each computer will run a startup service made up of Bash scripts and a compiled C++ executable. These periodically send data to Management, which shows it on an uptime web page that makes system uptime and service uptime easy to check.

Additional planned features are:

  • database system to store the data collected
  • graph display of events?
  • select server subnet(s)
  • add specific server IPs not in the subnets
  • move to some hardware - to be discussed at a forum near you
  • manage battery backups and tell servers when exactly to power down in the event of an outage
  • add some more master key configurations for fallback mechanisms
  • add server specific key functions and configurations (such as owner information, contact details, and others)

Currently installed on the VM are the following:

htop openssh-client vim libmysql-java openjdk-7-jdk p7zip-full g++ sudo

The client side of the management software requires:

g++ top awk tail bash sed

The scripts rely on bash and the executable needs to be compiled for the architecture that it is run on.

The source code for the client executable is available online at <https://github.com/jrddunbr/management-client>

Bash scripts are written wherever they are needed. The system is expandable: each data parameter is stored as a key, and each server can have as many keys as it wants (up to the limit of the RAM available to the JVM). Here are some functional examples:

CPU: this works

DATA=$(top -bn1 | grep "Cpu(s)" | \
           sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \
           awk '{print 100 - $1}')
echo $DATA
/manage/management-client 80 cpu $DATA

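Used-Ram: not yet confirmed to be right (it reports the "used" column of free, which on some versions of free also counts buffers and cache)

# Take the third field ("used") of the Mem: line from free, in MB.
DATA=$(free -m | awk '/^Mem:/ {print $3}')MB
echo "$DATA"
/manage/management-client 80 used-ram "$DATA"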
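
If the "used" figure from free turns out to include buffers and cache, one possible alternative is to compute it from /proc/meminfo. The following is only a sketch under that assumption: used_mb_from_meminfo is a hypothetical helper name, and the client call simply mirrors the form used by the other scripts on this page.

```shell
#!/bin/bash
# Sketch: used RAM in MB, excluding buffers and page cache.
# used_mb_from_meminfo is a hypothetical helper; it reads text in
# /proc/meminfo format (values in kB) on standard input.
used_mb_from_meminfo() {
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { memfree = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END { printf "%d\n", (total - memfree - buffers - cached) / 1024 }
    '
}

# Only report when /proc/meminfo and the compiled client are both present.
if [ -r /proc/meminfo ] && [ -x /manage/management-client ]; then
    DATA="$(used_mb_from_meminfo < /proc/meminfo)MB"
    echo "$DATA"
    /manage/management-client 80 used-ram "$DATA"
fi
```

Keeping the parsing in a function that reads standard input makes it easy to check against a saved copy of /proc/meminfo before deploying it.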

Total-Ram: this works

# Take the second field ("total") of the Mem: line from free, in MB.
DATA=$(free -m | awk '/^Mem:/ {print $2}')MB
echo "$DATA"
/manage/management-client 80 total-ram "$DATA"

These scripts expect management-client.cpp to be compiled as management-client and placed in the /manage folder (for simplicity, I tend to put the scripts in the same folder).

I also have one script that runs all of the other scripts. It is started by a systemd unit file located at /etc/systemd/system/manage.service:
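
The runner script further down this page calls /manage/uptime.sh, which is not listed here. The sketch below is a guess at what such a script could look like, not the real one: format_uptime is a hypothetical helper, the key name "uptime" is assumed, and the value is kept free of spaces so it travels as a single argument.

```shell
#!/bin/bash
# Sketch of a possible uptime.sh.
# format_uptime is a hypothetical helper: whole seconds -> "XdYYhZZm".
format_uptime() {
    local secs=$1
    printf '%dd%02dh%02dm\n' \
        $((secs / 86400)) $((secs % 86400 / 3600)) $((secs % 3600 / 60))
}

# Only report when /proc/uptime and the compiled client are both present.
if [ -r /proc/uptime ] && [ -x /manage/management-client ]; then
    # /proc/uptime holds "<seconds up> <seconds idle>"; drop the fraction.
    read -r up _ < /proc/uptime
    DATA=$(format_uptime "${up%.*}")
    echo "$DATA"
    # "uptime" is an assumed key name.
    /manage/management-client 80 uptime "$DATA"
fi
```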

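[Unit]
Description=manage stuff

[Service]
ExecStart=/bin/bash /manage/run.sh

[Install]
# Assumed target; an [Install] section is needed for "systemctl enable".
WantedBy=multi-user.target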

The Bash script that runs all the other bash scripts looks a lot like this:

cd /manage
# Launch every reporting script in the background, then wait 20 seconds.
while true
do
    /manage/cpu.sh &
    /manage/free-ram.sh &
    /manage/total-ram.sh &
    /manage/uptime.sh &
    sleep 20
done

It is easy to write more customized Bash scripts that report other data. The compiled client expects its arguments as ./management-client (IP) (PORT) (KEY) (VALUE), which sends the key and its value to the server. The server receives this as a REST call, accepts it because the sender is in the 145 subnet, and stores the key and value in the program's data structures.

Unfortunately, for the time being the 145 subnet is hard-coded. In future releases, as I find more time to finish this, it will become more functional and gain more features.
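
As an illustration of a customized script, here is a minimal sketch that reports how full the root filesystem is. It is not part of the existing client: disk_used_percent is a hypothetical helper, "disk-root" is an assumed key name, and the client call copies the argument form used by the example scripts above, so adjust it if your build expects the IP first.

```shell
#!/bin/bash
# Sketch: report the root filesystem's usage percentage.
# disk_used_percent is a hypothetical helper; it reads `df -P` output on
# standard input and prints the Capacity column of the first data row.
disk_used_percent() {
    awk 'NR == 2 { print $5 }'
}

# Only report when the compiled client is present.
if [ -x /manage/management-client ]; then
    DATA=$(df -P / | disk_used_percent)
    echo "$DATA"
    # "disk-root" is an assumed key name.
    /manage/management-client 80 disk-root "$DATA"
fi
```

The -P flag asks df for the POSIX output format, which keeps each filesystem on one line so the column position stays predictable.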

The server side of the software is available at <https://github.com/jrddunbr/management-server> and is still a work in progress.

It requires the following to be installed:

openjdk-7-jdk wget

Place the compiled .jar file somewhere convenient, along with a few files (most can be found in the GitHub repo):

index.html # a template HTML file that is used to list all of the servers, uptimes, and other data.
server.html # a template HTML file that is used to list one server and all of the associated key and value pairs that it has.
templates.yml # a template YAML file that is used to create all of the server-specific YAML files. Once these are made, they appear in the servers folder, which is created in the directory the jar is run from.
master.yml # a file that defines master keys: server-side keys that define server characteristics locally. They are used to enable servers, to specify whether they are urgent to server uptime, and in the future to record the maintainers and, if it's a VM, the VM host operator.

One downside of the whole system is that it depends on TALOS's HTTPS server being up when it starts, because it fetches the domain files from there. It can use a fallback mechanism in which it copies the file to the hard drive as a backup, and you could technically put the file there yourself for it to read. However, a new configuration key needs to be added to the master list before this will work. Coming soon!

Inside the servers folder, there are configurable per-server configs.

Make sure that your YAML files parse properly, or I guarantee that the Java code will crash. There are a few good online checkers out there.

I made the startup unit for the management server much the same as the client one; in fact, I only changed the path so it points to an executable SH file and changed the description slightly.
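
One common cause of YAML parse failures is tab indentation, which YAML does not allow. The sketch below is only a quick local sanity check for that single problem, not a replacement for a real YAML checker; has_tab_indent is a hypothetical helper, and the file locations assume the script is run from the directory that holds the jar.

```shell
#!/bin/bash
# has_tab_indent is a hypothetical helper: it succeeds (exit status 0) if
# the given file has a line whose leading whitespace contains a tab.
has_tab_indent() {
    grep -q "$(printf '^ *\t')" "$1"
}

# Check the config files named on this page, skipping any that are absent.
for f in servers/*.yml master.yml templates.yml; do
    [ -f "$f" ] || continue
    if has_tab_indent "$f"; then
        echo "tab indentation found in $f"
    fi
done
```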

The edited SH file that starts it is as follows:

cd /manage
date >> runtime.log
java -jar management-server.jar >> runtime.txt

As a helpful tip, here is how to manage systemd units:

systemctl enable <name> for enabling units

systemctl disable <name> for disabling

systemctl start <name> for starting

systemctl stop <name> for stopping

systemctl status <name> for checking the unit's status.