Management

From CSLabsWiki

{{services
| screenshot = [[File:Cosi-management.png|200px]]
| caption =
| ip_addr = 128.153.145.62
| contact_person = [[User:Jared|Jared Dunbar]]
| last_update = ''July 2017''
| services = server status collection server
| development_status = 30%
| category = Machines
| handoff = no
}}
   
'''Management''' (stat3) is a VM used for monitoring the status of hosts in the server room, i.e. checking CPU, RAM, and hard drive stats, among other configurable things.
   
Each configured computer in the server room sends data periodically, which is shown on an uptime webpage that can easily be used to determine system and service uptime, among other things.
   
Also, you can view COSI network stats <del>at http://management.cosi.clarkson.edu/cacti with the csguest user (and default password)</del> as raw data at http://stat.cosi.clarkson.edu/data, since Cacti broke (again).
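An earlier revision of this page showed how the reporting works: each metric is a small script that computes a value and hands it to a compiled helper, invoked as ./management-client (IP) (PORT) (KEY_NAME) (VALUE). A minimal sketch along those lines follows; the old scripts used free(1), while this variant reads /proc/meminfo directly so it runs anywhere, and the helper path and server address are placeholders copied from the old examples:

```shell
#!/bin/bash
# used-ram.sh - report used RAM in MB to the collection server.
# Reads /proc/meminfo (values are in kB); the management-client call
# is commented out so the sketch runs standalone.

DATA=$(awk '/MemTotal/{t=$2} /MemAvailable/{a=$2} END{printf "%dMB", (t-a)/1024}' /proc/meminfo)
echo "$DATA"
# /manage/management-client 128.153.145.62 80 used-ram "$DATA"
```

Any other metric works the same way: compute a value, pick a key name, and hand both to the helper.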
=Installing Management Clients=
   
Required Software: Git, g++, make
On Debian:
<pre style="background-color:#ffcccc">
apt update && apt install make g++ git
</pre>
   
==Clone with Git==
First, set Git to accept all certificates, then fetch the files using Git.
<pre style="background-color:#ffcccc">
git config --global http.sslVerify false
git clone https://gitlab.cosi.clarkson.edu/jared/manage2client.git
</pre>
   
Re-secure the system by only accepting repos with certificates.
<pre style="background-color:#ffcccc">
git config --global http.sslVerify true
</pre>
   
==Prepare files==
Move the folder to the root.
<pre style="background-color:#ffcccc">
mv manage2client /manage
</pre>
 
 
Move the systemd service file to the systemd services folder:

<pre style="background-color:#ffcccc">
sudo mv /manage/manage.service /etc/systemd/system/manage.service
</pre>
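The manage.service file itself, as shown in an earlier revision of this page, is a short unit that simply runs the client loop:

```ini
[Unit]
Description=manage stuff

[Service]
ExecStart=/bin/bash /manage/run.sh

[Install]
WantedBy=multi-user.target
```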
   
==Configure system==
If the hard drive you want to track is not /dev/sda1, select a different mount point to track in totaldisk.sh and useddisk.sh.
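totaldisk.sh and useddisk.sh are not reproduced here, but a mount-point-based probe would look something like the sketch below (the script shape and df column choices are my assumption, not the repo's exact code):

```shell
#!/bin/bash
# Report disk usage for a mount point instead of hard-coding /dev/sda1.
MOUNT="${1:-/}"                          # mount point to track, default /
TOTAL=$(df -m "$MOUNT" | awk 'NR==2{print $2}')   # size in MB
USED=$(df -m "$MOUNT" | awk 'NR==2{print $3}')    # used in MB
echo "total-disk=${TOTAL}MB used-disk=${USED}MB"
```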
If you want virsh stats, edit run.sh and un-comment the line with virsh.sh.
If you want to poll faster, change sleep from 30 to 5. Any faster, and the Linux scheduler will fall behind on busy boxes.
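An earlier revision of this page showed the run.sh loop behind that sleep value; it fires each metric script in the background and then naps, roughly:

```shell
#!/bin/bash
# run.sh - launch each metric script in the background, then sleep.
# The exact script list varies per host; this mirrors the old example.
cd /manage
while true
do
    /manage/cpu.sh &
    /manage/free-ram.sh &
    /manage/total-ram.sh &
    /manage/uptime.sh &
    sleep 30    # the polling interval discussed above
done
```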
==Compile Management for your platform==
<pre style="background-color:#ffcccc">
make
</pre>
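If make is not an option, an earlier revision of this page compiled the client by hand; the Makefile presumably does the equivalent of:

```shell
# Manual compile of the client helper (from an earlier page revision).
g++ management-client.cpp -o management-client --std=c++11
```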
   
==Enable Systemd Services==
<pre style="background-color:#ffcccc">
sudo systemctl enable manage
sudo systemctl start manage
</pre>
   
==Whitelist==
Email dunbarj@clarkson.edu to get the server added to the whitelist.
=Installing Management Server=
Start with an Arch Linux VM.
==Set Hostname==
Edit
<pre style="background-color:#ccccff">
/etc/hostname
</pre>
Clear the contents, enter this on the first line, and save:
<pre style="background-color:#ccffcc">
management
</pre>
==Set Network==
Copy the example ethernet-static profile to the netctl folder:
<pre style="background-color:#ffcccc">
cp /etc/netctl/examples/ethernet-static /etc/netctl/ethernet
</pre>
Edit
<pre style="background-color:#ccccff">
/etc/netctl/ethernet
</pre>
Clear the contents and set it to this:
<pre style="background-color:#ccffcc">
Description='A basic static ethernet connection'
Interface=ens3 # Make sure this is the interface or you won't have a network
Connection=ethernet
IP=static
Address=('128.153.145.62/24')
Gateway='128.153.145.1'
DNS=('128.153.145.3')
</pre>
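Writing the profile does not bring the interface up by itself; standard netctl usage on Arch is to start the profile and enable it at boot (the profile name matches the file created above):

```shell
# Bring the static profile up now and on every boot.
netctl start ethernet
netctl enable ethernet   # generates a systemd unit for the profile
```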
=Objectives=
*Create a monitoring system that can monitor all of the servers, battery backups, network, and temperature sensors placed at strategic locations throughout the server room
*Notify computers when to power down during a power outage
*Create APIs that can be used to interface with the management platform
==Plans==
* Update ALL instances of Management to stat3 when the client is completed (and deprecate the old versions - we still have versions of Management 1.0 and manage2client out there)
* Create new server with authentication (both real encryption and perhaps OpenComputers compatible for Minecraft servers :P)
* Configurable low power options
* Email Notifications
* Shutdowns
* Sensors Interface - configurability is a must
* Better web interface? Cookies? Logins? LDAP? PAM?
* Ability for custom messages, custom dashboards?
[[Category:Web Service]]
 

Latest revision as of 12:50, 29 July 2017
