SPID Monitoring

This manual describes the SPID environment, its startup, shutdown, monitoring process and other resources. The SPID environment is composed of a set of server-side services: SPID Server and Control Panel, GBDS, SQL Server and Ambari Services.

System Monitoring

Griaule recommends using monitoring tools such as Zabbix or Cacti to automate the tracking of system resources and performance.

SPID Status

One way to monitor SPID is through the API, available at the URL http://<hostname>:8082/gbs-spid-server/service/cluster/ping

In the default configuration, SPID listens on port 8082.

This test can be performed from a browser.

If SPID is working, the following message will be displayed:

Pong!
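For automated monitoring, the same request can be made from a script instead of a browser. The following is a minimal sketch, assuming curl is installed on the monitoring host. The function names is_spid_pong and check_spid and the 5-second timeout are illustrative choices, not part of SPID.

```shell
# Return success if the response body contains the expected "Pong!" message.
is_spid_pong() {
    case "$1" in
        *"Pong!"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Query the ping endpoint on the given host (default port 8082) and validate the reply.
# Usage: check_spid <hostname>
check_spid() {
    body=$(curl --silent --fail --max-time 5 \
        "http://$1:8082/gbs-spid-server/service/cluster/ping") || return 1
    is_spid_pong "$body"
}
```

A monitoring tool could call check_spid with the SPID hostname and raise an alert on a non-zero exit code.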

Server-side

There are two ways to check status via the terminal.

service spid status

ps aux | grep spid-server | grep -v grep

The response to these commands should show the process running.

SPID Control Panel Status

The SPID Control Panel is a web service and is available at the URL http://<hostname>:58086/gbs-spid-controlpanel. In the default configuration, the control panel runs on port 58086.
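Since the Control Panel is served over HTTP, its availability can also be probed from a script. The sketch below assumes curl is installed; the function names and the rule that any 2xx or 3xx status code counts as reachable are assumptions made for illustration, not documented Control Panel behavior.

```shell
# Print the HTTP status code returned by the Control Panel URL on the given host.
# curl prints 000 when no connection could be made.
# Usage: controlpanel_status_code <hostname>
controlpanel_status_code() {
    curl --silent --output /dev/null --max-time 5 --write-out '%{http_code}' \
        "http://$1:58086/gbs-spid-controlpanel"
}

# Treat any 2xx or 3xx code as "reachable" (assumed threshold).
is_reachable_code() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_reachable_code "$(controlpanel_status_code <hostname>)" succeeds when the panel answers with a success or redirect status.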

Server-side

There are two ways to check status via the terminal.

service spid-cp status

ps aux | grep spid-controlpanel | grep -v grep

The response to these commands should show the process running.

Idnservice Server-side

Griaule's IDN Service is an optional service, and when used, can be checked with the following commands:

service idnservice status

ps aux | grep spid-idnservice | grep -v grep

The response to these commands should show the process running.

GBDS API Status

One way to monitor the GBDS API is via the URL http://<hostname>:8085/gbds/v2/operations/ping.

Always point the URL to a GBDS node that hosts the API. In the default configuration, the API runs on port 8085. The API should return the following message:

{
	"data": "pong!"
}
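The ping response can be validated from a script as well. This is a minimal sketch, assuming curl is installed. The function names are illustrative, and the check is a plain text match on the body shown above rather than a full JSON parse.

```shell
# Return success if the JSON body has a "data" field equal to "pong!".
# All whitespace is removed first, so formatting differences are tolerated.
is_gbds_pong() {
    printf '%s' "$1" | tr -d '[:space:]' | grep -q '"data":"pong!"'
}

# Query the ping operation on a node that hosts the API (default port 8085).
# Usage: check_gbds_api <hostname>
check_gbds_api() {
    body=$(curl --silent --fail --max-time 5 \
        "http://$1:8085/gbds/v2/operations/ping") || return 1
    is_gbds_pong "$body"
}
```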

An extra check, which also tests database access, is available at the following address:

http://<hostname>:8085/gbds/v2/exceptions/EndDate=1400000000000

This request fetches exceptions recorded up to May 13, 2014 (the EndDate value is an epoch timestamp in milliseconds), so the API should return no exceptions. If the response is similar to the one below, the connection to the database is working.

{
	"pagination": {
		"total": 0,
		"count": 0,
		"pageSize": 0,
		"currentPage": 0,
		"totalPages": 0
	}
}

Listing exceptions can be used instead of the ping, but this operation demands more resources because it queries the database, so it should be used sparingly.
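To see how the EndDate value maps to a calendar date, and to read the total field from the response, the following sketch may help. It assumes GNU date (for the -d option) and uses text matching rather than a JSON parser; both function names are hypothetical.

```shell
# Convert an epoch timestamp in milliseconds to a UTC calendar date (GNU date).
epoch_ms_to_date() {
    date -u -d "@$(( $1 / 1000 ))" +%Y-%m-%d
}

# Extract the numeric "total" value from the pagination object of a response body.
pagination_total() {
    printf '%s' "$1" | tr -d '[:space:]' \
        | sed -n 's/.*"total":\([0-9][0-9]*\).*/\1/p'
}

epoch_ms_to_date 1400000000000   # prints 2014-05-13
```

A pagination_total result of 0 for the request above indicates that the API reached the database and found no exceptions before that date.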

Server-side

The GBDS API runs as a service named gbsapid. The following commands can be used to check whether this service is running.

Remember to repeat the commands on each node where the API is running.

service gbsapid status

ps aux | grep gbsapi | grep -v grep

The response to these commands should show the API process running.

GBDS Status

Server-side

GBDS runs as a process. Remember to repeat the commands below on each node of the GBDS cluster.

The first command can be used to see if the GBDS process is running:

ps aux | grep -v grep | grep griaulebiometrics.gbds.driver.Driver

If the process is running, this command prints its entry from the process list; no output means the driver is not running on that node.

The second command can be used to check the count of matchers:

ps aux | grep akka | grep -v grep | wc -l

The output of this command will show the number of matchers that are running.
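The matcher count can be compared against an expected value in a monitoring script. The sketch below assumes the expected number of matchers per node is known to the operator; it is site-specific and not stated in this manual. The function names are illustrative.

```shell
# Count lines of `ps aux`-style input that mention akka,
# ignoring any line that belongs to a grep process.
count_matchers() {
    awk '/akka/ && !/grep/ { n++ } END { print n + 0 }'
}

# Compare the live count on this node against an expected value.
# Usage: matchers_ok <expected_count>
matchers_ok() {
    actual=$(ps aux | count_matchers)
    [ "$actual" -eq "$1" ]
}
```

As with the commands above, the check only covers the node it runs on, so it needs to be repeated on each node of the cluster.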

Troubleshooting

GBDS

In case of problems GBDS should be restarted. First, it is necessary to check the service status and stop it.

su griaule

/var/lib/griaule/gbds/scripts/kill-cluster.sh

# Run again until all nodes report that no service is running
/var/lib/griaule/gbds/scripts/kill-cluster.sh

Then, as user griaule, the following script must be executed to start the driver.

/var/lib/griaule/gbds/scripts/start-cluster.sh

More details for GBDS can be found in the logs.
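Before starting the cluster again, it can be useful to confirm that the driver process has actually stopped. The following sketch polls the process list using the same pattern as the GBDS Status check. The function names, the number of attempts, and the delay are assumptions, and the check only covers the node it runs on.

```shell
# Succeeds while a GBDS driver process is present in the local process list.
gbds_driver_running() {
    ps aux | grep -v grep | grep -q griaulebiometrics.gbds.driver.Driver
}

# Poll a check command until it fails (process gone) or the attempts run out.
# Returns 0 once the check fails, 1 if it still succeeds after the last attempt.
# Usage: wait_until_stopped <max_attempts> <delay_seconds> <check command...>
wait_until_stopped() {
    attempts=$1; delay=$2; shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" || return 0
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}
```

For example, wait_until_stopped 10 3 gbds_driver_running waits up to roughly 30 seconds for the local driver to exit.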

GBDS API

If there is any problem with the API, it should be restarted as the griaule user or as a superuser.

service gbsapid restart #restart API
service gbsapid status #check api status

SPID

If there is any problem with SPID, it should be restarted as the griaule user or as a superuser.

service spid restart #restart spid
service spid status #check spid status

SPID Control Panel

If there is any problem with the Control Panel, it should be restarted as the griaule user or as a superuser.

service spid-cp restart
service spid-cp status

IDN Service

If there is any problem with the IDN Service, it should be restarted as the griaule user or as a superuser.

Remember that Griaule's IDN Service is optional; users may choose to implement their own.

service idnservice restart
service idnservice status
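The restart-then-status pattern used above for the GBDS API, SPID, the Control Panel, and the IDN Service can be wrapped in a small helper. This is a sketch only: the function name restart_and_check is hypothetical, and it assumes that service <name> status returns a non-zero exit code when the service is not running, which depends on the init scripts installed.

```shell
# Restart a service and report whether it is running afterwards.
# Run as the griaule user or as a superuser.
# Usage: restart_and_check <service_name>
restart_and_check() {
    if service "$1" restart && service "$1" status >/dev/null; then
        echo "$1: running"
    else
        echo "$1: NOT running" >&2
        return 1
    fi
}
```

For example, restart_and_check spid-cp restarts the Control Panel and prints a one-line result. A service that takes time to initialize may need a short wait before its status is meaningful.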

Logs

If any problem is found, the support team should be contacted. Once contact is made, it is important to send the logs related to the problem to reduce the time to fix it.

Application with error          Path to logs

HBase                           /var/log/hbase/
HDFS                            /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hostname.log
GBDS                            /var/log/griaule/gbds/gbds.log
GBDS API (start up process)     /var/log/griaule/gbsapi/console.out
GBDS API                        /var/log/griaule/gbsapi/gbsapi.log
SPID                            /var/log/griaule/spid/ac.log
SPID Control Panel              /var/log/griaule/spid/controlpanel.log
idnService                      /var/log/griaule/idnservice/
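To send logs to the support team, the relevant files can be bundled into a single archive. The following is a minimal sketch using tar. The function name collect_logs and the archive location in the example are hypothetical, and paths that do not exist on the current node are skipped.

```shell
# Bundle the given log files or directories into a gzip-compressed tar archive,
# skipping paths that do not exist on this node.
# Usage: collect_logs <archive.tar.gz> <path>...
collect_logs() {
    archive=$1; shift
    for item in "$@"; do
        # Drop the current argument, then re-append it only if it exists.
        shift
        if [ -e "$item" ]; then
            set -- "$@" "$item"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no log paths found" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}
```

For example, collect_logs /tmp/griaule-logs.tar.gz /var/log/griaule/gbds/gbds.log /var/log/griaule/spid/ac.log produces one file to attach to the support request.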

Post-Cluster Restart Processes

If all cluster nodes are restarted simultaneously, Ambari services must be restarted manually. This procedure can also be used in case the environment goes offline as an initial approach to handle the incident.

Ambari services restart

To access the Ambari Control Panel, go to the URL http://<hostname>:8080 from a web browser and log in. By default, both the login and password are admin.

Then, in the left side panel, in the Services tab, click on ... and then on Start All. At the end of the operation, all services should be running (highlighted by a green dot). If any service fails to start, it will be marked in red and should be started manually.

The gear icon in the upper right corner of the screen shows the progress of the current startup operation.

Next, verify that the NameNode is not stuck in safe mode. The following command prints a line only when safe mode is off:

hdfs dfsadmin -fs hdfs://<hostname>:8020 -safemode get | grep 'Safe mode is OFF'

If the NameNode is in safe mode, run the following commands on Node 1 to switch to the hdfs user and leave safe mode.

sudo su - hdfs

hdfs dfsadmin -safemode leave
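The check and the fix can be combined so that safe mode is left only when it is actually on. This is a sketch under the assumption that it runs as the hdfs user on a node with the HDFS client installed; the function names are illustrative.

```shell
# Return success if the text reported by `hdfs dfsadmin -safemode get`
# says that safe mode is off.
safemode_is_off() {
    printf '%s\n' "$1" | grep -q 'Safe mode is OFF'
}

# Query the NameNode and leave safe mode only if it is currently on.
# Usage: ensure_safemode_off <hostname>
ensure_safemode_off() {
    status=$(hdfs dfsadmin -fs "hdfs://$1:8020" -safemode get) || return 1
    safemode_is_off "$status" \
        || hdfs dfsadmin -fs "hdfs://$1:8020" -safemode leave
}
```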

Service startup

Start GBDS, the GBDS API, the SPID Control Panel and the IDN Service as described in the Troubleshooting section.

Shutdown

The following procedure should be used whenever production servers are shut down. This procedure can also be used in case the environment goes offline as an initial approach to handle the incident.

The commands below must be run as a superuser.

service spid stop

service spid-cp stop

service gbsapid stop

/var/lib/griaule/gbscluster/scripts/kill-gbscluster.sh

# Run again until all nodes report that no service is running
/var/lib/griaule/gbscluster/scripts/kill-gbscluster.sh

Access the Ambari Control Panel via URL http://<hostname>:8080.
Stop all Hadoop services by clicking ... in “Services” and then on “Stop All”.

Additional Information

GBDS is CPU-bound, which means it will always use as much CPU as possible to perform its operations. Therefore, it is common for monitoring software to report high CPU usage on cluster nodes.

