1. GBDS 4 Software Requirements

1.1. Operating System

GBDS can be installed on the following operating systems:

  • CentOS 7
  • Red Hat Enterprise Linux 7
  • Oracle Linux 7
  • Oracle Linux 8
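
As a pre-installation check, the list above can be compared against the contents of /etc/os-release on the target host. The following is a minimal sketch, not part of GBDS: the function names are hypothetical, and the identifiers "centos", "rhel", and "ol" are assumed to be the ID values these distributions report.

```python
# Supported (ID, major version) pairs, taken from the list of operating systems.
# Assumption: these are the ID values reported in /etc/os-release.
SUPPORTED = {("centos", "7"), ("rhel", "7"), ("ol", "7"), ("ol", "8")}


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def is_supported(os_release_text):
    """Return True if the ID and major version match a supported system."""
    fields = parse_os_release(os_release_text)
    os_id = fields.get("ID", "").lower()
    # VERSION_ID may be "7" or "7.9"; only the major version is compared.
    major = fields.get("VERSION_ID", "").split(".")[0]
    return (os_id, major) in SUPPORTED
```

On a target host, the check could be run by passing the text read from /etc/os-release to is_supported().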

1.2. Hadoop

GBDS is based on Apache Hadoop version 3.1, a collection of open-source software that provides multi-purpose tools for building parallel and scalable systems. GBDS is currently integrated with the following components of the Hadoop ecosystem:

  • Ambari: Provisioning, management, and monitoring of a Hadoop cluster
  • Kafka: A distributed streaming system for integrating real-time data
  • ZooKeeper: Coordination service that enables synchronization across a cluster
  • HBase: Non-relational database management system
  • HDFS: Distributed file system designed to run on commodity hardware

1.3. Database

GBDS uses two database systems, one non-relational and one relational:

  • HBase for biometric images and templates.
  • MySQL [1] for metadata (transactions, exceptions, criminal cases, biometric profiles, and unsolved latents).

[1] MySQL is recommended because certain Hadoop components rely on it internally, which simplifies interoperation between them. However, it is possible to adapt GBDS to any other SQL database system.

1.4. Load Balancing

Template extraction from a biometric image requires more resources than the biometric comparison between templates, and it is performed within the GBDS API handler. GBDS is highly parallel, and every node in the cluster can receive API requests if configured to do so. To optimize hardware usage, it is therefore recommended to use a load balancer that distributes the requests evenly across the nodes. This achieves the best performance and prevents any node in the cluster from being overloaded.

Either a hardware or a software load balancer can be used. A simple software solution is HAProxy, free and open-source software that provides load balancing and proxy server functionality.
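
The effect of distributing requests evenly can be illustrated with a round-robin policy, in which each incoming request goes to the next node in turn. The sketch below is only an illustration of that policy under assumed names. It is not how GBDS or HAProxy is implemented or configured, and the node names and function names are hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, nodes):
    """Assign each request to the next node in round-robin order."""
    if not nodes:
        raise ValueError("at least one node is required")
    node_cycle = cycle(nodes)
    return [(request, next(node_cycle)) for request in requests]


def load_per_node(assignments):
    """Count how many requests each node received."""
    return Counter(node for _, node in assignments)


# Hypothetical node names; a real cluster would use its own host names.
nodes = ["gbds-node-1", "gbds-node-2", "gbds-node-3"]
assignments = distribute(range(9), nodes)
load = load_per_node(assignments)  # each of the three nodes receives 3 requests
```

With nine requests and three nodes, every node handles the same number of requests, so no single node carries the cost of all template extractions.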