1. GBDS Batch Operations

1.1. Introduction

The GBDS Batch Operations application performs extraction operations over the entire GBDS database, such as re-extracting biometric templates from GBDS.

This manual describes the installation procedure, how to configure Batch Operations, how to choose the correct mode for your environment, and the logs and metrics available.

This manual is updated for GBDS Batch Operations 4.6.1.

Warning

GBDS Batch Operations should preferably run isolated from GBDS. If you need to run GBDS Batch Operations on the same node as GBDS, it is recommended to stop GBDS for the duration of the operation.

1.2. Installation

To install the application, download the correct RPM file and install it with the command below:

sudo rpm -Uhv batch-operations-<version>.x86_64.rpm

The following folders/files will be created during the installation process:

/var/lib/griaule/batch-operations/	Application .jar file
/var/lib/griaule/batch-operations/lib	Libraries used by the application
/var/lib/griaule/batch-operations/scripts	Scripts for start/stop
/var/log/griaule/batch-operations/	Log files
/etc/griaule/conf/batch-operations/	Configuration Files
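After installing the package, it can be useful to confirm that the directories listed above exist. The following is a minimal sketch, not part of the product: the function name check_install_paths and its optional root-prefix argument are assumptions introduced here for illustration.

```shell
#!/bin/sh
# Report which of the expected installation directories are missing.
# $1 (optional, hypothetical): a root prefix such as a staging directory;
# when omitted, the real filesystem root is checked.
check_install_paths() {
  root="${1:-}"
  missing=0
  for p in \
    /var/lib/griaule/batch-operations \
    /var/lib/griaule/batch-operations/lib \
    /var/lib/griaule/batch-operations/scripts \
    /var/log/griaule/batch-operations \
    /etc/griaule/conf/batch-operations
  do
    if [ ! -d "${root}${p}" ]; then
      echo "missing: ${p}"
      missing=1
    fi
  done
  # Exit status 0 when every directory exists, 1 otherwise.
  return "$missing"
}
```

Calling check_install_paths with no arguments checks the live system and prints one line per missing directory.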

1.2.1. Configuring application.conf

Some configuration changes may be needed when installing GBDS Batch Operations. These changes must be applied to the application.conf file. The description of each parameter can be found in the Configuration Files section. In all blocks below, any text between <> must be replaced with the correct value for your environment.

The configuration file is located at /etc/griaule/conf/batch-operations/application.conf.

First, change the loglevel from info to warning:

loglevel = "warning"

Set the hostname to the current machine hostname:

hostname = <HOSTNAME>

Set the Akka cluster seed nodes:

cluster {
  seed-nodes = [
    "akka://main@<hostname1>:2553"
  ]
}
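For clusters with several nodes, the seed-nodes array can be generated from a list of hostnames instead of being typed by hand. The sketch below is only an illustration: print_seed_nodes and the SEED_SCHEME variable are hypothetical names. The default scheme akka:// follows the example above, while the parameter table in the Configuration Files section shows akka.tcp://, so set SEED_SCHEME to whichever form your installation expects.

```shell
#!/bin/sh
# Print a cluster { seed-nodes = [...] } block for the given hostnames.
# SEED_SCHEME (hypothetical variable) selects the address scheme; default "akka".
print_seed_nodes() {
  scheme="${SEED_SCHEME:-akka}"
  echo "cluster {"
  echo "  seed-nodes = ["
  first=1
  for h in "$@"; do
    # Separate entries with a comma, but emit none after the last one.
    [ "$first" -eq 1 ] || printf ',\n'
    printf '    "%s://main@%s:2553"' "$scheme" "$h"
    first=0
  done
  # Terminate the last entry line only if at least one host was printed.
  [ "$first" -eq 1 ] || printf '\n'
  echo "  ]"
  echo "}"
}
```

For example, print_seed_nodes node1 node2 prints a block with two entries that can be pasted into application.conf.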

Set the minimum number of nodes that must be up in the cluster:

role.manager.min-nr-of-members=<number of akka cluster nodes>

Tip

Because Batch Operations usually runs on a single node, we recommend setting this parameter to 1.

Set the ZooKeeper quorum:

gbds.cluster.zookeeper.quorum="<zookeeper_quorum>:2181"

Finally, configure the node settings and the template locations (column families):

gbds.node.put.size=30
gbds.node.scan.size=100
gbds.node.scanners.number=2
gbds.node.workers.number=10
gbds.node.writers.number=2
gbds.node.writer.cooldown=1500

# cf read configurations
gbds.operations.cf.read.finger="fingerprint-read"
gbds.operations.cf.read.palm="palmprint-read"
gbds.operations.cf.read.face="face-read"
gbds.operations.cf.read.iris="iris-read"
gbds.operations.cf.read.newborn-palm="newborn-palmprint-read"

# cf write configurations
gbds.operations.cf.write.finger="fingerprint-write"
gbds.operations.cf.write.palm="palmprint-write"
gbds.operations.cf.write.face="face-write"
gbds.operations.cf.write.iris="iris-write"
gbds.operations.cf.write.newborn-palm="newborn-palmprint-write"

gbds.operations.native.fnet=false
gbds.operations.native.fnet-qual=0

gbds.operations.modality.finger=false
gbds.operations.modality.palm=false
gbds.operations.modality.face=true
gbds.operations.modality.iris=false
gbds.operations.modality.newborn-palm=false

gbds.operations.mode.extract=false
gbds.operations.mode.simplify=true

gbds.operations.simplify.activate-person=true

gbds.operations.read.base64=false
gbds.operations.xml.drop.templates=true

gbds.operations.worker-actor.heap-size="1024m"

gbds.operations.extraction.retries=2

batch.scan.mode.start-row="<change>"
batch.scan.mode.end-row="<change>"
batch.scan.mode.name="BY_RANGE"

gbds.node.put.size=30
gbds.node.scan.size=100
gbds.node.scanners.number=2
gbds.node.workers.number=5
gbds.node.writers.number=2

# Create a new column family; its name must be different from the existing column family

gbds.operations.cf.read.finger="<NEWfingerprintCF>"
gbds.operations.cf.read.palm="<NEWpalmprintCF>"
gbds.operations.cf.read.face="<NEWfaceCF>"
gbds.operations.cf.read.iris="<NEWirisCF>"
gbds.operations.cf.read.newborn-palm="<NEWnewborn-palmprintCF>"

gbds.operations.cf.write.finger="<NEWfingerprintCF>"
gbds.operations.cf.write.palm="<NEWpalmprintCF>"
gbds.operations.cf.write.face="<NEWfaceCF>"
gbds.operations.cf.write.iris="<NEWirisCF>"
gbds.operations.cf.write.newborn-palm="<NEWnewborn-palmprintCF>"

gbds.operations.native.fnet=false
gbds.operations.native.fnet-qual=0

gbds.operations.modality.finger=false
gbds.operations.modality.palm=false
gbds.operations.modality.face=true
gbds.operations.modality.iris=false
gbds.operations.modality.newborn-palm=false

gbds.operations.mode.extract=true
gbds.operations.mode.simplify=true

gbds.operations.read.base64=false
gbds.operations.xml.drop.templates=true

#DEPRECATED RDB CONFIGURATIONS

gbds.cluster.rdb.url="jdbc:mysql://<rdb_url>:3306/gbds?useSSL=false"
gbds.cluster.rdb.user="<user>"
gbds.cluster.rdb.password="<password>"
gbds.operations.mode.rdb=false
gbds.operations.rdb.people.flag="people1"
gbds.operations.rdb.quality.flag="quality1"
gbds.operations.rdb.exceptions.flag="exceptions1"
gbds.operations.rdb.force-validation=true
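Because every value written between <> must be replaced, a quick search of the edited file can catch leftover placeholders before the application is started. This is a minimal sketch using standard grep; the function name find_placeholders is an assumption made here for illustration. Lines where a # precedes the placeholder are skipped, and placeholders left under the deprecated RDB keys will also be listed.

```shell
#!/bin/sh
# List lines of a configuration file that still contain a <...> placeholder.
# $1: path to the configuration file.
# Output: "line-number:line" for each match; exit status 1 when none is found.
find_placeholders() {
  # ^[^#]* ensures no '#' appears before the placeholder on that line.
  grep -nE '^[^#]*<[^<>]+>' "$1"
}
```

A typical call would be find_placeholders /etc/griaule/conf/batch-operations/application.conf; an empty result means no placeholder remains.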

1.3. Operation

There are two operation modes for GBDS Batch Operations:

Extract: This mode extracts new full templates and writes them into the HBase transactions table.
Simplify: This mode reduces the already existing templates and writes them into the HBase people table. If there are no previously extracted templates, it will trigger the extraction mode.

Each mode can be run individually or in combination, depending on the configuration parameters provided through the configuration file.

1.4. Scan Mode

GBDS Batch Operations has three scan modes, which define how the software scans the transactions and loads them into memory for processing. The mode is set in the batch.scan.mode.name configuration parameter. The options are:

BY_REGION: This mode takes the regions as arranged by HBase and assigns one region to each Akka actor. You can use this scan mode to scan the entire database with a single Batch Operations instance. Its behavior depends on the cluster: if there is only one node, it behaves like the NODE_ONLY mode (execution on a single node); if there is more than one node, it covers the entire cluster.
NODE_ONLY: Processes only the regions of the current node. If you have more than one node, you must run one instance of Batch Operations per node. In this mode scans are not distributed: each node scans only its own regions.
BY_RANGE: This mode scans a user-defined range of HBase rows. It is used to split the batch processing across several instances. This mode requires two additional configurations:
- batch.scan.mode.start-row
- batch.scan.mode.end-row
Both values are strings, ranging from 0 (representing the first transaction in HBase) to a full or partial GUID value. The GUID may be a TGUID or a PGUID, depending on the type of operation; see the Note below.

Tip

For example, if the configurations are batch.scan.mode.start-row=0 and batch.scan.mode.end-row="F1F14ADA", Batch Operations will iterate from the first transaction until the last GUID starting with F1F14ADA is found, regardless of HBase region boundaries.

Note

If set to Extract mode, that is, with the parameter gbds.operations.mode.extract set to true, it runs on the Transaction table and the GUID will be a TGUID. If set to Simplify mode, that is, with the parameter gbds.operations.mode.simplify set to true, it runs on the People table and the GUID will be a PGUID.
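When BY_RANGE is used to split the work across several instances, each instance needs its own start-row and end-row pair. The sketch below prints one pair per instance by dividing the first hexadecimal digit of the GUID evenly. It rests on assumptions that are not confirmed by this manual: that row keys are uppercase hexadecimal GUIDs (as the F1F14ADA example suggests), that end-row includes every GUID starting with the given prefix (as the Tip above describes), and that start-row begins at the first GUID with the given prefix. The function name split_hex_ranges is hypothetical.

```shell
#!/bin/sh
# Print start-row/end-row pairs that divide the first hex digit (0..F)
# evenly among N instances. N must divide 16.
split_hex_ranges() {
  n="$1"
  case "$n" in
    1|2|4|8|16) ;;
    *) echo "instances must be 1, 2, 4, 8 or 16" >&2; return 1 ;;
  esac
  step=$((16 / n))
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$((i * step))
    end=$((start + step - 1))
    # %X prints the digit as uppercase hexadecimal.
    printf 'instance %d: start-row="%X" end-row="%X"\n' "$((i + 1))" "$start" "$end"
    i=$((i + 1))
  done
}
```

For example, split_hex_ranges 4 yields the pairs 0 to 3, 4 to 7, 8 to B and C to F. Verify the boundary behavior on a small range before relying on such a split in production.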

1.5. Configuration Files

The configuration file is located at /etc/griaule/conf/batch-operations/application.conf.

The custom attributes that can be changed in application.conf file are:

Configuration Parameter	Description
akka.cluster.seed-nodes	Contains the addresses of all nodes in the cluster, in array format. Each value must be in `akka.tcp://main@<hostname>:2553` format, where <hostname> is the node hostname. This configuration must be identical on all cluster nodes.
akka.cluster.role.manager.min-nr-of-members	Determines the minimum number of nodes that must be up in the cluster. This number MUST be equal to the total number of nodes in the cluster. The value must be the same for each node in the cluster.
gbds.operations.mode.extract	Determines whether to run in Extract mode. Can be true or false, defaulting to true. All configurations that depend on extraction take effect only if this value is set to true.
gbds.operations.mode.simplify	Determines whether to run in Simplify mode. Can be true or false, defaulting to true. If templates need to be extracted first, the extraction configurations are used to do so.
gbds.operations.simplify.activate-person	Default: `true` If `true`, activate person on simplify. If `false`, leave it as found.
gbds.cluster.zookeeper.quorum	Defines the hostname and port through which zookeeper servers can be found. Each value must be separated by commas if more than one is available. This configuration must be equal for all cluster nodes.
gbds.node.scan.size	Defines the size of the buffer for each scanner actor in the node. Its value must be in the range 2 to 10000, with a default of 1000. Important: the amount of RAM used is proportional to this buffer size * gbds.node.scanners.number.
gbds.node.put.size	Defines the size of the buffer for each writer actor in the node. Its value must be in the range 2 to 10000, with a default of 300. An optimal value in production is ⅓ of the scan buffer size. Important: the amount of RAM used is proportional to this buffer size * gbds.node.writers.number.
gbds.node.scanners.number	Defines the number of scanner actors in the node. Its value must be in the range 2 to 10000, with a default of 2. It must not exceed the number of threads in the node.
gbds.node.workers.number	Defines the number of extractor actors in the node. Its value must be in the range 1 to 10000, with a default of 10. It must not exceed the number of threads in the node.
gbds.node.writers.number	Defines the number of writer actors in the node. Its value must be in the range 1 to 10000, with a default of 2. It must not exceed the number of threads in the node.
gbds.operations.cf.read.finger gbds.operations.cf.read.palm gbds.operations.cf.read.face gbds.operations.cf.read.iris gbds.operations.cf.read.newborn-palm	Defines the column family name used to get the old templates (if existent).
gbds.operations.cf.write.finger gbds.operations.cf.write.palm gbds.operations.cf.write.face gbds.operations.cf.write.iris gbds.operations.cf.write.newborn-palm	Defines the column family name used to substitute, or add (if non-existent), the new templates. Also determines the flag saved to prevent redoing this register. Flag Format: transaction:<column-family>
gbds.operations.modality.finger gbds.operations.modality.palm gbds.operations.modality.face gbds.operations.modality.iris gbds.operations.modality.newborn-palm	Enables biometric template extraction. Takes effect if extraction or simplify modes are activated. The value must be true or false.
gbds.operations.xml.drop.templates	If true, deletes the templates from person-xml, if they exist. The value must be true or false, defaulting to true.
gbds.operations.read.base64	Defines whether Batch Operations reads the templates as binary or as base64. If true, templates are read as base64; if false, as binary.
gbds.operations.native.fnet	Enables native fingernet extraction. It must be true or false, defaulting to false.
gbds.operations.native.fnet-qual	Sets the quality threshold that activates fingernet extraction. It must be a value from 0 to 101, where 0 indicates to never use fnet and 101 will use it in all fingers. Default value is 0.
batch.scan.mode.name	Defines the Batch Operations scan mode. The options are explained in the Scan Mode section.
batch.scan.mode.start-row	Defines the start row of the BY_RANGE scan mode. The value is 0 or a full or partial GUID of the HBase row (a TGUID in Extract mode, a PGUID in Simplify mode).
batch.scan.mode.end-row	Defines the end row of the BY_RANGE scan mode. The value is a full or partial GUID of the HBase row (a TGUID in Extract mode, a PGUID in Simplify mode).
gbds.operations.worker-actor.heap-size	Limits the worker heap size. Default value is 1024m.
gbds.operations.extraction.retries	Defines the number of retries for worker extraction operation. Default: 2 Min: 0 Max: 10
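The sizing rules in the table above (buffer ranges, actor counts not exceeding the thread count, RAM proportional to buffer size times actor count, and a put buffer of about one third of the scan buffer) can be checked before a run. The following is a rough sketch under those stated rules only; check_node_sizing is a hypothetical helper, and the totals it prints are relative indicators, not byte counts.

```shell
#!/bin/sh
# Sanity-check node sizing values against the rules in the parameter table.
# Arguments: scan.size put.size scanners.number workers.number writers.number threads
check_node_sizing() {
  scan_size="$1"; put_size="$2"; scanners="$3"; workers="$4"; writers="$5"; threads="$6"
  status=0
  # Both buffer sizes are documented as valid in the range 2..10000.
  for v in "$scan_size" "$put_size"; do
    if [ "$v" -lt 2 ] || [ "$v" -gt 10000 ]; then
      echo "buffer size $v is outside 2..10000"
      status=1
    fi
  done
  # Each actor count must not exceed the number of threads in the node.
  for v in "$scanners" "$workers" "$writers"; do
    if [ "$v" -gt "$threads" ]; then
      echo "actor count $v exceeds $threads threads"
      status=1
    fi
  done
  # RAM use is proportional to these products.
  echo "scan buffers total: $((scan_size * scanners))"
  echo "put buffers total: $((put_size * writers))"
  echo "suggested put size (scan/3): $((scan_size / 3))"
  return "$status"
}
```

On a Linux node, the thread count can be taken from nproc, for example check_node_sizing 100 30 2 10 2 "$(nproc)".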

1.6. Logs

The application logs are stored in /var/log/griaule/batch-operations, and are separated into two files:

console.out: logs the default output prints from the system;
batch-operations.log: application main logs.

To follow the application logs in real time, run:

tail -F /var/log/griaule/batch-operations/batch-operations.log

1.7. Metrics

Local metrics are saved in /var/log/griaule/batch-operations/metrics.txt on each node and are updated every second.

Global metrics are saved in /var/log/griaule/batch-operations/global-metrics.txt and are updated every two seconds.

To monitor the metrics, run the following command:

watch -n 2 cat /var/log/griaule/batch-operations/global_metrics.txt

The metrics files record how many HBase regions the node received to process. They also record how many regions are currently in use by local/total scanners, the total number of people already processed, the total number of biometrics extracted, the number of people templates saved back to HBase and to the relational database, and the elapsed time of each of these operations.

1.8. Scripts Commands

In the /var/lib/griaule/batch-operations/scripts folder, there are two scripts used to start and stop the application.

start_node.sh	Starts the application in the current node only, loading the libraries and the configuration file.
kill_node.sh	Kills the application process for the current node only.

1.9. RDB - Deprecated

In earlier versions, Batch Operations was used to help migrate existing registers to the RDB through the RDB operation mode. In the current version, this mode is deprecated and should not be used. The RDB-related configurations were not removed from the software and are described below for reference:

Important

If you need to migrate the RDB, contact Griaule Support Team.

Configuration Parameter	Description
gbds.operations.mode.rdb	Determines whether to run in RDB mode. Can be true or false, defaulting to false
gbds.cluster.rdb.url	Url to access the relational MySQL database. Must be in this format: `jdbc:mysql://<rdb_url>:3306/gbds?useSSL=false`.
gbds.cluster.rdb.user	User to access the relational MySQL database.
gbds.cluster.rdb.password	Password to access the relational MySQL database.
gbds.operations.rdb.people.flag	Determines the column name in HBase to check whether a register should be migrated to the people table in the relational MySQL database. Format: `rdb:<column>`.
gbds.operations.rdb.exceptions.flag	Determines the column name in HBase to check whether a register should be migrated to the exception table in the relational MySQL database. Format: `rdb:<column>`.