1. GBDS Batch Operations
1.1. Introduction
The GBDS Batch Operations application performs extraction operations over the entire GBDS database, such as re-extracting biometric templates from GBDS.
This manual describes the installation procedure, how to configure Batch Operations, how to select the modes appropriate for your environment, and the available logs and metrics.
This manual is updated for GBDS Batch Operations 4.6.1.
Warning
GBDS Batch Operations should preferably be run isolated from GBDS. If you need to run GBDS Batch Operations in the same node as GBDS, it is recommended to stop GBDS during the operation.
1.2. Installation
To install the application, download the correct RPM file and install it with the command below:
sudo rpm -Uhv batch-operations-<version>.x86_64.rpm
The following folders/files will be created during the installation process:
/var/lib/griaule/batch-operations/ | Application .jar file |
/var/lib/griaule/batch-operations/lib | Libraries used by the application |
/var/lib/griaule/batch-operations/scripts | Scripts for start/stop |
/var/log/griaule/batch-operations/ | Log files |
/etc/griaule/conf/batch-operations/ | Configuration Files |
1.2.1. Configuring application.conf
Some configuration changes may be needed when installing GBDS Batch Operations. Apply them to the application.conf file, located at /etc/griaule/conf/batch-operations/application.conf. Each parameter is described in the Configuration Files section. In all blocks below, replace the text between <> with the correct value for your environment.
First, change the loglevel from info to warning. The line to edit is:
loglevel = "info"
Set the hostname to the current machine hostname:
hostname = <HOSTNAME>
Set cluster of Akka nodes:
cluster {
seed-nodes = [
"akka://main@<hostname1>:2553"
]
}
Set the number of nodes executing the boot:
role.manager.min-nr-of-members=<number of akka cluster nodes>
Tip
As Batch Operations usually runs on a single node, we recommend setting this parameter to 1.
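When more than one node is used, the seed-nodes list must be identical on every node and min-nr-of-members must equal the number of nodes. The following Python sketch, which is not part of the product, renders a matching fragment from a list of hostnames. The function name render_cluster_block and the hostnames are hypothetical, and the scheme is left as a parameter because this manual shows both akka:// and akka.tcp:// addresses; confirm which one your installation expects.

```python
def render_cluster_block(hostnames, scheme="akka", port=2553):
    """Build the cluster fragment; the same text should be used on every node."""
    if not hostnames:
        raise ValueError("at least one hostname is required")
    # One quoted seed address per node, comma separated.
    seeds = ",\n".join(
        f'    "{scheme}://main@{host}:{port}"' for host in hostnames
    )
    return (
        "cluster {\n"
        "  seed-nodes = [\n"
        f"{seeds}\n"
        "  ]\n"
        # Assumption: this key sits inside the cluster block, following the
        # full name akka.cluster.role.manager.min-nr-of-members.
        f"  role.manager.min-nr-of-members = {len(hostnames)}\n"
        "}"
    )


# Hypothetical hostnames, for illustration only.
print(render_cluster_block(["node1", "node2"]))
```

With a single hostname the rendered min-nr-of-members is 1, which matches the recommendation above.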
Set Zookeeper quorum:
gbds.cluster.zookeeper.quorum="<zookeeper_quorum>:2181"
Finally, configure the template locations and the remaining parameters:
gbds.node.put.size=30
gbds.node.scan.size=100
gbds.node.scanners.number=2
gbds.node.workers.number=10
gbds.node.writers.number=2
gbds.node.writer.cooldown=1500
# cf read configurations
gbds.operations.cf.read.finger="fingerprint-read"
gbds.operations.cf.read.palm="palmprint-read"
gbds.operations.cf.read.face="face-read"
gbds.operations.cf.read.iris="iris-read"
gbds.operations.cf.read.newborn-palm="newborn-palmprint-read"
# cf write configurations
gbds.operations.cf.write.finger="fingerprint-write"
gbds.operations.cf.write.palm="palmprint-write"
gbds.operations.cf.write.face="face-write"
gbds.operations.cf.write.iris="iris-write"
gbds.operations.cf.write.newborn-palm="newborn-palmprint-write"
gbds.operations.native.fnet=false
gbds.operations.native.fnet-qual=0
gbds.operations.modality.finger=false
gbds.operations.modality.palm=false
gbds.operations.modality.face=true
gbds.operations.modality.iris=false
gbds.operations.modality.newborn-palm=false
gbds.operations.mode.extract=false
gbds.operations.mode.simplify=true
gbds.operations.simplify.activate-person=true
gbds.operations.read.base64=false
gbds.operations.xml.drop.templates=true
gbds.operations.worker-actor.heap-size="1024m"
gbds.operations.extraction.retries=2
batch.scan.mode.start-row="<change>"
batch.scan.mode.end-row="<change>"
batch.scan.mode.name="BY_RANGE"
gbds.node.put.size=30
gbds.node.scan.size=100
gbds.node.scanners.number=2
gbds.node.workers.number=5
gbds.node.writers.number=2
# Create a new column family; it must be different from the existing column family
gbds.operations.cf.read.finger="<NEWfingerprintCF>"
gbds.operations.cf.read.palm="<NEWpalmprintCF>"
gbds.operations.cf.read.face="<NEWfaceCF>"
gbds.operations.cf.read.iris="<NEWirisCF>"
gbds.operations.cf.read.newborn-palm="<NEWnewborn-palmprintCF>"
gbds.operations.cf.write.finger="<NEWfingerprintCF>"
gbds.operations.cf.write.palm="<NEWpalmprintCF>"
gbds.operations.cf.write.face="<NEWfaceCF>"
gbds.operations.cf.write.iris="<NEWirisCF>"
gbds.operations.cf.write.newborn-palm="<NEWnewborn-palmprintCF>"
gbds.operations.native.fnet=false
gbds.operations.native.fnet-qual=0
gbds.operations.modality.finger=false
gbds.operations.modality.palm=false
gbds.operations.modality.face=true
gbds.operations.modality.iris=false
gbds.operations.modality.newborn-palm=false
gbds.operations.mode.extract=true
gbds.operations.mode.simplify=true
gbds.operations.read.base64=false
gbds.operations.xml.drop.templates=true
#DEPRECATED RDB CONFIGURATIONS
gbds.cluster.rdb.url="jdbc:mysql://<rdb_url>:3306/gbds?useSSL=false"
gbds.cluster.rdb.user="<user>"
gbds.cluster.rdb.password="<password>"
gbds.operations.mode.rdb=false
gbds.operations.rdb.people.flag="people1"
gbds.operations.rdb.quality.flag="quality1"
gbds.operations.rdb.exceptions.flag="exceptions1"
gbds.operations.rdb.force-validation=true
1.3. Operation
There are two operation modes for GBDS Batch Operations:
- Extract: This mode extracts new full templates and writes them into the HBase transactions table.
- Simplify: This mode reduces the already existing templates and writes them into the HBase people table. If there are no previously extracted templates, it will trigger the extraction mode.
Each mode can be run individually or in combination, depending on the configuration parameters provided through the configuration file.
1.4. Scan Mode
There are three scan modes for GBDS Batch Operations. They define how the software scans the transactions and loads them into memory for processing. The scan mode is set in the batch.scan.mode.name configuration parameter. The options are:
BY_REGION: This mode takes the regions as arranged by HBase and assigns one region to each Akka actor. You can use this scan mode to scan the entire database with only one Batch Operations instance. Its behavior depends on the cluster: if there is only one node, it behaves similarly to the NODE_ONLY mode (execution on a single node); if there is more than one node, it covers the entire cluster.
NODE_ONLY: Processes only the regions of the current node. If you have more than one node, you must run one Batch Operations instance per node. In this mode, scans are not distributed: each node scans only its own regions.
BY_RANGE: This mode scans a user-defined range of HBase rows. It is used to split the batch processing across several instances. This mode requires two additional configurations:
- batch.scan.mode.start-row
- batch.scan.mode.end-row
Both values are strings, ranging from 0 (representing the first transaction in HBase) to a partial or full GUID value. The GUID may be a TGUID or a PGUID, depending on the type of operation; see the Note below.
Tip
For example: if the configurations are batch.scan.mode.start-row=0 and batch.scan.mode.end-row="F1F14ADA", Batch Operations will iterate from the first transaction until the last GUID starting with F1F14ADA is found, regardless of HBase region boundaries.
Note
If set to Extract mode, that is, with the parameter gbds.operations.mode.extract set to true, it runs on the Transaction table and the GUID will be a TGUID. If set to Simplify mode, that is, with the parameter gbds.operations.mode.simplify set to true, it runs on the People table and the GUID will be a PGUID.
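To split a BY_RANGE run across several instances, each instance needs its own start-row and end-row pair. The Python sketch below, which is not part of the product, divides the key space by the first GUID character. It rests on three assumptions that should be checked on a small range first: GUIDs are uppercase hexadecimal strings (as the F1F14ADA example suggests), start-row is matched as an inclusive prefix, and end-row includes every GUID starting with the given prefix (as described in the Tip). The function name split_by_first_hex_digit is hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_by_first_hex_digit(instances):
    """Return (start_row, end_row) prefixes covering 0..F with no overlap."""
    if not 1 <= instances <= len(HEX_DIGITS):
        raise ValueError("instances must be between 1 and 16")
    # Spread the 16 leading characters as evenly as possible.
    base, extra = divmod(len(HEX_DIGITS), instances)
    ranges = []
    first = 0
    for index in range(instances):
        count = base + (1 if index < extra else 0)
        last = first + count - 1
        ranges.append((HEX_DIGITS[first], HEX_DIGITS[last]))
        first = last + 1
    return ranges


# Print one pair of settings per Batch Operations instance.
for start, end in split_by_first_hex_digit(4):
    print(f'batch.scan.mode.start-row="{start}"')
    print(f'batch.scan.mode.end-row="{end}"')
    print()
```

Each printed pair goes into the application.conf of one instance, together with batch.scan.mode.name="BY_RANGE". Splitting on a single leading character limits the sketch to 16 instances; longer prefixes would be needed beyond that.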
1.5. Configuration Files
The configuration file is located at /etc/griaule/conf/batch-operations/application.conf.
The custom attributes that can be changed in the application.conf file are:
Configuration Parameter | Description |
---|---|
akka.cluster.seed-nodes | Lists all nodes in the cluster, in array format. Each value must be in akka.tcp://main@<hostname>:2553 format, where hostname is the node hostname. This configuration must be identical on all cluster nodes. |
akka.cluster.role.manager.min-nr-of-members | Determines the minimum number of nodes that must be up in the cluster. This number MUST be equal to the total number of nodes in the cluster. The value must be the same for each node in the cluster. |
gbds.operations.mode.extract | Determines whether to run in Extract mode. Can be true or false, defaulting to true. All configuration depending on extraction will take effect only if this value is set to true |
gbds.operations.mode.simplify | Determines whether to run in Simplify mode. Can be true or false, defaulting to true. If it is needed to extract the templates, it will use the extraction configurations to execute it |
gbds.operations.simplify.activate-person | Default: true. If true, activates the person on simplify. If false, leaves it as found. |
gbds.cluster.zookeeper.quorum | Defines the hostname and port through which zookeeper servers can be found. Each value must be separated by commas if more than one is available. This configuration must be equal for all cluster nodes. |
gbds.node.scan.size | Defines the size of the buffer for each scanner actor in the node. Its value must be in the range 2 to 10000, with a default of 1000. Important: the amount of RAM used is proportional to this buffer size multiplied by gbds.node.scanners.number. |
gbds.node.put.size | Defines the size of the buffer for each writer actor in the node. Its value must be in the range 2 to 10000, with a default of 300. An optimal value in production is ⅓ of the scan buffer size. Important: the amount of RAM used is proportional to this buffer size multiplied by gbds.node.writers.number. |
gbds.node.scanners.number | Defines the number of scanner actors in the node. Its value must be in range of 2 to 10000, with a default of 2. It must not exceed the number of threads in the node. |
gbds.node.workers.number | Defines the number of extractor actors in the node. Its value must be in range 1 to 10000, with a default of 10. It must not exceed the number of threads in the node. |
gbds.node.writers.number | Defines the number of writer actors in the node. Its value must be in range of 1 to 10000, with a default of 2. It must not exceed the number of threads in the node. |
gbds.operations.cf.read.finger, gbds.operations.cf.read.palm, gbds.operations.cf.read.face, gbds.operations.cf.read.iris, gbds.operations.cf.read.newborn-palm | Defines the column family name used to get the old templates (if existent). |
gbds.operations.cf.write.finger, gbds.operations.cf.write.palm, gbds.operations.cf.write.face, gbds.operations.cf.write.iris, gbds.operations.cf.write.newborn-palm | Defines the column family name used to substitute, or add (if non-existent), the new templates. Also determines the flag saved to prevent redoing this register. Flag Format: transaction:<column-family> |
gbds.operations.modality.finger, gbds.operations.modality.palm, gbds.operations.modality.face, gbds.operations.modality.iris, gbds.operations.modality.newborn-palm | Enables biometric template extraction. Takes effect if extraction or simplify modes are activated. The value must be true or false. |
gbds.operations.xml.drop.templates | If true, deletes the templates from person-xml, if they exist. The value must be true or false, defaulting to true. |
gbds.operations.read.base64 | Defines if Batch Operations will read the templates in binary or base64. If the value is set to true, it will read as base64, if false, as binary. |
gbds.operations.native.fnet | Enables native fingernet extraction. It must be true or false, defaulting to false. |
gbds.operations.native.fnet-qual | Sets the quality threshold that activates fingernet extraction. It must be a value from 0 to 101, where 0 indicates to never use fnet and 101 will use it in all fingers. Default value is 0. |
batch.scan.mode.name | Defines the Batch Operations scan mode. The options are explained in the Scan Mode section. |
batch.scan.mode.start-row | Defines the start row of the BY_RANGE scan mode. The value is a full or partial GUID (TGUID or PGUID, depending on the operation mode; see the Scan Mode section). |
batch.scan.mode.end-row | Defines the end row of the BY_RANGE scan mode. The value is a full or partial GUID (TGUID or PGUID, depending on the operation mode; see the Scan Mode section). |
gbds.operations.worker-actor.heap-size | Limit worker heap size. Default value is 1024m. |
gbds.operations.extraction.retries | Defines the number of retries for the worker extraction operation. Default: 2. Min: 0. Max: 10. |
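Several parameters in the table above have numeric ranges, and the actor counts must not exceed the number of threads in the node. A minimal Python sketch of a pre-flight check is shown below; it is not part of the product. The helper names parse_flat_settings and find_problems are hypothetical, and the parser only understands flat key=value lines, not the full HOCON syntax that application.conf may use.

```python
import re

# Allowed ranges copied from the parameter table: key -> (minimum, maximum).
LIMITS = {
    "gbds.node.scan.size": (2, 10000),
    "gbds.node.put.size": (2, 10000),
    "gbds.node.scanners.number": (2, 10000),
    "gbds.node.workers.number": (1, 10000),
    "gbds.node.writers.number": (1, 10000),
    "gbds.operations.native.fnet-qual": (0, 101),
    "gbds.operations.extraction.retries": (0, 10),
}
# Actor counts that the table says must not exceed the node's thread count.
THREAD_BOUND = {
    "gbds.node.scanners.number",
    "gbds.node.workers.number",
    "gbds.node.writers.number",
}


def parse_flat_settings(text):
    """Collect flat key=value lines; comment lines are skipped and
    nested blocks are not interpreted. A repeated key keeps its last value."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r'^\s*([\w.-]+)\s*=\s*"?([^"#]*?)"?\s*$', line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def find_problems(settings, threads):
    """Return one message per value that breaks a documented limit."""
    problems = []
    for key, (low, high) in LIMITS.items():
        if key not in settings:
            continue
        try:
            value = int(settings[key])
        except ValueError:
            problems.append(f"{key}: '{settings[key]}' is not an integer")
            continue
        if not low <= value <= high:
            problems.append(f"{key}={value} is outside {low}..{high}")
        elif key in THREAD_BOUND and value > threads:
            problems.append(f"{key}={value} exceeds {threads} threads")
    return problems


# Illustrative values only; a real check would read application.conf.
sample = """
gbds.node.scan.size=100
gbds.node.workers.number=10
gbds.operations.extraction.retries=12
"""
for problem in find_problems(parse_flat_settings(sample), threads=8):
    print(problem)
```

In practice the text would come from /etc/griaule/conf/batch-operations/application.conf and threads from os.cpu_count(). An empty result only means that none of the listed limits is violated.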
1.6. Logs
The application logs are stored in /var/log/griaule/batch-operations and are separated into two files:
- console.out: logs the default output prints from the system;
- batch-operations.log: application main logs.
To follow the application logs in real time, run:
tail -F /var/log/griaule/batch-operations/batch-operations.log
1.7. Metrics
Local metrics are saved in /var/log/griaule/batch-operations/metrics.txt on each node and are updated every second.
Global metrics are saved in /var/log/griaule/batch-operations/global-metrics.txt and are updated every two seconds.
To monitor the metrics, run the following command:
watch -n 2 cat /var/log/griaule/batch-operations/global_metrics.txt
The metrics files record how many HBase regions the node received to process. They also record how many regions are currently in use by local/total scanners, the total number of people already processed, the total number of biometrics extracted, the number of people templates saved back to HBase and to the relational database, and the elapsed time of each of these operations.
1.8. Scripts Commands
The /var/lib/griaule/batch-operations/scripts folder contains two scripts used to start and stop the application.
start_node.sh | Starts the application in the current node only, loading the libraries and the configuration file |
kill_node.sh | Kills the application process for the current node only. |
1.9. RDB - Deprecated
In earlier versions, Batch Operations was used to help migrate existing registers to the RDB, using the RDB operation mode. In the current version, this mode is deprecated and should not be used. The RDB-related configurations were not removed from the software and are described below for awareness:
Important
If you need to migrate the RDB, contact Griaule Support Team.
Configuration Parameter | Description |
---|---|
gbds.operations.mode.rdb | Determines whether to run in RDB mode. Can be true or false, defaulting to false |
gbds.cluster.rdb.url | URL to access the relational MySQL database. Must be in this format: jdbc:mysql://<rdb_url>:3306/gbds?useSSL=false |
gbds.cluster.rdb.user | User to access the relational MySQL database. |
gbds.cluster.rdb.password | Password to access the relational MySQL database. |
gbds.operations.rdb.people.flag | Determines the column name in HBase to check whether a register should be migrated to the people table in the relational MySQL database. Format: |
gbds.operations.rdb.exceptions.flag | Determines the column name in HBase to check whether a register should be migrated to the exception table in the relational MySQL database. Format: |