.. _cbgm:

======
 CBGM
======

This page describes how to run the CBGM directly on the VM (see :ref:`vm`).

.. contents::
   :local:
   :depth: 1


Preparing the Database for the CBGM
===================================

All input data must be converted and imported into one Postgres database.

- The mysql database :code:`ECM` contains the apparatus of the *Editio
  Critica Maior* publication.
  This database is exported from the NTVMR application
  (New Testament Virtual Manuscript Room).

- The mysql database :code:`Leitzeile` contains the Leitzeile of the
  current Nestle-Aland edition or any other appropriate "Leitzeile".

- Optionally, the mysql database :code:`VarGen` contains
  previous editorial decisions regarding the priority of the readings.
  If this database is not supplied, default priorities are used.

.. pic:: uml
   :caption: Database Preparation for CBGM

   skinparam backgroundColor transparent

   database  "ECM"        as dbsrc1
   database  "Leitzeile"  as dbsrc2
   database  "VarGen"     as dbsrc3
   note left of dbsrc1: mysql

   component "import.py"  as import
   component "prepare.py" as prepare
   database  "Acts"       as db1
   database  "Acts"       as db2
   note left of db1: Postgres
   note right of import: copies mysql\nto Postgres
   note right of prepare: normalizes and\nchecks data integrity

   dbsrc1  --> import
   dbsrc2  --> import
   dbsrc3  --> import
   import  --> db1
   db1     --> prepare
   prepare --> db2

The `import.py` script copies the mysql databases into temporary tables of the
Postgres database without doing any integrity checking.
The temporary tables are named :file:`original_*`.
These tables are very useful for finding and understanding data errors.

The `prepare.py` script reads the temporary tables in the Postgres database and
writes tables in a :ref:`database structure <cbgm-db>` suitable for doing the CBGM.
This structure is normalized and data integrity is enforced.
The script will print all data integrity errors found
and also log them in the file :file:`prepare.log`.

.. warning::

   The script will not complete if there are data integrity errors.

All data integrity errors that surface at this point must be fixed in the
source data with the aid of the NTVMR people.  Then the source data must be
imported again. This is an iterative and often very time-consuming process.
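
The import-and-prepare part of this cycle can be wrapped in a small shell
function. The following is only a sketch: the function name :code:`reimport`
and the :code:`PYTHON` override are hypothetical, the module invocations are
the ones used in the worked example further down this page, and it assumes
that both scripts exit with a non-zero status on failure, which this page does
not confirm.

.. code-block:: bash

   # Hypothetical helper: run the import, then prepare only if the import succeeded.
   # Assumption: both scripts exit non-zero on failure.
   reimport() {
       local conf="$1"                 # instance configuration file
       local py="${PYTHON:-python3}"   # interpreter, overridable for dry runs
       "$py" -m scripts.cceh.import  -vvv "$conf" &&
       "$py" -m scripts.cceh.prepare -vvv "$conf"
   }

   # reimport instance/mark_ph3.conf

Run it from the directory that contains the :file:`scripts` package, once the
corrected dumps have been loaded into mysql again.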


Applying the CBGM
=================

The `cbgm.py` script applies the CBGM.
It is run at the start of every new project phase
and must also be run immediately after the `prepare.py` script has completed.

.. pic:: uml
   :caption: Applying the CBGM

   skinparam backgroundColor transparent

   database  "Acts"    as db1
   component "cbgm.py" as cbgm
   database  "Acts"    as db2
   note right of cbgm: applies the\nCBGM method

   db1  --> cbgm
   cbgm --> db2


.. _cbgm-new-project:

Starting a New Project
======================

To start a new project:

- create a new Postgres database,
- create local copies of the mysql databases,
- add an instance to the server,
- prepare the new Postgres database,
- run the CBGM,
- restart the application server.


Worked Example
--------------

As an example we will create a new project: Mark Phase 3.

The name of the new Postgres database is: :code:`mark_ph3`.

We assume that we have obtained two mysql database dumps from the NTVMR people:
:file:`ECM_Mark_Ph3.dump.bz2` and :file:`Nestle29.dump.bz2`.

Log in to the server with :code:`ssh`.

.. note::

   You need to have permission to :code:`sudo -u postgres` and :code:`sudo -u ntg`.

First create a new Postgres database:

.. code-block:: bash

   sudo -u postgres ~ntg/prj/ntg/ntg/scripts/cceh/create_database.sh mark_ph3

Then import the database dumps into the local mysql databases:

.. code-block:: bash

   sudo -iu ntg

   mysql -e "CREATE DATABASE ECM_Mark_Ph3"
   mysql -e "CREATE DATABASE Nestle29"

   bzcat ECM_Mark_Ph3.dump.bz2 | mysql -D ECM_Mark_Ph3
   bzcat Nestle29.dump.bz2 | mysql -D Nestle29
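
Before loading a dump it can be worth checking that the archive survived the
transfer intact. A minimal sketch using :code:`bzip2 -t`, which tests the
integrity of a compressed file without unpacking it; the function name
:code:`check_dumps` is hypothetical:

.. code-block:: bash

   # Hypothetical helper: report every dump that fails the bzip2 integrity test.
   check_dumps() {
       local rc=0 f
       for f in "$@"; do
           if ! bzip2 -t -- "$f" 2>/dev/null; then
               echo "corrupt or unreadable: $f" >&2
               rc=1
           fi
       done
       return "$rc"
   }

   # check_dumps ECM_Mark_Ph3.dump.bz2 Nestle29.dump.bz2

The function returns a non-zero status if any archive fails the test, so it can
guard the :code:`bzcat` commands above.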

Then create a new server instance.
The fastest way is to just copy an old instance configuration file and edit it:

.. code-block:: bash

   cd ~/prj/ntg/ntg/instance
   cp mark_ph22.conf mark_ph3.conf
   emacs mark_ph3.conf

Change all relevant parts of the instance configuration file.
See: :ref:`api-server-config-files`.
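
Because the new file starts as a copy, it is easy to leave a reference to the
old project behind. A quick check, sketched here with the hypothetical function
name :code:`leftover_refs`, lists every line of the new file that still
mentions the old name:

.. code-block:: bash

   # Hypothetical helper: print lines in a config file that still contain the old name.
   # Exit status follows grep: 0 if leftovers were found, 1 if the file is clean.
   leftover_refs() {
       local old="$1" conf="$2"
       grep -nF -- "$old" "$conf"
   }

   # leftover_refs mark_ph22 mark_ph3.conf

Not every hit is necessarily wrong; the point is to review each one.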

Use the `import.py` and `prepare.py` scripts to import
the mysql databases into Postgres and prepare them for CBGM:

.. code-block:: bash

   cd ~/prj/ntg/ntg
   python3 -m scripts.cceh.import  -vvv instance/mark_ph3.conf
   python3 -m scripts.cceh.prepare -vvv instance/mark_ph3.conf

.. note::

   If you came here from :ref:`new-phase-update`, continue there now.

Then run the CBGM with the `cbgm.py` script:

.. code-block:: bash

   python3 -m scripts.cceh.cbgm -vvv instance/mark_ph3.conf

Last, restart the application server:

.. code-block:: bash

   sudo /bin/systemctl restart ntg

If the server doesn't start, check for configuration errors:

.. code-block:: bash

   sudo /bin/journalctl -u ntg


Add the new database to the file :file:`scripts/cceh/active_databases`.
This file controls :ref:`nightly and weekly backups<vm-backups>`.

.. code-block:: bash

   emacs scripts/cceh/active_databases
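
The edit can also be scripted. The sketch below assumes that
:file:`active_databases` lists one database name per line, which this page does
not state, so check the existing entries first. The function name
:code:`add_active_db` is hypothetical. It appends the name only if no identical
line exists yet:

.. code-block:: bash

   # Hypothetical helper. Assumption: the file holds one database name per line.
   add_active_db() {
       local db="$1" file="$2"
       grep -qxF -- "$db" "$file" 2>/dev/null || printf '%s\n' "$db" >> "$file"
   }

   # add_active_db mark_ph3 scripts/cceh/active_databases

Running it twice leaves the file unchanged the second time.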

If you are satisfied with the new project,
you may drop the mysql databases.
The application server uses the Postgres database only.

.. code-block:: bash

   mysql -e "DROP DATABASE ECM_Mark_Ph3"
   mysql -e "DROP DATABASE Nestle29"


Starting a New Phase
====================

A new phase of the project is entered after the editors have completed a pass
over the whole text.
All editorial decisions taken during this pass are used to recalculate
the CBGM for the next phase.

To start a new phase:

- copy the database into a new database,
- add an instance to the server, and
- run the CBGM on the new instance.


Worked Example
--------------

As an example let us create a new Mark Phase 3 from an existing Mark Phase 2.2.

Log in to the server with :code:`ssh`.

.. note::

   You need to have permission to :code:`sudo -u postgres` and :code:`sudo -u ntg`.

First stop the application server, make a copy of the :code:`mark_ph22`
database, and start the server again:

.. code-block:: bash

   sudo -u ntg sudo /bin/systemctl stop ntg
   sudo -u postgres psql -c "CREATE DATABASE mark_ph3 TEMPLATE mark_ph22 OWNER ntg"
   sudo -u ntg sudo /bin/systemctl start ntg
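
Postgres refuses to copy a template database while other sessions are connected
to it. If the :code:`CREATE DATABASE` command fails for that reason even though
the server is stopped, the standard :code:`pg_stat_activity` view shows who is
still connected. A minimal sketch: the function name :code:`sessions_sql` is
hypothetical, and the database name is interpolated without quoting, so pass
only trusted names.

.. code-block:: bash

   # Hypothetical helper: build a query listing the sessions connected to a database.
   sessions_sql() {
       printf "SELECT pid, usename, application_name FROM pg_stat_activity WHERE datname = '%s';\n" "$1"
   }

   # sudo -u postgres psql -c "$(sessions_sql mark_ph22)"
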

Then create a new server instance:

.. code-block:: bash

   sudo -iu ntg
   cd ~/prj/ntg/ntg/instance
   cp mark_ph22.conf mark_ph3.conf

Change all relevant parts of the instance configuration file.
See: :ref:`api-server-config-files`.

.. code-block:: bash

   emacs mark_ph3.conf

Put the old database in read-only mode by setting
:code:`WRITE_ACCESS="nobody"` in its configuration file:

.. code-block:: bash

   emacs mark_ph22.conf

Then run the CBGM on the *new* instance:

.. code-block:: bash

   cd ~/prj/ntg/ntg
   python3 -m scripts.cceh.cbgm -vvv instance/mark_ph3.conf

Last, restart the application server:

.. code-block:: bash

   sudo /bin/systemctl restart ntg

.. _new-phase-update:

Starting a New Phase With Apparatus Update
==========================================

Sometimes a new phase goes hand in hand with a change in the apparatus.

To update the apparatus while maintaining (most) editorial decisions:

- create a new database for the phase,
- add an instance to the server,
- prepare the new database with the new apparatus,
- save the editorial decisions from the old database,
- load the editorial decisions into the new database, and
- run the CBGM on the new instance.


Worked Example
--------------

As an example let us create a new Mark Phase 3 from an existing Mark Phase 2.2
using a new apparatus.

First follow the steps in :ref:`cbgm-new-project` above, until you reach the
CBGM step.

Put the old database in read-only mode by setting
:code:`WRITE_ACCESS="nobody"` in its configuration file, then restart the server:

.. code-block:: bash

   cd ~/prj/ntg/ntg/instance
   emacs mark_ph22.conf
   sudo /bin/systemctl restart ntg

Then use the `save_edits.py` script to save the editorial decisions
of the previous phase and the `load_edits.py` script to load them
into the new instance:

.. code-block:: bash

   cd ~/prj/ntg/ntg
   python3 -m scripts.cceh.save_edits -vvv -o saved_edits.xml instance/mark_ph22.conf
   python3 -m scripts.cceh.load_edits -vvv -i saved_edits.xml instance/mark_ph3.conf

The last command also prints a list of the passages in the old apparatus
that are missing or different in the new apparatus and stores that list
in the file :file:`load_edits.log`.

Then run the `cbgm.py` script on the *new* instance to apply the CBGM method:

.. code-block:: bash

   python3 -m scripts.cceh.cbgm -vvv instance/mark_ph3.conf

Restart the application server:

.. code-block:: bash

   sudo /bin/systemctl restart ntg

Add the new database to the file :file:`scripts/cceh/active_databases`.
This file controls :ref:`nightly and weekly backups<vm-backups>`.

.. code-block:: bash

   emacs scripts/cceh/active_databases

If you are satisfied with the new phase,
you may drop the mysql databases.
The application server uses the Postgres database only.

.. code-block:: bash

   mysql -e "DROP DATABASE ECM_Mark_Ph3"
   mysql -e "DROP DATABASE Nestle29"