Common Functions¶

Here are the functions used by the server and the commandline scripts.

ntg_common.cbgm_common¶

Common routines for the CBGM.

class CBGM_Params¶

Structure that holds intermediate results of the CBGM.

ancestor_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages that are older in ms1 than in ms2. This matrix is asymmetrical.

and_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages that are defined in both mss.

def_matrix = None¶: Boolean matrix (mss x passages) set if ms. is defined at passage.

eq_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages that are equal in both mss.

labez_matrix = None¶: Integer matrix (mss x passages) of labez. Each entry represents one reading: 0 = lacuna, 1 = ‘a’, 2 = ‘b’, … Used by the pre-coherence computations.

n_mss = 0¶: No. of manuscripts

n_passages = 0¶: No. of passages

n_ranges = 0¶: No. of ranges

parent_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages that are older in ms1 than in ms2, using only immediate descendence. This matrix is asymmetrical.

ranges = None¶: list of (named tuple Range)

unclear_ancestor_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages whose relationship is unclear in ms1 and ms2.

unclear_parent_matrix = None¶: Integer matrix (ranges x mss x mss) with counts of the passages whose relationship is unclear in ms1 and ms2, using only immediate descendence.

variant_matrix = None¶: Boolean (1 x passages) matrix of invariant passages. We will need this the day we decide not to eliminate all invariant readings from the database.

calculate_mss_similarity_postco(dba, parameters, val, do_checks=True)¶

Calculate post-coherence mss similarity

Genealogical coherence yields asymmetrical matrices. A naive implementation has to loop over all pairs of mss. in every range and at every passage, which costs O(n_mss² * n_ranges * n_passages).

The main idea in this function is to get the DAG (directed acyclic graph) into a representation that can be used by numpy. Numpy gives us a tremendous speed boost.

For every passage and every reading we build

a bitmask for the reading and
a bitmask for all prior readings of this reading.

Then for every passage and every manuscript we look up what the manuscript offers and store the relative bitmasks in 2 matrices.

For illustration we refer to this passage (Mc 10:10/16-22 pass_id == 3240):

In a first step every reading (labez and clique) gets assigned a bitmask:

labez | clique |                               mask
------+--------+-----------------------------------------------------------------
?     |        | 0000000000000000000000000000000000000000000000000000000000000001
a     | 1      | 0000000000000000000000000000000000000000000000000000000000000010
a     | 2      | 0000000000000000000000000000000000000000000000000000000000000100
b     | 1      | 0000000000000000000000000000000000000000000000000000000000001000
c     | 1      | 0000000000000000000000000000000000000000000000000000000000010000
c     | 2      | 0000000000000000000000000000000000000000000000000000000000100000
c     | 3      | 0000000000000000000000000000000000000000000000000000000001000000
d     | 1      | 0000000000000000000000000000000000000000000000000000000010000000
d     | 2      | 0000000000000000000000000000000000000000000000000000000100000000
e     | 1      | 0000000000000000000000000000000000000000000000000000001000000000
f     | 1      | 0000000000000000000000000000000000000000000000000000010000000000
f     | 2      | 0000000000000000000000000000000000000000000000000000100000000000
g     | 1      | 0000000000000000000000000000000000000000000000000001000000000000
h     | 1      | 0000000000000000000000000000000000000000000000000010000000000000
i     | 1      | 0000000000000000000000000000000000000000000000000100000000000000
j     | 1      | 0000000000000000000000000000000000000000000000001000000000000000
k     | 1      | 0000000000000000000000000000000000000000000000010000000000000000
l     | 1      | 0000000000000000000000000000000000000000000000100000000000000000
m     | 1      | 0000000000000000000000000000000000000000000001000000000000000000
n     | 1      | 0000000000000000000000000000000000000000000010000000000000000000
o     | 1      | 0000000000000000000000000000000000000000000100000000000000000000
p     | 1      | 0000000000000000000000000000000000000000001000000000000000000000
q     | 1      | 0000000000000000000000000000000000000000010000000000000000000000
r     | 1      | 0000000000000000000000000000000000000000100000000000000000000000
s     | 1      | 0000000000000000000000000000000000000001000000000000000000000000
t     | 1      | 0000000000000000000000000000000000000010000000000000000000000000
u     | 1      | 0000000000000000000000000000000000000100000000000000000000000000
v     | 1      | 0000000000000000000000000000000000001000000000000000000000000000
v     | 2      | 0000000000000000000000000000000000010000000000000000000000000000
v     | 3      | 0000000000000000000000000000000000100000000000000000000000000000
w     | 1      | 0000000000000000000000000000000001000000000000000000000000000000

Note that we have an extra bitmask for ‘?’. This allows quick testing for unknown origin.
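The first step can be sketched as follows. This is only an illustration: the `readings` list is a hypothetical subset of the table above, and plain Python integers stand in for whatever integer type the real code uses (the 64-character masks suggest an unsigned 64-bit integer, which would leave room for 63 readings plus ‘?’).

```python
# Hypothetical subset of the readings at one passage, as (labez, clique) pairs.
readings = [('a', '1'), ('a', '2'), ('b', '1'), ('c', '1')]

# Bit 0 is reserved for '?' (unknown origin).
masks = {('?', ''): 1 << 0}

# Every further reading gets the next free bit, in table order.
for bit, key in enumerate(readings, start=1):
    masks[key] = 1 << bit

# Print the masks in the same layout as the table above.
for (labez, clique), mask in masks.items():
    print('%-5s | %-6s | %s' % (labez, clique, format(mask, '064b')))
```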

In the second step we build the ancestor bitmasks.

Reading ‘f’ has prior readings ‘c’, ‘m’, and ‘a’. Thus the ancestor bitmask for reading ‘f’ is the bitwise_or of the masks for ‘c’, ‘m’, and ‘a’:

labez | clique |                               mask
------+--------+-----------------------------------------------------------------
c     | 1      | 0000000000000000000000000000000000000000000000000000000000010000
m     | 1      | 0000000000000000000000000000000000000000000001000000000000000000
a     | 1      | 0000000000000000000000000000000000000000000000000000000000000010

labez | clique |                            ancestor mask
------+--------+-----------------------------------------------------------------
f     | 1      | 0000000000000000000000000000000000000000000001000000000000010010

Another example: Reading ‘w’ has the prior reading ‘a2’, and ‘a2’ is of unknown origin. The ancestor mask for ‘w’ is therefore the bitwise_or of the masks for ‘a2’ and ‘?’:

labez | clique |                               mask
------+--------+-----------------------------------------------------------------
a     | 2      | 0000000000000000000000000000000000000000000000000000000000000100
?     |        | 0000000000000000000000000000000000000000000000000000000000000001

labez | clique |                            ancestor mask
------+--------+-----------------------------------------------------------------
w     | 1      | 0000000000000000000000000000000000000000000000000000000000000101
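The two examples above can be reproduced with a few lines of Python. In this sketch the bit positions are taken from the first table, while the dictionary names (`masks`, `prior`) and the short keys such as 'a1' are made up for illustration; how the real code obtains the list of prior readings is not shown here.

```python
from functools import reduce
from operator import or_

# Bit positions copied from the mask table (only the readings we need).
masks = {
    '?':  1 << 0,
    'a1': 1 << 1,
    'a2': 1 << 2,
    'c1': 1 << 4,
    'f1': 1 << 10,
    'm1': 1 << 18,
    'w1': 1 << 30,
}

# Prior readings as stated in the text (assumed to be complete already).
prior = {
    'f1': ['c1', 'm1', 'a1'],
    'w1': ['a2', '?'],
}

# The ancestor mask is the bitwise OR of the masks of all prior readings.
ancestor_masks = {
    reading: reduce(or_, (masks[p] for p in priors), 0)
    for reading, priors in prior.items()
}

for reading, mask in ancestor_masks.items():
    print(reading, format(mask, '064b'))
```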

After building the masks for every reading at every passage we put the masks into 2 matrices of dimension (mss x passages): the mask_matrix and the ancestor_matrix. The mask_matrix contains the mask for the reading the manuscript offers. The ancestor_matrix contains the ancestor mask for that reading.

Manuscript 1457 (ms_id == 156) (at pass_id == 3240) reads ‘c’, so the mask_matrix contains:

mask_matrix[156,3240] = b'0000000000000000000000000000000000000000000000000000000000010000'

Manuscript 706 (ms_id == 102) (at pass_id == 3240) reads ‘f’, so the ancestor_matrix contains:

ancestor_matrix[102,3240] = b'0000000000000000000000000000000000000000000001000000000000010010'

To test for ancestrality between mss. 1457 and 706 we do a bitwise_and of the mask_matrix of 1457 and the ancestor_matrix of 706. If the result is non-zero then the reading of 1457 is prior to the reading of 706 at this passage.

is_ancestral = np.bitwise_and (mask_matrix[156,3240], ancestor_matrix[102,3240]) > 0

But testing one passage at a time would be very slow. NumPy can operate on whole matrix rows at once, so we can calculate the ancestrality for all passages with a single call to NumPy.

is_ancestral = np.bitwise_and (mask_matrix[156], ancestor_matrix[102]) > 0

is_ancestral is an array of booleans. We only have to count how many elements of it are True to obtain the number of prior readings.

Reversing the role of the two manuscripts (mask_matrix and ancestor_matrix) gives us the number of posterior readings.
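A toy version of the row-wise test, with two manuscripts and five passages, might look like this. The masks, the matrices and the little stemma (‘a’ original, ‘b’ and ‘c’ both derived from ‘a’) are invented for the example and are not data from the database.

```python
import numpy as np

A, B, C = 0b0010, 0b0100, 0b1000  # hypothetical masks for readings 'a', 'b', 'c'

# Rows are manuscripts, columns are passages (2 mss x 5 passages).
mask_matrix = np.array([[A, B, A, C, A],
                        [C, B, A, A, B]], dtype=np.uint64)

# Ancestor masks of the readings above: 'a' has no ancestors, 'b' and 'c' have 'a'.
ancestor_matrix = np.array([[0, A, 0, A, 0],
                            [A, A, 0, 0, A]], dtype=np.uint64)

# Passages where ms 0 offers a reading prior to the reading of ms 1.
is_prior = np.bitwise_and(mask_matrix[0], ancestor_matrix[1]) > 0
n_prior = int(np.count_nonzero(is_prior))

# Roles reversed: passages where ms 0 offers a reading posterior to that of ms 1.
is_posterior = np.bitwise_and(mask_matrix[1], ancestor_matrix[0]) > 0
n_posterior = int(np.count_nonzero(is_posterior))

print(n_prior, n_posterior)
```

The two counts differ, which is why the resulting matrices are asymmetrical.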

calculate_mss_similarity_preco(_dba, _parameters, val)¶

Calculate pre-coherence mss similarity

The pre-coherence similarity is defined as:

\[\mbox{similarity}=\frac{\mbox{equal passages}}{\mbox{passages in common}}\]

Fill chapter by chapter, based on comparisons of individual variant spectra in ECM_Acts_Sp. Comparison of every pair of manuscripts: At how many places do both have text, and at how many places do they agree or differ (including the quotient)? The information is recorded at both chapter and book level.

—VGA/VG05_all3.pl
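The similarity formula above can be illustrated with a small labez matrix. The sketch below is an assumption about how the counts could be derived with NumPy; it uses invented data, ignores the ranges dimension (it treats the whole matrix as one range), and is not the function's actual code.

```python
import numpy as np

# Hypothetical labez matrix (3 mss x 4 passages): 0 = lacuna, 1 = 'a', 2 = 'b', ...
labez_matrix = np.array([[1, 2, 0, 1],
                         [1, 1, 3, 1],
                         [2, 2, 3, 0]], dtype=np.uint32)

# A ms. is defined at a passage if it does not have a lacuna there.
def_matrix = labez_matrix > 0

# Passages in common: count of passages defined in both mss. (mss x mss).
d = def_matrix.astype(np.int64)
and_matrix = d @ d.T

# Equal passages: same labez in both mss. and both defined.
n_mss = labez_matrix.shape[0]
eq_matrix = np.zeros((n_mss, n_mss), dtype=np.int64)
for j in range(n_mss):
    both = def_matrix & def_matrix[j]
    equal = (labez_matrix == labez_matrix[j]) & both
    eq_matrix[:, j] = equal.sum(axis=1)

# similarity = equal passages / passages in common (0 where nothing is in common).
similarity = np.divide(eq_matrix, and_matrix,
                       out=np.zeros(eq_matrix.shape), where=and_matrix > 0)
print(similarity)
```

Unlike the post-coherence matrices, these pre-coherence matrices are symmetrical, because neither the count of common passages nor the count of equal passages depends on the order of the two mss.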

count_by_range(a, range_starts, range_ends)¶

Count true bits in array ranges

Count the bits that are true in multiple ranges of the same array of booleans.

Parameters

a (np.Array of np.bool) – Input array
range_starts (int[]) – Starting offsets of the ranges to count.
range_ends (int[]) – Ending offsets of the ranges to count.
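One way to count several ranges of the same boolean array without looping over the elements is a cumulative sum. The function below is a sketch with a hypothetical name; it assumes that the ending offsets are exclusive, as in Python slices, which the description above does not state.

```python
import numpy as np

def count_by_range_sketch(a, range_starts, range_ends):
    """Count the True values in a[start:end] for every (start, end) pair."""
    # Prepend 0 so that cs[end] - cs[start] equals a[start:end].sum().
    cs = np.concatenate(([0], np.cumsum(a, dtype=np.int64)))
    return cs[np.asarray(range_ends)] - cs[np.asarray(range_starts)]

a = np.array([True, False, True, True, False, True])
counts = count_by_range_sketch(a, [0, 2, 0], [2, 6, 6])
print(counts)
```

The cumulative sum is computed once, so every additional range costs only two lookups and a subtraction.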

create_labez_matrix(dba, parameters, val)¶: Create the labez matrix.

write_affinity_table(dba, parameters, val)¶: Write back the new affinity (and ms_ranges) tables.

ntg_common.config¶

The commandline and configuration stuff.

class Formatter(fmt=None, datefmt=None, style='%')¶

Logging formatter. Allows colorful formatting of log lines.

format(record)¶

Format the specified record as text.

The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
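A colorful formatter can be built by overriding format() and wrapping the result in ANSI escape sequences. The class name and the choice of colors below are assumptions made for this sketch; the colors the real Formatter uses are not documented here.

```python
import logging

# Hypothetical color scheme: ANSI escape sequences per log level.
COLORS = {
    logging.ERROR:   '\x1b[31m',  # red
    logging.WARNING: '\x1b[33m',  # yellow
    logging.INFO:    '\x1b[32m',  # green
}
RESET = '\x1b[0m'

class ColorFormatterSketch(logging.Formatter):
    """Format the record as usual, then wrap the line in a color."""

    def format(self, record):
        text = super().format(record)
        color = COLORS.get(record.levelno)
        return color + text + RESET if color else text

formatter = ColorFormatterSketch('%(levelname)s %(message)s')
record = logging.LogRecord('demo', logging.ERROR, 'demo.py', 1,
                           'cannot open %s', ('file.conf',), None)
line = formatter.format(record)
print(line)
```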

args = <ntg_common.config.Args object>¶: Globally accessible arguments from command line.

config_from_pyfile(filename)¶

Emulate the Flask config file parser.

Emulate the Flask config file parser so we can use the same config files for the server and the commandline script.
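Flask config files are plain Python files in which only the uppercase names count as configuration values. A minimal emulation, under that assumption and with a hypothetical function name, could look like the following; the real function may differ in its details and return type.

```python
import os
import tempfile
import types

def config_from_pyfile_sketch(filename):
    """Execute a Python config file and return its uppercase names as a dict."""
    module = types.ModuleType('config')
    module.__file__ = filename
    with open(filename, 'rb') as fp:
        exec(compile(fp.read(), filename, 'exec'), module.__dict__)
    # Like Flask, keep only the names that are all uppercase.
    return {key: getattr(module, key) for key in dir(module) if key.isupper()}

# Usage with a throwaway config file.
with tempfile.NamedTemporaryFile('w', suffix='.conf', delete=False) as f:
    f.write("PGHOST = 'localhost'\nPGPORT = 5432\nlowercase = 'ignored'\n")
config = config_from_pyfile_sketch(f.name)
os.unlink(f.name)
print(config)
```

Because the file is executed, it should only ever be loaded from a trusted location.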

init_logging(args, *handlers)¶: Init the logging stuff.

ntg_common.db_tools¶

This module contains functions for database access.

class MySQLEngine(fn=('/etc/my.cnf', '/etc/mysql/my.cnf', '~/.my.cnf'), group=('mysql', 'client', 'client-server', 'client-mariadb'), db='')¶: Database Interface

class PostgreSQLEngine(**kwargs)¶

PostgreSQL Database Interface

get_connection_params(args={})¶

Get sqlalchemy connection parameters.

Try to get the connection parameters in turn from these sources:

get host, port, database, user from args
get PGHOST, PGPORT, PGDATABASE, PGUSER from args
get PGHOST, PGPORT, PGDATABASE, PGUSER from environment
use defaults

N.B. The postgres client library automatically reads the password from the file ~/.pgpass. It should be configured there.
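The lookup order listed above can be sketched as a chain of fallbacks. The function name, the `environ` parameter and the default values are hypothetical and serve only to make the example self-contained.

```python
import os

# Hypothetical defaults, used only when no other source has a value.
DEFAULTS = {'host': 'localhost', 'port': '5432', 'database': 'ntg', 'user': 'ntg'}
ENV_NAMES = {'host': 'PGHOST', 'port': 'PGPORT',
             'database': 'PGDATABASE', 'user': 'PGUSER'}

def get_connection_params_sketch(args=None, environ=None):
    """Resolve every parameter from args, then the environment, then defaults."""
    args = args or {}
    environ = os.environ if environ is None else environ
    params = {}
    for key, env_name in ENV_NAMES.items():
        # Sources in order of precedence: args[key], args[PG...], environ[PG...].
        for source, name in ((args, key), (args, env_name), (environ, env_name)):
            if source.get(name) is not None:
                params[key] = source[name]
                break
        else:
            params[key] = DEFAULTS[key]
    return params

params = get_connection_params_sketch(
    {'host': 'db1', 'PGPORT': '5433'},
    environ={'PGPORT': '9999', 'PGUSER': 'alice'},
)
print(params)
```

Note that no password appears anywhere: as stated above, it is left to the client library and ~/.pgpass.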

static receive_checkout(dbapi_connection, _connection_record, _connection_proxy)¶: Set a default for the postgres variable ntg.user_id that is used by the Transaction-Time State Tables.

vacuum()¶: Vacuum the database.

wait_for_server(retries=60)¶: Wait for the Postgres server to come up.

fix(conn, msg, check_sql, fix_sql, parameters)¶

Check for errors and fix them if necessary.

Executes the check_sql statement to check for a possible error condition and, if rows emerge, prints a warning and executes the fix_sql statement. The fix_sql statement should be written so as to fix the errors reported by the check_sql statement. Finally it executes the check_sql statement again and reports an error if the error condition still exists.

Parameters

msg (str) – The warning / error message
check_sql (str) – The sql statement that checks for the error condition.
fix_sql (str) – The sql statement that fixes the error condition.
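The check, fix, check again sequence can be tried out against an in-memory SQLite database. This sketch swaps in the standard-library sqlite3 module for the real database connection; the function name, the boolean return value, and the table are assumptions made for the example.

```python
import sqlite3

def fix_sketch(conn, msg, check_sql, fix_sql, parameters=()):
    """Run check_sql; if it returns rows, run fix_sql and check again."""
    rows = conn.execute(check_sql, parameters).fetchall()
    if not rows:
        return True  # nothing to fix
    print('WARNING: %s (%d rows)' % (msg, len(rows)))
    conn.execute(fix_sql, parameters)
    # Check again: the error condition should be gone now.
    rows = conn.execute(check_sql, parameters).fetchall()
    if rows:
        print('ERROR: %s (%d rows remain)' % (msg, len(rows)))
        return False
    return True

# Usage: remove readings with an empty labez from a toy table.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE readings (labez TEXT)')
conn.executemany('INSERT INTO readings VALUES (?)', [('a',), ('',), ('b',)])
ok = fix_sketch(conn, 'readings with empty labez',
                "SELECT * FROM readings WHERE labez = ''",
                "DELETE FROM readings WHERE labez = ''")
remaining = conn.execute('SELECT COUNT(*) FROM readings').fetchone()[0]
print(ok, remaining)
```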

init_default_cliques(conn)¶

Generate a default cliques table.

In a default cliques table there is a default clique ‘1’ for every reading in the readings table.

init_default_locstem(conn)¶

Generate a default locstem table.

In a default LocStemEd, labez ‘a’ is the original reading and every other reading depends on labez ‘a’, except in a Fehlvers, where ‘b’ is of unknown origin and every other reading depends on ‘b’.
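Read literally, the rule above amounts to the following mapping from each labez to its source. This is a sketch only: the function name is hypothetical, the use of ‘*’ to mark the original reading is an assumption, and the real function works on database tables rather than Python lists.

```python
def default_locstem_sketch(labez_list, is_fehlvers=False):
    """Return (labez, source_labez) pairs for one passage."""
    # In a Fehlvers the root is 'b' and its origin is unknown ('?');
    # otherwise the root is 'a' and it is the original reading ('*').
    root = 'b' if is_fehlvers else 'a'
    origin = '?' if is_fehlvers else '*'
    return [(labez, origin if labez == root else root) for labez in labez_list]

normal = default_locstem_sketch(['a', 'b', 'c'])
fehlvers = default_locstem_sketch(['a', 'b', 'c'], is_fehlvers=True)
print(normal)
print(fehlvers)
```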

init_default_ms_cliques(conn)¶

Generate a default ms_cliques table.

In a default ms_cliques table there is a default clique ‘1’ for every reading in the apparatus table.

local_stemma_to_nx(conn, pass_id, add_isolated_roots=False)¶

Load a passage from the database into an nx Graph.

Parameters: add_isolated_roots (bool) – Add an ‘*’ or ‘?’ node even if they are isolated. Needed in edit mode.

tabulate(res)¶

Format and output a rowset

Uses an output format similar to the one produced by the mysql commandline utility.
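The boxed layout of the mysql commandline utility is easy to imitate. The sketch below takes column names and rows directly instead of a result set and returns the text instead of printing it; both choices, and the function name, are assumptions made to keep the example self-contained.

```python
def tabulate_sketch(columns, rows):
    """Format rows as a text table with +---+ borders, mysql style."""
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    # Every column is as wide as its widest cell.
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    sep = '+' + '+'.join('-' * (w + 2) for w in widths) + '+'

    def fmt(row):
        return '| ' + ' | '.join(v.ljust(w) for v, w in zip(row, widths)) + ' |'

    lines = [sep, fmt(cells[0]), sep] + [fmt(r) for r in cells[1:]] + [sep]
    return '\n'.join(lines)

table = tabulate_sketch(('ms_id', 'hs'), [(102, '706'), (156, '1457')])
print(table)
```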

ntg_common.exceptions¶

exception EditError(message, status_code=None, payload=None)¶

exception EditException(message, status_code=None, payload=None)¶: See: http://flask.pocoo.org/docs/0.12/patterns/apierrors/
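The linked Flask pattern describes an exception that carries an HTTP status code and an optional payload and can be turned into a dictionary for a JSON response. A framework-free sketch of that pattern follows; the class name, the default status code of 400 and the to_dict() method are taken from the pattern, not from this module, and may not match the real class.

```python
class EditExceptionSketch(Exception):
    """An error that can be reported to an API client."""

    default_status_code = 400  # assumption: default taken from the Flask pattern

    def __init__(self, message, status_code=None, payload=None):
        super().__init__(message)
        self.message = message
        self.status_code = (self.default_status_code
                            if status_code is None else status_code)
        self.payload = payload

    def to_dict(self):
        # Merge the optional payload with the message for the response body.
        rv = dict(self.payload or ())
        rv['message'] = self.message
        return rv

try:
    raise EditExceptionSketch('No write access.', status_code=403,
                              payload={'pass_id': 3240})
except EditExceptionSketch as e:
    status, body = e.status_code, e.to_dict()
print(status, body)
```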

exception PrivilegeError(message, status_code=None, payload=None)¶

ntg_common.tools¶

This module contains some useful functions.

BOOKS = [(1, 'Mt', 'Matthew', 28), (2, 'Mc', 'Mark', 16), (3, 'L', 'Luke', 24), (4, 'J', 'John', 21), (5, 'Acts', 'Acts', 28), (6, 'R', 'Romans', 16), (7, '1K', '1 Corinthians', 16), (8, '2K', '2 Corinthians', 13), (9, 'G', 'Galatians', 6), (10, 'E', 'Ephesians', 6), (11, 'Ph', 'Philippians', 4), (12, 'Kol', 'Colossians', 4), (13, '1Th', '1 Thessalonians', 5), (14, '2Th', '2 Thessalonians', 3), (15, '1T', '1 Timothy', 6), (16, '2T', '2 Timothy', 4), (17, 'Tt', 'Titus', 3), (18, 'Phm', 'Philemon', 1), (19, 'H', 'Hebrews', 13), (20, 'Jc', 'James', 5), (21, '1P', '1 Peter', 5), (22, '2P', '2 Peter', 3), (23, '1J', '1 John', 5), (24, '2J', '2 John', 1), (25, '3J', '3 John', 1), (26, 'Jd', 'Jude', 1), (27, 'Ap', 'Revelation', 22), (210, '2Sam', '2 Samuel', 2)]¶: Titles of the NT books

BYZ_HSNR = {'2 Samuel': None, 'Acts': '(300010, 300180, 300350, 303300, 303980, 304240, 312410)', 'CL': '(300010, 300180, 300350, 303300, 303980, 304240, 312410)', 'John': '(200070, 200280, 200450, 300180, 300350, 302260, 313200)', 'Mark': '(300030, 300180, 300350, 301050, 302610, 303510, 326070)'}¶

Manuscripts attesting the Byzantine Text.

We use these manuscripts as templates to establish the Byzantine Text according to our rules.

FEHLVERSE = '\n (\n begadr >= 20716000 and endadr <= 20716999 or\n begadr >= 20944000 and endadr <= 20944999 or\n begadr >= 20946000 and endadr <= 20946999 or\n begadr >= 21126000 and endadr <= 21126999 or\n begadr >= 21528000 and endadr <= 21528999 or\n begadr >= 21609000 and endadr <= 21620999 or\n begadr >= 21608068 and endadr <= 21620999 or\n\n begadr >= 50837002 and endadr <= 50837047 or\n begadr >= 51534002 and endadr <= 51534013 or\n begadr >= 52406020 and endadr <= 52408015 or\n begadr >= 52829002 and endadr <= 52829025\n )\n '¶

Verses added in later times.

These verses were added to the NT in later times. Because they are not original they are not included in the text of manuscript ‘A’.

graphviz_layout(dot, format='dot')¶: Call the GraphViz dot program to generate an image but mostly to precompute the graph layout.

log(level, msg, *aargs, **_kwargs)¶: Low level log function