Common Functions¶
Here are the functions used by the server and the commandline scripts.
ntg_common.cbgm_common¶
Common routines for the CBGM.
- class CBGM_Params¶
- Structure that holds intermediate results of the CBGM.
 - ancestor_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages that are older in ms1 than in ms2. This matrix is asymmetrical. 
 - and_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages that are defined in both mss. 
 - def_matrix = None¶
- Boolean matrix (mss x passages) set if ms. is defined at passage. 
 - eq_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages that are equal in both mss. 
 - labez_matrix = None¶
- Integer matrix (mss x passages) of labez. Each entry represents one reading: 0 = lacuna, 1 = ‘a’, 2 = ‘b’, … Used by the pre-coherence computations. 
 - n_mss = 0¶
- No. of manuscripts 
 - n_passages = 0¶
- No. of passages 
 - n_ranges = 0¶
- No. of ranges 
 - parent_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages that are older in ms1 than in ms2, using only immediate descent. This matrix is asymmetrical. 
 - ranges = None¶
- List of ranges (each a named tuple Range) 
 - unclear_ancestor_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages whose relationship is unclear in ms1 and ms2. 
 - unclear_parent_matrix = None¶
- Integer matrix (ranges x mss x mss) with counts of the passages whose relationship is unclear in ms1 and ms2, using only immediate descent. 
 - variant_matrix = None¶
- Boolean (1 x passages) matrix of invariant passages. We will need this the day we decide not to eliminate all invariant readings from the database. 
 
- calculate_mss_similarity_postco(dba, parameters, val, do_checks=True)¶
- Calculate post-coherence mss similarity

  Genealogical coherence outputs asymmetrical matrices. Loop over all mss: O(n_mss² * n_ranges * n_passages).

  The main idea in this function is to get the DAG (directed acyclic graph) into a representation that can be used by numpy. Numpy gives us a tremendous speed boost.

  For every passage and every reading we build
  - a bitmask for the reading and
  - a bitmask for all prior readings of this reading.

  Then for every passage and every manuscript we look up what the manuscript offers and store the corresponding bitmasks in 2 matrices.

  For illustration we refer to this passage (Mc 10:10/16-22 pass_id == 3240):

  In a first step every reading (labez and clique) gets assigned a bitmask:

    labez | clique | mask
    ------+--------+-----------------------------------------------------------------
    ?     |        | 0000000000000000000000000000000000000000000000000000000000000001
    a     | 1      | 0000000000000000000000000000000000000000000000000000000000000010
    a     | 2      | 0000000000000000000000000000000000000000000000000000000000000100
    b     | 1      | 0000000000000000000000000000000000000000000000000000000000001000
    c     | 1      | 0000000000000000000000000000000000000000000000000000000000010000
    c     | 2      | 0000000000000000000000000000000000000000000000000000000000100000
    c     | 3      | 0000000000000000000000000000000000000000000000000000000001000000
    d     | 1      | 0000000000000000000000000000000000000000000000000000000010000000
    d     | 2      | 0000000000000000000000000000000000000000000000000000000100000000
    e     | 1      | 0000000000000000000000000000000000000000000000000000001000000000
    f     | 1      | 0000000000000000000000000000000000000000000000000000010000000000
    f     | 2      | 0000000000000000000000000000000000000000000000000000100000000000
    g     | 1      | 0000000000000000000000000000000000000000000000000001000000000000
    h     | 1      | 0000000000000000000000000000000000000000000000000010000000000000
    i     | 1      | 0000000000000000000000000000000000000000000000000100000000000000
    j     | 1      | 0000000000000000000000000000000000000000000000001000000000000000
    k     | 1      | 0000000000000000000000000000000000000000000000010000000000000000
    l     | 1      | 0000000000000000000000000000000000000000000000100000000000000000
    m     | 1      | 0000000000000000000000000000000000000000000001000000000000000000
    n     | 1      | 0000000000000000000000000000000000000000000010000000000000000000
    o     | 1      | 0000000000000000000000000000000000000000000100000000000000000000
    p     | 1      | 0000000000000000000000000000000000000000001000000000000000000000
    q     | 1      | 0000000000000000000000000000000000000000010000000000000000000000
    r     | 1      | 0000000000000000000000000000000000000000100000000000000000000000
    s     | 1      | 0000000000000000000000000000000000000001000000000000000000000000
    t     | 1      | 0000000000000000000000000000000000000010000000000000000000000000
    u     | 1      | 0000000000000000000000000000000000000100000000000000000000000000
    v     | 1      | 0000000000000000000000000000000000001000000000000000000000000000
    v     | 2      | 0000000000000000000000000000000000010000000000000000000000000000
    v     | 3      | 0000000000000000000000000000000000100000000000000000000000000000
    w     | 1      | 0000000000000000000000000000000001000000000000000000000000000000

  Note that we have an extra bitmask for ‘?’. This allows quick testing for unknown origin.

  In the second step we build the ancestor bitmasks.

  Reading ‘f’ has prior readings ‘c’, ‘m’, and ‘a’. Thus the ancestor bitmask for reading ‘f’ is the bitwise_or of the masks for ‘c’, ‘m’, and ‘a’:

    labez | clique | mask
    ------+--------+-----------------------------------------------------------------
    c     | 1      | 0000000000000000000000000000000000000000000000000000000000010000
    m     | 1      | 0000000000000000000000000000000000000000000001000000000000000000
    a     | 1      | 0000000000000000000000000000000000000000000000000000000000000010

    labez | clique | ancestor mask
    ------+--------+-----------------------------------------------------------------
    f     | 1      | 0000000000000000000000000000000000000000000001000000000000010010

  Another example: Reading ‘w’ has the prior reading ‘a2’, and ‘a2’ is of unknown origin. The ancestor mask for ‘w’ is the bitwise_or of the masks for ‘a2’ and ‘?’:

    labez | clique | mask
    ------+--------+-----------------------------------------------------------------
    a     | 2      | 0000000000000000000000000000000000000000000000000000000000000100
    ?     |        | 0000000000000000000000000000000000000000000000000000000000000001

    labez | clique | ancestor mask
    ------+--------+-----------------------------------------------------------------
    w     | 1      | 0000000000000000000000000000000000000000000000000000000000000101

  After building the masks for every reading at every passage we put the masks into 2 matrices of dimension (mss x passages), the mask_matrix and the ancestor_matrix. The mask_matrix contains the mask for the reading the manuscript offers, the ancestor_matrix contains the ancestor mask for that reading.

  Manuscript 1457 (ms_id == 156) (at pass_id == 3240) reads ‘c’, so the mask_matrix contains:

    mask_matrix[156,3240] = b'0000000000000000000000000000000000000000000000000000000000010000'

  Manuscript 706 (ms_id == 102) (at pass_id == 3240) reads ‘f’, so the ancestor_matrix contains:

    ancestor_matrix[102,3240] = b'0000000000000000000000000000000000000000000001000000000000010010'

  To test for ancestrality between mss. 1457 and 706 we do a bitwise_and of the mask_matrix of 1457 and the ancestor_matrix of 706. If the result is non-zero then 1457 is ancestral to 706.

    is_ancestral = np.bitwise_and (mask_matrix[156,3240], ancestor_matrix[102,3240]) > 0

  But testing one passage at a time would be very slow. Numpy can operate on whole matrix rows at a time, so we can calculate the ancestrality for all passages with a single call to numpy.

    is_ancestral = np.bitwise_and (mask_matrix[156], ancestor_matrix[102]) > 0

  is_ancestral is an array of booleans. We only have to count how many elements of it are True to obtain the number of prior readings.

  Reversing the role of the two manuscripts (mask_matrix and ancestor_matrix) gives us the number of posterior readings.
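The two steps described above (one bit per reading, then an OR over all prior readings) can be tried out on toy data. The following is only a minimal sketch: the readings, the local stemma in `PARENTS`, and the helper names `ancestor_mask` and `count_prior` are hypothetical and are not taken from the project code.

```python
import numpy as np

# Hypothetical toy data, not real project data.
# One bit per reading; bit 0 is reserved for '?' (unknown origin).
MASK = {
    "?": np.uint64(1 << 0),
    "a": np.uint64(1 << 1),
    "b": np.uint64(1 << 2),
    "c": np.uint64(1 << 3),
    "d": np.uint64(1 << 4),
}

# Assumed local stemma, used at every passage: a -> b -> c, and d of unknown origin.
PARENTS = {"?": [], "a": [], "b": ["a"], "c": ["b"], "d": ["?"]}


def ancestor_mask(labez):
    """OR together the masks of all readings prior to `labez`."""
    mask = np.uint64(0)
    stack = list(PARENTS[labez])
    while stack:
        parent = stack.pop()
        mask |= MASK[parent]
        stack.extend(PARENTS[parent])
    return mask


# What each manuscript reads at each passage (mss x passages).
readings = [
    ["a", "b"],  # ms 0
    ["b", "c"],  # ms 1
]

mask_matrix = np.array(
    [[MASK[r] for r in row] for row in readings], dtype=np.uint64
)
ancestor_matrix = np.array(
    [[ancestor_mask(r) for r in row] for row in readings], dtype=np.uint64
)


def count_prior(ms1, ms2):
    """Count the passages where the reading of ms1 is prior to that of ms2."""
    is_ancestral = np.bitwise_and(mask_matrix[ms1], ancestor_matrix[ms2]) > 0
    return int(np.count_nonzero(is_ancestral))


print(count_prior(0, 1), count_prior(1, 0))
```

Calling `count_prior` with the arguments swapped gives the posterior count, as described above.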
- calculate_mss_similarity_preco(_dba, _parameters, val)¶
- Calculate pre-coherence mss similarity

  The pre-coherence similarity is defined as:

  \[\mbox{similarity}=\frac{\mbox{equal passages}}{\mbox{passages in common}}\]

  Fill chapter by chapter, based on comparisons of individual variant spectra in ECM_Acts_Sp. Comparison of two manuscripts at a time: at how many passages do both have text, and at how many passages do they agree or differ (including the quotient)? The information is recorded at both chapter and book level.

  Source: VGA/VG05_all3.pl
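As an illustration of the quotient above, here is a minimal sketch that computes it for one pair of manuscripts from a small labez matrix (0 = lacuna). The matrix values and the function name `similarity` are hypothetical, and returning 0.0 when two manuscripts share no passage is an assumption of this sketch, not documented behaviour.

```python
import numpy as np

# Hypothetical labez matrix (mss x passages): 0 = lacuna, 1 = 'a', 2 = 'b', ...
labez_matrix = np.array(
    [
        [1, 2, 1, 0],  # ms 0
        [1, 1, 1, 2],  # ms 1
        [0, 2, 1, 2],  # ms 2
    ],
    dtype=np.uint32,
)

# True where a manuscript is defined (has text) at a passage.
def_matrix = labez_matrix > 0


def similarity(ms1, ms2):
    """Return equal passages / passages in common for two manuscripts."""
    common = def_matrix[ms1] & def_matrix[ms2]
    equal = common & (labez_matrix[ms1] == labez_matrix[ms2])
    n_common = int(np.count_nonzero(common))
    n_equal = int(np.count_nonzero(equal))
    # Assumption: report 0.0 if the manuscripts have no passage in common.
    return n_equal / n_common if n_common else 0.0


print(similarity(0, 1), similarity(0, 2))
```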
- count_by_range(a, range_starts, range_ends)¶
- Count true bits in array ranges

  Count the bits that are true in multiple ranges of the same array of booleans.

  Parameters
- a (np.ndarray of bool) – Input array 
- range_starts (int[]) – Starting offsets of the ranges to count. 
- range_ends (int[]) – Ending offsets of the ranges to count. 
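One common way to count true values in many ranges of the same array is a prefix sum, so that each range costs a single subtraction. The sketch below shows that technique; it is not necessarily how count_by_range is implemented, the name `count_by_range_sketch` is hypothetical, and it assumes that range ends are exclusive.

```python
import numpy as np


def count_by_range_sketch(a, range_starts, range_ends):
    """Count True values in a[start:end] for every (start, end) pair.

    Assumption: `range_ends` are exclusive offsets.
    """
    # prefix[i] holds the number of True values in a[:i].
    prefix = np.concatenate(([0], np.cumsum(a, dtype=np.int64)))
    return prefix[np.asarray(range_ends)] - prefix[np.asarray(range_starts)]


a = np.array([True, False, True, True, False, True])
counts = count_by_range_sketch(a, [0, 2, 0], [2, 6, 6])
print(counts.tolist())
```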
 
 
- create_labez_matrix(dba, parameters, val)¶
- Create the labez matrix.
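The encoding of the labez matrix (0 = lacuna, 1 = ‘a’, 2 = ‘b’, …) can be sketched as follows. The input rows and the helper `labez_to_int` are hypothetical; in particular, how the database marks a lacuna is not stated here, so this sketch simply treats a missing entry as 0.

```python
import numpy as np


def labez_to_int(labez):
    """Map 'a' -> 1, 'b' -> 2, ...; anything else is treated as a lacuna (0)."""
    if isinstance(labez, str) and len(labez) == 1 and "a" <= labez <= "z":
        return ord(labez) - ord("a") + 1
    return 0


# Hypothetical (ms index, passage index, labez) rows.
# Manuscript 1 has no entry at passage 1, so that cell stays 0.
rows = [(0, 0, "a"), (0, 1, "b"), (1, 0, "c")]

n_mss, n_passages = 2, 2
labez_matrix = np.zeros((n_mss, n_passages), dtype=np.uint32)
for ms, passage, labez in rows:
    labez_matrix[ms, passage] = labez_to_int(labez)

print(labez_matrix.tolist())
```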
- write_affinity_table(dba, parameters, val)¶
- Write back the new affinity (and ms_ranges) tables. 
ntg_common.config¶
The commandline and configuration stuff.
- class Formatter(fmt=None, datefmt=None, style='%')¶
- Logging formatter. Allows colorful formatting of log lines.
 - format(record)¶
- Format the specified record as text.

  The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime()), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message. 
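A colorizing formatter can be built by overriding format() on the standard logging.Formatter. The following is a minimal sketch of that idea, not the project's Formatter: the class name `ColorFormatter` and the choice of ANSI colors per level are assumptions.

```python
import logging

# Assumed color scheme: ANSI escape sequences keyed by log level.
COLORS = {
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[31;1m",  # bold red
}
RESET = "\x1b[0m"


class ColorFormatter(logging.Formatter):
    """Wrap the normally formatted line in a color for selected levels."""

    def format(self, record):
        text = super().format(record)
        color = COLORS.get(record.levelno)
        return f"{color}{text}{RESET}" if color else text


formatter = ColorFormatter("%(levelname)s %(message)s")
record = logging.LogRecord(
    "demo", logging.ERROR, "demo.py", 1, "boom %d", (1,), None
)
print(formatter.format(record))
```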
 
- args = <ntg_common.config.Args object>¶
- Globally accessible arguments from command line. 
- config_from_pyfile(filename)¶
- Emulate the Flask config file parser so that the server and the commandline scripts can use the same config files. 
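Flask loads a config file by executing it as Python and keeping only the upper-case names. A minimal sketch of that behaviour is shown below; the function names `config_from_source` and `config_from_file` are hypothetical, and the actual return type of config_from_pyfile may differ.

```python
import types


def config_from_source(source, filename="<config>"):
    """Execute Python config source and return its upper-case names as a dict."""
    module = types.ModuleType("config")
    module.__file__ = filename
    exec(compile(source, filename, "exec"), module.__dict__)
    return {key: value for key, value in vars(module).items() if key.isupper()}


def config_from_file(filename):
    """Read a config file from disk and parse it like config_from_source."""
    with open(filename, "rb") as fp:
        return config_from_source(fp.read(), filename)


# Hypothetical config content; lower-case names are ignored.
cfg = config_from_source("PGHOST = 'localhost'\nPGPORT = 5432\nsecret = 'x'\n")
print(cfg)
```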
- init_logging(args, *handlers)¶
- Init the logging stuff. 
ntg_common.db_tools¶
This module contains functions for database access.
- class MySQLEngine(fn=('/etc/my.cnf', '/etc/mysql/my.cnf', '~/.my.cnf'), group=('mysql', 'client', 'client-server', 'client-mariadb'), db='')¶
- Database Interface 
- class PostgreSQLEngine(**kwargs)¶
- PostgreSQL Database Interface
 - get_connection_params(args={})¶
- Get sqlalchemy connection parameters.

  Try to get the connection parameters in turn from these sources:
  - get host, port, database, user from args
  - get PGHOST, PGPORT, PGDATABASE, PGUSER from args
  - get PGHOST, PGPORT, PGDATABASE, PGUSER from environment
  - use defaults

  N.B. The postgres client library automatically reads the password from the file ~/.pgpass. It should be configured there.
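The lookup order listed above can be sketched as a simple fall-through per parameter. This is an illustration only: the function name `resolve_params`, the default values in `DEFAULTS`, and the treatment of empty strings as missing are assumptions, not the behaviour of get_connection_params.

```python
import os

# Hypothetical defaults; the real defaults are not documented here.
DEFAULTS = {"host": "localhost", "port": "5432", "database": "", "user": ""}
ENV_NAMES = {
    "host": "PGHOST",
    "port": "PGPORT",
    "database": "PGDATABASE",
    "user": "PGUSER",
}


def resolve_params(args, environ=None):
    """Resolve each parameter from args, then PG* in args, then PG* in environ."""
    environ = os.environ if environ is None else environ
    params = {}
    for key, env_name in ENV_NAMES.items():
        candidates = ((args, key), (args, env_name), (environ, env_name))
        for source, name in candidates:
            if source.get(name):
                params[key] = source[name]
                break
        else:
            # No source provided a value, so fall back to the default.
            params[key] = DEFAULTS[key]
    return params


params = resolve_params(
    {"host": "db1", "PGPORT": "5433"},
    environ={"PGPORT": "9999", "PGUSER": "ntg"},
)
print(params)
```

No password appears in the sketch because, as noted above, the client library reads it from ~/.pgpass.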
 - static receive_checkout(dbapi_connection, _connection_record, _connection_proxy)¶
- Set a default for the postgres variable ntg.user_id that is used by Transaction-Time State Tables. 
 - vacuum()¶
- Vacuum the database. 
 - wait_for_server(retries=60)¶
- Wait for the Postgres server to come up. 
 
- fix(conn, msg, check_sql, fix_sql, parameters)¶
- Check for errors and fix them if necessary.

  Executes the check_sql statement to check for a possible error condition and, if rows emerge, prints a warning and executes the fix_sql statement. The fix_sql statement should be written so as to fix the errors reported by the check_sql statement. Finally it executes the check_sql statement again and reports an error if the error condition still exists.

  Parameters
- msg (str) – The warning / error message 
- check_sql (str) – The sql statement that checks for the error condition. 
- fix_sql (str) – The sql statement that fixes the error condition. 
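The check, fix, re-check sequence described above can be sketched with an in-memory SQLite database standing in for the real connection. Everything in this example is an assumption made for illustration: the function name `check_and_fix`, its boolean return value, the omitted parameters argument, and the table contents.

```python
import sqlite3


def check_and_fix(conn, msg, check_sql, fix_sql):
    """Run check_sql; if it returns rows, run fix_sql and check again."""
    rows = conn.execute(check_sql).fetchall()
    if not rows:
        return True
    print(f"WARNING: {msg} ({len(rows)} rows)")
    conn.execute(fix_sql)
    rows = conn.execute(check_sql).fetchall()
    if rows:
        print(f"ERROR: {msg} ({len(rows)} rows remain)")
        return False
    return True


# Hypothetical table with one broken row (missing labez).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (pass_id INTEGER, labez TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [(1, "a"), (1, None), (2, "b")]
)

ok = check_and_fix(
    conn,
    "readings without labez",
    "SELECT * FROM readings WHERE labez IS NULL",
    "DELETE FROM readings WHERE labez IS NULL",
)
remaining = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(ok, remaining)
```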
 
 
- init_default_cliques(conn)¶
- Generate a default cliques table.

  In a default cliques table there is a default clique ‘1’ for every reading in the readings table. 
- init_default_locstem(conn)¶
- Generate a default locstem table.

  In a default LocStemEd, labez ‘a’ is the original reading and every other reading depends on labez ‘a’, except in a Fehlvers, where ‘b’ is of unknown origin and every other reading depends on ‘b’. 
- init_default_ms_cliques(conn)¶
- Generate a default ms_cliques table.

  In a default ms_cliques table there is a default clique ‘1’ for every reading in the apparatus table. 
- local_stemma_to_nx(conn, pass_id, add_isolated_roots=False)¶
- Load a passage from the database into an nx Graph. - Parameters
- add_isolated_roots (bool) – Add an ‘*’ or ‘?’ node even if they are isolated. Needed in edit mode. 
 
- tabulate(res)¶
- Format and output a rowset.

  Uses an output format similar to the one produced by the mysql commandline utility. 
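For readers unfamiliar with the mysql client's table layout, the sketch below produces that kind of boxed output. It is not the project's tabulate: the function name `format_rowset` is hypothetical, and it takes plain column names and rows rather than a database result object.

```python
def format_rowset(columns, rows):
    """Format column names and rows as a boxed table, mysql-client style."""
    cells = [[str(c) for c in columns]]
    cells += [[str(v) for v in row] for row in rows]
    widths = [max(len(r[i]) for r in cells) for i in range(len(columns))]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        padded = (value.ljust(width) for value, width in zip(row, widths))
        return "| " + " | ".join(padded) + " |"

    out = [rule, line(cells[0]), rule]
    out += [line(row) for row in cells[1:]]
    out.append(rule)
    return "\n".join(out)


table = format_rowset(["ms_id", "hs"], [(156, "1457"), (102, "706")])
print(table)
```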
ntg_common.exceptions¶
- exception EditError(message, status_code=None, payload=None)¶
- exception EditException(message, status_code=None, payload=None)¶
- exception PrivilegeError(message, status_code=None, payload=None)¶
ntg_common.tools¶
This module contains some useful functions.
- BOOKS = [(1, 'Mt', 'Matthew', 28), (2, 'Mc', 'Mark', 16), (3, 'L', 'Luke', 24), (4, 'J', 'John', 21), (5, 'Acts', 'Acts', 28), (6, 'R', 'Romans', 16), (7, '1K', '1 Corinthians', 16), (8, '2K', '2 Corinthians', 13), (9, 'G', 'Galatians', 6), (10, 'E', 'Ephesians', 6), (11, 'Ph', 'Philippians', 4), (12, 'Kol', 'Colossians', 4), (13, '1Th', '1 Thessalonians', 5), (14, '2Th', '2 Thessalonians', 3), (15, '1T', '1 Timothy', 6), (16, '2T', '2 Timothy', 4), (17, 'Tt', 'Titus', 3), (18, 'Phm', 'Philemon', 1), (19, 'H', 'Hebrews', 13), (20, 'Jc', 'James', 5), (21, '1P', '1 Peter', 5), (22, '2P', '2 Peter', 3), (23, '1J', '1 John', 5), (24, '2J', '2 John', 1), (25, '3J', '3 John', 1), (26, 'Jd', 'Jude', 1), (27, 'Ap', 'Revelation', 22), (210, '2Sam', '2 Samuel', 2)]¶
- Titles of the NT books 
- BYZ_HSNR = {'2 Samuel': None, 'Acts': '(300010, 300180, 300350, 303300, 303980, 304240, 312410)', 'CL': '(300010, 300180, 300350, 303300, 303980, 304240, 312410)', 'John': '(200070, 200280, 200450, 300180, 300350, 302260, 313200)', 'Mark': '(300030, 300180, 300350, 301050, 302610, 303510, 326070)'}¶
- Manuscripts attesting the Byzantine Text.

  We use these manuscripts as templates to establish the Byzantine Text according to our rules. 
- FEHLVERSE = '\n (\n begadr >= 20716000 and endadr <= 20716999 or\n begadr >= 20944000 and endadr <= 20944999 or\n begadr >= 20946000 and endadr <= 20946999 or\n begadr >= 21126000 and endadr <= 21126999 or\n begadr >= 21528000 and endadr <= 21528999 or\n begadr >= 21609000 and endadr <= 21620999 or\n begadr >= 21608068 and endadr <= 21620999 or\n\n begadr >= 50837002 and endadr <= 50837047 or\n begadr >= 51534002 and endadr <= 51534013 or\n begadr >= 52406020 and endadr <= 52408015 or\n begadr >= 52829002 and endadr <= 52829025\n )\n '¶
- Verses added in later times.

  These verses were added to the NT in later times. Because they are not original they are not included in the text of manuscript ‘A’. 
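The begadr and endadr values in FEHLVERSE appear to pack book, chapter, verse, and word into a single integer; for example, 20716000 would read as book 2 (Mark), chapter 7, verse 16, word 0. That layout is inferred from the constants and is not confirmed by this documentation, so the following decoder is only a sketch under that assumption, and `decode_address` is a hypothetical name.

```python
def decode_address(adr):
    """Split an address into (book, chapter, verse, word).

    Assumed layout: book * 10_000_000 + chapter * 100_000 + verse * 1_000 + word.
    """
    book, rest = divmod(adr, 10_000_000)
    chapter, rest = divmod(rest, 100_000)
    verse, word = divmod(rest, 1_000)
    return book, chapter, verse, word


print(decode_address(20716000))
print(decode_address(52408015))
```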
- graphviz_layout(dot, format='dot')¶
- Call the GraphViz dot program to generate an image but mostly to precompute the graph layout. 
- log(level, msg, *aargs, **_kwargs)¶
- Low level log function