Database Structure

CBGM Database

The CBGM database is built from the Source Databases by the prepare script. The API server uses this database.

Blue: initialized by the scripts.cceh.prepare script. These tables change only between phases. The Apparatus table is also updated by the scripts.cceh.cbgm script where manuscript ‘A’ is concerned.

Green: updated by the server. Backed up by scripts.cceh.save_edits. Restored by scripts.cceh.load_edits. The editors update these tables through a graph editor. These tables are journalled to eventually provide undo functionality.

Red: updated by the scripts.cceh.cbgm script.

_images/uml.svg

Overview of the CBGM database (some columns omitted)

Transaction-Time State Tables

A transaction-time state table keeps track of the table’s contents as it changes over time. This is the basis for our undo-functionality in the graphical stemma editors.

Our TTS tables are temporally partitioned (See [SNODGRASS2000] Section 9.4). We have one table that holds the current rows and another table that holds the rows that were valid at some time in the past.

We have triggers in place that hide the details of TTS table maintenance: operations on the current table automagically update the past table also.

Tables

This module contains the sqlalchemy classes that create the database structure we need for doing the CBGM and running the application server.

class Affinity(**kwargs)

A table that contains CBGM output related to each pair of manuscripts.

This table contains the actual results of applying the CBGM: the priority of the manuscripts. It has one row for each pair of manuscripts that have at least one passage in common and each range we are interested in.

Two sets of data are included, one for the recursive interpretation of the locstem data, and one for the backward-compatible non-recurisve interpretation (with ‘p_’ prefix).

common

No. of passages defined in both manuscripts.

equal

No. of passages that have the same reading in both manuscripts.

affinity

equal / common

older

No. of passages that have an older reading in ms1.

newer

No. of passages that have an newer reading in ms1.

unclear

No. of passages where it is unclear which reading is older.

class Apparatus(**kwargs)

A table that contains the positive apparatus.

cbgm

True if this entry is eligible for CBGM, eg. is by the orginal scribe and is 100% certain. There can be only one entry eligible for CBGM for every manuscript and passage.

labezsuf

Contains auxiliary information about the reading:

f

Fehler. The reading is considered a scribal error.

o

Orthographicum. The reading is considered an orthographical variant, eg. a variant place name.

certainty

Certainty of the reading for the purposes of the CBGM. Only readings with a certainty of 1.0 are used by the CBGM. A certainty of 1.0 is given if the reading unequivocally witnesses for one labez.

There can be only one reading with a certainty of 1.0, but multiple readings with certainty < 1.0, all of them summing to no more than 1.0.

lesart

The actual reading offered by the manuscript. A lacuna is stored as NULL. Omitted text is stored as the empty string.

This field is also set to NULL if the manuscript offers the same reading as recorded in the Readings table for the manuscript’s labez. (Saves space and can easily be reconstructed.)

As a rule of thumb: For ‘f’ and ‘o’ readings (errors and orthographic variants) the actual reading will be inserted.

origin

Used only for debugging. Shows where this entry in the positive apparatus came from.

ATT

Copied from negative apparatus in att table in fill_apparatus_table().

BYZ

Deduced mt in build_MT_text().

DEF

Default value from conversion to positive apparatus in fill_apparatus_table().

LAC

Unrolled lacuna from lac table in fill_apparatus_table().

LOC

Deduced from locstem table in build_A_text().

ZW

Uncertain reading from unrolling of ‘zw’ in unroll_zw().

class Att(**kwargs)

Input buffer table for the Nestle-Aland ECM_Acts*GVZ tables.

The Nestle-Aland database contains one table for each chapter (for historical reasons). As first step we copy those many tables into one big table, this one.

This table contains a negative apparatus of all manuscripts. For the CBGM the data in this table has to be normalised into our database structure and converted into a positive apparatus.

hsnr

Interne Handschriftnummer.

hs

Siglum der Handschrift. An das Siglum werden Suffixe angehängt, die die Hand und die Lesung bezeichnen. Im Laufe der Verarbeitung werden die Lesarten reduziert, bis nur eine Lesart pro Handschrift übrigbleibt. Parallel dazu werden die Suffixe von den Siglen entfernt.

begadr, endadr

Zusammengesetzt aus Buch, Kapitel, Vers, Wort. Es werden Wörter und Zwischenräume gezählt. Gerade Zahlen bezeichnen ein Wort, ungerade einen Zwischenraum.

labez

See the description of this field in table readings.

labezsuf

Contains auxiliary information about the reading:

f

Fehler. The reading is considered a scribal error.

o

Orthographicum. The reading is considered an orthographical variant, eg. a variant place name.

If the labez is ‘zw’ labezsuf contains a “/”-separated list of possible readings, eg. “a/b_o/c_f” means this reading may be ‘a’ or an orthographicum of ‘b’ or a scribal error of ‘c’.

suffix
\*

Erste, ursprüngliche Hand

C*

Von erster Hand korrigiert

C1

Erster Korrektor (Korrektoren der ersten Stunde)

C2

Zweiter Korrektor (Korrektoren aus späteren Jahrhunderten)

C

Korrektor (Korrektor aus ungewisser Epoche)

L1, L2

Unterschiedliche Lesungen in einem Lektionar. L2 ist für die CBGM nicht relevant.

T1, T2

Unterschiedliche Lesungen des Textes der ersten Hand. Die erste Hand hat diese Passagen mehrmals abgeschrieben, vielleicht aus unterschiedlicher Quelle. Bei fehlender Übereinstimmung muß ‘zw’ gesetzt werden.

A

Vom Schreiber selbst gekennzeichnete alternative Lesart. Für die CBGM nicht relevant.

K

Varianten im Kommentar einer Handschrift. Für die CBGM nicht relevant.

s, s1, s2

(supplement) Nachträgliche Ergänzung verlorener Stellen. Bei nur einer Ergänzung wird ‘s’ verwendet. Bei mehreren Ergänzungen werden ‘s1’, ‘s2’, etc. für jeweils einen Abschnitt verwendet. Ergänzungen können nicht die Authorität der jeweiligen Hs beanspruchen.

V

(vid, ut videtur) augenscheinlich. Unsichere aber höchst wahrscheinliche Lesung. Ist für die CBGM als sichere Lesart zu akzeptieren.

r

Regularisiert

fehler

Denotes an orthografical error in Catholic Letters. This became part of labezsuf in the other books.

base

Basistext. Nur relevant bei Fehlversen.

“‘base = b’ steht für eine alternative Subvariante (dem Textus receptus).” – prepare4cbgm_10.py

a

Urtext

b

Textus Receptus.

comp

“Eine variierte Stelle ist eine umfasste Stelle, wenn comp = ‘x’ ist.” – prepare4cbgm_10.py

x

Umfaßte Variante.

lekt

Lektionen in einem Lektionar.

class Books(**kwargs)

A table that lists all the books of the New Testament.

This table lists all the books of the NT and the book id given to them.

bk_id

The book id: 1 - 27 (Matthew - Revelation).

siglum

The book siglum, eg. ‘Mt’

book

The book name, eg. ‘Matthew’

passage

The book extent in passages.

class Cliques(**kwargs)

A table that contains the cliques at every passage

A clique is a set of strongly related manuscripts that offer the same reading. A reading may have been originated independently more than once, but in a clique a reading has been originated only once.

This is the current table of a transaction state table pair. Cliques_TTS is the table that contains the past rows. See transaction-time state tables.

clique

Name of the Clique. ‘1’, ‘2’ …

sys_period

The time period in which this row is valid. In this table all rows are still valid, so the end of the period is not set.

user_id_start

The id of the user making the change at the start of the validity period. See: User.

user_id_stop

The id of the user making the change at the end of the validity period. See: User.

class Cliques_TTS(**kwargs)

A table that contains the cliques at every passage

This is the past table of a transaction state table pair. Cliques is the table that contains the current rows. The structure of the two tables is the same. See transaction-time state tables.

class CreateFDW(name, pg_db, mysql_db)
class CreateFunction(name, params, returns, sql, **kw)
class CreateGeneric(create_cmd)
class CreateView(name, sql)
class DropFDW(name, pg_db, mysql_db)
class DropFunction(name, params)
class DropGeneric(drop_cmd)
class DropView(name)
class Import_Cliques(**kwargs)

A table to help importing of saved state.

This table is used only by the load_edits.py script.

class Import_LocStem(**kwargs)

A table to help importing of saved state.

This table is only used by the load_edits.py script.

class Import_MsCliques(**kwargs)

A table to help importing of saved state.

This table is used only by the load_edits.py script.

class Import_Notes(**kwargs)

A table to help importing of saved state.

This table is only used by the load_edits.py script.

class Lac(**kwargs)

Input buffer table for the Nestle-Aland ECM_Acts*GVZLac tables.

The Nestle-Aland database contains one table for each chapter (for historical reasons). As first step we copy those many tables into one big table, this one.

This table contains a list of all lacunae in all manuscripts. It records the start and end of each lacuna. A lacuna entry generally spans many passages in the Att table.

This table has the same structure as table Att. For the description of the columns see table Att.

class LocStem(**kwargs)

A table that contains the priority of the cliques at each passage

This table contains one DAG (directed acyclic graph) of cliques for each passage. The editors decide from which other clique(s) each clique is derived, or if it is original.

This is the current table of a transaction state table pair. LocStem_TTS is the table that contains the past rows. See transaction-time state tables.

labez, clique

The younger clique which was derived from the older clique.

source_labez, source_clique

The older clique that was the source of the younger clique, or ‘*’ if the reading is original, or ‘?’ if the source is unknown.

Note: These columns should have a foreign key constraint into the cliques table but do not, because postgres doesn’t support partial foreign keys and the cliques table does not contain the ‘*’ and ‘?’ pseudo-cliques.

sys_period

The time period in which this row is valid. In this table all rows are still valid, so the end of the period is not set.

user_id_start

The id of the user making the change at the start of the validity period. See: User.

user_id_stop

The id of the user making the change at the end of the validity period. See: User.

class LocStem_TTS(**kwargs)

A table that contains the priority of the cliques at each passage

This is the past table of a transaction state table pair. LocStem is the table that contains the current rows. The structure of the two tables is the same. See transaction-time state tables.

class Manuscripts(**kwargs)

A table that lists all the manuscripts.

This table lists all the manuscripts of New Testament that we have collated for the edition.

ms_id

The primary key. We use a surrogate integer key because we need to interface with numpy, which only allows for integer row and column indices. If we every lose this requirement, hsnr will become our primary key.

hsnr

The project-internal number of the manuscript.

This is a six-digit number. The first digit encodes the type of the manuscript: 1 = papyrus, 2 = uncial, 3 = minuscule, 4 = lectionary, 5 = patristic citations and versions. The next 4 digits are taken from the digits of the Gregory-Aland number, eg. P45 would yield 0045. The last digit encodes supplements: 0 = original ms., 1 = first supplement, 2 = second supplement.

N.B. Patristic citations and versions are not used in the CBGM, and thus purged from the database.

hs

The Gregory-Aland number of the manuscript. eg. ‘P45’, ‘03’, or ‘1739’.

class MsCliques(**kwargs)

A table that relates manuscripts and cliques.

This table records the editorial decisions regarding which manuscripts represent each clique. The editors decide which reading is derived from which other reading(s) at each passage.

This is the current table of a transaction state table pair. MsCliques_TTS is the table that contains the past rows. See transaction-time state tables.

labez, clique

The clique this manuscript represents.

sys_period

The time period in which this row is valid. In this table all rows are still valid, so the end of the period is not set.

user_id_start

The id of the user making the change at the start of the validity period. See: User.

user_id_stop

The id of the user making the change at the end of the validity period. See: User.

class MsCliques_TTS(**kwargs)

A table that relates manuscripts and cliques.

This is the past table of a transaction state table pair. MsCliques is the table that contains the current rows. The structure of the two tables is the same. See transaction-time state tables.

class Ms_Ranges(**kwargs)

A table that contains CBGM output related to each manuscript.

Here we hold values that are calculated by CBGM related to one manuscript.

length

Calculated: no. of defined passages in the manuscript inside this range.

class Nestle(**kwargs)

The Leitzeile. Usually from the Nestle-Aland

Dient der Darstellung der Leitzeile im Navigator.

begadr, endadr

Zusammengesetzt aus Buch, Kapitel, Vers, Wort. Es werden Wörter und Zwischenräume gezählt. Gerade Zahlen bezeichnen ein Wort, ungerade einen Zwischenraum. In dieser Tabelle sind nur Wörter enthalten, keine Zwischenräume. Jedes Wort hat einen eigenen Eintrag, d.h. für alle Einträge gilt: begadr = endadr.

lemma

Das Lemma-Wort in der Leitzeile.

class Notes(**kwargs)

A table that contains editorial notes attached to passages.

This is the current table of a transaction state table pair. Notes_TTS is the table that contains the past rows. See transaction-time state tables.

class Notes_TTS(**kwargs)

A table that contains editorial notes attached to passages.

This is the past table of a transaction state table pair. Notes is the table that contains the current rows. The structure of the two tables is the same. See transaction-time state tables.

class Passages(**kwargs)

A table that lists the variant passages.

This table lists all the passages we established during the collation of the manuscripts. Passages that are the same in all manuscripts (invariant) are purged because they are irrelevant to the CBGM.

pass_id

The primary key. We use a surrogate integer key because we need to interface with numpy, which only allows for integer row and column indices. If we every lose this requirement, passage will become our primary key (but join performance should also be considered).

passage

The extent of the passage.

The beginning and end of every passage is encoded in this way:

book id * 10,000,000 + chapter * 100,000 + verse * 1,000 + word * 2

Words are always even and the space between to words is always odd.

variant

True if this passage is a variant passage. (It has at least two different certain readings.)

spanning

True if this passage is spanning other passages.

spanned

True if this passage is spanned by other passages.

fehlvers

True if this passage is a later addition, eg. the pericope adulterae.

Only set if the passages is spanned.

class Ranges(**kwargs)

A table that contains the ranges of passages for the CBGM.

The CBGM is agnostic about the division in books and chapters of the NT. It can be run on any range of passages (in theory even on sets of non-contiguous passages, although not yet in this implementation).

This table contains all ranges we are interested in, that is, one range for each chapter of each book and also one range for each whole book. The ranges corresponding to chapters are named by the chapter number, ‘1’, ‘2’, … The whole book range is called ‘All’.

range

The name of the range, eg. ‘1’ for Chapter 1.

passage

The extent of the range.

class Readings(**kwargs)

A table that contains the different readings found at each passage.

First scribal errors are corrected and orthographical differences are normalized, then equal readings are grouped. Each group of readings is assigned an id, the ‘labez’.

labez

(LesArtBEZeichnung). Usually ‘a’ indicates the original reading, while ‘b’…’y’ indicate readings relegated to the apparatus, although this is not necessarily so. Exceptions are the Fehlverse and the cases where no original reading could be assessed.

Readings starting with ‘z’ have special meaning:

zu

Hier nicht zitierbar aufgrund einer übergreifenden Variante. Diese umfaßte Variante wurde schon in der umfassenden Variante verzeichnet. Entspricht in der ECM einem Pfeil nach oben. In der CBGM ist ‘zu’ wie ‘zz’ zu behandeln.

zv

There is an illegible addition in the manuscript(s) cited which makes it impossible to ascribe it to a known variant.

zw

What remains of the text of the manuscript(s) cited would allow reconstruction in agreement with two or more different variants. Entspricht in der ECM einem Doppelpfeil nach links-rechts.

The reading ‘zw’ is used only in the Att table. In this table each questionable reading will get its own entry with a certainty < 1.0.

zz

The reading is too lacunose to be identified.

Alle Verzeichnungen, die aus der Tabelle der Lacunae erzeugt wurden, erhalten labez = ‘zz’.

Ein Wort steht nicht in der systematischen Lückenliste wenn mindestens ein Buchstabe vorhanden ist. In diesem Fall steht es in der stellenbezogenen Lückenliste.

Caveat: die Lesart der Handschrift ‘A’ kann trotz negativem Apparat in der Tabelle Att in derselben Passage mehrmals vorkommen, weil an einigen Stellen im Nestle-Aland ein positiver Apparat benutzt wurde.

lesart

The normalized reading. Scribal errors are silently corrected and orthographic variants are normalized. Following abbreviations are used:

om

Missing text (omissio). The scribe did not write any text.

NULL

Missing substrate (lacuna). The manuscript is damaged or missing.

class Role(**kwargs)
class Roles_Users(**kwargs)
class User(**kwargs)

Source Databases

These are the databases output by the NTVMR. There is one instances of each table for every chapter, eg. in Acts there are 28 instances of each table. These tables are only used once for building the CBGM database. The database system is MySQL.

Tables

class Acts01GVZ(**kwargs)

This table contains a negative apparatus.

class Acts01GVZlac(**kwargs)

This table contains a list of lacunae.

class LocStemEdAct01(**kwargs)

A table that contains the priority of the cliques at each passage

This table contains the main output of the editors. The editors decide which reading is derived from which other reading(s) at each passage.

id

Primary key

varid

Same as labez.

varnew

This is the labez concatenated with the number of the split.

s1

Source of this reading.

s2

Optionally second source of reading.

begadr, endadr

The passage.

w

Flag “Western Text”. Not needed for the CBGM.

class RdgAct01(**kwargs)

This table contains readings.

class VarGenAttAct01(**kwargs)

This table relates splits to manuscripts.

class Nestle29(**kwargs)

The Leitzeile.

Literature

SNODGRASS2000

Snodgrass, R.T. Developing Time-Oriented Database Applications in SQL. 2000. Morgan Kaufmann Publishers, San Francisco.