Word Level Attributes

Introduced in USFM 3.0

USFM 3.0 provides a general syntax for adding named attributes to character markers. Attributes define additional properties of the marked content, and they provide a means of extending the meta information contained within a USFM text.

USFM formally provides descriptive attributes for a subset of character markers. Each marker in this set has a defined list of attributes that are relevant to the overall purpose of the marker.

General Syntax

Within a character marker span, the attribute list is separated from the text content by a vertical bar (|). Attributes are listed as name and value pairs using the syntax attribute="value", with multiple pairs separated by whitespace. The attribute name is a single ASCII string, and the value is wrapped in double quotes.

Example:

\w gracious|lemma="grace"\w*
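As an illustration of the general syntax, the Python sketch below takes the content of a single \w …\w* span, splits it at the first vertical bar, and collects the name="value" pairs. It is a minimal sketch, not a conforming USFM parser: the helper parse_w_span and both regular expressions are hypothetical, attribute names are assumed to consist of ASCII letters, digits, hyphens and underscores, values are assumed not to contain double quotes, and nested markers inside the span are not handled.

```python
import re

# Hypothetical patterns for this sketch, not taken from any USFM library.
SPAN = re.compile(r'\\w (.*?)\\w\*')                   # content of one \w ...\w* span
ATTR_PAIR = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')  # one name="value" pair

def parse_w_span(usfm):
    """Return (text, attributes) for the first \\w ...\\w* span in `usfm`."""
    content = SPAN.search(usfm).group(1)
    # Everything before the first vertical bar is the text content.
    text, _, attr_text = content.partition("|")
    # Pairs are separated by whitespace; findall yields (name, value) tuples.
    attributes = dict(ATTR_PAIR.findall(attr_text))
    return text, attributes

print(parse_w_span(r'\w gracious|lemma="grace"\w*'))
# ('gracious', {'lemma': 'grace'})
```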

Default Attribute

When a value is supplied in the attribute position without an explicit attribute name, it is assigned to the marker's default attribute. The USFM specification defines a single default attribute for such a marker, and defaults are provided only for markers that formally provide attributes in the current version of USFM.

Example:

\w gracious|grace\w*

Here, the default attribute for \w …\w* is defined as “lemma”, so the bare value grace is read as lemma="grace". This allows a commonly used attribute (the default) to be expressed with as little additional markup as possible within the text.
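The default-attribute rule can be sketched in Python as follows. The DEFAULT_ATTRIBUTES table and the resolve_attributes function are hypothetical; only the entry mapping \w to lemma is taken from the text above. The sketch also assumes that a bare default value never contains an equals sign, which is how it distinguishes a bare value from a name="value" pair.

```python
import re

# Hypothetical table: only the "w" -> "lemma" entry is stated in the text.
DEFAULT_ATTRIBUTES = {"w": "lemma"}
ATTR_PAIR = re.compile(r'([A-Za-z0-9_-]+)="([^"]*)"')

def resolve_attributes(marker, attr_text):
    """Map the text after '|' to a dict of attribute names and values."""
    attr_text = attr_text.strip()
    # Assumption: a bare value (no attribute name) contains no '=' character.
    if attr_text and "=" not in attr_text:
        default = DEFAULT_ATTRIBUTES.get(marker)
        if default is None:
            raise ValueError(f"marker {marker!r} has no default attribute")
        return {default: attr_text}
    return dict(ATTR_PAIR.findall(attr_text))

print(resolve_attributes("w", "grace"))          # {'lemma': 'grace'}
print(resolve_attributes("w", 'lemma="grace"'))  # {'lemma': 'grace'}
```

Both spellings resolve to the same attribute set, which is the point of the default: the short form is only a notational convenience.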

Multiple Attribute Values

In cases where more than one value should be provided for an attribute key, the author should provide a comma-separated list within the value string. Space characters immediately before or after each comma separator are ignored.

Example:

\w gracious|strong="H1234,G5485"\w*

See the attributes for \w …\w* for additional examples.
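A minimal Python sketch of how a consumer might split a multi-valued attribute such as strong="H1234,G5485". The function name split_values is hypothetical; the regular expression discards only the space characters directly adjacent to a comma, mirroring the rule stated above.

```python
import re

def split_values(value):
    """Split a comma-separated attribute value into individual values."""
    # Spaces immediately before or after a comma are treated as part of the separator.
    return re.split(r" *, *", value)

print(split_values("H1234,G5485"))    # ['H1234', 'G5485']
print(split_values("H1234 , G5485"))  # ['H1234', 'G5485']
```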

Multiple Attribute Parts

In cases where an attribute value is composed of multiple parts (e.g. a compound word or phrase), the author can optionally separate the parts using a colon (:) within the value string.

See the gloss attribute for \rb …\rb* for an example of the use of this syntax.
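A consumer that needs the individual parts of a value can split it on the colon. The function split_parts and the sample values below are hypothetical illustrations, not taken from the USFM reference. Because the separator is optional, a value without a colon is simply returned as a single part.

```python
def split_parts(value):
    """Split an attribute value into its colon-separated parts."""
    # The colon is optional, so a value without one yields a single part.
    return value.split(":")

print(split_parts("part1:part2"))  # ['part1', 'part2']
print(split_parts("single"))       # ['single']
```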

Backward Compatibility

Any pre-existing marker that formally provides attributes in USFM 3.0 (or newer) may continue to be used “un-decorated” (without attributes). For example, \w gracious\w* remains valid USFM content.
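Because the un-decorated form stays valid, a tool that must feed consumers without attribute support could drop attribute lists from \w spans. The Python sketch below does this with a single regular expression. strip_w_attributes is a hypothetical helper, and it assumes that the span contains no nested markers and that neither the text nor the attribute values contain backslashes.

```python
import re

# Group 1: "\w " plus the text before the bar; group 2: the closing "\w*".
# The "|" and the attribute list between them are discarded.
W_ATTRS = re.compile(r'(\\w [^|\\]*)\|[^\\]*(\\w\*)')

def strip_w_attributes(usfm):
    """Remove attribute lists from \\w spans, leaving un-decorated markup."""
    return W_ATTRS.sub(r'\1\2', usfm)

print(strip_w_attributes(r'\w gracious|lemma="grace"\w*'))  # \w gracious\w*
```

Spans that are already un-decorated contain no vertical bar, so the pattern does not match them and they pass through unchanged.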

User Defined Attributes

Using the general syntax, attributes beyond the set formally provided for the current version of USFM may be added to any character marker. Such attributes are not considered strictly USFM compliant, and there is no assurance that compliant software tools or processes will support them. Future versions of USFM may formally provide additional attributes.

User-defined attribute names must begin with the prefix x-.

Examples:

\w gracious|x-myattr="metadata"\w*
\w gracious|lemma="grace" x-myattr="metadata"\w*
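A validator might classify attribute names used on \w as in the Python sketch below. The FORMAL_W_ATTRIBUTES set is a hypothetical, incomplete stand-in containing only the two names used in this section's examples (lemma and strong); a real tool would take the full list from the USFM reference for each marker.

```python
# Hypothetical, incomplete set: only the \w attributes used in the examples
# in this section. The full list belongs to the USFM reference.
FORMAL_W_ATTRIBUTES = {"lemma", "strong"}

def classify_attribute(name):
    """Classify an attribute name used on \\w ...\\w*."""
    if name in FORMAL_W_ATTRIBUTES:
        return "formal"
    if name.startswith("x-"):
        return "user-defined"
    # Neither formally provided nor x- prefixed: not permitted.
    return "not allowed"

for name in ("lemma", "x-myattr", "myattr"):
    print(name, classify_attribute(name))
# lemma formal
# x-myattr user-defined
# myattr not allowed
```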

Character Markers Providing Attributes