WIKINDX API trunk

CITELOC

Table of Contents

Functions

getType()  : array<string|int, string>
Return an array containing the most used grammar types of a locale
compileQuotationMarkers()  : array<string|int, mixed>
Return array of quotation markers for each locale.
compileHardPunctuation()  : array<string|int, mixed>
Return array of hard punctuation (END of sentence only) for each locale.
compileSoftPunctuation()  : array<string|int, mixed>
Return array of soft punctuation (within sentence) for each locale.
compileAllPunctuation()  : array<string|int, mixed>
Return array of all hard and soft punctuation.
compileAbbreviations()  : array<string|int, mixed>
Return array of abbreviations for each locale.
compileNumbers()  : array<string|int, mixed>
Return array of number abbreviations (ordinals and measurements) for each locale.
compileEllipses()  : array<string|int, mixed>
Return array of ellipses for each locale.
compilePossessives()  : array<string|int, string>
Return array of possessives for each locale.
compileEtAl()  : array<string|int, mixed>
Return array of et al. possibilities for each locale.
compileCreatorConjunctions()  : array<string|int, mixed>
Return array of creator conjunction possibilities for each locale.
compileWordSpace()  : array<string|int, mixed>
Return array of word space characters for each locale.
compileApostrophe()  : array<string|int, mixed>
Return array of apostrophes for each locale (e.g., in English, the "'" of "it's", short for "it is").
compileInitialPunctuation()  : array<string|int, mixed>
Return array of initial punctuation for each locale.
compileNumberConjunctions()  : array<string|int, mixed>
Return array of number conjunctions for each locale.
abbreviationsAppend()  : mixed
Append common (Latin) bibliographic abbreviations

Functions

getType()

Return an array containing the most used grammar types of a locale

getType([string $locale = "en" ][, string $type = "" ]) : array<string|int, string>

When a grammar type is not defined for a locale, the grammar type of the language is used and, as a last resort, the grammar type of 'en'.

'en' is not 'en_GB' but a generic/default form using typical English grammar types (e.g., " and " as quotes). It must exist in the array.
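The fallback order described above can be sketched as follows. This is only an illustration under stated assumptions: resolveLocaleEntry() and the sample $quotes table are hypothetical and are not part of CITELOC.

```php
<?php
// Hypothetical helper, NOT part of CITELOC. It illustrates only the documented
// fallback order: full locale, then bare language code, then 'en'.
function resolveLocaleEntry(array $table, string $locale = 'en'): array
{
    // Exact locale match, e.g. 'fr_CA'. An empty array counts as a match.
    if (array_key_exists($locale, $table)) {
        return $table[$locale];
    }
    // Language part of ll[_Script][_CC], e.g. 'fr' from 'fr_CA'.
    $language = explode('_', $locale)[0];
    if (array_key_exists($language, $table)) {
        return $table[$language];
    }
    // Last resort: the generic 'en' entry, which must exist.
    return $table['en'];
}

// Sample data invented for the example.
$quotes = [
    'en' => ['"', '"'],
    'fr' => ['«', '»'],
];

$frCa = resolveLocaleEntry($quotes, 'fr_CA'); // uses the 'fr' entry
$de   = resolveLocaleEntry($quotes, 'de_DE'); // uses the 'en' entry
```

Using array_key_exists() rather than empty() lets a deliberately empty locale entry win over the 'en' default.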

Parameters
$locale : string = "en"

Code of a locale (format: ll[_Script][_CC]). Optional; 'en' by default.

$type : string = ""

Grammar type ('' by default). One of: 'quotation', 'hardPunctuation', 'softPunctuation', 'allPunctuation', 'abbreviations', 'numbers', 'ellipses', 'possessives', 'etAl', 'creatorConjunctions', 'wordSpace', 'apostrophe', 'initialPunctuation', 'numberConjunctions'

Tags
see
https://github.com/unicode-org/cldr/tree/main/common/main

CLDR Unicode Database

see
https://en.wikipedia.org/wiki/Quotation_mark
see
https://op.europa.eu/en/web/eu-vocabularies/formex/physical-specifications/character-encoding/use-of-quotation-marks-in-the-different-languages
Return values
array<string|int, string>

[0 => "", 1 => ""] where element 0 is the opening quotation mark and element 1 is the closing quotation mark.

compileQuotationMarkers()

Return array of quotation markers for each locale.

compileQuotationMarkers() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

The first member of the array opens the quotation, the second member closes it.

Some languages have alternate quote marks (see 'da' below for an example). Ensure there is an even number of elements in the array.
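As a rough sketch of why the element count must be even, a flat list of markers can be split into opening/closing pairs. The helper quotePairs() and the sample markers are hypothetical and are not taken from the CITELOC source.

```php
<?php
// Hypothetical helper: splits a flat list of quote marks into [open, close] pairs.
function quotePairs(array $markers): array
{
    if (count($markers) % 2 !== 0) {
        throw new InvalidArgumentException('Quotation markers must come in open/close pairs.');
    }
    // An empty list (language without quotation markers) yields no pairs.
    return array_chunk($markers, 2);
}

// Sample data invented for the example: two alternate quoting styles.
$pairs = quotePairs(['»', '«', '„', '“']);
// $pairs[0] is the first style, $pairs[1] the alternate style.
```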

Tags
todo

Check locale validity with later versions of Intl

see
https://en.wikipedia.org/wiki/Quotation_mark
see
https://op.europa.eu/en/web/eu-vocabularies/formex/physical-specifications/character-encoding/use-of-quotation-marks-in-the-different-languages
Return values
array<string|int, mixed>

compileHardPunctuation()

Return array of hard punctuation (END of sentence only) for each locale.

compileHardPunctuation() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

Spaces around punctuation must be present if required. Space characters here are assumed to be the same as those entered for compileWordSpace() below.

Return values
array<string|int, mixed>

compileSoftPunctuation()

Return array of soft punctuation (within sentence) for each locale.

compileSoftPunctuation() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

Spaces around punctuation must be present if required. Space characters here are assumed to be the same as those entered for compileWordSpace() below.

Used for finding creator names in sentences—see compileCreatorConjunctions()

Return values
array<string|int, mixed>

compileAllPunctuation()

Return array of all hard and soft punctuation.

compileAllPunctuation() : array<string|int, mixed>

Used when compiling bibliographies and deciding whether to remove the title-subtitle conjunction (removed if title is closed with punctuation).
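A minimal sketch of that decision, assuming the conjunction is simply dropped in favour of a single space when the title already ends in punctuation. The function name, the space, and the sample punctuation list are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: joins title and subtitle, dropping the conjunction
// when the title already closes with one of the given punctuation marks.
function joinTitleAndSubtitle(string $title, string $subtitle, string $conjunction, array $allPunctuation): string
{
    foreach ($allPunctuation as $mark) {
        if ($mark !== '' && substr($title, -strlen($mark)) === $mark) {
            // Assumption: a plain space replaces the conjunction.
            return $title . ' ' . $subtitle;
        }
    }
    return $title . $conjunction . $subtitle;
}

// Sample punctuation invented for the example.
$marks = ['.', '?', '!', ',', ';', ':'];

$closed = joinTitleAndSubtitle('Who cares?', 'A study', ': ', $marks);
$open   = joinTitleAndSubtitle('Sound', 'A study', ': ', $marks);
```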

Return values
array<string|int, mixed>

compileAbbreviations()

Return array of abbreviations for each locale.

compileAbbreviations() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

This is not necessarily an exhaustive list for each language.

The point is to avoid a false end of sentence. Only abbreviations with a locale-specific, end-of-sentence character at the end should be listed (see compileHardPunctuation() for each locale). Do NOT add this final end-of-sentence character.

Abbreviations are case sensitive so give all possible forms.

Common (Latin) bibliographic abbreviations are appended to each locale.

Abbreviation arrays are used to distinguish false sentence endings (abbreviations) from real sentence endings.

Abbreviations such as U.S.A. etc. are dealt with in the code.
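The following sketch shows one way an abbreviation list could be used to spot a false sentence ending. It is an assumption-laden illustration: endsWithAbbreviation() and the sample list are hypothetical, and the real code may work differently.

```php
<?php
// Hypothetical helper: true when the hard punctuation closing $fragment
// belongs to a listed abbreviation rather than ending a sentence.
function endsWithAbbreviation(string $fragment, array $abbreviations, string $hardPunctuation = '.'): bool
{
    foreach ($abbreviations as $abbreviation) {
        // The list omits the final end-of-sentence character, so add it back here.
        $pattern = '/(^|\s)' . preg_quote($abbreviation . $hardPunctuation, '/') . '$/u';
        if (preg_match($pattern, $fragment) === 1) {
            return true;
        }
    }
    return false;
}

// Sample list invented for the example; note there is no trailing '.'.
$abbreviations = ['Dr', 'dr', 'e.g', 'vol'];

$afterAbbreviation = endsWithAbbreviation('See vol.', $abbreviations);
$afterWord         = endsWithAbbreviation('It is old.', $abbreviations);
$caseMismatch      = endsWithAbbreviation('See Vol.', $abbreviations);
```

Because the match is case sensitive, 'Vol' is not recognised unless it is listed as its own entry.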

Return values
array<string|int, mixed>

compileNumbers()

Return array of number abbreviations (ordinals and measurements) for each locale.

compileNumbers() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

In English, an ordinal would be '2nd.', '44th.', etc. The '.' is a false end-of-sentence character. You can also add abbreviations that follow a number with no space in between (e.g., units of weight, measurement, or time). These can also be listed in compileAbbreviations() above, but there the code inserts a space before the abbreviation. It is probably safest to duplicate each number abbreviation in the abbreviations list.

Anything listed here is assumed to have a cardinal/digit before it (checked in regexp with '\d?') and no intervening space.
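A hedged sketch of the digit-plus-abbreviation check follows. The helper name and sample list are hypothetical, and the sketch requires a digit ('\d') where the text above mentions '\d?', so it is a simplification rather than the actual regexp.

```php
<?php
// Hypothetical helper: true when $fragment ends in digit + listed abbreviation
// + hard punctuation, with no space between the digit and the abbreviation.
function endsWithNumberAbbreviation(string $fragment, array $numberAbbreviations, string $hardPunctuation = '.'): bool
{
    $alternatives = array_map(
        fn (string $a): string => preg_quote($a, '/'),
        $numberAbbreviations
    );
    // Assumption: a digit is mandatory here (the documented regexp uses '\d?').
    $pattern = '/\d(?:' . implode('|', $alternatives) . ')'
        . preg_quote($hardPunctuation, '/') . '$/u';
    return preg_match($pattern, $fragment) === 1;
}

// Sample list invented for the example; no trailing '.'.
$numbers = ['st', 'nd', 'rd', 'th', 'kg'];

$ordinal   = endsWithNumberAbbreviation('the 44th.', $numbers);
$unit      = endsWithNumberAbbreviation('weighs 2kg.', $numbers);
$plainWord = endsWithNumberAbbreviation('the path.', $numbers);
$spaced    = endsWithNumberAbbreviation('weighs 2 kg.', $numbers);
```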

The point is to avoid a false end of sentence. Only abbreviations with a locale-specific, end-of-sentence character at the end should be listed (see compileHardPunctuation() for each locale). Do NOT add this final end-of-sentence character.

Abbreviations are case sensitive so give all possible forms.

Tags
see
https://en.wikipedia.org/wiki/Imperial_units
see
https://en.wikipedia.org/wiki/United_States_customary_units
see
https://en.wikipedia.org/wiki/Metrication
see
https://en.wikipedia.org/wiki/International_System_of_Units
see
https://www.ramat.ca/

Ramat, A., & Benoit, A.-M. (2020). Le ramat de la typographie. 11th ed. Québec: Anne-Marie Benoit éditrice. (Original work published 2017).

Return values
array<string|int, mixed>

compileEllipses()

Return array of ellipses for each locale.

compileEllipses() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

Needs listing ONLY if the ellipses contain hard punctuation for the locale. NB: entries must be ordered by size, longest first, so that longer strings are the first matches tried in the regexp. This is particularly important when the entries use similar characters, as in English.
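The ordering requirement can be illustrated with a small sketch. ellipsisPattern() and the sample entries are hypothetical; the sketch sorts defensively instead of relying on the order of the input, which is an assumption about usage rather than a description of the real code.

```php
<?php
// Hypothetical helper: builds an alternation with the longest ellipsis first.
function ellipsisPattern(array $ellipses): string
{
    // Sort by byte length, longest first, so '....' is tried before '...'.
    usort($ellipses, fn (string $a, string $b): int => strlen($b) <=> strlen($a));
    $quoted = array_map(fn (string $e): string => preg_quote($e, '/'), $ellipses);
    return '/(?:' . implode('|', $quoted) . ')/u';
}

// Sample entries invented for the example, deliberately out of order.
$pattern = ellipsisPattern(['...', '....', '. . .']);

preg_match($pattern, 'Wait.... What?', $match);
// With the shorter '...' first, only three of the four dots would be consumed.
```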

Tags
see
https://en.wikipedia.org/wiki/Ellipsis
Return values
array<string|int, mixed>

compilePossessives()

Return array of possessives for each locale.

compilePossessives() : array<string|int, string>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

Defines how the possessive is formed when using the word processor. This applies only to a single creator's surname (or to a following value from the compileEtAl() array below) and is used to detect whether that name is in the same sentence as the citation (e.g., "Grimshaw's", "Jones'", "and colleagues'"). English has two forms (singular, and plural or words that end in 's').
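A minimal sketch of detecting a surname followed by a possessive, assuming the two English forms "'s" and "'". The helper nameWithPossessivePattern() is hypothetical and not part of CITELOC.

```php
<?php
// Hypothetical helper: pattern for a surname directly followed by a possessive.
function nameWithPossessivePattern(string $surname, array $possessives): string
{
    $quoted = array_map(fn (string $p): string => preg_quote($p, '/'), $possessives);
    // (?!\w) stops the possessive from being the start of a longer word.
    return '/\b' . preg_quote($surname, '/') . '(?:' . implode('|', $quoted) . ')(?!\w)/u';
}

// Assumed English forms, for illustration only.
$possessives = ["'s", "'"];

$singular = preg_match(nameWithPossessivePattern('Grimshaw', $possessives), "Grimshaw's study shows");
$plural   = preg_match(nameWithPossessivePattern('Jones', $possessives), "Jones' study shows");
$none     = preg_match(nameWithPossessivePattern('Grimshaw', $possessives), 'Grimshaw argues that');
```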

Tags
see
https://en.wikipedia.org/wiki/Possessive#From_nouns
see
https://wals.info/chapter/57
Return values
array<string|int, string>

compileEtAl()

Return array of et al. possibilities for each locale.

compileEtAl() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

This is used only for checking the occurrence of creator names in a sentence (e.g., 'Grimshaw et al.', 'Jones and colleagues'). It relates to repositioning of the citation after the creator's name. Each locale should list common abbreviations for multiple creator names.

Add spaces to ONLY the start of the phrase if required. Space characters here are assumed to be the same as those entered for compileWordSpace() below.

Return values
array<string|int, mixed>

compileCreatorConjunctions()

Return array of creator conjunction possibilities for each locale.

compileCreatorConjunctions() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array or else the 'en' array will be used by default.

This is used only for checking the occurrence of creator names in a sentence. An example would be 'Grimshaw, Grimshaw-Aagaard & Aulery claim that . . .' where '&' is the conjunction. Checking for the ',' is done through compileSoftPunctuation().

It relates to repositioning of the citation after the creator's name. Each locale should list common conjunctions between multiple creator names.

Add spaces to either or both sides of the phrase if required. Space characters here are assumed to be the same as those entered for compileWordSpace() below.
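To illustrate how conjunctions and soft punctuation might combine when looking for a list of creator names, here is a sketch under stated assumptions. creatorListPattern() and the sample arrays are hypothetical; spaces are part of each entry, as required above.

```php
<?php
// Hypothetical helper: pattern matching the surnames in order, each pair
// separated by either a creator conjunction or a soft punctuation mark.
function creatorListPattern(array $surnames, array $conjunctions, array $softPunctuation): string
{
    $separators = array_map(
        fn (string $s): string => preg_quote($s, '/'),
        array_merge($conjunctions, $softPunctuation)
    );
    $names = array_map(fn (string $n): string => preg_quote($n, '/'), $surnames);
    $separator = '(?:' . implode('|', $separators) . ')';
    return '/' . implode($separator, $names) . '/u';
}

// Sample values invented for the example; note the spaces inside each entry.
$pattern = creatorListPattern(
    ['Grimshaw', 'Grimshaw-Aagaard', 'Aulery'],
    [' & ', ' and '],
    [', ']
);

$found   = preg_match($pattern, 'Grimshaw, Grimshaw-Aagaard & Aulery claim that . . .');
$missing = preg_match($pattern, 'Grimshaw and Aulery claim that . . .');
```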

Return values
array<string|int, mixed>

compileWordSpace()

Return array of word space characters for each locale.

compileWordSpace() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array element thus [""] or else the 'en' array will be used by default.

This is used in regexps for identifying potential gaps between words that a language might have. As the regexps are UTF-8-safe, you can use not only multiple characters but also multibyte characters here.

Each array should have only ONE element.

Word space characters here are assumed to be the same as those entered in other functions here where it is indicated they might be required either side of the characters being entered.

Note that if your language uses the same word space character as the English 'en' locale, then you need not add your locale here as the 'en' character will be used.

Return values
array<string|int, mixed>

compileApostrophe()

Return array of apostrophes for each locale (e.g., in English, the "'" of "it's", short for "it is").

compileApostrophe() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array element thus [""] or else the 'en' array will be used by default.

This is used in regexps for avoiding misidentification of quotation marks. You should only add a locale and apostrophe forms here (if not using the default 'en' locale) if the locale's apostrophe is the same as one of the quotation marks for that locale—see compileQuotationMarkers().

The regexp assumes an apostrophe is preceded and followed by a word character.
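The word-character assumption can be sketched with lookarounds. maskApostrophes() and its placeholder are hypothetical and only show how an apostrophe inside a word can be told apart from an identical quotation mark.

```php
<?php
// Hypothetical helper: replaces apostrophes that sit between two word
// characters with a placeholder, leaving quotation marks untouched.
function maskApostrophes(string $text, string $apostrophe, string $placeholder = "\u{E000}"): string
{
    // Lookbehind and lookahead require a word character on both sides.
    $pattern = '/(?<=\w)' . preg_quote($apostrophe, '/') . '(?=\w)/u';
    return preg_replace($pattern, $placeholder, $text);
}

// The outer marks are quotation marks; only the one in "it's" is masked.
$masked = maskApostrophes("'it's fine'", "'", '#');
```

A trailing possessive such as "Jones'" is not covered by this assumption, because no word character follows the apostrophe.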

As the regexps are UTF-8-safe, you can use not only multiple characters but also multibyte characters here.

Note that if your language uses the same apostrophe character as the English 'en' locale, then you need not add your locale here as the 'en' character will be used.

If you put in an empty array for your locale, no conjunctions will be used between numbers.

Return values
array<string|int, mixed>

compileInitialPunctuation()

Return array of initial punctuation for each locale.

compileInitialPunctuation() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use it, put in an empty array element thus [""] or else the 'en' array will be used by default.

This is used in regexps for identifying initials when part of a name (e.g. M.N. Grimshaw-Aagaard). As the regexps are UTF-8-safe, you can use not only multiple characters but also multibyte characters here.

It is also used in citation and bibliographic formatting to replace the '.' character following creator initials if a creator initial option is chosen that uses '.' (default English) while the style localisation specifies something else.

If there is no equivalent in your locale for the English character, put in an empty array element.

The first element of the 'en' array represents the character(s) between initials. For example: M. N. Grimshaw-Aagaard where periods are extracted from the 'en' array. Spaces come from compileWordSpace() above.

Each array should have only ONE element.

There should be no whitespace characters either side—the regexp takes care of this along with compileWordSpace() above.
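As an illustration of combining the initial punctuation with the word space, the sketch below builds a pattern for initials followed by a surname. initialsPattern() is hypothetical, and the surname part of the pattern is a simplifying assumption.

```php
<?php
// Hypothetical helper: pattern for one or more initials followed by a surname.
function initialsPattern(string $initialPunctuation, string $wordSpace): string
{
    $mark  = preg_quote($initialPunctuation, '/');
    $space = preg_quote($wordSpace, '/');
    // Each initial is an uppercase letter plus the mark, optionally followed
    // by the word space; the surname is assumed to be letters and hyphens.
    return '/(?:\p{Lu}' . $mark . '(?:' . $space . ')?)+\p{Lu}[\p{L}\-]+/u';
}

// Assumed 'en' values: '.' after initials and a plain space between words.
$pattern = initialsPattern('.', ' ');

preg_match($pattern, 'by M.N. Grimshaw-Aagaard in', $compact);
preg_match($pattern, 'by M. N. Grimshaw-Aagaard in', $spaced);
$noInitials = preg_match($pattern, 'by Grimshaw-Aagaard in');
```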

Note that if your language uses the same initial character as the English 'en' locale, then you need not add your locale here as the 'en' character will be used.

Return values
array<string|int, mixed>

compileNumberConjunctions()

Return array of number conjunctions for each locale.

compileNumberConjunctions() : array<string|int, mixed>

'en' must be a key of the returned array.

If a language does not use such conjunctions, put in an empty array element thus [""] or else the 'en' array will be used by default.

When presenting, for example, running time (film, TV, etc.), numbers can be conjoined by characters. For example, running time might be given as 2'45", 2:45, and so on, but these can be replaced if appropriate characters are provided for your locale and the bibliographic style localisation is set to that locale.

It is important that there be parity between the lengths of the 'en' array and another locale and across conjunction positions in the array. If there is no equivalent in your locale for the English conjunction, put in an empty array element.

For example: 2 hours, 45 minutes where comma is extracted from the 'en' array. Spaces come from compileWordSpace() above. Note that the last element has the spaces around the English word.
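The need for parity between the 'en' array and a locale array can be sketched as a position-wise lookup. localiseConjunction() and both sample arrays are hypothetical ('xx' is not a real locale), and the actual contents and order of the 'en' array may differ.

```php
<?php
// Hypothetical helper: maps an 'en' conjunction to the locale's conjunction
// at the same array position.
function localiseConjunction(string $conjunction, array $en, array $locale): string
{
    if (count($en) !== count($locale)) {
        throw new LengthException("Locale array must be the same length as the 'en' array.");
    }
    $position = array_search($conjunction, $en, true);
    if ($position === false) {
        return $conjunction; // Unknown conjunction: leave unchanged.
    }
    // An empty element means the locale has no equivalent.
    return $locale[$position];
}

// Sample arrays invented for the example.
$en = [':', ', ', ' and '];
$xx = ['.', ', ', ''];

$timeSeparator = localiseConjunction(':', $en, $xx);
$noEquivalent  = localiseConjunction(' and ', $en, $xx);
```

If the two arrays differed in length or order, the lookup by position would return the wrong conjunction, which is why parity matters.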

Note that if your language uses the same characters as the English 'en' locale, then you need not add your locale here as the 'en' character set will be used.

Return values
array<string|int, mixed>

        