When texts are transcribed and added to the Institute of Australian Culture site, the aim is for the transcription to be accurate (or as accurate as possible, considering that some texts may be damaged or illegible).
Earlier articles on the IAC site did not usually include non-standard alphabet characters, but instead rendered them as standard characters, due to the inability of some early browsers to handle some non-standard characters. Later articles normally include those characters; however, the earlier articles will likely remain as is, unless there is a need or opportunity to revisit and reassess those articles.
Occasionally articles on the Trove website use non-standard alphabet characters; and, whilst the Trove site provides some non-standard characters (which can be inserted into corrections of its OCR text), it does not provide all such characters. Therefore, Trove users (or, Trovers) may find this list to be of some use.
It should be noted that on the IAC site the archaic “long s” character is transcribed as a standard “s”, so as to make it easier for the text to be read.
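For anyone transcribing with the help of a script, the long s convention described above could be applied as in the following minimal sketch. The function name `modernise_long_s` and the sample phrase are illustrative assumptions, not part of any IAC or Trove tool.

```python
def modernise_long_s(text):
    """Replace the archaic long s (U+017F) with a standard lowercase s."""
    return text.replace("\u017f", "s")


# Hypothetical sample phrase containing three long s characters.
print(modernise_long_s("Congre\u017fs a\u017f\u017fembled"))  # prints: Congress assembled
```

All other characters, including accented letters, are left untouched.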
This list should be of use to those who are correcting OCR texts on Trove, or who are transcribing texts from other sources. Links to the relevant Wikipedia or Wiktionary pages have been provided here; whilst Wikipedia and Wiktionary may not be highly regarded in academic circles, they do provide easy-to-read and useful starting points for various subjects.
The bulk of the characters are divided into four basic sections: 1) alphabet characters (non-standard), 2) currency symbols, 3) fractions, and 4) symbols (non-standard).
———♦——— = dividing line with a diamond (this type of dividing line, or dinkus, is quite common in historical newspapers in Trove)
1) alphabet characters (non-standard)
Á á = A/a with acute
Â â = A/a with circumflex
À à = A/a with grave
Ā ā = A/a with macron
Ä ä = A/a with umlaut
Æ æ = AE/ae ligature; Ash; Ashe (e.g. encyclopædia, mediæval)
Ç ç = C/c with cedilla
É é = E/e with acute (e.g. fiancée, née, resumé; the latter can also be spelt résumé)
ę = E/e with ogonek (see also: E/e with caudata; hooked e; looped e; tailed e)
Ê ê = E/e with circumflex
È è = E/e with grave (e.g. blessèd)
Ē ē = E/e with macron; see: Macron (diacritic)
Ë ë = E/e with umlaut
Í í = I/i with acute
Î î = I/i with circumflex
Ì ì = I/i with grave
Ï ï = I/i with umlaut (e.g. naïve)
Ô ô = O/o with circumflex
Õ õ = O/o with tilde
Ö ö = O/o with umlaut
Œ œ = OE/oe ligature (e.g. phœnix)
ſ = long s (an archaic “s” character)
ß = sharp S; Eszett; scharfes S (a German “S” character)
Ú ú = U/u with acute
Û û = U/u with circumflex (e.g. divûm, jeûne)
Ū ū = U/u with macron
Ü ü = U/u with umlaut
See also: The 26 letters of the modern English alphabet:
(These links include information on the accents and diacritics related to each letter.)
A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z.
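When an unfamiliar character turns up in a text, its code point and official Unicode name can be looked up with a few lines of Python. This is only a sketch using the standard `unicodedata` module; the helper name `describe` is an assumption made for illustration.

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A few characters from the list above, written as escapes to avoid ambiguity.
for ch in "\u00c1\u00e4\u0119\u017f\u00df":
    print(ch, describe(ch))
```

Note that Unicode's own names use "DIAERESIS" for the mark described above as an umlaut, so ä is reported as LATIN SMALL LETTER A WITH DIAERESIS.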
2) currency symbols
¢ = cent
$ = dollar
€ = euro (European Union)
ƒ = florin (Aruban florin; Netherlands Antillean guilder; Dutch guilder 1434 to 2002)
₣ = franc (French franc)
£ = pound (United Kingdom; British Empire)
₽ = ruble (Russia)
¥ = yen (Japan); yuan (China)
See also:
Currency symbol
Category:Currency symbols
Currency Symbols (Unicode block)
3) fractions
(At the time of writing, most of the Wikipedia fraction pages were basically redirection pages, with the exception of “one half”.)
½ = one half
⅓ = one third
⅔ = two thirds
¼ = one quarter
¾ = three quarters
⅕ = one fifth
⅖ = two fifths
⅗ = three fifths
⅘ = four fifths
⅙ = one sixth
⅚ = five sixths
⅐ = one seventh
⅛ = one eighth
⅜ = three eighths
⅝ = five eighths
⅞ = seven eighths
⅑ = one ninth
⅒ = one tenth
Fraction symbols not obtained: 2/7 to 6/7, 2/9 to 8/9, 3/10, 7/10, 9/10.
Fraction symbols not obtained (presumably not generally used, as they can be replaced by other, more common, fraction symbols): 2/4 (1/2), 2/6 (1/3), 3/6 (1/2), 4/6 (2/3), 2/8 (1/4), 4/8 (1/2), 6/8 (3/4), 2/10 (1/5), 4/10 (2/5), 5/10 (1/2), 6/10 (3/5), 8/10 (4/5).
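The point about replaceable fractions can be checked mechanically: reduce the fraction to its lowest terms, then see whether the reduced form has a single-character symbol. The sketch below assumes a hand-built table (`GLYPHS`) of the symbols listed in this section and a hypothetical helper named `fraction_glyph`.

```python
from fractions import Fraction

# Single-character fractions from this section, keyed by their value.
GLYPHS = {
    Fraction(1, 2): "½", Fraction(1, 3): "⅓", Fraction(2, 3): "⅔",
    Fraction(1, 4): "¼", Fraction(3, 4): "¾",
    Fraction(1, 5): "⅕", Fraction(2, 5): "⅖",
    Fraction(3, 5): "⅗", Fraction(4, 5): "⅘",
    Fraction(1, 6): "⅙", Fraction(5, 6): "⅚",
    Fraction(1, 7): "⅐",
    Fraction(1, 8): "⅛", Fraction(3, 8): "⅜",
    Fraction(5, 8): "⅝", Fraction(7, 8): "⅞",
    Fraction(1, 9): "⅑", Fraction(1, 10): "⅒",
}


def fraction_glyph(numerator, denominator):
    """Return the single-character form of a fraction, or None if none is listed.

    Fraction() reduces to lowest terms automatically, so 8/10 is looked up as 4/5.
    """
    return GLYPHS.get(Fraction(numerator, denominator))


print(fraction_glyph(8, 10))  # prints: ⅘
print(fraction_glyph(3, 10))  # prints: None
```

A result of `None` means the fraction would have to be typed out in full (e.g. 3/10).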
See also:
Category:Fractions (mathematics)
Fraction
Number Forms
↉ = zero-thirds (used in: baseball; Japan)
⅟ = fraction numerator one (not a complete fraction)
4) symbols (non-standard)
⁂ = asterism (typography)
● = black circle large; see: bullet (typography) (Unicode: U+25CF) [see also: bullet]
◆ = black diamond large; see: Geometric Shapes (Unicode block)
♦ = black diamond small
■ = black square; black block; black box; see: Geometric Shapes (Unicode block)
▼ = black triangle large downward-pointing; arrow down
► = black triangle large right-pointing; arrow right
◀ = black triangle small left-pointing; see: Geometric Shapes (Unicode block)
▶ = black triangle small right-pointing; see: Geometric Shapes (Unicode block)
• = bullet; bullet point; black dot; see: bullet (typography) (Unicode: U+2022)
℅ = care of (normal case: c/o)
© = copyright symbol; copyright sign
† = dagger (mark); obelisk; obelus (see also: double dagger)
° = degree symbol; degree sign
÷ = division sign
‡ = double dagger; diesis; see: dagger (mark)
— = em dash
– = en dash (traditionally, an “en dash” is half the width of an “em dash”; but, in actual practice, it can be of differing lengths)
« » = guillemets (sideways double chevrons, used as quotation marks)
∞ = infinity symbol; lazy eight; lemniscate
¡ = inverted exclamation mark
¿ = inverted question mark
◊ = lozenge (shape); diamond
✠ = Maltese cross
☞ = manicule; hand pointing [the link includes various other hand symbols]
№ = numero sign
¶ = pilcrow; paragraph mark; paragraph sign
‰ = per mille; per mil; per mill; permil; permill; permille
± = plus–minus sign
£ = pound sign
※ = reference mark; reference symbol
® = registered trademark symbol
℺ = rotated capital Q
§ = section sign; section mark
★ = star (glyph) [the link includes various star symbols]
™ = trademark symbol
See also:
Letterlike Symbols
Dingbat [the link includes various dingbats, including the following:]
❖ = black diamond minus white X
❦ = floral heart; hedera (ivy leaf); see: fleuron (typography) (see also: reversed rotated floral heart bullet; rotated floral heart bullet)
☙ = reversed rotated floral heart bullet; see: fleuron (typography)
❧ = rotated floral heart bullet; see: fleuron (typography)
✿ = flower
❤ = heavy black heart
✺ = sixteen pointed asterisk
❀ = white florette
▬▬▬▬▬▬▬▬▬▬ஜ۩۞۩ஜ▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Further reading
(Various pages on Wikipedia that may be of interest.)
Acute accent
ASCII [American Standard Code for Information Interchange]
Asterisk; little star; star
Bullet (typography)
Category:Currency symbols
Category:Latin-script ligatures
Category:Mathematical symbols
Category:Typographic ligatures
Category:Typographical symbols
Circumflex
Cyrillic script
Diacritic
Digraph (orthography)
Dingbat
Dinkus (three spaced asterisks in a horizontal row; a text divider) (“dinkus” can also refer to other types of dividing lines)
Diphthong
Ellipsis (three dots or periods in mid-sentence)
Esperanto orthography
Fleuron (typography); printers’ flower
Geometric Shapes (Unicode block)
ISO/IEC 646 [7-bit coded character set, from the International Organization for Standardization (ISO)]
Keyboard layout (see also: QWERTY)
Latin letters used in mathematics
Latin script in Unicode
Latin-script multigraph
Ligature (writing)
List of Latin-script letters
List of Latin-script pentagraphs
List of Latin-script tetragraphs
List of Latin-script trigraphs
List of mathematical symbols by subject
List of precomposed Latin characters in Unicode
List of typographical symbols and punctuation marks [* very useful]
Macron (diacritic)
Orthography
Punctuation
QWERTY (keyboard layout for Latin-script alphabets)
Signature mark
Staurogram
Typography
Unicode symbols
Uralic Phonetic Alphabet
Western Latin character sets (computing)
See also:
“Entity codes for HTML”, Pennsylvania State University
Fonts » Times New Roman, Regular » Glyphs [Graphemica site]
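For those who need the HTML entity code for a character, the following sketch uses Python's standard `html.entities` table. The helper name `to_entity` is an assumption for illustration; characters without a named entity in that table fall back to a numeric code.

```python
from html.entities import codepoint2name


def to_entity(ch):
    """Return a named HTML entity for ch if one exists, else a numeric one."""
    code_point = ord(ch)
    name = codepoint2name.get(code_point)
    return f"&{name};" if name else f"&#{code_point};"


# é and æ have named entities; the long s (U+017F) does not.
for ch in "\u00e9\u00e6\u017f":
    print(ch, to_entity(ch))
```

The table used here covers the HTML 4 named entities only, so the output may differ from longer entity lists published elsewhere.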
Notes:
1) — = em dash can be used to indicate that an author’s name comes after it (i.e. — can be used to precede an author’s name; e.g. “— John Citizen”)
2) ~ = tilde can be used to indicate that an author’s name comes after it (i.e. ~ can be used to precede an author’s name; e.g. “~ Jane Citizen”)
3) | = vertical bar (bar; pipe; vbar) is used in Trove to divide a line of OCR text (scanned text) which includes text from two or more columns.
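As a rough illustration of note 3, a line of OCR text containing vertical bars can be separated back into its column fragments. This is a minimal sketch; the function name `split_columns` and the sample line are hypothetical and do not come from Trove itself.

```python
def split_columns(ocr_line):
    """Split one line of OCR text at each vertical bar and trim the pieces."""
    return [part.strip() for part in ocr_line.split("|")]


# Hypothetical OCR line that runs across two newspaper columns.
sample = "the meeting was then adjourned. | WANTED, a good plain cook."
print(split_columns(sample))
```

Each item in the returned list is the text belonging to one column of that line.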