ArabTEX

a System for Typesetting Arabic

User Manual Version 3.00 [1] [2]

Klaus Lagally

November 22, 1993

[1] Report Nr. 1993/11, Universität Stuttgart, Fakultät Informatik, 
Breitwiesenstraße 20-22, 70565 Stuttgart, Germany
[2] This report supersedes Report Nr. 1992/06.


Overview

ArabTEX is a package extending the capabilities of TEX/LaTEX to generate 
the Arabic writing from an ASCII transliteration for texts in several 
languages using the Arabic script. It consists of a TEX macro package 
and an Arabic font in several sizes, presently only available in the 
Naskhi style. ArabTEX will run with Plain TEX and also with LaTEX. It is 
compatible with NFSS, NFSS2 and the EDMAC package; other additions to 
TEX have not been tried.

ArabTEX is primarily intended for generating the Arabic writing, but the 
standard scientific transliteration can also be easily produced. For 
languages other than Arabic that are customarily written in the Arabic 
script some limited support is available.

ArabTEX defines its own input notation, which is both machine- and 
human-readable and suited for electronic transmission and email 
communication. However, texts in some of the standard Arabic encodings 
can also be processed.

ArabTEX is copyrighted, but free use for scientific, experimental and 
other strictly private, noncommercial purposes is granted. Offprints of 
publications using ArabTEX are welcome. Using ArabTEX otherwise requires 
a license agreement. There is no warranty of any kind, either expressed 
or implied. The entire risk as to the quality and performance rests with 
the user.

Please send error reports, suggestions and inquiries to the author:

Prof. Klaus Lagally
Institut für Informatik
Universität Stuttgart
Breitwiesenstraße 20-22
70565 Stuttgart
GERMANY

lagally@informatik.uni-stuttgart.de

Copyright © 1992, 1993, Klaus Lagally


Contents

1 Activating ArabTEX

2 Input to ArabTEX
2.1 Arabic text elements
2.2 Commands in an Arabic context

3 Language selection

4 Font selection

5 Input coding conventions
5.1 Standard Arabic and Persian characters
5.2 Quoting
5.3 Ligatures
5.4 Vowelization
5.5 Verbatim input
5.6 Alternate input codings

6 Transliteration
6.1 ZDMG transliteration style
6.2 Encyclopedia of Islam style

7 Support for other languages besides Arabic
7.1 Persian (Farsi, Dari), also Ottoman, Kurdish
7.2 Urdu
7.3 Pashto (Afghanic)
7.4 Maghribi
7.5 Other languages

8 Miscellaneous features
8.1 Automatic stretching
8.2 Dots on yā'
8.3 Additional codings
8.4 Progress report
8.5 Verbatim copy of the input
8.6 Using ArabTEX with EDMAC

9 Acknowledgments

10 References

A Obtaining ArabTEX
B Installing ArabTEX
C Release history
D Sample ArabTEX input
E Sample ArabTEX output
F Coding examples for Arabic
G Coding examples for Persian
H Alternate input encodings
H.1 ASMO 449 = ISO 9036
H.2 ASMO 449E = ISO 8859-6
I Miscellaneous utilities
I.1 twoblks.sty
I.2 abjad.sty
I.3 MLS2ARAB

Index


List of Tables

5.1 Standard codings for Arabic and Persian
5.2 Additional codings generally available
5.3 Verbatim codings for the carrier of hamza
7.1 Additional codings for Urdu
7.2 Additional codings for Pashto
8.1 Additional codings for special purposes
H.1 ASMO 449 code table
H.2 ISO 8859-6 code table


Chapter 1

Activating ArabTEX

With Plain TEX, load the ArabTEX macros by \input arabtex.tex. With 
LaTEX, include the option "arabtex" in the document header. In both 
cases some additional files will be loaded automatically.
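
A minimal sketch of both activation routes follows. The LaTEX variant assumes the LaTEX 2.09 convention of passing style options in the optional argument of \documentstyle; the article style and the sample text are only illustrative.

```latex
% Plain TeX: load the ArabTeX macros explicitly.
\input arabtex.tex
Ordinary text; the ArabTeX macros are now loaded.
\bye
```

```latex
% LaTeX 2.09 (assumed): "arabtex" given as a document style option.
\documentstyle[arabtex]{article}
\begin{document}
Ordinary text; the ArabTeX macros are now loaded.
\end{document}
```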

ArabTEX defines several user commands as indicated below. There is also 
a large number of (hidden) internal commands, which could lead to storage 
(hash table [1]) overflow in a small TEX implementation. All internal 
commands contain an "at" sign (@) in their names and thus should not 
interfere with any user-defined commands (but possibly could with other 
TEX extensions we do not know about).

With Plain TEX, the Arabic font is by default only available at the 
normal 14 point size, which ought to cooperate well with the "cm" fonts 
at 10 points. A bold variant is also provided. For other sizes, the user 
has to change the \magnification or define additional font 
identifiers. To change the default, inspect the file 
"arabtex.tex" and redefine the \pnash and/or \pnashbf command 
accordingly. With LaTEX, the usual size changing commands will also 
operate on the Arabic font.

[1] A TEX hash table size of 3000 to 3500 is recommended.


Chapter 2

Input to ArabTEX

After activating ArabTEX, select one of the Arabic writing styles, e.g., 
\setarab (see Section 3). Your modified TEX/LaTEX system will recognize 
the following items:

- normal TEX/LaTEX text and commands,

- short Arabic quotations bracketed by < and >. These must normally fit 
onto one line of output, unless explicitly broken up by \\ or \| 
commands (see below). A quotation may also be started with \<, except 
inside a LaTEX {tabbing} environment.

- longer Arabic texts bracketed by \begin{arabtext} and 
\end{arabtext} (even when using Plain TEX!), called Arabic Environments 
in the sequel. An Arabic Environment consists of one or more paragraphs 
separated by blank lines or \par commands.

Arabic quotations and Arabic environments are called Arabic contexts in 
the sequel.
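
The two kinds of Arabic context can be sketched as follows. This is only an illustration: the header assumes LaTEX 2.09 style options, and the sample word <as-salaamu> is borrowed from Section 5.1.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab                           % select the Arabic conventions (Section 3)

% A short quotation inside ordinary text; it must fit on one output line.
The greeting <as-salaamu> is set from right to left.

% The alternate opening \< (not inside a tabbing environment).
The same word again: \<as-salaamu>.

% An Arabic environment with two paragraphs separated by a blank line.
\begin{arabtext}
as-salaamu as-salaamu

as-salaamu
\end{arabtext}
\end{document}
```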

2.1 Arabic text elements

Every Arabic paragraph and every Arabic quotation is a sequence of the 
following kinds of Arabic items, separated by blank spaces or newlines:

- isolated punctuation marks, interpreted as the corresponding Arabic 
punctuation mark;

- "numbers", i.e. character sequences starting with a digit. A "number" 
will be processed using the normal writing sequence from left to right 
even if it contains letters and/or special characters; however, if the 
final character is a punctuation mark, it will be split off and 
processed separately.

- "Arabic quotes" coded as two left quotes or two right quotes each; 
they may also be written directly adjacent to a word.

- "words", i.e. character sequences starting with a letter or a special 
(nondigit) character followed by a letter. A final punctuation mark will 
be split off and processed separately. In the output, the (coded) 
characters of a word will be arranged from right to left.

- a sequence of words, numbers, and special characters enclosed in curly 
braces { and }. This introduces a new level of TEX grouping; otherwise 
the constituents are processed normally. This feature may be nested.

Output from all items will be arranged from right to left; lines will be 
broken as necessary.

Inside an Arabic Environment, or in an Arabic quotation, you may also 
have:

- ArabTEX commands with or without parameters. These will be executed 
immediately.

- Some, but not all, TEX/LaTEX commands (see below). These will be 
executed immediately.

- Short mathematical insertions, bracketed by single $ signs. They must 
fit on one output line and are processed as usual. TEX Display mode 
within an Arabic environment is not provided; if it is required, the 
user has to leave the Arabic environment temporarily.

- short non-Arabic ("Roman") quotations, containing text and possibly 
also TEX/LaTEX commands, bracketed by < and >. These must fit on one 
output line and introduce a new level of grouping, so if they contain 
any TEX/LaTEX assignments the effects of these will be local by default. 
This feature is not available within an Arabic quotation. The alternate 
notation \< is also not provided.
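
As an illustration of the item kinds listed above, the following sketch puts one of each into an Arabic environment. The header assumes LaTEX 2.09 style options, and the sample items are illustrative only; the comments are kept outside the environment, since how comments are treated inside an Arabic context is not stated here.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab
% The five input lines of the environment are, in order:
%   a word whose final punctuation mark is split off,
%   a "number" (starts with a digit, processed left to right),
%   a group in curly braces (a new level of TeX grouping),
%   a short mathematical insertion,
%   a short non-Arabic ("Roman") quotation.
\begin{arabtext}
as-salaamu.
1400
{as-salaamu 1400}
$a + b = c$
<see Section 2.1>
\end{arabtext}
\end{document}
```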

2.2 Commands in an Arabic context

A control sequence inside an Arabic context must be separated from the 
preceding text item by at least one blank space, newline, or another 
control sequence, and may be of the following kinds:


- ArabTEX option changing commands. These may also be used outside an 
Arabic Context, and usually follow the TEX grouping rules.

- \\ for a line break; the last line will be padded on the left with 
spaces.

- \| for a line break; the last line will be aligned. If it comes out 
very badly spaced, automatic stretching might help (see Section 8).

- \indent or \par (or a blank line) for a new paragraph, \noindent for a 
new paragraph without indentation; (not inside Arabic quotations).

- \emphasize Arabic item will put a bar over the Arabic item.

- \emphasize {group of Arabic items} will put a bar over the indicated 
group of Arabic items.

- \setnash, \setnashbf, \setnastaliq font selection commands, see 
Section 4.

- size changing LaTEX commands like \large etc., only if LaTEX is used!

- the following commands: \footnote (observe that the syntax for Plain 
TEX and LaTEX is different!), \marginpar (also with Plain TEX, analogous 
to the LaTEX usage).

- the TEX/LaTEX commands \smallskip, \medskip, \bigskip, \input, \hfill, 
\  (for a space), \space with their usual meaning.

- \nospace will place the adjacent items in the output in contact, 
without any intervening space.

- \hspace {width} will introduce the indicated amount of spacing in the 
output.

- \mbox {text} puts the text into a box that will not be split across a 
line break.

- \spreadbox {width}{text} spreads out the text to the indicated width. 
This may be useful e.g., when typesetting poetry.
\spreadbox {width}{text\hfill } will inhibit the spreading, \spreadbox 
{width}{\hfill text\hfill } will center the text inside the box.
\spreadbox {width}{\hfill } or \spreadbox {width}{ } just introduces the 
indicated amount of horizontal space, as will \hspace {width}.

If two boxing commands follow each other without any intervening blank 
space in the input, there will also be no resulting space between the 
boxes in the output.


- \centerline {text} will start a new line whose contents are centered 
(not inside Arabic quotations).

- \spreadline {text} will start a new line whose contents are spread out 
over the whole width of the page (not inside Arabic quotations). It is 
approximately equivalent to \spreadbox {\hsize }{text}.

- User-defined commands whose expansion produces legal ArabTEX input may 
be called by \docommand {command and parameters}. The command is 
expanded exactly once [1], and the result is processed by ArabTEX again. 
Any side effects of the expansion will be local.

- Parameter assignments inside an Arabic context may be performed by 
\doassign {parameter}{value}. The effect is normally local except if the 
form \doassign {\global parameter}{value} is used.

- Any non-recognized command will generate an error message and will be 
echoed verbatim in the output. Even though ArabTEX tries hard to get 
into synchronization again, additional spurious errors may occur.

- inside an Arabic Context no further LaTEX or ArabTEX environment may 
be nested (with the possible future exception of list environments; 
these are not yet implemented.)

For a list of all available commands, consult the Index to this report. 
As a reminder, a list of all commands that are valid inside Arabic text 
will appear in the log file.
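
A few of the commands from the list above are combined in the following sketch. It is an illustration under assumptions, not a tested document: the header assumes LaTEX 2.09 style options, \greeting is a hypothetical user command, and the widths and the \parindent assignment are arbitrary choices.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\newcommand{\greeting}{as-salaamu} % hypothetical user command; expands to ArabTeX input
\begin{document}
\setarab
% Every control sequence in the environment is preceded by a blank or a newline.
% Paragraph 1: line breaks with \\ and \|, a bar over one item and over a group.
% Paragraph 2: two boxes, the first spread out, the second centered.
% Paragraph 3: a user command via \docommand, a local assignment via \doassign.
\begin{arabtext}
as-salaamu \\
\emphasize as-salaamu \|
\emphasize {as-salaamu as-salaamu}

\spreadbox {0.4\hsize}{as-salaamu as-salaamu} \spreadbox {0.4\hsize}{\hfill as-salaamu\hfill }

\docommand {\greeting}
\doassign {\parindent}{0pt}
\end{arabtext}
\end{document}
```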

[1] This is no strong restriction, as the expansion may contain 
\docommand calls again.

Chapter 3

Language selection

The processing of input text to be written in the Arabic script is 
somewhat language dependent. Thus before the first Arabic quotation or 
Arabic environment you have to indicate the desired processing mode by 
one of the commands \setarab, \setfarsi, \seturdu, \setpashto, 
\setmaghribi, or \setverb (no special processing; see however Section 
5.5). The processing mode may be changed at any time, even inside an 
Arabic environment or an Arabic quotation.

After selecting a language, the symbols < and > serve to bracket short 
insertions in the chosen language. Although this is usually convenient, 
observe that they can then no longer be used for other purposes, except 
in mathematical mode, where they retain their normal meaning as 
relational operators. To temporarily return them to their normal mode of 
operation, deselect the language by \setnone. Arabic insertions may also 
be started by \< [1].

For further details on supported languages, see Section 7.
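
The following sketch illustrates switching the processing mode and the two ways of starting an insertion. It assumes a LaTEX 2.09 header, and the last part follows the \catcode suggestion for advanced users; the sample text is illustrative only.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab    % Arabic conventions; < and > now bracket insertions
Arabic: <as-salaamu>. In math mode the relation $a < b$ is unaffected.

\setfarsi   % the processing mode may be changed at any time
Persian conventions: <as-salaamu>.

\setnone    % deselect: < and > return to their normal meaning
Ordinary text without Arabic insertions.

\setarab
\catcode`\<=12   % deactivate the character <, keeping the command \<
Started with the command form: \<as-salaamu>.
\end{document}
```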

[1] Note for advanced TEX users: All language selecting commands except 
\setnone set the character < active. If Arabic insertions are not 
needed, or are always started with \<, the user may reuse the command < 
for other purposes, or deactivate it by \catcode `\<=12 to return it to 
its normal meaning.


Chapter 4

Font selection

For space economy, only the Naskh font is available by default. With 
LaTEX, additional fonts can be loaded by the document style options 
"nashbf" (for bold-face) and/or "nastaliq" (when available). Users of 
Plain TEX are considered specialists and have to define and load 
suitable fonts at the required sizes themselves.

The following font selection commands are available:

- \setnash (default) selects the Naskh font.

- \setnashbf selects a bold-face version of Naskh.

- \setnastaliq selects the Nasta`liq font.

If a font is not available or has not been loaded, the corresponding 
command will select the default font.

With LaTEX, the size changing commands will also operate on the 
additional fonts.
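
As a sketch of font selection with LaTEX, assuming the LaTEX 2.09 style-option syntax and that the bold-face font is installed (the sample word is illustrative):

```latex
\documentstyle[arabtex,nashbf]{article}  % "nashbf" loads the bold-face font
\begin{document}
\setarab
Default Naskh: <as-salaamu>

\setnashbf   % bold-face Naskh; falls back to the default font if unavailable
Bold Naskh: <as-salaamu>

\setnash     % back to the default font
{\large Larger Naskh: <as-salaamu>}
\end{document}
```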


Chapter 5

Input coding conventions

The ASCII input notation for Arabic text has been modelled closely after 
the transliteration standards ISO/R 233 and DIN 31 635. As these 
standards do not guarantee unique re-transliteration and are also not 
7-bit ASCII compatible, some modifications were necessary. These follow 
the general rules:

- whenever the transliteration uses a single letter, code that letter;

- whenever the transliteration uses a letter with a diacritical mark, 
put the punctuation character most closely resembling the diacritical 
mark before the letter (and not behind it as in some other coding 
proposals, as otherwise the readability of the input would suffer).

- use capital letters for writing variants.

5.1 Standard Arabic and Persian characters

The standard codings for Arabic and Persian are given in Table 5.1 and 
Table 5.2.

- For long vowels, use the capital letters <A>, <I>, <U> or <aa>, <iy>, 
<uw>.

- To get the defective writing of long vowels, use <_a>, <_i>, <_u>.

- 'Alif maqṣūra is <_A> or <Y>.

- The short vowels fatḥa, kasra, ḍamma are coded <a>, <i>, <u> and 
need not normally be written except in the following cases:

  - at the beginning of a word, where they generate 'alif,

  - adjacent to hamza, where they will influence its carrier,

  - when the transliteration is required,

  - in the \fullvocalize mode.

Code  Trans.  Name            Code  Trans.  Name            Code  Trans.  Name
a     a       'alif           b     b       bā'             p     p       pā'
t     t       tā'             _t    ṯ       ṯā'             ^g    ǧ       ǧīm
.h    ḥ       ḥā'             _h    ḫ       ḫā'             d     d       dāl
_d    ḏ       ḏāl             r     r       rā'             z     z       zāy
s     s       sīn             ^s    š       šīn             .s    ṣ       ṣād
.d    ḍ       ḍād             .t    ṭ       ṭā'             .z    ẓ       ẓā'
`     ʿ       `ayn            .g    ġ       ġayn            f     f       fā'
q     q       qāf             v     v       vā'             k     k       kāf
g     g       gāf             l     l       lām             m     m       mīm
n     n       nūn             h     h       hā'             w     w       wāw
y     y       yā'             _A    ā       'alif maqṣūra   T     t       tā' marbuṭa

Table 5.1: Standard codings for Arabic and Persian.

- Tanwīn is coded <aN>, <iN>, or <uN>. A silent 'alif, if required, is 
supplied automatically; it may also be explicitly written: <aNA>. 
Likewise, a silent wāw may be written <NU> as in <`amruNU>.

- hamza is denoted by a single right quote <'>. After selecting a 
language by \setarab etc., the hamza carrier will be determined from the 
context according to the rules for writing Arabic words; if that is not 
wanted, "quote" the hamza (see Section 5.2 below). In the \setverb mode, 
the hamza carrier is determined by the following letter; see Section 
5.5.

- madda on 'alif is generated by a right quote (hamza) before <A>: <'A>. 
It may also be written <~A>; likewise, <~I> and <~U> will produce madda 
on yā' and on wāw, as required in some older writing conventions.

Code  Trans.  Name
c     c       ḥā' with hamza
^c    č       ǧīm with three dots (below)
,c    ?c      ḫā' with three dots (above)
^z    ž       zāy with three dots (above)
~n    ñ       kāf with three dots (Ottoman)
~l    l̃       lām with a bow accent (Kurdish)
.r    ṙ       rā' with two bows (Kurdish)

Table 5.2: Additional codings generally available.

- The coding <`> for `ayn is a single left quote; beware of confusing it 
with hamza!

- The "invisible consonant" <|> may be inserted in order to break 
unwanted ligatures and to influence the hamza writing. It will not show 
in the Arabic output or in the transliteration. At the beginning of a 
word it will suppress a following short vowel; otherwise it acts like a 
consonant.

- The sequence <||> will insert a small space, as does <"|> (see Section 
5.2 below). The adjacent characters will not be connected.

- Šadda is indicated by doubling the appropriate letter coding.

- The definite article is separated from the following word by a hyphen. 
It may be written in the assimilated form (if it exists): <as-salaamu>, 
or always as <al->; in that case a subsequent "sun letter" must be 
doubled: <al-ssalaamu>, to receive a šadda and to prevent a sukūn on 
the lām. The transliteration in both cases is identical.

- Hyphens <-> are used for tying words together, or for indicating a 
connecting vowel in Arabic, or an iẓāfet connection in Persian. They 
may be used freely, and generally do not change the writing, but will 
show up in the transliteration. Additionally, at the beginning and the 
end of an otherwise isolated word they enforce the use of the connecting 
form of the adjacent letter (if it exists), as e.g. in the date 
<1400 h->.

- A double hyphen <--> between two otherwise joining letters will break 
any ligature and will insert a horizontal stroke (tatwīl, kašīda) 
without appearing in the transliteration. It may be used repeatedly. See 
also Section 8: automatic stretching.

For special applications, it can also be coded <B>; and <|B> will behave 
like an ordinary consonant and may carry vowel indicators, tanwīn, 
sukūn, and, in the combination <|BB>: šadda.
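
The coding rules of this section can be tried out with a sketch like the following. The header assumes LaTEX 2.09 style options; the words kitAb and qur'An are illustrative choices, while the other items are the examples quoted in the text.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab
% Line 1: a long vowel coded as a capital letter and as a doubled letter.
% Line 2: the definite article, assimilated and with a doubled "sun letter".
% Line 3: tanwin followed by a silent waw.
% Line 4: a hyphen enforcing the connecting form of the adjacent letter.
% Line 5: madda on 'alif, coded as hamza before A.
\begin{arabtext}
kitAb kitaab
as-salaamu al-ssalaamu
`amruNU
1400 h-
qur'An
\end{arabtext}
\end{document}
```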

5.2 Quoting

In \novocalize mode (see Section 5.4), a double quote <"> will modify 
the meaning of the following character as follows:

- if a short vowel follows, the appropriate diacritical mark fatḥa, 
kasra, ḍamma will be put on the preceding character.

  - If <N> follows the short vowel, the appropriate form of tanwīn will 
  be generated instead.

  - At the beginning of a word, 'alif is assumed as the first character.

- if the following character is a single right quote, a hamza mark will 
be put on the preceding character even if in conflict with the hamza 
rules. At the beginning of a word, an isolated hamza will be generated.

- if the following character is the "invisible consonant" <|>, the 
connection between the adjacent letters will be broken and a small space 
inserted. This can also be denoted <||> instead of <"|>. At the 
beginning of a word, 'alif with waṣla will be generated.

- otherwise: a sukūn will be put on the preceding character. The 
following character will be processed again.

The double quote will not show up in the transliteration.

In \vocalize mode (see Section 5.4), quoting will turn a short vowel 
off; likewise, in \fullvocalize mode, quoting will also turn a sukūn 
off. Put differently: quoting will toggle the generation of short vowel 
indicators and sukūn on and off.
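
The toggling behaviour might be exercised as in the sketch below. The header assumes LaTEX 2.09 style options, and the words kitAb and bayt are illustrative; the comments describe what the rules above lead one to expect, not verified output.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab

\novocalize
<kitAb>     % no diacritics are generated
<k"itAb>    % quoted short vowel: kasra on the first letter
<kitAb"uN>  % quoted vowel followed by N: tanwin on the last letter
<bay"t>     % no vowel after the quote: sukun on the preceding letter

\vocalize
<kitAbuN>   % short vowels are shown
<k"itAbuN>  % the quoted kasra is turned off
\end{document}
```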


5.3 Ligatures

There is no way to explicitly enforce ligatures as a large number of 
them are generated automatically. The results will not always look 
satisfactory, so we recommend inspecting the output after the first run. 
Any unwanted ligature can be suppressed by interposing the invisible 
character <|> between the two letters otherwise combined into a 
ligature. After \ligsfalse, in the middle of a word fewer ligatures will 
be produced; for some texts this looks better. You can return to the 
normal strategy by \ligstrue.
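
A sketch of the two ways of reducing ligatures follows. It assumes a LaTEX 2.09 header and, purely for illustration, that the font combines the first two consonants of the sample word into a ligature; whether it does should be checked in the output.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab
<mu.hammad>    % default strategy: ligatures are generated automatically
<mu|.hammad>   % the invisible consonant | separates the two letters

\ligsfalse     % fewer ligatures in the middle of a word
<mu.hammad>
\ligstrue      % back to the normal strategy
\end{document}
```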

5.4 Vowelization

There are three modes of rendering short vowels:

- \fullvocalize:

  - Every short vowel written will generate the corresponding diacritical 
  mark fatḥa, kasra, ḍamma, except if quoted.

  - If <N> follows a short vowel, the corresponding form of tanwīn is 
  generated instead.

  - Defective writing: The coding <_a> will produce a Qur'an 'alif accent 
  (also called dagger 'alif) instead of an explicit 'alif character, which 
  would be coded <A> or <aa>. Likewise, <_i> will produce a small 'alif 
  below the preceding consonant in place of <I> (<iy>), and <_u> will 
  produce an inverted ḍamma in place of <U> (<uw>).

  - If a long vowel follows a consonant, the corresponding short vowel is 
  implied. The long vowel itself carries no diacritical mark.

  - If no vowel is given after a consonant, sukūn will be generated 
  except if a double quote precedes the next consonant. The lām of the 
  definite article receives no sukūn if a double "sun letter" follows.

  - 'alif at the beginning of a word carries waṣla instead of the vowel 
  indicator if the preceding word ended with a vowel.

- \vocalize: As above, but sukūn and waṣla will not be generated 
except if explicitly indicated by "quoting".

- \novocalize: No diacritics will be generated except if explicitly 
asked for by "quoting".

In all modes, a double consonant will generate šadda, and <'A> always 
generates madda on 'alif.

After <aN> the silent 'alif character is generated if necessary. The 
silent 'alif may also be explicitly indicated by <aNA>, or coded 
literally as <A> in \novocalize mode. If a silent 'alif maqṣūra is 
wanted instead, write <aN_A>, <aNY>, <_A> or <Y>.

The tanwīn fatḥa is normally put on the last consonant of the word, 
even if a silent 'alif follows. If it is instead supposed to go onto the 
'alif as in some modern Arabic conventions, or in Persian, this 
behaviour can be achieved by the option \newtanwin. The option 
\oldtanwin will restore the classical behaviour.
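
To compare the three modes and the two tanwīn conventions, the same input can be set repeatedly, as in this sketch (LaTEX 2.09 header assumed; the words are illustrative):

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab
\novocalize   <kitAbuN>    % no diacritics

\vocalize     <kitAbuN>    % short vowels and tanwin, but no sukun or wasla

\fullvocalize <kitAbuN>    % all diacritics, including sukun and wasla

\newtanwin    <kitAbaN>    % tanwin fatha goes onto the silent 'alif

\oldtanwin    <kitAbaN>    % classical placement on the last consonant
\end{document}
```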

A silent 'alif after wāw is indicated by <UA> or <WA> (with a capital 
<W>!).

5.5 Verbatim input

After disabling language specific processing by \setverb or \setnone, 
ArabTEX will not use any context information to determine the carrier of 
hamza. Instead the user has to supply this information by the 
next character typed after <'>. Generally this character will be used as 
the carrier; for examples and some exceptions see Table 5.3. A short 
vowel indicator may follow.

To ease automatic conversion, an initial 'alif may also be coded <A>.

Code  Result                 Code  Result
'a    hamza on 'alif         'i    hamza below 'alif
'w    hamza on wāw           'y    hamza on a tooth
'h    hamza on hā'           'B    hamza on the line
'|    isolated hamza         'A    madda on 'alif

Table 5.3: Verbatim codings for the carrier of hamza
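
The difference between contextual and verbatim hamza handling can be sketched as follows, assuming a LaTEX 2.09 header; the sample word is an illustrative choice.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setarab
<su'Al>    % the hamza carrier is determined from the context

\setverb
<su'wAl>   % the carrier is typed explicitly: 'w is hamza on waw

\setarab   % back to language specific processing
\end{document}
```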

5.6 Alternate input codings

The ArabTEX input notation has been very carefully designed for 
flexibility, readability, and ease of use for linguists confined to 
standard 7-bit ASCII equipment for processing and transmitting data. 
However, it does not make much sense to recode existing machine-readable 
text files coded according to other standards. Thus, some alternate 
reading modules have been written (as there are more than 10 different 
codings in current use, this is an open-ended activity), and a general 
code switching procedure has been provided.

An alternate reading module, e.g. asmo449.sty for the ASMO 449 code, is 
installed by adding its name (asmo449) as a LaTEX style option, or by 
\input asmo449.sty. Afterwards, a code name (in this case asmo449) is 
defined.

Input coding is switched by the command \setcode {code name}, which 
changes the coding for Arabic text globally, or by the environment 
\begin {setcode}{code name} ... \end {setcode}, which follows the 
normal TEX grouping rules.

Coding may be switched several times in the same document, provided the 
appropriate reading modules are installed; \setcode {arabtex} reverts to 
the standard ArabTEX notation.

Please observe that only Arabic text is affected by \setcode {code 
name}; text outside of Arabic contexts, and control sequence names, are 
still assumed to be in 7-bit ASCII. As existing text files presumably do 
not contain any control sequences or non-Arabic text anyway, we suggest 
using a small ASCII TEX/LaTEX driver file setting all relevant options 
and containing any non-Arabic text, and calling the Arabic text files by 
\input {file name} from within an Arabic environment.

For details on available additional reading modules, see Appendix H.
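
A driver file of the kind suggested above might look like the following sketch. The LaTEX 2.09 header and the file name legacy.txt are assumptions made for illustration.

```latex
\documentstyle[arabtex,asmo449]{article}  % installs the ASMO 449 reading module
\begin{document}
\setarab

\setcode{asmo449}     % Arabic text is now read as ASMO 449 (global change)
\begin{arabtext}
\input {legacy.txt}
\end{arabtext}

\setcode{arabtex}     % revert to the standard ArabTeX notation
A quotation in the standard notation: <as-salaamu>.
\end{document}
```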

Chapter 6

Transliteration

6.1 ZDMG transliteration style

In addition to the Arabic writing, the standard scientific 
transliteration may also be obtained from a fully vowelized input text. 
This mode is activated by \transtrue and may be switched off again by 
\transfalse. If only the transliteration is wanted, you can deactivate 
the Arabic writing by \arabfalse; it can be reactivated by \arabtrue. If 
both modes are active, their output will be interleaved line by line.

The transliteration mode assumes that the input text is in the Arabic or 
Persian language and has been coded according to the rules given above. 
For words from other languages the transliteration might be in error. 
For Arabic text, the following special cases are handled:

- after the definite article, a double consonant will be assimilated;

- an initial vowel will be replaced by an apostrophe whenever the 
preceding word ended with a vowel (in this case a waṣla appears in the 
Arabic writing). If that is not wanted, start with hamza.

- a silent 'alif or 'alif maqṣūra after <N> (tanwīn) and <U> is 
omitted in the transliteration. The same happens after wāw if it is 
written as a capital <W>.

- To correctly reproduce some historical writings, a silent long vowel 
after <_a> is omitted in the transliteration. For examples, see the 
Appendix.

For economy of space, the transliteration module is not loaded by 
default. If you want to use it, add the style option "atrans" with 
LaTEX; with Plain TEX, say \input atrans.sty after loading ArabTEX.

6.2 Encyclopedia of Islam style

For special purposes, the standard transliteration output may be 
modified by including the LaTEX option "etrans", or by loading the file 
"etrans.sty" when working with Plain TEX. After this modification, the 
transliteration will follow the style of the Encyclopedia of Islam.
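
The switches for the two output modes can be combined as in this sketch, which assumes a LaTEX 2.09 header with the "atrans" option and uses an illustrative, fully vowelized sample word:

```latex
\documentstyle[arabtex,atrans]{article}  % "atrans" loads the transliteration module
\begin{document}
\setarab

\transtrue              % Arabic writing and transliteration, interleaved
\begin{arabtext}
as-salaamu
\end{arabtext}

\arabfalse              % transliteration only
The word <as-salaamu> in transliteration.

\arabtrue \transfalse   % back to Arabic writing only
\end{document}
```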


Chapter 7

Support for other languages besides Arabic

ArabTEX is primarily intended for typesetting texts in classical and 
modern Arabic, but it also provides some support for several other 
languages that are customarily written in the Arabic alphabet.

In order to switch to the conventions for one of these languages, say 
\setfarsi, \seturdu, \setpashto, \setmaghribi; \setverb will switch off 
any language specific processing. \setarab can be used to switch back to 
the Arabic conventions. After selecting the language, < and > serve as 
delimiters for quotations; \setnone will, like \setverb, deselect any 
language, and will also return < and > to their normal TEX meaning.

This part of ArabTEX relies heavily on contributions from the user 
community; we want to especially mention Ivan Dershanski who completely 
reimplemented the routines for processing Persian. As we extensively 
modified these contributions while integrating the system, we are solely 
responsible for any remaining, or newly introduced, errors.

7.1 Persian (Farsi, Dari), also Ottoman, Kurdish

- All characters needed for writing Farsi are available by default. The 
short vowels <e> and <o> are mapped to <i> and <u>, the long vowels <E> 
and <O> to <I> and <U> without a vowel indicator. <H> denotes final 
silent hā'. This hā' receives no sukūn even in fully vowelized mode.

- For fatḥa or kasra followed by a final silent hā' you can also write 
<,a> or <,e> in place of <aH> and <eH>.

- The iẓāfet connection may always be written <-i> or <-e> (with 
hyphen); then the correct spelling will be determined from the context. 
Likewise the yā'-i-waḥdat can always be written <-I> or <-E>.

- The present tense forms of the copula are coded <-am>, <-I>, <-ast>, 
<-Im>, <-Id>, <-and>. In the output they are written as separate words 
after a little space.

- The final yā' carries no dots. Farsi uses the Nasta`liq font if 
available, otherwise Naskh.

For further details see Appendix G.
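
The Persian conventions above might be used as in this sketch. The LaTEX 2.09 header is an assumption, and the Persian words are illustrative choices that only use codings described in this section.

```latex
\documentstyle[arabtex]{article}   % assumed LaTeX 2.09 header
\begin{document}
\setfarsi
<kitAb-i man>   % izafet written with a hyphen; spelling chosen from context

<kitAb-ast>     % copula, set as a separate word after a little space

<nAmaH>         % final silent ha coded with H

<nAm,a>         % the same ending in the short form
\end{document}
```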

7.2 Urdu

- For Urdu, additional codings are available, see Table 7.1. Some of the 
given codings also occur in Pashto but with a different meaning, see 
Section 7.3.

- The short vowels <e> and <o> are mapped to <i> and <u>. <H>, <,a> and 
<,e> are used as in Persian.

- Even in fully vowelized mode, an aspirated consonant before <h> 
receives no sukūn since the two are technically a single letter.

- Urdu uses the Nasta`liq font if available, otherwise Naskh.

7.3 Pashto (Afghanic)

- For Pashto, additional codings are available, see Table 7.2. Some of 
the given codings also occur in Urdu but with a different meaning, see 
Section 7.2.

- The short vowel <e> is indicated by a zwarakay, <o> by an inverted 
ḍamma.

Observe also the following codings:

<w"'> hamza on wāw
<h"'> hamza on hā', if not generated by iẓāfet


Code  Trans.  Name
h     h       always denotes the "two-eyed" hā'
,h    h       the "wavy" hā' letter
,t    ?t      tā' with a small ṭā' accent
,d    ?d      dāl with a small ṭā' accent
,r    ?r      rā' with a small ṭā' accent
.n    ṇ       nūn without a dot
E     ē       ē, yā' barī' in the final position
ae    ae      the diphthong ae
ao    ao      the diphthong ao
O     ō       the long vowel ō
U     ū       the long vowel ū

Table 7.1: Additional codings for Urdu.

- The codings <H>, <,a> and <,e> are used as in Persian. The rules for 
iẓāfet and yā'-i-waḥdat apply.

- For writing some Pashto words in the Urdu style, write the command 
\seturdu and afterwards switch back to the Pashto conventions by 
\setpashto.
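
A possible layout for such a passage is sketched below. The comment
lines stand for the actual text, and the assumption that the switching
commands may be given inside a single Arabic Environment is ours; if it
does not hold on your installation, close the environment before each
switch.

```latex
\setpashto
\begin{arabtext}
% ... Pashto text, read with the codings of Table 7.2 ...

\seturdu
% ... some words to be read with the Urdu codings of Table 7.1 ...

\setpashto
% ... remaining Pashto text ...
\end{arabtext}
```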

7.4 Maghribi

Maghribi is treated nearly like Arabic but uses a different writing 
convention: fā' is written with one dot below the letter, and qāf with 
one dot above the normal letter form of fā'. The three dots of vā' are 
put below the letter.



Coding   Meaning

,t       tā' with a small loop
,d       dāl with a small loop
,r       rā' with a small loop
.n       nūn with a small loop
g        gāf with a small loop instead of a bar
,z       rā' with one dot above and one below
,s       sīn with one dot above and one below
ae       the diphthong ae
Ee       the diphthong ey
ee       the diphthong ey
E        the long vowel ē
O        the long vowel ō
U        the long vowel ū

Table 7.2: Additional codings for Pashto.

7.5 Other languages

This is left to experimentation by the user. If neither \setarab nor 
\setfarsi produces the desired result, try \setverb for verbatim mode.

The vowelization and the transliteration cannot generally be expected to 
be correct, but might work by accident.

In case some character variants not yet provided are needed, feel free 
to ask the author for help. There is no simple way for the user to 
modify the script.

Chapter 8

Miscellaneous features

8.1 Automatic stretching

For special purposes, e.g. for headlines and for Arabic paragraphs 
containing long mathematical or non-Arabic insertions, the connection 
between adjacent Arabic letters may be made "elastic" if they form no 
ligature. A kašīda is then inserted whose length is adjusted 
automatically so that the output line is filled uniformly.

This feature very easily leads to storage overflow during processing 
and should be used only when necessary. It is switched on with 
\spreadtrue and switched off again with \spreadfalse. Inside an Arabic 
Environment, it will also be switched off automatically at the end of 
every paragraph.
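
A typical use, following the sample input in Appendix D, is to stretch
only a headline and to switch the feature off again immediately. The
explicit \spreadfalse is our addition and is meant only to keep the
storage demand low:

```latex
\spreadtrue                          % elastic connections on
\centerline {<^gu.hA wa-.himAruhu>}  % headline from Appendix D
\spreadfalse                         % off again for the running text
```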

8.2 Dots on yā'

Whether yā' in the final position carries dots or not is controlled by 
the chosen language convention. You can override this, after selecting 
the language, by \yahdots and \yahnodots.
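
For instance, a Persian text whose final yā' should nevertheless carry
dots might be set up as follows. This is a sketch only; the sample word
is taken from Appendix G:

```latex
\setfarsi    % Persian convention: final ya' carries no dots
\yahdots     % override, given after the language selection

\begin{arabtext}
ketAb-I
\end{arabtext}

\yahnodots   % back to dotless final ya'
```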

8.3 Additional codings

To reproduce exotic, erroneous or archaic texts exactly as they are 
written, some additional codings are available, see Table 8.1.


Coding   Translit.   Meaning

.k       k           kāf in the final position without a mark
^d       ď           dāl with a dot below
.f       f̣           fā' without a dot
.b       ḅ           bā' without a dot
.n       ṇ           nūn without a dot (not available in Pashto mode)
Y        ā           'alif maqṣūra; yā' without dots in all positions

Table 8.1: Additional codings for special purposes.

If further variants are needed, write to the author and indicate:

- the required shape,

- the assumed transliteration,

- a suggestion for the input coding,

- some information on the intended use.

We are willing to consider any suggestion. Adding a new character might 
be easy, or else it might be impossible. ArabTEX is flexible, but there 
are some technical limitations.

8.4 Progress report

As ArabTEX is slow, it produces some terminal output while running to 
indicate that it is still alive. If that is not wanted, e.g., on a very 
fast system or while running a batch job, say \quiet or \tracingarab = 
0 (outside an Arabic Environment; otherwise say \doassign {\tracingarab 
}{0}). With \tracingarab = 1 only Arabic paragraphs are reported; with 
a value of 2, Arabic lines and insertions; with a value of 3 or more, 
individual Arabic items.
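
The two ways of setting the reporting level might look as follows. This
is a sketch; the Arabic sample line is taken from Appendix D, and
placing \doassign on a line of its own is merely our choice:

```latex
% outside an Arabic Environment: ordinary assignment
\tracingarab = 1    % report Arabic paragraphs only

\begin{arabtext}
\doassign {\tracingarab}{0}

% from here on no progress report is produced
'at_A .sadIquN 'il_A ^gu.hA
\end{arabtext}
```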



8.5 Verbatim copy of the input

For test purposes, the Arabic input may be reproduced verbatim after 
\showtrue in addition to the normal output; \showfalse switches this 
feature off again. Commands will not usually be shown. The output will 
generally not look pleasant, and this feature is only provided in order 
to track down errors, or to demonstrate the operation of ArabTEX as in 
the appendix.
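
When hunting for a coding error, the suspicious passage alone can be
bracketed as sketched here. Setting the switch outside the Arabic
Environment mirrors the way the other switches are set in Appendix D
and is an assumption on our part:

```latex
\showtrue            % echo the Arabic input verbatim
\begin{arabtext}
wa-qabla 'an yutimmu ^gu.hA kalAmahu
\end{arabtext}
\showfalse           % normal output only
```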

8.6 Using ArabTEX with EDMAC

ArabTEX will cooperate with EDMAC, a Plain TEX macro package for 
critical editions, written by John Lavagnino and Dominik Wujastyk. If 
EDMAC is already present when ArabTEX is loaded, the EDMAC commands 
will, after suitable modifications, be available inside an Arabic 
environment. Their arguments are considered Roman text but may contain 
Arabic quotations.

For further details, see the EDMAC documentation.
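
Under Plain TEX the required loading order could be obtained as
sketched below. The file name arabtex.tex is the one given in Appendix
B; that EDMAC is installed as edmac.tex is an assumption about the
local installation:

```latex
\input edmac.tex     % EDMAC first ...
\input arabtex.tex   % ... so that ArabTeX finds it and adapts its commands
```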


Chapter 9

Acknowledgments

The development of ArabTEX would not have been possible without the 
assistance of many people, and it is impossible to acknowledge every 
individual contribution. Besides our local team, i.e. Udo Merkel and 
Heribert Schlebbe, helpful advice came, among others, from Chahriar 
Assad, Benno van Dalen, Ivan Derzhanski, Wolfdietrich Fischer, Ahmed 
El-Hadi, Yannis Haralambous, Abdelsalam Heddaya, Nicholas Heer, Iqbal 
Khan, Tom Koornwinder, Eberhard Krüger, Asif Lakehsar, Jan Lodder, 
Richard Lorch, Pierre MacKay, Eberhard Mattes, Fathy Neamat-Allah, Bernd 
Raichle, Ulrich Rebstock, Mohamed Saba, Waheed Samy, Annemarie Schimmel, 
Nariman Shehab, Dominik Wujastyk, and Michio Yano. We also have to thank 
all users who sent error reports, comments, and suggestions.



Chapter 10

References

B. Alavi, M. Lorenz: Lehrbuch der persischen Sprache.
5. Auflage 1988. VEB Verlag Enzyklopädie, Leipzig.

A. A. Ambros: Einführung in die moderne arabische Schriftsprache. 1. 
Auflage 1969. Max Hueber Verlag, München.

ASMO 449: 7-bit coded Arabic character set for information interchange. 
Arabic Standards and Measurements Organization, 1982.

J. D. Becker: Arabic Word Processing.
Comm. ACM 30/7, 600-610 (1987).

T. Borg: Arabisch für Ausländer. Ein Lehrbuch für modernes Hocharabisch. 
2. Auflage 1979. Verlag Borg GmbH, Hamburg.

J. A. Boyle: Grammar of Modern Persian.
Wiesbaden: Otto Harrassowitz, 1966.

B. Comrie (ed.): The World's Major Languages.
Croom Helm, London 1987.

DIN 31 635: Umschrift des Arabischen Alphabets.
Deutsches Institut für Normung e.V., 1982.

J. Lavagnino and D. Wujastyk: An Overview of EDMAC: A plain TEX format 
for critical editions.
TUGboat 11/4, 623-643 (1990).

L. P. Elwell-Sutton: Elementary Persian Grammar.
Cambridge University Press, 1963.


C. Faulmann: Das Buch der Schrift, enthaltend die Schriften und 
Alphabete aller Zeiten und aller Völker des gesammten (sic!) Erdkreises. 
K. K. Hof- und Staatsdruckerei, Wien 1878.

W.D. Fischer: Grammatik des Klassischen Arabisch.
2. Auflage 1987. Verlag Otto Harrassowitz, Wiesbaden.

A. Grohmann: Arabische Paläographie (Teil I und II).
Österreichische Akademie der Wissenschaften, Philosophisch-historische 
Klasse, Denkschriften 94, 1. Wien 1967.

E. Harder, A. Schimmel: Arabische Sprachlehre.
15. Auflage 1983. Julius Groos Verlag, Heidelberg.

هاشم محمد الخطاط، قواعد الخط العربي.

Hāšim Muḥammad al-Ḫaṭṭāṭ: Qawāʿid al-Ḫaṭṭi al-ʿArabī.
Maktaba an-Nahḍa, Baghdad; Dār al-Qalam, Beirut, 1400/1980.

ISO/R 233-1961: International System for the Transliteration of Arabic 
Characters. International Standards Institution, 1961.

ISO 8859-6: Information processing - 8-bit single-byte coded graphic 
character sets - Part 6: Latin/Arabic alphabet.
International Organization for Standardization, 1987.

ISO 9036: Information processing - Arabic 7-bit coded character set for 
information interchange. International Organization for Standardization, 
1987.

D. E. Knuth: The METAFONTbook.
Addison Wesley Publishing Comp., Reading, Mass., 1986.

D. E. Knuth: The TEXbook.
Sixth printing. Addison Wesley Publishing Comp., Reading, Mass., 1986.

D. E. Knuth and P. MacKay: Mixing right-to-left texts with left-to-right 
texts. TUGboat 8/1, 14-25 (1987).

Ann K. S. Lambton: Persian Grammar.
Cambridge University Press, 1953.

L. Lamport: LaTEX, A Document Preparation System.
Addison Wesley Publishing Comp., Reading, Mass., 1986.

M. Lorenz: Lehrbuch des Pashto (Afghanisch).
2. Auflage 1982. VEB Verlag Enzyklopädie, Leipzig.

P. A. MacKay: Typesetting Problem Scripts.
BYTE 11/2, 201-216 (1986).



H. Ritter: Über einige Regeln, die beim Drucken mit arabischen Typen zu 
beachten sind.
ZDMG 100/2, 577-580 (1951).

Friedrich Rückert: Grammatik, Poetik und Rhetorik der Perser. Wiesbaden: 
Otto Harrassowitz, 1966.

C. Salemann, V. Shukovski: Persische Grammatik.
4. Auflage 1947. Verlag Otto Harrassowitz, Leipzig.

A. Schimmel: Islamic Calligraphy.
E.J.Brill, Leiden, Netherlands 1970.

H.J. Vermeer, W. Akhtar, A. Akhtar: Urdu-Lautlehre und Urdu-Schrift. 3. 
Auflage 1985. Julius Groos Verlag, Heidelberg.


Appendix A

Obtaining ArabTEX

The ArabTEX system is available from the author's institution (by 
anonymous FTP from ftp.informatik.uni-stuttgart.de (129.69.211.2), in 
the directory pub/arabtex) and from many other common servers, e.g. the 
CTAN network (Aston, Niord, Stuttgart). The files may be transferred 
individually or as a package: arabtex.zip for PC systems, arabtex.tar.Z 
for U*IX systems; we recommend getting and inspecting the README file 
first. Successful operation on the Apple Macintosh in conjunction with 
OzTEX has also been reported.

At the time of this writing, version 3.00 is current. The Nasta`liq font 
is still under development; Naskh will be substituted automatically. 
Version 2 is downward compatible; the old version 1 is obsolete and 
should no longer be used.

ArabTEX is copyrighted, but free use for scientific, experimental and 
other strictly private, noncommercial purposes is granted. Offprints of 
any publications using ArabTEX are welcome. Using ArabTEX otherwise 
requires a license agreement.



Appendix B

Installing ArabTEX

The installation procedure is strongly system dependent, and we 
recommend securing the assistance of a local TEXpert. You have to 
install the "nash14" font with its "*.pk" and "*.tfm" files on the font 
search path of your TEX system, and the "*.sty" files and "arabtex.tex" 
on the source search path (usually TEXINPUT) of your system. Possibly 
you will also have to rename the "*.pk" files according to local 
conventions, and as a last resort you can try to recreate the fonts from 
the "*.mf" METAFONT sources. Additional fonts, whenever available, are 
installed analogously.

ArabTEX has been found to cooperate well with TEX versions 3.xxx, LaTEX 
versions 2.09 of 1991 or later, NFSS and NFSS2 (not required), and 
previewers that can handle fonts of more than 128 characters. TEX-XET or 
TEX--XET are not required, and their additional features are presently 
not exploited. The TEX "hash size" should be at least 3000 to 3500, 
especially when ArabTEX is used in conjunction with LaTEX, and if the 
transliteration module is used. A BIG TEX may be necessary with NFSS2 
because of the latter's high demand on string storage. Space and time 
requirements are not negligible and have increased during development; 
however, ArabTEX currently still runs, albeit slowly, even on a 
standard PC XT configuration.
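
Once the files are in place, a very small test document such as the
following may help to check the installation. It is only a sketch; the
sample word is taken from Appendix F. If TEX reports a missing style
file or a missing font, the corresponding search path is not yet set up
correctly:

```latex
\documentstyle[arabtex]{article}
\begin{document}

\setarab
\begin{arabtext}
bi-baladiN
\end{arabtext}

\end{document}
```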



Appendix C

Release history

There was a Version 1, which is no longer supported.

Version 2 was not fully compatible with Version 1; however, moving to 
the new version usually caused few problems. Apart from some 
extensions, most changes were introduced in order to conform better to 
the transliteration standards, and to have fewer compatibility problems 
with TEX and LaTEX. Further versions are expected to be upward 
compatible unless serious problems turn up.

The main differences between versions 1 and 2 are:

- The font size has increased, so the document layout may change. The 
old font "nash10" can no longer be used as the character locations have 
been assigned differently.

- Some Arabic characters are now coded differently: `ayn is denoted by a 
left quote, and <c>, <^z>, <^t>, and <.n> have been assigned new 
meanings in order to better conform to the standard transliteration.

- There are many more ligatures than before. This normally need not 
concern the user.

- \vocalize will no longer generate sukūn and waṣla unless explicitly 
indicated by quoting. See \fullvocalize.

- Arabic Environments are now always bracketed by the new control 
sequences \begin{arabtext} and \end{arabtext} even if only the 
transliteration is wanted.

We strongly recommend converting any still existing version 1 input 
files to the new notation. To assist in this migration, the LaTEX 
option "oldarabtex" and/or the command \oldarabtex will switch to a 
mode where virtually all places where the old conventions are used will 
either produce a TEX error message or be flagged in the output.
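
A checking run over an old input file might be set up as sketched
below. The file name oldtext.tex is hypothetical, and giving both the
option and the command is only one of the combinations the text above
allows:

```latex
\documentstyle[12pt,arabtex,oldarabtex]{article}
\begin{document}

\oldarabtex          % flag places where the old conventions are used

\begin{arabtext}
\input oldtext.tex

\end{arabtext}

\end{document}
```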

The changes introduced since the release of Version 2.00 up to now 
(Version 3.00) fall into one of two categories: error corrections, and 
upward compatible extensions. Details are not given here, but are 
documented in the text file CHANGES that is part of the distribution 
package of ArabTEX.

Version 3 is upwards compatible with version 2. All supported features 
are documented in this manual.


Appendix D

Sample ArabTEX input

\documentstyle[12pt,arabtex]{article}
\begin{document}

\setarab % choose the language conventions
\vocalize % diacritics for short vowels on
\transtrue % additionally switch on the transliteration
\arabtrue % print arabic text ... is on anyway
\spreadtrue % spread out caption

\centerline {<^gu.hA wa-.himAruhu>}

\begin{arabtext}
'at_A .sadIquN 'il_A ^gu.hA ya.tlubu minhu .himArahu li-yarkabahu fI 
safraTiN qa.sIraTiN wa-qAla lahu:

sawfa 'u`Iduhu 'ilayka fI al-masA'i, wa-'adfa`u laka 'u^graTaN. \\ 
fa-qAla ^gu.hA:

'anA 'AsifuN ^giddaN 'annI lA 'asta.tI`u 'an 'u.haqqiqa
laka ra.gbataka, fa-al.himAru laysa hunA al-yawma. \\
wa-qabla 'an yutimmu ^gu.hA kalAmahu
bada'a al-.himAru yanhaqu fI i.s.tablihi. \\
fa-qAla lahu .sadIquhu:

'innI 'asma`u .himAraka yA ^gu.hA yanhaqu. \\
fa-qAla lahu ^gu.hA:

.garIbuN 'amruka yA .sadIqI!
'a-tu.saddiqu al-.himAra wa-tuka_d_dibunI?
\end{arabtext}

\end{document}



Appendix E

Sample ArabTEX output

[Arabic script output not reproduced; the transliteration output 
follows.]

ǧuḥā wa-ḥimāruhu

ʾatā ṣadīqun ʾilā ǧuḥā yaṭlubu minhu ḥimārahu li-yarkabahu fī safratin 
qaṣīratin wa-qāla lahu:

sawfa ʾuʿīduhu ʾilayka fī 'l-masāʾi, wa-ʾadfaʿu laka ʾuǧratan.

fa-qāla ǧuḥā:

ʾanā ʾāsifun ǧiddan ʾannī lā ʾastaṭīʿu ʾan ʾuḥaqqiqa laka raġbataka, 
fa-'lḥimāru laysa hunā 'l-yawma.

wa-qabla ʾan yutimmu ǧuḥā kalāmahu badaʾa 'l-ḥimāru yanhaqu fī 
'ṣṭablihi.

fa-qāla lahu ṣadīquhu:

ʾinnī ʾasmaʿu ḥimāraka yā ǧuḥā yanhaqu.

fa-qāla lahu ǧuḥā:

ġarībun ʾamruka yā ṣadīqī! ʾa-tuṣaddiqu 'l-ḥimāra wa-tukaḏḏibunī?



Appendix F

Coding examples for

Arabic1

The short vowels fatḥa, kasra, ḍamma are denoted, as in the 
transliteration, by the small letters a, i, u:

mana`a → manaʿa, _dahaba → ḏahaba, ^sariba → šariba,
qabila → qabila, `a.zuma → ʿaẓuma, `alu → ʿalu, bal → bal,
ni`ma → niʿma, yaktub → yaktub.

The long vowels ā, ī, ū are denoted by capitals A, I, U or by aa, iy, 
uw:

qAtala → qātala, nUzi`a → nūziʿa, lUmI → lūmī,
sIrI → sīrī; lawmI → lawmī, sayrI → sayrī.

'Alif maqṣūra is coded as _A or Y:

ramY → ramā, _dikrY → ḏikrā, `al_A → ʿalā, bal_A → balā.

Silent 'alif: The plural suffixes -ū, -aw of the verb are denoted UA, 
aW or aWA:

katabUA → katabū, yaktubUA → yaktubū,
ramaWA → ramaw, yalqaW → yalqaw.

1Most of the examples are taken from: Wolfdietrich Fischer, Grammatik 
des Klassischen Arabisch, 2. Auflage, Verlag Otto Harrassowitz, 
Wiesbaden 1987.


The defective notation of ā, ī, ū can be indicated by _a, _i, _u and 
leads to the appropriate spelling:

dAru-h_u → dāru-hū, ri^gli-h_i → riǧli-hī,
however: ramA-hu → ramā-hu, yarmI-hi → yarmī-hi;
_dih_i → ḏihī, h_a_dih_i → hāḏihī, tih_i → tihī, hAtih_i → hātihī,
rabb_i → rabbī, .sAl_i → ṣālī; hum_u → humū;
qiy_amaTuN → qiyāmatun, 'il_ahuN → ʾilāhun,
sam_awATuN → samāwātun, _tal_a_tuN → ṯalāṯun,
l_akin → lākin, h_a_dA → hāḏā, 'al-ll_ahu → ʾal-lāhu,
'al-rra.hm_anu → ʾar-raḥmānu, _d_alika → ḏālika.

To reproduce the historical writing correctly, a silent long vowel or 
'alif maqṣūra after _a receives no sukūn and is ignored in the 
transliteration:

.sal_aUTuN → ṣalātun, .hay_aUTuN → ḥayātun,
zak_aUTuN → zakātun, mi^sk_aUTuN → miškātun,
ar-rib_aU → ar-ribā, tawr_aITuN → tawrātun,
ram_aYhu → ramāhu, sIm_aYhum → sīmāhum.

The short vowel u can be written as a long vowel by _U:

'_UlY → ʾulā, '_UlA'i → ʾulāʾi, '_UlU → ʾulū,
'_UlAka → ʾulāka, '_UlA'ika → ʾulāʾika.

Tanwīn: The indefinite endings -un, -in, -an are written -uN, -iN, -aN 
or aNA. Silent 'alif in -an may be indicated by A or omitted; if 
necessary it is supplied from the context.

ra^guluN → raǧulun, ra^guliN → raǧulin, ra^gulaN → raǧulan,
madInaTaN → madīnatan, ^gamIlaTaN → ǧamīlatan,
'i_daN → ʾiḏan, samA'aN → samāʾan.

There is a special case:

ribaNU → riban; `amruNU → ʿamrun, `amriNU → ʿamrin,
however: `amraN → ʿamran.



Tanwīn fatḥa is traditionally put on the last consonant even if a 
silent 'alif follows. Some modern conventions, and also Persian 
practice, require putting it on the 'alif in this case. This behaviour 
may be switched on by \newtanwin, and off by \oldtanwin. \newtanwin mode 
is the default for Persian.

ra^gulaN → raǧulan, 'i_daN → ʾiḏan.
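
To compare the two placements side by side, the same word can be set
under both conventions. This is a sketch only; giving the switches
between two Arabic Environments is our assumption and not a requirement
stated in this manual:

```latex
\setarab
\fullvocalize

\oldtanwin           % tanwin fat.ha on the last consonant
\begin{arabtext}
ra^gulaN
\end{arabtext}

\newtanwin           % tanwin fat.ha on the silent 'alif
\begin{arabtext}
ra^gulaN
\end{arabtext}
```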

A silent 'alif maqṣūra after tanwīn is written aNY or aN_A:

hudaNY → hudan, fataN_A → fatan;

compare:

al-hudY → al-hudā, 'al-fat_A → ʾal-fatā.

Tā' marbuṭa is denoted by T:

kalimaTuN → kalimatun, kalimaTiN → kalimatin,
kalimaTaN → kalimatan; fatATuN → fatātun,
fatATiN → fatātin, fatATaN → fatātan.

Hamza is indicated by '; the appropriate carrier is determined by the 
context:

'amruN → ʾamrun, 'ibiluN → ʾibilun, 'u_htuN → ʾuḫtun;
ra'suN → raʾsun, 'ar'asu → ʾarʾasu, sa'ala → saʾala,
qara'a → qaraʾa; bu'suN → buʾsun, 'ab'usuN → ʾabʾusun,
ra'ufa → raʾufa, ru'asA'u → ruʾasāʾu; bi'ruN → biʾrun,
'as'ilaTuN → ʾasʾilatun, ka'iba → kaʾiba, qA'imuN → qāʾimun,
ri'AsaTuN → riʾāsatun, su'ila → suʾila; samA'uN → samāʾun,
barI'uN → barīʾun, sU'uN → sūʾun, bad'uN → badʾun,
^say'uN → šayʾun, ^say'iN → šayʾin, ^say'aN → šayʾan;
sA'ala → sāʾala, mas'alaTuN → masʾalatun,
saw'aTuN → sawʾatun, _ha.tI'aTuN → ḫaṭīʾatun.



Old Hamza convention: In an older writing style that is used, e.g., in 
some Qur'an editions, the hamza is sometimes put below its carrier or on 
the connecting line. This style may be switched on by \oldhamza (and off 
again by \newhamza):

'as'ilaTuN → ʾasʾilatun, ka'iba → kaʾiba, qA'imuN → qāʾimun,
su'ila → suʾila, ^say'aN → šayʾan, _ha.tI'aTuN → ḫaṭīʾatun.
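
A passage quoted in the older style can be bracketed by the two
switches, as in the following sketch. The sample words are those
listed above; giving the switches outside the Arabic Environment is
our assumption:

```latex
\oldhamza            % hamza below its carrier or on the connecting line
\begin{arabtext}
'as'ilaTuN ka'iba qA'imuN
\end{arabtext}
\newhamza            % back to the modern convention
```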

Madda in the context 'ā is generated automatically:

'AkiluN → ʾākilun, qur'AnuN → qurʾānun, ra'Ahu → raʾāhu.

To reproduce the historic writing correctly, it can also be explicitly 
written in other contexts:

'a.sdiq~A'uh_u → ʾaṣdiqāʾuhū;
ya^g~I'u → yaǧīʾu, s~U'ila → sūʾila.

Šadda: A double consonant must be written twice, even if it is coded 
by more than one character:

nazzala → nazzala, ba^s^sAruN → baššārun, nawwara → nawwara,
sayyiduN → sayyidun, sa''AluN → saʾʾālun,
.sabiyyuN → ṣabiyyun, `aduwwuN → ʿaduwwun.

Instead of iyy, uww one can also write Iy, Uw:

.sabIyuN → ṣabīyun, `adUwuN → ʿadūwun.

Assimilation: the definite article may always be written al-; a 
following "sun letter" must be written twice as in the Arabic 
spelling. The transliteration and the use of sukūn are adjusted 
accordingly:

'al-ddAru → ʾad-dāru, 'al-rra^gulu → ʾar-raǧulu,
'al-ssanaTu → ʾas-sanatu, 'al-nnAru → ʾan-nāru;
'al-^gAru → ʾal-ǧāru, 'al-bAbu → ʾal-bābu;
'al-llaylaTu → ʾal-laylatu, 'al-llisAnu → ʾal-lisānu,
'al-ll_ahu → ʾal-lāhu.



The article may also be written in the assimilated form, with identical 
result:

'ad-dAru → ʾad-dāru, 'ar-ra^gulu → ʾar-raǧulu,
'as-sanaTu → ʾas-sanatu, 'an-nAru → ʾan-nāru.

In some special cases the literal spelling must be used:

'alla_dI → ʾallaḏī, 'alla_dIna → ʾallaḏīna, 'allatI → ʾallatī;

however:

'al-lla_dAni → ʾal-laḏāni, 'al-llatAni → ʾal-latāni,
'al-llawAtI → ʾal-lawātī.

Waṣla: an auxiliary vowel at the beginning of a word is always 
written, but in the middle of a sentence generally without hamza. If a 
vowel precedes the word, the auxiliary vowel will be omitted in the 
transliteration, and the waṣla sign will be used in the spelling:

wa-ismuhu → wa-'smuhu, f--a-in.sarafa → fa-'nṣarafa.2

This also works across word boundaries:

yA ibnI → yā 'bnī, h_a_dA ibnuh_u → hāḏā 'bnuhū,
qAla u_hru^g → qāla 'ḫruǧ.

An auxiliary vowel at the end of the preceding word may be separated by 
a hyphen:

qad-i in.sarafa → qad-i 'nṣarafa,
ra'aW-u al-bAba → raʾaw-u 'l-bāba,
min-i ibnih_i → min-i 'bnihī.

This also works for the article preceding 'alif al-waṣl:

'al-i-ismu → ʾal-i-'smu, 'al-i-i^stirA'u → ʾal-i-'štirāʾu,

and even if the auxiliary vowel is omitted in the spelling:

ra^guluN-i ibnatuh_u ^gamIlaTuN → raǧulun-i 'bnatuhū ǧamīlatun,
mu.hammaduN-i al-qura^sIyu → muḥammadun-i 'l-qurašīyu.

2 In vowelized writing, it may sometimes be advisable to introduce a 
kašīda to prevent the vowel marks from bumping into each other.



The particles li- and la- must be combined with the article except 
before lām:

lil-rra^guli → lir-raǧuli, lal-ma|^gdu → lal-maǧdu;3

however:

li-llaylaTi → li-llaylati, li-ll_ahi → li-llāhi.

The Name of God is written with a special ligature if it is recognized 
from the input sequence ll_ah:

'al-ll_ahu → ʾal-lāhu, ta-al-ll_ahi → ta-'l-lāhi.

Increased spacing (Tatwīl) between adjoining characters may be produced 
by a double hyphen --; note the position of the vowel marks:

qabila            قَبِلَ         qabila
qa--bi--la        قَـبِـلَ       qabila
q--ab--ila        قـَبـِلَ       qabila
q--a--b--i--la    قـَـبـِـلَ     qabila
qa----bi----la    قَــبِــلَ     qabila

Ties between words are indicated by a single hyphen:

bi-baladiN → bi-baladin, ta-al-ll_ahi → ta-'l-lāhi,
sa-ya'tI → sa-yaʾtī, li-yafra.ha → li-yafraḥa,
wa-iswadda → wa-'swadda, ba`da-mA → baʿda-mā,
.tAla-mA → ṭāla-mā, fI-ma → fī-ma, `alA-ma → ʿalā-ma.

A single hyphen at the beginning or end of a word will enforce the use 
of the joining form of the first or the last character, respectively, 
if that form exists (for special uses only):

Coding   Output   Transliteration

s        س        s
-s       ـس       -s
-s-      ـسـ      -s-
s-       سـ       s-
h        ه        h
-h       ـه       -h
-h-      ـهـ      -h-
h-       هـ       h-
d        د        d
-d       ـد       -d
lA       لا       lā
-lA      ـلا      -lā

1400 h- → 1400 h- (the number followed by hā' in its joining form)

Digit sequences are written in the natural order:

1234567890 → 1234567890

3 The ligature otherwise produced automatically looks ugly and has been 
broken by |.


Ligatures are generated automatically; they can be suppressed by |:

'al-'islAmu → ʾal-ʾislāmu;
'al-^gAru → ʾal-ǧāru, 'al|^gAru → ʾalǧāru;
_tumma → ṯumma, _tu|mma → ṯumma;
mu.hammaduN → muḥammadun, mu|.ha|mmaduN → muḥammadun.

Abbreviations and emphasis are indicated by \emphasize:

\emphasize .sl`m → ṣlʿm

\emphasize ab||^g → abǧ

If necessary, use grouping by curly braces:

\emphasize {`alayhi as-salAmu} → ʿalayhi 's-salāmu

Appendix G

Coding examples for

Persian1

The short vowels ? (>=a), e (>=?), o (>=u) are denoted by the lowercase 
letters a, e or i, o or u:

bar → bar, beh → beh, bon → bon.

The long vowels a (ā), i (ī, ē), u (ū, ō) are denoted by the 
capital letters A, I or E, U or O. Älef m?dde is automatically generated 
for word-initial a:

Ab → āb, bAd → bād, bId → bīd, bUd → būd.

Note that I yields a ya-ye m?`ruf (with zir), whilst E yields a ya-ye 
m?jhul (without zir). Similarly, U yields a waw-e m?`ruf (with pi<=s), 
whilst O yields a waw-e m?jhul (without pi<=s):

tIr → tīr, tE.g → tēġ; dUr → dūr, zOr → zōr.

The diphthongs ?e? and öu are written ay and aw:

pay → pay, naw → naw.

Intervocalic h.?mze is written ':

pA'Iz → pāʾīz; miyA'I → miyāʾī, mIgU'I → mīgūʾī;

1We gratefully acknowledge the voluntary help by Ivan Derzhanski who 
wrote this chapter, and implemented the language-specific processing. As 
we extensively modified his routines during system integration, all 
responsibility for any remaining, or new, errors rests with us.


tawAnA'I → tawānāʾī, zanA^sU'I → zanāšūʾī.

Silent word-final waw is generated by _U or O:

t_U → tu, d_U → du; tO → tō, dO → dō.

Waw-e m?`dul is written w; it is omitted in the transliteration and the 
preceding xe receives no j?zm:

_hwAb → ḫāb, _hwI^s → ḫīš, _hwod → ḫod.

Ha-ye h?ww?z-e m?xfi is generated by H, or optionally by ,e, ,a or ,A. 
It does not receive a j?zm even in fully vocalised mode and is not 
joined to a following letter:

_hAneH → ḫāneh, ^c,e → čeh, naH → nah,
yal_aH → yalāh, yal,A → yalāh,
_hAneHhA → ḫānehhā, _hAneH-hA → ḫāneh-hā.

Short eḍafe is written -e or -i:

ketAb-e U → ketāb-e ū, rAh-e t_U → rāh-e tu,
nAmeH-i man → nāmeh-i man,
bInI-e An mard → bīnī-e ān mard,
pA-i In zan → pā-' īn zan,
bAzU-i In zan → bāzū-' īn zan.

Long eḍafe is written -_i:

dAr-_i man → dār-ī man, _hU-_i t_U → ḫū-ī tu.

H.?mze as ya-ye w?h.d?t/nesb?t/xet.ab is likewise written -_i:

nAmeH-_i → nāmeh-ī, sormeH-_i → sormeh-ī,
gofteH-_i → gofteh-ī.

Ye-ye w?h.d?t is written -I or -E:

ketAb-I → ketāb-ī, rAh-I → rāh-ī, nAmeH-I → nāmeh-ī;
dAnA-I → dānā-ī, pArU-I → pārū-ī;
dAnA-I-keH → dānā-ī-keh, pArU-I-keH → pārū-ī-keh.


The present tense forms of the verb bud?n and the pronominal clitics are 
written as they are spoken:

rafteH-am → rafteh-am, rafteH-Im → rafteh-īm,
rafteH-I → rafteh-ī, rafteH-Id → rafteh-īd,
rafteH-ast → rafteh-ast, rafteH-and → rafteh-and;
mard-Id → mard-īd, asb-etAn → asb-etān;
An^gA-st → ānǧā-st, U-st → ū-st, t_U-st → tu-st;
ketAb-I-st → ketāb-ī-st, nAmeH-I-st → nāmeh-ī-st.

The preposition be- can be written with or without a hyphen:

be-man → be-man, be-t_U → be-tu;
be-An → be-ān, be-In → be-īn, beU → beū.

The components of compounds can be separated by || or "|:

.sA.heb||_hAneH → ṣāḥebḫāneh,
ta_ht-e-"|_hwAb → taḫt-e-ḫāb;
pas||andAz → pasandāz, naw||AmUz → nawāmūz,
bI||_hwod → bīḫod.


Appendix H

Alternate input encodings

H.1 ASMO 449 = ISO 9036

The file asmo449.sty contains a reading module for the ASMO 449 code 
(identical to ISO 9036). It is installed by the LaTEX option asmo449 or 
by \input asmo449.sty. The module is activated by \setcode {asmo449} or 
\setcode {iso9036}; all following Arabic text will be considered to be 
coded according to the ASMO 449 standard. The ArabTEX notation may be 
reactivated by \setcode {arabtex}.

ASMO 449 (see Table H.1) is a 7-bit code, differing from ASCII (ISO 646) 
mainly by replacing the letters by the Arabic letter characters and 
diacritical marks; the Arabic digits share their positions with the 
ASCII digits. The positions of special and control characters in both 
codes are identical.

A minimal driver file for processing, e.g., a file asmotext.dat could be 
structured as follows:

\documentstyle [arabtex,asmo449]{article}
\begin {document}
\setcode {asmo449}
\begin {arabtext}
\input asmotext.dat

% the preceding blank line is required if "asmotext.dat" did not
% end with a blank line itself; this is strange and embarrassing
\end {arabtext}
\end {document}


      0     1     2     3     4     5     6     7

00    NUL   DLE   SP    0     @     ذ     ـ     ◌ِ
01    SOH   DC1   !     1     ء     ر     ف     ◌ّ
02    STX   DC2   "     2     آ     ز     ق     ◌ْ
03    ETX   DC3   #     3     أ     س     ك
04    EOT   DC4   $     4     ؤ     ش     ل
05    ENQ   NAK   %     5     إ     ص     م
06    ACK   SYN   &     6     ئ     ض     ن
07    BEL   ETB   '     7     ا     ط     ه
08    BS    CAN   )     8     ب     ظ     و
09    HT    EM    (     9     ة     ع     ى
10    LF    SUB   *     :     ت     غ     ي
11    VT    ESC   +     ؛     ث     ]     ◌ً     }
12    FF    IS4   ،     >     ج     \     ◌ٌ     |
13    CR    IS3   -     =     ح     [     ◌ٍ     {
14    SO    IS2   .     <     خ     ^     ◌َ     ~
15    SI    IS1   /     ؟     د     _     ◌ُ     DEL

Table H.1: ASMO 449 code table


Because texts coded in ASMO 449 are always rendered verbatim, the 
commands \novocalize, \vocalize, \fullvocalize and the language 
selection commands \setarab etc. have no meaning here and are 
temporarily disabled.

Texts in ASMO 449 are usually not fully vowelized, so the 
transliteration cannot be expected to be correct. This is especially 
true for Egyptian texts, which commonly do not differentiate between yā' 
and 'alif maqṣūra.

H.2 ASMO 449E = ISO 8859-6

The file iso88596.sty contains a reading module for the ISO 8859-6 code 
(extended ASMO 449 = ASMO 449E). It is installed by the LaTEX option 
iso88596 or by \input iso88596.sty. The module is activated by \setcode 
{iso8859-6}; all following Arabic text will be considered to be coded 
according to the ISO 8859-6 standard. The ArabTEX notation may be 
reactivated by \setcode {arabtex}.

ISO 8859-6 (see Table H.2) is an 8-bit code closely related both to 
7-bit ASCII and to ASMO 449; whereas the lower 128 positions are 
identical to ASCII (ISO 646), the upper 128 positions contain the Arabic 
characters of ASMO 449 in the analogous places, plus a few additional 
graphic and control characters.

We exploit the close relationship of these codes by reusing the ASMO 449 
reading routines, after suitable modification of the input. This only 
works correctly if the input text does not contain genuine ASCII 
letters, as we project the Arabic characters onto their locations in 
ASMO 449. Some of the code switching messages in the log file are 
spurious and can safely be ignored.

The notes on vowelization and transliteration given for ASMO 449 also 
apply here.

The driver file indicated for ASMO 449 will be usable after the obvious 
modifications; however, your TEX installation must be capable of 
processing 8-bit data input. This is nowadays usually the case; 
otherwise you can try to locally find some utility program that will 
strip the highest order bit off the characters in your file, and process 
the result via ASMO 449.
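
As a sketch of what the obvious modifications might look like, the ASMO 
449 driver could be adapted as shown below. The option name iso88596 and 
the code name iso8859-6 are those given above; the file name isotext.dat 
is only a placeholder for an 8-bit input file and is not part of the 
distribution.

```latex
\documentstyle [arabtex,iso88596]{article}
\begin {document}
\setcode {iso8859-6}
\begin {arabtext}
\input isotext.dat

% as in the ASMO 449 driver, the blank line above guards against
% an input file that does not itself end with a blank line
\end {arabtext}
\end {document}
```

The structure is deliberately identical to the ASMO 449 driver, so that 
only the option, the code name and the input file differ.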


    00   01   02   03   04   05   06   07   08   09   10   11   12   13   14   15

00  NUL  DLE  SP   0    @    P    `    p              NBSP           ذ    ـ    ◌ِ
01  SOH  DC1  !    1    A    Q    a    q                        ء    ر    ف    ◌ّ
02  STX  DC2  "    2    B    R    b    r                        آ    ز    ق    ◌ْ
03  ETX  DC3  #    3    C    S    c    s                        أ    س    ك
04  EOT  DC4  $    4    D    T    d    t              ¤         ؤ    ش    ل
05  ENQ  NAK  %    5    E    U    e    u                        إ    ص    م
06  ACK  SYN  &    6    F    V    f    v                        ئ    ض    ن
07  BEL  ETB  '    7    G    W    g    w                        ا    ط    ه
08  BS   CAN  )    8    H    X    h    x                        ب    ظ    و
09  HT   EM   (    9    I    Y    i    y                        ة    ع    ى
10  LF   SUB  *    :    J    Z    j    z                        ت    غ    ي
11  VT   ESC  +    ;    K    ]    k    }                   ؛    ث         ◌ً
12  FF   IS4  ,    >    L    \    l    |              ،         ج         ◌ٌ
13  CR   IS3  -    =    M    [    m    {              SHY       ح         ◌ٍ
14  SO   IS2  .    <    N    ^    n    ~                        خ         ◌َ
15  SI   IS1  /    ?    O    _    o    DEL                 ؟    د         ◌ُ

Table H.2: ISO 8859-6 code table


Appendix I

Miscellaneous utilities

The following packages are not part of ArabTEX proper and are not 
supported in any way, but are distributed along with ArabTEX as a 
possible convenience to users. There is no warranty whatsoever.

I.1 twoblks.sty

This LaTEX option will define a command \twoblocks {#1}{#2} which will 
place the two parameters #1 and #2, usually two paragraphs, into two 
boxes side by side, separated by space of length \colsep. If necessary, 
the resulting boxes will be split across a page boundary.

This feature is useful if two versions of a text are to be compared. 
They may be in different languages, and one of them might be in Arabic 
(if enclosed in \begin {arabtext} ... \end {arabtext}).

This sentence has been written
twice: in the English language and
in the Arabic language.

[Second block: the same sentence typeset in Arabic script.]

Otherwise this command does not depend on ArabTEX in any way, and indeed 
originated in a completely different context.

Beware that each of the two "blocks" should contain not much more than 
one, not too long, paragraph of text; otherwise TEX's main storage might 
overflow. There must be no \verbatim text inside the parameters of 
\twoblocks, nor any \catcode changes; and all TEX groups and \if ... 
\fi sequences must be properly nested.
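
A minimal sketch of how \twoblocks might be called is given below. It 
assumes that the option is loaded under the name twoblks, in the same way 
as the other options in this manual, and that the Arabic paragraph is 
wrapped in an arabtext environment as described above. The word kitAb is 
only a placeholder in ArabTEX notation, and no attempt is made to change 
\colsep.

```latex
\documentstyle [arabtex,twoblks]{article}
\begin {document}
% first parameter: an ordinary (here English) paragraph
% second parameter: an Arabic paragraph inside arabtext;
% "kitAb" is only a placeholder word in ArabTeX notation
\twoblocks
  {This paragraph forms the first block.}
  {\begin {arabtext} kitAb \end {arabtext}}
\end {document}
```

Each parameter holds a single short paragraph and a properly closed 
environment, in keeping with the restrictions stated above.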


I.2 abjad.sty

This file, loaded as a LaTEX option, will define a command \abjad {#1} 
usable inside and outside of an Arabic context. It profited greatly from 
suggestions by Dr. Benno van Dalen (Utrecht University).

The command \abjad {#1} will convert its argument, which has to be a 
legal representation of a number between 1 and 1999, to the Arabic 
'abǧad notation used in some mediaeval manuscripts. The result of the 
conversion will not look perfect, and the legal 'abǧad number 0 cannot 
presently be generated.
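
As an illustration, a document using \abjad might look like the 
following sketch. It assumes that the option is loaded under the name 
abjad alongside arabtex; the number 1993 is an arbitrary value within 
the permitted range.

```latex
\documentstyle [arabtex,abjad]{article}
\begin {document}
% \abjad accepts a number between 1 and 1999
The year 1993 in 'abjad notation: \abjad {1993}
\end {document}
```

Here the command is used outside of an Arabic context; according to the 
description above it should be equally usable inside one.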

Improving this routine needs a font revision, which is hard and tedious; 
whenever this happens, the command might well become part of ArabTEX 
proper.

I.3 MLS2ARAB

This is a UNIX SED script, written by Prof. Nicholas Heer (University 
of Washington), and released for free distribution. It will (almost) 
convert an ASCII file of Arabic text, produced by Multi-Lingual Scholar, 
to the ArabTEX input notation. The conversion is not perfect, so some 
manual corrections might be necessary.

For operating instructions, see the file itself.


Index

" (quoting), 15
"|, 14, 15
$, 7
--, 15
\ , 8
\\, 8
\abjad, 53
\arabfalse, 19
\arabtrue, 19
\begin{arabtext}, 6, 34
\begin{setcode}, 18
\bigskip, 8
\centerline, 9
\colsep, 52
\doassign, 9
\docommand, 9
\emphasize, 8
\end{arabtext}, 6, 34
\end{setcode}, 18
\footnote, 8
\fullvocalize, 13, 15, 16, 34
\hfill, 8
\hspace, 8
\indent, 8
\input, 8, 18
\input arabtex.tex, 5
\input atrans.sty, 20
\input etrans.sty, 20
\ligsfalse, 16
\ligstrue, 16
\magnification, 5
\marginpar, 8
\mbox, 8
\medskip, 8
\newhamza, 41
\newtanwin, 17, 40
\noindent, 8
\nospace, 8
\novocalize, 15–17
\oldarabtex, 35
\oldhamza, 41
\oldtanwin, 17, 40
\par, 6, 8
\pnash, 5
\pnashbf, 5
\quiet, 26
\setarab, 6, 10, 13, 21
\setcode, 18
\setcode{arabtex}, 18, 48, 50
\setcode{asmo449}, 48
\setcode{iso8859-6}, 50
\setcode{iso9036}, 48
\setfarsi, 10, 21
\setmaghribi, 10, 21
\setnash, 8, 11
\setnashbf, 8, 11
\setnastaliq, 8, 11
\setnone, 10, 21
\setpashto, 10, 21, 23
\seturdu, 10, 21, 23
\setverb, 10, 21, 24
\showfalse, 27
\showtrue, 27
\smallskip, 8
\space, 8
\spreadbox, 8
\spreadfalse, 25
\spreadline, 9
\spreadtrue, 25
\tracingarab, 26
\transfalse, 19
\transtrue, 19
\twoblocks, 52
\vocalize, 15, 16, 34
\yahdots, 25
\yahnodots, 25
>, 10, 21
\|, 8
|, 14–16
|B, 15
|BB, 15
||, 14, 15

` (`ayn), 14
' (hamza), 13

A, 12, 17, 38
'A, 14, 16, 41
,A, 46
_A, 12, 17, 38
~A, 14
,a, 22, 23, 46
_a, 12, 16, 39
a (fatḥa), 12, 38
aa, 12, 38
abbreviation, 44
abjad.sty, 53
'abǧad numbers, 53
Afghanic, 22
`ayn, 14
al-, 14, 19
'alif, 17
  dagger, 12, 16, 39
  initial, 17
  maqṣūra, 12, 17, 38, 40
    silent, 17, 40
  Qur'an, 16, 39
  silent, 17, 19, 38–40
  small, 16, 39
    below, 16, 39
'Allah (spelling), 43
aN, 13, 17, 39
aN_A, 17, 40
aNA, 13, 17, 39
aNY, 40
Arabic context, 6, 7
Arabic environment, 6
Arabic group, 7
Arabic item, 6
Arabic number, 7
Arabic quotation, 6
Arabic quotes, 7
Arabic word, 7
arabtex.tex, 5
ArabTEX commands, 7, 8
archaic text, 25
ASCII, 48, 50
ASMO 449, 18, 48, 50
aspirated consonant, 22
assignment, 9
assimilation, 14, 16, 19, 41
automatic stretching, 25
aW, 38
aw, 45
aWA, 38
ay, 45

B, 15
be-, 47
boxing commands, 8
breaking connections, 15

code
  7-bit, 48
  8-bit, 50
  arabtex, 18
  ASCII, 48, 50
  ASMO 449, 18, 48, 50
  ISO 646, 48, 50
  ISO 8859-6, 18, 50
  ISO 9036, 18, 48
coding conventions, 12, 34
commands
  ArabTEX, 7, 8
  boxing, 8
  illegal, 9
  internal, 5
  LaTEX, 7
  overview, 9
  size changing, 5, 8, 11
  TEX, 7
  user defined, 5, 9
compounds, 47
copyright, 0, 32

dagger 'alif, 12, 16
ḍamma, 12, 15, 16
  inverted, 16, 22, 39
Dari, 21
date, 15
default font, 5, 11
defective writing, 12, 16, 39
definite article, 14, 19, 41
Derzhanski, Ivan, 45
diacritics, 16
diphthongs, 45
display mode, 7
dots on yā', 22, 25

E, 21
-E, 22
,e, 22, 23, 46
-e, 22
EDMAC, 27
emphasis, 44
environment
  Arabic, 6, 18
  arabtext, 6, 18
  setcode, 18
  tabbing, 6

Farsi, 21
fatḥa, 12, 15, 16
Fischer, Wolfdietrich, 38
font
  bold, 11
  default, 5, 11
  installation, 33
  nash10, 34
  nash14, 32–34
  nash14bf, 33
  naskh, 11, 32, 33
  nasta`liq, 22, 32
  selection, 11
  unavailable, 11

grouping, 7, 44

H, 21–23, 46
h-, 15
hamza, 13, 15, 22, 40, 45, 46
  carrier, 17, 40
  old style, 41
ḥarakāt, 12, 15, 16, 38, 45
  on tatwīl, 15
Heer, Nicholas, 53
hyphen, 15, 43

I, 12, 38
-I, 22
~I, 14
-i, 22
_i, 12, 16, 39
i (kasra), 12, 38
implementation
  Mac, 32
  PC, 32
  U*IX, 32
iN, 13, 39
input switching, 18
insertion
  mathematical, 7
  non-Arabic, 7
  Roman, 7
installation, 33
internal commands, 5
inverted d.amma, 16, 22
invisible consonant, 14
ISO 646, 48, 50
ISO 8859-6, 50
ISO 9036, 48
iy, 12, 38
iẓāfet, 15, 22, 23, 46

kašīda, 15, 25, 43
kasra, 12, 15, 16
Kurdish, 21

la-, 43
language selection, 10
LaTEX commands, 7
li-, 43
ligature, 16, 34, 44
  breaking, 14–16, 44
lists, 9
long vowels, 12, 16

Macintosh, 32
madda, 14, 16, 45
Maghribi, 23
mathematical insertion, 7
METAFONT, 33
MLS2ARAB, 53
Multi-Lingual Scholar, 53

N, 15, 16, 19
naskh, 11, 32, 33
nasta`liq, 22, 32
nesting, 7, 9
NFSS, 33
NFSS2, 33
non-Arabic insertion, 7
NU, 13, 39
numbers, 43
  'abǧad, 53

O, 21, 46
option
  abjad, 53
  arabtex, 5
  asmo449, 18
  atrans, 20
  etrans, 20
  iso88596, 18
  nashbf, 11
  nastaliq, 11
  oldarabtex, 35
  twoblks, 52
Ottoman, 21

Pashto, 22, 23
PC implementation, 32
Persian, 21
Persian copula, 22
piš, 45
punctuation, 6

quotation
  Arabic, 6
  non-Arabic, 7
  Roman, 7
quoting, 13, 15, 16
Qur'an 'alif, 16

reading module, 18
Roman insertion, 7

šadda, 14, 16, 41
  on tatwīl, 15
short vowels, 12
silent 'alif, 17, 19
size changing, 5, 8, 11
special codings, 25
stretching, 8, 15, 25
  automatic, 25
sukūn, 15, 16, 22, 34, 46
  on lām, 14
  on tatwīl, 15
sun letter, 14

T, 40
tabbing environment, 6
tā' marbuṭa, 40
tanwīn, 13, 15–17, 19, 39, 40
  fatḥa, 40
  on tatwīl, 15
tašdīd, 14, 16
tatwīl, 15, 43
TEX commands, 7
TEX hash size, 5, 33
text
  archaic, 25
  erroneous, 25
TEX-XET, 33
transliteration, 12, 19, 34
  Encyclopedia of Islam, 20
  ZDMG, 19
twoblks.sty, 52

U, 12, 19, 38
_U, 39, 46
~U, 14
_u, 12, 16, 39
u (ḍamma), 12, 38
U*IX implementation, 32
UA, 17, 38
uN, 13, 39
unavailable font, 11
Urdu, 22, 23
user defined commands, 5, 9
uw, 12, 38

van Dalen, Benno, 53
verbatim, 17
vowel marks, 16
vowels
  long, 12, 16, 38, 45
  short, 12, 38, 45

W, 19
WA, 17
waṣla, 15, 16, 19, 34, 42

Y, 12, 38
yā'
  dots, 22, 25
yā'-i-waḥdat, 22, 23, 46

zīr, 45
zwarakay , 22