We do assume basic knowledge of set theory
Given two sets A and B. What does AxB mean? read : A cross B
What does A* mean? read : A Kleene Star
Foundations
In this post, we discuss the notion of an alphabet and a language.
What is an Alphabet?
English Alphabet
First lets go over the natural notion of an alphabet. So what do we know about the English alphabet?
The characters in the English alphabet are called letters.
Letters can be seen as having a “length of 1” For example; “ab” is not a letter because it’s length is two.
There are 26 letters in the alphabet.
Each letter has two representations, capital “A” and lowercase “a”
Concatenating letters together form a word.
There is a special character called a “space” which separates words.
A collection of words along with spaces in between them, form a sentence.
Alphabets in Computability
The notion of an alphabet that we will use in computability is quite similar. Lets use the numbered list above to contrast and compare.
The characters in the alphabet are called symbols. We will explicitly define the alphabet when it cannot be inferred from the context. This is similar to real life, in that many natural languages have explicitly defined their alphabet and the alphabet can usually be inferred from the language being spoken.
For example; If I spoke Chinese, you could infer that I am using the Chinese alphabet.
Symbols have a length of 1. This is similar to the English alphabet since letters always have a length of 1.
The number of symbols in our alphabet, will vary depending on the alphabet being used.
Symbols in the defined alphabet have one canonical representation. If a different representation was being used, this would be another symbol. For example; “A” and “a” are two different symbols in computability. In the English language however, “A” and “a” are the same letters.
A finite string of symbols form a word.
There is a special symbol called a “blank symbol” which separates words.
If there are blanks in between words, this is seen as the same word. There is no notion of a sentence here.
Example 1
We define the alphabet $\Sigma = \{a,b,c, \wedge \}$
Notice that we also defined our blank symbol $\wedge$ in the alphabet. Whereas in the English alphabet, the “space” is not included as a part of the alphabet. You will also see alphabets where the “blank symbol” is not defined. The difference is that this means it cannot be used. Succinctly; the only symbols we can use are those defined in the alphabet.
Lets list a few properties of the alphabet that we have just defined:
- Our alphabet has 4 symbols.
- We define the set of all possible words that can derived from our alphabet $\Sigma$ as $\Sigma^{*}$
- There is a special empty word which we denote $\epsilon$ of length 0, where $\epsilon \in \Sigma^{*}$.
The intuition behind the empty word, is that it can be formed by using no letters of a given alphabet. It is therefore a symbol in every alphabet, which can be created by using none of that alphabet’s letters. Note the similarity between the set theoretic notion of the empty set.
Example 2
We now define the alphabet that is predominantly used throughout computer science. $\Sigma = \{0,1\}$ .
$\Sigma^{*}$ is the set of all binary strings of any length.
The following elements are all in $\Sigma^{*}$ : $01,000,111,10101, \epsilon$
Example 3
We now define the alphabet that contains all of the lowercase letter of the English alphabet. $\Sigma = \{a,b,c,d,e,f…,z\}$
$\Sigma^{*}$ is the set of all combinations of these symbols.
The following elements are all in $\Sigma^{*}$ : a, ab, abcd, add, police, wpkl
Notice that not all elements in $\Sigma^{*}$ above formed dictionary recognised words.
“add” and “police” were recognisable. However, “abcd” was not.
To put succinctly, not all words in $\Sigma^{*}$ will be used in a language.
What is a Language?
A language defined over an alphabet $\Sigma$ is a subset of $\Sigma^{*}$
Another way to look a it is, that a language is the set of all values that return true from some predicate function P.
A predicate function is a function which takes input and returns true or false.
$L = \{w \in \Sigma^* | P(w) == true\}$
The above mathematical syntax roughly says: take a word from $\Sigma^*$ and give that word to the function P. If P returns true, then that word is a part of the set L. L is our language.
Notice: If the predicate function is changed, we form a new language from $\Sigma^*$
Examples of Predicates
- P is the predicate that returns true on all even numbers
- P is the predicate that returns true on all words that have a length < 2.
- P is the predicate that returns true on every word in the English dictionary. This function could then be used to form the English language!