29.9.11

Data and processes in computing: Summary

Summary

1. We discussed forms of data and processes relevant to an electronic till in a supermarket. In particular, we introduced the idea of a sequence of data items.

2. A number of fundamental forms of data were introduced. We distinguished two types of number: integers (positive or negative whole numbers, or 0), and real numbers (thought of as decimal numbers and approximated in computers as floating point numbers). Characters may be thought of as symbols that may be entered from a computer keyboard by a single keystroke. Each character is associated with an integer code and we introduced one such encoding called the ASCII code. The Boolean values true and false form another fundamental form of data.

Data may be structured in a collection. Different forms of collection are possible. We looked at two: sets and sequences. In a sequence, the order in which the items appear in the collection is important and an item may appear more than once. In a set, one is only interested in the different items appearing in the collection, and the order in which these items are listed is of no significance, nor is repetition of an item.

5 Operations and comparisons

Seeing processes as functions

Addition of numbers is a process that one would expect a computer to be able to perform. Now we write the result of adding the numbers 5 and 2 as 5 + 2, for example. The symbol +, which represents the process of addition, appears between the two numbers being added. This is known as infix notation. Infix notation may be used for processes that combine two data items of the same type. Addition, subtraction and multiplication of numbers are familiar examples. We also use infix notation when writing a comparison of the size of two numbers, such as 5 < 9.
In this section, we shall show that a process such as addition of numbers is a function. Perhaps more surprisingly, a process such as < can also be seen as a function. This perception is a necessary preliminary to seeing how such a process can be implemented by a computer.
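As an informal illustration (the unit does not prescribe a programming language; Python and its standard operator module are assumed here), the infix expressions 5 + 2 and 5 < 9 can each be rewritten as a function applied to two inputs:

```python
import operator

# The same addition written in infix notation and as a function call.
infix_sum = 5 + 2
function_sum = operator.add(5, 2)

# The comparison "<" can also be seen as a function: it takes two
# numbers as input and returns a Boolean value.
infix_comparison = 5 < 9
function_comparison = operator.lt(5, 9)

print(function_sum)         # 7
print(function_comparison)  # True
```

In both cases the function form makes the inputs and the resulting value explicit, which is the view of a process taken in this section.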

4 Processes

Processes that can be applied to data

Having looked at some forms of data, we now turn our attention to processes that can be applied to data. Each process that we consider in this section will input data of a specified form, and will result in a corresponding value. For example, one process, which we will call ASC, takes a character as input, and has as its resulting value the integer giving the ASCII code of the input character (as listed in Table 2). Another process, which we will call SIZE, takes a sequence as input, and has as its resulting value the length of the sequence (which will be an integer).
It is important to distinguish between a description of the outcome required when a process is applied to a form of data, and a description of the exact steps to be taken to achieve the desired outcome. Here, we are concerned only with the first of these; that is, with providing an overview of a computing process. You might like to think of this as a “black box” view of the process. We do not, at this stage, care how one obtains the output value.
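The two processes named above can be sketched as Python functions. This is only one possible realisation: the lower-case names asc and size, and the choice to reject anything other than a single ASCII character, are assumptions made for the sketch. A caller needs to know only the form of the input and of the resulting value, which is the “black box” view.

```python
def asc(character: str) -> int:
    """Take one character as input; return its integer ASCII code."""
    if len(character) != 1 or ord(character) > 127:
        raise ValueError("asc expects exactly one ASCII character")
    return ord(character)


def size(sequence) -> int:
    """Take a sequence as input; return its length (an integer)."""
    return len(sequence)


print(asc('A'))                 # 65
print(size(['W', 'o', 'r', 'd']))  # 4
```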

Objectives for Section 3

After studying this section you should be able to do the following.

· Recognise and use the terminology: disjoint union; power set (of a set); representation (of a data abstraction).

· Use and interpret the notation:

o X ⊔ Y for the disjoint union of the sets X and Y;

o SeqOfSetOfInt for the set consisting of all sequences whose members are sets of integers (and similar notations).

Exercises on Section 3

Exercise 7

Each of (a)–(c) is a member of one of the sets given in (i)–(iii). Say which item comes from which set.
Sets: (i) SetOfSeqOfChar. (ii) SeqOfSetOfChar. (iii) SeqOfSeqOfChar.
(a) {“error1”, “error2”, “error3”}.
(b) [“error1”, “error2”, “error3”].
(c) [{‘e’,‘1’}, {‘T’}, {‘q’,‘w’,‘e’,‘r’,‘t’,‘y’}].

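Answer

(a) is a set of strings, so it comes from (i) SetOfSeqOfChar. (b) is a sequence of strings, so it comes from (iii) SeqOfSeqOfChar. (c) is a sequence whose members are sets of characters, so it comes from (ii) SeqOfSetOfChar.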

3.4 Representing data in applications

Suppose that you are designing software for some application. You will be working with a programming language that enables you to communicate instructions to a computer. In this programming language, certain forms of data will already be represented electronically. These will include common forms of data, such as numbers, characters and sequences. In any particular application, you are likely also to be concerned with forms of data that are peculiar to that application. Having identified some form of data that you need to be able to handle, you will then need to represent this in terms of what is available; that is, in terms of forms of data that have already been represented electronically.
As a simple example, imagine that integers, characters and strings are available forms of data, and that you want to represent the days of the week.
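One possible representation, sketched in Python, uses the integers 1 to 7 for the days, with strings used only for display. The numbering (starting at Monday) and the names DAY_NAMES, day_to_int and int_to_day are illustrative assumptions, not taken from the unit.

```python
# Assumed convention: 1 represents Monday, ..., 7 represents Sunday.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_to_int(name: str) -> int:
    """Return the integer chosen to represent the named day."""
    return DAY_NAMES.index(name) + 1


def int_to_day(number: int) -> str:
    """Return the name of the day represented by an integer from 1 to 7."""
    if not 1 <= number <= 7:
        raise ValueError("a day is represented by an integer from 1 to 7")
    return DAY_NAMES[number - 1]


print(day_to_int("Wednesday"))  # 3
print(int_to_day(7))            # Sunday
```

Note that not every integer represents a day, so a representation like this has to say which values are valid.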

3.3 Mixing different forms of data: disjoint union of sets

At the supermarket checkout, some items need to be weighed (organic courgettes for example) and some do not. Let BarcodedItems be the set of items that do not need to be weighed, and WeighedItems be the set of items that must be weighed. When a weighed item is recorded at the till, we must record both the item type and the weight of the item that has been purchased. Earlier, we saw that such a purchase can be seen as an ordered pair, such as (“WALNUTS”, 335), that comes from the set WeighedItems × Int.
Suppose now that we want to form the set of all items that might appear in a transaction at a till. We might call this set TillItems. Specifying this set TillItems poses a complication, since there are two different types of element that might appear in it. An item from the set TillItems will come either from BarcodedItems or from WeighedItems × Int. We express this relationship by saying that TillItems is the disjoint union of BarcodedItems and WeighedItems × Int. We write this as:
TillItems = BarcodedItems ⊔ (WeighedItems × Int).
You can read X ⊔ Y as “X or Y”.
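A disjoint union can be sketched in Python as a value that is one of two differently shaped records. The class names Barcoded and Weighed, the function describe and the item “WHEAT FLAKES” are hypothetical choices made for this illustration; only the pair (“WALNUTS”, 335) comes from the text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Barcoded:
    """An element of BarcodedItems: the item type alone."""
    item: str


@dataclass(frozen=True)
class Weighed:
    """An element of WeighedItems × Int: an item type paired with a weight."""
    item: str
    weight: int


# A till item is either a Barcoded value or a Weighed value.
TillItem = Union[Barcoded, Weighed]


def describe(till_item: TillItem) -> str:
    """Handle each of the two forms of till item separately."""
    if isinstance(till_item, Barcoded):
        return f"{till_item.item} (barcoded)"
    return f"{till_item.item}, weight {till_item.weight}"


transaction = [Barcoded("WHEAT FLAKES"), Weighed("WALNUTS", 335)]
for entry in transaction:
    print(describe(entry))
```

Because every element records which of the two sets it came from, a process applied to a till item can always tell which form of data it has been given.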

3.2 Combining data structures

In Section 2, we introduced the notation SeqOfX for the set of all sequences whose members come from the set X. There, we looked only at sequences whose members were of one of the primitive forms of data (integers, characters or Booleans). We can also have sequences whose members are themselves data with a more complicated form. For example, suppose that Jo is working at the till T1 and is replaced by Jessica. We might represent this handover by the 3-tuple (Jo, T1, Jessica). Now suppose that we want to give all the handovers that occur during a particular day, in the order in which they occur. We could give this information in a sequence. This sequence would come from the set SeqOfX, where X is the Cartesian product Staff × Tills × Staff.
As another example, one might think of a sentence as a sequence of words, where each word is seen as a sequence of characters. If we did this, then the sentence would be regarded as coming from the set SeqOfSeqOfChar. We can use notations introduced earlier to show when we want to see a sentence in this way, and when it is to be regarded as a single string (from SeqOfChar).
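The two views of a sentence, and a sequence of handovers, can be sketched with Python lists and tuples. The sentence text and the second handover are made-up examples; splitting on spaces is an assumed way of finding the words.

```python
# A sentence regarded as a single string (a member of SeqOfChar).
sentence_as_string = "the cat sat"

# The same sentence regarded as a sequence of words, each word being a
# sequence of characters (a member of SeqOfSeqOfChar).
sentence_as_words = [list(word) for word in sentence_as_string.split()]

# A day's handovers: a sequence whose members are 3-tuples
# from Staff × Tills × Staff.
handovers = [("Jo", "T1", "Jessica"), ("Jessica", "T1", "Wesley")]

print(len(sentence_as_string))  # 11 characters, counting the spaces
print(len(sentence_as_words))   # 3 words
print(sentence_as_words[1])     # ['c', 'a', 't']
```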

3 Combining forms of data

Structuring data

In Section 2, we considered integers, characters and truth values. We shall refer to these as primitive forms of data. We also looked at two forms of data collection, sets and sequences, and at the association of different data items in a tuple. In this section, we will look briefly at some other ways in which data may be structured.

3.1 Sets of sets

In Section 2, all the sets and sequences we considered had primitive forms of data as their elements. However, sets and sequences may contain non-primitive forms of data. Let us look first at a situation in which we may find it useful to have a set whose members are themselves sets.
Think again about a shop with just three members of staff, given in the set Staff = {Jo, Jessica, Wesley}. Now let atWorkStaff be the set of staff currently at work. Clearly, atWorkStaff may take a range of values. If just Jo and Wesley are at work, then atWorkStaff is {Jo, Wesley}. If Jessica were to start work and Jo were to leave, then atWorkStaff becomes {Jessica, Wesley}.
Now any combination of staff from the set Staff = {Jo, Jessica, Wesley} might be at work at the same time. Possibilities include all these staff (when atWorkStaff is {Jo, Jessica, Wesley}), and none of them (in which case atWorkStaff is { }). The full list of possibilities (giving the possible values of atWorkStaff) is given below.
{ }, {Jo}, {Jessica}, {Wesley}, {Jo, Jessica}, {Jo, Wesley}, {Jessica, Wesley}, {Jo, Jessica, Wesley}.
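The possible values can also be generated mechanically. The following Python sketch builds the set of all subsets of Staff (its power set); the function name power_set is an assumption, and frozenset is used because Python requires the members of a set to be immutable.

```python
from itertools import combinations


def power_set(items):
    """Return the set of all subsets of items, each as a frozenset."""
    members = sorted(items)
    return {
        frozenset(subset)
        for subset_size in range(len(members) + 1)
        for subset in combinations(members, subset_size)
    }


staff = {"Jo", "Jessica", "Wesley"}
possible_at_work = power_set(staff)

print(len(possible_at_work))                          # 8
print(frozenset() in possible_at_work)                # True: nobody at work
print(frozenset({"Jo", "Wesley"}) in possible_at_work)  # True
```

A set with three members has 2 × 2 × 2 = 8 subsets, since each member is either included or left out.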

2.5 Sequences

You have already met sequences briefly, and have seen that a sequence contains items given in a particular order, and that repetitions are of significance.
One might have a sequence containing integers, such as [22, −31, 44, 0, 2, 0, 11] or a sequence containing characters, such as [‘W’,‘o’,‘r’,‘d’]. However, we will aim to avoid mixing the forms of data in a sequence. A sequence of characters may also be referred to as a string, and “Word” is another notation for the sequence [‘W’,‘o’,‘r’,‘d’].
The items in a sequence are enumerated from the left. Thus in the sequence [‘W’,‘o’,‘r’,‘d’], the first item is ‘W’, the second item is ‘o’, and so on. The items in a sequence may also be referred to as the elements (or sometimes members) of the sequence. The number of items in a sequence is called its length. So, for example, the length of the sequence [‘W’,‘o’,‘r’,‘d’] is 4. An empty sequence, [ ], has no members and so has length 0.
We do count repeated items when calculating the length of a sequence. So, for example, the sequence of numbers [22, −31, 44, 0, 2, 0, 11] has length 7, while the sequence of characters “aardvark” has length 8.
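These lengths can be checked with a short Python sketch, using a list for a sequence of integers and a string for a sequence of characters. One difference to keep in mind is that Python numbers positions from 0, so the first item of a sequence is at index 0.

```python
numbers = [22, -31, 44, 0, 2, 0, 11]
word = "aardvark"

print(len(numbers))  # 7: the repeated 0 is counted twice
print(len(word))     # 8: each 'a' is counted
print(len([]))       # 0: the empty sequence

# "Word" and ['W', 'o', 'r', 'd'] denote the same sequence of characters.
print(list("Word") == ['W', 'o', 'r', 'd'])  # True

# The first item of the sequence, at Python index 0.
print("Word"[0])  # W
```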

2.4 Sets

A set is a particular form of collection of items. The items appearing in a set are referred to as the elements (or members) of the set. Examples of sets mentioned earlier are: Int, Char and Bool.
A set is a collection in which repetition is not significant, nor is the order in which the items are given. For example, the supermarket might sell its own brand of Wheat Flakes in three sizes: large, medium and small. In a situation where we are interested in the types of product that are available, we are interested in the set of sizes. This set of sizes may be written as:
{large, medium, small}.
We might choose to name the set:
Sizes = {large, medium, small}.
The order in which the elements are listed is of no significance, so we might equally well have written Sizes = {small, medium, large}.
Notice the use of curly brackets { and } here. These are used as a signal that the collection is a set (as distinct from some other form of collection, such as a sequence). Note too the commas used to separate the elements.
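Python's built-in set type behaves in the same way and happens to use the same curly brackets. The sketch below, with the sizes written as strings, shows that neither order nor repetition affects which set is meant, whereas order does matter for a sequence (a Python list).

```python
sizes = {"large", "medium", "small"}
reordered = {"small", "medium", "large"}
with_repeat = {"large", "large", "medium", "small"}

print(sizes == reordered)    # True: order is not significant
print(sizes == with_repeat)  # True: repetition is not significant
print(len(with_repeat))      # 3

# For sequences, by contrast, order is significant.
print(["large", "medium", "small"] == ["small", "medium", "large"])  # False
```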

2.3 Truth values

We will want to distinguish between statements that are true and statements that are false. Another fundamental form of data allows us to do this. This form of data consists of just two values, which we shall write as true and false.
Not all texts use the same notation: some use T and F; others may use 0 for false and 1 for true (or the reverse!).
We may refer to true and false as truth values, or Boolean values. We will denote the collection (set) of truth values as Bool, after the mathematician George Boole. We write:
Bool = {true, false}.
This shows the collection Bool as a set. We have already mentioned the word set in passing, and now want to look at this concept in more detail.

2.2 Characters

Characters are another fundamental form of data. Computers store characters as integers, and system hardware and software translate these integer codes so that monitors and printers can display them.
As well as the familiar characters appearing on a keyboard, the current international standard (UNICODE) includes codes for characters from a variety of languages and alphabets (such as ê and ö). For simplicity, examples in this unit will use only a part of this code, as given in Table 2. This is confined to printable symbols that appear on a standard computer keyboard (for an English-speaking user).
We will denote by Char the set of characters appearing in Table 2. We will call the codes in the table the ASCII codes of the characters.
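The association between characters and integer codes can be explored with Python's built-in ord and chr, which work with Unicode code points; for the characters in the ASCII range (codes 0 to 127) the code point equals the ASCII code. The variable names are illustrative only.

```python
code_of_upper_a = ord('A')       # character to integer code
character_for_97 = chr(97)       # integer code back to character

# ê is written with an escape so the example does not depend on how
# this text happens to be encoded.
code_of_e_circumflex = ord('\u00ea')

print(code_of_upper_a)             # 65
print(character_for_97)            # a
print(code_of_e_circumflex > 127)  # True: ê lies outside the ASCII range
```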

2 Forms of data

2.1 Numbers

The supermarket example discussed in Section 1 involves various forms of data that a computer may need to handle. Some of these, such as numbers and characters, are simple but fundamental. Other forms of data, such as sequences, involve more complicated structure. In this section, we will introduce sets, which are a variety of data collection that is different from sequences. But first we will look more carefully at numbers and characters.
When developing software we need to distinguish between different sorts of numbers, not least because computers represent and process them differently. Whole numbers (positive, negative or zero) are called integers. We shall use Int to denote the collection (or set) of all integers. In principle, digital computers can represent integers exactly, no matter how large or small. In practice, however, most programming languages place restrictions on the size of an integer (positive or negative) that can be stored. Many texts use ℤ instead of Int.
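The effect of a size restriction can be seen in a small Python sketch. Python's own integers grow as large as memory allows, so the fixed-size case is imitated here with ctypes.c_int32, a 32-bit signed integer type from the standard library that performs no overflow checking. The choice of 32 bits is an assumption made for illustration; the limit differs between languages and machines.

```python
import ctypes

# Python's built-in integers are not restricted to a fixed size.
big = 2 ** 100

# A 32-bit signed integer can hold values from -2**31 to 2**31 - 1.
largest_32_bit = 2 ** 31 - 1

# Adding 1 to the largest value does not fit in 32 bits; the stored
# value wraps round to the most negative value instead.
wrapped = ctypes.c_int32(largest_32_bit + 1).value

print(largest_32_bit)  # 2147483647
print(wrapped)         # -2147483648
```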

Objectives for Section 1

After studying this section you should be able to do the following.

· Recognise the terminology: character; string; integer; sequence; element (of a sequence); variable; identifier (of a variable); state (of a variable).

· Use and interpret the notational conventions:

o single inverted commas to show a character;

o double inverted commas to show a string (sequence of characters);

o ‘␣’ to make explicit a space character;

o the brackets [ and ] to delimit a sequence.

1 At the supermarket

1.1 Number sequences

Imagine that you are at a modern supermarket and the cashier is processing the items from your basket or trolley. The electronic till (which is a form of computer) records each item that you have bought. Most items are scanned using a device that can read the barcode printed on the item.

Figure 1 An example of a barcode. Today, nearly every product has a barcode, although not all stores use them. A laser scanner is used to convert the pattern of light and dark bars into a number stored in the till. The number, printed as part of the barcode, may be entered manually if an electronic scanner is unavailable. A computerised till uses this code to look up the price that a shop wishes to charge for the item.
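The price look-up can be sketched in Python as a table from barcode numbers to prices. The barcode numbers, the prices and the names PRICES_IN_PENCE and look_up_price are all invented for this illustration; prices are held as whole numbers of pence so that totals are exact.

```python
# Hypothetical price table: barcode number -> price in pence.
PRICES_IN_PENCE = {
    5012345678900: 120,
    5098765432109: 85,
}


def look_up_price(barcode: int) -> int:
    """Return the price the shop charges for the item with this barcode."""
    return PRICES_IN_PENCE[barcode]


# The barcodes recorded for one basket form a sequence of numbers;
# the same item may be scanned more than once.
scanned = [5012345678900, 5098765432109, 5012345678900]
total = sum(look_up_price(code) for code in scanned)

print(total)  # 325
```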

Key ideas

In this unit, we will take an introductory look at two key ideas: forms of data handled by a software system, and the processes that may be applied to that data. These ideas are illustrated by a particular application — a supermarket till — but they are of general relevance in designing software systems. Important terminology will be highlighted in bold.
In this unit we will look at some commonly occurring forms of data. We start with fundamental forms, such as numbers and characters (which are symbols that may be typed at a keyboard). We then go on to look at more complicated data structures.
A second crucial feature of data is the processes available to handle it. The processes needed in any particular application are a particular focus in software design. Here, we shall look at how in principle we can describe a process that manipulates data. This initial description will not be concerned with how the process is executed, but only with its effect.
This unit will introduce two important mathematical ideas that help in offering clear and precise descriptions of software components relating to data and processes. These ideas are set and function.
Section 1 gives a brief introduction to how data and processes may arise in an application situation. Section 2 is concerned with fundamental forms of data, and the mathematical idea of a set. Section 3 looks briefly at some more structured forms of data. In Section 4 we look at processes, and the mathematical idea of function. Section 5 is concerned with processes of a particular sort. Examples of these processes are the addition of numbers (as in 5 + 6) and a comparison of two numbers (as in 3 < 5).

Introduction
This unit provides an introduction to data and processes in software, and provides a basis that enables these fundamental ideas to be developed in a clear and precise way. It has two main aims. The first is to illustrate how we can describe ways in which data may be structured and processed. The second is to introduce you to some vocabulary and concepts that help us to do this. The material is accessible to anyone with a little experience of the use of symbols in presenting ideas.
Section 1 provides a brief introduction to the unit. It contains some new language that will be explained more fully in later sections. Read this section without spending too much time on it. The most important material in this unit is that in Sections 2 and 4. Section 3 includes some ideas that are relatively difficult. You should read this section, but do not spend a great deal of time on it. Section 5 is of a similar length to Sections 2 and 4.
Overall, do not allow yourself to spend too long on any section while you are studying this unit. You can always come back and reread material here if you find later that you need a more thorough understanding of some point.
