What's new? | Help | Directory | Sign in
Google
                
Search
for
Updated Aug 08, 2008 by brian.thomas.will
PigeonForNewbies  
Pigeon for non-programmers

Pigeon is as simple a programming language as I believe possible. If you're concerned Pigeon is a waste of time because it is not a "real" language, don't be: Pigeon is consciously designed to be learned as an introduction to proper popular languages, such as Python, Ruby, C#, Java, C, etc. Think of Pigeon as the distilled essence of these languages but with the virtue of being fully learnable in 2-4 hours, not the usual 2-4 weeks (or more like the 2-4 months most learners experience with big languages like C#, C++, and Java). Everything you learn in Pigeon has a direct analog in all of these languages---the code just looks a bit different. Three qualities make Pigeon unacceptable for real use:

  • First, a major selling point of many languages is compact code, saving programmers the effort of reading and typing so much. This ideal is inappropriate for an educational language. At every juncture, the design of Pigeon chooses the conceptual simplicity of making everything explicit over the syntactical convenience of allowing some things to be left implicit. Consequently, given a program written in Python or Ruby, the same program written in Pigeon would likely be 50-100 percent more verbose. This cost is worth it: learners of Pigeon are not distracted from core concepts by inessential conveniences.
  • Second, real languages provide the means to have program code split across more than one file and the means for programs to be run by users. Pigeon deliberately chooses simplicity over accommodation: a Pigeon program's code lives in only one file, and the only way to execute a Pigeon program is from the Pigeon development program, for it is assumed that Pigeon will be used for nothing other than writing small, trivial programs to be executed by their authors.
  • Third, a real language provides the means to access and manipulate the full system: files, databases, sound devices, network devices, the graphics hardware, etc. Pigeon deliberately omits most of these capabilities. The only input/output provided for Pigeon programs is a text console, a place to spit out messages to the user and get back keyboard input (like a command prompt). Were you starting with a real language, these are the kinds of programs you would be stuck writing at first, anyway, for simplicity's sake. (You might assume a language omiting input/output capabilities can't be anything like a real language, but this is not so: real languages don't directly include input/output features but instead provide access to input/output through libraries; while learning to use such libraries is an important part of programming education, libraries are not part of languages, properly speaking.)

Static vs. Dynamic Languages

Programming languages can be divided into two categories: static and dynamic.

A static language requires the programmer to specify data type information everywhere data is used and passed around, making it possible for the compiler to detect type mismatches (i.e. whether the wrong kind of data is being sent from one place in code to another) and to generate faster code than would be possible were the compiler not to have the data type information. C, C++, Java, and C# are all popular static languages.

Pigeon---like Python, Ruby, Javascript, and Perl---is a dynamic language. In a dynamic language, the programmer omits this type information, leaving it to the language to detect type mismatches when the program is executed, not when it is compiled. On the downside, this means that type mismatches are not caught at compile time and that performance overhead is incured in checking for these mismatches at execution time. On the upside, leaving this information out makes code less verbose and the coding process more flexible.

Say I have a type error in my code, i.e. the wrong kind of data is used at some point in code:

So in a static language, the compiler will reject your code if it has type errors, and this is a good thing, but having to explicitly state data types throughout code can be bothersome and make your code inflexible. In a dynamic language, you have to worry about type errors lurking in your running code because the compiler can't point them out to you.

Modules

A file of Pigeon code is called a "module". Pigeon expects modules to end in the extension .pig and only allows you to save code with that extension. Modules must be saved and kept in the "modules" subdirectory of your Pigeon directory.

A Pigeon module consists of, in order:

While a module doesn't require globalvar statements, a module without function definitions would be pointless.

Comments

In programming, a comment is a piece of text ignored by the language. This is useful for leaving notes about the code. In Pigeon, all characters on a line after (and including) a # character are ignored e.g.:

This text is not ignored.    # This text (including the #) is ignored.

The Pigeon editor displays comments in green.

Values

A piece of data is often called a "value". There are three basic types of values in Pigeon, the same three types which are found in just about all languages:

Another special value, null, is used to represent nothing, as in, "There ain't nothing here." (The utility of null will be made clear, later.)

A value written in code is called a literal, e.g. -3.5 is a number literal, "moo" is a string literal, and true is a boolean literal.

In addition to these three types of data, there are two other kinds, lists and dictionaries, which we'll discuss later.

Escape Sequences

The whitespace characters 'tab' (the character inserted when you hit the tab key) and 'newline' (the character inserted when you hit enter) cannot be directly typed in string literals. Instead, these characters must be represented specially with what are called escape sequences. Tab is represented by the escape sequence \t and newline by \n. So the string "hello,\nworld" represents the text:

hello,
world

...because the two characters, \ followed by n, are treated specially to mean newline. The " character also must be represented by an escape sequence, \", so as to distinguish a " character meaning just " and one signaling the end of the string literal, e.g. "\"" represents the text:

"

Because \ is always interpreted as the start of an escape sequence, the \ character must itself be represented in string literals with the escape sequence \\, e.g. the string literal "red\\blue" represents the text:

red\blue

Operations

An operation is an action. Some operations take one or more input values, called the operands of the operation. For instance, addition is an operation performed upon number operands.

All operations return a value---e.g. an addition operation returns the sum of its operands---though some operations may just return the value null.

While operators in mathematics---and, in fact, in most programming languages---are almost always non-alphanumeric symbols (e.g. + - / * ^ &&), operators in Pigeon are all words, some abbreviated, e.g. 'add' for addition, 'sub' for subtraction.

In Pigeon, operations are written in prefix notation. In prefix notation, every operation is surrounded in parentheses with the operator placed first before any operands:

(add 3 -7 8)    # an addition operation that returns 4 because that is the sum of the operands 3, -7, and 8

Because an operation returns a value, it can itself be used as the operand to another operation as long as it returns the right kind of value:

(add 2 (sub 11 3))  # subtracting 3 from 11 returns 8, which is added to 2, returning 10

Note that the number, type(s), and ordering of the operands is particular to each operator. For instance, the add operator can take 2 or more numeric operands, and the order doesn't matter; the sub operator, however, always takes exactly 2 numeric operands, and it matters which operand comes first. (Because Pigeon is a dynamic language, the compiler will not tell you when you supply the wrong number or type of operands to an operator; this kind of mistake is only discovered when the operation is executed, at which point your program will be aborted, and Pigeon will notify you of the error.)

The couple dozen Pigeon operators are all explained here.

Variables and Assignment Statements

In programming, a variable is a location in memory used to hold a piece of data. This location is considered to be "variable" in the sense that, at various times in the course of the program, we might modify the data at that location. (Unlike in mathematics, a programming variable does not represent all possible values; its value might change, but at any moment in time, it only has one value.)

In code, we refer to a variable not by its numeric address but by some name we choose for that variable. While in mathematics, variable names are typically just single letters, programming languages allow (and favor) multi-letter variable names, for good practice says variables should be given meaningful names that reflect the use of the variable, not given arbitrary names like 'x' or 'y' (however, I will often use single-letter names in examples for simplicity).

The jargon term for a variable name in programming is identifier. The rules for identifiers in Pigeon (and in most languages) are as follows: 1) no whitespace 2) no non-alphanumeric characters 3) no starting with a numeral. So these identifiers are valid:

NewYork     foo     asdf43qwerty78

...but these are not:

New York     f^$o*o     43asdf

Identifiers in Pigeon (and most languages) are also case-sensitive, meaning that upper- and lower-case letters are considered different characters, so NewYork, newyork, newYORK, nEwYoRk, etc. are all considered wholly independent names, and so a variable named NewYork is a totally separate variable from one named newyork. (It would be stupidly confusing to have two names in a program that differ only by letter case, so don't do it.)

In Pigeon, an = (assign) statement is used either to create a variable and give it a value or to give a new value to an existing variable:

= foo 53 

(Note that = is a statement, not an operation, so it is not enclosed in parentheses.)

Here, a variable called 'foo' is assigned the value 53. (If no variable named 'foo' previously existed, it is first created.) 53 is the value 'foo' will have forever unless some later = statement gives 'foo' a new value.

You need not express the assigned value as a literal; instead, you can assign a variable the result of an operation:

= foo (add 8 9)   # assign 'foo' the value 17

Because a variable represents a value, it can be used as an operand:

= foo 3
(add foo 9)     # when this operation is reached, 'foo' has the value 3, so this returns 12
= foo 5
(add foo 9)    # but now 'foo' is 5, so this returns 14

Any kind of value can be assigned to a variable, not just number values:

= cat true  # assign 'cat' the boolean value true
= dog "Rubber baby buggy bumper"  # assign 'dog' the string "Rubber baby buggy bumper"

Input and Output

A computer system can usefully be thought of as the interaction between three main parts:

  1. The CPU, which executes the instructions of a program
  2. The system memory, where the instructions and data of each running program resides
  3. The I/O (input/output) devices, which constitute just about everything else: keyboards, mice, monitors, sound devices, storage drives of all kinds, networking devices, printers, etc.

A CPU could happily execute a program in memory without ever touching I/O, but the program would never do anything useful. Even the dullest number-crunching program eventually has to spit out some information to a monitor, printer, storage device, or network card, for otherwise its work would just be lost without anyone ever seeing the result. And if a computer didn't have an I/O system, there would be no place from which to load a program into memory in the first place.

As important as I/O is in real programs, modern programming languages don't include any direct means of manipulating I/O in their syntax. Instead, all I/O is done through functions (a topic discussed later). Pigeon follows this pattern but with two exceptions: the print and prompt operators.

The print operator displays its operands in Pigeon's output window, e.g.:

(print "hello " 35) 

...will display hello 35 in the output window.

The prompt operator takes no operand; when invoked, prompt causes a message box to display and ask the user to enter some text; if the user hits OK, prompt returns the entered text, but if the user hits CANCEL, prompt returns null.

While print and prompt are not exciting, they are simple means to create an interactive program.

Expressions

The term expression refers to anything which "evaluates" into a value. A literal evaluates into the value it represents, a variable evaluates into the value it points to at that moment in time, and an operation evaluates into the value which it returns.

Keywords

A keyword is any word given special significance in the language. Most of the keywords are operators and three are the values true, false, and null. The dozen remaining keywords each have their own particular meaning and syntax.

Branching with if

Often in code, we want to have a set of things happen only under certain conditions. We express this using an if statement, which takes this general form:

if <test>      
   <body>      

The test must be an expression that evaluates into a boolean value, e.g. true or (eq x y). The body is an indented list of one or more statements. When the if statement is reached, the test is evaluated, and if the boolean value is true, the statements of the body are executed; otherwise, the body statements are all skipped.

So, for example:

if (eq x 3)      # assume 'x' is some variable created and given a value before this 'if' is reached
   (print "cat ")
   (print "dog ") 
(print "bird")    # this statement is not indented under the 'if', so it is not part of the 'if'

Above, if the value of x is 3, then the test condition will evaluate to true, and so the two statements in the if will get executed, and so this section of code would create this output:

cat dog bird

If the value of x is not 3 when the if is reached, then the if body would be passed over, producing this output:

bird

Understand that the point of if statements is to change what code gets executed depending upon something which can only be known or decided at runtime. For instance, while it makes sense to branch based upon whether a user-entered value is less than 5, it makes no sense to branch if 3 is less than 5: we can't know what the user is going to do when the program is run, but we don't need to run our program to know that 3 is less than 5.

Sometimes you want to do one of two alternate things depending upon a single condition. You can arrange this with multiple if statements:

if (eq x 3)
   (print "hi")
if (not (eq x 3))
   (print "bye")

The second condition here is just the logical inverse of the first, so one of these if bodies is going to be executed, but never both: these two branches are mutually exclusive. Because this is a common pattern, there is a special way to express this more concisely:

if (eq x 3)
   (print "hi")
else
   (print "bye")

Here, the if has an else clause. The body of the else clause is executed only when the if body is skipped over. So, depending upon the test, either the if body or the else body is going to be executed, but never both and never neither.

Sometimes, we want our code to branch into two mutually exclusive branches but not have the second branch necessarily execute just because the first didn't:

if (eq x 3)
   (print "hi")
else
   if (eq x 5)
      (print "bye")

Again, this is very common, so we have a special syntax for it:

if (eq x 3)
   (print "hi")
elif (eq x 5)            # elif is short for 'else if'
   (print "bye")

The elif clause, like else, only executes when the if is skipped over, but elif has a condition that determines whether its body should execute too.

For branching between more than two mutually exclusive conditions, you can add more than one elif clauses to an if. This common construct is called an if-else ladder (or sometimes an else-if ladder):

if (eq x 3)
   (print "hi")
elif (eq x 5)
   (print "bye")
elif (eq x 7)
   (print "moo")
elif (eq x 8)
   (print "oink")
elif (eq x 14)
   (print "woof")

All of these clauses are mutually exclusive: if one clause gets executed, then all the remaining ones (including their conditions) are skipped over.

After the elif clause(s) of an if, you can use a regular else clause, which acts in effect like a default: if no other part of the ladder is executed, then the else body gets executed.

if (eq x 3)
   (print "hi")
elif (eq x 5)
   (print "bye")
elif (eq x 7)
   (print "moo")
else
   (print "meow")    # executed if all other clauses are skipped

Looping with while

A section of code that repeats is called a loop. Loops are expressed with the while statement, which actually works very similar to if, with the one difference that, after the while body is executed, the while condition is tested again, and if it is true, the body is executed again; this keeps happening until the while condition tests false:

= x 6
while (gt x 0)
   (print (string x))
   = x (sub x 1)   # decrement 'x' by 1

This will print out:

654321

Here's what happens:

  1. 'x' is given the value 6
  2. the while condition, (gt x 0), evaluates true
  3. the while body executes, printing the value of x and then decreasing its value by 1
  4. the condition evaluates true again
  5. the body executes again
  6. the condition evaluates true again
  7. ... and so on, until x is decremented to 0
  8. the condition evaluates false
  9. the body is skipped over and execution continues on

It's important that we arrange the loop condition so that it eventually tests false. Otherwise, the loop will run forever, which is generally not what we intend. The most common thing to do is to use a variable as a counter, as we've done above.

while has no analogue of if's else and elif clauses.

Be clear that, like with if, else, and elif, the body of a while loop can consist of any kind of statement, including if's and other while's.

break and continue

A break statement exits the current loop:

= x 6
while (gt x 0)
   (print x)
   if eq x 3
      break
   = x (sub x 1)   # decrement 'x' by 1

This will print out:

6543

A continue statement skips over the rest of the loop body to the next iteration:

= x 6
while (gt x 0)
   = x (sub x 1)   # decrement 'x' by 1
   if eq x 3
      continue
   (print x)

This will print out:

54210

Notice break and continue are executed conditionally; an unconditional break or continue would make everything below it in the loop pointless.

These statements only affect the loop they are directly in, not any outer loops:

while ...   # outer loop
   while ...   # inner loop
      ...
      if ...
         break    # break inner loop
      ...

While break and continue aren't necessary, they express some common logic more compactly than if we couldn't use them.

Functions

These are all basically interchangeable synonyms for the same thing. However, routine, sub-routine, and procedure are a bit anachronistic these days. The general term used today is function. (Method is frequently used too, but it has a particular connotation with object-oriented programming).

A function is a body of statements which can be invoked by name. Think of functions as operators which you the programmer define. For instance, if I define a function named 'foo' which takes two numeric operands, I call (invoke) 'foo' just like I would invoke an operator:

(foo 4 1)

A function is defined with a function statement, which has this general form:

function <name> <parameters>
   <body>

Function names must conform to the same rules as variable names.

A function definition is not a statement like an if or while is a statement, so you can't write a function inside an if or while body or inside another function; rather, a function definition must be placed at the "top level" of the module (meaning not inside anything else).

The parameters of a function are special variables that receive the operands: when the function is called, the arguments (operands) are "passed" to these variables. A function must be called with the same number of arguments as it has parameters. A function need not have any parameters, in which case it must be called with no arguments:

function eric      # define a function named 'eric' with no parameters
   (print "hello")

This function is called like so:

(eric)    # prints "hello"

The point of having parameters is to get varied behavior out of the same piece of code:

function brian x y   # define a function named 'brian', with parameters 'x' and 'y'
   (print (add x y))

This function's exact behavior depends upon the input we supply:

(brian -9 5)           # prints "-4"
(brian 2 6)            # prints "8"

While there is no limit on how many parameters you can define a function to take, seven or eight parameters is generally considered the upper limit of good sense. Most functions in your code should have zero to four parameters.

Many learners get the false impression that there is a special relationship between a function and a variable used as an argument to that function, but this is mistaken:

= bar 3
(brian 3 5)
(brian bar 5)

Here, the two calls are equivalent: in the second call, 'bar' is an expression that evaluates into the value 3, and it is this value that gets passed to 'brian', not the variable; 'brian' only sees the value and knows nothing about the variable 'bar'.

return

Like operators, a function call returns a value; that value's kind and its significance is up to the function's author. A function call returns either when the end of the function body is reached (as in 'eric' and 'brian', above), in which case null is returned by default, or a function call returns when a return statement is encountered. A return statement has this general form:

return <expression>

The value of the expression is the value that gets returned from the function call. If no expression is given, null is returned.

function ack
   return 3
...
(print (ack))   # prints 3

(I will use ... to mean 'What follows below is some unspecified place elsewhere in the program.' or 'Unspecified stuff omitted here.')

Finally, here's an actually useful function, one that computes a factorial:

function factorial n
   = val 1
   = i n
   while (gt i 0)
      = val (mul i val)
      = i (sub i 1)
   return val
...
(print (factorial 4))  # prints 24

The 'factorial' function returns the factorial of its numeric operand. Here, there is only one return and it comes at the end; while this is a common pattern, it's perfectly legal to have more than one return point in a function (just like it's legal to have more than one break in a loop), and return can occur anywhere in the function.

The Special Function 'main'

Every program needs a "point of entry"---a place where execution begins. In Pigeon, the entry point is the function given the name 'main' of the module you have open when you click 'run', and the program ends when the call to 'main' returns (the value returned by 'main' is ignored). So your program should include a function named 'main', and 'main' shouldn't have any parameters:

function main
   # do stuff...

Understand that the 'main' function, aside from being the entry point of your program, is in all other respects just a normal function. In fact, 'main' can be called recursively by itself or other functions (not that these are necessarily sensible things to do).

Why Use Functions?

So why write functions? The most obvious benefit of writing a function is that, instead of duplicating the same code in multiple places, you can write a single function and invoke it in the multiple places where it is needed. This is in accordance with the important programming mantra, Don't Repeat Yourself (DRY): when you see the same code in many places (or very similar code with slight variations), it's quite likely there's a better way the code could have been written that avoids such duplication.

Even if a piece of code only appears in one place, it still might be a good idea to put it in its own function, as this makes your code more modular and therefore easier to understand. Imagine that inside a function named 'foo' you have 30 lines of code that paints the roses red; if you split that code into its own function called 'paintingTheRosesRed', then 'foo' just became a lot easier to read and understand because all that code was reduced to one line that reads (paintingTheRosesRed). If you name your functions well, your code won't need many comments because it will be "self-documenting", as we say.

A well-written program ideally consists of short, well-defined functions. There's no exact definition of 'short', but most programmers define it as somewhere under 30 lines. In practice, a few longer functions up to ~100 lines are acceptable, but any time you go above 30-40 lines should warrant second thought. The most important rule is that a function should do one thing and only that one thing. If we're confident that a function does the one thing it's supposed to do, we can generally just use it without thinking about how it does its business.

Deciding how to modularize your code into functions is largely a matter of good taste, and developing that good taste is one of the most important steps in becoming a good programmer.

Scope of Variables

The variables of a function exist only for the duration of each call to that function: each time a function is called, the variables in the function are created anew for that call and then discarded when the call returns. In jargon, we say that a variable is 'local' to the 'scope' of its function.

function foo cat dog
   = bird 4
   return (sum cat dog bird)

The variables 'cat', 'dog', and 'bird' are local to the function 'foo'. (The parameters of a function are just as much local variables as the other variables; parameters are special variables only in how they are declared and how they get their initial values.) Each time 'foo' is called, these variables get created anew and then get discarded when the function returns. Consequently, the value of each variable is not preserved from one 'foo' call to the next, and it makes no sense to try and use these variables in other functions, for the local variables of a function do not exist in other functions. Two separate functions can both have a variable named x, but they each have their own x, two separate variables that just happen to have the same name.

function richard apple orange
   = coconut "hi"
   ...

function michael sofa coconut
   ...

These two functions both have a local variable named 'coconut', but the 'coconut' of one has nothing to do with the other. They are as independent as if they were to have totally different names. Languages allow you to reuse variable names among functions because it would be very annoying and inconvenient to have to give every single variable in a program its own unique name whether or not it made sense to do.

Often times you will wish you can use data in one function from another only to be frustrated by how hard the language makes it: by design, you're supposed to only have information entering your functions via the parameters and exiting via return values. As you'll see in a moment, there are workarounds to this limitation, but it's important that you learn to live within these limitations as much as possible. Promiscuously sharing data at all places in the code is the easy way to do trivial things, but when a program gets beyond trivial in size, it makes your code's logic very difficult---if not impossible---to reason about: if every piece of data can be modified anywhere, you'll have to think about every piece of data everywhere, which doesn't scale as your code gets larger and larger. The rules of 'local scope' give us a base-line guarantee that when we're looking at a function, we don't have to worry ourselves about the local variables of all the other functions.

Global Variables

A special kind of variable which exists independently of any function is a global variable, so called because it can be used "globally", i.e. in any function. A global variable is created with a globalvar statement, which is used like = but placed outside and before the function definitions.

globalvar cat "hello"  # create a global variable named 'cat', giving it the initial value "hello"

function foo
   (print cat)   # print the value of the global variable 'cat'

function bar x
   = cat x   # change the value of the global variable 'cat'

(foo)               # prints "hello"
(bar "goodbye")
(foo)               # prints "goodbye"

Functions 'foo' and 'bar' can both use and change the value of global variable 'cat'. Because a globalvar named 'cat' exists in this module, the assign to 'cat' in 'bar' doesn't create a local but rather assigns a value to the global variable 'cat' (in effect, you can't have a local variable with the same name as a global variable in the same module).

Notice that because 'foo' reads a global variable, the behavior of calls to 'foo' depend upon calls elsewhere. This is precisely what's dangerous about shared data: you can't just look at the code of 'foo' to fully understand 'foo'. In trivial examples like this, it seems harmless, but in real-world programs, the effect of shared data easily becomes pernicious.

In practice, all real-world code uses some amount of shared data, but it's important that you learn to minimize sharing data, just as you must learn to avoid repeating yourself. (Some programming instructors hammer this into their students by forbidding them to use any globals at all; this might instill good habits, but it also may confuse students with good taste, for avoiding globals entirely can lead to ugly convolutions in code that are just as bad as using too many globals.)

Functions as Values

In Pigeon, you can treat functions as values in the sense that you can assign functions to variables and call them via those variables:

function foo
   (print "hi")
...
= bar foo
(bar)     # print "hi"

You can also pass a function into a function:

function cat dog
   (dog)
...
(cat foo)    # print "hi": the function 'foo' is passed as the value to 'dog', then 'dog' is called

...or return a function out of a function:

function cow chicken
   if chicken
      return foo   # return the function 'foo'
   else
      return cat   # return the function 'cat'

If (x) is executed when x is not the name of a function or a variable pointing to a function, this is an error that will terminate the program.

As we'll see later, this ability to pass functions around as values can be very handy.

Recursive Functions

The term for the situation where a function calls itself is recursion. A recursive function may sound like a logical impossibility, but consider a trivial example:

function foo
   (print "a")
   (foo)
...
(foo)   # prints an infinite chain of "a": aaaaaaaaaaaaaaaaaaaaaaaa.....

...when called, 'foo' will call itself. Consider another example:

function foo
   (print "a")
   (bar)

function bar
   (print "b")
   (foo)
...
(foo)   # prints an infinite chain of "ab": ababababababababababababab.....

Here, the recursion is indirect: 'foo' calls 'bar' and 'bar' calls 'foo', meaning a call to 'foo' will result in more calls to 'foo' and a call to 'bar' will result in more calls to 'bar'.

While both of the above examples are perfectly legal code, the problem is that they result in infinite chains of calls, which are undesirable in the same way that infinite loops are undesirable. The trick when writing recursive functions usually lies in getting them to stop calling themselves at some point. Here's a terminating recursive function:

function factorial n
   if (eq n 0)
      return 1
   return (mul n (factorial (sub n 1)))

We've already seen a 'factorial' function, but here's a recursive version that accomplishes the same thing. Try mentally stepping through a call to this function with a small input, such as (factorial 5). You'll see that the reason this recursive function eventually stops calling itself is because each successive call to 'factorial' is with a smaller argument until finally (factorial 0) is called, which returns 1 before another recursive call to 'factorial' is made, thereby breaking the chain. This is the key: for a recursive call to not be infinite, at some point in a chain of recursive calls, the function should return before another recursive call is made.

To understand recursion, be clear what happens when a function call is made: the currently executing function and all of its variables wait for the called function to return, and then the function resumes where it left off. For instance, above, when 'factorial' calls itself, the value of 'n' is kept around, for the 'n' of each call is independent of the 'n' of any other call to that function. When you call (factorial 5), there are five separate 'n' variables waiting by the time (factorial 0) gets called. As the recursive calls return, one-by-one those 'n' variables get discarded until finally the (factorial 5) call itself returns and there are no more 'n' variables left waiting around.

Why would you want recursive functions in the first place? Well, some problems are best solved recursively, and a few can only be solved recursively. In truth, computer scientists tend to have unnecessary obsessions with all things recursive despite the fact that most practical programming gets by just fine without any recursion at all. Still, recursion is an important concept to grasp, not the least because you don't really understand how functions work if you can't see how recursive functions work: the variables of each call to a function are totally independent from those of other calls to the same function.

(Although computing factorials is the classic example of recursion introduced to students, it's actually not a realistic example at all, for the iterative (non-recursive) version is both easier to understand and better performing. Much better examples of recursive functions are those that deal with recursively structured data, such as trees.)

Lists

When dealing with large sets of data, it would be awfully inconvenient and impractical if each individual value required its own variable. For instance, if our program involved a roster of student names, it would be silly if for each name we needed an individual variable, like so:

globalvar name1 "George Washington"
globalvar name2 "John Adams"
globalvar name3 "Thomas Jefferson"
globalvar name4 "James Madison"
globalvar name5 "James Monroe"
# ... and so on

Not only is this a lot of lines just for creating variables, imagine the code that would result every time we wished to do something for each piece of data, for instance if we wished to print out each name:

(print name1)
(print name2)
(print name3)
(print name4)
(print name5)
# ... and so on

And just imagine passing a large set of data into a function: you would need a parameter for every single value. This situation is bad enough when we have five pieces of data, but imagine if we had 100 pieces, 1,000, 1,000,000, or even more.

Another problem using individual variables for each piece of data is that often we're dealing with sets of data of which we don't know the size at the time of writing. For instance, in a program that keeps track of student attendance, we want it to work no matter how many or few students each class has, but when using individual variables for each student, our only solution is to create n number of variables and hope that no class has more than n students. This is bad: guessing and crossing your fingers is not good programming practice. (Many, many programs have caused endless trouble because their authors said something like, 'No class could possibly have more than 200 students!'; in general, think twice before you say, 'Users won't ever do x,' because you'll usually be wrong.)

What we want is a way to keep many values in one variable, and this is where lists come in. A list is a collection of values wherein the values are ordered, meaning that, in a list of, for instance, four values, one value is first, another is second, another is third, and another is fourth. The position of a value in the list is how we uniquely identify that value.

A list is created using the list operator. A list itself is a kind of value, so we can assign it to a variable and pass it in and out of functions:

= dog (list)   # assign 'dog' a newly created list
(cow dog)           # call 'cow', passing the list referenced by 'dog' as the argument

When first created, a list has no elements (values). The append operator adds elements to the end of an existing list:

# the list referenced by 'dog' is empty
(append dog "hi" "bye")   # "hi" is now the first element of the list and "bye" is the second
(append dog 3 6)          # the value 3 is now the third element of the list and 6 is the fourth

The get operator retrieves a specified value from a list:

(print (get dog 2))       # print "bye", the second element of the list

The set operator gives a new value to a specified position in a list:

(set dog 2 "moo")      # change the second element of the list to the value "moo"
(print (get dog 2))    # print "moo", the second element of the list

The get and set operations cause a program error if you try to access an element of a list that doesn't exist, e.g. for a list with only 5 elements, get'ing or set'ing the 6th element causes an error that will terminate your program.

The len operator returns the 'length' of a list (the number of elements):

(print (len dog))   # print 4, the number of elements of the list

Using len, we can loop through all the members of a list without having to know ahead of time how many elements it has:

= i 1
while (lte i (len dog))
   (print (get dog i))     # print the i'th element of the list
   = (add i 1)

Lists are handy when we want to return more than one value from a function:

function foo
   = bar (list)
   (append bar 13 18)
   return bar   # return a list with two elements

Using lists doesn't entirely solve our problem with large sets of data, for we still have a lot of data written in our code. This is bad because it clutters our code and makes it difficult for non-programmers to inspect and modify the data. The solution is to have any significant amount of data stored independently of the code, either in files or databases. Our program should then read the data from these sources as needed. (The ability to read and write files and access databases is deliberately left out of Pigeon because you're not supposed to be using Pigeon for anything but learning to write very small programs. I recommend not trying to use big sets of data in your Pigeon programs.)

List Mutability

Previously, the only value types we've dealt with are numbers, strings, booleans, and the special null value. Consider this scenario:

= x 3
= x 5

Here, we are changing which value the variable 'x' points to, but we are not directly changing the 3 and 5 values themselves: the memory where the value 3 is stored does not get overwritten with the value 5; instead, 'x' represents a place in memory where the address of the value 3 is stored, and that address gets overwritten with the address of the value 5. Number values in Pigeon never themselves change---number values are immutable---as are strings, booleans, and the special null value. Some operations may seem like they modify values, but in fact these operations only produce new, separate values:

= x "woo"
= y (concat x "hoo")  # assign 'y' the value "woohoo"

Here, concat produces a new string, "woohoo", without touching the original strings "woo" and "hoo" in memory, so 'x' still points to the string "woo".

Unlike numbers, strings, booleans, and null, lists are mutable: the set and append operations directly change a list in memory, not the variable(s) referencing the list. Because lists are mutable, a variable referencing a list might have its list modified indirectly:

= dog (list)
= cat dog      # 'cat' now references the very same list in memory as 'dog'
(print (len cat))   # print 0
(append dog 3)
(print (len cat))   # print 1

When multiple variables reference the same list, modification of the list affects them all. This holds true when we pass a list to a function:

function foo rat
   (append rat "yo")
...
= moose (list)
(foo moose)             # 'foo' modifies the list passed to it
(print (get moose 1))   # print "yo"

In Pigeon, a list is stored in memory as a list of addresses: each position of the list is an address pointing to some value in memory. In this sense, each position is just like a variable, and so a list element can be any kind of element, including other lists. In fact, a list can have itself as an element.

Dictionaries

While a list is an ordered collection in which the elements are identified by their index (their numerical position in the list), a dictionary is an unordered collection in which each element is identified by an associated string called a key. Each element and its key are together known as a key-value pair.

An empty dictionary is created with the dict operator:

= x (dict)

Use the set operator to add a new key-value pair to a dictionary or to change the value associated with an existing key.

(set x "fred" 3)   # in the dictionary referenced by 'x', assign the key "fred" the value 3

Use get to retrieve a value from a dictionary:

(print (get x "fred"))   # print 3, the value associated with the key "fred"

Key-value pairs only work one way: you can look up a value by key, but not look up a key by its value. The reason for this is that, while the keys in a dictionary must be unique (e.g. you can only have one key "fred"), the values need not be unique (e.g. a key "barney" can also be associated with the value 3). If you tried setting or getting a key by its value, there might be multiple keys associated with that value, so the operations would be ambiguous---thus, there are no such operations.

Dictionaries are preferable to lists in cases where we have a collection of pieces of data of no particular order. For instance, imagine we wished to store the personal information of many individual people; we could write a function that creates a dictionary representing one individual person in which each key-value pair is a piece of information about that person:

function makePerson firstName lastName age address socialSecurityNum
   = person (dict)
   (set "first name" firstName)
   (set "last name" lastName)
   (set "age" age)
   (set "address" address)
   (set "SS num" socialSecurityNum)
   return person
...
= nixon (makePerson "Richard" "Nixon" 81 "1600 Pennsylvania Ave." 5555555555)
(print (get nixon "first name"))   # print "Richard"

Like lists, dictionaries are mutable:

= cow (dict)
= goose cow             # 'goose' and 'cow' both reference the same empty dictionary
(set cow "name" "Fred")
(print (get goose "name"))   # print "Fred"

The key-value pairs of a dictionary can have any kind of value, including lists. Because dictionaries are themselves a kind of value, they too can be values in a dictionary or elements of a list.

Object-Oriented Programming

Any kind of thing, whether abstract or concrete, can be thought of as being composed of a set of properties. For instance, a state (the political kind) is an abstract thing consisting of many properties---a chief executive, a legislature, a bureaucracy, a military, a judiciary, etc.---while a table is a concrete thing consisting of properties---height, width, color, material, style, age, owner, monetary value, and so on. A property of a thing might usefully be decomposed into its own properties, e.g. a state has a military, and a military in turn is made up of many things, such as officers, troops, vehicles, and so forth. In addition to properties, a thing might have a set of operations (actions), e.g. a dog can eat, sleep, run, and bark.

In programming, a thing represented as a bundle of properties and operations is called an object. Object-Oriented Programming---abbreviated as OOP (pronounced like 'oops') or just OO (pronounced 'oh-oh')---is a paradigm (style) of programming that revolves around objects. Some languages, such as Java, C#, and C++, are designed to facilitate programming in the OOP paradigm, but you can do OOP in any language.

In Pigeon, an object is neatly represented as a dictionary: each key-value pair can represent a property or operation of the object, such that each key is a string naming a property or operation while the key's associated value represents the property or operation itself (operations are represented as functions).

function meow
   (print "meow")

= cat (dict)
(set cat "name" "Mittens")
(set cat "age" 7)
(set cat "weight" 4.5)
(set cat "meow" meow)   # give the 'cat' object a 'meow' operation

In programming, when we model a real-world thing as an object, we can never exhaustively represent all of the thing's properties, and in turn, the properties we do choose to include often cannot themselves be fully accurately represented. In the end, all of our data has to be stuff we can represent in bits---mainly, numbers and text. So what we do is just stick to the properties and operations of the thing that we care about simulating in our program, and we model these properties as best we can. For instance, in a video game, we might have an object representing a kind of a monster; the primary aspects of this monster relevant to our purposes are its position in the world (a coordinate), its appearance (a 3D model, i.e. a bunch of coordinates), and its health (a number representing how many more times we have to hit the thing to kill it); the monster would be given operations such as 'move', 'attack', and 'die'. (The code for actually rendering this monster on screen would most likely be elsewhere in different objects and functions.)

Objects need not be real-world simulacrums, and in fact, the majority of objects we code represent abstract, computery things, such as files. The basic point of OO is to encapsulate: to associate sets of actions with types of data as a way of organizing our code and thinking about problems.

Classes

Two objects with the same set of operations and the same set of properties (the property names and types, not actual property values), can be said to be objects of the same class---i.e. type or kind---of object. For instance, let's say:

If all of the above is true, then x and y are objects of the same class: two objects of the same class share operations and differ only in the values of their properties, not the number, name, or type of their properties, e.g. given two people, they both have a date of birth property, but their dates of birth may differ.

To understand why class is an important idea, imagine we write a function that expects an object with a property "bar":

function foo x
   = y (add 3 (get x "bar"))
   ...

Because this function expects the argument passed to 'x' to have a property named "bar" whose value is a number, this function will cause an error if this is not the case. Just like a parameter expecting a value expects it to be of a certain type, a parameter expecting an object may expect it to be of a certain class because otherwise it might attempt to use properties and operations of the object that the object doesn't have or are of the wrong type, causing an error. However, there isn't necessarily just one class that satisfies all the requirements a parameter is expecting, e.g. multiple classes can all have a numeric property named 'bar', and so 'foo' will be happy to take objects of any of those classes as arguments to 'x'. This principle is known as polymorphism: the same operation can be used upon different types of data because those types share the right set of features in common.

When class X includes all the properties and operations of class Y, X is said to inherit from Y, i.e. X is a child of Y, and Y is a parent of X. A child is a complete polymorphic stand-in for its parent: every operation that can take the parent as an argument will work equally well with the child because the child has all the right properties and operations.

Methods

In OOP, the jargon term for a function which acts as operation of an object is method. What's wrong with just saying 'function'? Who knows.

A method very often requires access to the properties or other operations of its object, so its very typical for methods to include a parameter for passing in the object itself:

function bar obj x y
   ... 

# assume 'bar' is an operation of object 'foo'
((get foo "bar") foo 3 4)  # get the method 'bar' and then call it

Because calling methods this way is ugly and verbose, Pigeon includes the cm operator just for calling methods:

(cm foo "bar" 3 4)   # the 'cm' operator stands for 'call method'

The cm operator gets the method from the object and then calls it, passing the object as the first argument. (For clarity and consistency, I recommend giving every method a first parameter for receiving the object, even those methods which don't use the object; that way, you can call every method with cm.)

For two classes to be fully polymorphically compatible, their operations not only must share the same names, they must expect the same number, type, and order of arguments:

function dog foo
   (cm foo "bar" 3 4)

Whatever class of object is passed to 'foo', it must have a "bar" method expecting 2 numeric operands, otherwise this will cause an error.



Sign in to add a comment