Now that we understand what a variable is, how to use functions, and all the basic ideas behind programming, we’re gonna learn how to export our variables and functions to outside code, as well as what containers are and how to use them.
In modern-day programming there is essentially only one type of container: classes. But classes are not something to use when learning programming, as they can lead either to awesomely crafted code or to a big pile of stinking cr*p (like 99% of Java). Classes will be taught (alongside other high-level programming techniques) in 10 Days of Intermediate Programming, which will begin when this current series ends.
Instead, we’re gonna learn the simpler, more low-level containers: arrays and structures.
So far, when declaring variables we were only creating one variable at a time:
int x = 0;
But one simple variable might not always be what we are looking for.
What if we are working with larger amounts of data? Perhaps we need to work with a lot of integers, or, even simpler, we need to work with text.
What we are looking for here is an array.
An array is a consecutive collection of elements, each element of the same type (and therefore of the same size and treated the same way). Let’s say we have an array holding 10 integers; this would be the graphical representation of the array:
|4|6|0|8|4|8|2|6|0|7|
When we declare an array, we give it a name (let’s call this one MyArray). The name of the array is bound to the beginning of the array (its very first element), so we don’t need to give a name to each element: we access an element through the array’s name plus the relative position of the element within the array. Basically, what we say is “I want to access the element 5 positions after the start of the array”.
With that, all the compiler has to do is a very simple calculation to obtain the absolute position of the element in RAM: array + offset*size, where size is the size in bytes of each element. This is why the very first item in the array is item number 0, not number 1, and if the array has 10 elements, the last element is number 9, not number 10.
With an array we can have large volumes of data all in the same place and all under one common name, rather than giving them very specific names.
The most common place where arrays are used is to create strings.
Strings are a bunch of characters next to each other (arrays) that form human-readable text:
|H|e|l|l|o| |W|o|r|l|d|!|
To declare an array in C we do the same as for a normal variable, but we mark it as an array by putting the size of the array inside []. The array of integers above would be declared like this:
int MyArray[10];
There’s one important thing to notice here: the size of the array must be a constant value known at compile time; it can’t be a variable or something determined at execution time. (C99 does allow variable-length arrays for local variables, but the general way to create arrays whose size is decided at runtime requires pointers.) Arrays and pointers are very closely related in terms of usage, but we won’t be seeing pointers yet.
Now there’s a big limitation with arrays: each element must be of the same type. Sometimes what we really need is to collect variables of different types into one place: this is where structures come into play.
A structure is a collection of variables, each with its own type and its own name, all packed together in the same place in RAM, one after the other (the compiler may insert some padding between them to keep each variable properly aligned).
structure MyStructure:
    int variable1
    char variable2
    float variable3
    ...
    typeN variableN
This way, when we declare an instance of MyStructure we are actually declaring an instance of each and every one of these variables; then, when we want to access a specific variable, we simply call it by name inside the structure.
Let’s look at this example in C.
struct MyStruct{
    int myVar1;
    char myVar2;
};
int main(){
    struct MyStruct instance; // here we declare an instance of MyStruct
    instance.myVar1 = 0; // here we access the int variable myVar1
    instance.myVar2 = 'x'; // and here we access the char variable myVar2
    return 0;
}
Structures are pretty simple to use and are a central part of Structural Programming, which later becomes Object Oriented Programming. We’ll be seeing both later.
But none of these programming design patterns would work if there was no way to access “outside” code. When we write all these structs, variables and functions in one file it is all fine and great, but what if we want to call a function defined in another file? Or use a struct defined in another file? We can’t just go around copy-pasting all our functions and declarations into all our files. First and foremost, copy-pasting code is a code smell: there are many, many techniques for reusing code properly without copy-pasting, and if you ever copy-paste code or end up with very similar-looking code, you are probably doing something wrong. Secondly, we don’t really need to copy the code, only its declarations. And last but not least, the language of our choice already has a way for us to include these declarations into our code.
But what is a declaration? A declaration is a language construct that doesn’t turn into machine code: it is only useful for the compiler/interpreter to know how to treat certain data, or to know how something works without requiring that “something” to be there.
Let’s take a look at a function for example:
int myFunction(int argument){
// code here
}
When you write this function, two things are created: the code itself, which is inside the function, and the declaration that a function named myFunction exists, returns an int, and requires one argument of type int.
When we call this function, or refer to it in any way, the compiler doesn’t need to know what the function does internally; there’s nothing to be learnt from looking at its code. What the compiler needs is the function declaration, its header, to check whether you are using the function correctly (passing the correct arguments, using the return value correctly, etc.).
We can easily omit the code and declare a function by using just the header:
int myFunction(int argument);
This header tells the compiler that a function exists called myFunction which returns an int and requires an argument, but the actual implementation of the function is somewhere else.
This allows you to use the function even if its implementation is not here, since it could be in another file or in another library.
Functions are not the only things that can be declared without being implemented. A variable can be declared while its actual place of residence is somewhere else; for that we use the “extern” keyword:
extern int myVariable;
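Notice that an extern declaration doesn’t initialize the variable, the same way a declared function doesn’t include any code (if you add an initializer, the line becomes a definition that allocates the variable right there).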
An array behaves similarly:
extern int myArray[10];
Structures are a different story: a structure definition is already just a declaration, with no code in it, so it is left intact and can be shared as it is.
Now that we know how to separate declaration from implementation, all we have to do is put the declarations in one file and the implementation in another.
The declaration file can be included into other files so they can use your implementation without having to include the entire implementation.
These special files are called header files. They use the .h file extension and, by convention, contain only declarations, not implementation code.
When you need to use certain functionality that another file in your project has, all you need to do is include its header file:
#include "myfile1.h"
myfile1.h contains the declarations that myfile1.c implements, for the outside world to use.
It is important to know that header files are a technique used by C and C++, because their compilers compile only one file at a time. Higher-level languages such as Python don’t need header files: the declarations are obtained directly from the implementation.
This works because instead of “including a header file”, in Python we “import a module”, and the module that we import gets loaded and read in full.
We’ll learn more about modules when we get to namespaces next month or so.
Separating declarations from implementations is possible because compilation of C/C++ code has three steps:
– preprocessing: in this stage no C code is parsed, but rather the preprocessor identifies and manages special keywords before the actual C compiler can do anything. I’ll get to these keywords in a bit.
– compilation: this is the actual process in which C code gets converted to assembly and then to machine code, but the resulting machine code still has symbols in it that need to be resolved.
– linking: in this stage the generated machine code is scanned for symbols, that is, variables and functions referred to by their name instead of their address in RAM, and these are replaced with their actual address. A CPU doesn’t know what a variable named “x” is, but it does know what a memory address is. Linking can be static or dynamic. Static linking is done at build time, so the resulting executable has its symbols (or at least the local ones) already resolved, while dynamic linking is done when the binary is loaded for execution. Each has its own pros and cons: dynamic linking lets several programs share one copy of a library and lets that library be updated without rebuilding them, while static linking produces a self-contained executable. Modern executables use a mix of the two: static linking is only done for symbols found within the executable, symbols found in external libraries are left for the dynamic linker, and the statically resolved symbols are often turned into relative offsets instead of absolute RAM addresses, which allows the executable to be loaded anywhere in RAM.
The C preprocessor has a few common keywords that are used to make programming a bit easier, mainly:
– include: this tells the preprocessor to read a file and include all of its contents inside the current file.
– define: this allows us to create constant values as well as macros. For example:
#define MAX 10
The above means that every time the preprocessor finds the symbol MAX, it replaces it with 10.
– ifdef/ifndef/etc: these are a few if-else like constructs but for the preprocessor. It basically means that if a certain preprocessor symbol is defined then the following code is to be used, otherwise use the code in the else. This allows for creation of multiplatform code where depending on the defined target we compile one or the other code. For example:
#define LINUX
#ifdef LINUX
// do something specific to linux
#else
// do something specific to other operating systems
#endif
Note that anything can go in the code between the #if and the #else and that #endif is used to mark the end of a preprocessor construct.
Well, this is it for now. If you have any questions or doubts, please leave a comment below.
The post 10 Days of Basic Programming, Day 5: containers and declarations appeared first on Wololo.net.