C is a general-purpose programming language developed by Dennis Ritchie at Bell Labs in the early 1970s. It introduced many of the syntactic and programming conventions we see in the numerous programming languages it influenced, including Java, JavaScript, Go, C#, Objective-C, and of course C++.
C code maps very efficiently to lower-level machine code, giving programmers very precise control of the execution of their program. Because of this, it's often used in systems programming for low-latency, low-overhead, close-to-hardware applications.
Now, without further ado, let's take a look at C!
Before going into detail about the different types available in C, it's important to convey what we mean by the term, "type" (though without getting too technical with a mathematical definition).
For our purposes, and colloquially speaking, a type in C is just how we choose to understand the data we find in memory. It defines how much data we should be expecting to read, and how to understand the data we find.
Now, with that in mind, let's look at some types!
Integers in C all use the binary number system we discussed previously: two's complement for signed integers, and regular binary for unsigned. So the only difference between integer types (of the same signedness) is their size, which in turn determines the range of values they can represent.
C only defines the size of each type in relation to the others:
char <= short <= int <= long <= long long
Technically, there's nothing stopping all of them from being the same size; it all depends on what kind of machine and compiler you're using. CLAC machines use the following:
Type | Size |
---|---|
`char` | 1 byte |
`short` | 2 bytes |
`int` | 4 bytes |
`long` | 8 bytes |
`long long` | 8 bytes |
On most modern machines, the sizes for `char`, `short`, and `int` are fairly consistent, and the same as what we have on CLAC.
However, the sizes for `long` and `long long` vary a lot from system to system.
You can always test it out for yourself:
#include <stdio.h>
int main(void) {
    // sizeof yields a size_t, which printf prints with %zu
    printf("char: %zu\n"
           "short: %zu\n"
           "int: %zu\n"
           "long: %zu\n"
           "long long: %zu\n",
           sizeof(char), sizeof(short), sizeof(int),
           sizeof(long), sizeof(long long));
    return 0;
}
There are some more technical details about the different ways of declaring integers, which you can find in the Appendix.
There are three types for floating-point numbers in C, for single-precision, double-precision, and extended-precision floating point. How they're bitwise represented differs from machine to machine, but most modern computers use the IEEE-754 floating-point standard (if you're curious to see what that looks like in binary, see the Appendix).
Like with integers, the sizes are defined relative to each other. They are:
float <= double <= long double
The sizes on CLAC are as follows:
Type | Size |
---|---|
`float` | 4 bytes |
`double` | 8 bytes |
`long double` | 16 bytes |
Again, you can test it out for yourself:
#include <stdio.h>
int main(void) {
    // sizeof yields a size_t, which printf prints with %zu
    printf("float: %zu\n"
           "double: %zu\n"
           "long double: %zu\n",
           sizeof(float), sizeof(double), sizeof(long double));
    return 0;
}
Just like in Java, we declare variables using the following syntactic structure (note that the `[]`s indicate that the type modifiers are optional, and the `..` indicate that there can be multiple modifiers):
[<modifiers..>] <type> [<modifiers..>] <variable name>;
This declares a variable, but does not initialize it. That is to say, we've told the compiler how much space we want and how to read what we'll eventually put there, but we've not actually put anything there yet! There's no guarantee what garbage might be written in the piece of memory we're given, so trying to read it will result in undefined behavior (no guarantees for what will happen during compilation or execution; you'll hear us talk about this a lot, specifically about avoiding it).
int x; // x is declared, but uninitialized
printf("%d", x); // undefined behavior; don't do this!
x = 4; // x is now assigned the value 4
We can also declare multiple variables of the same type all at once, separating the variable names with a `,`:
int y, z; // y and z are declared, but both uninitialized
Just like in Java, we can declare and initialize our variables all at once:
int x = 2; // x is declared and initialized to 2
int y = 0, z; // y and z are declared, only y is initialized (to 0)
Also like in Java, we can cast variables from one type to another. We do so with the following syntax:
(<type>) <variable>
This comes in handy for changing the type of a variable to suit the interface of a function we might want to call, or to trick the compiler (as you saw before).
For example, let's say that we have a function `foo()` that takes a single `float` as its parameter:
int x = 34;
foo((float) x);
When we cast from an integer type to a floating-point type, the compiler produces instructions to convert our integer representation to floating-point representation.
Keep in mind that when we convert between integer types, we extend/truncate our integer in order to fit it into the size of the type we are casting it to. Note this can lead to some strange-looking results! For example, if we're casting from a positive `long` to an `int`, and the `long` happened to have a 1 value at the bit position corresponding to the MSB of an `int`, the result may be a negative number:
#include <stdio.h>
int main(void) {
    char c = 4;
    int i = 4;
    long l = 3000000000;
    printf("char to int: %d\n", (int) c); // prints 4
    printf("int to float: %f\n", (float) i); // prints 4.000000
    printf("long (no cast): %ld\n", l); // prints 3000000000
    printf("long to int: %d\n", (int) l); // prints -1294967296
    return 0;
}
As you can see, downcasting the 8-byte `long` truncated our integer, which means something very different as a signed, 4-byte integer.
When we convert between floating-point types of different levels of precision, again the compiler helps us do the conversion, though we may lose precision if we're casting from a more precise floating-point type to a less precise one.
We can actually use casting to "cast away" some of the things the compiler tries to enforce for us, like pointer types and `const`ness. Check out the Appendix for more details.
Literals are values hard-coded into our program. Most literals look fairly self-explanatory (especially numeric literals), but there are some C-specific notations that we should be aware of.
Octal (base 8) literals start with a leading `0`. The following digits can only consist of `0` through `7`.
Hexadecimal (base 16) literals start with a leading `0x` or `0X`. The following digits can only consist of `0` through `9` and `a` through `f` (capitalization doesn't matter).
Let's look at different ways of writing the number 42:
42 // decimal
052 // octal
0X2A // hexadecimal
0x2a // also hexadecimal
To specify that a number is unsigned, it should be followed by a `u` or a `U`.
For example:
8u
0xffffffffu
A number by itself is understood by the compiler to be an `int` by default. To specify that it is a `long` or `long long`, append `l`/`L` or `ll`/`LL` respectively to the end of the number.
For example:
4l
4LL
0x4l
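Floating-point numbers can be written several different ways:
- Just as a number: `42` (an integer literal, converted when a floating-point value is expected)
- A number with a decimal point: `42.`, `4.2`, `.42`
- Exponent form: `4e2`
The forms with a decimal point or an exponent are `double`s by default; one can specify `float` or `long double` using the suffixes `f`/`F` or `l`/`L` respectively.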
You might've noticed that, unlike in Python or JavaScript, in C, single quotation marks `'` aren't the same as double quotation marks `"`.
This is because single quotation marks are used for single character literals, while double quotation marks are used for string literals (see below).
C uses ASCII encoding for characters, which assigns each character an integer.
Recall that a `char` is 1 byte; this is enough because each ASCII character can be represented by a single byte (ASCII only specifies 128 characters).
Have a look at The ASCII Table to see what's what.
Instead of assigning raw numbers to `char`s, we can alternatively write them as character literals:
char a = 'a'; // same as writing char a = 97
char newline = '\n'; // same as writing char newline = 10
assert(a == 97);
assert(newline == 10);
The backslash `\` is used to specify an escape sequence, to denote characters that might be difficult to otherwise type out.
Here are some common escape sequences:
Escape Sequence | Character Description |
---|---|
`\n` | newline |
`\r` | carriage return |
`\\` | backslash |
`\'` | single quote |
`\"` | double quote |
A `\` may also be followed by a numeric value in octal notation; `\x` may be followed by a numeric value in hexadecimal notation.
So, consider the following:
char a = 0;
char b = '0'; // 48 in decimal (the ASCII code for the digit 0)
char c = '\0';
char d = '\x0';
assert(a != b);
assert(a == c);
assert(a == d);
`char` literals can be added and subtracted just like any other number, so if you wanted to find the (zero-based) position of the letter `j` in the alphabet, you could do something like the following:
'j' - 'a'
We write string literals as a sequence of valid `char` literals (including escaped characters), surrounded by double quotation marks `"`:
"I am not really a 'string', just an array of characters.\n"
String literals actually represent a null-terminated array of characters, which the compiler pre-allocates for us in static memory.
In C, we differentiate between expressions and statements. Simply put, statements don't necessarily have a value, whereas expressions do. All expressions can be statements, but not all statements can be expressions.
This will become clearer as we look at some examples.
A C expression is any syntactic construct that yields a value. It can be as simple as a variable or a literal, or it can be a combination of other expressions via operators. Function calls are also expressions, which yield the value the function returns.
We're not going to go through all the operators one by one, but we'll note some important details and gotchas.
For variables `x` and `y`, and some function `foo()`, the following are examples of expressions:
3157
x
x + y
x = y + 3157
x += y
x++
foo(x, y + 3157)
Note that an assignment `=` operation is also an expression! It assigns the value of its right operand to its left operand, and yields the value of the left operand.
This might seem weird at first, but it allows us to do things like:
x = y = 3157
x = (y = 3157)
The second line unpacks it a little bit: it's saying, assign `3157` to `y`, and then assign the value of that to `x`.
The following operators aren't really uncommon, but it's possible you haven't seen them yet, depending on which languages you've used before.
Compound assignment operators such as `+=` apply the operation written before the `=` to the left and right operands, then assign the result to the left operand. So, `x += y` is the same as `x = x + y`.
The increment and decrement operators are used to add 1 to or subtract 1 from their operand. There are two flavors of these operators:
- the prefix version, `++x` and `--x` (appears on the left of the operand), increments/decrements first, and yields the value after the operation (`++x` is equivalent to `x += 1`);
- the postfix version, `x++` and `x--` (appears on the right of the operand), increments/decrements as well, but yields the value before the operation.
int b1 = 5, a1 = 5;
int b2 = ++b1; // b2 is assigned the value of 6
int a2 = a1++; // a2 is assigned the value of 5
This is the syntax for the ternary operator:
<expression-1> ? <expression-2> : <expression-3>
The value of a conditional expression is dependent on that of the expression before the `?` operator. If `<expression-1>` is non-zero, then the conditional expression yields the value of `<expression-2>`; otherwise it yields the value of `<expression-3>`.
Here's an example:
int t = 1, f = 0;
int a = t ? 42 : 24; // a is assigned the value of 42
double b = f ? 4.2 : 1e20; // b is assigned the value of 1e20
It's sort of the same idea as an if/then/else, but keep in mind that those are separate, syntactic constructs in C (which we will cover below).
You've probably noticed that there is no boolean type in C. That's because we simply represent false with 0, and true with anything non-zero! This allows for some convenient shorthand: `if (x != 0)` is the same as `if (x)`.
Operations that typically return boolean values in other languages instead return 1 for true and 0 for false in C. This includes the comparison operators, `<` `>` `<=` `>=` `==` `!=`, and the logical operators `&&` `||` `!`.
We can confirm this with the following code:
printf("'true': %d\n", 0 < 3); // prints 1
Usually, an expression that contains subexpressions (as operands to an operator, or as arguments to a function call) will first evaluate those subexpressions before evaluating the greater overall expression. So let us consider the following two functions:
int foo(void) {
    printf("foo!\n");
    return 1;
}
int bar(void) {
    printf("bar!\n");
    return 0;
}
The following expression will evaluate both the `foo()` and `bar()` function calls:
foo() + bar()
We can tell by the fact that both `foo!` and `bar!` are printed.
However, `&&` and `||` use short-circuit evaluation; this means that their operands are evaluated only as necessary. For example, for the expression `L && R`, if we know `L` is 0 (false), we know the value of the logical AND expression is going to be 0 regardless of what happens with `R`.
The operators skip the operands which they know are no longer necessary.
To demonstrate:
foo() && bar(); // prints foo! bar!
foo() || bar(); // only prints foo!
bar() && foo(); // only prints bar!
bar() || foo(); // prints bar! foo!
This means that for a short-circuited expression like `foo() && bar()`, if `foo()` returns zero, `bar()` will never even be called. Likewise, for `foo() || bar()`, if `foo()` returns a nonzero value, `bar()` will never be called.
Like in any other programming or symbolic language, C defines an order of operations for each of its operators. This way, both the programmer and compiler can agree on what order to evaluate expressions in.
You should be familiar with a lot of the most common rules (like multiplication/division before addition/subtraction), but there is one noteworthy gotcha that is quite commonly used in C:
*p++
This doesn't increment the variable you get from dereferencing `p`; instead, this expression takes on the value of `*p`, and then increments `p` afterwards. It's not the same as `(*p)++`; it's more along the lines of something like `*p, p++`, where the two operations are done upon `p` separately.
Aside from that, C operators are pretty straightforward to read and write. CPP Reference has a pretty clear listing of each operator, their precedence, and their associativity.
if(condition)
    if(condition2) {
        printf("conditions met");
        return 0;
    }
    else // without braces, an else pairs with the nearest if (here, condition2)
        printf("no conditions met");
switch(v) {
    case 1: // v == 1
        printf("v is 1");
        break;
    case 2:
        printf("v is 2");
        break;
    default:
        printf("v is neither 1 nor 2");
}
int i;
for(i = 0; i < 10; i++) {
    // do things here
}
while(i) { // condition checked at the beginning
    // other things
    i--;
}
do {
    // more things
} while(i < 5); // condition is checked at the end
Using `break;` inside a loop will break out of the innermost loop. Using `continue;` will stop executing the current iteration of the loop and skip to the next iteration. `for(;;)` is an idiom for an infinite loop. `goto label` will jump to a line beginning with `label:`. Be careful with `goto`s.
There are a few modifiers we can use to specify exactly what kind of integer we want our compiler to declare for us.
When we declare an integer type, we can modify it with the keyword `signed` or `unsigned` to tell the compiler the signedness of the integer. When unspecified, most integer types are signed by default.
C also allows the programmer to use `long` and `short` as modifiers for `int`, which are the same as not having used the keyword `int` at all. So don't be alarmed if you see a `long int` type! It's just a `long`.
Also, if we don't write `int` and only specify the signedness, it's an `int`.
The following are equivalent:
`short` | `short int` | `signed short` | `signed short int`
`long` | `long int` | `signed long` | `signed long int`
`long long` | `long long int` | `signed long long` | `signed long long int`
`int` | `signed` | `signed int`
`unsigned int` | `unsigned`
`unsigned short` | `unsigned short int`
`unsigned long` | `unsigned long int`
`unsigned long long` | `unsigned long long int`
This is just a detail to watch out for, but not super important. Most of the time we use the notation in the leftmost column, by convention.
If you're curious to see what IEEE-754 floating point numbers look like, you can reuse some of your code from lab1 to do so:
float f = 3.157; // or whatever you want it to be!
int *i = (int *) &f; // don't worry about what this means;
                     // we're tricking the compiler into thinking that
                     // the float's bytes are an int ;)
print_hex(*i);
print_binary(*i); // implementation left as an exercise to the reader
Note that this is also an example of casting pointers.
By casting the `float` pointer to an `int` pointer, once we dereference it, we can interpret the same 4 bytes worth of data as an integer rather than a float.
The `const` keyword tells the compiler that once a variable is initialized, it shouldn't be assigned to again. The type modifier can come either before or after the type:
const int x;
int const y;
Note that this is only compiler enforced, but like what we did above to print floats in binary, we can trick the compiler