Introducing C++

Strings in C

At this point, you're all quite familiar with C's quirks and kinks. Perhaps one of the most notable ones is the absence of a String datatype!

Recall that strings in C are just special cases of arrays. C strings are arrays of characters with a null terminating character at the end.

char buf[50]; //stack allocation
strcpy(buf, "hello");

char *s = (char *) malloc(50); //heap allocation
strcpy(s, "hello");

//don't forget to free s when you're done!

But finally, after months of dealing with those pesky invalid reads, memory leaks, and so on, we can transition into a new language that permits a much friendlier way to handle strings: C++!

Classes/Structs in C++

Before we go into strings in C++, let's talk about the basics of classes and structs first. Although we can define a struct in C++ the same way we do in C, structs tend to get much more complicated in C++ than their C predecessors. But let's start with the C++ version of the struct Pt example from Jae's lecture notes:

struct Pt 
{
	double x;
	double y;
};

Stack Allocation in C++

Here's how you allocate space for a struct Pt in C++:

struct Pt myPt;

//in C++, you can omit the 'struct' part:
Pt myPt2;

It looks exactly like how we allocated it in C! Boooring. But is it really doing the same thing?

Classes/structs in C++ have constructors whose purpose is to initialize all data members and base classes. When we declared myPt, the default Pt constructor was called under the covers. If we don't define any constructors for a class/struct, the C++ compiler implicitly defines a default constructor for us. The synthesized default constructor will call the default constructors of all members of class type and leave members of built-in type uninitialized (for a local variable like myPt, that means x and y hold garbage values until we assign to them).
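To see the difference between class type members and built-in type members, here's a small sketch. LabeledPt and demo_default_init are hypothetical names made up for this illustration; they aren't from the lecture notes.

```cpp
#include <cassert>
#include <string>

// Hypothetical struct: one member of class type, one of built-in type.
struct LabeledPt
{
    std::string label; // class type: the synthesized constructor calls std::string's default constructor
    double x;          // built-in type: the synthesized constructor leaves it uninitialized
};

void demo_default_init()
{
    LabeledPt a;                 // a.label is "", a.x holds a garbage value
    assert(a.label.empty());     // safe to read: std::string's constructor ran
    // reading a.x here would be a bug, since nothing ever set it

    LabeledPt b = LabeledPt();   // the empty parentheses zero the built-in members first
    assert(b.label.empty());
    assert(b.x == 0.0);
}
```

The second form only zeroes x because LabeledPt has no constructors of its own; once we write a constructor, initializing x is our job.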

What if we attempted to call the default constructor like this?

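Pt myPt3(); //surprise: this declares a function named myPt3 that takes no arguments and returns a Pt; no Pt object is created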

Oftentimes, the synthesized default constructor is not what we want, so we define our own constructors. Let's say we want the user to be able to specify values for Pt members x and y. We could do so as follows:

struct Pt
{
	double x;
	double y;

	Pt(double myX, double myY);
};

//we could have also made this inline
Pt::Pt(double myX, double myY)
{
	x = myX;
	y = myY;
}

And we can call it like this:

Pt myPt2(4, 4);

When we define member functions for our classes, we can choose to define them within the class definition (the part enclosed by the curly braces of struct Pt), or just declare them within the class definition and define them elsewhere. Member functions defined within a class definition are implicitly inline. Inline functions are kind of like macros in that the compiler may replace calls to them with the body of the function, though it is free not to. Inline functions should be short. If we define a member function outside the class definition, we need to use the scope resolution operator, ::, to indicate that it's a member of the class.
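Here's a sketch showing both styles side by side. The sum member function and demo_member_definitions are hypothetical additions for illustration; the Pt in the lecture notes doesn't have them.

```cpp
#include <cassert>

struct Pt
{
    double x;
    double y;

    // defined inside the class definition: implicitly inline
    Pt(double myX, double myY) { x = myX; y = myY; }

    // only declared here; the definition lives outside the braces
    double sum();
};

// defined outside the class, so we need Pt:: to say whose sum() this is
double Pt::sum()
{
    return x + y;
}

void demo_member_definitions()
{
    Pt p(4, 4);
    assert(p.sum() == 8);
}
```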

A word of caution, though: if we define any constructors ourselves, we can no longer rely on the synthesized default constructor. Now that we've defined a constructor for Pt that takes two doubles, we lose our synthesized default constructor. A declaration like this would be illegal now:

struct Pt myPt; //error!

So if we want our class to have a default constructor, we'd better make sure to define it along with our other constructors, lest the compiler complain.

A convenient way to account for two constructors in one definition is to use default arguments. If no arguments are supplied to the constructor (as is what occurs when we call the default constructor), default values are assigned to the arguments, as if we had passed these values ourselves:

/*this constructor deals with calls to the default 
constructor and calls to the constructor taking
two doubles*/

//the default values go in the prototype inside the class definition:
//	Pt(double myX = 0, double myY = 0);
//and are not repeated in the definition:
Pt::Pt(double myX, double myY)
{
	//if no arguments are passed, myX == 0
	//and myY == 0
	x = myX;
	y = myY;
}
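Putting the pieces together, here's one arrangement that compiles: the default values are written once, in the declaration inside the class, and the out-of-class definition leaves them off. demo_default_args is a hypothetical helper for illustration.

```cpp
#include <cassert>

struct Pt
{
    double x;
    double y;

    // default arguments appear once, in the declaration
    Pt(double myX = 0, double myY = 0);
};

// no default values repeated here
Pt::Pt(double myX, double myY)
{
    x = myX;
    y = myY;
}

void demo_default_args()
{
    Pt origin;      // same as Pt(0, 0): this constructor doubles as the default constructor
    Pt onAxis(3);   // same as Pt(3, 0)
    Pt both(4, 4);

    assert(origin.x == 0 && origin.y == 0);
    assert(onAxis.x == 3 && onAxis.y == 0);
    assert(both.x == 4 && both.y == 4);
}
```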

Hand-in-hand with constructors are destructors. The purpose of the destructor is to release whatever resources the object holds. If our constructor allocated space on the heap for its data members, our destructor should free up that space. If we don't explicitly define a destructor, the compiler will synthesize one for us, which just calls the destructors of all class type members and does nothing for the built-in type members. A stack-allocated variable's destructor is called when the variable falls out of scope, after which the stack shrinks accordingly.
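We can watch a destructor fire at the end of a scope with a small sketch. Tracker and demo_scope are hypothetical names for illustration; the class does nothing but count how many of its objects are currently alive.

```cpp
#include <cassert>

// Hypothetical class that counts its live instances.
struct Tracker
{
    static int alive;
    Tracker()  { ++alive; }
    ~Tracker() { --alive; }
};
int Tracker::alive = 0;

void demo_scope()
{
    int before = Tracker::alive;
    {
        Tracker t;                              // constructor runs here
        assert(Tracker::alive == before + 1);
    }                                           // t falls out of scope: destructor runs here
    assert(Tracker::alive == before);
}
```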

Heap Allocation in C++

Although you can still use malloc to allocate space on the heap, using malloc doesn't allow you to call the constructor of a class type object. A preferred way to do heap allocation in C++ is via the new operator:

Pt *myPt = new Pt; //myPt is a pointer to sizeof(Pt) allocated bytes
Pt *myPt2 = new Pt(4, 4);

//heap-allocated array of Pt's:
Pt *myPtArray = new Pt[10];

The new operator not only allocates space for myPt on the heap, but it also calls Pt's constructor, in this case, the default constructor. Like with malloc, we must remember to free up the heap space we allocated with new:

delete myPt;
delete myPt2;
delete [] myPtArray; //deleting a heap-allocated array

delete calls the Pt destructor of myPt and frees up the heap space it was using. new goes hand-in-hand with delete, new[] with delete[], and malloc with free. Don't mix the pairs, e.g., allocating with new and releasing the memory with free; the behavior is undefined.
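The sketch below checks that new and delete really do call constructors and destructors, once per object. Counted and demo_new_delete are hypothetical names for illustration.

```cpp
#include <cassert>

// Hypothetical class that tallies constructor and destructor calls.
struct Counted
{
    static int constructed;
    static int destroyed;
    Counted()  { ++constructed; }
    ~Counted() { ++destroyed; }
};
int Counted::constructed = 0;
int Counted::destroyed = 0;

void demo_new_delete()
{
    Counted *one = new Counted;       // 1 constructor call
    Counted *many = new Counted[3];   // 3 constructor calls, one per element
    assert(Counted::constructed == 4);
    assert(Counted::destroyed == 0);

    delete one;                       // 1 destructor call
    delete[] many;                    // 3 destructor calls
    assert(Counted::destroyed == 4);
}
```

Had we used malloc and free instead, both counters would have stayed at zero.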

classes vs. structs

There is a subtle, yet important difference between classes and structs in C++. In a struct, the members defined prior to the first access specifier (e.g., public, private, etc.) are public. In a class, they are private. We want our members to be private, so we'll be writing classes in our labs.
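A quick sketch of the difference, using two hypothetical types (OpenPt and ClosedPt) that are identical apart from the struct/class keyword and an accessor:

```cpp
#include <cassert>

struct OpenPt
{
    double x;   // public by default
};

class ClosedPt
{
    double x;   // private by default

public:
    ClosedPt(double myX) { x = myX; }
    double getX() { return x; }
};

void demo_access()
{
    OpenPt a;
    a.x = 1.5;            // fine: x is public

    ClosedPt b(2.5);
    // b.x = 3.0;         // would not compile: x is private
    assert(a.x == 1.5);
    assert(b.getX() == 2.5);
}
```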

The "Basic 4" and Jae's MyString Class

There are four elements of a C++ class that you should always consider:

  1. Constructor
  2. Destructor
  3. Copy Constructor
  4. Operator=()

The Copy Constructor

We've already discussed constructors and destructors, but what about the other two? The copy constructor specifies how to construct a class type variable using an argument of that class type, i.e.:

string myString("hello");
string myStringCopy(myString); //invoking the copy constructor explicitly

The copy constructor is called implicitly in a couple other scenarios: passing by value and returning by value.

string call_copy(string myString) //copy constructor is called to create temporary copy of myString local to call_copy
{
	return myString; //copy constructor is called to return myString by value
}
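To convince ourselves that passing by value really invokes the copy constructor, we can count the calls. Noisy, by_value, by_reference, and demo_copies are hypothetical names for this sketch; the const Noisy& parameter type is the reference syntax covered later in these notes.

```cpp
#include <cassert>

// Hypothetical class that counts copy constructor calls.
struct Noisy
{
    static int copies;
    Noisy() {}
    Noisy(const Noisy&) { ++copies; }
};
int Noisy::copies = 0;

void by_value(Noisy) {}             // parameter is a fresh copy of the argument
void by_reference(const Noisy&) {}  // parameter refers to the caller's object

void demo_copies()
{
    Noisy original;

    by_reference(original);
    assert(Noisy::copies == 0);     // nothing was copied

    by_value(original);
    assert(Noisy::copies == 1);     // the parameter was copy-constructed
}
```

The sketch sticks to the pass-by-value case because compilers are allowed to skip some of the copies involved in returning by value.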

If you don't explicitly define a copy constructor, the compiler implicitly defines one for you, which just sets all data members of the newly constructed object equal to all data members of the argument object. This may or may not be what you want. Consider this segment of the MyString class from Jae's lecture notes:

class MyString
{
	char *data;
	int len;
	
public:
	MyString(const char *p);
	...

};

...

MyString::MyString(const char *p)
{
	if (p)
	{
		len = strlen(p);
		data = new char[len+1];
		strcpy(data, p);
	}

	else 
	{
		data = new char[1];
		data[0] = '\0';
		len = 0;
	}

}

MyString::~MyString()
{
	delete[] data;
}

If we didn't define a copy constructor for MyString, initializing a new MyString from another MyString would make the new MyString's data pointer point to the same heap-allocated space as the old MyString's data pointer. Upon destruction of either MyString, the other MyString would have a data pointer pointing to a freed piece of memory: hello, memory errors!
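Here's a sketch of that sharing in action. ShallowString is a hypothetical stand-in for a MyString without a copy constructor; it deliberately has no destructor either, so the demo can free the buffer exactly once and stay safe.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in: same data members as MyString, no Basic 4.
struct ShallowString
{
    char *data;
    int len;
};

bool shallow_copy_shares_buffer()
{
    ShallowString a;
    a.len = 5;
    a.data = new char[a.len + 1];
    std::strcpy(a.data, "hello");

    ShallowString b(a);   // compiler-generated copy: copies the pointer, not the characters
    b.data[0] = 'j';      // we write through b...

    bool shared = (a.data == b.data) && (a.data[0] == 'j');   // ...and a sees "jello"

    delete[] a.data;      // freed exactly once; b.data now dangles and must not be touched
    return shared;
}
```

With MyString's destructor in the picture, both objects would try to delete[] that one buffer.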

As a rule of thumb, if your class necessitates explicit definition of a destructor, as MyString does, chances are that your class necessitates explicit definition of a copy constructor.

So how do we define a copy constructor? The copy constructor is what gets called when we pass an object by value, so its parameter can't be a MyString passed by value: constructing that parameter would itself require the copy constructor we're trying to define. We need a C++ construct known as a reference. You can think of a reference as an alias for an existing variable, or as a pointer that gets dereferenced for you automatically. For example:

int x = 5;
int& y = x; //y is a reference to x

x = 6; //y is now 6
y = 7; //x is now 7

The reference construct allows us to pass an argument by reference, without the need for all that pointer business we've seen before. For example, this function will increment the integer passed as an argument, since the variable it's working with isn't a temporary copy of the argument--it's an alias for the argument itself!

//note that to use iostreams this way, we need to have:
#include <iostream>
using namespace std;

void increment(int& x)
{
	x++;
}

...
int y = 5;
increment(y);
cout << y << endl; //prints 6

Using this new reference construct, we are ready to define our copy constructor:

MyString::MyString(const MyString& s) 
/*note that the const indicates that we will not modify
 the contents of s within this constructor*/
{
    len = s.len;
    
    data = new char[len+1];
    strcpy(data, s.data);
}

Note that the copy constructor takes a constant reference (const MyString&). It's generally better to take constant references for a couple of reasons:

  1. By taking const, you widen the range of arguments you can accept: const objects, non-const objects, and temporaries.
  2. By taking a reference, you avoid the overhead associated with making a copy of the argument, so the code will be a little faster.
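The first point is easy to check with a pair of hypothetical functions, one taking a const reference and one taking a plain reference. This sketch uses std::string rather than MyString to stay self-contained.

```cpp
#include <cassert>
#include <string>

int count_const_ref(const std::string& s) { return static_cast<int>(s.size()); }
int count_ref(std::string& s)             { return static_cast<int>(s.size()); }

void demo_const_ref()
{
    std::string plain("hello");
    const std::string fixed("hi");

    assert(count_const_ref(plain) == 5);               // non-const argument
    assert(count_const_ref(fixed) == 2);               // const argument
    assert(count_const_ref(std::string("yo")) == 2);   // temporary

    assert(count_ref(plain) == 5);                     // the only call count_ref accepts
    // count_ref(fixed);                // would not compile: discards const
    // count_ref(std::string("yo"));    // would not compile: a temporary can't bind to a non-const reference
}
```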

The Assignment Operator

If your class necessitates the definition of a copy constructor, your class necessitates the definition of an assignment operator for the same reasons. We can write our assignment operator almost entirely the same way we wrote our copy constructor, except now we also have to deal with the existing data of the object on the left-hand side of the assignment. We can get at that object via the C++ this pointer: this is a pointer to the object on which we're currently operating.

(You probably guessed it. In C++, the this pointer is the same thing as the this reference in Java. What you know as object references in Java are indeed none other than pointers hiding behind a more palatable syntax.)

Now that we have this at our disposal, we can write our assignment operator:

MyString& MyString::operator=(const MyString& rhs)
{
    if (this == &rhs) {
        return *this;
    }

    // first, deallocate memory that 'this' used to hold

    delete[] data;

    // now copy from rhs
    
    len = rhs.len;

    data = new char[len+1];
    strcpy(data, rhs.data);

    return *this; //returns the MyString on which we are calling the assignment operator
}

Note that our assignment operator should return a MyString& so that we can chain calls to it:

MyString MS("hello");
MyString MS2("world");
MyString MS3("yay");

(MS = MS2) = MS3; //MS is now "yay"

Make sure you understand the importance of the Basic 4 and always consider whether they're necessary for your program. Hint: they're almost always necessary.
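For reference, here are all of the Basic 4 in one compilable sketch. MiniString is a hypothetical, pared-down class written for this illustration; it isn't the MyString from the lecture notes, and c_str and demo_basic4 are helpers added only so we can check the results.

```cpp
#include <cassert>
#include <cstring>

class MiniString
{
    char *data;
    int len;

public:
    MiniString(const char *p);                      // 1. constructor
    ~MiniString();                                  // 2. destructor
    MiniString(const MiniString& s);                // 3. copy constructor
    MiniString& operator=(const MiniString& rhs);   // 4. operator=()

    const char *c_str() const { return data; }      // helper for inspecting the contents
};

MiniString::MiniString(const char *p)
{
    len = p ? static_cast<int>(std::strlen(p)) : 0;
    data = new char[len + 1];
    if (p)
        std::strcpy(data, p);
    else
        data[0] = '\0';
}

MiniString::~MiniString()
{
    delete[] data;
}

MiniString::MiniString(const MiniString& s)
{
    len = s.len;
    data = new char[len + 1];   // our own buffer, not a shared one
    std::strcpy(data, s.data);
}

MiniString& MiniString::operator=(const MiniString& rhs)
{
    if (this == &rhs)
        return *this;           // self-assignment: nothing to do

    delete[] data;              // let go of the old buffer first
    len = rhs.len;
    data = new char[len + 1];
    std::strcpy(data, rhs.data);
    return *this;
}

void demo_basic4()
{
    MiniString a("hello");
    MiniString b(a);                        // copy constructor
    assert(std::strcmp(b.c_str(), "hello") == 0);
    assert(a.c_str() != b.c_str());         // two separate buffers

    MiniString& alias = a;
    a = alias;                              // self-assignment is harmless
    assert(std::strcmp(a.c_str(), "hello") == 0);

    MiniString c("yay");
    (a = b) = c;                            // chained assignment
    assert(std::strcmp(a.c_str(), "yay") == 0);
    assert(std::strcmp(b.c_str(), "hello") == 0);
}
```

Try commenting out the self-assignment check and think about what delete[] data does to rhs.data when this and &rhs are the same address.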

Other Nifty C++ Features

Implicit Conversions

Recall from your lovely memories with C that there are some automatic type conversions that occur when intermixing types, for example:

int y = 5;
double z = y; //y is coerced into double type

Automatic type conversions can occur for class type variables in C++. For our MyString class, we can do something like this:

MyString s("hello");
s += "world"; 
/*compiler uses the MyString constructor that 
takes a char* to create a temporary MyString whose 
value is "world", which allows us to call our 
+= operator on two MyString objects*/

//note that the lifetime of our temporary MyString is the expression in which it was created

If we want automatic type conversion to occur, we need to make sure that the compiler is able to make the connection between the first type and the other, via our constructors (i.e., we can construct a MyString from a char*, but not from an int.)
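The same mechanism works for function arguments. In this sketch, Label and length_of are hypothetical names, and Label keeps its characters in a std::string so we don't have to write the Basic 4 again.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that can be constructed from a C string.
struct Label
{
    std::string text;
    Label(const char *p) { text = p; }
};

int length_of(const Label& l)
{
    return static_cast<int>(l.text.size());
}

void demo_conversion()
{
    Label named("hello");
    assert(length_of(named) == 5);      // already a Label: no conversion needed

    assert(length_of("world!") == 6);   // compiler builds a temporary Label from the C string

    // length_of(42);                   // would not compile: no Label constructor accepts an int
}
```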

Operators

C++ allows us to define our own operators for the classes we write. This means that we can do things like:

MyString MS("hi");
MyString MS2("dude");
MS = MS + MS2; //MS reads "hidude"

You can find a list of all overloadable operators in C++ on page 553 of Lippman, 5th ed. For our MyString class, we'll be overloading +, =, <<, >>, and [] (plus the ones you'll overload in lab 9).

You may find yourself running into the question of member versus nonmember implementation of your operators. Symmetric operators, operators that should allow implicit conversion of either operand, should be nonmember functions. Two examples of these are the + and - operators. Operators whose left-hand operand isn't of the class type shouldn't be members of the class, for example, the << and >> operators of our MyString class. Operators that change the state of their object should be members. The assignment (=), subscript ([]), call (()), and member access arrow (->) operators must be members.
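Here's a sketch of those guidelines on a toy class. Cents, get, and demo_operators are hypothetical names for illustration: += changes its object, so it's a member, while + is symmetric, so it's a nonmember built on top of +=.

```cpp
#include <cassert>

// Hypothetical value class; its constructor also enables implicit int -> Cents conversion.
class Cents
{
    int amount;

public:
    Cents(int a) { amount = a; }
    int get() const { return amount; }

    // changes the state of its object: member
    Cents& operator+=(const Cents& rhs)
    {
        amount += rhs.amount;
        return *this;
    }
};

// symmetric: nonmember, so either operand may be implicitly converted
Cents operator+(const Cents& lhs, const Cents& rhs)
{
    Cents result(lhs);
    result += rhs;
    return result;
}

void demo_operators()
{
    Cents a(5);
    Cents b = a + 10;   // right-hand operand converted to Cents
    Cents c = 10 + a;   // left-hand operand converted; a member operator+ couldn't do this
    a += 1;

    assert(b.get() == 15);
    assert(c.get() == 15);
    assert(a.get() == 6);
}
```

Since operator+ only uses public members here, it doesn't need to touch Cents's private data at all.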

Friend Declarations

If we want to have nonmember operators that access our nonpublic data members, we need to declare them as friends, as we do in mystring.h.

	// operator+
	friend MyString operator+(const MyString& s1, const MyString& s2);

	// put-to operator
	friend ostream& operator<<(ostream& os, const MyString& s);

	// get-from operator
	friend istream& operator>>(istream& is, MyString& s);

Remember this joke: "Only you and your friends can touch your private parts."
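As a self-contained sketch of a friend operator, here's a put-to operator for a Pt whose members are private. This Pt is a hypothetical class version of the earlier struct, and the output goes to a std::ostringstream (an ostream that writes into a string) so we can check what was printed.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

class Pt
{
    double x;   // private by default, since this is a class
    double y;

public:
    Pt(double myX, double myY) { x = myX; y = myY; }

    // nonmember operator that is allowed to read x and y
    friend std::ostream& operator<<(std::ostream& os, const Pt& p);
};

std::ostream& operator<<(std::ostream& os, const Pt& p)
{
    os << "(" << p.x << ", " << p.y << ")";
    return os;   // returning os lets callers chain: out << a << b
}

void demo_friend()
{
    std::ostringstream out;
    out << Pt(1, 2);
    assert(out.str() == "(1, 2)");
}
```

Without the friend declaration, the accesses to p.x and p.y would not compile.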

Const Member Functions

Note that the const version of the subscript operator in MyString has const at the end of its prototype. What's that about? A const member function is a member function that promises not to modify the object on which the function is being called. As such, the const version of the MyString subscript operator promises to not modify the contents of the MyString that it's subscripting. Note that we can cast away the constness of *this so that we can reuse our nonconst subscript operator:

// operator[] const - in real life this should be inline

const char& MyString::operator[](int i) const
{
    // illustration of casting away constness
    return ((MyString&)*this)[i];
}
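To see which version gets picked when, here's a sketch with a hypothetical Triple class that has both subscript operators. It uses const_cast, the C++-style spelling of the cast shown above; first_of and demo_subscript are likewise made up for illustration.

```cpp
#include <cassert>

// Hypothetical fixed-size container of three ints.
class Triple
{
    int items[3];

public:
    Triple(int a, int b, int c) { items[0] = a; items[1] = b; items[2] = c; }

    // nonconst version: hands back a reference the caller may assign through
    int& operator[](int i) { return items[i]; }

    // const version: reuses the nonconst one, but returns a const reference
    const int& operator[](int i) const
    {
        return const_cast<Triple&>(*this)[i];
    }
};

int first_of(const Triple& t)
{
    return t[0];   // t is const here, so the const operator[] is chosen
}

void demo_subscript()
{
    Triple t(1, 2, 3);
    t[0] = 10;                    // nonconst version: we can write through the result
    assert(first_of(t) == 10);
    assert(t[2] == 3);
}
```

Casting away constness is only safe here because the nonconst operator[] doesn't actually modify anything.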