I'm looking for an A quality answer. The due is 22st at 4pm (UTC-08:00 pacific time us&canada) here is the instruction for this assignment. In C Improve my multithreading example as follows: (for bo

Vladimir Sverdlov. CS222. Addendum to lecture 15. Multithreading.

Summary.

I was not going to talk about multithreading. It is not in the course's catalog, nor in the syllabus; I specifically indicated that it is not in our scope and that writing a multithreading application is well beyond this advanced class. So I was not going to talk about that.

However, I received a request to do so. My first move was to deny that request. Writing a multithreading app is involved. You have probably seen a lot of examples confirming what I just said... even in professional software. Writing a good and robust multithreading app is involved, and if we did that, it would probably take a good two or three weeks, or maybe more, for a simple program.

But then one thing occurred to me.

I will talk about it to underscore once again some of the rules and practices that I have so adamantly been pushing upon you for the entire semester, and which you have so persistently resisted.

1. Introduction

If you search the web, you will see a lot of examples of how to write multithreading apps. You will see how easy it is in the new C++ (since C++11). Lucky you: if you ever have to do that, you most likely will not have to do it with the POSIX library. Lucky you!.. Anyway, there are a lot of examples out there. What I will do here is take almost what you could find online or in books … and extend it ... just a little. I will connect it to the coding habits that you consistently showed in your code and which, I am sure, you would still be following had I not been so insistent.

I once received a concern, that the first half of the semester was … slow, while the rest was brutal. But...

how could we move forward if, submission after submission, I still saw the same problems... again and again, and only the midterm seemingly brought some change of mind? You cannot write a … program, if you cannot write a … program.

Anyway. I will write code that purposefully shows you the danger and vulnerability of using global variables and defines, and of the practice of not checking for the resources, which some of you follow even now.

One thing I will not include in this list is neglecting the housekeeping. The reason: I do not want to restart my computer after running that code.

With that introduction, let's discuss some concepts.

2. When to write a multithreading app?

But first:

#define multithreading application MTA

As with everything else, you do not write an MTA just because you want to and just because you can.

You need to have a good justification for that.

1. Suppose you write heavy, computationally intensive software. If you can split the computations into parts, and if you can assemble the results of those individual computations into a whole afterwards, that would be a good candidate for an MTA.

2. Suppose you have software dealing with a lot of heavy data. If you can split the data into smaller chunks, and if you can reconstitute the final result afterwards (the divide and conquer strategy, which should be familiar to people who took the algorithms course), that would be a good candidate for an MTA.

a. As a part of items 1 and 2 above, suppose you have a lot of input-output. For example, you read a lot of data from, or write it to, a database. If your database stores data in different partitions, you may consider dedicating a separate thread to each partition.

3. Suppose you do heavy computations and you need to update the GUI in real time... That one too. That one is tricky to do correctly, such that your GUI does not freeze, hang, crash, or become non-responsive or erratic. And because it is a GUI, the errors and mistakes here are especially visible.

4. Server side applications. The best and classical example – the web server. People probably know how the web server works. There is a main thread (daemon) that listens to the incoming connection, and when one comes, it dispatches a new thread to serve that request, while keeping listening to the other connections.

Let me pause here for a moment. In the last example, a new thread is spawned every time a new request comes in. That is one of the reasons why DoS attacks are so hard to withstand. The problem is not only that the network resources are consumed; the other big problem is that new threads are used up.

Hardware, however powerful, still has limits. Operating systems, however robust, still have limits. The number of threads in the system, as with any other system resource, is finite. There are only so many threads available. People who took my Linux class can probably recall how easily and how fast I could exhaust the thread resources, after which the system essentially stops responding...

I talk about that not because I just wanted to, but because I wanted to lead the discussion to this:

3. Some considerations when you decide whether you want to / need to do an MTA

So, you run the application, and you decided that you want to split some of the tasks between two (or more) threads. Let's assume for now that you run everything on one single machine. Suppose it was a dedicated machine, used to run your application only. That means that (aside from the OS overhead) all available resources were dedicated to your process. Now you added one more thread to that. That means you take some share of the CPU, memory, storage, networking, file handles... etc. from your first thread. Its productivity per unit of time will diminish, because there is now another thread that it has to share the resources with. If both of these threads do a lot of memory allocation/deallocation, memory performance may degrade faster, and when it comes to a certain point, the OS will need to re-align the memory, which considerably slows all the processes running on the system. In short, when you spawn a new thread, you will not get a 100% increase in productivity. In reality it will be less. In the worst cases you may not see anywhere near the gain you hoped for, but you have already invested heavily into writing the MTA... Bummer.

To summarize, as with everything else, you need to carefully evaluate and assess, whether the MTA is what you really need.

Now, I made a comment about running all threads on one single machine. But you may not be constrained to that. You may have an array of machines available. Or you may have multiple cores. If you can utilize that infrastructure so that each unit of work gets a dedicated machine or core, that is where you really unleash the processing power.

Some years ago there was a community project to track dangerous near-Earth space objects. If I remember correctly, it was run and managed by a university observatory in Australia (I cannot recall the names). The setup was that anyone could sign up, and while your computer was idle, its resources were used to run parts of these computations. That is a good example of distributed computing. But again, that is not in the scope of this course, and I need to keep to the format and the size of the lecture.
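The "split the computation, then assemble the results" idea from section 2 can be sketched in a few lines. This is only a minimal sketch, not part of the lecture's code: the names `sum_range` and `parallel_sum` are hypothetical, and the number of workers is left to the caller (on a multi-core machine one might pass `std::thread::hardware_concurrency()`).

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical worker: sums data[begin, end) into its own result slot.
void sum_range(const std::vector<int>& data, std::size_t begin,
               std::size_t end, long long& result)
{
    result = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

// Hypothetical driver: one chunk per thread, then assemble the partial sums.
long long parallel_sum(const std::vector<int>& data, unsigned workers)
{
    if (workers == 0) workers = 1;   // hardware_concurrency() may report 0
    std::vector<long long> partial(workers, 0);  // one slot per thread, nothing shared
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w)
    {
        const std::size_t begin = std::min(data.size(), w * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back(sum_range, std::cref(data), begin, end,
                          std::ref(partial[w]));
    }
    for (std::thread& t : pool) t.join();   // wait for every part first

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Each thread writes only to its own slot of `partial`, so no locking is needed here; the threads share the input for reading only.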

4. What you need to know before you start writing an MTA

1. So, you started your program, and at some point you spawn a new thread. You need to understand that everything before the bifurcation point is shared and available to that new thread. Everything after the point of bifurcation may be thread-specific, but what came before is fully inherited by the thread, including the environment variables. Thus, if you have global variables (those not in a particular scope, but in the scope of the entire program), they will be available and accessible to all the threads. That can be a good thing: people often use that property to control the threads, to send messages to a thread, or to do inter-thread communication through the global variables. But however easy and simple this may seem, it is a potential security flaw and a source of very hard to figure out bugs.

2. Alright, you still need to share some resources (files, variables....), what to do?

When you do an MTA that is a little more involved than the examples you may see online, you will need to take a lot of care to correctly schedule the execution of some parts of your code. You also need to take a lot of care when updating/accessing some of the variables, resources etc. The simplest (and again, probably the most ubiquitous) example: you update a record in the database. You do not want anyone to access that record (for reading, and even more so for writing) until you finish writing to it. So, when you do an MTA, you need to know and correctly execute two things:

a. A critical section of your code. That is the part that should be executed as one block, without interruption or interference. When you come to the critical section of the code, you lock the execution (for example by setting a semaphore), and when you leave that section, you release the lock.

b. A concept of atomic operation. Say, you want to update some variable. For example, you want to increment the counter. You want to do it in such a way, that no other thread would be able to access that variable during that operation either for reading or for writing.

3. You need to know how to schedule your threads. Sometimes you want the execution to happen in a certain order or sequence, or to be otherwise synchronized.

4. You need to understand that unless you take special care (thus diminishing the advantage of multiple threads running in parallel), the output may be (and will be) interleaved and completely out of order. So if you look at the log, for example, you may have a hard time tracing everything down.

5. Finally, testing of the MTA is an involved task by itself.
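The critical section and the atomic operation from item 2 can be sketched as follows. This is a minimal illustration under assumptions: it uses `std::mutex` with `std::lock_guard` in place of the semaphore mentioned above, and the names `SharedLog`, `worker` and `run_workers` are hypothetical.

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct SharedLog                      // hypothetical shared resource
{
    std::mutex guard;                 // protects `lines`
    std::vector<std::string> lines;
    std::atomic<int> updates;         // incremented atomically, no lock needed
    SharedLog() : updates(0) {}
};

void worker(SharedLog& log, int id, int repeats)
{
    for (int i = 0; i < repeats; ++i)
    {
        {
            // Critical section: only one thread at a time gets past this line.
            std::lock_guard<std::mutex> lock(log.guard);
            log.lines.push_back("thread " + std::to_string(id));
        }   // the lock is released here, when `lock` goes out of scope

        ++log.updates;                // atomic read-modify-write
    }
}

void run_workers(SharedLog& log, int threads, int repeats)
{
    std::vector<std::thread> pool;
    for (int id = 0; id < threads; ++id)
        pool.emplace_back(worker, std::ref(log), id, repeats);
    for (std::thread& t : pool) t.join();
}
```

With four threads and 1000 repeats each, `log.updates` and `log.lines.size()` both end at 4000 on every run, whereas a plain `int` counter and an unguarded vector would give no such guarantee.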

Alright. Enough with the boring theories, let's see some code.

5. The code.

If you search online, you will see that with the new thread library you can create a thread in a number of different ways:

1. By passing a pointer to a function. We discussed that approach, and we used a very similar one when we passed a sorting predicate to the STL's sorting functions. Because of that, let's not use it here.

2. By passing a functor. That's interesting. Instead of passing just a function to the thread, you pass an entire object. That gives you more power. We talked about functors in one of the previous lectures, and in the last lecture too. That is also more involved than the simplest examples. Let's use that.

3. By passing a lambda expression. That you will see in the examples too. Here I will allow myself to express a very strong opinion, which is just my personal opinion and may or may not be shared by many other people. Which is this:

You do not use tools just because you can. If you want to use a lambda expression to do the work in a thread... Well, perhaps you can just use an asynchronous function instead of full-grade threads.

So, we will create an object and pass that object to the thread. I will write a very simple MTA with two threads, one writing to the file and one reading from the file. That may already raise a suspicion that my threads will compete for the same resource.
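For reference, the three options listed above look roughly like this side by side. It is a minimal sketch with hypothetical names (`add_ten`, `AddTwenty`, `run_three_ways`); each thread works on its own variable, so nothing is shared and nothing needs locking.

```cpp
#include <functional>
#include <thread>

// Option 1: a plain function, passed to the thread by pointer.
void add_ten(int& value) { value += 10; }

// Option 2: a functor, i.e. an object with operator().
struct AddTwenty
{
    void operator()(int& value) const { value += 20; }
};

int run_three_ways()
{
    int a = 0, b = 0, c = 0;
    AddTwenty functor;

    std::thread t1(add_ten, std::ref(a));   // pointer to function
    std::thread t2(functor, std::ref(b));   // functor (the thread gets a copy)
    std::thread t3([&c] { c += 30; });      // option 3: lambda expression

    t1.join();
    t2.join();
    t3.join();
    return a + b + c;
}
```

Note the `std::ref` wrappers: `std::thread` copies its arguments by default, so a reference parameter has to be requested explicitly.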

YES. And that's exactly what I want to show. I will use global variables and omit the checks, just as most if not all of you did in your code earlier in the semester. And I will do it to purposefully cause a race condition.

A race condition arises when more than one thread competes for the same resource and the result depends on which thread gets there first.

A race condition may result in:

1. Denial of that resource to other threads.

2. Deadlock. The program enters a deadlock and cannot exit from it.

3. Inconsistent, incorrect, un-predictable data or output.

4. It can just crash.

Out of these four, I don't know which one is worse. But I can probably say, that none is good.
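To make the deadlock case concrete: it typically appears when two threads need the same two locks and take them in opposite order. The sketch below is an illustration only, with hypothetical names (`Account`, `transfer`, `run_opposite_transfers`); it uses `std::lock`, which acquires both mutexes together with a deadlock-avoidance algorithm, so the order in which each thread names them does not matter.

```cpp
#include <functional>
#include <mutex>
#include <thread>

struct Account                     // hypothetical resource with its own lock
{
    std::mutex guard;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Assumes `from` and `to` are two different accounts.
void transfer(Account& from, Account& to, int amount)
{
    // Locking from.guard and then to.guard one by one could deadlock
    // against a thread doing the opposite transfer; std::lock cannot.
    std::lock(from.guard, to.guard);
    std::lock_guard<std::mutex> lockFrom(from.guard, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.guard, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

void many_transfers(Account& from, Account& to, int count)
{
    for (int i = 0; i < count; ++i) transfer(from, to, 1);
}

// Two threads move money in opposite directions at the same time.
void run_opposite_transfers(Account& a, Account& b, int aToB, int bToA)
{
    std::thread t1(many_transfers, std::ref(a), std::ref(b), aToB);
    std::thread t2(many_transfers, std::ref(b), std::ref(a), bToA);
    t1.join();
    t2.join();
}
```

Both threads always finish, and the total of the two balances stays the same no matter how the transfers interleave.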

So I'll lead the threads into a race condition and we'll see what happens.

But before that, the usual disclaimer... you know it: it is just my code etc. You also know the request not to share it or make it publicly available, and to respect that request of mine. But in addition to that, if you decide not to honor my request, at least supply the code along with the complete description above, because I don't want any established programmer looking at the code and asking: “Who ever wrote this thing?”

Alright. The code. First version. Global variables, no checks for the resources. All the comments are inline.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdlib>
#include "mingw.thread.h" //Your include may be different, probably <thread>

using namespace std;

enum MODE { INVALID = -1, READ, WRITE };

//Global vars. Accessible to all threads. No protection whatsoever.
fstream myFile;

//That one is even worse. That completely removes compiler's
//protection and type checking.
#define FILENAME "mytextfile.txt"

//////////// Classes declarations ////////////////////

/*******************************
 * The underlying file functionality. Opens file for read/write access.
 * Only available to the derived classes, on which it depends to correctly
 * set the mode
 */
class myFileOp
{
protected:
    myFileOp(MODE);
    ~myFileOp() {}
    void operator()(const string &);
    MODE Mode;
private:
    myFileOp() { Mode = INVALID; }
};

/*******************************
 * File reader. Inherits file open functionality from its base class.
 * Passes the READ mode to the base class. Overrides operator() for the functor
 */
class myFileReader : public myFileOp
{
public:
    myFileReader();
    ~myFileReader() {}
    void operator()(const string &);
};

/*******************************
 * File writer. Inherits file open functionality from its base class.
 * Passes the WRITE mode to the base class. Overrides operator() for the functor
 */
class myFileWriter : public myFileOp
{
public:
    myFileWriter();
    ~myFileWriter() {}
    void operator()(const string &);
};

///////////////////////// Implementation /////////////////////

////////////////////// Base class: myFileOp //////////////////////

/*******************************
 * Constructor. I don't say I critically depend on the correct mode's setting...
 * I *CRITICALLY* critically depend on the correct mode's setting.
 * >>>>>>>>>>> VERIFY!!!! <<<<<<<<<<<<<<<<<<<
 */
myFileOp::myFileOp(MODE m)
{
    Mode = (m == READ || m == WRITE ? m : INVALID);
}

/*******************************
 * File open functionality. Implemented as an overloaded functor operator.
 * This is overridden and shadowed in the derived classes, but functionality
 * is still available to them.
 * Note, that I use a fstream object, declared globally.
 * You know, that is not a good practice, right?
 */
void myFileOp::operator()(const string &fname)
{
    if (Mode == INVALID) return;
    myFile.open(fname.c_str(),
                (Mode == READ ? std::fstream::in : std::fstream::out));
}
//////////////////////////// end of myFileOp //////////////////////////////

//////////////////////////// myFileReader //////////////////////////////////

/*******************************
 * Constructor. The only job is to set the mode correctly.
 * This is done without client's involvement
 */
myFileReader::myFileReader() : myFileOp(READ)
{
}

/*******************************
 * Here is where all the job to read the file is done. Note, that I break out of that
 * if number of reads exceeds certain value. Also outputs number of reads
 */
void myFileReader::operator()(const string &fname)
{
    myFileOp::operator()(fname); //Calling base's func to open the file
    string line;
    int count = 0;
    getline(myFile, line);
    std::cout << "\n1: From file reader:\n\t" << line;
    //ASSUMPTION: the loop header and its output statement were lost from
    //the text; they are reconstructed here from what remained of the body.
    while (getline(myFile, line))
    {
        std::cout << "\n2: From file reader:\n\t" << line;
        if (++count >= 10)
        {
            std::cerr << "\nFrom file reader: Data overflow; returning.";
            return;
        }
    }
    //ASSUMPTION: the tail of this statement is reconstructed.
    std::cout << "\nDone reading file with counter: >>>" << count << "<<<";
}
//////////////////////////// end of myFileReader ///////////////////////////

//////////////////////////// myFileWriter //////////////////////////////////

/*******************************
 * Constructor. The only job is to set the mode correctly.
 * This is done without client's involvement
 */
myFileWriter::myFileWriter() : myFileOp(WRITE)
{
}

/*******************************
 * Here is where all the job to write to the file is done.
 * Note, that I write random strings to the file.
 */
void myFileWriter::operator()(const string &fname)
{
    myFileOp::operator()(fname); //Calling base's func to open the file
    for (int i = 0; i < 100; i++)
    {
        stringstream sChars; //I just wanted to show the stringstream obj.
        //A *very* useful tool. I recommend to take a look at it
        //and at its many various useful functions.
        for (int j = 0; j < 50; j++)
        {
            sChars << char('a' + rand() % 26); //50 random chars
        }
        //ASSUMPTION: everything after "myFile<" on this line was lost
        //from the text and is reconstructed.
        myFile << sChars.str() << endl;
    }
}
//////////////////////////// end of myFileWriter ///////////////////////////

//ASSUMPTION: the opening of main() was lost from the text; the two functor
//objects are reconstructed from the way they are used below.
int main()
{
    myFileReader fr;
    myFileWriter fw;

    //technically, since I have a file name defined globally, I could
    //just use it in the thread, just like I did the file handle,
    //but I wanted to show how the parameters can be passed to the thread.
    //spawning a reader thread
    thread read_thr(fr, FILENAME);
    //spawning a writer thread
    thread write_thr(fw, FILENAME);

    //You acquire resources, you release resources. In that case I
    //wait for the thread to finish, and collect its status. If I don't
    //do that... welcome to the world of orphans and zombies ...
    //joining the threads is also a way to synchronize the threads execution.
    write_thr.join();
    read_thr.join();
    cout << endl;
    return 0;
}

OK, let's run it. We will run it a few times, noting the number of reads from the file. I will also truncate most of the 100 lines of file writes.

First run. The number of file reads is zero. Note, by the way, how the output is interleaved. It is very hard to make sense of it.

Second run. The number of file reads is 1:

So we already see the problem. We have an inconsistent number of reads, and also... It seems that the file reader is denied access to the file, but since we never bothered to check, we proceeded with the execution of the program, assuming everything is nice and good. Do you think your clients would like that software? Something tells me: “not really”.

Also, I'd like to draw your attention to the fact that it was the file reader that was denied access, even though we launched that thread before the file writer.

We need to fix that. There are a number of options. Probably the correct one would be to define a critical section and lock the file reading/writing operations. In that case, if the reader detects that the resource is locked, it enters sleep mode, periodically waking up and checking whether the resource is available. When the resource becomes available, the reader would rush to lock it for itself, get hold of that file and read it, not forgetting to unlock it afterwards. That would be the right chain of events for our MTA.

But it is becoming too complex for a very introductory example. You already have 30+ pages of the main lecture to read. Because of that, let's do it the way we did it in our programs. We will check for a successful file operation and return (not entering the sleep-wait mode, just return) if the file is not available. I'll show just the relevant parts of the code:
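For the curious, the lock-sleep-retry chain of events described above might be sketched like this. It is not the lecture's code, only an illustration under assumptions: `GuardedFile`, `write_lines`, `read_lines` and `run_reader_and_writer` are hypothetical names, each thread opens its own stream instead of sharing a global one, and the reader gives up after a fixed number of attempts rather than waiting forever.

```cpp
#include <chrono>
#include <fstream>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

struct GuardedFile                  // hypothetical: a file name plus its lock
{
    std::string name;
    std::mutex guard;               // the critical-section lock
    bool written;                   // protected by `guard`
    explicit GuardedFile(const std::string& n) : name(n), written(false) {}
};

void write_lines(GuardedFile& file, int count)
{
    std::lock_guard<std::mutex> lock(file.guard);    // lock the resource
    std::ofstream out(file.name.c_str());
    for (int i = 0; i < count; ++i) out << "line " << i << '\n';
    out.flush();
    file.written = out.good();      // check the resource, as always
}   // `out` closes first, then the lock is released

// Sets linesRead to the number of lines, or to -1 if the reader gave up.
void read_lines(GuardedFile& file, int& linesRead, int maxAttempts)
{
    linesRead = -1;
    for (int attempt = 0; attempt < maxAttempts; ++attempt)
    {
        if (file.guard.try_lock())  // resource free? rush to lock it
        {
            std::lock_guard<std::mutex> lock(file.guard, std::adopt_lock);
            if (file.written)
            {
                std::ifstream in(file.name.c_str());
                std::string line;
                linesRead = 0;
                while (std::getline(in, line)) ++linesRead;
                return;             // the lock is released on the way out
            }
        }
        // Locked, or nothing written yet: sleep, then check again.
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
}

int run_reader_and_writer(const std::string& name, int lines)
{
    GuardedFile file(name);
    int linesRead = 0;
    std::thread reader(read_lines, std::ref(file), std::ref(linesRead), 1000);
    std::thread writer(write_lines, std::ref(file), lines);
    writer.join();
    reader.join();
    return linesRead;
}
```

Even though the reader is launched first, it reads the complete file every time, because it never touches the file while the writer holds the lock.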

void myFileOp::operator()(const string &fname)
{
    myFile.open(fname.c_str(),
                (Mode == READ ? std::fstream::in : std::fstream::out));
    if (!myFile)
        Mode = INVALID;
}

void myFileReader::operator()(const string &fname)
{
    myFileOp::operator()(fname);
    if (Mode == INVALID)
    {
        cerr << "\nFile Reader: Cannot get file handle, returning.";
        return;
    }
    //continue with the function
}

void myFileWriter::operator()(const string &fname)
{
    myFileOp::operator()(fname);
    if (Mode == INVALID)
    {
        cerr << "\nFile Writer: Cannot get file handle, returning.";
        return;
    }
    //continue with the function
}

Here we test for the file in the base class, and if it is not available, we set the mode to INVALID. Then in the derived classes (reader and writer) we check the mode's status, and if it is INVALID, we just return from the function. Let's see how it runs now.

Again, a few times. First time:

Error is triggered in the file reader.

Second time:

And this time we see that the error check triggered the bail-out in the writer object, and that the program terminated properly.

As I used to say in my live classes:

– Any questions?

Do you see at least one other reason (beyond writing good, structured code, which is a pleasure to look at) why you want to write code that follows the best practices?

Alright, That's it. And I see, that I finished that at the bottom of the page. Again. And you say something about coincidence?