1.1 Requirements
1.2 Abstract
When you develop different types of software, sooner or later you will have to deal with client/server development. Writing comprehensive client/server code is a difficult task for a programmer. This documentation presents simple but powerful client/server source code that can be extended to any type of client/server application. The source code uses the advanced IOCP technology, which can efficiently serve multiple clients. IOCP offers an efficient solution to the "one-thread-per-client" bottleneck problem (among others), using only a few processing threads and asynchronous input/output send/receive. The IOCP technology is widely used for different types of high-performance servers, such as Apache. The source code also provides a set of functions that are frequently used when dealing with communication and client/server software, such as file receiving/transferring functions and logical thread pool handling. This article focuses on practical solutions to the problems that arise when using the IOCP programming API, and also presents overall documentation of the source code. Furthermore, a simple echo client/server which can handle multiple connections and file transfer is presented.

2.1 Introduction
This article presents a class which can be used in both the client and the server code. The class uses IOCP (Input Output Completion Ports) and asynchronous (non-blocking) function calls, which are explained later. The source code is based on many other source codes and articles [1, 2, 3]. With this simple source code, you can:
It is difficult to find comprehensive but simple source code to handle client/server communications. The source code found on the net is either too complex (20+ classes) or not efficient enough. This source code is designed to be as simple and well documented as possible. In this article, we briefly present the IOCP technology provided by the Winsock API 2.0, and also explain the thorny problems that arise while coding and the solution to each of them.

2.2 Introduction to asynchronous Input Output Completion Ports (IOCP)
A server application is fairly meaningless if it cannot service multiple clients at the same time; asynchronous I/O calls and multithreading are usually used for this purpose. By definition, an asynchronous I/O call returns immediately, leaving the I/O call pending. At some point in time, the result of the asynchronous I/O call must be synchronized with the main thread. This can be done in different ways. The synchronization can be performed by:
2.2.1 Why using IOCP?
By using IOCP, we can overcome the "one-thread-per-client" problem. It is commonly known that the performance of a one-thread-per-client design decreases heavily if the software does not run on a true multiprocessor machine. Threads are system resources that are neither unlimited nor cheap. IOCP provides a way to have a few (I/O worker) threads handle the input/output of multiple clients "fairly". The threads are suspended and do not use CPU cycles until there is something to do.

2.3 What is IOCP?
We have already stated that IOCP is nothing but a thread synchronization object, similar to a semaphore; therefore IOCP is not a sophisticated concept. An IOCP object is associated with several I/O objects that support pending asynchronous I/O calls. A thread that has access to an IOCP can be suspended until a pending asynchronous I/O call is finished.

3 How does IOCP work?
For more information on this part, refer to the other articles [1, 2, 3] (see References). While working with IOCP, you have to deal with three things: associating a socket with the completion port, making the asynchronous I/O call, and synchronizing with the thread. To get the result from the asynchronous I/O call, and to know, for example, which client has made the call, you have to pass two parameters: the completion key parameter and the OVERLAPPED parameter.

3.1 The completion key parameter
The first parameter is the completion key.

3.2 The OVERLAPPED parameter
This parameter is commonly used to pass the memory buffer that is used by the asynchronous I/O call. It is important to note that this data will be locked and is not paged out of physical memory. We will discuss this later.

3.3 Associating a socket with the completion port
Once a completion port is created, a socket can be associated with it by calling the function CreateIoCompletionPort, as the following function does:

    BOOL IOCPS::AssociateSocketWithCompletionPort(SOCKET socket,
        HANDLE hCompletionPort, DWORD dwCompletionKey)
    {
        // CreateIoCompletionPort takes the key as a ULONG_PTR; a DWORD key
        // cannot hold a pointer on 64-bit Windows.
        // The last argument is ignored when an existing port is passed in.
        HANDLE h = CreateIoCompletionPort((HANDLE) socket,
            hCompletionPort, dwCompletionKey, m_nIOWorkers);
        // On success, the handle of the existing completion port is returned.
        return h == hCompletionPort;
    }
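The association step can be pictured with a small portable model. The sketch below is not the Windows API and not the article's class: `ToyCompletionPort`, `associate`, `post` and `wait` are hypothetical names built only on the C++ standard library. It is meant to show, under those assumptions, how a key registered once per handle comes back with every completion, and how a worker thread sleeps until a completion packet is available.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <unordered_map>

// Hypothetical model of a completion port (NOT the Win32 API).
struct Completion {
    std::uintptr_t key;    // value registered when the handle was associated
    std::size_t    bytes;  // bytes transferred by the finished operation
};

class ToyCompletionPort {
public:
    // Models the association step: remember the key for this handle.
    // Returns false if the handle is already associated.
    bool associate(int handle, std::uintptr_t key) {
        std::lock_guard<std::mutex> guard(lock_);
        return keys_.emplace(handle, key).second;
    }

    // Called when an I/O operation on `handle` finishes: queues a completion
    // packet carrying the handle's key and wakes one waiting worker.
    bool post(int handle, std::size_t bytes) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            auto it = keys_.find(handle);
            if (it == keys_.end()) return false;  // handle was never associated
            queue_.push_back(Completion{it->second, bytes});
        }
        ready_.notify_one();
        return true;
    }

    // Called by a worker thread: blocks (without spinning) until a packet
    // is available, then returns it.
    Completion wait() {
        std::unique_lock<std::mutex> guard(lock_);
        ready_.wait(guard, [this] { return !queue_.empty(); });
        Completion c = queue_.front();
        queue_.pop_front();
        return c;
    }

private:
    std::mutex lock_;
    std::condition_variable ready_;
    std::unordered_map<int, std::uintptr_t> keys_;
    std::deque<Completion> queue_;
};
```

In the real API, the key is handed to CreateIoCompletionPort and comes back through the lpCompletionKey argument of GetQueuedCompletionStatus; the model only mirrors that round trip.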
3.4 Making the asynchronous I/O call
To make the actual asynchronous call, the functions

    BOOL bSuccess = PostQueuedCompletionStatus(m_hCompletionPort,
        pOverlapBuff->GetUsed(),
        (ULONG_PTR) pContext,   // completion key: pointer-sized, so not DWORD
        &pOverlapBuff->m_ol);

3.5 Synchronization with the thread
Synchronization with the I/O worker threads is done by calling the GetQueuedCompletionStatus function:

    BOOL GetQueuedCompletionStatus(
        HANDLE CompletionPort,       // handle to completion port
        LPDWORD lpNumberOfBytes,     // bytes transferred
        PULONG_PTR lpCompletionKey,  // file completion key
        LPOVERLAPPED *lpOverlapped,  // buffer
        DWORD dwMilliseconds         // optional timeout value
    );

3.6 Four thorny IOCP coding hassles and their solutions
Some of the problems that arise while using IOCP are not intuitive. In a multithreaded scenario using IOCPs, the control flow of a thread function is not straightforward, because there is no fixed relationship between threads and communications. In this section, we present four different problems that can occur while developing client/server applications using IOCPs. They are: the WSAENOBUFS error problem, the package reordering problem, the asynchronous pending reads and byte chunk package processing problem, and the access violation problem.
3.6.1 The WSAENOBUFS error problem
This problem is non-intuitive and difficult to detect, because at first sight it looks like a normal deadlock or a memory leak. Assume that you have developed your server and everything runs fine. When you stress test the server, it suddenly hangs. If you are lucky, you can find out that it has something to do with the WSAENOBUFS error.

With every overlapped send or receive operation, it is possible that the data buffer submitted will be locked. When memory is locked, it cannot be paged out of physical memory. The operating system imposes a limit on the amount of memory that can be locked. When this limit is reached, the overlapped operations will fail with the WSAENOBUFS error.

If a server posts many overlapped receives on each connection, this limit will be reached as the number of connections grows. If a server anticipates handling a very high number of concurrent clients, it can post a single zero-byte receive on each connection. Because there is no buffer associated with the receive operation, no memory needs to be locked. With this approach, the per-socket receive buffer should be left intact, because once the zero-byte receive operation completes, the server can simply perform a non-blocking receive to retrieve all the data buffered in the socket's receive buffer. There is no more data pending when the non-blocking receive fails with WSAEWOULDBLOCK.

A simple practical solution to the

3.6.2 The package reordering problem
This problem is also discussed in [3]. Although committed operations using the I/O completion port will always be completed in the order they were submitted, thread scheduling issues may mean that the actual work associated with the completion is processed in an undefined order. For example, if you have two I/O worker threads and you should receive "byte chunk 1, byte chunk 2, byte chunk 3", you may process the byte chunks in the wrong order, namely "byte chunk 2, byte chunk 1, byte chunk 3".
This also means that when you send data by posting a send request on the I/O completion port, the data can actually be sent in a different order. This can be solved by using only one worker thread, committing only one I/O call at a time and waiting for it to finish, but if we do this we lose all the benefits of IOCP. A simple practical solution to this problem is to add a sequence number to our buffer class and to process the data in a buffer only if the buffer's sequence number is the next one in order. This means that buffers with out-of-order numbers have to be saved for later use, and for performance reasons we save these buffers in a hash map object. For more information about this solution, please go through the source code.
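The sequence-number idea described in this section can be sketched in portable C++. This is a minimal illustration, not the article's buffer class: `SequenceReorderer`, `submit` and `parkedCount` are hypothetical names, sequence numbers are assumed to start at 0, and the class is not thread-safe on its own, so a caller would have to hold the appropriate lock around each call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: releases buffers strictly in sequence-number order.
class SequenceReorderer {
public:
    // Accepts a buffer tagged with its sequence number and returns every
    // buffer that can now be processed, in order. Buffers that arrive too
    // early are parked in a hash map until the missing numbers show up.
    std::vector<std::string> submit(std::uint32_t sequence, std::string data) {
        std::vector<std::string> ready;
        if (sequence < next_) return ready;           // already processed: ignore
        parked_.emplace(sequence, std::move(data));   // keeps the first copy of a duplicate

        // Drain every consecutive buffer starting at the expected number.
        for (auto it = parked_.find(next_); it != parked_.end();
             it = parked_.find(next_)) {
            ready.push_back(std::move(it->second));
            parked_.erase(it);
            ++next_;
        }
        return ready;
    }

    // Number of buffers currently waiting for an earlier one.
    std::size_t parkedCount() const { return parked_.size(); }

private:
    std::uint32_t next_ = 0;  // sequence number expected next
    std::unordered_map<std::uint32_t, std::string> parked_;
};
```

With the example from the text, submitting "byte chunk 2" first returns nothing and parks it; submitting "byte chunk 1" afterwards releases both chunks in the right order.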
3.6.3 Asynchronous pending reads and byte chunk package processing problem
The most common server protocol is a packet-based protocol where the first X bytes represent a header, and the header contains the length of the complete packet. The server can read the header, work out how much more data is required, and keep reading until it has a complete packet. This works fine when the server makes one asynchronous read call at a time. But if we want to use the IOCP server's full potential, we should have several pending asynchronous reads waiting for data to arrive. This means that several asynchronous reads complete out of order (as discussed in section 3.6.2), and the byte chunk streams returned by the pending reads will not be processed in order. Furthermore, a byte chunk stream can contain one or several packages, and also partial packages, as shown in figure 1.

Figure 1. The figure shows how partial packages (green) and complete packages (yellow) can arrive asynchronously in different byte chunk streams (marked 1, 2, 3).

This means that we have to process the byte stream chunks in order to successfully read a complete package. Furthermore, we have to handle partial packages (marked green in figure 1). This makes the byte chunk package processing more difficult. The full solution to this problem can be found in the source code.

3.6.4 The access violation problem
This is a minor problem, and it is a result of the design of the code rather than an IOCP-specific problem. Suppose that a client connection is lost and an I/O call returns with an error flag; then we know that the client is gone. In the parameter

The solution to this problem is to add a number to the structures that contain the number of pending I/O calls (

3.7 The overview of the source code
The goal of the source code is to provide a set of simple classes that handle all the troublesome code that has to do with IOCP.
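One example of the hassle such classes take care of is the package reassembly described in section 3.6.3. The following is a minimal portable sketch, not the article's implementation. It assumes a made-up wire format (a 4-byte little-endian payload length followed by the payload) and that chunks are fed in sequence order; `PackageAssembler`, `feed`, `bufferedBytes` and `makePackage` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical reassembler. ASSUMPTION: each package is a 4-byte
// little-endian payload length followed by the payload itself.
class PackageAssembler {
public:
    static constexpr std::size_t kHeaderSize = 4;

    // Feeds one byte chunk (already in sequence order) and returns every
    // complete payload it finishes. Bytes of a partial package stay
    // buffered until a later chunk completes it.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> packages;
        std::size_t offset = 0;
        while (pending_.size() - offset >= kHeaderSize) {
            // Decode the little-endian length stored in the header.
            std::uint32_t length = 0;
            for (std::size_t i = 0; i < kHeaderSize; ++i) {
                const unsigned char byte =
                    static_cast<unsigned char>(pending_[offset + i]);
                length |= static_cast<std::uint32_t>(byte) << (8 * i);
            }
            // Stop if the payload has not fully arrived yet.
            if (pending_.size() - offset - kHeaderSize < length) break;
            packages.push_back(pending_.substr(offset + kHeaderSize, length));
            offset += kHeaderSize + length;
        }
        pending_.erase(0, offset);  // drop everything that was consumed
        return packages;
    }

    std::size_t bufferedBytes() const { return pending_.size(); }

private:
    std::string pending_;  // unconsumed bytes, possibly a partial package
};

// Hypothetical helper that builds one package in the assumed format.
inline std::string makePackage(const std::string& payload) {
    std::string out;
    const std::uint32_t length = static_cast<std::uint32_t>(payload.size());
    for (std::size_t i = 0; i < PackageAssembler::kHeaderSize; ++i)
        out.push_back(static_cast<char>((length >> (8 * i)) & 0xFF));
    return out + payload;
}
```

A chunk that ends in the middle of a package simply leaves its tail in the internal buffer, which corresponds to the green partial packages in figure 1.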
The source code also provides a set of functions which are frequently used when dealing with communication and client/server software, such as file receiving/transferring functions, logical thread pool handling, etc.

Figure 2. The figure above illustrates the overview of the IOCP class source code functionality.

We have several I/O worker threads that handle asynchronous I/O calls through the completion port (IOCP); these workers call some

Figure 3. The figure above shows the class overview.

The classes that can be observed in figure 3 are:
3.7.1 The buffer design – The CIOCPBuffer class
When using asynchronous I/O calls, we have to provide a private buffer to be used with the I/O operation. There are some considerations to take into account when we allocate buffers to use:
All the solutions to the problems discussed above exist in the CIOCPBuffer class.

3.8 How to use the source code?
By inheriting your own class from

3.8.1 Starting and closing the server/client
To start the server, call the function:

    BOOL Start(int nPort=999, int iMaxNumConnections=1201,
        int iMaxIOWorkers=1, int nOfWorkers=1,
        int iMaxNumberOfFreeBuffer=0,
        int iMaxNumberOfFreeContext=0,
        BOOL bOrderedSend=TRUE,
        BOOL bOrderedRead=TRUE,
        int iNumberOfPendlingReads=4);

To connect to a remote connection (client mode), call the function:

    Connect(const CString &strIPAddr, int nPort)

To close the server, call the function ShutDown(). For example:

    MyIOCP m_iocp;
    if(!m_iocp.Start(-1,1210,2,1,0,0))
        AfxMessageBox("Error could not start the Client");
    ….
    m_iocp.ShutDown();

4 Source code description
For more details about the source code, please check the comments in the source code.

4.1.1 Virtual functions
4.1.2 Important variables
Note that every shared variable must be exclusively locked by the function that uses it; this is important to avoid access violations and overlapping writes. Every variable named XXX that needs to be locked must have a corresponding XXXLock variable.
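The XXX/XXXLock rule, together with the pending I/O counter from section 3.6.4, can be illustrated with a small portable sketch. It is an illustration under assumptions, not the article's code: `ToyClientContext`, `enterIo`, `exitIo`, `markDisconnected` and `canRelease` are hypothetical names, and `std::mutex` stands in for whatever lock type the real source uses.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical context that follows the XXX / XXXLock naming rule.
struct ToyClientContext {
    int        pendingIo = 0;         // number of I/O calls still in flight
    bool       disconnected = false;  // set once the client is known to be gone
    std::mutex pendingIoLock;         // must be held for every access to the two fields above
};

// Call before issuing an asynchronous I/O call on the context.
inline void enterIo(ToyClientContext& ctx) {
    std::lock_guard<std::mutex> guard(ctx.pendingIoLock);
    ++ctx.pendingIo;
}

// Call when an I/O call completes, successfully or with an error.
// Returns the number of calls that are still pending.
inline int exitIo(ToyClientContext& ctx) {
    std::lock_guard<std::mutex> guard(ctx.pendingIoLock);
    assert(ctx.pendingIo > 0);
    return --ctx.pendingIo;
}

// Records that the connection was lost.
inline void markDisconnected(ToyClientContext& ctx) {
    std::lock_guard<std::mutex> guard(ctx.pendingIoLock);
    ctx.disconnected = true;
}

// The context may only be freed when the client is gone AND no pending
// I/O call can still complete and touch it.
inline bool canRelease(ToyClientContext& ctx) {
    std::lock_guard<std::mutex> guard(ctx.pendingIoLock);
    return ctx.disconnected && ctx.pendingIo == 0;
}
```

Freeing the context only after `canRelease` reports true is what prevents a late completion from dereferencing memory that has already been released.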
4.1.3 Important functions
5 File transfer
The file transfer is done by using the Winsock 2.0
The file transfer is performed in this order. The server initializes the file transfer by calling the

6 The source code example
The provided source code example is an echo client/server that also supports file transmission (figure 4). In the source code, a class

The most important part of the client or server code is the NotifyReceivedPackage function:

    void MyIOCP::NotifyReceivedPackage(CIOCPBuffer *pOverlapBuff,
        int nSize, ClientContext *pContext)
    {
        // The package type decides which handler processes the package.
        BYTE PackageType = pOverlapBuff->GetPackageType();
        switch (PackageType)
        {
        case Job_SendText2Client:
            Packagetext(pOverlapBuff, nSize, pContext);
            break;
        case Job_SendFileInfo:
            PackageFileTransfer(pOverlapBuff, nSize, pContext);
            break;
        case Job_StartFileTransfer:
            PackageStartFileTransfer(pOverlapBuff, nSize, pContext);
            break;
        case Job_AbortFileTransfer:
            DisableSendFile(pContext);
            break;
        }
    }

The function handles an incoming message and performs the request sent by the remote connection. In this case, it is only a matter of a simple echo or a file transfer. The source code is divided into two projects, IOCP and IOCPClient, which are the server and the client side of the connection.

6.1 Compiler issues
When compiling with VC++ 6.0 or .NET, you may get some strange errors dealing with the

    if (pContext->m_File.m_hFile != INVALID_HANDLE_VALUE) <- error C2446: '!=' : no conversion from 'void *' to 'unsigned int'

This problem can be solved by updating the header files (*.h) or your VC++ 6.0 version, or simply by fixing the type conversion that causes the error. After some modifications, the client/server source code can be used without MFC.

7 Special considerations & rules of thumb
When you use this code in other types of applications, there are some programming traps related to this source code that can be avoided. Nondeterministic errors are errors that occur stochastically ("randomly"), and it is hard to reproduce them by performing the same sequence of tasks that triggered the error.
These are the worst types of errors that exist, and usually they occur because of errors in the core design of the source code. When the server is running multiple I/O worker threads serving the connected clients, nondeterministic errors such as access violations can occur if the programmer has not designed the source code for a multithreaded environment.

Rule of thumb #1: Never read/write to the client context (e.g.,
Also be aware that while you hold a context lock, other threads or the GUI may be waiting for it.

Rule of thumb #2: Avoid, or use with special care, code that has complicated "context locks" or other types of locks inside a "context lock", because this may lead to a deadlock (e.g., A waiting for B, which is waiting for C, which is waiting for A).

    pContext->m_ContextLock.Lock();
    //… code code ..
    pContext2->m_ContextLock.Lock();
    // code code..
    pContext2->m_ContextLock.Unlock();
    // code code..
    pContext->m_ContextLock.Unlock();

The code above may cause a deadlock.

Rule of thumb #3: Never access a client context outside the notification functions (e.g.,

    ClientContext* pContext = NULL;
    m_ContextMapLock.Lock();
    pContext = FindClient(ClientID);
    // Safe to access pContext here if it is not NULL and is locked (rule of thumb #1).
    //code .. code..
    m_ContextMapLock.Unlock();
    // Here pContext can suddenly disappear because of a disconnect.
    // Do not access pContext members here.

8 Future work
In the future, the source code will be updated to use the

9 F.A.Q
Q1: The amount of memory used by the server program rises steadily with client connections. Looking in Windows Task Manager, the amount of memory used does not decrease even when clients disconnect. What's the problem?
A1: The code tries to reuse the allocated buffers instead of releasing and reallocating them. You can change this by altering the parameters iMaxNumberOfFreeBuffer and iMaxNumberOfFreeContext (see section 3.8.1).

Q2: I get compilation errors under .NET: "error C2446: '!=' : no conversion from 'unsigned int' to 'HANDLE'" etc. What is the problem?
A2: This is because of the different header versions of the SDK. Just change the conversion to

Q3: Can the source code be used without MFC? Pure Win32 and in a service?
A3: The code was developed to be used with a GUI for a short time (not days or years). I developed this client/server solution for use with GUIs in an MFC environment. Of course, you can use it for normal server solutions. Many people have.
Just remove the MFC specific stuff as

Q4: Excellent work! Thank you for this. When will you implement
A4: As soon as the code is stable. It is quite stable right now, but I know that the combination of several I/O workers and several pending reads may cause some problems. I am glad that you like my code. Please vote!

Q5: Why start several I/O workers? Is this necessary if you don't have a true multiprocessor computer?
A5: No, it is not necessary to have several I/O workers. Just one thread can handle all the connections. On common home computers, one I/O worker gives the best performance, and you do not need to worry about possible access violations either. But computers are getting more powerful each day (e.g., hyper-threading, dual core, etc.), so why not have the possibility to use several threads? :=)

10 References
11 Revision History
About spinoza