Network Working Group                                  L. Kleinrock  (UCLA-NMC)
Request for Comments:  626                             H. Opderbeck  (UCLA-NMC)
NIC #22161                                                             Mar 1974

                "On a Possible Lockup Condition in the
                 IMP Subnet Due to Message Sequencing"

Lockup or deadlock conditions are one of the most serious system malfunctions
that can occur in a computer system or network.  Communication protocols have 
to be designed very carefully to avoid the occurrence of these lockups.  Their
common characteristic is that they occur only under unusual circumstances which
were not foreseen or deemed too unlikely to occur by the protocol designers.  
(However, these designers often are not the ones in a position to evaluate
such likelihoods quantitatively.)

The best known lockup that has occurred in the ARPANET is the reassembly
lockup [1].  The store-and-forward lockup, also described in Reference 1, has
been avoided in the new IMPSYS by carefully observing Kahn's heuristics [1].  
The last lockup in the subnet we know of occurred on December 21, 1973 
(Christmas lockup").  This dormant lockup conditions was brought to light
by collecting snapshot measurement messages from all sites simultaneously.
The Christmas lockup happened when snapshot messages arrived at ther UCLA IMP
which had allocated reassembly storage for them and no reassembly blocks were
free.  (A reassembly block is a piece of storage used in the actual process
of reassembling packets back into messages) [2]. 

Deadlock conditions have not only been observed in the subnet but also in
higher level protocols.  The original design of the ICP, for example, had a
flaw that was discovered only after a few months of use [3,4].  More recently
BBN reported a deadlock problem involving the exchange of HOST status
information by the RSEXEC server (RSSER) programs [5].

As long as it is not possible to design practical communication protocols
which guarantee deadlock-free operation it is vital to continually check those
protocols that are currently in use for any such failures - even if they appear
"very unlikely" to occur.  In this RFC we comment on a possible deadlock
condition in the IMP subnet which, to our knowledge, has not yet occurred,
neither has it been identified.  Though we have never seen this problem 
actually happen it may occur in the future if no precautions are taken.  This
possible lockup condition is due to the sequencing of messages in the subset.

To avoid the occurrence of reassembly lockup, the flow control mechanism in
the subnet was modified in some significant ways.  Currently there is a limit
of four messages that can simultaneously be in transmission between any pair of
source and destination IMPs.  As a result of removing the link-handling from
the old IMPSYS, it became necessary to introduce a message sequencing 
mechanism.

                                   -1-

Network Working Group				
Request for Comments:  626

Messages leave the destination IMP in the same order as they entered the source
IMP.  (Note that the sequencing is done on an IMP-to-IMP basis, not a HOST-to-
Host basis.  This may introduce undesirable "sequencing delay" if two HOSTs 
attached to the same destination IMP receive messages from the same source 
IMP).

Sequencing of messages has, in general, the potential of introducing deadlock
conditions.  The reason for this is that any message, say message (n+1), which
is out of order (and therefore cannot be delivered to its destination HOST)
may use up resources that are required by message (n) which must be delivered
next.  Therefore, message (n) cannot reach its destination IMP which, in turn,
prevents the other messages (n+1), etc) that are out of order from being 
delivered to their destination HOST(s).  For this reason one has to be very
careful not to allocate too many resources (e.g., buffers) to messages that
are out of order.

To avoid lockup conditions the current IMPSYS takes the following two
precautions:

	1.  Requests for buffer allocation are always services in order
	    of message number; i.e., no 'ALLOCATE' is returned for 
	    message (n+1) if message (n) (or a request for buffer
	    allocation for message (n)) has not yet been received and
	    serviced.

	2.  Single packet messages (regular and priority) that arrive
	    at the destination IMP out of order are not accepted unless
	    they were retransmitted in response to a previous buffer
	    allocation.  These messages are treated rather as a request 
	    for the allocation of one buffer (according to 1 above) and
	    then discarded.

With these two precautions a deadlock condition appears to be impossible to
occur.  However, there is a second buffer allocation mechanism that is not
tried to the message sequencing, namely the 'ALLOCATE' that is piggy-backed
on the RFNM for a multiple-packet message.  The piggy-backed ALLOCATE
represents a buffer allocation for the next multiple-packet message, and NOT
for the next message in sequence.  Thus, if the next message in sequence is
a single-packet message, the piggy-backed ALLOCATE in effect allocates
buffers to a message that is out of order.

Let us see how this situation can lead to a deadlock condition.  Assume there
is a maximum number of 24 reassembly buffers in each IMP.

                                   -2-

Network Working Group
Request for Comments:  626

Let IMPs A, B, and C continually transmit 8-packet messages to the same
destination IMP D such that all 24 reassembly buffers in IMP D are used up by
this transmission of multiple packet messages.  If now, in the stream of
8-packet messages, IMP A sends a single-packet message it will generally not
be accepted by destination IMP D since there is no reassembly buffer space
available.  (There may be a free reassembly buffer if the single-packet message
happens to arrive during the time one of the three 8-packet messages is being
transmitted to its HOST).  The single-packet message will therefore be treated
as a request for buffer allocation.  This request will not be serviced before
the RFNM of the previous multiple-packet message is sent.  At this time, 
however, all the free reassembly buffers have already been allocated to the
next multiple-packet message via the piggy-backed ALLOCATE mechanism.  The
only chance for the single-packet message to get its allocation request
satisfied is to grab a reassembly buffer from one of the other two 8-packet
messages.  This attempt may be unsuccessful because it depends on the timing
of events in the IMP. A deadlock condition can occur if IMP B and IMP C
also send a single-packet message in their stream of 8-packet measures which
cannot be serviced for the same reason.  In this case, the three 8-packet
messages that will arrive later at IMP D cannot be delivered to their
destination HOST(s) because they are out of order.  The three single-packet 
messages that should be delivered next, however, will never reach the
destination IMP since there is no reassembly space available.

A possible sequence of event that leads to a deadlock condition can be
described as follows:

  Examples for Notation:

	event:  A8 arrives  ->		all 8 packets of the 8-packet message
					from IMP A have arrived at IMP D

	event:  C1 arrives  ->		a single packet message from IMP C has
					arrived at IMP D (and is treated as a
					request for buffer allocation)

	event:  B8 complete ->		the last packet of the 8-packet
					message from IMP B has been received
					by its destination HOST

	event:  A8 RFNM/ALL ->		a RFNM with the piggy-backed ALLOCATE
					is sent to IMP A

                                   -3-

Network Working Group
Request for Comments:  626

		    # of Allocated	# of Reassembly		# of Free
		     Reassembly		   Buffers		Reassembly
		      Buffers              in Use		 Buffers

     Initially		24		      0			    0
 1.  A8 arrives		16		      8			    0
 2.  B8 arrives		 8		     16			    0
 3.  C8 arrives		 0		     24			    0
 4.  Al arrives		 0		     24			    0
 5.  B1	arrives		 0		     24			    0
 6.  C1 arrives		 0		     24			    0
 7.  A8 complete	 0		     16			    8
 8.  B8 complete	 0		      8			   16
 9.  C8 complete	 0		      0			   24
10.  A8 RFNM/ALL	 8		      0			   16
11.  B8 RFNM/ALL	16		      0			    8
12.  C8 RFNM/ALL	24		      0			    0
13.  A8 arrives		16		      8			    0
14.  B8 arrives		 8		     16			    0
15.  C8 arrives		 0		     24			    0
16.  - deadlock -

Note that an ALLOCATE for one of the single-packet messages A1, B1 and C1 can
only be returned to source IMP A, B, and C, respectively, after the RFNM
(with its piggy-backed ALLOCATE) for the previous 8-packet message has been
sent.  If these RFNMs are sent in sequence, i.e., without an ALLOCATE for
one of the single-packet messages in between, the temporarily freed reassembly
storage (events (7) through (9) is implicitly allocated to the next multiple-
packet messages (events (10) through (12) and not to any of the single-packet
messages.  The next 8-packet messages are, however, out of order and
cannot be delivered to their destination HOST(s).

Right now it looks as if such a lockup can only occur if the number of
reassembly buffers is a multiple of eight.  Indeed, the probability of a 
lockup in this latter case is much higher.  However, deadlocks can also occur
if the number of reassembly buffers is not a multiple of eight.

Let us assume there are 26 instead of 24 reassembly buffers.  Assume also that,
due to alternate paths, line failure, or retransmission, the second packet of
a 2-packet message arrives at IMP D before a single-packet message from the 
same source IMP A.  The single-packet message has a smaller sequence number 
and must therefore be delivered to its destination HOST before the 2-packet
message.  When the second packet of the 2-packet message arrives at IMP D the
IMP realizes that only 2 of the allocated 8 buffers will be needed.  Therefore

                                   -4-

Network Working Group
Request for Comments:  626

6 buffers are returned to the pool of free reassembly buffers.  If there were
26-3x8=2 buffers in the pool before, the pool size is increased by 6 to 8
buffers.  These 8 buffers, however, are just enough to send out another
piggy-backed ALLOCATE.  The single-packet message will therefore find the pool
of free reassembly buffers empty although the total number of reassembly
buffers is not a multiple of eight.  The 2-packet message cannot be delivered
to its destination HOST because it is out of order.  Therefore the deadlock
condition we described before may occur again.

We agree that the above mentioned sequence of events is unlikely to occur
(otherwise one would have observed it already).  This is particularly true
since the current maximum number of reassembly buffers (58) is much larger
than 24.  Judging from past experience with computer systems and networks,
however, we know that even very unlikely events have a tendency to occur in the
long run.  Also, the probability of this deadlock condition increases with 
increasing traffic in the net.  Therefore, it is certainly worthwhile to
modify the IMPSYS in such a way that this deadlock cannot occur.  It turns out
that a minor modification already achieves the desired effect.  Recall that 
that described deadlock can only occur because single- and multiple-packet 
messages use the same pool of reassembly buffers.  If we set aside a single
reassembly buffer (or one for each destination HOST) that can be used only by 
single-packet messages this lockup condition due to message sequencing cannot
occur.

Another solution to this problem is, of course, to more the message sequencing
from the IMP to the HOST level.  This does not mean that similar lockup
problems cannot occur on the HOST level.  Also, it is currently much easier
to resolve this problem by a slight modification of the IMPSYS rather than
having to coordinate all the various HOSTs.  Another alternative is to
discontinue the use of multiple-packet messages.  However, this also involves
much more drastic changes which require careful consideration.

                                   -5-

Network Working Group
Request for Comments:  626

                              REFERENCES

1.  Kahn, R. E. and W. R. Crowther, "Flow Control in a Resource-Sharing
    Computer Network," IEEE Transactions on Communication, Volume COM-20,3
    June 1972.

2.  BBN Report No. 2717, "Interface Message Processors for the ARPA Computer
    Network," Quarterly Technical Report No. 4, January 1974.

3.  Naylor, W., J. Wong, C. Kline, and J. Postel, "Regarding Proffered
    Official ICP," RFC 143, NIC 6728.

4.  Postel, J. B. "A Graph Model Analysis of Computer Communications
    Protocols," PhD dissertation, University of California, Los Angeles.

5.  BBN Report No. 2670.  "Natural Communication with Computers IV,"  Quarterly
    Progress Report No. 12, October 1973.

                                   -6-