The 'Security Digest' Archives (TM)

ARCHIVE: TCP-IP Distribution List - Archives (1986)
DOCUMENT: TCP-IP Distribution List for December 1986 (186 messages, 106384 bytes)
SOURCE: http://securitydigest.org/exec/display?f=tcp-ip/archive/1986/12.txt&t=text/plain
NOTICE: securitydigest.org recognises the rights of all third-party works.

START OF DOCUMENT

-----------[000000][next][prev][last][first]----------------------------------------------------
Date:      Mon, 1 Dec 86 21:25:58 pst
From:      David Cheriton <cheriton@pescadero.stanford.edu>
To:        tcp-ip@sri-nic.ARPA
Subject:   VMTP Checksum Discussion
Dear TCP-IPers:
  I have been working on a request-response transport protocol (VMTP) for
possible adoption in the Internet.  This work is being done as part of my
participation in the IAB End-to-end task force, chaired by Bob Braden.
  Bob has asked me to include this mailing list in on some discussion of
design choices for checksums. Here is my side of the story.

                  VMTP Checksum Controversy

There appear to be three basic issues re the VMTP checksum, or more generally
a checksum for the next generation of transport protocols, namely:
1) What size should the checksum field be?
2) What algorithm should be used?
3) Where in the packet should the checksum be located?
I spent some time talking with John Gill and Marty Hellman of the Stanford
Information Systems Lab, as well as thinking about it and doing some
experimentation.
(I also got some good feedback from others in my group, including Steve
Deering, Michael Stumm, and Ross Finlayson.)

1) Checksum Size

The checksum is a probabilistic device, clearly.  There are an enormous
number of packets that map to the same checksum for either 16 or 32 bits of
checksum.
Let R be the packet rate (on average) in packets per second.
Let P be the probability, per packet, that a packet is corrupted in a way
that goes undetected by the lower levels.
As long as P is non-zero, there is some time range T at which
T*R*P becomes quite large, i.e. we have to expect some erroneous packets
to be delivered by the IP level. I'll refer to these as nasty packets.
If we view the corruption as randomly assigning the packet a new checksum
from the space of possible checksums, it is clear that, with a 16-bit
checksum, there is a 1 in 65,536 chance that the corrupted packet will have
the same checksum.  With a 32-bit checksum, it is one in 4 billion.
The key ingredient we expect to change dramatically over the next 5 years
is the packet rate for channels, switching nodes and gateways.
It is reasonable to expect a significant increase in traffic and in the number
of gateways, as well as in the packet rate per gateway. Just to play with some
numbers, let us assume that by 1991 there are 500 gateways, each handling 200
packets per second, with a probability of 10**-7 that a packet has errors
undetected by the lower levels (i.e. the combination of all hardware and
software from one endpoint to another). That means there are 36 packets per
hour with errors undetected at the lower levels.
Assuming random mapping onto checksums, with a 16-bit checksum one might
expect a packet whose error is not detected by the transport-level checksum
every 75 days or so.  Maybe this is OK?  However, suppose these numbers are
low?  For instance, we occasionally get bursts of packets at 2000/sec on
our Ethernet.  The future parameters and use of communication could easily
push the time between undetected errors down by an order of magnitude
(7 days?)
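
As a rough check of the arithmetic above, here is a minimal Python sketch.
It assumes only the figures already given (500 gateways, 200 packets per
second each, P = 10**-7) and the random-mapping model; the function and
variable names are illustrative, not part of any VMTP code.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def days_between_undetected(gateways, pkts_per_sec, p_nasty, checksum_bits):
    """Expected days between nasty packets that also pass the checksum."""
    nasty_per_hour = gateways * pkts_per_sec * p_nasty * SECONDS_PER_HOUR
    # Random-mapping assumption: a nasty packet keeps a valid checksum
    # with probability 2**-checksum_bits.
    escapes_per_hour = nasty_per_hour / 2 ** checksum_bits
    return 1 / escapes_per_hour / HOURS_PER_DAY


nasty_per_hour = 500 * 200 * 1e-7 * SECONDS_PER_HOUR
days_16 = days_between_undetected(500, 200, 1e-7, 16)
days_32 = days_between_undetected(500, 200, 1e-7, 32)

print(f"nasty packets per hour: {nasty_per_hour:.0f}")
print(f"16-bit checksum: one escape every {days_16:.0f} days")
print(f"32-bit checksum: one escape every {days_32 / 365:.0f} years")
```

Raising the packet rate by a factor of ten in this model cuts the 16-bit
interval by the same factor, which is where the "7 days?" guess comes from.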

Of course, one could argue that such a packet may still fail the addressing
checks, the sequence number checks, etc. in the protocol.  However,
our measurements of VMTP indicate that 1/2 of the packets are minimal size and
1/4 are maximal size (for the Ethernet at least).  With small packets,
the probability that a random modification hits data rather than header is .5,
whereas for large packets it is .95.  Thus, this argument reduces the
risk by far less than a factor of two. Moreover, it is disquieting to fall
back on these random checks to help us when we expect the "real" error
detection mechanism to fail.

One might also argue that my assumed error rate per packet is too high.
However, the probability of correct operation is the PRODUCT of
the probabilities for the sequence of components involved in a packet
transmission and delivery - so this error rate could get large fast.
Moreover, one requires a more careful analysis to get a tighter bound.
Can anyone offer a significant improvement in bounds without making
some disconcerting assumptions??

In the same breath, one could argue that my analysis is far too optimistic.
For instance, if the checksum algorithm uniformly maps all possible
packets onto all possible checksums, then the fact that normal packets
may be dramatically skewed to particular types, such as ASCII text
packets to and from an important server, may mean that far fewer
corruptions will be detected.  In particular, a systematic hardware error
that corrupts a packet into a nasty packet (with correct checksum) is
far more likely to encounter the same packet (or similar enough)
than if random packets were being generated.

This "analysis" deals with pleasant, sunny day expected behavior.
A more frightening situation is dealing with burst errors from failing
components.  For instance, suppose a gateway or network interface fails
by curdling random portions of packets.  We could see 200 undetected
(below transport level) errors per second. Again, assuming random
mods, with a 16-bit checksum we can expect to get a corrupted packet
that passes the transport-level checksum every 5.5 minutes,
versus about every 8.2 months (roughly 249 days) with a 32-bit checksum.
If we consider a military situation where components may get wounded in action,
it is easy to conceive of several things generating garbage at the worst
possible time, giving a very high exposure with a 16-bit checksum.
The 32-bit checksum appears to give a needed margin of safety,
especially since a variety of factors could cause one to effectively lose a
factor of 2 or more.  (As John Gill asked, are you sure 32 bits is enough?!)
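
The burst scenario can be worked the same way. The sketch below assumes
200 nasty packets per second from one failing component and the same
random-mapping model; the 64-bit row is only there to put John Gill's
question in numbers. Under these assumptions the 16-bit interval comes out
near 5.5 minutes and the 32-bit interval near 249 days (about 8.2 months).

```python
SECONDS_PER_DAY = 86400


def seconds_between_escapes(errors_per_sec, checksum_bits):
    """Expected seconds between corrupted packets that pass the checksum."""
    return 2 ** checksum_bits / errors_per_sec


for bits in (16, 32, 64):
    secs = seconds_between_escapes(200, bits)
    print(f"{bits}-bit: {secs / 60:.1f} minutes = "
          f"{secs / SECONDS_PER_DAY:.1f} days")
```

Losing "a factor of 2 or more" simply halves each of these intervals, which
matters far more at 16 bits than at 32.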

Overall, after studying this issue in brief and thinking thru the above,
I am even more convinced we need to get to at least a 32-bit checksum.
The increased expected packet rate is the key thing that is making 16-bit
checksums unacceptable.

2) Checksum algorithm

There appears to be a deficiency of theory behind checksums that are suitable
for software implementation.  Calculating 32-bit CRCs is infeasible - it is
either too slow or requires an enormous lookup table.  The ideal algorithm:
i) is very fast to calculate and check in software,
ii) can be shown to detect common types of packet corruption including,
iii) can be shown to be unupdateable "thru" an encryption scheme,
  i.e. an intruder cannot update the (encrypted) checksum field to compensate
  for modifications to other encrypted data.

Lots of schemes can be proposed but the more complex, the more difficult it is
to analyze with respect to (ii) and (iii).
The current algorithm, assuming we extend it to a 32-bit version,
does not detect appending or inserting 0's in a packet
or the reordering of 16-bit (or 32-bit) units. These seem like not unlikely
failures in a large internetwork with gateways, different byte orders,
flaky hardware, etc.
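
To make the weakness concrete, here is a small Python sketch of the 16-bit
one's complement sum in the style of the TCP checksum. The packet contents
are made-up values for illustration. Because addition is commutative and
zero is its identity, reordered words and inserted zero words give the same
result.

```python
def ones_complement_sum16(words):
    """Sum 16-bit words with end-around carry (one's complement addition)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def tcp_style_checksum(words):
    """Complement of the one's complement sum, as stored in the header."""
    return ~ones_complement_sum16(words) & 0xFFFF


packet = [0x4500, 0x0054, 0xBEEF, 0x1234]           # illustrative data only
reordered = [0xBEEF, 0x4500, 0x1234, 0x0054]         # same words, new order
padded = packet[:2] + [0x0000, 0x0000] + packet[2:]  # zeros inserted

print(hex(tcp_style_checksum(packet)))
print(tcp_style_checksum(packet) == tcp_style_checksum(reordered))
print(tcp_style_checksum(packet) == tcp_style_checksum(padded))
```

Both comparisons print True, and widening the words to 32 bits does not
change that.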

Another candidate is "add-and-rotate" as used by RDP, and suggested
(without guarantee) by John Gill. It also has the property that a modification
to any single 32-bit word cannot leave the calculated checksum unchanged (also
true of the TCP algorithm).
The "rotate" provides some detection of zeros and reordering.  However,
clearly there are occasions on which it will fail to detect reordering of
blocks of 32 long words, or insertion of blocks of 32 long words of zeros.
(It also requires some assembly language for efficient implementation, since
most languages have no rotate operator.) Other properties are not clear to me.
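
A minimal sketch of one possible add-and-rotate follows. The details are
assumptions made for illustration (rotate left by one bit, then add modulo
2**32); this is not taken from the RDP specification. It shows the rotate
catching a swap and a single inserted zero word, and missing a block of
exactly 32 zero words, since 32 one-bit rotations of a 32-bit value bring
it back to where it started.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n=1):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def add_and_rotate(words):
    """Hypothetical variant: rotate the running sum, then add the word."""
    total = 0
    for w in words:
        total = (rotl32(total) + w) & MASK32
    return total


words = [0x11111111, 0x00C0FFEE, 0x12345678]
swapped = [words[1], words[0], words[2]]
one_zero = words[:1] + [0] + words[1:]
zeros_32 = words[:1] + [0] * 32 + words[1:]

print(add_and_rotate(words) != add_and_rotate(swapped))   # swap detected
print(add_and_rotate(words) != add_and_rotate(one_zero))  # one zero detected
print(add_and_rotate(words) == add_and_rotate(zeros_32))  # 32 zeros missed
```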

Another algorithm is the ISO transport algorithm, which is easily extended
to 32 bits or 64 bits, i.e. R = Sum(i=1..L) Wi and S = Sum(i=1..L) (L-i+1)*Wi,
where Wi is the ith (32-bit) word and there are L words in the packet.
According to Fletcher's analysis (see ref. below), this algorithm
detects all single-bit errors and detects all but .001526 percent (about 1 in
65,536) of all possible errors, same as CRC.  It also detects all burst errors
of length C bits, where C is the number of bits in the checksum. Fletcher claims:
"it represents a very reasonable tradeoff between effectiveness and efficiency".
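
The two sums can be sketched in a few lines of Python. This version assumes
32-bit words and plain 2's complement (modulo 2**32) addition; the ISO
standard itself works on octets with a different modulus, so treat this as
an illustration of the structure only. The point is that S never needs a
multiply: adding the running R into S once per word produces the position
weights (L-i+1) automatically.

```python
MASK32 = 0xFFFFFFFF


def running_sums(words):
    """One pass, two adds per word: R is the plain sum, S the sum of Rs."""
    r = s = 0
    for w in words:
        r = (r + w) & MASK32
        s = (s + r) & MASK32
    return r, s


def weighted_sums(words):
    """Direct evaluation of R = Sum Wi and S = Sum (L-i+1)*Wi, i from 1."""
    n = len(words)
    r = sum(words) & MASK32
    s = sum((n - i) * w for i, w in enumerate(words)) & MASK32
    return r, s


words = [0xDEADBEEF, 0x00000007, 0xCAFEF00D, 0x01020304]
print(running_sums(words) == weighted_sums(words))
print(running_sums(words) != running_sums(words[::-1]))         # reordering
print(running_sums(words) != running_sums(words[:2] + [0] + words[2:]))
```

Because each word's weight depends on its position, reordering or inserting
a zero word changes S even though R stays the same.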

I did some basic performance tests on these algorithms.
The following gives the time each algorithm takes to checksum
a VMTP packet with 1024 bytes of data (assumed to be long word aligned).

Algorithm                 MC 68010 (10 MHz)   MC 68020 (16 MHz)
add-and-rotate                 916 usec            320 usec
ISO Transport (32-bit)         914 usec            336 usec
ISO (16-bit)                  1245 usec            446 usec
TCP                            890 usec            323 usec

The TCP figure is really not too accurate - it lacks a test for carry on
each 32-bit add. The first ISO version uses 32-bit adds to develop the two sums
using 2's complement arithmetic (and then I just store the sum of the two
sums - a proper scheme here would require 64-bits for the checksum field.)
(Fletcher notes that the 1's complement version has slightly better properties
but is more expensive to compute.)
The second one uses 16-bit adds, which means a register swap is necessary as
well as doubling the number of adds required. The results here appear to
contradict Jon Crowcroft's comments on performance.  Perhaps his factor of
4 comes from a poor implementation??
The ISO 32-bit version and the TCP version are in (portable) non-asm C,
whereas the others are done in assembly language. An assembly version
of ISO 32-bit appears about 80 usecs faster on a 68020.

All these algorithms have the property of requiring only one
memory reference (to get the 32-bit word out of the packet) per word
in the packet, which is the minimal memory reference cost, given 32-bit
data paths.  The numbers seem to indicate that a small number of instructions
per long word (that do not increase # memory references) make no significant
difference in performance of the algorithm.  This is particularly true on
machines like the 68020, where the inner loop fits into the on-chip
instruction cache, and this is the future. For instance, the ISO and TCP
versions differ by essentially one addition, and thus by 13 usecs
(for 1108 register-register adds!)

Thus, I argue that there is no significant performance penalty to adding a
few non-memory reference instructions for a rotate or extra addition
to get better detection capabilities.

3) Location of the checksum

I propose to place the VMTP checksum into a trailer portion of the packet.
This allows a high-performance network interface, such as the one we are
developing here at Stanford, to add the checksum as the packet is transmitted
to the network, avoiding 2 extra memory references.  It does not appear to
add any additional software overhead without such an interface, unless one
is trying to use a relatively simple DMA net interface and DMA packets
on transmission and reception without copy.  Since this is in general not
feasible without scatter-gather, and with it the trailer presents no problem,
I don't think this is a real disadvantage.

My vision of the future is of network interfaces that implement much or
all of the transport level, providing a service similar to the disk interface.
I.e. one does not generally worry in software about errors between main memory
and the disk.  Note that the error checking is still end-to-end at least
between network interfaces in the end machines.  Also, with request-response
protocols, the response is an application-level end-to-end acknowledgement of
the request.

For these interfaces, performance is determined by the number of memory
references.  The next generation of network interfaces must minimize
the number of memory references to transmit and receive packets or they
will be slow.  In fact, one of the big challenges (the only one?) is to
implement "intelligence" on the network interface without being killed
by the delay of extra copies (i.e. memory references.)
  Clearly, the value of the checksum is not known until the entire packet
has been scanned.  Putting the checksum at the end allows the checksum
to be calculated as part of transmitting the packet from the network
interface to the network.  The only other copy into which it might be
incorporated, between the host and the network interface, has several
problems.  First, the data is not necessarily packetized at that point,
and definitely not in our network interface design.  Second, the data rate
on that path can be so high (and must be in multiprocessors) that fitting
a checksum calculation into the copy is difficult.  Moreover, one may
want to do encryption at the same time.   For instance, we move data
at 320 megabits/sec. in our design for the VMEbus.  Much higher speeds
are planned.
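
The single-pass argument can be illustrated with a small Python sketch. It
is a software stand-in for what the interface hardware would do, with
hypothetical names throughout: each word is handed to the "wire" as soon as
it has been added into the running sums, and the checksum goes out last.
The trailer here is the sum of the two ISO-style sums folded into 32 bits,
as in my test version; a header checksum would instead force a second pass
or a buffered copy.

```python
MASK32 = 0xFFFFFFFF


def transmit_with_trailer(words):
    """Yield each word as it is sent, then one 32-bit checksum trailer."""
    r = s = 0
    for w in words:
        r = (r + w) & MASK32
        s = (s + r) & MASK32
        yield w                    # word leaves before the sum is complete
    yield (r + s) & MASK32         # trailer: known only after the last word


def receive(stream):
    """Split off the trailer and recompute it over the received words."""
    *words, trailer = list(stream)
    expected = list(transmit_with_trailer(words))[-1]
    return words, expected == trailer


data = [10, 20, 30]
on_wire = list(transmit_with_trailer(data))
print(receive(on_wire))            # intact packet verifies

corrupted = on_wire.copy()
corrupted[1] ^= 1                  # flip one bit in the second word
print(receive(corrupted)[1])       # verification fails
```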

Currently, we are working with very much "1st" generation network interfaces
that handle multi-megabit data rates.  I think the next generation
can be much more reliable, efficient, and powerful.  We need to design
the transport protocol such that it does not introduce unnecessary
difficulties for such interfaces.
Thus, I feel this minor modification to the protocol supports the right way
of doing network interfaces, and that it aids these interfaces in dealing
with an intrinsic problem, not one that will go away with a few more bells
and whistles on the interfaces.

One issue a trailer checksum raises is alignment relative to encryption.
I have tried to keep the header as a multiple of 64 bits.
With a null data segment and 32-bit checksum, the VMTP packet is still a
multiple of 64 bits.  If we force the data segment to also be a multiple
of 64 bits, this remains true.  This is what I propose to do.

In summary, I recommend we go for a 32-bit checksum and use the ISO
checksum algorithm, modified to use 32-bit units.
The ISO checksum is a standard and has an article analyzing its
properties (to some extent). (If we also start the checksum
using a non-zero initial value, it should do well at detecting a packet
of all zeroes.) Finally, we place the checksum in a single 32-bit trailer
portion of the packet.
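
The remark about a non-zero initial value can be seen in a short Python
sketch. It assumes the 32-bit, modulo 2**32 form of the two sums, and the
initial value 1 is an arbitrary choice for illustration. Starting from
zero, a packet of all zeroes yields zero sums whatever its length, so an
all-zero packet with an all-zero checksum would verify; a non-zero start
makes S depend on the length.

```python
MASK32 = 0xFFFFFFFF


def iso32(words, init=0):
    """Two running sums over 32-bit words, both started at init."""
    r = s = init
    for w in words:
        r = (r + w) & MASK32
        s = (s + r) & MASK32
    return r, s


short_zeros = [0] * 4
long_zeros = [0] * 8

print(iso32(short_zeros), iso32(long_zeros))                  # both (0, 0)
print(iso32(short_zeros, init=1), iso32(long_zeros, init=1))  # now differ
```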

Comments?

David Cheriton

Ref:

J.G. Fletcher, An Arithmetic Checksum for Serial Transmission,
IEEE Trans. on Comm. COM-30 (1), 247-251, Jan. 1982.
-----------[000002][next][prev][last][first]----------------------------------------------------
Date:      Tue, 2-Dec-86 04:54:02 EST
From:      juke@seismo.CSS.GOV@tutctl.UUCP
To:        mod.protocols.tcp-ip
Subject:   Symbolics-VAX/Sun -connection?

I'm mailing this for a friend of mine:

--------------------------------------------------------------------------

Information wanted

How to implement applications that use Symbolics and other computers via
Ethernet using TCP/IP or DECnet protocols.

We need to join together numerical application software that runs on
VAXes and Suns with our interface experiments that are running on a Symbolics
workstation.  We need to transfer information about results and control
processes via Ethernet.  Both TCP/IP and DECnet drivers are available on
both ends.

It seems that the driver described above is not impossible to implement, but we
don't want to reinvent the wheel.

		
		Jarmo Salmela
		Tampere University of Technology
		P.O.Box 527
		SF-33101 Tampere
		Finland

--------------------------------------------------------------------------

You can send your replies also to the following address:

		UUCP: ...!mcvax!tutctl!juke

-----------[000003][next][prev][last][first]----------------------------------------------------
Date:      Fri,  5-DEC-1986 10:46 EST
From:      RANDY MARCHANY  <MARCHANYRC%VTVAX3.BITNET@WISCVM.WISC.EDU>
To:        <tcp-ip@sri-nic.arpa>
Subject:   IBM VM and DEC VMS TCP-IP question
We have over 10 VAX systems and an IBM 3090 model 200. The majority of
the Vaxes run VMS 4.4 (although we have one running Ultrix) and the IBM
systems run VM/SP HPO 4.2 with CMS release 4. We are looking at some
software/hardware products that will let us establish a TCP/IP network
here at Va. Tech. We are planning to connect the 3090 to SURANET and
eventually we'll do the same for our Vax systems.
        I am looking at the WISCNET package and at Fibronics (formerly
Spartacus?) KNET 200. I have very little info on the Wollongong software
for the Vax systems. I would like to hear from anyone who has had
any experience with any of these packages. Any information on reliability,
vendor support, installation problems, your opinion of the products, etc.
would be greatly appreciated. You can respond to this list or to me
directly. My name and address are:
        Randy Marchany
        Vax Systems Manager
        Va. Tech Computing Center
        116 Burruss Hall
        Blacksburg, VA 24061            703-961-6327
BITNET: MARCHANYRC@VTVAX3, MARCHANYRC@VTVAX5, HAND2@VTVM1

Also, if there are any other products out there that will perform the
same functions, I'd like to hear about them. Thanks again.
-----------[000005][next][prev][last][first]----------------------------------------------------
Date:      Fri, 5-Dec-86 17:40:00 EST
From:      bjp%ulana@MITRE-BEDFORD.ARPA (Barbara J. Pease)
To:        mod.protocols.tcp-ip
Subject:   ULANA STATEMENT OF WORK AND SYSTEM SPECIFICATIONS NOW AVAILABLE


     This note is to announce that a version of the ULANA Statement of Work
(the SOW) and a new version of the ULANA System Specifications are now
available in the guest account at mitre-b-ulana.arpa.  The following
information should be enough to allow people to get a copy.

      1. use ftp
      2. user is guest
      3. password is anonymous
      4. pathname for sow is /usr/mitre/guest/ulana.sow
      5. pathname for the spec is /usr/mitre/guest/ulana.spec
      6. internet number for mitre-b-ulana.arpa is 192.12.120.30
      7. send any mail to bjp@mitre-b-ulana.arpa


					Sincerely,
					bj Pease

-----------[000006][next][prev][last][first]----------------------------------------------------
Date:      Sat, 06 Dec 86 18:19:44 -0500
From:      Craig Partridge <craig@loki.bbn.com>
To:        tcp-ip%sri-nic.arpa@SH.CS.NET
Subject:   4.2/4.3 TCP and long RTTs

    I'm in the midst of doing comparisons between an RDP implementation and
the 4.2/4.3 TCP implementations and have run into a problem which I'm hoping
someone else can shed light on.

    I'm running tests on two machines, a VAX 750 running 4.3 and a SUN
workstation running 4.2.  The two machines are on the same Ethernet
and use the same gateway.  If I set up an experiment to test behaviour over
paths with long network delays (for example, bouncing packets off
Goonhilly), the TCP connections are established and then typically
fail part way through the transfer.  I don't understand this because
the RDP connections work just fine, and typically complete in 1/4 the
time it takes for a TCP connection to send about 20% of the data and
faint.  The experiment generally involves passing 50-100 segments of anywhere
from 64 to 1024 bytes to the protocols to send.  This is on weekends so
the delays aren't that long.

    The question I'm trying to answer is whether the problem is in the
RDP implementation (what anti-social things could it be doing to maintain
that connection?), or the TCP implementation (what might it be doing wrong
to die where another implementation succeeds?).  If I can, I'd like to
discourage invective.  I'm simply trying to figure out why this is happening
so I can identify and fix the problem and do a comparison between the two
implementations/protocols.  (And soon -- hair pulling over this problem
is beginning to threaten the health of my scalp and beard).

    General information on the RDP implementation:  it will retransmit
up to 10 times and calculates the round-trip time based on the first
packet sent with the caveat that it ignores round-trip times of segments
with sequence numbers lower than those of segments whose round-trip time
has already been computed (this feature is an experiment which may not
stay).  The maximum RTT is 2 minutes, the minimum is 2 seconds.

Craig

-----------[000008][next][prev][last][first]----------------------------------------------------
Date:      Sat, 6-Dec-86 22:20:09 EST
From:      hedrick@TOPAZ.RUTGERS.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re:  4.2/4.3 TCP and long RTTs

I suspect that what you are seeing is the fact that Sun TCP (to be
fair, most 4.2 based TCP's) doesn't perform well with bad connections.
A number of sites have replaced Sun's TCP modules with their 4.3
equivalents.  This makes a dramatic difference in dealing with
difficult cases.

-----------[000009][next][prev][last][first]----------------------------------------------------
Date:      Sun, 07 Dec 86 09:15:07 -0500
From:      Craig Partridge <craig@loki.bbn.com>
To:        hedrick%topaz.rutgers.edu@sh.cs.net
Cc:        tcp-ip%sri-nic.arpa@loki.bbn.com
Subject:   clarification

> I suspect that what you are seeing is that fact that Sun TCP (to be
> fair, most 4.2 based TCP's) doesn't perform well with bad connections.
> A number of sites have replaced Sun's TCP modules with their 4.3
> equivalents.  This makes a dramatic difference in dealing with
> difficult cases.

    Re-reading my note I realize it isn't clear.  I'm seeing the behaviour
under both 4.2 and 4.3 (though I haven't tested the 4.3 implementation
as extensively).  I did the port to 4.3 for precisely the reasons you
mention and now find myself stymied.

Craig
-----------[000010][next][prev][last][first]----------------------------------------------------
Date:      Sun, 7-Dec-86 08:44:03 EST
From:      van@LBL-CSAM.ARPA.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: 4.2/4.3 TCP and long RTTs

What you observe is probably poor tcp behavior, not antisocial rdp
behavior.  If the link is lossy or the mean round trip time is 
greater than 15 seconds, the 4.3bsd tcp throughput degrades rapidly.
For long transfers, a link that gives 2.7KB/s throughput with a
1% loss rate, gives 0.07KB/s throughput with a 10% loss rate.  (As
appalling as this looks, 4.2bsd, TOPS-20 tcp, and some other major
implementations that I've measured, get worse faster.  The 4.3
behavior was the best of everything I looked at.)

I know some of the reasons for the degradation.  As one might
expect, the failure seems to be due to the cumulative effect of
a number of small things.  Here's a list, in roughly the order
that they might bear on your experiment.

1. There is a kernel bug that causes IP fragments to be generated
   and ip fragments have only a 7.5s TTL.

In the distribution 4.3bsd, there is a bug in the routine
in_localaddr that makes it say all addresses are "local".  In
most cases, this makes tcp use a 1k mss which results in a lot of
ip fragmentation.  On high loss or long delay circuits, a lot of
the tcp traffic gets timed out and discarded at the destination's
ip level. 

The bug fix is to change the line:
	if (net == subnetsarelocal ? ia->ia_net : ia->ia_subnet)
in netinet/in.c to
	if (net == (subnetsarelocal ? ia->ia_net : ia->ia_subnet))

I also changed IPFRAGTTL in ip.h to two minutes (from 15 to 240,
in units of half a second) because we have more memory than net
bandwidth.
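
The precedence slip in the in_localaddr test above can be seen in a
small model.  The sketch below is Python, not kernel code: it only
mimics how C groups the two expressions (== binds tighter than ?:).
The function names and the sample network numbers are made up for
illustration.

```python
def buggy_test(net, subnetsarelocal, ia_net, ia_subnet):
    # C parses   net == subnetsarelocal ? ia_net : ia_subnet
    # as         (net == subnetsarelocal) ? ia_net : ia_subnet
    # so the "if" ends up testing a network number, which is non-zero.
    chosen = ia_net if net == subnetsarelocal else ia_subnet
    return chosen != 0


def fixed_test(net, subnetsarelocal, ia_net, ia_subnet):
    # With the added parentheses, net is compared against the chosen value.
    return net == (ia_net if subnetsarelocal else ia_subnet)


# Hypothetical values: an interface on net 128.3, a destination on net 10.
IA_NET, IA_SUBNET, REMOTE_NET = 0x80030000, 0x80030100, 0x0A000000

print(buggy_test(REMOTE_NET, 1, IA_NET, IA_SUBNET))  # remote net looks "local"
print(fixed_test(REMOTE_NET, 1, IA_NET, IA_SUBNET))  # remote net is not local
print(fixed_test(IA_NET, 1, IA_NET, IA_SUBNET))      # own net is local
```

In this model the buggy form answers "local" for any destination as
long as the interface's net and subnet numbers are non-zero.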


2. The retransmit timer is clamped at 30s.

The 4.3 tcp was put together before the arpanet went to hell and
has some optimistic assumptions about time.  Since the retransmit
timer is set to 2 * RTT, an RTT > 15s is treated as 15s.  (Last
week, the mean daytime rtt from LBL to UCB was 17s.) On a circuit
with 2min rtt, most packets would be transmitted four times and
the protocol pipelining would be effectively turned off (if 4.3
is retransmitting, it only sends one segment rather than filling
the window).  When running in this mode, you're very sensitive to
loss since each dropped packet or ack effectively uses up 4 of
your 12 retries. 

I would at least change TCPTV_MAX in netinet/tcp_timer.h to a
more realistic value, say 5 minutes (remembering to adjust
related timers like MSL proportionally).  I changed the
TCPT_RANGESET macro to ignore the maximum value because I
couldn't see any justification for a clamp.
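
The "transmitted four times" figure can be checked with a rough
model.  This is a sketch under simplifying assumptions: the rtt
estimate is already correct, nothing is lost, and the timer is simply
min(2*rtt, clamp).  Backoff is left out because the timer is already
sitting at the clamp.  The function name is hypothetical.

```python
def transmissions_before_ack(rtt, clamp=30.0):
    """Count how many times a segment is sent before its ack arrives.

    Assumes the retransmit timer is min(2 * rtt, clamp) seconds and
    that the ack for the original transmission arrives at time rtt.
    """
    timer = min(2.0 * rtt, clamp)
    sent = 1                 # the original transmission at t = 0
    t = timer
    while t < rtt:           # the timer fires before the ack gets back
        sent += 1
        t += timer
    return sent


print(transmissions_before_ack(120.0))               # 2 min rtt, 30 s clamp
print(transmissions_before_ack(120.0, clamp=300.0))  # same rtt, 5 min ceiling
```

With a ceiling above 2*rtt the timer no longer fires before the ack
returns, so in this model the segment goes out once.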


3. It takes a long time for tcp to learn the rtt.  

I've harped on this before.  With the default 4k socket buffers
and a 512 byte mss, 4.3 tcp will only try to measure the rtt of
every 8th packet.  It will get a measurement only if that packet
and its 7 predecessors are transmitted and acked without error. 
Based on trpt trace data, tcp gets the rtt of only one in every
80 packets on a link with a 5% drop rate.  Then, because of the
gross filtering suggested in rfc793, only 10% of the new
measurement is used.  For a 15s rtt, this means it takes at least
400 packets to get the estimate from the default 3s to 7.5s
(where you stop doing unnecessary retransmits for segments with
average delay) and 1700 packets to get the estimate to 14s (where
you stop unnecessary retransmits because of variance in the
delay).  Also, if the minimum delay is greater than 6s
(2*TCPTV_SRTTDFLT), tcp can never learn the rtt because there
will always be a retransmit canceling with the measurement. 

There are several things we want to try to improve this
situation.  I won't suggest anything until we've done some
experiments.  But, the problem becomes easier to live with if
you pick a larger value for TCPTV_SRTTDFLT, say 6s, and improve
the transient response in the srtt filter (lower TCP_ALPHA to,
say, .7).
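
The packet counts above follow from the smoothing filter.  A minimal
sketch, assuming the rfc793-style filter
srtt = alpha*srtt + (1-alpha)*sample, samples that always equal the
true rtt, and one sample per 80 packets as quoted above.  The function
and constant names are illustrative only.

```python
def measurements_needed(target, true_rtt=15.0, start=3.0, alpha=0.9):
    """Number of rtt samples before the smoothed estimate reaches target."""
    if target >= true_rtt:
        raise ValueError("target must be below true_rtt or it is never reached")
    srtt, n = start, 0
    while srtt < target:
        srtt = alpha * srtt + (1.0 - alpha) * true_rtt
        n += 1
    return n


PACKETS_PER_SAMPLE = 80      # "one in every 80 packets" at a 5% drop rate

for target in (7.5, 14.0):
    for alpha in (0.9, 0.7):
        n = measurements_needed(target, alpha=alpha)
        print(target, alpha, n, n * PACKETS_PER_SAMPLE)
```

With alpha = 0.9 the 7.5s case needs 5 samples, i.e. the 400 packets
mentioned above; the 14s case comes out in the same neighbourhood as
the 1700 figure.  Lowering alpha to 0.7 cuts both counts sharply.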


4. The retransmit backoff is wimpy.

Given that most of the links are congested and exhibit a lot of
variance in delay, you would like the retransmit timer to back
off pretty aggressively, particularly given the lousy rtt
estimates.  4.3 backs off linearly most of the time.  The actual
intervals, in units of 2*rtt, are:
  1  1  2  4  6  8  10  15  30  30  30 ...
While this is only linear up to 10, the 30s clamp on timers
means you never back off as far as 10 if the mean rtt is >1.5s.
The effect of this slow backoff is to use up a lot of your 
potential retries early in a service interruption.  E.g., a
2 minute outage when you think the rtt is 3s will cost you 9
of your 12 retries.  If the outage happens while you were
trying to retransmit, you probably won't survive it.

This is another area where we want to do some experiments.  It
seems to me that you want to back off aggressively early on, say
 1 4 8 16 ...
for the first part of the table.  It also seems like you want
to go linear or constant at some point, waiting 8192*rtt for the
12th retry has to be pointless.  The dynamic range depends to
some extent on how good your rtt estimator is and on how robust
the retransmit part of your tcp code is.  Also, based on some
modelling of gateway congestion that I did recently, you don't 
want the retransmit time to be deterministic.  Our first cut
here will probably look a lot like the backoff on an ethernet.
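
One possible reading of "like the backoff on an ethernet" is a
truncated binary exponential backoff with a random component.  The
sketch below only illustrates that idea; it is not the scheme that
was being planned.  The function name, the cap of 64, the draw from
[1, ceiling] (Ethernet itself draws from zero) and the use of 2*rtt as
the unit are all assumptions.

```python
import random


def randomized_backoff(retry, rtt, max_factor=64, rng=random):
    """Return a retransmit interval in seconds for a 0-based retry number.

    The ceiling doubles with each retry (1, 2, 4, ...) up to max_factor,
    and the factor actually used is drawn uniformly from [1, ceiling],
    so hosts behind the same congested gateway do not retry in lockstep.
    """
    ceiling = min(2 ** retry, max_factor)
    factor = rng.randint(1, ceiling)
    return factor * 2.0 * rtt


rng = random.Random(1986)    # fixed seed so a run can be repeated
for retry in range(8):
    print(retry, randomized_backoff(retry, rtt=3.0, rng=rng))
```

The cap keeps the later intervals bounded, which matches the remark
above that the table should go linear or constant at some point.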


5. "keepalive" ignores rtt.

If you are setting SO_KEEPALIVE on any of your sockets, the 
connection will be aborted if there's no inbound packet for
6 minutes (TCPTV_MAXIDLE).  With a 2m rtt, that could happen
in the worst case with one dropped packet followed by one
dropped ack.  ("Sendmail" sets keepalive and we were having
a lot of problems with this when we first brought up 4.3.)

A fix is to multiply by t_srtt when setting the keepalive
timer and divide t_idle by t_srtt when comparing against
MAXIDLE.


6. The initial retransmit of a dropped segment happens, at
   best, after 3*rtt rather than 2*rtt.

If the delay is large compared to the window, the steady state
traffic looks like a burst of acks interleaved with data, an ~rtt
delay, a burst of acks interleaved with data and repeat.  4.3
doesn't time individual segments.  It starts a 2*rtt timer for
the first segment, then, when the first segment is acked,
restarts the timer at 2*rtt to time the next segment.  Since the
2nd segment went out at approximately the same time as the first
and since the ack for the first segment took rtt to come back,
the retransmit time for the 2nd segment is 3*rtt.  In the usual
internet case of 4k windows and an mss of 512, the probability of
a loss taking 3*rtt to detect is 7/8. 
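
The 7/8 figure, and the mean detection delay it implies, can be
worked out directly.  A small sketch, assuming each segment in a
window-sized burst is equally likely to be the one lost and that only
the first segment gets the 2*rtt timer.  The function name is
hypothetical.

```python
from fractions import Fraction


def loss_detection(window_bytes=4096, mss=512):
    """Return (probability a loss takes 3*rtt to detect, mean delay in rtts).

    Only the first segment of the burst is timed at 2*rtt; a loss of
    any other segment is noticed after 3*rtt.
    """
    segments = window_bytes // mss
    p_slow = Fraction(segments - 1, segments)
    mean_delay_in_rtts = (1 - p_slow) * 2 + p_slow * 3
    return p_slow, mean_delay_in_rtts


p_slow, mean_delay = loss_detection()
print(p_slow, float(mean_delay))
```

For the 4k window and 512 byte mss this gives 7/8 and a mean of
2.875 rtts before the first retransmit, under the assumptions stated.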

The situation is actually worse than this on lossy circuits.
Because segments are not individually timed, all retransmits
will be timed 2*rtt from the last successful transfer (i.e.,
the last ack that moved snd_una).  This tends to add the time
taken by previous retransmissions into the retransmission time
of the current segment, increasing the mean rexmit time
and, thus, lowering the average throughput.  On a link with
a 5% loss rate, for long transfers, I've measured the mean time
to retransmit a segment as ~10*rtt.

The preceding may not be clear without a picture (it sure took
me a long time to figure out what was going on) but I'll try to
give an example.  Say that the window is 4 segments, the rtt is
R, you want to ship segments A-G and segments B and D are going
to get dropped.  At time zero you spit out A B C D.  At time R you
get back the ack for A, set the retransmit timer to go off at 3R
("now" + 2*rtt), and spit out E.  At 3R the timer goes off and you
retransmit B.  At 4R you get back an ack for C, set the retransmit
timer to go off at 6R and transmit F G. At 6R the timer goes off,
you retransmit D.  [D should have been retransmitted at 2R.]  Even
if we count the retransmit of B delaying everything by 2R (in
what is essentially a congestion control measure), there is an
extra 2R added to D's retransmit because its retransmit time is
slaved to B's ack.  Also note that the average throughput has
gone from 8 packets in 2R (if no loss) to 8 packets in 7R, a
factor of four degradation.

The obvious fix here is to time each segment.  Unfortunately,
this would add 14 bytes to a tcpcb which would then no longer fit
in an mbuf.  So, we're still trying to decide what to do.  It's
(barely) possible to live within the space limitations by, say,
timing the first and last segments and assuming the segments were
generated at a uniform rate. 


7. the retransmit policy could be better.

In the preceding example, you might have wondered why F G were
shipped after the ack for C rather than D.  If I'd changed the
example so that C was dropped rather than D, C D E F would have
been shipped when the ack for B came in (unnecessarily resending
D and E).  In either case the behavior is "wrong".  The reason it
happens is because an ack after a retransmit is treated the same
way as a normal ack.  I.e., because of data that might be in
transit you ignore what the ack tells you to send next and just
use it to open the window.  But, because the ack after a
retransmit comes 3*rtt after the last new data was injected, the
two sides are essentially in sync and the ack usually does tell
you what to send next. 

It's pretty clear what the retransmit policy should be.  We
haven't even started looking into the details of implementing
that policy in tcp_input.c & tcp_output.c.  If a grad student
would like a real interesting project ...

------------
There's more but you're probably as tired of reading as I am of
writing.  If none of this helps and if you have any Sun-3s handy,
I can probably send you a copy of my tcp monitor (as long as our
lawyers don't find out).  This is something like "etherfind"
except it prints out timestamps and all the tcp protocol info.
You'll have to agree to post anything interesting you find out
though...

Good luck.

  - Van

-----------[000012][next][prev][last][first]----------------------------------------------------
Date:      8 Dec 1986 08:01-EST
From:      CERF@a.isi.edu
To:        craig@loki.bbn.com
Cc:        tcp-ip%sri-nic.arpa@SH.CS.NET
Subject:   Re: 4.2/4.3 TCP and long RTTs
Craig,

can you obtain traces of the TCP level exchanges to see whether the
death of the TCP connection is retransmission time-out related?


Vint
-----------[000013][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 08:35:40 EST
From:      walsh@HARVARD.HARVARD.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re:  4.2/4.3 TCP and long RTTs

By having a maximum RTT of 2+ minutes, your RDP connection will stay open
at times when the Berkeley UNIX system will arbitrarily close the connection
after 30 seconds (rather than just informing the application about the
problem).  You are also seeing the benefits of EACKs.

bob

-----------[000015][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8 Dec 86 11:00:16 CST
From:      Linda Crosby <lcrosby@ALMSA-1.ARPA>
To:        TCP-IP@SRI-NIC.ARPA
Cc:        COOL@ALMSA-1.ARPA
Subject:   REPLYS TO TCP/IP - ETHERNET QUERY

Here is the consolidation of the meaningful replies to our query to
the TCP/IP mailing list.

(And before anyone asks..Mr. Savacool originated the query,
I was the 'pipeline' since I am on the TCP-IP mailing list and 
Mr. Savacool is not.)

Thank you all,

Linda J. Crosby
lcrosby@almsa-1

------------------------------------------

The original question was :

>     I am in the process of establishing a TCP/IP network that will
>run on Broadband ethernet.  The prototype installation will consist of
>4 mainframe computers (VAX 780) and 8 or 16 users using IBM-PC clones.  The
>highwater mark could be as many as 8 mainframe computers and 400 IBM-PC
>clone users.
>
>     I would like to know if 400 users, as described above, running the
>standard suite of DOD protocols (TCP/IP, TELNET, FTP, SMTP) can be
>supported on a single ethernet ?  How many connections would be
>reasonable before the network begins to degrade ?  Is anyone familiar
>with a broadband ethernet, similar to what we are proposing, that is
>configured with 400 connections ?  Is this reasonable ?
>
>     I would like to here from anyone who has covered this ground
>before.  A successfull installation elsewhere would be a big
>confidence builder.

     I would like to thank all those who were kind enough to answer.
The replies ranged from several people who felt compelled to point out
that I didn't know what I was talking about when I talked about
ethernet (broadband or baseband) to many people who shared some genuine
insight about their own experiences.

     To those who still think that ethernet is only baseband I suggest
you contact ChipCom Corp,  SYTEK,  DEC  or any of the other vendors now
selling the ethernet on broadband products that were approved by the
IEEE 802.3 committee on September 19, 1985.

     Why do I want to do ethernet on broadband and not just regular
baseband ethernet ?  Simple: we have a large investment in a
broadband cabling plant and it is not reasonable for us to lay new
cable;  it is much more reasonable for us to hang transceivers off of
our existing cable to provide the same function.

     The following are the messages that I found addressed my original
question.  Once again thanks for the many replies.  We are going to do
some of this soon (government procurements take at least 6
months)  so I will have some real world experience within the year.

Bob Savacool
cool@almsa-1

[   The opinions expressed herein are my own.  I have no affiliation
with any company or any products mentioned. ]

--------------------------------------------------------------------
///////////////////////////////////////////////////////////////////
--------------------------------------------------------------------

Resent From: Cpt Brian Boyter  ut-ngp!boyter <boyter@ngp.utexas.edu>
From:  BOWDENPE%VTVM1.BITNET@wiscvm.wisc.edu

We have several large departmental Ethernets which we would like to
interconnect.  At the same time, we would like to provide Ethernet
capability throughout our campus.  One option which we are exploring
is to run Ethernet on our existing broadband cable system, which currently
supports video and Sytek LocalNet 20.
Because of the size of the resulting Ethernet, and because multiple protocols
will be used (TCP/IP, Lat-11, DDCMP), we believe we will also need to have
some bridges placed at critical locations.  It appears that there are several
broadband Ethernet, and Ethernet bridge products on the market.  Among the
vendors are Chipcom, Ungermann Bass, Applitek, DEC, and CMC.

I'd be interested in hearing other folks' experiences with these products.
Particularly helpful would be any recommendations or "gotchas" in configuring
such a network.

-------------------------------

From: John Lekashman <lekash@AMES-NASB.ARPA>

It's somewhat reasonable.  It all depends on what these users are going
to be doing.  If you intend to bring up remote file servers on vaxes
for pc's, and some sort of network file system, then you will probably
lose when the count of PC's gets above about 20.  If you intend to
use the pc's as terminals and login to the vaxes for random computational
needs, then the count can probably go into several hundred.  
This is also true if the primary traffic is going to mail between
users.

One approach that you can use that is extensible upward for a long time
is to properly subnet into groups.  This does mean some additional
hardware, and choosing the correct software.  For example, you could
group 25 to 50 pc's that are likely to have traffic between them
on one cable, along with an associated larger machine (incidentally,
a vax is only a mini, or at best a 'super-mini') and feed this
into a packet switch on the same cable to another cable that N other
such things exist on.  (In your case N becomes 8)  In this way,
high traffic flows along each of the separate ethernets, and lower
density of flow from group to group.  Vaxes with appropriate 
software, (eg 4.3 unix from berkeley is one such OS that we use here
for such a purpose) can function as a subnet gateway.  If you need
it, I can point you at several vendors who will sell packet switches,
about 10K each.  These can switch at about 3 megabits currently,
with 1000 byte packets, so it's not a terrible performance loss.

----------------------------------

From: Bob Napier <rwn@ORNL-MSR.ARPA>

We are assisting Lowry AFB (Denver) in evaluating/installing a broadband
network of up to 500 nodes for Office Automation.

I will be happy to provide you information on our experiences, although we
are just awaiting bids to an RFP.

-----------------------------------

From: Charles Hedrick <hedrick%topaz.rutgers.arpa@ALMSA-1.ARPA>

We have 3 DEC-20's, 3 Pyramids, and a 785 on one Ethernet.  I don't
think you will have any problem.  It's real hard to saturate
an Ethernet, unless you are using diskless machines, which do their
paging over the network.

-----------------------------------

From:           Jeffrey C Honig <$JCH%CLVM.BITNET@wiscvm.wisc.edu>

I'm planning on doing something similar but, at least initially, on a
smaller scale.  I'm planning on making extensive use of LANBRIDGE 100's
to separate traffic into groups with a backbone connecting them.  That
could increase the traffic you could handle.

I'm planning on using Chipcom modems on the broadband, how about you?
I've researched the matter fairly carefully and have a paper that
describes my research and plans.  I can mail you a copy if you are
interested.

-----------------------------------

From: Dan Lynch <LYNCH%a.isi-venera.arpa@ALMSA-1.ARPA>

 The answer to your question is not easy.  I could easily work
up a scenario that would bog down -- like using the Ether
for remote file sharing (ala diskless workstations).  But if every host
is just using the Ether for mail, some random FTPs and some Telnetting
then it could work well.  The real question is what are those PC
users doing?  Using them as terminal emulators to get to the VAXen?  
Or as real hosts and only sending mail and spreadsheets around?

Anyway,  You should contact Charles Hedrick at Rutgers (Hedrick@Rutgers)
as he has a huge installation and knows a lot.
Can you give me your US Mail address so I can send you a brochure on
the upcoming TCP/IP Interoperability conference?  (March 87).

-------------------------------------

From: Dick Karpinski <dick@ccb.ucsf.edu>

No problem.  Each pair of nodes can use about 1 megabit/sec but unless
they're gonna really pass a bunch of data around, I'd expect to see
loadings like 2% and 4% for such a small net.  On baseband you get to
use like 70-80% before it degrades much.  It's 2-4 times worse on the
5 megabit ethernet on broadband, but you still have a safety factor 
of 4-8 before serious degradation.

But why use broadband at all??  The connections are more expensive and
less speedy and bigger too.  Buffered repeaters and filtered repeaters
like the DEC LANBridge handle length problems (and traffic levels too
with the filtering) at costs like $4k and $7K respectively.

We have about 30 VAXen and Suns etc and just about can't see the traffic.

------------------------------------

From: Bill Nowicki <nowicki@sun.com>

	I am in the process of establishing a TCP/IP network that will
	run on Broadband ethernet.

In a way, "Broadband ethernet" is an oxymoron. Ethernet is
baseband.  There are several companies that build systems that are
compatible with Ethernet transceiver specs, but use broadband
signalling instead of baseband signalling.  Although this is an analog
issue that is mysterious to software types like me, the fact
that you modulate and demodulate should not help collision problems, so
you have the usual Ethernet length restrictions.  Of course you can
exceed the length restrictions, with possible collision problems.

	The prototype installation will consist of 4 mainframe
	computers (VAX 780) and 8 or 16 users using IBM-PC clones.

Your definition of "mainframe" is interesting, since the workstation on
my desk is twice as fast, and is the middle of our line.  At any
rate, we have many Ethernets with up to about 100 Sun-3 machines.  It is
interesting that bandwidth is not the first limitation.  The main
reason you don't want more than about 100 machines is that one faulty
machine can bring the whole network down.  The probability that someone
shorts the cable, or starts to continuously broadcast, or at least has
bad collision detection circuitry, becomes pretty close to one with
over 100 nodes.

Of course this might be because Ethernet networks just evolve, while
broadband networks are usually "planned".  We make each floor of each
building its own Ethernet, with additional Ethernets for labs.  You can
make a Sun into a gateway just by sliding in another Ethernet
controller.  It also helps to use transceiver multiplexor boxes such as
the ones made by TCL, to reduce the number of actual taps, (less likely
to short the cable). So you probably can put 400 PCs onto a single net,
but do you want to?


--------------------------------------------------------------------
///////////////////////////////////////////////////////////////////
--------------------------------------------------------------------

-----------[000016][next][prev][last][first]----------------------------------------------------
Date:      8 Dec 1986 10:14-EST
From:      CLYNN@g.bbn.com
To:        craig@loki.bbn.com
Cc:        tcp-ip%sri-nic.arpa@SH.CS.NET
Subject:   Re: 4.2/4.3 TCP and long RTTs
Craig,
	I would suggest that you try limiting unacknowledged data on the TCP
connection to an amount which will not cause fragmentation over the paths
that you are using.  I suspect that the problem you are seeing is that
packets are being generated which require fragmentation.  The usual
fragmentation algorithm dumps n consecutive packets into the net; if
any of the n get lost, reassembly cannot occur and retransmission is
required.  It was formerly the case that some interfaces could not receive
back to back packets from the net; it may still be; if so, a fragmented
packet will "never" get through.  Also, I suspect that the IP sequence
number in retransmitted TCP packets is different from the original;
consequently, fragments from different retransmissions cannot be recombined.
Does your RDP implementation specify the IP sequence number?  Can you tell
if your host is receiving fragments and having to discard them?  Do you
get any ICMP fragment reassembly time exceeded messages?  Can you find out
what the initial time-to-live value is for the TCP connection and the RDP?
Let us know what you find.
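
The cost of fragmentation on a lossy path can be put in numbers.  A
rough sketch, assuming independent per-packet loss and ignoring
per-fragment header overhead; the datagram and path sizes in the
example are hypothetical, as is the function name.

```python
import math


def delivery_probability(datagram_bytes, path_payload_bytes, loss_rate):
    """Return (fragment count, chance that every fragment arrives).

    A fragmented datagram can be reassembled only if all of its
    fragments get through, so the survival chances multiply.
    """
    fragments = max(1, math.ceil(datagram_bytes / path_payload_bytes))
    return fragments, (1.0 - loss_rate) ** fragments


for loss in (0.01, 0.05, 0.10):
    print(loss, delivery_probability(1024, 256, loss))
```

At a 10% loss rate a datagram split four ways arrives whole only about
two times in three, against nine in ten for an unfragmented one.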

Charlie
-----------[000017][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 11:57:21 EST
From:      braden@ISI.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re:  4.2/4.3 TCP and long RTTs


Craig:

The amount of your hair pulling must be small compared to the time integral
of hair pulled by our UCL friends over the years.  Quite simply, their
conclusion was that most TCP implementations have design problems
that make them behave poorly over paths which have very long delay and
moderate to high loss.  SATNET sometimes (often?) exhibits that behaviour.
Recently, the ARPANET+core_gateway system has also exhibited that 
behaviour, and many TCP's have not been up to it... lots of broken
connections, etc.

I suggest that the cause of this situation is a performance/robustness
tradeoff inherent to TCP implementations.  Most of the
currently-available TCPs have been implemented and tested in an LAN
environment, to provide optimal performance in a low-delay, low-error
situation.  On the other hand, when we wrote the original experimental
implementations of TCP, we found the little beasties to be amazingly
robust; they would tenaciously hold on for minutes (or hours!)
retransmitting until a path came back, and would get the data through in
spite of terrible bugs.  But we were writing and testing them for
equally-experimental gateway implementations and frequently testing to
UCL, and did not demand high throughput or low delay.

It would certainly be interesting to understand exactly how these TCPs
have failed. I suspect it is a combination of a Zhang-catastrophe (RTT
measurement diverging towards infinity due to high loss rate) with an
implementation-imposed upper bound on retransmission time before the
connection breaks.  On the other hand, the answer may be that selective
retransmission is really absolutely essential to deal with the
long-delay, lossy situation.  I would like to get someone interested in
running some experiments on this (maybe you just did??).  Would it be
possible for you to disable just the selective retransmission feature of
RDP and try again?

Bob Braden 

 

-----------[000018][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 12:00:16 EST
From:      lcrosby@ALMSA-1.ARPA.UUCP
To:        mod.protocols.tcp-ip
Subject:   REPLYS TO TCP/IP - ETHERNET QUERY


Here is the consolidation of the meaningful replies to our query to
the TCP/IP mailing list.

(And before anyone asks..Mr. Savacool originated the query,
I was the 'pipeline' since I am on the TCP-IP mailing list and 
Mr. Savacool is not.)

Thank you all,

Linda J. Crosby
lcrosby@almsa-1

------------------------------------------

The original question was :

>     I am in the process of establishing a TCP/IP network that will
>run on Broadband ethernet.  The prototype installation will consist of
>4 mainframe computers (VAX 780) and 8 or 16 users using IBM-PC clones.  The
>highwater mark could be as many as 8 mainframe computers and 400 IBM-PC
>clone users.
>
>     I would like to know if 400 users, as described above, running the
>standard suite of DOD protocols (TCP/IP, TELNET, FTP, SMTP) can be
>supported on a single ethernet ?  How many connections would be
>reasonable before the network begins to degrade ?  Is anyone familiar
>with a broadband ethernet, similar to what we are proposing, that is
>configured with 400 connections ?  Is this reasonable ?
>
>     I would like to hear from anyone who has covered this ground
>before.  A successful installation elsewhere would be a big
>confidence builder.

     I would like to thank all those who were kind enough to answer.
The replies ranged from several people who felt compelled to point out
that I didn't know what I was talking about when I talked about
ethernet (broadband or baseband) to many people who shared some genuine
insight about their own experiences.

     To those who still think that ethernet is only baseband I suggest
you contact ChipCom Corp,  SYTEK,  DEC  or any of the other vendors now
selling the ethernet on broadband products that were approved by the
IEEE 802.3 committee on September 19, 1985.

     Why do I want to do ethernet on broadband and not just regular
baseband ethernet?  Simple: we have a large investment in a
broadband cabling plant and it is not reasonable for us to lay new
cable; it is much more reasonable for us to hang transceivers off of
our existing cable to provide the same function.

     The following are the messages that I found addressed my original
question.  Once again thanks for the many replies.  We are going to do
some of this soon (government procurements take at least 6
months)  so I will have some real world experience within the year.

Bob Savacool
cool@almsa-1

[   The opinions expressed herein are my own.  I have no affiliation
with any company or any products mentioned. ]

--------------------------------------------------------------------
///////////////////////////////////////////////////////////////////
--------------------------------------------------------------------

Resent From: Cpt Brian Boyter  ut-ngp!boyter <boyter@ngp.utexas.edu>
From:  BOWDENPE%VTVM1.BITNET@wiscvm.wisc.edu

We have several large departmental Ethernets which we would like to
interconnect.  At the same time, we would like to provide Ethernet
capability throughout our campus.  One option which we are exploring
is to run Ethernet on our existing broadband cable system, which currently
supports video and Sytek LocalNet 20.
Because of the size of the resulting Ethernet, and because multiple protocols
will be used (TCP/IP, Lat-11, DDCMP), we believe we will also need to have
some bridges placed at critical locations.  It appears that there are several
broadband Ethernet, and Ethernet bridge products on the market.  Among the
vendors are Chipcom, Ungermann Bass, Applitek, DEC, and CMC.

I'd be interested in hearing other folks' experiences with these products.
Particularly helpful would be any recommendations or "gotchas" in configuring
such a network.

-------------------------------

From: John Lekashman <lekash@AMES-NASB.ARPA>

It's somewhat reasonable.  It all depends on what these users are going
to be doing.  If you intend to bring up remote file servers on vaxes
for pc's, and some sort of network file system, then you will probably
lose when the count of PC's gets above about 20.  If you intend to
use the pc's as terminals and login to the vaxes for random computational
needs, then the count can probably go into several hundred.  
This is also true if the primary traffic is going to mail between
users.

One approach that you can use that is extensible upward for a long time
is to properly subnet into groups.  This does mean some additional
hardware, and choosing the correct software.  For example, you could
group 25 to 50 pc's that are likely to have traffic between them
on one cable, along with an associated larger machine (incidentally,
a vax is only a mini, or at best a 'super-mini') and feed this
into a packet switch on the same cable to another cable that N other
such things exist on.  (In your case N becomes 8)  In this way,
high traffic flows along each of the separate ethernets, and lower
density of flow from group to group.  Vaxes with appropriate 
software, (eg 4.3 unix from berkeley is one such OS that we use here
for such a purpose) can function as a subnet gateway.  If you need
it, I can point you at several vendors who will sell packet switches,
about 10K each.  These can switch at about 3 megabits currently,
with 1000 byte packets, so it's not a terrible performance loss.

----------------------------------

From: Bob Napier <rwn@ORNL-MSR.ARPA>

We are assisting Lowry AFB (Denver) in evaluating/installing a broadband
network of up to 500 nodes for Office Automation.

I will be happy to provide you information on our experiences, although we
are just awaiting bids on an RFP.

-----------------------------------

From: Charles Hedrick <hedrick%topaz.rutgers.arpa@ALMSA-1.ARPA>

We have 3 DEC-20's, 3 Pyramids, and a 785 on one Ethernet.  I don't
think you will have any problem.  It's real hard to saturate
an Ethernet, unless you are using diskless machines, which do their
paging over the network.

-----------------------------------

From:           Jeffrey C Honig <$JCH%CLVM.BITNET@wiscvm.wisc.edu>

I'm planning on doing something similar but, at least initially, on a
smaller scale.  I'm planning on making extensive use of LANBRIDGE 100's
to separate traffic into groups with a backbone connecting them.  That
could increase the traffic you could handle.

I'm planning on using Chipcom modems on the broadband, how about you?
I've researched the matter fairly carefully and have a paper that
describes my research and plans.  I can mail you a copy if you are
interested.

-----------------------------------

From: Dan Lynch <LYNCH%a.isi-venera.arpa@ALMSA-1.ARPA>

 The answer to your question is not easy.  I could easily work
up a scenario that would bog down -- like using the Ether
for remote file sharing (ala diskless workstations).  But if every host
is just using the Ether for mail, some random FTPs and some Telnetting
then it could work well.  The real question is what are those PC
users doing?  Using them as terminal emulators to get to the VAXen?  
Or as real hosts and only sending mail and spreadsheets around?

Anyway,  You should contact Charles Hedrick at Rutgers (Hedrick@Rutgers)
as he has a huge installation and knows a lot.
Can you give me your US Mail address so I can send you a brochure on
the upcoming TCP/IP Interoperability conference?  (March 87).

-------------------------------------

From: Dick Karpinski <dick@ccb.ucsf.edu>

No problem.  Each pair of nodes can use about 1 megabit/sec but unless
they're gonna really pass a bunch of data around, I'd expect to see
loadings like 2% and 4% for such a small net.  On baseband you get to
use like 70-80% before it degrades much.  It's 2-4 times worse on the
5 megabit ethernet on broadband, but you still have a safety factor 
of 4-8 before serious degradation.

But why use broadband at all??  The connections are more expensive and
less speedy and bigger too.  Buffered repeaters and filtered repeaters
like the DEC LANBridge handle length problems (and traffic levels too
with the filtering) at costs like $4k and $7K respectively.

We have about 30 VAXen and Suns etc and just about can't see the traffic.

------------------------------------

From: Bill Nowicki <nowicki@sun.com>

	I am in the process of establishing a TCP/IP network that will
	run on Broadband ethernet.

In a way, "Broadband ethernet" is an oxymoron. Ethernet is
baseband.  There are several companies that build systems that are
compatible with Ethernet transceiver specs, but use broadband
signalling instead of baseband signalling.  Although this is an analog
issue that is mysterious to software types like me, the fact
that you modulate and demodulate should not help collision problems, so
you have the usual Ethernet length restrictions.  Of course you can
exceed the length restrictions, with possible collision problems.

	The prototype installation will consist of 4 mainframe
	computers (VAX 780) and 8 or 16 users using IBM-PC clones.

Your definition of "mainframe" is interesting, since the workstation on
my desk is twice as fast, and is the middle of our line.  At any
rate, we have many Ethernets with up to about 100 Sun-3 machines.  It is
interesting that bandwidth is not the first limitation.  The main
reason you don't want more than about 100 machines is that one faulty
machine can bring the whole network down.  The probability that someone
shorts the cable, or starts to continuously broadcast, or at least has
bad collision detection circuitry, becomes pretty close to one with
over 100 nodes.
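
The reliability argument above can be illustrated with a short calculation.
This is a sketch under stated assumptions: faults are taken to be
independent, and the per-machine fault probability of 2% is a made-up figure
chosen only to show the shape of the curve, not a number from the message.

```python
def prob_some_node_faulty(n_nodes: int, per_node_fault: float) -> float:
    """Probability that at least one of n nodes is misbehaving.

    Assumes each node is faulty independently with the same probability
    (an assumption made for illustration).
    """
    if n_nodes < 0:
        raise ValueError("node count cannot be negative")
    if not 0.0 <= per_node_fault <= 1.0:
        raise ValueError("fault probability must be between 0 and 1")
    # Complement of "every node is healthy".
    return 1.0 - (1.0 - per_node_fault) ** n_nodes


if __name__ == "__main__":
    for n in (10, 50, 100, 400):
        print(n, "nodes:", round(prob_some_node_faulty(n, 0.02), 3))
```

Whatever per-node figure is assumed, the probability climbs toward one as
nodes are added, which is the case for splitting a large net into several
smaller ones joined by gateways.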

Of course this might be because Ethernet networks just evolve, while
broadband networks are usually "planned".  We make each floor of each
building its own Ethernet, with additional Ethernets for labs.  You can
make a Sun into a gateway just by sliding in another Ethernet
controller.  It also helps to use transceiver multiplexor boxes such as
the ones made by TCL, to reduce the number of actual taps, (less likely
to short the cable). So you probably can put 400 PCs onto a single net,
but do you want to?


--------------------------------------------------------------------
///////////////////////////////////////////////////////////////////
--------------------------------------------------------------------

-----------[000019][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 13:15:31 EST
From:      karels%okeeffe@UCBVAX.BERKELEY.EDU (Mike Karels)
To:        mod.protocols.tcp-ip
Subject:   Re: 4.2/4.3 TCP and long RTTs

Bob,
The timeout for TCP on 4.2/3 is rather longer than 30 sec.
The actual time depends on the round-trip time, as the limit
is on the number of retransmissions.  On 4.3, the timeout
is at least 108 sec. with short RTT's.  The limit on 4.2
was nearer 45 sec.  As Van Jacobson says in his message,
the keepalive time will have to be adjusted for long RTT's
as well, but I doubt that Craig is using the keepalive timer.
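
To see why a limit on the number of retransmissions makes the give-up time
depend on the round-trip time, consider the following sketch.  The doubling
backoff, the 64-second cap, and the parameter names are assumptions made for
illustration; they are not taken from the 4.2 or 4.3 sources and will not
reproduce the 45 and 108 second figures quoted above.

```python
def time_until_give_up(initial_rto: float, max_retransmits: int,
                       max_rto: float = 64.0) -> float:
    """Seconds from the first send until the connection is abandoned.

    Hypothetical policy: wait one timeout after the original send and one
    after each retransmission, doubling the timeout each time up to a cap.
    """
    total = 0.0
    rto = initial_rto
    for _ in range(max_retransmits + 1):
        total += min(rto, max_rto)
        rto *= 2.0
    return total


if __name__ == "__main__":
    # Same retransmission limit, different initial timeouts (i.e. RTTs).
    for rto in (0.5, 1.0, 4.0):
        print(rto, "->", time_until_give_up(rto, max_retransmits=6))
```

With the count fixed, a path with a longer round-trip time starts from a
larger timeout and so holds the connection open longer before failing.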

How can the TCP "just" inform the application about the problem?
Unless there's a control channel to the application that allows
the passage of status data, a send call must return an error.
After that error, the application can't tell how much of the data,
if any, was transmitted.  "Reliable byte stream with possible gaps
in case of error" isn't very satisfying.

		Mike

-----------[000020][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 13:32:44 EST
From:      walsh@HARVARD.HARVARD.EDU (Bob Walsh)
To:        mod.protocols.tcp-ip
Subject:   Re: 4.2/4.3 TCP and long RTTs

Mike,
	How can the TCP "just" inform the application about the problem?
	Unless there's a control channel to the application that allows
	the passage of status data, a send call must return an error.
	After that error, the application can't tell how much of the data,
	if any, was transmitted.  "Reliable byte stream with possible gaps
	in case of error" isn't very satisfying.

Informing the application that the networking system is having trouble
getting acknowledgements does not mean that the networking system has
given up on sending that data.  My intent was to point out that such a
decision should be left up to the application, which may in turn defer
the decision to the user.  This was one of the qualities of the BBN
TCP/IP software, as you know.  It is one of the reasons Bob Gilligan used
the BBN software for his demos.

Whether it is 30, 45 or 108 seconds doesn't matter.  What does matter is
that Craig is preserving the RDP connection for a longer time under such
circumstances and therefore is less likely to see the connection fail.

There is also the point of extended acknowledgements.

Bob Walsh

-----------[000022][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 13:44:47 EST
From:      van@LBL-CSAM.ARPA.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: 4.2/4.3 TCP and long RTTs

I've been told that my 6th & 7th points (4.3bsd retransmit timers
need some work) were incomprehensible.  That's what you get
when you reply to messages at 3am Sunday morning.  Since the
retransmit timer behavior results in the biggest performance
loss (the other problems affect congestion & stability more than
performance), I'll take a crack at explaining it better.

Attached is a picture of the problem.  It is taken directly from
a trace but the window size has been reduced from 8*MSS to 4*MSS
to simplify the drawing.  Time runs down the page.  The time axis
has tick marks at multiples of the round trip time, R.  The sender
is on the left, the receiver on the right.  Seven segments are
sent, labeled A through G.  Two segments, B and D, get lost or 
damaged in transit.  A lower case letter is used for a receiver's
ack (e.g., "g" is the ack for all bytes up to and including the
last byte of segment "G").  A list of all the segments successfully
received so far is in square brackets at the point where each
ack is generated.  Holes in the sequence space are indicated by "-".

All the traffic goes one direction (this was an ftp).  4.3 almost
always sends MSS byte segments and all these were of size MSS (512B).
Because of the 4.3 delayed ack code, the receiver almost always
reports a full size window (4KB in 4.3, 2KB in this example) in an
ack.  All these acks report a 4 MSS (2KB) window.

All sends are timed.  The retransmit timer is set to 2 times the
smoothed round trip time (TCP_BETA * t_srtt).  The timer is set
on each ack that's not a duplicate of a previous ack (i.e., that
changes the "sent but unacknowledged" pointer, snd_una).  If the
timer times out, the segment starting at snd_una is retransmitted
and the timer is restarted at 2*srtt.  Exactly one segment is
retransmitted.  Periodic retransmissions of that segment continue
until it is acked.  When the segment is acked, the retransmit
timer is set to 2*srtt and "normal" behavior resumes (see rfc793
if you're not sure what normal behavior is). 
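
For readers who want the timer arithmetic spelled out, here is a minimal
sketch of the rfc793 smoothed round trip time and retransmission timeout
computation that the paragraph above refers to.  The formulas are those in
rfc793; the function names, the ALPHA value of 0.9, and the one second and
one minute bounds are assumptions (rfc793 offers them as examples), not
values read out of the 4.3 source.

```python
ALPHA = 0.9   # smoothing factor; rfc793 suggests 0.8 to 0.9 (assumed here)
BETA = 2.0    # the message above says the timer is 2 * srtt


def update_srtt(srtt: float, sample: float, alpha: float = ALPHA) -> float:
    """Fold one measured round trip time into the smoothed estimate."""
    return alpha * srtt + (1.0 - alpha) * sample


def retransmit_timeout(srtt: float, beta: float = BETA,
                       lbound: float = 1.0, ubound: float = 60.0) -> float:
    """Timeout = beta * srtt, clamped between lbound and ubound seconds."""
    return min(ubound, max(lbound, beta * srtt))


if __name__ == "__main__":
    srtt = 1.0
    for sample in (1.0, 3.0, 3.0, 3.0):
        srtt = update_srtt(srtt, sample)
        print(round(srtt, 3), round(retransmit_timeout(srtt), 3))
```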

	  0-| A				(set timer to 2R, send enough
	    | B\			 packets to fill window (4))
	    | C\\
	    | D\\\
	    |  \\*\
	    |   *\ a [A]		(ack A)
	    |     X
	    |    / a [A - C]		(save C but can only ack through A)
	    |   / /
	    |  / /
	 1R-| E /			(A ack received, set timer to 3R,
	    |  \			 ack opens window by 1 so send E)
	    | - \			(duplicate A ack discarded)
	    |    \
	    |     \
	    |      a [A - C - E]	(save E but can only ack through A)
	    |     /
	    |    /
	    |   /
	    |  /
	 2R-| -				(duplicate A ack discarded)
	    | 
	    |
	    |
	    |
	    |
	    |
	    |
	    |
	    |
	 3R-| B				(timer goes off, rexmit first
	    |  \			 unacked segment (B), timer set to 5R)
	    |   \
	    |    \
	    |     \
	    |      c [A B C - E]	("B" fills in sequence space up
	    |     /			 through "C", ack C)
	    |    /
	    |   /
	    |  /
	 4R-| F				(ack of C opens window for 2 more
	    | G\			 segments, timer set to 6R)
	    |  \\
	    |   \\
	    |    \\
	    |     \c [A B C - E F]	(missing D, can only ack through C)
	    |     /c [A B C - E F G]
	    |    //
	    |   //
	    |  //
	 5R-| -/			(duplicate acks for C discarded)
	    | -
	    |
	    |
	    |
	    |
	    |
	    |
	    |
	    |
	 6R-| D				(timer goes off, rexmit first
	    |  \			 unacked segment (D), timer set 8R)
	    |   \
	    |    \
	    |     \
	    |      g [A B C D E F G]	(sequence space complete, ack G)
	    |     /
	    |    /
	    |   /
	    |  /
	 7R-| 


There are two problems here: the gap between 2R & 3R and the fact
that we don't send D at 4R.  The idle time from 2-3 (and from
5-6) happens because our timer is always 2*R from the last useful
ack and is essentially unrelated to when a segment is originally
sent (The code wasn't intended to work this way and on low delay
circuits it works correctly.) We should really be retransmitting
B 2*R from its first transmission (i.e., 1 line after the 2R tick
mark).  It's not too hard to show analytically that this (the
current 4.3 algorithm) "feeds forward" (e.g., the recovery for D
is moved later in time and is more likely to conflict with F,G
recovery) which is why throughput degrades much faster than
linearly with increasing loss rate. 

You can view the late transmission of D two ways.  It could be
another example of the timer problem.  I.e., we should have
retransmitted D 3 lines after the 2R tick.  We held off sending
it then because we thought the network might be congested and we
wanted to send a minimum amount of data until we got back an
indication (the ack) that the congestion had cleared up.  But we
certainly should have sent D at 4R when we got the "c" ack. 

Or you can say that when we get the "c" ack after the
retransmission of B, no packets have been injected into the
network for 2*R.  The ack tells you pretty clearly that the
receiver is missing D. (Either point of view will do the "right"
thing in this case but treating a retransmit ack specially buys
you a bit in one other case). 
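
The two timer rules can be compared with a little arithmetic on the picture.
This is a sketch, not the 4.3 code: it takes R as the unit of time, reads the
drawing as ten lines per R (so B first goes out at 0.1R and D at 0.3R), and
uses hypothetical function names.

```python
R = 1.0           # one round trip time, used as the unit
LINE = R / 10.0   # the drawing has ten lines between tick marks
BETA = 2.0        # timer is 2 * srtt, and srtt is taken to equal R


def deadline_from_last_useful_ack(ack_arrival: float) -> float:
    """Behavior described above: timer restarts on each useful ack."""
    return ack_arrival + BETA * R


def deadline_from_first_send(first_send: float) -> float:
    """Suggested behavior: time each segment from its own first send."""
    return first_send + BETA * R


if __name__ == "__main__":
    # Segment B: sent one line after 0; the ack for A arrives at 1R.
    print("B now:     ", deadline_from_last_useful_ack(1 * R))
    print("B proposed:", round(deadline_from_first_send(1 * LINE), 2))
    # Segment D: sent three lines after 0; the ack for C arrives at 4R.
    print("D now:     ", deadline_from_last_useful_ack(4 * R))
    print("D proposed:", round(deadline_from_first_send(3 * LINE), 2))
```

The first rule gives retransmissions at 3R and 6R, as in the trace; the
second gives one line after the 2R tick for B and three lines after it for D.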

If the two problems are corrected, the total time drops from 7R
to 4R (2R is the total time if no packets are lost).  If we don't
do the send-1-packet-on-rexmit congestion control, the total time
drops to 3R, the minimum possible if one or more packets are
dropped. 

Also, this partly illustrates why I thought Craig's measurements
demonstrated a problem in TCP rather than the superiority of RDP.
Even with EACKs, it takes RDP 3R to send the data if the same two
packets are lost, exactly the same time it takes TCP.  I think I
can show that EACKs aren't a big win until the drop rate is >50%,
if TCP is working as well as it can (that's not to say RDP isn't
a win for other reasons). 

  - Van

-----------[000023][next][prev][last][first]----------------------------------------------------
Date:      Mon, 8-Dec-86 20:50:59 EST
From:      BUDDENBERGRA@A.ISI.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   DDN implementation

Pardon me if I should post this to another board, but...
What is the status of DDN implementation as replacement for Autodin?
I'm concerned because we (USCG) are about to buy into a bunch
of ports that are called 'Autodin' up front.  Are they likely to
be DDN under the hood?

Where do we find this information on a continuing and updated basis?
tks
-------

-----------[000025][next][prev][last][first]----------------------------------------------------
Date:      Tue, 9-Dec-86 17:15:16 EST
From:      roden%husc4@HARVARD.HARVARD.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   (none)


	We have installed a product called FTP-VMS, a TCP/IP Ethernet File
Transfer Software from a company called Process Software in Amherst, MA
(413) 549-6994, on a trial basis.

	According to the product literature, it

		o  operates with DEC Ethernet/802.3 Hardware
		o  implements TCP/IP FTP Networking Standard
		o  shares Ethernet Hardware with DECnet
		o  works with TELNET-VMS (another of their products)

	I am interested to know if anyone else uses this product, and 
if so, what are their experiences?   Please send me mail directly. 

	Thank you.

				Peter Roden, VMS Systems Manager
				Harvard University Science Center

Bitnet:    roden@harvsc3		
UUCP:      ihnp4!wjhiz!HUSC3!roden
internet:  roden%husc4@harvard.harvard.edu
voice:     (617) 495-1270		

-----------[000027][next][prev][last][first]----------------------------------------------------
Date:      Wed, 10-Dec-86 17:12:10 EST
From:      Charles_Russell_Severance@UM.CC.UMICH.EDU
To:        mod.protocols.tcp-ip
Subject:   (none)

We at Michigan State University are implementing a campus
wide Ethernet over broadband cable.  I would like to ask two
questions:
 
1.   We have installed a Fibronics K-320 on our IBM mainframe
     to do TELNET and FTP.   We can't seem to find a satisfactory
     program to run in our PC compatibles with 3COM cards.  We
     are looking for people who are using the K-320 with 3COM
     cards doing 3270 emulation.  We would like to find sites
     who are satisfied with a TELNET program on the PC when
     communicating with a K-320.
 
2.   We are looking for possible network analyzers to use when
     debugging network problems.  The only one that we currently
     have any information on is one from EXCELAN which is $9,500
     for a card and some PC software.  We have access to both
     a SUN and AT-compatible equipment.
 
BITNET:     20095CRS@MSU
PHONE:      517-353-2984
ADDRESS:    Charles Severance
            Michigan State University
            301 Computer Center
            East Lansing, MI 48824
 
Thank you in advance.

-----------[000028][next][prev][last][first]----------------------------------------------------
Date:      Fri, 12-Dec-86 00:07:53 EST
From:      gds@EDDIE.MIT.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: 4.2/4.3 TCP and long RTTs

> Mike,
> 	How can the TCP "just" inform the application about the problem?
> 	Unless there's a control channel to the application that allows
> 	the passage of status data, a send call must return an error.
> 	After that error, the application can't tell how much of the data,
> 	if any, was transmitted.  "Reliable byte stream with possible gaps
> 	in case of error" isn't very satisfying.
> 
> Informing the application that the networking system is having trouble
> getting acknowledgements does not mean that the networking system has
> given up on sending that data.  My intent was to point out that such a
> decision should be left up to the application, which may in turn defer
> the decision to the user.  This was one of the qualities of the BBN
> TCP/IP software, as you know.  It is one of the reasons Bob Gilligan used
> the BBN software for his demos.
> 

There are some Unix applications, like the 4.2 version of telnet,
which do a close() if they get something like ETIMEDOUT, which can
occur if a TCP timer has gone off.  In the 4.2 BBN TCP/IP, the timer
going off does not cause the connection to be closed.  Instead, a
routine called advise_user lets the higher layers know about the
problem and lets them deal with it.  I was rather surprised to see
that 4.2 telnet was giving up when the actual connection was not
remotely closed.  With a quick patch to telnet to prevent it from
closing for errors when a TCP timer has gone off, you can maintain
telnet connections for hours.  The reconstitution protocols required
that connections remain open for long periods of time during network
dynamics.

I haven't looked at 4.3 telnet or TCP to see if this is fixed or
handled differently.

--gregbo

-----------[000029][next][prev][last][first]----------------------------------------------------
Date:      Fri, 12-Dec-86 02:21:18 EST
From:      rhc@hplb.CSNET.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re:  4.2/4.3 TCP and long RTTs

I would like to reinforce Bob's concern at the Internet layer. As an
ex-UCL person I am also concerned at the current tradeoff in TCP
implementations towards LAN-specific high performance and away from
Internet-general robustness.

The maximum packet size on SATNET is 256 bytes. If you are sending
large (>576) TCP packets then the gateway to the ARPANET will
fragment, then the ARPANET/SATNET gateway will fragment each of these
again. The result is a large number of fragments bursting onto SATNET,
pinging off Goonhilly (why do you have to use the busiest European
earth station anyway?), then each fragment tries to get back across the
ARPANET. At the SATNET/ARPANET gateway we have the ARPANET flow
control problem. It is quite likely that every packet sent from the
host has turned into at least 4 and possibly 6 packets for the return!
Have you considered your reassembly timeout?
I have seen packets hang around in these gateways for 16 minutes (yes,
that's right, MINUTES).
After a little while the gateway queues will fill up and the TCP will
timeout because the IP reassembly is not able to get enough fragments
to build enough packets to keep the TCP happy.
Now if the TCP were able to use the same IP packet number for all
retransmissions then the situation would improve - and I suspect that
this is where RDP is winning!
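
The double fragmentation described above can be counted with a short
sketch.  It follows the usual IP rule that every fragment but the last
carries a multiple of 8 data bytes; the 20-byte header (no options), the
1024-byte starting datagram, and the use of 576 and 256 as the two limits
are assumptions for illustration, and the function name is hypothetical.

```python
def fragment_sizes(datagram_len: int, mtu: int, header_len: int = 20) -> list:
    """Total lengths of the fragments produced for one datagram."""
    if datagram_len <= mtu:
        return [datagram_len]           # fits, no fragmentation needed
    # Data per fragment must be a multiple of 8 bytes (except the last).
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("mtu too small to carry any data")
    payload = datagram_len - header_len
    sizes = []
    while payload > 0:
        chunk = min(max_data, payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    first = fragment_sizes(1024, 576)
    second = [s for f in first for s in fragment_sizes(f, 256)]
    print(first, second, len(second))
```

Under these assumptions one 1024-byte datagram becomes two fragments at the
first gateway and five at the second, consistent with the "at least 4 and
possibly 6" estimate above.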

Just because the TCP connection is failing does not mean that the TCP
layer is the only one that is broken.

Happy satellites,
Robert.

-----------[000030][next][prev][last][first]----------------------------------------------------
Date:      Fri, 12-Dec-86 14:57:00 EST
From:      stanonik@NPRDC.ARPA.UUCP
To:        mod.protocols.tcp-ip
Subject:   4.3bsd/telnet break

We've had some users, who connect to us via tacs,
complain that programs now mysteriously crash.  One hunch
is that we're receiving telnet breaks (iac brk), which the
4.3bsd telnet server turns into the dump core signal (SIGQUIT).
So, the question is, "Can a telnet break be sent from a tac
and, if so, how?".  I haven't found much information on tacs
(any suggestions, I've tried rfcs and nic tacnews), but tacs
do use "@" as an escape character (which has caused grief
when users try to send mail to say, user@colorado.edu).  Is
there a tac escape sequence for sending the telnet break;
eg, @b?

Thanks,

Ron Stanonik
stanonik@nprdc.arpa

-----------[000031][next][prev][last][first]----------------------------------------------------
Date:      Fri, 12-Dec-86 15:40:35 EST
From:      ahill@CC7.BBN.COM.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: 4.3bsd/telnet break

Ron,
	Yes.  There is a TAC escape sequence that will cause 
breaks to be transmitted to a host.  The sequence is @s b.  There
are probably others but that one works.  The default command intercept
is @ and it can be transmitted through the TAC by doubling the character,
ie, two @'s.  If there is need, you can change the intercept.

Alan
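
For anyone generating text that must pass through a TAC unchanged, the
doubling rule above can be sketched as follows.  The function name is
hypothetical, and the sketch covers only the intercept character; it does
not model any other TAC command processing.

```python
def escape_for_tac(text: str, intercept: str = "@") -> str:
    """Double every intercept character so the TAC passes it through."""
    if len(intercept) != 1:
        raise ValueError("the intercept must be a single character")
    return text.replace(intercept, intercept * 2)


if __name__ == "__main__":
    print(escape_for_tac("mail user@colorado.edu"))
```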

-----------[000033][next][prev][last][first]----------------------------------------------------
Date:      Fri, 12-Dec-86 17:00:00 EST
From:      KNapier@DDN2.UUCP.UUCP
To:        mod.protocols.tcp-ip
Subject:   Need info on PC/IP

I am placing this request for a co-worker of mine.

He would like to know where he can get information on PC/IP.  The
type of information would be of a general nature. After a review of
the available information a more specific request may be made.

Thanks in advance.

//Ken//KNapier@DDN2

-----------[000035][next][prev][last][first]----------------------------------------------------
Date:      12 Dec 1986 17:49-EST
From:      CERF@A.ISI.EDU
To:        stanonik@NPRDC.ARPA
Cc:        tcp-ip@SRI-NIC.ARPA
Subject:   Re: 4.3bsd/telnet break
There is a command of the form @s b for send break. You might
try that to see if you can duplicate the problem at will.

vint
-----------[000036][next][prev][last][first]----------------------------------------------------
Date:      Sat, 13-Dec-86 14:15:29 EST
From:      OLE@SRI-NIC.ARPA (Ole Jorgen Jacobsen)
To:        mod.protocols.tcp-ip
Subject:   Re: Need info on PC/IP


Perhaps it is time once again to remind you all that information about
TCP/IP implementations for a number of different machines can be found
in [SRI-NIC.ARPA]NETINFO:VENDORS-GUIDE.DOC. This file is updated frequently
and a printed version is also available from the NIC.

Ole
-------

-----------[000037][next][prev][last][first]----------------------------------------------------
Date:      Sat, 13-Dec-86 18:22:39 EST
From:      ROMKEY@XX.LCS.MIT.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: Need info on PC/IP

Anyone who has questions about PC/IP can direct them to me; I can
answer questions about the MIT version, the FTP Software version
and most of the academic spinoffs. 'Course, I might be somewhat
prejudiced in the matter...

John Romkey			FTP Software, Inc.
(617) 864-1711			PO Box 150
UUCP: romkey@mit-vax.UUCP	Kendall Square Branch
ARPA: romkey@xx.lcs.mit.edu	Boston, MA 02142
-------

-----------[000038][next][prev][last][first]----------------------------------------------------
Date:      Sun, 14 Dec 86 11:52:37 -0500
From:      Craig Partridge <craig@loki.bbn.com>
To:        tcp-ip%sri-nic.arpa@SH.CS.NET
Subject:   TCP RTT woes revisited

    This weekend I had time to start processing Van Jacobson's suggested
fixes/modifications.  Things started working very well after the first fix
which made TCP choose better fragment sizes and increased the time to live
for IP fragments.

    The subsequent testing also revealed some interesting results.  (These
are preliminary and subject to reappraisal).

    (1) EACKs appear to make a huge difference in use of the network.
    After seeing signs this was the case, I ran the simple test of
    pushing 50,000 data packets through a software loopback that
    dropped 4% of the packets.
    
    With EACKs there were 1,930 retransmissions, of which 1 received
    packet was a duplicate (note that some of the retransmissions were
    also dropped).

    Without EACKS there were 12,462 retransmissions of which 9,344
    received packets were duplicates.

    12,462 retransmissions is, of course, bad news, and comes from
    the fact that this RDP sends up to four packets in parallel.
    Typically the four get put into the send queue in the same
    tick of the timer, so when the first gets retransmitted,
    all four do.  The moral seems to be use EACKs even though
    they aren't required for a conforming implementation.

    (2) Lixia Zhang's suggestion that one use the RTT of the SYN to
    compute the initial timeout estimate appears to work very well.

    (3) EACKs may make it possible to all but stomp out RTT feedback
    (those unfortunate cases where a dropped packet leads to an
    RTT = (the number of retries * SRTT) + SRTT being used to compute
    a new SRTT).  I've been experimenting with discarding RTTs for out of
    order acks.  This is best explained by example.  If packets 1, 2, 3
    and 4 are sent, and the first ack is an EACK for 3, the implementation
    uses the RTT for 3 to recompute the SRTT, but will discard the RTTs
    for 1 and 2 when they are eventually acked (or EACKed).  The
    argument in favor of this scheme is that the acks for 1 and 2
    probably represent either (a) RTTs for packets that were dropped,
    and thus including them would lead to feedback or (b) RTTs that reflect 
    an earlier (and slower) state of the network (3 was sent after 1 and 2)
    and using them would make the SRTT a less good prediction of the
    RTT of the next packet.  Note that (b) would be more convincing
    if it wasn't the case that 1, 2, 3 and 4 were probably sent within
    a few milliseconds of each other.

    Watching 5 trial runs of 100 64-byte data packets bounced off Goonhilly
    this algorithm kept the SRTT within the observed range of real RTTs
    (as opposed to RTTs for packets that were dropped and had to be
    retransmitted).

    Using EACKs but taking the RTT for every packet, (again doing 5 trial
    runs) several cases of RTT-feedback were seen.  In one case the SRTT
    soared to ~35 seconds when a few packets were dropped in a short period.
    Since the implementation uses Mills's suggested changes which make
    lowering the SRTT take longer than raising it, the SRTT took some
    time to recover.
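
The rules in (2) and (3) can be sketched together as a small
estimator.  This is a minimal sketch, not the actual RDP code: the
class name, method name and the two gain values are assumptions
chosen for illustration, with the only constraint from the text being
that the SRTT falls more slowly than it rises.

```python
class RttEstimator:
    """Smoothed RTT with an out-of-order discard rule (illustrative)."""

    def __init__(self, syn_rtt, gain_up=0.5, gain_down=0.125):
        self.srtt = syn_rtt          # seed the estimate from the SYN's RTT
        self.gain_up = gain_up       # hypothetical gain when the RTT rises
        self.gain_down = gain_down   # hypothetical, smaller gain when it falls
        self.highest_acked = 0

    def on_ack(self, seq, rtt):
        """Feed one acked (or EACKed) packet; return True if its RTT was used."""
        if seq < self.highest_acked:
            # A later packet was acked first, so this sample is suspect
            # (dropped and retransmitted, or simply stale): discard it.
            return False
        self.highest_acked = seq
        gain = self.gain_up if rtt > self.srtt else self.gain_down
        self.srtt += gain * (rtt - self.srtt)
        return True

est = RttEstimator(syn_rtt=2.0)
used = [est.on_ack(3, 2.2),   # EACK for 3 arrives first: used
        est.on_ack(1, 6.5),   # 1 acked late: discarded
        est.on_ack(2, 6.4),   # 2 acked late: discarded
        est.on_ack(4, 2.0)]   # in order again: used
print(used, round(est.srtt, 4))
```

In this made-up run the two inflated samples for packets 1 and 2
never reach the SRTT, which is the effect the out-of-order rule is
meant to have.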

People may be wondering about observed throughput.  How fast does RDP
run vis-a-vis TCP?  That turns out to be very difficult to answer.
Identical tests run in parallel or one right after another give
throughput rates that vary by factors of 2 or more.  As a result it
is difficult to get throughput numbers that demonstrably show differences
which reflect more than random variation.   After running tests for 7
weekends (and millions of packets) I have some theories, but those keep
changing as different tests are run.

Craig

P.S.  Those millions of packets are almost all over a software loopback.
The contribution to network congestion has been small.
-----------[000039][next][prev][last][first]----------------------------------------------------
Date:      Sun, 14-Dec-86 14:28:26 EST
From:      farber@HUEY.UDEL.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   Re: Need info on PC/IP

There is also a mailing list
pcip@huey.udel.edu devoted to this subject

Dave

-----------[000041][next][prev][last][first]----------------------------------------------------
Date:      Sun, 14-Dec-86 19:39:04 EST
From:      minshall%opal.Berkeley.EDU@UCBVAX.BERKELEY.EDU.UUCP
To:        mod.protocols.tcp-ip
Subject:   (none)

Regarding 3270 access from a PC, I think that John Romkey at FTP Software, Inc.
(near Boston or MIT or somewhere) is porting tn3270 to run on various
vendors' hardware (probably 3COM included).  Also, Ungermann-Bass is porting
it to run on THEIR smart PC board.  Also, UC Berkeley has a version that runs
on the UB smart board.

Any and all of these SHOULD talk to a whatever-it-is-you-have.

Greg Minshall

-----------[000042][next][prev][last][first]----------------------------------------------------
Date:      Sun 14 Dec 86 23:00:14-EST
From:      Dennis G. Perry <PERRY@VAX.DARPA.MIL>
To:        tcp-ip@SRI-NIC.ARPA
Cc:        perry@VAX.DARPA.MIL
Subject:   [Ken Pogran <pogran@ccq.bbn.com>: Recent ARPANET Performance]
The message below indicates what I am sure some of you have suspected:
the Arpanet has had a relapse.

dennis
                ---------------

Received: from ccq.bbn.com by vax.darpa.mil (4.12/4.7)
	id AA15979; Sun, 14 Dec 86 12:44:18 est
Message-Id: <8612141744.AA15979@vax.darpa.mil>
Date: Sun, 14 Dec 86 12:32:30 EST
From: Ken Pogran <pogran@ccq.bbn.com>
Subject: Recent ARPANET Performance
To: Perry@vax.darpa.mil
Cc: CStein@ccw.bbn.com, McKenzie@j.bbn.com, Mayersohn@alexander.bbn.com,
        FSerr@alexander.bbn.com, BDlugos@ccy.bbn.com, Bartlett@cct.bbn.com,
        pogran@ccq.bbn.com

Dennis,

Over the past few weeks, users may have noticed some degradation
of ARPANET performance over the level attained in early November.
BBN Communications Corporation believes that this change in
network performance correlates with outages in certain key
network lines; in particular, lines between CMU and DCEC and,
most recently and most significantly, TEXAS and BRAGG.  The
effect of these line outages, which are not uncommon events,
demonstrates that the ARPANET is still "on the edge", particularly
where cross-country bandwidth is concerned.

The primary measure that we have of the degradation in network
performance is the increase in congestion- or performance-related
"traps" or exception reports made by the C/30 packet switches in
the network to the Network Operations Center.  We reported on
November 12 that, following several changes made in the network,
the number of traps diminished by an order of magnitude.
Recently, the number of traps has increased substantially,
indicating degraded performance, although the number of traps has
not risen to the extremely high levels seen earlier this fall.

The DDN PMO and BBNCC are taking action to correct the present
line outage problems.

Regards,
 Ken Pogran
 BBN Communications Corporation


The following data is provided by Bob Pyle of the BBNCC
Network Analysis Department:

-----------------------------

From:    rpyle@cc5.bbn.com
Date:    10 Dec 86 13:49:26 EST (Wed)
Subject: Recent ARPANET performance

We can divide time since mid-October into four periods to see the
effect of various trunking problems on ARPANET performance:

  Oct 27 - Nov 6	19.2 kb bottleneck at USC still in effect
  Nov 7 - Nov 24	"Good" period
  Nov 25 - Dec 5	CMU-DCEC line out of service
  Dec 8 - Dec 9		CMU-DCEC and TEXAS-BRAGG lines out of service

Each of these periods is characterized by a more-or-less constant
production of performance-related traps with fairly abrupt transitions
between them (the last period is of course too short as yet to talk
about in statistical terms).  The most numerous trap is the 63 trap,
long wait for all8.  The table below shows the average number of 63
traps per day and the average total number of traps per day on the
ARPANET for the four periods (workdays only, no weekends, no
holidays).

 Time Period | 63 traps/day | total traps/day
---------------------------------------------
 10/27-11/6  |    72190     |     101115
 11/7-11/24  |     4186     |       9954
 11/25-12/5  |    11990     |      23681
 12/8-12/9   |    34602     |      68352
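
The figures are easier to compare as multiples of the "good" period.
The short calculation below uses only the numbers in the table; the
function and variable names are invented for illustration.

```python
# (63 traps per day, total traps per day), copied from the table above.
PERIODS = {
    "10/27-11/6": (72190, 101115),
    "11/7-11/24": (4186, 9954),
    "11/25-12/5": (11990, 23681),
    "12/8-12/9": (34602, 68352),
}
BASELINE = "11/7-11/24"   # the "good" period

def ratios_to_baseline(periods, baseline):
    """Express each period's trap counts as multiples of the baseline."""
    base_63, base_total = periods[baseline]
    return {name: (t63 / base_63, total / base_total)
            for name, (t63, total) in periods.items()}

for name, (r63, rtotal) in ratios_to_baseline(PERIODS, BASELINE).items():
    print(f"{name:>11}  63 traps x{r63:5.1f}   total traps x{rtotal:5.1f}")
```

Total traps in the bottleneck period come out at roughly ten times
the good period, in line with the "order of magnitude" reported on
November 12, and the two days with both lines out sit at roughly
seven times.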

-------
-------
-----------[000043][next][prev][last][first]----------------------------------------------------
Date:      Mon, 15 Dec 86 02:19:25 est
From:      hedrick@topaz.rutgers.edu (Charles Hedrick)
To:        tcp-ip@sri-nic.arpa
Subject:   Arpanet outage
A few days ago, the East Coast vanished.  We talked to NOC about it.
I didn't follow everything they said, so I could be misinterpreting
it.  But what I thought I heard was that the Arpanet had fractured
into a number of small, self-contained pieces.  This was due to a
single physical cable break.  NOC itself was cut off from most of the
rest of the network, as were we (using an IMP in New York City).  One
would have thought that the Arpanet had enough redundancy that a
single cable break would not isolate an IMP, and even if it did, that
it would not do this for both NYC and Boston.  I had hoped that
someone more in the know would give us a more detailed report of what
happened.  Is this going to happen?

-----------[000044][next][prev][last][first]----------------------------------------------------
Date:      Mon 15 Dec 86 05:42:30-EST
From:      Dennis G. Perry <PERRY@VAX.DARPA.MIL>
To:        hedrick@TOPAZ.RUTGERS.EDU
Cc:        tcp-ip@SRI-NIC.ARPA, perry@VAX.DARPA.MIL
Subject:   Re: Arpanet outage
I will follow up on your comment and report back.  Please see my previous
report of yesterday on the Arpanet relapse.  That doesn't seem to explain
what you are talking about.

dennis
-------
-----------[000045][next][prev][last][first]----------------------------------------------------
Date:      Mon, 15 Dec 86 13:47:40 -0500
From:      Craig Partridge <craig@loki.bbn.com>
To:        walsh@harvard.harvard.edu
Cc:        tcp-ip%sri-nic.arpa@SH.CS.NET
Subject:   Re: TCP RTT woes revisited

> Have you thought of using a separate variable to measure the RTT of each
> packet so that you can update your smoothed RTT using the EACKs?

    That's precisely what I'm doing.  Then the out-of-order rule is used
to discard RTTs that seem likely to cause SRTT explosion.

> When I last did RDP work, RDP and TCP were roughly the same speed.  Maybe
> RDP was a bit quicker even in the LAN environment.  The reason RDP did
> not dominate TCP was that the machines I was using were VAXes and the
> RDP checksumming algorithm did not run as fast as it would on a machine
> with a different byte ordering (like the 68K based workstations).

    Certainly the RDP checksum on the VAX is a real problem.  On the
SUN the checksum I use is 40% faster than the TCP checksum;  on the
VAX the checksum is about 3 times *slower* than the TCP checksum. (You
probably wrote a better one, I haven't compared them).  And over a
perfect network, the checksum performance seems to dictate speed.
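
For readers who have not looked at the per-byte work involved, here
is a sketch of the TCP side of that comparison: the 16-bit one's
complement Internet checksum.  It is written for clarity rather than
speed, the function name is an assumption, and the RDP checksum is a
different algorithm that is not shown here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

sample = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(sample)
print(hex(checksum))

# Appending the checksum makes the whole block check out to zero.
print(internet_checksum(sample + checksum.to_bytes(2, "big")))
```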

    But once there is any packet loss on the network the data handling
costs seem to become rather insignificant, and the big issue (I believe)
is retransmission mechanisms.   Unfortunately, once the network drops
packets, there seems to be a very wide variation in throughput from
test to test and it gets hard to say anything definitive.  There's also
the problem of, when you get a definitive answer, is it a real difference,
or merely demonstrating an odd quirk of the particular RDP or TCP
implementation?   (I.e. am I asking the right question?) One quickly
develops a healthy respect for TCP.

Craig
-----------[000046][next][prev][last][first]----------------------------------------------------
Date:      Mon, 15-Dec-86 10:46:48 EST
From:      malis@CCS.BBN.COM (Andrew Malis)
To:        mod.protocols.tcp-ip
Subject:   Re: Arpanet outage

Charles,

What happened is as follows:

At 1:11 AM EST on Friday, AT&T suffered a fiber optics cable
break between Newark NJ and White Plains NY.  They happen to have
routed seven ARPANET trunks through that one fiber optics cable.
When the cable was cut, all seven trunks were lost, and the PSNs
in the northeast were cut off from the rest of the network.
Service was restored by AT&T at 12:12.

The MILNET also suffered some trunk outages, but has more
redundancy, so it was not partitioned.
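
The difference between logical and physical redundancy can be shown
with a toy model.  Everything in it is hypothetical: the four sites,
the trunk list and the cable labels are invented for illustration and
do not describe the real ARPANET or MILNET topology.

```python
from collections import defaultdict

def is_connected(nodes, links):
    """True if every node is reachable from the first one over links."""
    adjacent = defaultdict(set)
    for a, b in links:
        adjacent[a].add(b)
        adjacent[b].add(a)
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbour in adjacent[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(nodes)

def survives_cable_cut(nodes, trunks, cable):
    """trunks are (site, site, cable) triples; drop all trunks on one cable."""
    remaining = [(a, b) for a, b, c in trunks if c != cable]
    return is_connected(nodes, remaining)

SITES = ["w1", "w2", "ne1", "ne2"]            # hypothetical sites
SHARED = [("w1", "w2", "c1"), ("w2", "ne1", "fiber"),
          ("w1", "ne2", "fiber"), ("ne1", "ne2", "c2")]
DIVERSE = [("w1", "w2", "c1"), ("w2", "ne1", "fiber"),
           ("w1", "ne2", "c3"), ("ne1", "ne2", "c2")]

print(survives_cable_cut(SITES, SHARED, "fiber"))
print(survives_cable_cut(SITES, DIVERSE, "fiber"))
```

Both toy networks have two trunks into the northeast, but only the
one whose trunks ride separate cables stays in one piece after the
cut.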

Regards,
Andy Malis

-----------[000048][next][prev][last][first]----------------------------------------------------
Date:      Mon, 15-Dec-86 13:04:49 EST
From:      walsh@HARVARD.HARVARD.EDU (Bob Walsh)
To:        mod.protocols.tcp-ip
Subject:   Re:  TCP RTT woes revisited

Craig,

Have you thought of using a separate variable to measure the RTT of each
packet so that you can update your smoothed RTT using the EACKs?

When I last did RDP work, RDP and TCP were roughly the same speed.  Maybe
RDP was a bit quicker even in the LAN environment.  The reason RDP did
not dominate TCP was that the mach