----MESSAGE-BEGIN---- <1983110107050000> Return-Path: <@SU-DSN:greep@SU-DSN> Received: from SU-DSN by SRI-NIC with TCP; Tue 1 Nov 83 15:04:10-PST Date: Tuesday, 1 Nov 1983 15:05-PST To: Charles Hedrick Cc: TCP-IP@SRI-NIC, Info-VAX@SRI-CSL Subject: Re: more Overtime for the Protocol Police In-reply-to: Your message of 1 Nov 83 16:36:34 EST. From: greep@SU-DSN I don't see why a format error in the date field should break anyone's reply command, since the reply command only has to copy the contents of the field, not try to understand it (except maybe for removing leading blanks). ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110108140000> Return-Path: <@SU-DSN:greep@SU-DSN> Received: from SU-DSN by SRI-NIC with TCP; Tue 1 Nov 83 16:13:11-PST Date: Tuesday, 1 Nov 1983 16:14-PST To: Charles Hedrick Cc: tcp-ip@SRI-NIC, info-vax@SRI-CSL Subject: Re: more Overtime for the Protocol Police In-reply-to: Your message of 1 Nov 83 18:44:50 EST. From: greep@SU-DSN Oh. Yes I notice your "in-reply-to" has the time in EST whereas my original was in PST. Actually I think that's more a bug than a feature, since it makes it that much harder for someone to correlate the times (unless his system also does that transformation), but I don't think it's important enough to spend much time discussing. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110110130000> Return-Path: Received: from JPL-VAX.ARPA by SRI-NIC with TCP; Tue 1 Nov 83 18:13:24-PST Date: 1 Nov 1983 1813 PST From: Eric P. Scott Subject: "Same network" isn't necessary good enough for "jumbograms" To: TCP-IP@SRI-NIC Reply-To: EPS@JPL-VAX Some of us piggyback logical hosts on our 1822 ports... 
-=EPS=- ------ ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110111363400> Return-Path: Received: from RUTGERS.ARPA by SRI-NIC with TCP; Tue 1 Nov 83 13:39:58-PST Date: 1 Nov 83 16:36:34 EST From: Charles Hedrick Subject: Re: more Overtime for the Protocol Police To: EPS@JPL-VAX.ARPA cc: TCP-IP@SRI-NIC.ARPA, don.provan@CMU-CS-A.ARPA, Info-VAX@SRI-CSL.ARPA In-Reply-To: Message from "Eric P. Scott " of 31 Oct 83 12:43:00 EST Of course if we really want to get technical, almost everybody sends invalid date and times. The new RFC's do not use a hyphen between the time and the zone. Most sites use the hyphens in both DATE fields and the MAIL-RECEIVED timestamp. This could break REPLY commands, if software wanted to say re: Hedrick's message of 1-Jan-84 0:00 UT Of course we accept both the old AT and the old hyphen, but as long as we are cleaning things up, we might as well clean up everything. ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110113445000> Return-Path: Received: from RUTGERS.ARPA by SRI-NIC with TCP; Tue 1 Nov 83 15:47:56-PST Date: 1 Nov 83 18:44:50 EST From: Charles Hedrick Subject: Re: more Overtime for the Protocol Police To: greep@SU-DSN.ARPA cc: tcp-ip@SRI-NIC.ARPA, info-vax@SRI-CSL.ARPA In-Reply-To: Message from "greep@SU-DSN" of 1 Nov 83 18:05:00 EST I believe MM actually turns the date into internal date/time format and then puts it back out again. Note the in-reply-to on this message. The date is used for various purposes, such as asking to see all message since a certain date/time. For some of these purposes, we do actually have to understand the date. So it is reasonable to have the routine that parses headers turn the date/time into internal format. 
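The parse-then-reformat behaviour described above can be illustrated with a minimal Python sketch. It assumes fixed standard-time offsets for the PST and EST abbreviations and a single "D Mon YY HH:MM:SS ZZZ" layout; `parse_date` and `format_date` are hypothetical names for illustration, not MM routines.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed standard-time offsets for the two zones named in this thread.
ZONES = {
    "PST": timezone(timedelta(hours=-8), "PST"),
    "EST": timezone(timedelta(hours=-5), "EST"),
}

def parse_date(text):
    """Parse 'D Mon YY HH:MM:SS ZZZ' into a timezone-aware datetime."""
    stamp, zone = text.rsplit(" ", 1)
    naive = datetime.strptime(stamp, "%d %b %y %H:%M:%S")
    return naive.replace(tzinfo=ZONES[zone])

def format_date(when, zone):
    """Write an aware datetime back out in the named zone, same layout."""
    local = when.astimezone(ZONES[zone])
    return f"{local.day} {local.strftime('%b %y %H:%M:%S')} {zone}"

# A date sent in PST comes back out in the replier's zone.
sent = parse_date("1 Nov 83 15:05:00 PST")
print(format_date(sent, "EST"))

# Once parsed, "all messages since a certain date/time" is a plain comparison.
print(parse_date("1 Nov 83 16:36:34 EST") < sent)
```

Holding the date in an internal form makes the zone shift in the In-Reply-To field a side effect of formatting, which is the trade-off being debated in this exchange.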
------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110213090000> Return-Path: Received: from USC-ISIB.ARPA by SRI-NIC with TCP; Wed 2 Nov 83 21:11:04-PST Date: 2 Nov 1983 2109-PST Subject: Re: "Same network" isn't necessary good enough for "jumbograms" From: Craig Milo Rogers To: EPS@JPL-VAX, TCP-IP@SRI-NIC In-Reply-To: Your message of 1 Nov 1983 1813 PST Logical hosts per se aren't pertinent. However, diversity of IP implementations is pertinent. If your host is connected only to a local network, and all the hosts on that net (except the gateway) are running the same IP software, then jumbograms are probably safe. (I'm assuming that a host is prepared to receive the largest jumbogram that it is prepared to send.) However, once you mix IP implementations on a net, the odds of some host not supporting jumbograms (or supporting sub-maximal jumbograms for the net) increase. On a net as diverse as the ARPANET, jumbograms aren't safe. So, what we really need are specific guidelines/standards for implementing IP/TCP on top of various networks. It would be nice if we could say, "As of Aug 1984 all ARPANET hosts will support IP packets up to the size limit imposed by the IMP subnet." At the moment, we can't. Craig Milo Rogers ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110221540000> Return-Path: Received: from USC-ISID.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 05:56:41-PST Date: 3 Nov 1983 05:54-PST Sender: HAGAN@USC-ISID Subject: TCP-IP Mailing Lists From: HAGAN@USC-ISID To: TCP-IP@SRI-NIC Cc: Foster@USC-ISID, Gorman@USC-ISID, Hagan@USC-ISID Message-ID: <[USC-ISID] 3-Nov-83 05:54:04.HAGAN> Bryan Gorman has joined SRI, as one of the new software types on-site at the Fort Bragg Testbed. Could you please arrange to put him on all of the pertinent mailing lists relative to TCP-IP development and the notes to or about the Protocol Police.
@ISID Regards, Doug ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110301230000> Return-Path: Received: from USC-ISIB.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 09:25:13-PST Date: 3 Nov 1983 0923-PST Subject: Re: "Same network" isn't necessary good enough for "jumbograms" From: Craig Milo Rogers To: David C. Plummer , ROGERS@USC-ISIB cc: TCP-IP@SRI-NIC, EPS@JPL-VAX In-Reply-To: Your message of 3 November 1983 10:49 EST Reply #1: Modularity IP deals with per-packet processing: routing, fragmenting, reassembly, etc. It has a notion of time (Time-to-live, reassembly time, ICMP "pinging" in some cases), but basically assumes that the packets it processes are unrelated to each other. TCP deals with sequences of packets in time. It retransmits data when unacknowledged, manages windows and window updates, calculates smoothed round trip times, and in some implementations probes the remote site to be assured that it is up. Among the most important properties of a local network (from a practical standpoint) are its delay, capacity, and throughput. The Internet is composed of networks with a wide range of parameters: 1200 bps land lines to 50 Mbit rings to geosynch satellite links with ~1/4 sec delays. A TCP implementation (usually) deals with this variation by selecting an initial set of retransmission parameters for a connection, and adjusting the parameters based on end-to-end round-trip-time. However, because of the wide dynamic range of the network properties mentioned above, the selection of the initial TCP parameters can be critical. The appropriate initial parameters for an Ethernet are different from those for a 1200 bps dial-up line. The effect of selecting the wrong initial parameters may be relatively benign, such as having unnecessary pauses during the start of a connection. On the other hand, the effect may be disastrous, such as flooding a slow-speed line with unnecessary retransmissions.
So, it is very important that the TCP module get the proper set of initial parameters. In the general case, a path composed of arbitrary networks, it may not be possible to derive the "optimal" set, and a "conservative" set should be used. However, when it is possible to deduce (by comparing source and destination addresses) that the probable path is a single network with well-known properties, then the "optimal" TCP parameters based on those properties may be selected. IP doesn't help here. There is no IP "estimated delay" or "estimated capacity" option. The IP specification doesn't mention maintaining a table of local network properties for use by higher-level protocols as one of the duties of an IP implementation. Perhaps it should. Certainly it can, in individual implementations. In fact, a good implementation might provide a common database of estimated path properties which may be accessed by TCP, TFTP, and any other potentially interested party. So, that's why I uttered "TCP" and "networks" in the same breath. Modularity is in the eye of the beholder. Reply #2: On Mountains and Valleys Imagine a collection of people who live in rough terrain. Widely scattered, they are lonely for each other's company (or perhaps they just want to trade). Some people tried climbing the mountains to get to their neighbors, while others used the longer but smoother routes through the valleys below. However, the knowledge of how to get from one point to another wasn't widely shared, and not all routes were safe. Many people were killed by avalanches, while others wandered into swamps in the valleys. Eventually a group of people got together to set up a system of roads. It seemed to make sense to put the roads in the valleys, because some people just didn't have the extra energy it takes to climb mountains, and it's far easier to detour around all the swamps than it is to build avalanche shelters in all the passes (to assure year-round passage).
So, the roads were built in the valleys, and signs were posted at the intersections, so you could go from any place to any other place based on its address. However, it takes a long time to wander about in those twisty valleys. Perhaps you have an urgent need to visit your neighbor, and there's only one mountain in between, and you are full of energy and have a good sense of direction. The mountains are pretty safe in the summertime, so why not take the shortcut? Let's not make it illegal to hike in the mountains. Instead, let's publish guidelines that people can use to estimate whether they are ready for the trip. Craig Milo Rogers ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110303222400> Return-Path: Received: from USC-ISIF.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 11:24:33-PST Date: 3 Nov 1983 11:22:24 PST From: POSTEL@USC-ISIF Subject: Specs for "How to do IP on Net of Type X" To: TCP-IP@SRI-NIC Hi: I would very much appreciate receiving draft RFCs on "How to do/implement/use IP on networks of type X", where X is Ethernet, ARPANET, or anything. There is only one such memo (you could use it as a model), RFC 877 on doing IP on Public Data Nets. --jon. ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110304450000> Return-Path: Received: from MIT-MULTICS.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 06:48:01-PST Date: Thursday, 3 November 1983 09:45 est From: DClark@MIT-MULTICS.ARPA Subject: Re: TCP/IP & PR1ME To: JFisher.Help@USGS1-MULTICS.ARPA cc: TCP-IP@SRI-NIC.ARPA In-Reply-To: Message of 31 October 1983 17:26 est from "JFisher.Help at USGS1-MULTICS" Message-ID: <831103144527.793253@MIT-MULTICS.ARPA> Folks, For info on protocols for Prime, you might call Dave Jacobs at Prime. His number is 617-879-2960 x4113.
Dave ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110305131800> Return-Path: Received: from ucla-locus by SRI-NIC with TCP; Thu 3 Nov 83 13:20:10-PST Date: Thu, 3 Nov 83 13:13:18 PST From: Rich Wales To: TCP-IP@SRI-NIC Subject: Query about "logical" ARPANET hosts I would like more information about the third ("logical host") byte in ARPANET addresses. Specifically, a few people seem to be using this byte either to give multiple identities to a single host (as with Rutgers' RU-GREEN and RU-BLUE pseudo-hosts) or to allow additional hosts to appear to be on the ARPANET via a transparent gateway (as with the hosts sharing a port with SRI-C3P0 and -- apparently -- ISI-PNG11). On the surface, interfacing a class-C local network to the Internet by using the third byte of the ARPANET address as a local-network-host specifier -- together with some special "smarts" in the host that is actually, physically, attached to the IMP -- seems to me to be a much more reliable approach than publicizing your local net's set of addresses, advertising a gateway host, and hoping that people will eventually update their routing tables accordingly. However, I have been unable to find any RFC which says anything about legitimate uses of the "logical host" byte. Can I legally do anything with this byte? If so, whose permission do I need? Also, is the third byte going to be more-or-less permanently available for "logical host" purposes? In particular, when either the ARPANET or the MILNET grows to more than 256 IMP's, is the third byte going to be used as an extension of the IMP number? (I would hope not -- especially since a couple of bits could seemingly be taken from the high-order end of the second or "physical port" byte.) -- Rich Wales ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110305240000> Return-Path: Received: from SRI-KL.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 13:23:47-PST Date: 3 Nov 1983 13:24-PST Sender: BILLW@SRI-KL Subject: I dont understand...
From: William "Chops" Westfield To: tcp-ip@NIC Message-ID: <[SRI-KL] 3-Nov-83 13:24:34.BILLW> Isnt IP supposed to take care of fragmenting packets so that TCP doesnt have to worry about such things ? BillW ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110305430000> Return-Path: <@MIT-MC:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-MC by SRI-NIC with TCP; Thu 3 Nov 83 07:42:55-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Thu 3-Nov-83 10:41:55-EST Date: Thu, 3 Nov 83 10:43 EST From: Charles Hornig Subject: jumbograms To: tcp-ip@SRI-NIC.ARPA, eps@JPL-VAX.ARPA Message-ID: <831103104313.2.Hornig@QUABBIN.SCRC.Symbolics> Date: 2 Nov 1983 2109-PST From: Craig Milo Rogers Logical hosts per se aren't pertinent. However, diversity of IP implementations is pertinent. If your host is connected only to a local network, and all the hosts on that net (except the gateway) are running the same IP software, then jumbograms are probably safe. (I'm assuming that a host is prepared to receive the largest jumbogram that it is prepared to send.) However, once you mix IP implementations on a net, the odds of some host not supporting jumbograms (or supporting sub-maximal jumbograms for the net) increase. On a net as diverse as the ARPANET, jumbograms aren't safe. So, what we really need are specific guidelines/standards for implementing IP/TCP on top of various networks. It would be nice if we could say, "As of Aug 1984 all ARPANET hosts will support IP packets up to the size limit imposed by the IMP subnet." At the moment, we can't. Lets try, though. In particular, I would like it to be adopted as part of the IP standard on embedding IP in specific transport media what the maximum IP datagram size on that medium is. If we expect different implementations to work together, we NEED this. I suggest, IMP subnet 1008 bytes 10MB Ethernet 1500 bytes Suggestions for other media are welcome. This will at least give us something to work towards. 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110305490000> Return-Path: Received: from MIT-MC by SRI-NIC with TCP; Thu 3 Nov 83 07:48:18-PST Date: 3 November 1983 10:49 EST From: David C. Plummer Subject: Re: "Same network" isn't necessary good enough for "jumbograms" To: ROGERS @ USC-ISIB cc: TCP-IP @ SRI-NIC, EPS @ JPL-VAX Date: 2 Nov 1983 2109-PST From: Craig Milo Rogers So, what we really need are specific guidelines/standards for implementing IP/TCP on top of various networks. It would be nice if we could say, "As of Aug 1984 all ARPANET hosts will support IP packets up to the size limit imposed by the IMP subnet." At the moment, we can't. I'm sorry, but I have to disagree. You are suggesting the ARPANET revert to a special network instead of part of a (little i) internet, namely, the (big I) Internet. The whole purpose of Internet was for all hosts to see the world as a uniform collection of hosts. There is some validity to what you said, but I don't feel your reasons were quite right. By modularity, TCP should know nothing about the local network or networks that are carrying the packets. There are various interactions with IP that are intended to avoid fragmentation. This breaks modularity but exists for practicality. Therefore, you should not have said TCP in the paragraph I included. There IS a guideline for packet sizes of IP packets on a given subnet. "Management" determines the maximum packet size, which could be affected by hardware limitations (e.g. 512 byte ram buffers), specification restrictions (e.g. 1500 byte on Ethernets) or practicality. For the ARPANET "management" currently says 576 bytes. For an Ethernet per/site "management" could say 1500 (and make sure its gateways do fragmentation to the outside world) or 576 bytes to avoid the fragmentation problem as much as possible. 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110305574100> Return-Path: Received: from bbnccs by SRI-NIC with TCP; Thu 3 Nov 83 08:05:13-PST Date: 3 Nov 1983 10:57:41 EST (Thursday) From: Andrew Malis Subject: Re: jumbograms In-Reply-to: Your message of Thu, 3 Nov 83 10:43 EST To: Charles Hornig Cc: tcp-ip@SRI-NIC.ARPA, eps@JPL-VAX.ARPA, malis@BBN-UNIX Charles, The limit for the IMP subnet is 1007 bytes, not 1008. The 1822 specification limits the maximum message size, not including the 1822 leader, to 8063 bits. Since IP datagrams must be an integral number of bytes in length, this restricts the maximum datagram length to 1007 bytes. Andy ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110307450000> Return-Path: Received: from MIT-MULTICS.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 09:53:14-PST Posted-Date: 3 November 1983 12:47 est Date: Thursday, 3 November 1983 12:45 est From: JSLove@MIT-MULTICS.ARPA (J. Spencer Love) Subject: Re: "Same network" isn't necessary good enough for "jumbograms" To: David C. Plummer cc: ROGERS@USC-ISIB.ARPA, TCP-IP@SRI-NIC.ARPA, EPS@JPL-VAX.ARPA In-Reply-To: Message of 3 November 1983 10:49 est from "David C. Plummer" Message-ID: <831103174528.774353@MIT-MULTICS.ARPA> Date: 3 November 1983 10:49 est From: David C. Plummer Subject: Re: "Same network" isn't necessary good enough for "jumbograms" Date: 2 Nov 1983 2109-PST From: Craig Milo Rogers So, what we really need are specific guidelines/standards for implementing IP/TCP on top of various networks. There IS a guideline for packet sizes of IP packets on a given subnet. "Management" determines the maximum packet size, which could be affected by hardware limitations (e.g. 512 byte ram buffers), specification restrictions (e.g. 1500 byte on Ethernets) or practicality. For the ARPANET "management" currently says 576 bytes. 
For an Ethernet per/site "management" could say 1500 (and make sure its gateways do fragmentation to the outside world) or 576 bytes to avoid the fragmentation problem as much as possible. "Management" currently says that the default packet size is 576 bytes, for the whole Internet, in the absence of any other information. That doesn't mean that you can't send bigger datagrams. The hard limit for the arpanet is about 1005 bytes. The hard limit for some other networks is less than 576. There is no problem with ethernets which can process 1500 byte packets. Let each host send the TCP packet size option when setting up the connection, specifying 1460 (data) byte packets. If either side does not send such an option, then let it be assumed that they specified 536. Each host should then format its packets taking the min of two numbers: the packet size it offers, and the packet size the other side offers. The packet size it offers is presumably based on the hardware limit as well as any limits having to due with buffer allocation (which are better handled using window size). Thus, if MIT-MULTICS sends a TCP SYN to CMU-CS-A packet offering a max packet size of 1005, and CMU-CS-A sends back a packet from behind its gateway specifying a max packet size of 300, then 300 bytes is the max packet size in both directions. This solves the problem for all same-net connections between hosts that implement the TCP packet size option, and makes a start on the other cases. If two networks permitting jumbograms are connected by a third network which doesn't, then this simplistic approach will fail. In this case, the IP layer must be consulted to find out if the max packet size is 1005, 576, or some other number. If the IP layer doesn't know, then it should tell the TCP layer that 576 is the limit (in this sense, the preceding example was contrived, since IP probably wouldn't know that CMU-CS-A was only two hops away). 
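The selection rule proposed above can be sketched in a few lines of Python. This is an illustration of the scheme as described in this message, not code from any implementation; `effective_mss` and `DEFAULT_MSS` are hypothetical names, and the default of 536 is the value the message assumes for a side that sends no option (576 less 40 octets of IP and TCP header).

```python
DEFAULT_MSS = 536  # assumed when a side sends no max packet size option

def effective_mss(local_offer=None, remote_offer=None):
    """Return the data size both directions use under the proposed rule.

    Each argument is the size a side offered in its SYN, or None if that
    side sent no option, in which case 536 is assumed for it.
    """
    if local_offer is None:
        local_offer = DEFAULT_MSS
    if remote_offer is None:
        remote_offer = DEFAULT_MSS
    return min(local_offer, remote_offer)

# The example from the message: offers of 1005 and 300.
print(effective_mss(1005, 300))

# Two Ethernet hosts that both offer 1460 data bytes.
print(effective_mss(1460, 1460))

# A peer that sends no option is treated as having offered 536.
print(effective_mss(1460, None))
```

The rule uses only what the two endpoints offer, so it cannot see a smaller limit on an intermediate network; that is the case where the message says the IP layer must be consulted.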
As TCP/IP is currently defined, networks that can't accept 576 byte datagrams are in violation of the standard (albeit in a minor way) and thus should implement the TCP max packet size option to keep the rest of the world from exercising their (possibly nonexistent) fragment reassembly algorithms. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110310490900> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Thu 3 Nov 83 13:04:48-PST Date: 3 Nov 1983 15:49:09 EST (Thursday) From: Jack Haverty Subject: Re: "Same network" isn't necessary good enough for "jumbograms" In-Reply-to: Your message of Thursday, 3 November 1983 12:45 est To: JSLove@MIT-MULTICS.ARPA (J. Spencer Love) Cc: David C. Plummer , ROGERS@USC-ISIB.ARPA, TCP-IP@SRI-NIC.ARPA, EPS@JPL-VAX.ARPA I think: 1/ a gateway or host which doesn't implement fragmentation or reassembly respectively is simply not conforming to the specification. Those functions are not optional. 2/ Given #1, all a host need do is: a/ not try to send anything to its attached network bigger than that network's max packet size; b/ not try to send anything bigger than 576 (or whatever that number is exactly), without confirming that the receiver will accept bigger things based on a TCP negotiation c/ not try to send anything bigger than some number < 576 which has been obtained via the TCP option from the other end. 3/ There is no need to couple the send and receive parameters. 4/ Gateways must be able to accept packets from their attached networks which are the maximum packet size for that network technology (although for some 'technologies' like raw wires that doesn't work). Nets with smaller max sizes than 576 are perfectly legal. Jack ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110313380000> Return-Path: Received: from MIT-MC by SRI-NIC with TCP; Thu 3 Nov 83 15:38:04-PST Date: 3 November 1983 18:38 EST From: David C. Plummer Subject: What is modular, anyway?
To: JSLove @ MIT-MULTICS, BILLW @ SRI-KL, Haverty @ BBN-UNIX, ROGERS @ USC-ISIB cc: tcp-ip @ SRI-NIC Sorry, the primary recipients got this twice. ------------------------------ Date: 3 Nov 1983 13:24-PST From: William "Chops" Westfield Isnt IP supposed to take care of fragmenting packets so that TCP doesnt have to worry about such things ? Yes. See more below. ------------------------------ Date: 3 Nov 1983 0923-PST From: Craig Milo Rogers Reply #1: Modularity I agree with nearly everything you said here. Your previous messages could have easily led to gross modularity violations if the reader was not careful. (I'm paranoid that there are several non-careful readers.) What you said, I think, is that for a protocol implementation (e.g. TCP) to be practical over a transport layer (e.g. IP), communication may be desirable (maybe even necessary) between the two layers. Communication is FAR different than assumptions. Assumptions are not modular, communication is. We may actually be in 99% agreement on what "modularity" is when applied to the IP/TCP world. Reply #2: On Mountains and Valleys Amusing. I think I know what you were saying, but it doesn't propose any solutions. ------------------------------ Date: 3 Nov 1983 15:49:09 EST (Thursday) From: Jack Haverty 1/ a gateway or host which doesn't implement fragmentation or reassembly respectively is simply not conforming to the specification. Those functions are not optional. Correct!! 2/ Given #1, all a host need do is: a/ not try to send anything to its attached network bigger than that network's max packet size; No! b/ not try to send anything bigger than 576 (or whatever that number is exactly), without confirming that the receiver will accept bigger things based on a TCP negotiation No!! c/ not try to send anything bigger than some number < 576 which has been obtained via the TCP option from the other end. No!!! You are confusing TCP with IP which is what Craig and I are having a little discussion about.
Your point 1 deals with IP, which has nothing to do with the issues of point 2, which deals with TCP. It is perfectly valid for my implementation of TCP (on both sides) to send a max segment size of 5000 bytes. It is further the responsibility of my IP implementation to notice if a packet this large cannot fit over the local network. If it cannot fit, it must perform fragmentation. The foreign host assembles the IP fragments, sends the packet up to TCP which then handles the 5000 byte TCP segment. This is completely valid, probably inefficient, probably impractical, and REQUIRES NO KNOWLEDGE of local network packet sizes or medium types (it could be 1200 baud). It does not require the information, but communication with the transport layer (IP) would greatly improve efficiency. 4/ Gateways must be able to accept packets from their attached networks which are the maximum packet size for that network technology (although for some 'technologies' like raw wires that doesn't work). Nets with snmaller max sizes than 576 are perfectly legal. Again, NO! Maximum packet size on a particular (sub)network is determined by management. This actually has nothing to do with transport layers (e.g. IP)! The implementation of interfaces should be sufficiently flexible to be able to tell each transport layer (e.g. IP) that asks it what the maximum segment size is (a "site variable" determined by management) for the particular (sub)net to which the interface is attached. IP would ask the interfaces this in order to determine when it must fragment over the various interfaces. It must also be prepared to accept each of these sizes over each of the interfaces. Therefore, it can take the MIN for packets being transmitted, but it must be prepared to receive the differing size maximums. Note a few things. Ethernets at different sites could have different IP max packet sizes. Different Ethernets at the same site could have different IP max packet sizes!! 
This could happen if the site manager puts small address space machines which have little buffering capacity (e.g. PDP-11s) on one Ethernet and large address space machines (just about everything else) on the other Ethernet. It is important that the IP max packet size for a network be a property of the INTERFACE, not the medium type. This is nearly imperative for different implementation to be able to use large packets on a medium that can handle them. Management sets the numbers. When a protocol is installed on a machine, management tells the installer what the local site parameters are. The installer sets the appropriate variables and constants in the interfaces, a process which is dependent upon the implementation of the interfaces and operating system. ------------------------------ Date: Thursday, 3 November 1983 12:45 est From: JSLove@MIT-MULTICS.ARPA (J. Spencer Love) There is no problem with ethernets which can process 1500 byte packets. Let each host send the TCP packet size option when setting up the connection, specifying 1460 (data) byte packets. If either side does not send such an option, then let it be assumed that they specified 536. Each host should then format its packets taking the min of two numbers: the packet size it offers, and the packet size the other side offers. The packet size it offers is presumably based on the hardware limit as well as any limits having to due with buffer allocation (which are better handled using window size). Right idea, wrong modularity. I think it should go something like this: (Note that IP should really be "the transport layer" and TCP should be "the protocol layer", but I leave it as IP and TCP for the purpose of example) IP asks the interfaces what the max IP packet size is for the interfaces. IP slowly gathers routing information. TCP tells IP that it wants to talk to host FOO IP /guesses/ the route to host FOO and caches this. 
TCP asks IP how big a packet/segment it should deliver to IP when talking to FOO before efficiency (e.g. fragmentation) would degrade service. IP returns a value to TCP based on the guessed route. Unless the route is highly dynamic, the guess will usually be right and accidental fragmentation will not occur. If you really want to get hairy (and depending on the implementation) TCP always asks IP for the packet to use, and IP only lets TCP use as many bytes as the currently determined route should use. TCP mins this with the number of bytes it is most comfortable receiving, and uses this as the max segment size. If it wishes, it can further min this with the max segment size of the foreign implementation, since it may know something about the route that you don't. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110314190000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 16:19:03-PST Received: from merlin.ARPA by purdue.ARPA; Thu, 3 Nov 83 19:20:37 EST From: Christopher A Kent Message-Id: <8311040019.AA05371@merlin.ARPA> Received: by merlin.ARPA; Thu, 3 Nov 83 19:19:57 EST Date: 3 Nov 1983 1919-EST (Thursday) To: Rich Wales Cc: TCP-IP@SRI-NIC.ARPA Subject: Re: Query about "logical" ARPANET hosts In-Reply-To: Your message of Thu, 3 Nov 83 13:13:18 PST. <8311032302.AA04422> When we first got our TCP up, and it was clear that we needed some extended addressing for our proNET, I thought about this, too; before making such a (for me) momentous decision, I, too, went to others for their opinions. When I suggested it to Rob Gurwitz, I recall that he said something to the effect of "It's a kludge, and should go away. Don't even think it." The "vox Internet" was largely in the same vein. Because of this, I built in-host gateway code for the BBN implementation of TCP/IP for the VAX. 
It would seem that the uses of the logical host octet are largely undocumented; the only one that I am sure of is the SRI Port Expander black box, which provides some of the "extra smarts" you allude to. I have heard mutterings about these boxes, too -- to the effect of "they cause more troubles than they solve" (admittedly secondhand information). I personally feel that going the gateway route is much more in the spirit of the Internet concept; I also feel it is more flexible. We ran for about six months using my special code in one of our Vaxen as gateway to our class C net; then the use of that machine increased, as did the traffic on our net, so we dedicated an 11/34 to do gatewaying duties, using off-the-shelf code. We also realized that we were going to need more address space than a class C network provided, so we switched to a class B number. Had I been running logical host decoding, I would have been totally at sea; as it was, the effort was quite small. In short, I now agree with Rob's assessment. It might seem like a way to do it, but in the long run, doing it the "right" way pays off. Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110314375500> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Thu 3 Nov 83 16:52:42-PST Date: 3 Nov 1983 19:37:55 EST (Thursday) From: Mike Brescia Subject: Re: Query about "logical" ARPANET hosts In-Reply-to: Your message of Thu, 3 Nov 83 13:13:18 PST To: Rich Wales Cc: TCP-ip@nic, brescia@BBN-UNIX IEN-115, 'Address Mappings' dated August '79 outlines the alignment between local net addresses and the 'rest' part of the IP address (that part beyond the 'net' field). I don't have the 'Transition Notebook' with me, but I think that section is included. Nets which have smaller local net address fields than the 3 or 2 bytes in class A or B net addresses have an inherent logical address capability. 
There was in the past a statement by a usually reliable source that the ARPANET would not expand beyond 255 imps, so the 'third byte' is used by having the routine which maps the IP address (rest) into the arpanet address set the high order imp byte to zero. This code is in the core gateways, among (many) other implementations. Mike Brescia ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110316340000> Return-Path: Received: from MIT-MC by SRI-NIC with TCP; Thu 3 Nov 83 18:35:36-PST Date: 3 November 1983 21:34 EST From: David C. Plummer Subject: Re: Query about "logical" ARPANET hosts To: brescia @ BBN-UNIX cc: TCP-ip @ SRI-NIC, v.wales @ UCLA-LOCUS Date: 3 Nov 1983 19:37:55 EST (Thursday) From: Mike Brescia IEN-115, 'Address Mappings' dated August '79 outlines the alignment between local net addresses and the 'rest' part of the IP address (that part beyond the 'net' field). I don't have the 'Transition Notebook' with me, but I think that section is included. August '79 was before my dealings with networks (any type), but I'll bet there wasn't even such a thing as class B or C addresses. Therefore, I wouldn't trust ANYTHING from that era which mentions host address formats. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110317370200> Return-Path: Received: from RUTGERS.ARPA by SRI-NIC with TCP; Thu 3 Nov 83 19:40:10-PST Date: 3 Nov 83 22:37:02 EST From: Charles Hedrick Subject: Re: Query about "logical" ARPANET hosts To: v.wales@UCLA-LOCUS.ARPA cc: TCP-IP@SRI-NIC.ARPA In-Reply-To: Message from "Rich Wales " of 3 Nov 83 16:13:18 EST Please note that Rutgers does not use the logical host number for gatewaying. We use it strictly to simplify the relaying of incoming mail. I point this out because we have not asked DCA to authorize us to have a gateway, and I would not want anyone to think we are sneaking around the requirement for approval. However the use suggested by Wales had occurred to me, and it sounded like a good idea.
------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110406400000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Fri 4 Nov 83 09:08:32-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Fri 4-Nov-83 11:37:02-EST Date: Fri, 4 Nov 83 11:40 EST From: Charles Hornig Subject: IP on Ethernet draft RFC To: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA Message-ID: <831104114041.6.Hornig@QUABBIN.SCRC.Symbolics> (Here is a draft RFC for IP in Ethernet. Comments?) Charles Hornig Symbolics Cambridge Research Center November 1983 A Standard for the Transmission of IP Datagrams Over Ethernet Networks This RFC specifies a standard method of encapsulating Internet Datagram Protocol (IP)[1] datagrams on the Ethernet[2]. Frame Format IP packets are transmitted in standard Ethernet frames. The type field of the Ethernet frame must contain the value X'0800'. The data field contains the IP header followed immediately by the IP data. If necessary, the data field may be padded to meet the Ethernet minimum frame size. This padding is not part of the IP packet and is not included in the total length stored in the IP header. The maximum length of an IP packet sent over an Ethernet is 1500 octets. Implementations are encouraged to support full-length packets. Gateways MUST be prepared to accept full-length packets and fragment them if necessary. If a system cannot receive full-length packets, it should take steps to discourage others from sending them, such as using the TCP Maximum Segment Size option. Note: Packets on the Ethernet may be longer than the general Internet default maximum packet size of 576 octets. Hosts connected to an Ethernet should keep this in mind when sending packets to hosts not on the same Ethernet. It may be appropriate to send smaller packets to avoid unnecessary fragmentation at intermediate gateways. 
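The frame format described above can be sketched in a few lines of Python. This is only an illustration of the rules in the draft, not part of it: the function names are hypothetical, the 46-octet minimum data field is an assumption taken from the Ethernet minimum frame size (64 octets less the 14-octet header and 4-octet CRC), and the CRC itself is left to the hardware.

```python
import struct

ETHERTYPE_IP = 0x0800   # type field value given in the draft
MAX_IP_PACKET = 1500    # maximum IP packet length on an Ethernet, per the draft
MIN_DATA_FIELD = 46     # assumption: 64-octet minimum frame less header and CRC


def encapsulate(dst: bytes, src: bytes, ip_packet: bytes) -> bytes:
    """Build an Ethernet frame (without CRC) carrying one IP packet."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("Ethernet addresses are 48 bits (6 octets)")
    if len(ip_packet) > MAX_IP_PACKET:
        raise ValueError("IP packet longer than 1500 octets; fragment it first")
    header = dst + src + struct.pack("!H", ETHERTYPE_IP)
    # Padding is not part of the IP packet and is not counted in its total length.
    padding = b"\x00" * max(0, MIN_DATA_FIELD - len(ip_packet))
    return header + ip_packet + padding


def decapsulate(frame: bytes) -> bytes:
    """Return the IP packet from a frame, dropping any Ethernet padding."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IP:
        raise ValueError("not an IP frame")
    data = frame[14:]
    if len(data) < 20:
        raise ValueError("data field shorter than a minimal IP header")
    # The IP total length field (octets 2-3 of the header) says where the packet ends.
    (total_length,) = struct.unpack("!H", data[2:4])
    if total_length > len(data):
        raise ValueError("IP total length exceeds the data field")
    return data[:total_length]
```

The receiver relies on the total length stored in the IP header to discard padding, which is why the padding must not be counted there.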
Address Mappings Mappings between 32-bit Internet addresses and 48-bit Ethernet addresses are accomplished through the Address Resolution Protocol[3]. Internet addresses are assigned arbitrarily on some class B or C network. Each host's implementation must know its own Internet address and respond to Ethernet Address Resolution packets appropriately. It should also use the protocol to translate Internet addresses to Ethernet addresses when needed. Trailer Formats Some versions of Unix 4.2bsd use a different encapsulation method in order to get better network performance with the VAX virtual memory architecture. Consenting systems on the same Ethernet may use this format between themselves. No host is required to implement it, and no datagrams in this format should be sent to any host unless the sender has positive knowledge that the recipient will be able to interpret them. References [1] Information Sciences Institute, Internet Protocol. ARPA Network Information Center RFC 791. September 1981. [2] Digital Equipment Corporation, Intel Corporation, Xerox Corporation. The Ethernet. Version 1.0. September 30, 1980. [3] Plummer, David C., An Ethernet Address Resolution Protocol. ARPA Network Information Center RFC 826. November 1982. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110407423000> Return-Path: Received: from CMU-CS-IUS by SRI-NIC with TCP; Fri 4 Nov 83 09:41:46-PST Date: Friday, 4 November 1983 12:42:30 EST From: Mike.Accetta@CMU-CS-IUS To: Charles Hornig cc: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA Subject: Re: IP on Ethernet draft RFC Message-ID: <1983.11.4.17.33.48.Mike.Accetta@CMU-CS-IUS> In-Reply-To: <831104114041.6.Hornig@QUABBIN.SCRC.Symbolics> Charles, Your draft agrees with the standards which have been adopted for use at CMU (i.e. the "straight-forward" encapsulation and use of ARP). I'm curious as to why Class A networks are explicitly excluded from the address mapping, though. 
- Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110408040000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Fri 4 Nov 83 10:00:55-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Fri 4-Nov-83 13:00:32-EST Date: Fri, 4 Nov 83 13:04 EST From: Charles Hornig Subject: Re: IP on Ethernet draft RFC To: Mike.Accetta@CMU-CS-IUS.ARPA Cc: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA In-reply-to: <1983.11.4.17.33.48.Mike.Accetta@CMU-CS-IUS> Message-ID: <831104130440.0.Hornig@QUABBIN.SCRC.Symbolics> Date: Friday, 4 November 1983 12:42:30 EST From: Mike.Accetta@CMU-CS-IUS Your draft agrees with the standards which have been adopted for use at CMU (i.e. the "straight-forward" encapsulation and use of ARP). I'm curious as to why Class A networks are explicitly excluded from the address mapping, though. Since there can only be 1024 interfaces on an Ethernet, I didn't see the need to ever use a class A network number for one. If someone has a legitimate application, I have no objection. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110408040001> Return-Path: <@MIT-ML:DCP%SCRC-TENEX@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Fri 4 Nov 83 10:10:54-PST Received: from SCRC-CHARLES by SCRC-TENEX with CHAOS; Fri 4-Nov-83 13:08:31-EST Date: Friday, 4 November 1983, 13:04-EST From: "David C. Plummer" Subject: IP on Ethernet draft RFC To: Hornig%SCRC-QUABBIN at MIT-MC.ARPA, TCP-IP at SRI-NIC, Postel at USC-ISIF, networks%SCRC-TENEX at MIT-MC.ARPA In-reply-to: <831104114041.6.Hornig@QUABBIN.SCRC.Symbolics> What this boils down to is a choice between the following two options: (1) Define one global maximum for a medium (1500 IP bytes for the Ethernet), or (2) Allow per cable maximums, set by the manager of the medium. An advantage for each: (1) Uniform IP implementations on the Ethernet. (2) Allows a manager to set a smaller limit for the benefit of small address space machines (e.g., IBM PC, PDP-11s). 
The IBM PC, however, will likely limit the IP packet size by limiting the TCP segment size. A disadvantage for each: (1) Requires all gateways to fragment (I know of at least one that doesn't). (2) Implementors must make setting the maximum an easy part of site installation. Code modularity may not be quite right to easily allow this. Personally, I think (2) is the *right thing*, but would probably agree that (1) is easier to implement and more practical. Note that similar RFCs are going to be needed for ARPANET, proNET, etc. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110411585100> Return-Path: Received: from CMU-CS-IUS by SRI-NIC with TCP; Fri 4 Nov 83 14:02:10-PST Date: Friday, 4 November 1983 16:58:51 EST From: Mike.Accetta@CMU-CS-IUS To: Charles Hornig cc: TCP-IP@SRI-NIC.ARPA Subject: Re: IP on Ethernet draft RFC Message-ID: <1983.11.4.21.43.54.Mike.Accetta@CMU-CS-IUS> In-Reply-To: <831104130440.0.Hornig@QUABBIN.SCRC.Symbolics> Charles, One possible reason for not arbitrarily disallowing class A networks is to permit logical Class A IP networks to be constructed from more than one cable (some of which may not even be ethernets). In our case for example, addresses on CMU's Class B IP network actually span around 7 different physical cables including two 3Mb and three 10Mb ethernets among others. - Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110510320000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Sat 5 Nov 83 12:31:10-PST Received: from merlin.ARPA by purdue.ARPA; Sat, 5 Nov 83 15:32:45 EST From: Christopher A Kent Message-Id: <8311052032.AA04037@merlin.ARPA> Received: by merlin.ARPA; Sat, 5 Nov 83 15:32:13 EST Date: 5 Nov 1983 1532-EST (Saturday) To: Charles Hornig Cc: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Fri, 4 Nov 83 11:40 EST. 
<831104114041.6.Hornig@QUABBIN.SCRC.Symbolics> It would be nice if you could document the trailer encapsulation that Berkeley provides. There are going to be a lot of 4.2 systems out there soon, and there might even be manufacturers that want to be compatible with it. Mike Karels or Sam Leffler should be able to help out with this. Does anyone else feel the need for an encapsulation negotiation protocol, so that decisions about whether or not to use trailers (for example) can be handled well? How is one supposed to get the "positive knowledge" that trailers are supported? I've been wondering if ARP responses couldn't be extended to say "my address is foo, and, by the way, I speak trailers". Comments? Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110701090000> Return-Path: Received: from rand-unix by SRI-NIC with TCP; Mon 7 Nov 83 09:27:06-PST Date: Monday, 7 Nov 1983 09:09-PST To: Charles Hornig Cc: Rob Gurwitz , Christopher A Kent , TCP-IP@SRI-NIC, Postel@USC-ISIF Subject: Re: IP on Ethernet draft RFC In-reply-to: Your message of Mon, 7 Nov 83 10:40 EST. <831107104007.8.Hornig@QUABBIN.SCRC.Symbolics> From: obrien@rand-unix I had understood that the performance gain was more than moderate; was, in fact, measured in orders of magnitude or something. Clearly there's a tradeoff. Just how much does trailer protocol buy you? 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110701280000> Return-Path: Received: from SRI-KL.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 09:28:27-PST Date: Mon 7 Nov 83 09:28:00-PST From: Mathis@SRI-KL.ARPA Subject: Re: IP on Ethernet draft RFC To: Hornig%SCRC-QUABBIN@MIT-MC.ARPA cc: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA, Mathis@SRI-KL.ARPA In-Reply-To: Message from "Charles Hornig " of Fri 4 Nov 83 11:53:51-PST The "how to attach to net X" document also needs to say something about how a host "finds" gateways (also needs to talk about "finding" a name server but that is a separate mailing list). In the case of Ethernet, how is this done? For example, the implicit rules for attaching to the ARPANET require hosts to maintain a partial table of gateway addresses since the ARPANET itself will currently give no help to a host in trying to find a gateway. In the PRNET, gateways are always assigned to logical addresses B or C. For Ethernet, ARP could be used to not only translate local addresses into 48 bit Ethernet address but also non-local addresses into 48 bit Ethernet addresses that just happen to be gateways. Do gateways currently do this? ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110704232700> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Mon 7 Nov 83 06:25:25-PST Date: Mon, 7 Nov 83 9:23:27 EST From: Rob Gurwitz Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of 5 Nov 1983 1532-EST (Saturday) To: Christopher A Kent Cc: Charles Hornig , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA Yes. Trailers are a kludge and a loss and even some from Xerox admit it. I think there's an IFDEF that turns them off. 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110705360000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Mon 7 Nov 83 07:36:38-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Mon 7-Nov-83 10:36:45-EST Date: Mon, 7 Nov 83 10:36 EST From: Charles Hornig Subject: Re: IP on Ethernet draft RFC To: karels@ucbarpa, sam@ucbarpa Cc: Christopher A Kent , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA In-reply-to: <8311052032.AA04037@merlin.ARPA> Message-ID: <831107103618.7.Hornig@QUABBIN.SCRC.Symbolics> From: Christopher A Kent Date: 5 Nov 1983 1532-EST (Saturday) <831104114041.6.Hornig@QUABBIN.SCRC.Symbolics> It would be nice if you could document the trailer encapsulation that Berkeley provides. There are going to be a lot of 4.2 systems out there soon, and there might even be manufacturers that want to be compatible with it. Mike Karels or Sam Leffler should be able to help out with this. Does anyone else feel the need for an encapsulation negotiation protocol, so that decisions about whether or not to use trailers (for example) can be handled well? How is one supposed to get the "positive knowledge" that trailers are supported? I've been wondering if ARP responses couldn't be extended to say "my address is foo, and, by the way, I speak trailers". Comments? Cheers, chris ---------- Could you help out with this? I need a formal specification of the trailer format. 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110705400000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Mon 7 Nov 83 07:40:05-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Mon 7-Nov-83 10:40:17-EST Date: Mon, 7 Nov 83 10:40 EST From: Charles Hornig Subject: Re: IP on Ethernet draft RFC To: Rob Gurwitz , Christopher A Kent Cc: Charles Hornig , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA In-reply-to: The message of 7 Nov 83 09:23-EST from Rob Gurwitz Message-ID: <831107104007.8.Hornig@QUABBIN.SCRC.Symbolics> Date: Mon, 7 Nov 83 9:23:27 EST From: Rob Gurwitz Yes. Trailers are a kludge and a loss and even some from Xerox admit it. I think there's an IFDEF that turns them off. I think that it is probably a good idea to document trailers in an appendix to the document. I also think, though, that their presence adds a lot of gratuitous complexity to the standard for a moderate performance gain on only one particular network implementation. How strongly should we discourage their use? ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110705460000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Mon 7 Nov 83 07:46:31-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Mon 7-Nov-83 10:46:52-EST Date: Mon, 7 Nov 83 10:46 EST From: Charles Hornig Subject: Re: IP on Ethernet draft RFC To: Mike.Accetta@CMU-CS-IUS.ARPA Cc: TCP-IP@SRI-NIC.ARPA In-reply-to: <1983.11.4.21.43.54.Mike.Accetta@CMU-CS-IUS> Message-ID: <831107104646.0.Hornig@QUABBIN.SCRC.Symbolics> Date: Friday, 4 November 1983 16:58:51 EST From: Mike.Accetta@CMU-CS-IUS Charles, One possible reason for not arbitrarily disallowing class A networks is to permit logical Class A IP networks to be constructed from more than one cable (some of which may not even be ethernets). 
In our case for example, addresses on CMU's Class B IP network actually span around 7 different physical cables including two 3Mb and three 10Mb ethernets among others. Agreed. Change made. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110706043500> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Mon 7 Nov 83 08:18:07-PST Date: Mon, 7 Nov 83 11:04:35 EST From: Rob Gurwitz Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Mon, 7 Nov 83 10:40 EST To: Charles Hornig Cc: Rob Gurwitz , Christopher A Kent , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA I think they should be strongly discouraged. As you point out, they are of moderate benefit in a very narrow environment and probably cost more in implementation complexity (both inside and between hosts, i.e. inventing an option negotiation) than they are worth. Now I'm sure there are people who would argue the other side vehemently, but if you're talking about trying to set a standard for people to conform to it's appropriate to take strong stands. On another issue, I take exception to some of the strong modularity issues that have been discussed recently. They seem to imply that performance should be sacrificed for some abstract idea of modularity. I am more of a pragmatist than that. Sure, we should have well defined boundaries and parameterized implementations for flexibility. On the other hand, the good implementer will look for ways of taking advantage of all the information he can glean from various levels. That may mean that TCP can find out about what the appropriate segment sizes it can send on a net that avoids IP fragmentation. I see nothing wrong with it as long as it is not made specific to one net, internet, or implementation. After all, IP when making a routing "guess" is taking advantage of information it gets or infers from the local net layer. 
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110706065700> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Mon 7 Nov 83 08:23:04-PST Date: 7 Nov 1983 11:06:57 EST (Monday) From: Jonathan Dreyer Subject: Re: IP on Ethernet draft RFC In-Reply-to: Your message of Mon, 7 Nov 83 10:40 EST To: Charles Hornig Cc: Rob Gurwitz , Christopher A Kent , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA Exceptions to a standard almost never help anyone. If some implementations want to violate the standard and keep to themselves, that's their business, but I see no reason to sanction this in the spec. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110707110000> Return-Path: Received: from lbl-csam.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 15:07:48-PST From: mo@LBL-CSAM (Mike O'Dell[Group-L]) Return-Path: Message-Id: <8311072311.AA05605@lbl-csam.ARPA> Received: by lbl-csam.ARPA ; Mon, 7 Nov 83 15:11:30 PST Date: 7 Nov 1983 1511-PST (Monday) To: obrien@rand-unix Cc: gurwitz@BBN-VAX, cak@PURDUE, TCP-IP@SRI-NIC, Postel@USC-ISIF, Hornig%SCRC-QUABBIN@MIT-MC Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Monday, 7 Nov 1983 09:09-PST. <8311071742.AA01945@lbl-csam.ARPA> The improvement is routinely about 20-30% according to measurements we did. In the future, on systems with tear-away reads and writes, trailers make it possible to get packets from user address space to user address space with the only data copies those done by the DMA hardware. This is a decidedly non-trivial gain. -Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110707470000> Return-Path: Received: from SRI-KL.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 15:49:49-PST Date: 7 Nov 1983 15:47-PST Sender: BILLW@SRI-KL Subject: modularity vs efficiency in fragmentation. From: William "Chops" Westfield To: tcp-ip@NIC Message-ID: <[SRI-KL] 7-Nov-83 15:47:32.BILLW> Hmm. I did think of this. 
However, thinking about it further, it isn't inherently obvious that fragmenting a large piece of data at IP level isn't MORE efficient than fragmenting it at TCP level. After all, an IP packet is a simpler thing, and in general requires less effort on the part of the OS than a TCP packet. This assumes that most packets are not required to be retransmitted. If the packet does end up needing to be retransmitted, then the relative efficiency of fragmentation at TCP level goes up, since less data is likely to need retransmitting. Comments? Bill W ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110708050000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 10:04:56-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 7 Nov 83 13:06:15 EST From: Christopher A Kent Message-Id: <8311071805.AA27093@merlin.ARPA> Received: by merlin.ARPA; Mon, 7 Nov 83 13:05:46 EST Date: 7 Nov 1983 1305-EST (Monday) To: obrien@rand-unix.ARPA Cc: Rob Gurwitz , Christopher A Kent , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, Charles Hornig Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Monday, 7 Nov 1983 09:09-PST. <8311071728.AA26516> I think Mike has got the right ballpark. I have become convinced that trailers are not just a kludge; designing with them in mind can simplify an implementation, rather than making it more complex. The problem is that in an Internet environment, you can't necessarily take full advantage. As I understand it, the two advantages of trailer encapsulation are: - you can do most of your i/o out of page-aligned buffers, which makes the dma much more efficient - on input, you don't have to save header state until you know you have a fully undamaged input packet. That is, you can collect data buffers until they're all in, and only then do you need to interpret the header, which is precisely when it shows up. 
If you run out of buffers, or the packet is short, or whatever, you can just stop listening, and not have to worry about undoing whatever you did to keep the header state around. Since we (the Internet community) have to implement both trailer and non-trailer encapsulations, and the non-trailer is most straightforward, the trailer encapsulation is often viewed as "a bag on the side" of the driver. The trailer encapsulation need not be restricted to Ethernets; the encapsulation also exists, for example, on proNET 10Mb ring nets. I don't think we should regard this as a deviation from the standard (what standard, ?); we're trying to define the standard right now, so there is no such thing. It's a different way of looking at things, and shouldn't be tossed out without careful consideration. 4.2bsd is going to be running on a lot of Ethernets in the near future; if we're documenting Ethernet/IP behaviour, we can't just ignore this and hope it goes away. Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110708210000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 10:20:51-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 7 Nov 83 13:22:19 EST From: Christopher A Kent Message-Id: <8311071821.AA27193@merlin.ARPA> Received: by merlin.ARPA; Mon, 7 Nov 83 13:21:50 EST Date: 7 Nov 1983 1321-EST (Monday) To: "David C. Plummer" Cc: Hornig%SCRC-QUABBIN@MIT-MC.ARPA, TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Friday, 4 November 1983, 13:04-EST. <8311042153.AA23172> There have been some rumblings about building an encapsulation negotiation protocol, that would allow hosts to exchange information about trailers, mtus, and whatever else might come up. I don't know if I like the idea of different mtus on a single cable, but it might be useful if you need to put small machines on the same cable with big ones. 
In general, I don't think we should try to legislate a maximum, since there will always be someone who wants to tinker. And anything that is a reason to make all gateways be able to fragment is a good thing, in my book. Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110708510000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 10:58:50-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 7 Nov 83 14:00:11 EST From: Christopher A Kent Message-Id: <8311071851.AA28188@merlin.ARPA> Received: by merlin.ARPA; Mon, 7 Nov 83 13:51:13 EST Date: 7 Nov 1983 1351-EST (Monday) To: David C. Plummer Cc: tcp-ip@SRI-NIC.ARPA, JSLove@MIT-MULTICS.ARPA, BILLW@SRI-KL.ARPA, Haverty@BBN-UNIX.ARPA, ROGERS@USC-ISIB.ARPA Subject: Re: What is modular, anyway? In-Reply-To: Your message of 3 November 1983 18:38 EST. <8311040112.AA06567> Modularity is all well and good, as long as it doesn't preclude communication between layers. I'm currently involved with a TCP implementation that doesn't communicate with the IP layer about segment sizes, and therefore tries to negotiate a max segment size of 1024 on the arpanet. If you go through a gateway, the tiny fragments at the end often get lost by an Ethernet or proNET receiver that doesn't cycle fast enough to pick it up, leading to lost packets, large round trip times, and generally lousy performance. I would contend that it is NOT valid for two cooperating TCPs to try to exchange 5000 octet segments without consulting the underlying IP. If the IP can't reassemble packets of 5000+80 (or so) octets, you can't communicate this way! At least as I understand it, a TCP segment must be wholly encapsulated within a single IP datagram. 
If it is known that both IPs can handle datagrams of such size, then I agree that mtu on intervening (sub-)networks is not an issue; the IP layer must handle all fragmentation and reassembly, and intervening gateways should fragment fragments if necessary because of "impedance mismatches". ------ From: David C. Plummer Subject: What is modular, anyway? Date: 3 November 1983 18:38 EST 4/ Gateways must be able to accept packets from their attached networks which are the maximum packet size for that network technology (although for some 'technologies' like raw wires that doesn't work). Nets with smaller max sizes than 576 are perfectly legal. Again, NO! Maximum packet size on a particular (sub)network is determined by management. This actually has nothing to do with transport layers (e.g. IP)! The implementation of interfaces should be sufficiently flexible to be able to tell each transport layer (e.g. IP) that asks it what the maximum segment size is (a "site variable" determined by management) for the particular (sub)net to which the interface is attached. IP would ask the interfaces this in order to determine when it must fragment over the various interfaces. It must also be prepared to accept each of these sizes over each of the interfaces. Therefore, it can take the MIN for packets being transmitted, but it must be prepared to receive the differing size maximums. ----- I don't get this. I would have said (and still would say) YES! to item 4. A gateway must be able to accept packets up to the maximum packet size expected on that network. If your gripe is that the administration may choose to run with a smaller maximum packet than the technology allows (i.e. proNET lets me send 2046-octet packets, but I never send a packet larger than 1536), then yes, I agree with you, though this is a fine point to me. Perhaps that's why you made the distinction; however, your reason for objection was (to me) as subtle as the possible misunderstanding in point 4. 
I agree that it's the managerial maximum that must be met by gateways. Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110709022800> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Mon 7 Nov 83 11:22:01-PST Date: 7 Nov 1983 14:02:28 EST (Monday) From: Jack Haverty Subject: Re: IP on Ethernet draft RFC In-Reply-to: Your message of 7 Nov 1983 1305-EST (Monday) To: Christopher A Kent Cc: obrien@rand-unix.ARPA, Rob Gurwitz , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, Charles Hornig Chris hit the nail on the head -- the task is to define the standard. One question is, standard for what? It seems plausible given the recent messages that a standard might have two sections at first, the 'default' way, and the '4.2bsd' way, which differs from the default for performance reasons. I'm not arguing against that, but I wonder if that means there will be a stream of additions as more machines come on the network in large numbers. Perhaps we'll see a 'backwards-byte' way for some machine, in which all data bytes in the packet are swapped to improve efficiency. And that of course will come in two subversions, with and without trailers. Then maybe there will be a '36-bit' format, for use by consenting 36-bit machines to avoid wasting bits and/or cpu time to pack and unpack. Who knows what else might be appropriate to improve efficiency. Defining a standard seems to me to be very difficult, since all standards are compromises between the advantages of customization and the advantages of uniformity. A standard with only one 'wart' for one special case doesn't look too bad, but is that the end? I wonder why no one is arguing that the IP encapsulation for Ethernets between two machines of type-X should use a stripped-down IP header -- after all, there is no gateway so why worry about fragmentation? Etc., etc., etc. What do people think the purpose of a standard is? To make more canned implementations available? 
To make it possible for all implementations to interact? To create efficient use of network/host resources? To ease software maintenance? Jack ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110709270000> Return-Path: <@MIT-ML:Hornig%SCRC-QUABBIN@MIT-MC> Received: from MIT-ML by SRI-NIC with TCP; Mon 7 Nov 83 11:39:28-PST Received: from SCRC-YAMASKA by SCRC-QUABBIN with CHAOS; Mon 7-Nov-83 14:23:34-EST Date: Mon, 7 Nov 83 14:27 EST From: Charles Hornig Subject: Re: IP on Ethernet draft RFC To: Christopher A Kent , obrien@RAND-UNIX.ARPA Cc: Rob Gurwitz , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, Charles Hornig In-reply-to: <8311071805.AA27093@merlin.ARPA> Message-ID: <831107142731.3.Hornig@QUABBIN.SCRC.Symbolics> I don't understand your point about header state. All of the network implementations I have worked with read in the whole packet before processing the header. What does this have to do with trailers? Also, the Internet community does not have to implement trailer protocols. You may want to, in order to get better VAX Unix performance, but it is never necessary. I'm sure that a lot of people without Unix systems would rather not do it. I agree that it should be documented. I will put it in an appendix to the RFC if someone will send me a copy of the specification. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110709560900> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Mon 7 Nov 83 12:05:14-PST Date: Mon, 7 Nov 83 14:56:09 EST From: Rob Gurwitz Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of 7 Nov 1983 1305-EST (Monday) To: Christopher A Kent Cc: obrien@rand-unix.ARPA, Rob Gurwitz , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, Charles Hornig OK, perhaps moderate performance increase is understating the case. However, if millisecond per packet performance is needed for a diskless workstation application (Berkeley's intended goal), then perhaps it's IP that's inappropriate. 
After all, the purpose of TCP/IP (one of them) is interoperability. Since I can't interoperate with the funny trailer protocol, why use IP? After all, once I go off through a gateway, I'm not going to get my millisecond performance anymore. It sounds like a diskless workstation application using TCP/IP is just wrong. There's too much protocol there. And speaking of modularity, the reason for the trailer protocol in the first place was the performance of some Ethernet drivers with the Unibus. Must I be subjected to some weird variant of IP just to make up for poor performance of some hardware? I really dispute the fact that "we (the Internet community) have to implement both trailer and non-trailer encapsulations." It seems like we are bending inside out for a specific and very narrow case. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110710140000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 12:14:19-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 7 Nov 83 15:15:26 EST From: Christopher A Kent Message-Id: <8311072014.AA29620@merlin.ARPA> Received: by merlin.ARPA; Mon, 7 Nov 83 15:14:54 EST Date: 7 Nov 1983 1514-EST (Monday) To: Charles Hornig Cc: Rob Gurwitz , TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, Charles Hornig , Christopher A Kent , obrien@RAND-UNIX.ARPA Subject: Re: IP on Ethernet draft RFC In-Reply-To: Your message of Mon, 7 Nov 83 14:27 EST. <831107142731.3.Hornig@QUABBIN.SCRC.Symbolics> I think that the argument about header state is that you have to hang on to a copy of the header. If you're buffer limited, this can be a win; also the overhead of allocating and deallocating the buffer(s) for the header can be avoided. The points about standards and variations is well taken; there should be a base-level encapsulation, which is expected to be implemented by all systems that speak IP on an Ethernet. Jack Haverty, I don't know if there will be a stream of variations on encapsulation in the future. 
I think that standards are there to make it possible for all implementations to interact, and to document possible variations. Some variations are reasonable; it is not reasonable to cast a poor implementation into stone just because "it's the standard". (Perhaps this is a variation on "rules were made to be broken".) Since in this case we are defining a standard after the fact, we must try to consider all the implementations that exist. (I am reminded of a group of people that built a 3Mb Ether version of Pup without swapping bytes into the pup-defined network natural order; the efficiency hack was totally wrong in this case, which was discovered as soon as they tried to talk to a "real" pup implementation.) I think it's unfortunate that the trailer encapsulation format was implemented without specifying a way for machines wishing to use it to communicate this fact. This is a gross oversight. But ignoring it and hoping it will go away won't work. What I'm trying to insure is that if there are widely accepted variations, we don't just dismiss them out of hand because they aren't of interest to a particular group (say, the writers of the standards). Rob Gurwitz, I would tend to agree with you that IP is out of place in a diskless workstation environment. We live in a world of compromises; not everyone has the resources to implement a new protocol for their local network use, and the translation gateways needed to maintain interoperability with the rest of the Internet. Both of these are reasonable goals. Why begrudge the compromise in this case? I seem to have been pressed into the sole defender of the right for this variation to exist. I don't believe that it's "wrong" just because it's different. I think it's a mistake to ignore it. Am I the only one? Should I consider myself outnumbered and drop it? 
Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110711483900> Return-Path: Received: from SRI-KL.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 19:51:30-PST Date: Mon 7 Nov 83 19:48:39-PST From: Mathis@SRI-KL.ARPA Subject: Re: IP on Ethernet draft RFC To: Mike.Accetta@CMU-CS-IUS.ARPA cc: TCP-IP@SRI-NIC.ARPA, Mathis@SRI-KL.ARPA In-Reply-To: Message from "Mike.Accetta@CMU-CS-IUS" of Mon 7 Nov 83 18:06:53-PST Mike, Finding gateways/name servers isn't necessarily an aspect of a general resource location problem (if you don't mean general in the extreme), but an aspect of the "getting started" procedures for a given network. I make this distinction since gateways and (to some extent) host-name servers are resources at the network protocol level rather than resources at mail or user levels, for example. In any event, a gateway is more than a box that happens to speak EGP/GGP in the same manner that a name server is more than a TCP/UDP service. The need to "find" gateways/name servers is really a reflection on the inadequacies/limitations of our networks. As a host, I would much rather hand the ARPANET (for example) a message containing an IP datagram and let the IMPs figure out which gateway to route it to rather than have to deal with pinging and built-in gateway tables; or at least let the ARPANET get the message to a functional gateway that will return a redirect with the right gateway address. If gateways responded to ARP requests for hosts outside of the local Ethernet, then the 48-bit address of the proper, first-hop gateway would be returned to you. The IP module then wouldn't really care if the destination was local or remote for purposes of routing.
Among many other items, the "how to connect" RFCs need to describe how a host, starting from ground-zero, knows about its IP address (which may or may not be the same as its LN address), knows how to route packets to places outside its network (which may or may not be the same procedures used for internal routing), and how to resolve names (of external internet entities) to addresses. It has been suggested that a "getting started" protocol/server needs to be developed. ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110713360100> Return-Path: Received: from USC-ISIF.ARPA by SRI-NIC with TCP; Mon 7 Nov 83 21:35:44-PST Date: 7 Nov 1983 21:36:01 PST From: POSTEL@USC-ISIF Subject: re: TCP Maximum Segment Size Option To: tcp-ip@SRI-NIC < INC-PROJECT, MAX-SEG-SIZ.NLS.14, >, 7-Nov-83 21:32 JBP ;;;; This memo discusses the TCP Maximum Segment Size and its relation to the IP Maximum Datagram Size. This discussion is necessary because the current specification of this TCP option is ambiguous. Much of the difficulty with understanding these sizes and their relationship has been due to the variable size of the IP and TCP headers. There have been some assumptions made about using other than the default size for datagrams with some unfortunate results. HOSTS MUST NOT SEND DATAGRAMS LARGER THAN 576 OCTETS UNLESS THEY HAVE SPECIFIC KNOWLEDGE THAT THE DESTINATION HOST IS PREPARED TO ACCEPT LARGER DATAGRAMS. This is a long established rule. To resolve the ambiguity in the TCP Maximum Segment Size option definition the following rule is established: THE TCP MAXIMUM SEGMENT SIZE IS THE IP MAXIMUM DATAGRAM SIZE MINUS FORTY. The default IP Maximum Datagram Size is 576. The default TCP Maximum Segment Size is 536. 1. The IP Maximum Datagram Size Hosts are not required to reassemble infinitely large IP datagrams. The maximum size datagram that all hosts are required to accept or reassemble from fragments is 576 octets.
The maximum size reassembly buffer every host must have is 576 octets. Hosts are allowed to accept larger datagrams and assemble fragments into larger datagrams; hosts may have buffers as large as they please. Hosts must not send datagrams larger than 576 octets unless they have specific knowledge that the destination host is prepared to accept larger datagrams. 2. The TCP Maximum Segment Size Option TCP provides an option that may be used at the time a connection is established (only) to indicate the maximum size TCP segment that can be accepted on that connection. This Maximum Segment Size (MSS) announcement (often mistakenly called a negotiation) is sent from the data receiver to the data sender and says "I can accept TCP segments up to size X". The size (X) may be larger or smaller than the default. The MSS can be used completely independently in each direction of data flow. The result may be quite different maximum sizes in the two directions. The MSS counts only data octets in the segment; it does not count the TCP header or the IP header. A footnote: The MSS value counts only data octets, thus it does not count the TCP SYN and FIN control bits even though SYN and FIN do consume TCP sequence numbers. 3. The Relationship of TCP Segments and IP Datagrams TCP segments are transmitted as the data in IP datagrams. The correspondence between TCP segments and IP datagrams must be one to one. This is because TCP expects to find exactly one complete TCP segment in each block of data turned over to it by IP, and IP must turn over a block of data for each datagram received (or completely reassembled). 4. Layering and Modularity TCP is an end to end reliable data stream protocol with error control, flow control, etc. TCP remembers many things about the state of a connection. IP is a one shot datagram protocol. IP has no memory of the datagrams transmitted.
It is not possible for IP to keep any information about the maximum datagram size a particular destination might be capable of accepting. IP may keep some information for routing purposes on a per network basis. There is no current requirement that IP keep information on a per host basis. TCP and IP are distinct layers in the protocol architecture, and are often implemented in distinct program modules. Some people seem to think that there must be no communication between protocol layers or program modules. There must be communication between layers and modules, but it should be carefully specified and controlled. One problem in understanding the correct view of communication between protocol layers or program modules in general, or between TCP and IP in particular is that the documents on protocols are not very clear about it. This is often because the documents are about the protocol exchanges between machines, not the program architecture within a machine, and the desire to allow many program architectures with different cuts at modularizing the implementation. 5. The Relationship between IP Datagram and TCP Segment Sizes The relationship between the value of the maximum IP datagram size and the maximum TCP segment size is obscure. The problem is that both the IP header and the TCP header may vary in length. The TCP Maximum Segment Size option (MSS) is defined to specify the maximum number of data octets in a TCP segment exclusive of TCP (or IP) header. To notify the data sender of the largest TCP segment it is possible to receive the calculation of the MSS value to send is: MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR) On receipt of the MSS option the calculation of the size of segment that can be sent is: SndMaxSegSiz = MIN((MTU - sizeof(TCPHDR) - sizeof(IPHDR)), MSS) where MSS is the value in the option, and MTU is the Maximum Transmission Unit (or the maximum packet size) allowed on the directly attached network. This begs the question, though. 
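A minimal sketch of the two calculations just given, written in Python with the header sizes left as explicit parameters, since their values are exactly the open question. The function and parameter names are illustrative assumptions, not part of any specification.

```python
def mss_to_announce(mtu, ip_hdr_len, tcp_hdr_len):
    """MSS = MTU - sizeof(TCPHDR) - sizeof(IPHDR): value the data receiver announces."""
    return mtu - tcp_hdr_len - ip_hdr_len


def send_max_seg_size(mtu, ip_hdr_len, tcp_hdr_len, received_mss):
    """SndMaxSegSiz = MIN(MTU - sizeof(TCPHDR) - sizeof(IPHDR), MSS).

    mtu is the MTU of the sender's directly attached network;
    received_mss is the value carried in the peer's MSS option.
    """
    return min(mtu - tcp_hdr_len - ip_hdr_len, received_mss)


if __name__ == "__main__":
    # Hypothetical case: a 1500-octet local MTU, 20-octet headers assumed on
    # both layers, and a peer that announced an MSS of 536.
    print(mss_to_announce(1500, 20, 20))
    print(send_max_seg_size(1500, 20, 20, 536))
```

Whichever header sizes are chosen, the sender is bounded by the smaller of its own local limit and the peer's announcement.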
What value should be used for the "sizeof(TCPHDR)" and for the "sizeof(IPHDR)"? There are three reasonable positions to take: the conservative, the moderate, and the liberal. The conservative or pessimistic position assumes the worst -- that both the IP header and the TCP header are maximum size, that is, 60 octets each. MSS = MTU - 60 - 60 = MTU - 120 If MTU is 576 then MSS = 456 The moderate position assumes that the IP header is maximum size (60 octets) and the TCP header is minimum size (20 octets), because there are no TCP header option fields currently defined that would normally be sent at the same time as data segments. MSS = MTU - 60 - 20 = MTU - 80 If MTU is 576 then MSS = 496 The liberal or optimistic position assumes the best -- that both the IP header and the TCP header are minimum size, that is, 20 octets each. MSS = MTU - 20 - 20 = MTU - 40 If MTU is 576 then MSS = 536 If nothing is said about MSS, the data sender may cram as much as possible into a 576 octet datagram, and if the datagram has minimum headers (which is most likely), the result will be 536 data octets in the TCP segment. The rule relating MSS to the maximum datagram size ought to be consistent with this. A practical point is raised in favor of the liberal position too. Since the use of minimum IP and TCP headers is very likely in a very large percentage of cases, it seems wasteful to limit the TCP segment data to so much less than could be transmitted at once, especially since it is less than 512 octets. For comparison: 536/576 is 93% data, 496/576 is 86% data, 456/576 is 79% data. 6. Maximum Packet Size Each network has some maximum packet size, or maximum transmission unit (MTU). Ultimately there is some limit imposed by the technology, but often the limit is an engineering choice or even an administrative choice. Different installations of the same network product do not have to use the same maximum packet size.
Even within one installation not all hosts must use the same packet size (this way lies madness, though). Some IP implementers have assumed that all hosts on the directly attached network will be the same or at least run the same implementation. This is a dangerous assumption. It has often developed that after a small homogeneous set of hosts have become operational additional hosts of different types are introduced into the environment. And it has often developed that it is desired to use a copy of the implementation in a different inhomogeneous environment. Designers of gateways should be prepared for the fact that successful gateways will be copied and used in other situations and installations. Gateways must be prepared to accept datagrams as large as can be sent in the maximum packets of the directly attached networks. And gateway implementations should be easily configured for installation in different circumstances. A footnote: The MTUs of some popular networks, note that the actual limit in some installations may be set lower by administrative policy: ARPANET, MILNET = 1007 Ethernet (10Mb) = 1500 Proteon PRONET = 2046 7. Source Fragmentation A source host would not normally create datagram fragments. Under normal circumstances datagram fragments only arise when a gateway must send a datagram into a network with a smaller maximum packet size than the datagram. In this case the gateway must fragment the datagram (unless it is marked "don't fragment" in which case it is discarded, with the option of sending an ICMP message to the source reporting the problem). It might be desirable for the source host to send datagram fragments if the maximum segment size (default or negotiated) allowed by the data receiver were larger than the maximum packet size allowed by the directly attached network. However, such datagram fragments must not combine to a size larger than allowed by the destination host.
For example, if the receiving TCP announced that it would accept segments up to 5000 octets (in cooperation with the receiving IP) then the sending TCP could give such a large segment to the sending IP provided the sending IP would send it in datagram fragments that fit in the packets of the directly attached network. 8. Gateway Fragmentation Gateways must be prepared to do fragmentation. It is not an optional feature for a gateway. Gateways have no information about the size of datagrams destination hosts are prepared to accept. It would be inappropriate for gateways to attempt to keep such information. Gateways only know the 576 rule. Gateways must be prepared to accept the largest datagrams that are allowed on each of the directly attached networks, even if it is larger than 576 octets. Gateways must be prepared to fragment datagrams to fit into the packets of the next network, even if it is smaller than 576 octets. If a source host thought to take advantage of the local network's ability to carry larger datagrams but doesn't have the slightest idea if the destination host can accept larger than default datagrams and expects the gateway to fragment the datagram into default size fragments, then the source host is misguided. If indeed, the destination host can't accept larger than default datagrams, it probably can't reassemble them either. If the gateway either passes on the large datagram whole or fragments into default size fragments the destination will not accept it. Thus, this mode of behavior by source hosts must be outlawed. A larger than default datagram can only arrive at a gateway because the source host knows that the destination host can handle such large datagrams (probably because the destination host announced it to the source host in a TCP MSS option). Thus, the gateway should pass on this large datagram in one piece or in the largest fragments that fit into the next network.
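A rough sketch, in Python, of what "the largest fragments that fit into the next network" works out to. It assumes a 20-octet IP header with no options on every fragment, and it relies on the IP rule that the fragment offset is counted in 8-octet units, so every fragment except the last must carry a multiple of 8 data octets. The function name is hypothetical.

```python
def fragment_sizes(total_len, next_mtu, ip_hdr_len=20):
    """Return the total length (header + data) of each fragment.

    total_len  -- length of the arriving datagram, IP header included
    next_mtu   -- maximum packet size of the next network
    ip_hdr_len -- assumed fixed at 20 octets (no IP options)
    """
    if total_len <= next_mtu:
        return [total_len]              # pass it on in one piece

    # Largest data payload per fragment, rounded down to a multiple of 8
    # because fragment offsets are expressed in 8-octet units.
    max_data = (next_mtu - ip_hdr_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("next network cannot carry any fragment data")

    remaining = total_len - ip_hdr_len
    sizes = []
    while remaining > max_data:
        sizes.append(ip_hdr_len + max_data)
        remaining -= max_data
    sizes.append(ip_hdr_len + remaining)  # last fragment takes what is left
    return sizes


if __name__ == "__main__":
    # A full 1500-octet Ethernet datagram entering a net with MTU 1007.
    print(fragment_sizes(1500, 1007))
```

Note that the gateway needs only the next network's MTU for this; the number 576 never appears in the calculation.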
An interesting conclusion is that even though the gateways know the 576 rule, it is irrelevant to them. 9. Inter-Layer Communication The Network Driver (ND) or interface should know the Maximum Transmission Unit (MTU) of the directly attached network. The IP should ask the Network Driver for the Maximum Transmission Unit. The TCP should ask the IP for the Maximum Datagram Data Size (MDDS). This is the MTU minus the IP header length (MDDS = MTU - IPHdrLen). When opening a connection TCP can send an MSS option with the value equal to MDDS - TCPHdrLen. TCP should determine the Maximum Segment Data Size (MSDS) from either the default or the received value of the MSS option. TCP should determine if source fragmentation is possible (by asking the IP) and desirable. If so TCP may hand to IP segments (including the TCP header) up to MSDS + TCPHdrLen. If not TCP may hand to IP segments (including the TCP header) up to the lesser of (MSDS + TCPHdrLen) and MDDS. IP checks the length of data passed to it by TCP. If the length is less than or equal to MDDS, IP attaches the IP header and hands it to the ND. Otherwise the IP must do source fragmentation. 10. What is the Default MSS ? Another way of asking this question is "What transmitted value for MSS has exactly the same effect as not transmitting the option at all?". In terms of the previous section: The default assumption is that the Maximum Transmission Unit is 576 octets. MTU = 576 The Maximum Datagram Data Size (MDDS) is the MTU minus the IP header length. MDDS = MTU - IPHdrLen = 576 - 20 = 556 When opening a connection TCP can send an MSS option with the value equal to MDDS - TCPHdrLen. MSS = MDDS - TCPHdrLen = 556 - 20 = 536 TCP should determine the Maximum Segment Data Size (MSDS) from either the default or the received value of the MSS option. Default MSS = 536, then MSDS = 536 TCP should determine if source fragmentation is possible and desirable.
If so TCP may hand to IP segments (including the TCP header) up to MSDS + TCPHdrLen (536 + 20 = 556). If not TCP may hand to IP segments (including the TCP header) up to the lesser of (MSDS + TCPHdrLen (536 + 20 = 556)) and MDDS (556). 11. The Truth The rule relating the maximum IP datagram size and the maximum TCP segment size is: TCP Maximum Segment Size = IP Maximum Datagram Size - 40 The rule must match the default case. If the TCP Maximum Segment Size option is not transmitted then the data sender is allowed to send IP datagrams of maximum size (576) with a minimum IP header (20) and a minimum TCP header (20) and thereby be able to stuff 536 octets of data into each TCP segment. The definition of the MSS option can be stated: The maximum number of data octets that may be received by the sender of this TCP option in TCP segments with no TCP header options transmitted in IP datagrams with no IP header options. 12. The Consequences When TCP is used in a situation when either the IP or TCP headers are not minimum and yet the maximum IP datagram that can be received remains 576 octets then the TCP Maximum Segment Size option must be used to reduce the limit on data octets allowed in a TCP segment. For example, if the IP Security option (11 octets) were in use and the IP maximum datagram size remained at 576 octets, then the TCP should send the MSS with a value of 525 (536-11). --jon. ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110714524300> Return-Path: Received: from CMU-CS-IUS by SRI-NIC with TCP; Mon 7 Nov 83 16:54:50-PST Date: Monday, 7 November 1983 19:52:43 EST From: Mike.Accetta@CMU-CS-IUS To: TCP-IP@SRI-NIC.ARPA Subject: Re: IP on Ethernet draft RFC Message-ID: <1983.11.7.23.52.46.Mike.Accetta@CMU-CS-IUS> 1) Finding gateways and name servers are two specific examples of a more general resource location problem. 
In the absence of an Internet standard, we are probably going to define a local resource location protocol on top of IP which looks something like: OpCode= (Protocol-ID, port#) ... (Protocol-ID, port#) with a symmetric reply OpCode= (Protocol-ID, port#) ... (Protocol-ID, port#) where "Protocol-ID" is the IP protocol identifier and "port#" is an (optional) port or some other Protocol-ID specific value. Then, gateways could be located by broadcasting an Opcode= (3, 0) {GGP} (8, 0) {EGP} request and name servers by broadcasting an OpCode= (6, 53) {TCP port 53} (17, 53) {UDP port 53} request. 2) This brings up another issue which should probably be specified in the IP ethernet standard, namely broadcast addressing. One of the network papers a while back (I think it was an IEN on ethernet address resolution by, I believe, Rob Gurwitz) suggested an approach which we are using at CMU. In that scheme, the IP network number and a host part consisting of all ones, was reserved for use as that network's IP broadcast address (i.e. address resolution modules would always resolve this IP address to the physical broadcast address for the underlying hardware). 3) For what it's worth, I'd suggest that the 4.2BSD trailer protocol issue best be dealt with as a separate RFC written by someone at Berkeley with perhaps a reference to it in the ethernet standard RFC. The standard should specify precisely what is REQUIRED when implementing IP on an ethernet in order to guarantee communication with ANY other IP implementation on that ethernet. Unless the intent is to mandate implementation of the trailer protocol encapsulation, I think it would be ill-advised to include this directly or even as an appendix in the IP ethernet standard. It will only cause confusion.
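As an illustration of the broadcast scheme in point 2, here is a small Python sketch that forms "network number plus a host part of all ones". It assumes the class A/B/C address layout (8, 16, or 24 bits of network number, selected by the leading bits of the address); the function name and the sample addresses are hypothetical.

```python
import ipaddress


def classful_broadcast(addr):
    """Return the address with the same network number and an all-ones host part."""
    ip = int(ipaddress.IPv4Address(addr))
    first_octet = ip >> 24
    if first_octet < 128:        # class A: leading bit 0, 24-bit host part
        host_bits = 24
    elif first_octet < 192:      # class B: leading bits 10, 16-bit host part
        host_bits = 16
    elif first_octet < 224:      # class C: leading bits 110, 8-bit host part
        host_bits = 8
    else:
        raise ValueError("not a class A, B, or C address")
    host_mask = (1 << host_bits) - 1
    return str(ipaddress.IPv4Address(ip | host_mask))


if __name__ == "__main__":
    for sample in ("10.1.2.3", "128.2.0.1", "192.5.39.7"):
        print(sample, "->", classful_broadcast(sample))
```

An address resolution module following this scheme would map any address produced this way to the hardware broadcast address instead of looking it up.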
- Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110721110000> Return-Path: Received: from lbl-csam.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 05:07:56-PST From: mo@LBL-CSAM (Mike O'Dell[Group-L]) Return-Path: Message-Id: <8311081311.AA10067@lbl-csam.ARPA> Received: by lbl-csam.ARPA ; Tue, 8 Nov 83 05:11:58 PST Date: 8 Nov 1983 0511-PST (Tuesday) To: BillW@SRI-KL Cc: tcp-ip@NIC Subject: Re: modularity vs efficiency in fragmentation. In-Reply-To: Your message of 7 Nov 1983 15:47-PST. <[SRI-KL] 7-Nov-83 15:47:32.BILLW> There is a more serious issue here. Sending too large a TCP segment can actually cause the connection to fail in the following way. Assume two TCP's negotiate the segment size up to some number quite a bit larger than the max IP size along the route. A segment leaves one TCP in one IP jumbogram. Now, along the way it gets fragmented into 3 tinigrams which get sent along the same path. Now assume the destination host or gateway is short of resources, or more likely, is on a local net which has difficulty hearing back-to-back packets. It is not uncommon (actually observed) for this case to result in an IP fragment being lost, usually the last one of the bunch. IP finally gives up and discards the irreconcilable fragments, or even worse, holds on to them for a while. Back at the source host, the retransmit timer fires inside TCP and it promptly sends the segment off again, insuring that only part of, and probably fewer useful, IP fragments will be received. (Because you tend to hear the first of a burst and lose the rest.) After a while, TCP declares the connection dead since it has not been able to get any segments through. -Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110801060000> Return-Path: Received: from OFFICE-3.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 09:08:54-PST Date: 8 Nov 1983 0906-PST Sender: WMARTIN at OFFICE-3 Subject: "Retransmitting" - Can we get rid of it?
From: WMartin at Office-3 (Will Martin) To: Feedback at OFFICE-3, SNelson at OFFICE-3 Cc: TCP-IP at SRI-NIC Message-ID: <[OFFICE-3] 8-Nov-83 09:06:59.WMARTIN> Where in the TCP process does the display of the word "Retransmitting" come from? (I think that there is a [CR] at the end of it, too.) Can the code be changed to eliminate the display of this word to the terminal? I have just had quite a large batch of printing ruined because the word "Retransmitting" appears every ten lines or so in the middle of the document. There were no characters lost during this time when the host probably had some sort of problems. If the worthless "Retransmitting" had not been generated and appeared in the midst of the text, this printing would have been perfectly usable as is. It was printed at night, I believe, while the printer was unattended, and it probably came out at some abysmally low speed, but so what? If it wasn't for the "Retransmitting"s garbaging up the text, it could have been used. I realize that it can be useful to a human sitting at a terminal and wondering if the entire network has died, or just the host, or the TAC, or whatever. But I'd rather just put up with the non-echoing and eventual beeping when the buffer (wherever it is) fills up, than have "Retransmitting" displays endlessly repeat down my screen. The fact that these destroy printed output is enough reason alone to eliminate them, since their benefit is far outweighed by the harm that displaying the word does. I'm CC'ing this to a TCP mailing list in the hope that someone who has already fixed this problem can inform us of the proper repair. It is the most visible user-interface effect of TCP and the prime gripe most users have about the changeover. 
Will Martin USArmy DARCOM ALMSA ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110801570000> Return-Path: Received: from USC-ISIB.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 10:03:20-PST Date: 8 Nov 1983 0957-PST Subject: re: TCP Maximum Segment Size Option From: Craig Milo Rogers To: POSTEL@USC-ISIF, tcp-ip@SRI-NIC In-Reply-To: Your message of 7 Nov 1983 21:36:01 PST I have a couple of minor cautions (ie, disagreements) with Jon's message on IP and TCP packet sizes. My complaints center on IP implementation issues, not the TCP issues. 1) Jon stated that there is no current requirement that IP keep information on a per-host basis. There are two counterexamples to this statement: 1) One of the ICMP Redirect subtypes is a per-host redirect. Isn't an IP implementation required to include an ICMP implementation? Or, are some parts of the ICMP specification optional? (Ref. RFC 792 p. 12) 2) The Address Resolution Protocol requires (or at least, strongly encourages) hosts to keep per-host information. The ARP is required to translate IP addresses to local network addresses on some local networks. Is this translation process considered an "IP" responsibility? 2) "A source host would not normally create datagram fragments." I have to approach this statement carefully; "normal" is a tricky word. Let's talk about an IP host connected to a network with a smaller-than-576 octet Maximum Transmission Unit. Here are two circumstances in which it would be quite "normal" for the host to send IP fragments: 1) The host may be using a higher-level protocol path which requires packets of a particular length, ie UDP/TFTP. 2) The host may have implemented the ICMP Echo/Echo-Reply protocol (it's a "required" component of an IP implementation, isn't it?). The host might receive an Echo request (as fragments), reassemble it, interpret it, create an Echo Reply, and be forced to fragment the Echo Reply in order to send it.
Craig Milo Rogers ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110802430100> Return-Path: Received: from USC-ISID.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 10:42:49-PST Date: 8 Nov 1983 10:43:01 PST From: MILLS@USC-ISID Subject: Re: modularity vs efficiency in fragmentation. To: mo@LBL-CSAM, BillW@SRI-KL cc: tcp-ip@SRI-NIC, MILLS@USC-ISID In response to the message sent 8 Nov 1983 0511-PST (Tuesday) from mo@LBL-CSAM Mike, Your scenario is in fact common on SATNET paths, where resources are tight and packet losses relatively high. Reassembly congestion can be reduced by careful choice of the IP time-to-live field and minimum TCP retransmission timeout. We set the TTL field so that the 'gram expires of old age before the sending TCP fires off another one. This assumes, of course, that intervening gateways, as well as the destination host, correctly handle the TTL field. Dave ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110803110000> Return-Path: Received: from SRI-CSL by SRI-NIC with TCP; Tue 8 Nov 83 11:13:34-PST Date: 8 Nov 1983 11:11-PST Sender: GEOFF@SRI-CSL Subject: Re: "Retransmitting" - Can we get rid of it? From: the tty of Geoffrey S. Goodfellow Reply-To: Geoff@SRI-CSL To: WMartin@OFFICE-3 Cc: Feedback@OFFICE-3, SNelson@OFFICE-3 Cc: TCP-IP@SRI-NIC Message-ID: <[SRI-CSL] 8-Nov-83 11:11:53.GEOFF> In-Reply-To: <[OFFICE-3] 8-Nov-83 09:06:59.WMARTIN> Will The word "Retransmitting" comes from your TAC. I would suggest that you notify DCA or BBN of your desire to see a TAC command option added which will disable the printing of the "Retransmitting" message. Geoff ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110803165800> Return-Path: Received: from USC-ISIF.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 11:18:56-PST Date: 8 Nov 1983 11:16:58 PST From: POSTEL@USC-ISIF Subject: re: "retransmitting" To: TCP-IP@SRI-NIC Will Martin: It is a "feature" of the TAC. Take it up with the TAC people at BBN. --jon.
------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110803550000> Return-Path: Received: from USC-ISID.ARPA by SRI-NIC with TCP; Tue 8 Nov 83 11:59:00-PST Date: 8 Nov 1983 1155-PST Subject: Re: "Retransmitting" - Can we get rid of it? From: GOWER@USC-ISID To: WMartin@OFFICE-3 (Will Martin) cc: Feedback@OFFICE-3, SNelson@OFFICE-3, TCP-IP@SRI-NIC, GOWER@USC-ISID In-Reply-To: <[OFFICE-3] 8-Nov-83 09:06:59.WMARTIN> Will, We had the same experience with our printers on our TAC. IF your printer is on a TAC port, you may request the NOC to set the port 'QUIET'. This parameter must be set by the NOC. It is one of several that SHOULD be available to the TAC Liaison, but is NOT. Regards, Neil Gower Former TAC Liaison ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110804362200> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Tue 8 Nov 83 06:42:47-PST Date: Tue, 8 Nov 83 9:36:22 EST From: Rob Gurwitz Subject: Re: modularity vs efficiency in fragmentation. In-Reply-To: Your message of 7 Nov 1983 15:47-PST To: William "Chops" Westfield Cc: tcp-ip@NIC No. Having to go through an extra fragment/reassembly step IS more costly. Think about it. Since TCP packetization happens anyway (you have to always put a header on the data and figure the checksum) and is the "normal" and hence optimized case, adding yet more handling by IP to break the packet up and then reassemble on the other end is not a gain. What advantage does it have? It certainly doesn't make better use of the wire; TCP might try to send 5000 bytes in a segment, but it will still go out as umpteen 576 (or whatever) byte packets. Why not avoid all the extra work in the first place and have TCP try to segment in such a way as to avoid gratuitous fragmentation?
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110805442900> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Tue 8 Nov 83 07:59:58-PST Date: 8 Nov 1983 10:44:29 EST (Tuesday) From: Mike Brescia Subject: re: TCP Maximum Segment Size Option In-Reply-to: Your message of 7 Nov 1983 21:36:01 PST To: POSTEL@USC-ISIF Cc: brescia@BBN-UNIX, tcp-ip@nic Jon, It came as a surprise to me that gateways know the '576 rule'. (Your point 8, 'gateway fragmentation'.) Since you also noticed that it is irrelevant to gateways, I think you followed the same reasoning we did at implementation, that is, there is no parameter in the gateway that has the value 576 (+/- epsilon). Only if a network is declared to be of size 576 (MTU) does this appear in the configuration info for that particular net. There are no (BBN) gateways on networks which have a hardware limit of 576 bytes. Some have been 'administratively' limited to 576, viz. the NTA ring (Proteon hardware). Some networks have been administratively limited to 256 bytes, viz. those at UCL, to match the 256 byte size of the satnet interface. Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110807061600> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Tue 8 Nov 83 09:38:42-PST Date: 8 Nov 1983 12:06:16 EST (Tuesday) From: Jonathan Dreyer Subject: Re: IP on Ethernet draft RFC In-Reply-to: Your message of Fri, 4 Nov 83 11:40 EST To: Charles Hornig Cc: TCP-IP@SRI-NIC.ARPA, Postel@USC-ISIF.ARPA, networks%SCRC-TENEX@MIT-MC.ARPA I know that the IP world generally talks big-endian (high byte first), but I'm sure that some poor PDP-11 programmer will put a hex 800 in the Ethernet type field and get it backwards unless you say something about byte order in the RFC. In RFC 870 (Assigned numbers), the hex Ethernet type fields are written like "08,00" which makes this more obvious.
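To make the byte-order hazard concrete, a short Python sketch follows. It packs the hex 800 type value high byte first (the "08,00" order) and then shows what a receiver would read if a low-byte-first machine wrote its native 16-bit word straight into the field. The constant and function names are illustrative only.

```python
import struct

ETHERTYPE_IP = 0x0800  # the "hex 800" Ethernet type value for IP


def type_field_big_endian(value):
    """Pack a 16-bit type value high byte first, i.e. as "08,00"."""
    return struct.pack(">H", value)


def type_field_native_little_endian(value):
    """What a low-byte-first machine emits if it stores the word unswapped."""
    return struct.pack("<H", value)


def read_type_field(two_bytes):
    """Interpret the two octets on the wire as a high-byte-first value."""
    return struct.unpack(">H", two_bytes)[0]


if __name__ == "__main__":
    right = type_field_big_endian(ETHERTYPE_IP)
    wrong = type_field_native_little_endian(ETHERTYPE_IP)
    print(right.hex(), hex(read_type_field(right)))
    print(wrong.hex(), hex(read_type_field(wrong)))
```

The unswapped field reads back as hex 8 rather than hex 800, so the receiver would not recognize the packet as IP at all.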
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110811222400> Return-Path: Received: from BRL-VGR by SRI-NIC with TCP; Tue 8 Nov 83 13:49:39-PST Date: Tue, 8 Nov 83 16:22:24 EST From: Ron Natalie To: Mike O'Dell@BRL-VGR.ARPA, mo@lbl-csam MMDF-Warning: Parse error in preceeding line at BRL-VGR.ARPA cc: BillW@sri-kl, tcp-ip@sri-nic Subject: Re: modularity vs efficiency in fragmentation. 1. For those of you who may have forgotten, gateways do not reassemble fragments. 2. The problem Mike ODell outlines does demonstrate a problem. Our TCP/IP was setting the data size to 1024 when talking to Berkeley who also liked to use that size. Packets were fragmented going through the ARPANET. When they reached the BRL-GATEWAY a 1004 byte fragment was pushed through our local network and then the 44 byte second fragment was sent. Due to a programming error of the network hardware while the remote host was processing the large 1004 packet the GATEWAY would get a busy notice. Believing this was a hard error the 44 byte packet was discarded. TCP retransmits were also lost because the smaller fragment was always lost behind the processing of the larger one. Of course there was no excuse for this behaviour, it was just that it was difficult to detect since it would only happen on packets that were fragmented with a large part. 3. If you wish to find out how big you can set the packet size before you can't send them without fragmenting, why not set the DF bit on a probe packet to see if it will fit through all the intervening network interfaces. -Ron ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110811370700> Return-Path: Received: from BRL-VGR by SRI-NIC with TCP; Tue 8 Nov 83 14:17:31-PST Date: Tue, 8 Nov 83 16:37:07 EST From: Ron Natalie To: Will Martin cc: Feedback@office-3, SNelson@office-3, TCP-IP@sri-nic Subject: Re: "Retransmitting" - Can we get rid of it? It's the TAC that says "Retransmitting." I don't know if anything can be done to stop it. 
The word "retransmitting" really is not a part of TCP/IP except that the TAC decides to print it whenever it encounters a particular condition. -Ron ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110817510000> Return-Path: Received: from OFFICE-2.ARPA by SRI-NIC with TCP; Wed 9 Nov 83 01:54:23-PST Date: 9-Nov-83 01:51 PST From: Robert N. Lieberman Subject: Re: "Retransmitting" - Can we get rid of it? To: WMartin at Office-3 (Will Martin) Cc: Feedback@OFFICE-3, SNelson@OFFICE-3, TCP-IP@SRI-NIC Message-ID: <[OFFICE-2]TYM-RLL-3I7JX> In-reply-to: <[OFFICE-3] 8-Nov-83 09:06:59.WMARTIN> Will I guess by now you got the word about the QUIET setting that BBN can implement for you. However, I am VERY curious about this. As you know we have had TCP related problems for 10 months. It has generally been assumed that the problems are net load related, i.e., they appear only when the net is sufficiently loaded between one TAC and one host (maybe the local IMP being loaded plays a role too). We based this on not being able to duplicate any of the problems at night even with extremely heavy loads on host and host to host via net loads. The problem, of course, is one of slowness with eventual disconnection. RETRANSMITTING has been reported many times as one of the messages that occurs when this problem begins to rear its head. Now, if I read your message right, you were getting this message at night, presumably when loads were low on the net (and host). If this can be repeated then I would think BBN and TYM would be VERY interested in exactly what was happening. It may not relate to the 'main' TCP problem but we don't want to leave a bit unflipped. Hopefully BBN will soon have their IMP trace package working so we can begin to see some data from the IMP (the key element missing for the last 10 months).
Robert ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983110903112400> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Wed 9 Nov 83 05:19:33-PST Date: 9 Nov 1983 8:11:24 EST (Wednesday) From: Jack Haverty Subject: Re: modularity vs efficiency in fragmentation. In-Reply-to: Your message of Tue, 8 Nov 83 16:22:24 EST To: Ron Natalie Cc: Mike O'Dell@BRL-VGR.ARPA, mo@lbl-csam, BillW@sri-kl, tcp-ip@sri-nic Re setting the DF bit to see if a certain packet size will fit through the internet -- in the current physical topology this might be useful, since there are few cases where alternate paths exist. In general however, since the internet is performing routing, there's no guarantee that successive packets take the same path from source to destination. Jack ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111010320000> Return-Path: Received: from MIT-MC by SRI-NIC with TCP; Thu 10 Nov 83 12:26:42-PST Date: 10 November 1983 15:32 EST From: David C. Plummer Subject: Re: IP on Ethernet draft RFC To: Mike.Accetta @ CMU-CS-IUS cc: TCP-IP @ SRI-NIC Received: from CMU-CS-IUS by SRI-NIC with TCP; Mon 7 Nov 83 16:54:50-PST Date: Monday, 7 November 1983 19:52:43 EST From: Mike.Accetta@CMU-CS-IUS Opcode= (3, 0) {GGP} (8, 0) {EGP} request and name servers by broadcasting an OpCode= (6, 53) {TCP port 53} (17, 53) {UDP port 53} How do you do (Chaos, DUMP-ROUTING-TABLE) ?? Not all the world is reduced to numbers. There are some protocols out there that try to be mnemonic. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111012050500> Return-Path: Received: from CMU-CS-IUS by SRI-NIC with TCP; Thu 10 Nov 83 14:08:51-PST Date: Thursday, 10 November 1983 17:05:05 EST From: Mike.Accetta@CMU-CS-IUS To: David C.Plummer cc: TCP-IP@SRI-NIC Subject: Re: IP on Ethernet draft RFC Message-ID: <1983.11.10.21.11.48.Mike.Accetta@CMU-CS-IUS> David, I've had no experience with CHAOS protocols so I have to admit that I tend to think in terms of the numbers those IP protocols with which I am familiar use.
The examples are perhaps a bit biased. Nothing precludes the interpretation of the protocol specific portion of the query as a string if that is appropriate for the protocol. It certainly must be of variable length to be useful for more than UDP and TCP and especially for protocols we can't foresee right now. I would argue, however, that whatever the interpretation of this field, it be fixed per protocol and that it should be the "natural" identifier for that protocol. Thus GGP and EGP perhaps have no identifier, UDP and TCP use 16-bit port numbers, the CHAOS stream protocol uses names, etc. - Mike ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111517120000> Return-Path: <@MIT-XX:Hornig@SCRC-QUABBIN> Received: from MIT-XX by SRI-NIC with TCP; Tue 15 Nov 83 19:12:15-PST Received: from SCRC-ASSABET by SCRC-QUABBIN with CHAOS; Tue 15-Nov-83 22:12:21-EST Date: Tuesday, 15 November 1983, 22:12-EST From: Charles Hornig Subject: Trailer Encapsulations To: TCP-IP at nic Message-ID: <831115221214.4.Hornig@QUABBIN.SCRC.Symbolics> Here is a draft RFC on Trailer Encapsulations. I feel that it is most understandable if it is made a separate document from the IP/Ethernet RFC. Comments? --------- Sam Leffler & Mike Karels University of California at Berkeley November 1983 Trailer Encapsulations This RFC discusses the motivation for use of "trailer encapsulations" on local-area networks and describes the implementation of such an encapsulation on various media. Introduction A trailer encapsulation is a link level packet format employed by 4.2BSD UNIX (among others). A trailer encapsulation, or "trailer", may be generated by a system under certain conditions in an effort to minimize the number and size of memory-to-memory copy operations performed by a receiving host when processing a data packet. Trailers are strictly a link level packet format and are not visible (when properly implemented) in any higher level protocol processing.
This note cites the motivation behind the trailer encapsulation and describes the trailer encapsulation packet formats currently in use on 3 Mb/s and 10 Mb/s Ethernets, and 10 Mb/s V2LNI ring networks. The use of a trailer encapsulation was suggested by Greg Chesson, and the encapsulation described here was designed by Bill Joy. Motivation Trailers are motivated by the overhead which may be incurred during protocol processing when one or more memory to memory copies must be performed. Copying can be required at many levels of processing, from moving data between the network medium and the host's memory, to passing data between the operating system and user address spaces. An optimal network implementation would expect to incur zero copy operations between delivery of a data packet into host memory and presentation of the appropriate data to the receiving process. While many packets may not be processed without some copying operations, when the host computer provides suitable memory management support it may often be possible to avoid copying simply by manipulating the appropriate virtual memory hardware. In a page mapped virtual memory environment, two prerequisites are usually required to achieve the goal of zero copy operations during packet processing. Data destined for a receiving agent must be aligned on a page boundary and must have a size which is a multiple of the hardware page size (or filled to a page boundary). The latter restriction assumes virtual memory protection is maintained at the page level, different architectures may alter these prerequisites. Data to be transmitted across a network may easily be segmented in the appropriate size, but unless the encapsulating protocol header information is fixed in size, alignment to a page boundary is virtually impossible. 
Protocol header information may vary in size due to the use of multiple protocols (each with a different header), or it may vary in size by agreement (for example, when optional information is included in the header). To insure page alignment the header information which prefixes data destined for the receiver must be reduced to a fixed size; this is normally the case at the link level of a network. By taking all (possibly) variable length header information and moving it to after the data segment a sending host may "do its best" in allowing the receiving host the opportunity to receive data on a page aligned boundary. This rearrangement of data at the link level to force variable length header information to "trail" the data is the substance of the trailer encapsulation. There are several implicit assumptions in the above argument. 1. The receiving host must be willing to accept trailers. As this is a link level encapsulation, unless a host to host negotiation is performed (preferably at the link level to avoid violating layering principles), only certain hosts will be able to converse, or their communication may be significantly impaired if trailer packets are mixed with non-trailer packets. 2. The cost of receiving data on a page aligned boundary should be comparable to receiving data on a non-page aligned boundary. If the overhead of insuring proper alignment is too high, the savings in avoiding copy operations may not be cost effective. 3. The size of the variable length header information should be significantly less than that of the data segment being transmitted. It is possible to move trailing information without physically copying it, but often implementation constraints and the characteristics of the underlying network hardware preclude merely remapping the header(s). 4. 
The memory to memory copying overhead which is expected to be performed by the receiver must be significant enough to warrant the added complexity in both the sending and receiving host software. The first point is well known and the motivation for this note. Thought has been given to negotiating the use of trailers on a per host basis using a variant of the Address Resolution Protocol (actually augmenting the protocol), but at present all systems using trailers require hosts sharing a network medium to uniformly accept trailers or never transmit them. (The latter is easily carried out at boot time in 4.2BSD without modifying the operating system source code.) The second point is (to our knowledge) moot. While a host may not be able to take advantage of the alignment and size properties of a trailer packet, it should nonetheless never hamper it. Regarding the third point, let us assume the trailing header information is copied and not remapped, and consider the header overhead in the TCP/IP protocols as a representative example. If we assume both the TCP and IP protocol headers are part of the variable length header information, then the smallest trailer packet (generated by a VAX) would have 512 bytes of data and 40+ bytes of header information (plus the trailer header described later). While the trailing header could have IP and/or TCP options included this would normally be rare (one would expect most TCP options, for example, to be included in the initial connection setup exchange) and certainly much smaller than 512 bytes. If the data segment is larger, the ratio decreases and the expected gain due to fewer copies on the receiving end increases. Given the relative overheads of a memory to memory copy operation and that of a page map manipulation (including translation buffer invalidation), the advantage is obvious. The fourth issue, we believe, is actually a non-issue.
In our implementation the additional code required to support the trailer encapsulation amounts to about a dozen lines of code in each link level "network interface driver". The resulting performance improvement more than warrants this minor investment in software. It should be recognized that modifying the network (and normal link) level format of a packet in the manner described forces the receiving host to buffer the entire packet before processing. Clever implementations may parse protocol headers as the packet arrives to find out the actual size (or network level packet type) of an incoming message. This allows these implementations to avoid preallocating maximum sized buffers to incoming packets which they can recognize as unacceptable. Implementations which parse the network level format on the fly are violating layering principles which have been extolled in design for some time (but often violated in implementation). The problem of postponing link level type recognition is a valid criticism. In the case of network hardware which supports DMA both arguments are moot.

Trailer Encapsulation Packet Formats

In this section we describe the link level packet formats used on the 3 Mb/s and 10 Mb/s Ethernet networks as well as the 10 Mb/s V2LNI ring network. The formats used in each case differ only in the format and type field values used in each of the local area network headers. The format of a trailer packet is shown in the following diagram.

+----+-------------------------------------------------+----+
| LH |                      data                       | TH |
+----+-------------------------------------------------+----+
     ^                      ( ^ )                      ^

LH: The fixed-size local network header. For a 10 Mb/s Ethernet, the 16-byte Ethernet header. The type field in the header indicates both the packet type (trailer) and the length of the data segment. For the 10 Mb/s Ethernet, the types are between 1001 and 1010 hexadecimal (4096 and 4112 decimal).
The type is calculated as 1000 (hex) plus the number of 512-byte pages of data. A maximum of 16 pages of data may be transmitted in a single trailer packet (8192 bytes).

data: The "data" portion of the packet. This is normally only data to be delivered to the receiving processes (i.e. it contains no TCP or IP header information). Data size is always a multiple of 512 bytes.

TH: The "trailer". This is actually a composition of the original protocol headers and a fixed size trailer prefix which defines the type and size of the trailing data. The format of a trailer is shown below.

The carets (^) indicate the page boundaries the receiving host is expected to use in receiving a trailer packet. The link level receiving routine is able to locate the trailer using the size indicated in the link level header's type field. The receiving routine is expected to discard the link level header and trailer prefix, and remap the trailing data segment to the front of the packet to regenerate the original network level packet format.

Trailer Format

+----------------+----------------+------~...~----------+
|      TYPE      | HEADER LENGTH  | ORIGINAL HEADER(S)  |
+----------------+----------------+------~...~----------+

Type: 16 bits. The type field encodes the original link level type of the transmitted packet. This is the value which would normally be placed in the link level header if a trailer were not generated.

Header length: 16 bits. The header length field of the trailer data segment. This specifies the length in bytes of the following header data.

Original headers: The header information which logically belongs before the data segment. This is normally the network and transport level protocol headers.

Summary

A link level encapsulation which promotes alignment properties necessary for the efficient use of virtual memory hardware facilities has been described. This encapsulation format is in use on many systems and is a standard facility in 4.2BSD UNIX.
The encapsulation provides an efficient mechanism by which cooperating hosts on a local network may obtain significant performance improvements. The use of this encapsulation technique currently requires uniform cooperation from all hosts on a network; hopefully a per host negotiation mechanism may be added to allow consenting hosts to utilize the encapsulation in a non-uniform environment. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111604581700> Return-Path: Received: from bbn-clxx by SRI-NIC with TCP; Wed 16 Nov 83 07:12:27-PST Date: 16 Nov 1983 9:58:17 EST (Wednesday) From: Morton D. Hoffman Subject: Re: Trailer Encapsulations In-Reply-to: Your message of Tuesday, 15 November 1983, 22:12-EST To: tcp=ip@nic Cc: TCP-IP@nic, mdh@BBN-UNIX I hesitate to enter the fray, partly because so much has already been said. However, I would like to suggest another approach. I believe that if trailer encapsulation is needed only between high-performance consenting hosts, a separate protocol might be used. I have seen the argument that the Berkeley folks don't want to develop a whole new protocol family, and they don't. On the Cronus project we needed to tweak IP so that our local net communication would not meet DOD specs (In our case, the motivation was reserving broadcast and multicast addresses). Instead of violating IP and calling it IP, we got a new protocol number for our variant of IP. Now we can send our IP or standard IP with the same code, and we know from the (Ethernet) protocol field which it is. Wouldn't a similar approach be appropriate for trailer encapsulation? One other comment: I don't believe that many people realize the motivation for trailer encapsulation. I do not believe that the goal is a fast Telnet -- rather, from my understanding of the Berkeley work, they're interested in keeping the overhead of low-level protocols minuscule, so that they can perform remote system calls as if they were local over a high-performance local net in a distributed OS.
Not really the game that IP/TCP is primarily designed to solve. I suspect that if the researchers involved could give a clear picture of their goals, the discussion would be more fruitful. Mort ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111606413600> Return-Path: Received: from bbnccq by SRI-NIC with TCP; Wed 16 Nov 83 09:07:27-PST Date: 16 Nov 1983 11:41:36 EST (Wednesday) From: Jonathan Dreyer Subject: Re: Trailer Encapsulations In-Reply-to: Your message of Tuesday, 15 November 1983, 22:12-EST To: tcp-ip at nic It looks as if trailers aren't going to go away overnight. Here is what seems to be the most obvious "negotiation" mechanism: If you want to talk trailers with me, send me an ARP request encapsulated in a trailer packet. If I understand trailers, I'll understand the ARP request and send you a reply in a trailer packet. If I don't, I will (should) ignore the packet because I don't know that Ethernet type field (have all 11 or 12 type fields been registered with the Ethernet name czars?) and if you want you can try me again without the trailers to see if I'm there at all. If you only talk trailers you should only respond to trailer- encapsulated ARP requests and not regular ones. This mechanism should be simple to implement for those who talk trailers and trivial (i.e. no change) for those who don't. I also believe that trailer implementations should be encouraged to talk non-trailer if they want to talk with the rest of the world (without rebooting!), not the other way around. Is the lowest trailer type field 1001 hex (indicating one 512-byte block of data) or 4096 decimal (indicating zero blocks)? The latter makes more sense, and allows an ARP packet to be trailer-encapsulated with an empty data portion. 
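As an editorial illustration (not part of the original message), the two readings of the type field contrasted in the question above can be put side by side in a few lines of Python. The base value 1000 hex, the 512-byte page, and the 16-page limit come from the draft; the function names are hypothetical, and allowing zero pages is the reading this message proposes, not something the draft states.

```python
TRAILER_BASE = 0x1000   # "1000 (hex) plus the number of 512-byte pages" (draft)
PAGE_SIZE = 512         # bytes per page (draft)
MAX_PAGES = 16          # "A maximum of 16 pages" (draft)

def trailer_type(pages):
    """Type field for a trailer packet carrying `pages` pages of data.

    Zero pages is allowed here only to illustrate the reading proposed
    in the message; the draft's hexadecimal range starts at one page.
    """
    if not 0 <= pages <= MAX_PAGES:
        raise ValueError("page count outside the draft's range")
    return TRAILER_BASE + pages

def data_length(type_field):
    """Data length in bytes implied by a trailer type field."""
    pages = type_field - TRAILER_BASE
    if not 0 <= pages <= MAX_PAGES:
        raise ValueError("not a trailer type")
    return pages * PAGE_SIZE

# 1001 hex is 4097 decimal, so the draft's "4096 decimal" corresponds
# to 1000 hex, i.e. a trailer with zero pages of data.
print(hex(trailer_type(1)), trailer_type(1), trailer_type(0))
```

Under this reading, a trailer-encapsulated ARP request would carry type 1000 hex and an empty data portion, with the ARP packet itself in the trailer.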
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111610351600> Return-Path: Received: from CMU-CS-C.ARPA by SRI-NIC with TCP; Wed 16 Nov 83 12:37:43-PST Received: ID ; Wed 16 Nov 83 15:35:20-EST Date: Wed 16 Nov 83 15:35:16-EST From: Vince Fuller Subject: IP drivers for TOPS-20 To: TCP-IP@SRI-NIC.ARPA I am interested in collecting a summary of all of the TOPS-20 IP interfaces that have been cobbled up thus far. I am particularly interested in those involving Ethernet (both 10MB and 3MB) and those which involve use of a normal TTY serial line, but I'd also like to see what else is out there. Please send responses to VAF@CMU-CS-C. Thanks, Vince Fuller Systems Programmer, CMU-CSD ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111717500000> Return-Path: Received: from MIT-MC by SRI-NIC with TCP; Thu 17 Nov 83 19:47:07-PST Date: 17 November 1983 22:50 EST From: David C. Plummer Subject: Trailer Encapsulations To: TCP-IP @ SRI-NIC For the 10 Mb/s Ethernet, the types are between 1001 and 1010 hexadecimal (4096 and 4112 decimal). The type is calculated as 1000 (hex) plus the number of 512-byte pages of data. A maximum of 16 pages of data may be transmitted in a single trailer packet (8192 bytes). Somebody is highly confused. The 10Mbit Ethernet has a maximum byte frame of 1500 bytes. Therefore, at most two 512-byte pages can fit, giving possible values 1001 and 1002. Maybe you intended this to be extensible for hardware that allows larger frames? As I recall, proNet does not use Ethernet type fields, but assigns their own. (This may have changed.) I can't imagine such a huge packet on the 3Mbit Ethernet. Land lines are already so slow that trailers aren't going to gain you anything. Can somebody clarify?
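As an editorial illustration (not part of the original message), the frame-size objection can be checked by laying out a trailer payload the way the draft describes and measuring it against the 1500-byte limit cited above. This is a minimal sketch: the function names are hypothetical, big-endian ordering of the two 16-bit trailer fields is an assumption the draft does not settle, and 0x0800 is used as the ordinary Ethernet type for IP.

```python
import struct

PAGE_SIZE = 512        # page size used by the draft
TRAILER_BASE = 0x1000  # trailer type = base + number of pages (draft)
MAX_PAYLOAD = 1500     # 10 Mb/s Ethernet limit cited in the message

def encapsulate(link_type, headers, data):
    """Return (type_field, payload) with the headers moved behind the data."""
    # The draft's hexadecimal range starts at one page, so empty data is refused.
    if len(data) == 0 or len(data) % PAGE_SIZE:
        raise ValueError("data must be a non-empty multiple of 512 bytes")
    # Trailer prefix: original link level type, then header length in bytes.
    # Big-endian packing is an assumption made for this sketch.
    prefix = struct.pack(">HH", link_type, len(headers))
    payload = data + prefix + headers
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("trailer packet does not fit in one frame")
    return TRAILER_BASE + len(data) // PAGE_SIZE, payload

def decapsulate(type_field, payload):
    """Return (link_type, packet) with the headers restored in front."""
    size = (type_field - TRAILER_BASE) * PAGE_SIZE
    data, trailer = payload[:size], payload[size:]
    link_type, header_len = struct.unpack(">HH", trailer[:4])
    return link_type, trailer[4:4 + header_len] + data

# Two pages plus 40 bytes of TCP/IP header and the 4-byte prefix fit;
# a third page alone (1536 bytes) already exceeds 1500.
type_field, payload = encapsulate(0x0800, b"H" * 40, b"D" * 1024)
print(hex(type_field), len(payload))
```

With these assumptions the largest type a 10 Mb/s Ethernet can carry is 1002 hex, which matches the estimate in the message; the larger values would only be reachable on media with bigger frames.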
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983111809341000> Return-Path: Received: from bbnccj by SRI-NIC with TCP; Fri 18 Nov 83 11:39:05-PST Date: 18 Nov 1983 14:34:10 EST (Friday) From: Jack Sax Subject: mailing list To: TCP-IP@sri-nic Please take me off of the TCP-IP mailing list Jack Sax ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112102430000> Return-Path: Received: from ddn1 by SRI-NIC with TCP; Mon 21 Nov 83 04:47:31-PST Date: 21 November 1983 07:43 EST From: dcab615 @ DDN1 Subject: mailing list To: tcp-ip @ sri-nic CC: ddn-dod @ DDN1 Date: November 21, 1983 Text: Please remove ddn-dod from the tcp-ip mailing list. Thanks. Vic Russell DDN/PMO B616 ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112222200000> Return-Path: Received: from UCB-VAX.ARPA by SRI-NIC with TCP; Wed 23 Nov 83 12:39:41-PST Received: from ucbpopuli.CC.Berkeley.ARPA (ucbpopuli.ARPA) by UCB-VAX.ARPA (4.21/4.15) id AA25767; Wed, 23 Nov 83 12:37:31 pst Received: from ucbjade.CC.Berkeley.ARPA by ucbpopuli.CC.Berkeley.ARPA (4.11/4.8) id AA06428; Wed, 23 Nov 83 12:37:53 pst Message-Id: <8311232014.AA16553@ucbjade.CC.Berkeley.ARPA> Received: by ucbjade.CC.Berkeley.ARPA (4.11/4.8) id AA16553; Wed, 23 Nov 83 12:14:53 pst Date: WED, NOV 23 1983 From: FOXEA%VPIVM1.BITNET@Berkeley (ED FOX) To: TCP-IP@SRI-NIC.ARPA Reply-To: FOXEA%VPIVM1.BITNET@BERKELEY.ARPA Subject: ADDITION TO MAILING LIST PLEASE ADD TWO ENTRIES TO YOUR MAILING LIST, AS FOLLOWS. FIRST, ADD ME AT THE ABOVE REPLY-TO ADDRESS. SECOND, ADD THE SHARED ACCOUNT GIVEN BELOW. MANY THANKS, ED FOX (CSNET TECHNICAL LIAISON).
COMSAT%VPIVM2.BITNET@BERKELEY.ARPA ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112808200000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 28 Nov 83 10:21:12-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 28 Nov 83 13:21:06 EST From: Christopher A Kent Message-Id: <8311281820.AA26154@merlin.ARPA> Received: by merlin.ARPA; Mon, 28 Nov 83 13:20:21 est Date: 28 Nov 1983 1320-EST (Monday) To: Jonathan Dreyer Cc: tcp-ip@nic.ARPA Subject: Re: Trailer Encapsulations In-Reply-To: Your message of 16 Nov 1983 11:41:36 EST (Wednesday). <8311161740.AA25738> Here is what seems to be the most obvious "negotiation" mechanism: If you want to talk trailers with me, send me an ARP request encapsulated in a trailer packet. If I understand trailers, I'll understand the ARP request and send you a reply in a trailer packet. If I don't, I will (should) ignore the packet because I don't know that Ethernet type field (have all 11 or 12 type fields been registered with the Ethernet name czars?) and if you want you can try me again without the trailers to see if I'm there at all. If you only talk trailers you should only respond to trailer- encapsulated ARP requests and not regular ones. Hmm. Doesn't this defeat the idea in the ARP whereby anyone can respond to a mapping request? Or has that gone away? Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112809090000> Return-Path: <@MIT-MC:DCP%SCRC-TENEX@MIT-MC> Received: from MIT-MC by SRI-NIC with TCP; Mon 28 Nov 83 11:13:04-PST Received: from SCRC-CHARLES by SCRC-QUABBIN with CHAOS; Mon 28-Nov-83 14:13:20-EST Date: Mon, 28 Nov 83 14:09 EST From: "David C. Plummer" Subject: Philosophy questions To: tcp-ip@SRI-NIC.ARPA This message queries about connection timeouts, mostly. All references are to the September 1981 printing of the TCP specification. Scenario 1: Suppose I have a connection that is open at both ends, but no data has travelled for many minutes (e.g., 60). 
TCP-FTP might be such an example. Is it the contract of TCP to probe the other connection occasionally to determine if it is still up? If yes: How is this probing done? Sending a zero length segment in the window will not receive an answer from the other side. However, sending a zero length segment outside the window (e.g., one below snd.nxt) in theory elicits a valid zero length segment giving current snd.nxt (in seg.seq) and rcv.nxt (in seg.ack) [bottom of page 69]. I understand this is not being conservative in what it does, and the other side may not be liberal enough to deal with this. Comments? If no: What if the foreign machine died 45 minutes ago? I don't know that because no data has been transferred in the last 60 minutes. Must I wait indefinitely before I learn (by getting a reset segment) that the other side isn't there? By comparison, the Chaosnet protocol has a packet called SNS (Sense) whose purpose is to elicit a STS (Status) which contains valid acknowledgement information. SNS are generated starting at 1 minute of connection idle time and are sent every 5 seconds. If the connection remains idle for 1.5 minutes (5 SNSes have been ignored) the connection is considered lost. SNS and STS are as valid as DATa packets in declaring the connection not idle. (SNS is also used in Chaos for the "probe the zero window" concept of TCP.) Scenario 2: Suppose one side of the connection goes into a debugger (or a query system). A human learns of this and takes corrective action which in the end results in data flow transparent to the TCP protocol. Suppose it takes 45 minutes (or more) between the time the debugger is entered and the user finally continues. What exactly is the meaning of "user timeout"? The TCP window is zero all this time, and presumably the sender is probing the zero window.
The window does not open, but the sender is receiving positive knowledge that the window isn't opening (because he is trying to send the next byte which is eliciting a reply [bottom of page 69 again]). Does the user timeout if the window does not open or if no response is received? Specification bug? The bottom of page 65 first reads If the state is LISTEN then ... second check for an ACK ... all acks are bad, send reset and return ... third check for SYN ... huh? The ack bit must be off, so why is SEG.ACK valid? This lossage continues onto the next page. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112809180000> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Mon 28 Nov 83 11:19:42-PST Date: Mon, 28 Nov 83 14:18 EST From: Dennis Rockwell Subject: Re: Trailer Encapsulations To: Christopher A Kent , Jonathan Dreyer Cc: tcp-ip@nic.ARPA Not at all. Consider this: my host talks trailers as well as normal encapsulation. When I want to resolve an address, I send out two requests, one trailered, one not. I record the style(s) of responses I get. If I get a trailer response, then I talk trailers to that host. If not, I don't. I answer a trailer request with a trailer response. About the only objection to all this that I can think of is a layering violation: information about the encapsulation (a link-level concept) needs to be passed in both directions between the address resolver and the local net protocol module. Other than that, it sounds like an appropriate solution. 
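As an editorial illustration (not part of the original message), the bookkeeping described above, where the resolver remembers which encapsulation styles each host has answered in, might be sketched as follows. The class and function names are hypothetical; only the rules (talk trailers to a host that gave a trailer response, answer a trailer request with a trailer response) come from the message.

```python
class EncapsulationCache:
    """Remembers which encapsulation styles each host has replied in."""

    def __init__(self):
        self._styles = {}  # host name -> set of "trailer" / "normal"

    def record_reply(self, host, trailered):
        # Called by the address resolver for every reply it receives.
        style = "trailer" if trailered else "normal"
        self._styles.setdefault(host, set()).add(style)

    def use_trailers(self, host):
        # Talk trailers only to hosts that gave a trailer response.
        return "trailer" in self._styles.get(host, set())

def reply_style(request_trailered):
    # A trailer request is answered with a trailer response.
    return "trailer" if request_trailered else "normal"

cache = EncapsulationCache()
cache.record_reply("host-a", trailered=False)
cache.record_reply("host-a", trailered=True)
cache.record_reply("host-b", trailered=False)
print(cache.use_trailers("host-a"), cache.use_trailers("host-b"))
```

The cache is the point where the layering concern in the message shows up: the local net module has to consult `use_trailers` on every send, so link-level state leaks out of the resolver.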
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112809230000> Return-Path: Received: from purdue.ARPA by SRI-NIC with TCP; Mon 28 Nov 83 11:30:04-PST Received: from merlin.ARPA by purdue.ARPA; Mon, 28 Nov 83 14:29:51 EST From: Christopher A Kent Message-Id: <8311281923.AA26586@merlin.ARPA> Received: by merlin.ARPA; Mon, 28 Nov 83 14:23:53 est Date: 28 Nov 1983 1423-EST (Monday) To: Dennis Rockwell Cc: tcp-ip@nic.ARPA, Jonathan Dreyer Subject: Re: Trailer Encapsulations In-Reply-To: Your message of Mon, 28 Nov 83 14:18 EST. <8311281919.AA16629> Fine, but my question/argument is that the response may not necessarily come from the host whose address is being resolved. I could envision a situation where we have an extremely simple host on the Ethernet, that doesn't know how to answer ARP requests; to talk to it, someone else has to answer for it. It broadcasts its mapping when it comes up, and never again; what if the responder doesn't speak trailers, but the dumb host does? Admittedly, this is a pathological case; but it's the best I could come up with for quick response. I just want to make sure that we don't limit the generality of the ARP by adding this solution. Cheers, chris ---------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112809470000> Return-Path: Received: from bbn-vax by SRI-NIC with TCP; Mon 28 Nov 83 11:49:52-PST Date: Mon, 28 Nov 83 14:47 EST From: Dennis Rockwell Subject: Re: Trailer Encapsulations To: Christopher A Kent Cc: Jonathan Dreyer , tcp-ip@nic In that case, it looks like the simple host won't get to use its trailer code. I would think anything clever enough to answer for somebody else could be smart enough to use trailers, especially since the "simple host" can. Trailers *are* pretty easy, after all. I think your pathological case is vacuous as well. The generality of ARP has been expanded, not limited.
----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112905510100> Return-Path: Received: from USC-ISIF.ARPA by SRI-NIC with TCP; Tue 29 Nov 83 13:53:13-PST Date: 29 Nov 1983 13:51:01 PST From: POSTEL@USC-ISIF Subject: re: Philosophy Questions To: DCP%SCRC-tenex@MIT-MC cc: TCP-IP@SRI-NIC David C. Plummer: In response to your questions about TCP on 28 Nov 83 1. Do the TCP modules exchange probes on an open but idle connection? No. If the other side died some time ago what does it matter? You will find out it is dead (or died and came alive again but forgot about you) as soon as you try to communicate with it. It is ok for the higher level applications to do some sort of probes with their counterparts if they think this is necessary. But, i don't have any example of when it is useful (from the user's point of view) to have TCP level, or application level, probes. [The TCP module could clean up its connection records a bit sooner perhaps, but i don't think the communication cost is worth that small and infrequent savings.] 2. What is the meaning of the "user timeout"? The general idea of the user timeout is that if the user process has said "send this data" to the TCP module and the TCP module can't get it sent (get an ACK for it) within the user timeout period the TCP module should let the user process know that (e.g., give it a pseudo interrupt). The timeout goes off if the data is not ACKed within the period. Getting ACKs to zero-window probes does not reset the timer. In the specification, it clearly says that when the User Timeout goes off, the TCP should ABORT the connection. Current thinking is that this is a bad idea, and that rather the TCP should simply notify the user process of the situation and keep trying. Let the user process decide what to do about it. 3. Specification Bug (Page 65) Yes. This RESET and the next one should be of the form --jon.
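As an editorial illustration (not part of the original message), the user timeout rule in point 2 might be sketched as a small timer object. All names are hypothetical. Two choices here are assumptions rather than anything stated in the message: that an ACK covering new data restarts the clock for whatever data is still outstanding, and that expiry is reported to the caller (the "notify and keep trying" behaviour) instead of aborting.

```python
class UserTimeout:
    """Tracks the user timeout for data handed to TCP but not yet ACKed."""

    def __init__(self, period):
        self.period = period   # how long the user is willing to wait
        self.deadline = None   # None while nothing is outstanding

    def send(self, now):
        # Start the clock when data first becomes outstanding.
        if self.deadline is None:
            self.deadline = now + self.period

    def ack(self, now, acked_new_data, still_outstanding):
        # A reply to a zero-window probe acknowledges nothing new,
        # so it leaves the deadline untouched.
        if not acked_new_data:
            return
        # Assumption: progress restarts the clock for remaining data.
        self.deadline = now + self.period if still_outstanding else None

    def expired(self, now):
        # On expiry the caller would notify the user process, not abort.
        return self.deadline is not None and now >= self.deadline

timer = UserTimeout(period=60)
timer.send(now=0)
timer.ack(now=30, acked_new_data=False, still_outstanding=True)  # probe reply
print(timer.expired(now=59), timer.expired(now=60))
```

In the debugger scenario from the earlier message, the sender keeps getting probe replies with a zero window, none of which touch the deadline, so the timeout still fires once the period elapses.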
------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112912490000> Return-Path: <@MIT-MC:DCP%SCRC-TENEX@MIT-MC> Received: from MIT-MC by SRI-NIC with TCP; Tue 29 Nov 83 14:49:54-PST Received: from SCRC-CHARLES by SCRC-QUABBIN with CHAOS; Tue 29-Nov-83 17:50:37-EST Date: Tue, 29 Nov 83 17:49 EST From: "David C. Plummer" Subject: re: Philosophy Questions To: POSTEL@USC-ISIF.ARPA, DCP%SCRC-TENEX@MIT-MC.ARPA Cc: TCP-IP@SRI-NIC.ARPA In-reply-to: The message of 29 Nov 83 16:51-EST from POSTEL at USC-ISIF Date: 29 Nov 1983 13:51:01 PST From: POSTEL@USC-ISIF 1. Do the TCP modules exchange probes on an open but idle connection? No. If the other side died some time ago what does it matter? Analogy (though not a very good one): I balance my checkbook every month to make sure there aren't any mistakes. (One time, it turned out my next packet was stolen from my dormroom and $420 was forged. Anyway...) TCP doesn't balance the checkbook, but instead continues to write checks (idles) until one bounces (other side wakes up). I don't want to argue the philosophy, I just wanted clarification. The only place this matters is on low address space machines who would want to know if they are hanging onto an already lost connection. 2. What is the meaning of the "user timeout"? You answered the question I didn't ask correctly. To restate what I think you said: The user timeout applies to acknowledgement of already sent sequence numbers, not to delays in opening a zero window. 3. Specification Bug (Page 65) Yes. This RESET and the next one should be of the form OK. Thanks for replying. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112914020000> Return-Path: Received: from CMU-CS-A by SRI-NIC with TCP; Tue 29 Nov 83 16:11:36-PST Received: from [128.2.254.192] by CMU-CS-PT with CMUFTP; 29 Nov 83 18:53:18 EST Date: 29 Nov 83 1902 EST (Tuesday) From: don.provan@CMU-CS-A To: "David C. Plummer" Subject: Re: Philosophy Questions CC: tcp-ip@SRI-NIC In-Reply-To: "\"David C.
Plummer\"'s message of 29 Nov 83 17:49-EST" Message-Id: <29Nov83.190232.DP0N@CMU-CS-A> i wanted to clean up dead connections, too. i have limited space for connections information, so i didn't want to waste it. note that these connections are often kept forever, particularly if they are connections to a server of mine that doesn't time out. (that reminds me: i notice most of us timeout idle FTP connections, even though there doesn't seem to be any indication in the specs that this is legal.) anyway, to detect connections to a host that has crashed, on idle connections i send out a spontaneous ACK every so often. obviously 1822 and ICMP detectable problems will be reported. if the host has crashed and is back up now, it sees a packet for a non-existent connection and resets the connection. if nothing is wrong, the ACK is perfectly legal (the remote TCP sees it as an ACK that was delivered twice, so it is ignored) and everyone is happy. i can imagine situations where the ACK will disappear without an error, but these should be infrequent enough to not be a problem. ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983112919100000> Return-Path: Received: from JPL-VAX.ARPA by SRI-NIC with TCP; Wed 30 Nov 83 03:12:09-PST Date: 30 Nov 1983 0310 PST From: Eric P. Scott Subject: Not an RFC inspired by a somewhat infamous computer game To: TCP-IP@SRI-NIC Reply-To: EPS@JPL-VAX Network Hacking Group E. P. Scott Request for Kludges: XXX JPL November, 1983 Updates: RFC 821 SMTP POLYMORPH COMMAND Preface The purpose of this document is to present a partial workaround for an anticipated future problem in the ARPA Internet. It is hoped that it will prove unnecessary to adopt such an overtly ridiculous strategy as a practical means of preserving connectivity. The views expressed are the author's own and do not necessarily reflect those of NASA, the Jet Propulsion Laboratory, or the California Institute of Technology. Treat them as satirical in nature. 
Introduction Current plans call for the activation of "access control filters" in the six IP gateways that bridge MILNET and ARPANET on February 1, 1984 [1]. In most cases communications will be restricted to mail only. Sites such as ours which are in the wrong Community Of Interest vis a vis the sites they need to communicate with because the politics run counter to legitimate research needs will be unable to participate as functional nodes after this date. A trivial solution is of course to abandon the plan to segregate the two networks [2]. Assuming that this does not happen and the controls go into effect on schedule, it will be necessary to defeat the "protection" offered in order to "stay in business" (unless the Powers That Be can be convinced to put us back on the other side of the fence before then). Even so, other sites share a similar plight. Description I propose to implement an extension to the SMTP protocol [3] that would allow services such as Telnet [4] to be accessed via port 25. The extension consists of the new command POLY which accepts as its parameter a keyword identifying the service desired. Successful execution of POLY returns a 250 reply and replaces the SMTP server. It is assumed that POLY would be given as the first command in a session in order to avoid considering the implications of arbitrary placement. A typical scenario might look like: @TELNET TELNET>INTERNET (HOST) JPL-VLSI (ON PORT) 25 Trying... Open 220 JPL-VLSI.DDN SMTP Service POLY TELNET 250 Toto, I don't think we're in SMTP anymore! JPL VLSI Design Center VAX... Scott [Page 1] SMTP Polymorph Command RFK XXX Syntax POLY Replies S: 250 E: 500, 501, 502, 503, 504, 421 Concluding Comments "Polymorph" commands are nothing new; SMTP's TURN can be considered one; perhaps a better example is the Telnet SUPDUP Option [5]. FTP presents a special problem since data is not transmitted over the telnet connection. This won't help (most) TAC users. 
References [1] DDN Program Management Office, "Further Details on the MILNET/ ARPANET Split," in DDN Newsletter no. 28, Network Information Center, SRI International, July 1983. [2] Muuss, M., "On the Undesirability of `Mail Bridges' as a Security Measure," in TCP-IP Digest, vol. 2, no. 18, BRL, October 1983. [3] Postel, J., "Simple Mail Transfer Protocol," RFC 821, USC/ Information Sciences Institute, August 1982. [4] Postel, J. and J. Reynolds, "Telnet Protocol Specification," RFC 854, USC/Information Sciences Institute, May 1983. [5] Crispin, M., "Telnet SUPDUP Option," RFC 736, NIC 42213, Stanford Artificial Intelligence Laboratory, October 1977. Scott [Page 2] ------ ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113007170000> Return-Path: Received: from USC-ISIB.ARPA by SRI-NIC with TCP; Wed 30 Nov 83 15:22:14-PST Date: 30 Nov 1983 1517-PST Subject: Netmail Spreads Common Cold From: Craig Milo Rogers To: TCP-IP@SRI-NIC Marina del Rey, CA (IP) -- Public health officials reported a sharp upswing in common colds among computer scientists this year. The new cold strains originally appear in major computer centers, then spread throughout the country in a matter of hours. Researchers grappling with this issue have concluded that there is only one possible explanation for the sudden appearance and rapid dissemination of the colds: they are spread through electronic mail. It is a long established fact that colds and other diseases may be transmitted through the mail. Viruses and bacteria accumulate on a letter while it is being written. The viruses and bacteria are dormant while the letter is in transit. When the letter is opened, the viruses and bacteria are shaken into the air and inhaled by the recipient, who becomes infected. A lesser-known fact is that colds may be spread over the phone. This usually occurs when an infected individual sneezes into a public phone.
The next individual to use that same phone will often be infected by the viruses and bacteria on the phone's mouthpiece. However, what most people don't know is that when a person with a cold sneezes into a phone, the person at the other end may be infected if they were holding their phone close enough for the germs to enter their ear canal. It is now possible to demonstrate similar effects for Internet mail. If a person sneezes while sending a message in Hermes or MM, the recipient stands a fair chance of catching the same cold. Strangely enough, this effect has not occurred with multimedia mail, perhaps because it currently uses UDP datagrams instead of TCP connections between the user terminals and the mail forwarders. Other electronic mail systems also spread diseases. For example, UUCP spreads Unix. Of particular concern are the electronic mailing lists. Each message sent to one of these lists is replicated and retransmitted to dozens or even hundreds of recipients. A single infected message can strike dozens of victims coast-to-coast within a matter of minutes. Public health officials are quite worried about MCI mail, which uses both printed and electronic delivery systems, thus threatening the health of the entire nation. Internet Header Health Inspectors will work closely with the Protocol Police in the next few months to develop methods of dealing with infected packets. Netmail may be delayed at Internet Gateways if the Innoculated-by: records are not current. The EGP Quarantine command will be used to isolate Autonomous Systems which are suspected of sending contaminated datagrams. A recently released DoD report suggests that part of the impetus behind the ARPANET/Milnet split and the current partitioned network research is to minimize the possible effects of Internet Bacteriological Warfare. These problems are also being pursued by the International Standards Organization.
The committee on Open Systems Innoculation (ISO/OSInnoc) recently released a draft report on a 7-layer cold encapsulation for use by the World Health Organization in Third World Nations. ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113007313700> Mail-From: KLH created at 30-Nov-83 15:31:37 Date: Wed 30 Nov 83 15:31:37-PST From: Ken Harrenstien Subject: Re: Philosophy, Consistency question. To: DCP%SCRC-TENEX@MIT-MC, tcp-ip@SRI-NIC cc: KLH@SRI-NIC In-Reply-To: Message from ""David C. Plummer" " of Wed 30 Nov 83 12:50:27-PST ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113007415600> Mail-From: KLH created at 30-Nov-83 15:41:57 Date: Wed 30 Nov 83 15:41:56-PST From: Ken Harrenstien Subject: Re: Philosophy, Consistency question. To: DCP%SCRC-TENEX@MIT-MC, tcp-ip@SRI-NIC In-Reply-To: Message from ""David C. Plummer" " of Wed 30 Nov 83 12:50:27-PST Sorry, MM interprets ^L as "go send it now" and my instinctive attempt to clear the screen sent you a piece of junk instead. It sounds as if you are trying to implement TCP/IP by reading the "event processing" stuff in RFC-793. Since I did the same thing, you might find it helpful to read the file KLH;.TCP QS from MIT-MC, which is an assorted bunch of questions and answers about TCP that I collected while running into the same puzzling things that you seem to be running into now. I hope some of that stuff is helpful and may eliminate some TCP-IP traffic... Of course the right thing is to have all of these tidbits collected into a "TCP/IP document fix file". I trust that Jon is keeping such a file, although it may not be in publicly readable form yet. 
--Ken ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113009200000> Return-Path: Received: from KESTREL by SRI-NIC with TCP; Wed 30 Nov 83 23:39:29-PST Date: 30 Nov 1983 1720-PST From: Lynn Gold Subject: Re: Netmail Spreads Common Cold To: ROGERS at USC-ISIB, TCP-IP at SRI-NIC Address: Kestrel Institute, 1801 Page Mill Rd., Palo Alto, CA 94304 Phone: (415) 494-2233 In-Reply-To: Your message of 30-Nov-83 1517-PST AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA-CHOOOOOOO!!!!!!!!!!!!!!!!!! Cough, cough, cough, hack, hack, hack, ahem, ahem... Look out, folks -- I think I've caught something! --Lynn ------- ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113010480000> Return-Path: <@MIT-MC:DCP%SCRC-TENEX@MIT-MC> Received: from MIT-MC by SRI-NIC with TCP; Wed 30 Nov 83 12:50:08-PST Received: from SCRC-CHARLES by SCRC-QUABBIN with CHAOS; Wed 30-Nov-83 15:49:17-EST Date: Wed, 30 Nov 83 15:48 EST From: "David C. Plummer" Subject: Philosophy, Consistency question. To: tcp-ip@SRI-NIC.ARPA Hi again. Another clarification please. Again, reference September 1981 printing of TCP, though I doubt this has changed. SND.NXT is the number of next byte I am about to send (if allowed). Fine. SEG.ACK is the number the other side is waiting to receive. It does NOT ack the number it says, but acks everything below it. Therefore, this number is EXCLUSIVE. Correct? (Diagram on page 20): SND.UNA points to the first byte of unacknowledged sent data. SND.UNA+SND.WND is the first byte I am NOT allowed to send. This is an EXCLUSIVE number of what I CAN send. Correct? Similarly, RCV.NXT+RCV.WND is an EXCLUSIVE limit of what the other side is allowed to send. Correct? (Page 56, SEND call, the point of this message) "If the urgent flag is set, then SND.UP <- SND.NXT-1 and set the urgent pointer in outgoing segments." When I receive urgent data, everything up to AND INCLUDING the urgent byte number is urgent. Therefore, this is an INCLUSIVE number? Yes? No? Why the inconsistency? 
Or don't others see it this way? Because it is inclusive, "to send an urgent indication the user must also send at least one data octet." [page 42] However, page 69 states "If the RCV.WND is zero, no segments will be acceptable, but special allowance should be made to accept valid ACKS, URGs, and RSTs." As far as I can tell, URGs will only get processed with a zero window when the segment length is zero and SEG.SEQ = RCV.NXT = RCV.NXT+RCV.WND. Therefore, the other side's SND.NXT = SEG.SEQ (both exclusive) and the urgent pointer is an inclusive number that can no longer be reached (unless the urgent field is a signed 16 bit number). Yes? No? I think this inclusive/exclusive question makes the segment acceptability test look worse than it needs to be (diagrams on pages 26 and 69). It currently reads RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND Isn't the following equivalent? (Drop the -1 and adjust the comparisons) RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT < SEG.SEQ+SEG.LEN =< RCV.NXT+RCV.WND ----MESSAGE-END---- ----MESSAGE-BEGIN---- <1983113011272500> Return-Path: Received: from USC-ISIF.ARPA by SRI-NIC with TCP; Wed 30 Nov 83 19:31:10-PST Date: 30 Nov 1983 19:27:25 PST From: POSTEL@USC-ISIF Subject: re: Philosophy, Consistency, and Questions To: tcp-ip@SRI-NIC David C. Plummer: DCP: SND.NXT is the number of next byte I am about to send (if allowed). Fine. DCP: SEG.ACK is the number the other side is waiting to receive. It does NOT ack the number it says, but acks everything below it. Therefore, this number is EXCLUSIVE. Correct? JBP: Yes. DCP: Reference the Diagram on page 20: SND.UNA points to the first byte of unacknowledged sent data. SND.UNA+SND.WND is the first byte I am NOT allowed to send. This is an EXCLUSIVE number of what I CAN send. Correct? JBP: Yes. DCP: Similarly, RCV.NXT+RCV.WND is an EXCLUSIVE limit of what the other side is allowed to send. Correct? JBP: Yes.
DCP: Reference Page 56, SEND call, the point of this message: "If the urgent flag is set, then SND.UP <- SND.NXT-1 and set the urgent pointer in outgoing segments." When I receive urgent data, everything up to AND INCLUDING the urgent byte number is urgent. Therefore, this is an INCLUSIVE number? Yes? No? Why the inconsistency? Or don't others see it this way? JBP: Yes. I don't know that anyone else ever gave much thought to the inclusive/exclusive categorization of these numbers. It may be philosophically inconsistent from that point of view, but it does not seem to have any implementation problems. DCP: Because it is inclusive, "to send an urgent indication the user must also send at least one data octet" [page 42]. JBP: Not because it is inclusive or exclusive, but because you have to send something sequence numbered (i.e., a data octet) to get it delivered reliably. DCP: However, page 69 states "If the RCV.WND is zero, no segments will be acceptable, but special allowance should be made to accept valid ACKS, URGs, and RSTs". As far as I can tell, URGs will only get processed with a zero window when the segment length is zero and SEG.SEQ = RCV.NXT = RCV.NXT+RCV.WND. Therefore, the other side's SND.NXT = SEG.SEQ (both exclusive) and the urgent pointer is an inclusive number that can no longer be reached (unless the urgent field is a signed 16 bit number). Yes? No? JBP: The Urgent Pointer is an unsigned offset added to the sequence number of the segment that carries it. It would be pointless to try to make something already ACKed urgent retroactively. The sending TCP is supposed to send the urgent pointer in every segment it sends with data up to and including the urgent pointer until that data is acked. This includes retransmissions of data given to the TCP in earlier send calls. This leads to the segment sent with SEG.SEQ = SND.NXT having the urgent pointer set. When the window is zero this sequence number is also the receiving TCP's RCV.NXT.
When the receiving TCP, EVEN WITH A ZERO RECEIVE WINDOW, receives a segment with SEG.SEQ=RCV.NXT it must check to see if it has an urgent pointer and if so do the urgent pointer processing (as described in the last paragraph on page 73). This is the "special allowance" referred to on page 69. DCP: I think this inclusive/exclusive question makes the segment acceptability test look worse than it needs to be (diagrams on pages 26 and 69). It currently reads RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT =< SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND Isn't the following equivalent? (Drop the -1 and adjust the comparisons) RCV.NXT =< SEG.SEQ < RCV.NXT+RCV.WND or RCV.NXT < SEG.SEQ+SEG.LEN =< RCV.NXT+RCV.WND JBP: Yes. --jon. ------- ----MESSAGE-END----
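[Editorial sketch, not part of the original correspondence.] The equivalence confirmed at the end of the exchange above can be checked by brute force. The sketch below is an illustration only: the function names are hypothetical, sequence numbers are treated as plain unbounded integers (32-bit wraparound is deliberately ignored), and the two forms are simply compared over a small range of values.

```python
# Assumption: plain integers, no modulo-2**32 wraparound. Names are illustrative.

def acceptable_spec(rcv_nxt, rcv_wnd, seg_seq, seg_len):
    """Form quoted from the spec: the last octet is SEG.SEQ+SEG.LEN-1 (inclusive)."""
    first_in = rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    last_in = rcv_nxt <= seg_seq + seg_len - 1 < rcv_nxt + rcv_wnd
    return first_in or last_in


def acceptable_dcp(rcv_nxt, rcv_wnd, seg_seq, seg_len):
    """Plummer's form: drop the -1 and use an exclusive end, SEG.SEQ+SEG.LEN."""
    first_in = rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    end_in = rcv_nxt < seg_seq + seg_len <= rcv_nxt + rcv_wnd
    return first_in or end_in


def forms_agree(limit=12, rcv_nxt=100):
    """Compare both forms for every small window, offset, and segment length."""
    for rcv_wnd in range(limit + 1):
        for seg_seq in range(rcv_nxt - limit, rcv_nxt + 2 * limit + 1):
            for seg_len in range(limit + 1):
                a = acceptable_spec(rcv_nxt, rcv_wnd, seg_seq, seg_len)
                b = acceptable_dcp(rcv_nxt, rcv_wnd, seg_seq, seg_len)
                if a != b:
                    return False
    return True
```

For integers, a =< x-1 holds exactly when a < x, and x-1 < b holds exactly when x =< b, which is why the two forms cannot disagree. A real implementation would still need to do these comparisons modulo 2**32.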