Source file src/encoding/gob/doc.go

		 1  // Copyright 2009 The Go Authors. All rights reserved.
		 2  // Use of this source code is governed by a BSD-style
		 3  // license that can be found in the LICENSE file.
		 4  
		 5  /*
		 6  Package gob manages streams of gobs - binary values exchanged between an
		 7  Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
		 8  arguments and results of remote procedure calls (RPCs) such as those provided by
		 9  package "net/rpc".
		10  
		11  The implementation compiles a custom codec for each data type in the stream and
		12  is most efficient when a single Encoder is used to transmit a stream of values,
		13  amortizing the cost of compilation.
		14  
		15  Basics
		16  
		17  A stream of gobs is self-describing. Each data item in the stream is preceded by
		18  a specification of its type, expressed in terms of a small set of predefined
		19  types. Pointers are not transmitted, but the things they point to are
		20  transmitted; that is, the values are flattened. Nil pointers are not permitted,
		21  as they have no value. Recursive types work fine, but
		22  recursive values (data with cycles) are problematic. This may change.
		23  
		24  To use gobs, create an Encoder and present it with a series of data items as
		25  values or addresses that can be dereferenced to values. The Encoder makes sure
		26  all type information is sent before it is needed. At the receive side, a
		27  Decoder retrieves values from the encoded stream and unpacks them into local
		28  variables.
		29  
		30  Types and Values
		31  
		32  The source and destination values/types need not correspond exactly. For structs,
		33  fields (identified by name) that are in the source but absent from the receiving
		34  variable will be ignored. Fields that are in the receiving variable but missing
		35  from the transmitted type or value will be ignored in the destination. If a field
		36  with the same name is present in both, their types must be compatible. Both the
		37  receiver and transmitter will do all necessary indirection and dereferencing to
		38  convert between gobs and actual Go values. For instance, a gob type that is
		39  schematically,
		40  
		41  	struct { A, B int }
		42  
		43  can be sent from or received into any of these Go types:
		44  
		45  	struct { A, B int }	// the same
		46  	*struct { A, B int }	// extra indirection of the struct
		47  	struct { *A, **B int }	// extra indirection of the fields
		48  	struct { A, B int64 }	// different concrete value type; see below
		49  
		50  It may also be received into any of these:
		51  
		52  	struct { A, B int }	// the same
		53  	struct { B, A int }	// ordering doesn't matter; matching is by name
		54  	struct { A, B, C int }	// extra field (C) ignored
		55  	struct { B int }	// missing field (A) ignored; data will be dropped
		56  	struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.
		57  
		58  Attempting to receive into these types will draw a decode error:
		59  
		60  	struct { A int; B uint }	// change of signedness for B
		61  	struct { A int; B float64 }	// change of type for B
		62  	struct { }			// no field names in common
		63  	struct { C, D int }		// no field names in common
		64  
		65  Integers are transmitted two ways: arbitrary precision signed integers or
		66  arbitrary precision unsigned integers. There is no int8, int16 etc.
		67  discrimination in the gob format; there are only signed and unsigned integers. As
		68  described below, the transmitter sends the value in a variable-length encoding;
		69  the receiver accepts the value and stores it in the destination variable.
		70  Floating-point numbers are always sent using IEEE-754 64-bit precision (see
		71  below).
		72  
		73  Signed integers may be received into any signed integer variable: int, int16, etc.;
		74  unsigned integers may be received into any unsigned integer variable; and floating
		75  point values may be received into any floating point variable. However,
		76  the destination variable must be able to represent the value or the decode
		77  operation will fail.
		78  
		79  Structs, arrays and slices are also supported. Structs encode and decode only
		80  exported fields. Strings and arrays of bytes are supported with a special,
		81  efficient representation (see below). When a slice is decoded, if the existing
		82  slice has capacity the slice will be extended in place; if not, a new array is
		83  allocated. Regardless, the length of the resulting slice reports the number of
		84  elements decoded.
		85  
		86  In general, if allocation is required, the decoder will allocate memory. If not,
		87  it will update the destination variables with values read from the stream. It does
		88  not initialize them first, so if the destination is a compound value such as a
		89  map, struct, or slice, the decoded values will be merged elementwise into the
		90  existing variables.
		91  
		92  Functions and channels will not be sent in a gob. Attempting to encode such a value
		93  at the top level will fail. A struct field of chan or func type is treated exactly
		94  like an unexported field and is ignored.
		95  
		96  Gob can encode a value of any type implementing the GobEncoder or
		97  encoding.BinaryMarshaler interfaces by calling the corresponding method,
		98  in that order of preference.
		99  
	 100  Gob can decode a value of any type implementing the GobDecoder or
	 101  encoding.BinaryUnmarshaler interfaces by calling the corresponding method,
	 102  again in that order of preference.
	 103  
	 104  Encoding Details
	 105  
	 106  This section documents the encoding. These details are not important for most
	 107  users. They are presented bottom-up.
	 108  
	 109  An unsigned integer is sent one of two ways. If it is less than 128, it is sent
	 110  as a byte with that value. Otherwise it is sent as a minimal-length big-endian
	 111  (high byte first) byte stream holding the value, preceded by one byte holding the
	 112  byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
	 113  256 is transmitted as (FE 01 00).
	 114  
	 115  A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
	 116  
	 117  A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
	 118  upward contain the value; bit 0 says whether they should be complemented upon
	 119  receipt. The encode algorithm looks like this:
	 120  
	 121  	var u uint
	 122  	if i < 0 {
	 123  		u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
	 124  	} else {
	 125  		u = (uint(i) << 1) // do not complement i, bit 0 is 0
	 126  	}
	 127  	encodeUnsigned(u)
	 128  
	 129  The low bit is therefore analogous to a sign bit, but making it the complement bit
	 130  instead guarantees that the largest negative integer is not a special case. For
	 131  example, -129=^128=(^256>>1) encodes as (FE 01 01).
	 132  
	 133  Floating-point numbers are always sent as a representation of a float64 value.
	 134  That value is converted to a uint64 using math.Float64bits. The uint64 is then
	 135  byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
	 136  exponent and high-precision part of the mantissa go first. Since the low bits are
	 137  often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
	 138  three bytes (FE 31 40).
	 139  
	 140  Strings and slices of bytes are sent as an unsigned count followed by that many
	 141  uninterpreted bytes of the value.
	 142  
	 143  All other slices and arrays are sent as an unsigned count followed by that many
	 144  elements using the standard gob encoding for their type, recursively.
	 145  
	 146  Maps are sent as an unsigned count followed by that many key, element
	 147  pairs. Empty but non-nil maps are sent, so if the receiver has not allocated
	 148  one already, one will always be allocated on receipt unless the transmitted map
	 149  is nil and not at the top level.
	 150  
	 151  In slices, arrays, and maps, every element is transmitted, including zero-valued
	 152  elements, even if all the elements are zero.
	 153  
	 154  Structs are sent as a sequence of (field number, field value) pairs. The field
	 155  value is sent using the standard gob encoding for its type, recursively. If a
	 156  field has the zero value for its type (except for arrays; see above), it is omitted
	 157  from the transmission. The field number is defined by the type of the encoded
	 158  struct: the first field of the encoded type is field 0, the second is field 1,
	 159  etc. When encoding a value, the field numbers are delta encoded for efficiency
	 160  and the fields are always sent in order of increasing field number; the deltas are
	 161  therefore unsigned. The initialization for the delta encoding sets the field
	 162  number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
	 163  delta = 1, unsigned value = 7 or (01 07). Finally, after all the fields have been
	 164  sent a terminating mark denotes the end of the struct. That mark is a delta=0
	 165  value, which has representation (00).
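
The delta encoding of field numbers can be observed by looking at the tail of an encoded message. The sketch below uses a hypothetical three-field type T and a helper named encodeBytes. It inspects only the final bytes, since the leading bytes carry the type description and message framing.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// T is a hypothetical struct with three unsigned fields, numbered 0, 1, 2.
type T struct {
	A, B, C uint
}

// encodeBytes returns everything a fresh Encoder writes for v.
func encodeBytes(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// tail formats the last n bytes of b.
func tail(b []byte, n int) string {
	return fmt.Sprintf("% x", b[len(b)-n:])
}

func main() {
	for _, v := range []T{{A: 7}, {C: 7}, {A: 1, C: 2}} {
		b, err := encodeBytes(v)
		if err != nil {
			log.Fatal(err)
		}
		// Each tail is (delta value)* followed by the 00 terminator.
		fmt.Printf("%+v ends with %s\n", v, tail(b, 5))
	}
}
```

Zero-valued fields are skipped, so T{C: 7} reaches field 2 with a single delta of 3 from the initial field number of -1.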
	 166  
	 167  Interface types are not checked for compatibility; all interface types are
	 168  treated, for transmission, as members of a single "interface" type, analogous to
	 169  int or []byte - in effect they're all treated as interface{}. Interface values
	 170  are transmitted as a string identifying the concrete type being sent (a name
	 171  that must be pre-defined by calling Register), followed by a byte count of the
	 172  length of the following data (so the value can be skipped if it cannot be
	 173  stored), followed by the usual encoding of the concrete (dynamic) value stored in
	 174  the interface value. (A nil interface value is identified by the empty string
	 175  and transmits no value.) Upon receipt, the decoder verifies that the unpacked
	 176  concrete item satisfies the interface of the receiving variable.
	 177  
	 178  If a value is passed to Encode and the type is not a struct (or pointer to struct,
	 179  etc.), for simplicity of processing it is represented as a struct of one field.
	 180  The only visible effect of this is to encode a zero byte after the value, just as
	 181  after the last field of an encoded struct, so that the decode algorithm knows when
	 182  the top-level value is complete.
	 183  
	 184  The representation of types is described below. When a type is defined on a given
	 185  connection between an Encoder and Decoder, it is assigned a signed integer type
	 186  id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
	 187  the type of v and all its elements and then it sends the pair (typeid, encoded-v)
	 188  where typeid is the type id of the encoded type of v and encoded-v is the gob
	 189  encoding of the value v.
	 190  
	 191  To define a type, the encoder chooses an unused, positive type id and sends the
	 192  pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
	 193  description, constructed from these types:
	 194  
	 195  	type wireType struct {
	 196  		ArrayT           *arrayType
	 197  		SliceT           *sliceType
	 198  		StructT          *structType
	 199  		MapT             *mapType
	 200  		GobEncoderT      *gobEncoderType
	 201  		BinaryMarshalerT *gobEncoderType
	 202  		TextMarshalerT   *gobEncoderType
	 203  
	 204  	}
	 205  	type arrayType struct {
	 206  		CommonType
	 207  		Elem typeId
	 208  		Len  int
	 209  	}
	 210  	type CommonType struct {
	 211  		Name string // the name of the struct type
	 212  		Id   int    // the id of the type, repeated so it's inside the type
	 213  	}
	 214  	type sliceType struct {
	 215  		CommonType
	 216  		Elem typeId
	 217  	}
	 218  	type structType struct {
	 219  		CommonType
	 220  		Field []*fieldType // the fields of the struct.
	 221  	}
	 222  	type fieldType struct {
	 223  		Name string // the name of the field.
	 224  		Id   int    // the type id of the field, which must be already defined
	 225  	}
	 226  	type mapType struct {
	 227  		CommonType
	 228  		Key  typeId
	 229  		Elem typeId
	 230  	}
	 231  	type gobEncoderType struct {
	 232  		CommonType
	 233  	}
	 234  
	 235  If there are nested type ids, the types for all inner type ids must be defined
	 236  before the top-level type id is used to describe an encoded-v.
	 237  
	 238  For simplicity in setup, the connection is defined to understand these types a
	 239  priori, as well as the basic gob types int, uint, etc. Their ids are:
	 240  
	 241  	bool        1
	 242  	int         2
	 243  	uint        3
	 244  	float       4
	 245  	[]byte      5
	 246  	string      6
	 247  	complex     7
	 248  	interface   8
	 249  	// gap for reserved ids.
	 250  	WireType    16
	 251  	ArrayType   17
	 252  	CommonType  18
	 253  	SliceType   19
	 254  	StructType  20
	 255  	FieldType   21
	 256  	// 22 is slice of fieldType.
	 257  	MapType     23
	 258  
	 259  Finally, each message created by a call to Encode is preceded by an encoded
	 260  unsigned integer count of the number of bytes remaining in the message. After
	 261  the initial type name, interface values are wrapped the same way; in effect, the
	 262  interface value acts like a recursive invocation of Encode.
	 263  
	 264  In summary, a gob stream looks like
	 265  
	 266  	(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*
	 267  
	 268  where * signifies zero or more repetitions and the type id of a value must
	 269  be predefined or be defined before the value in the stream.
	 270  
	 271  Compatibility: Any future changes to the package will endeavor to maintain
	 272  compatibility with streams encoded using previous versions. That is, any released
	 273  version of this package should be able to decode data written with any previously
	 274  released version, subject to issues such as security fixes. See the Go compatibility
	 275  document for background: https://golang.org/doc/go1compat
	 276  
	 277  See "Gobs of data" for a design discussion of the gob wire format:
	 278  https://blog.golang.org/gobs-of-data
	 279  */
	 280  package gob
	 281  
	 282  /*
	 283  Grammar:
	 284  
	 285  Tokens starting with a lower case letter are terminals; int(n)
	 286  and uint(n) represent the signed/unsigned encodings of the value n.
	 287  
	 288  GobStream:
	 289  	DelimitedMessage*
	 290  DelimitedMessage:
	 291  	uint(lengthOfMessage) Message
	 292  Message:
	 293  	TypeSequence TypedValue
	 294  TypeSequence:
	 295  	(TypeDefinition DelimitedTypeDefinition*)?
	 296  DelimitedTypeDefinition:
	 297  	uint(lengthOfTypeDefinition) TypeDefinition
	 298  TypedValue:
	 299  	int(typeId) Value
	 300  TypeDefinition:
	 301  	int(-typeId) encodingOfWireType
	 302  Value:
	 303  	SingletonValue | StructValue
	 304  SingletonValue:
	 305  	uint(0) FieldValue
	 306  FieldValue:
	 307  	builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
	 308  InterfaceValue:
	 309  	NilInterfaceValue | NonNilInterfaceValue
	 310  NilInterfaceValue:
	 311  	uint(0)
	 312  NonNilInterfaceValue:
	 313  	ConcreteTypeName TypeSequence InterfaceContents
	 314  ConcreteTypeName:
	 315  	uint(lengthOfName) [already read=n] name
	 316  InterfaceContents:
	 317  	int(concreteTypeId) DelimitedValue
	 318  DelimitedValue:
	 319  	uint(length) Value
	 320  ArrayValue:
	 321  	uint(n) FieldValue*n [n elements]
	 322  MapValue:
	 323  	uint(n) (FieldValue FieldValue)*n	[n (key, value) pairs]
	 324  SliceValue:
	 325  	uint(n) FieldValue*n [n elements]
	 326  StructValue:
	 327  	(uint(fieldDelta) FieldValue)*
	 328  */
	 329  
	 330  /*
	 331  For implementers and the curious, here is an encoded example. Given
	 332  	type Point struct {X, Y int}
	 333  and the value
	 334  	p := Point{22, 33}
	 335  the bytes transmitted that encode p will be:
	 336  	1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
	 337  	01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
	 338  	07 ff 82 01 2c 01 42 00
	 339  They are determined as follows.
	 340  
	 341  Since this is the first transmission of type Point, the type descriptor
	 342  for Point itself must be sent before the value. This is the first type
	 343  we've sent on this Encoder, so it has type id 65 (0 through 64 are
	 344  reserved).
	 345  
	 346  	1f	// This item (a type descriptor) is 31 bytes long.
	 347  	ff 81	// The negative of the id for the type we're defining, -65.
	 348  		// This is one byte (indicated by FF = -1) followed by
	 349  		// ^-65<<1 | 1. The low 1 bit signals to complement the
	 350  		// rest upon receipt.
	 351  
	 352  	// Now we send a type descriptor, which is itself a struct (wireType).
	 353  	// The type of wireType itself is known (it's built in, as is the type of
	 354  	// all its components), so we just need to send a *value* of type wireType
	 355  	// that represents type "Point".
	 356  	// Here starts the encoding of that value.
	 357  	// Set the field number implicitly to -1; this is done at the beginning
	 358  	// of every struct, including nested structs.
	 359  	03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
	 360  		// structType starts with an embedded CommonType, which appears
	 361  		// as a regular structure here too.
	 362  	01	// add 1 to field number (now 0); start of embedded CommonType.
	 363  	01	// add 1 to field number (now 0, the name of the type)
	 364  	05	// string is (unsigned) 5 bytes long
	 365  	50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
	 366  	01	// add 1 to field number (now 1, the id of the type)
	 367  	ff 82	// wireType.structType.CommonType.id = 65
	 368  	00	// end of embedded wireType.structType.CommonType struct
	 369  	01	// add 1 to field number (now 1, the field array in wireType.structType)
	 370  	02	// There are two fields in the type (len(structType.field))
	 371  	01	// Start of first field structure; add 1 to get field number 0: field[0].name
	 372  	01	// 1 byte
	 373  	58	// structType.field[0].name = "X"
	 374  	01	// Add 1 to get field number 1: field[0].id
	 375  	04	// structType.field[0].typeId is 2 (signed int).
	 376  	00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
	 377  	01	// Add 1 to get field number 0: field[1].name
	 378  	01	// 1 byte
	 379  	59	// structType.field[1].name = "Y"
	 380  	01	// Add 1 to get field number 1: field[1].id
	 381  	04	// structType.field[1].typeId is 2 (signed int).
	 382  	00	// End of structType.field[1]; end of structType.field.
	 383  	00	// end of wireType.structType structure
	 384  	00	// end of wireType structure
	 385  
	 386  Now we can send the Point value. Again the field number resets to -1:
	 387  
	 388  	07	// this value is 7 bytes long
	 389  	ff 82	// the type number, 65 (one byte, indicated by FF = -1, followed by 65<<1)
	 390  	01	// add one to field number, yielding field 0
	 391  	2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.X = 22
	 392  	01	// add one to field number, yielding field 1
	 393  	42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.Y = 33
	 394  	00	// end of structure
	 395  
	 396  The type encoding is long and fairly intricate but we send it only once.
	 397  If p is transmitted a second time, the type is already known so the
	 398  output will be just:
	 399  
	 400  	07 ff 82 01 2c 01 42 00
	 401  
	 402  A single non-struct value at top level is transmitted like a field with
	 403  delta tag 0. For instance, a signed integer with value 3 presented as
	 404  the argument to Encode will emit:
	 405  
	 406  	03 04 00 06
	 407  
	 408  Which represents:
	 409  
	 410  	03	// this value is 3 bytes long
	 411  	04	// the type number, 2, represents an integer
	 412  	00	// tag delta 0
	 413  	06	// value 3
	 414  
	 415  */
	 416  
