next up previous
Next: 3 MP low level Up: A Proposal for Syntactic Protocols Previous: 1 Syntactic data integration

  
2 The Multi Protocol

The purpose of the Multi Protocol (MP) is to support general and efficient communication of mathematical data among scientific computing systems. MP defines a set of basic types and mechanisms for constructing structured data. Numeric data (fixed- and arbitrary-precision floats and integers) are transmitted in a binary format (2's complement, IEEE floating point, etc.). Composite data (such as general expressions, polynomials, and matrices) are represented as a linearized, annotated syntax tree (MP tree), which is transmitted as a sequence of node and annotation packets, each node packet carrying one node of the syntax tree. A node packet has fields giving the type of the data carried in the packet, the number of arguments that follow (for operators), a dictionary tag, and the number of annotations attached to the node.
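To make the linearization concrete, here is a minimal Python sketch of how a syntax tree might be flattened into a preorder sequence of node packets and read back. The 4-byte header (one byte each for type, dictionary tag, annotation count, and argument count), the type codes, and the convention of carrying an operator code in the type byte are all hypothetical simplifications; MP's actual packet formats are defined by its low-level specification and are not reproduced here.

```python
import struct

# Hypothetical 4-byte node header: type, dictionary tag, #annotations, #arguments.
HEADER = struct.Struct(">BBBB")

T_INT32 = 0x01    # hypothetical type code: 32-bit integer leaf
OP_PLUS = 0x41    # hypothetical operator codes, carried in the type byte
OP_TIMES = 0x42

def put_tree(node, out, dict_tag=0):
    """Append `node` and its subtree to bytearray `out` in preorder."""
    if isinstance(node, int):
        out += HEADER.pack(T_INT32, dict_tag, 0, 0)
        out += struct.pack(">i", node)  # big-endian 2's complement
    else:
        op, children = node
        out += HEADER.pack(op, dict_tag, 0, len(children))
        for child in children:
            put_tree(child, out, dict_tag)
    return out

def get_tree(buf, pos=0):
    """Rebuild the subtree starting at `pos`; return (node, next_pos)."""
    typ, _tag, _n_annots, n_args = HEADER.unpack_from(buf, pos)
    pos += HEADER.size
    if typ == T_INT32:
        (value,) = struct.unpack_from(">i", buf, pos)
        return value, pos + 4
    children = []
    for _ in range(n_args):  # the argument count says how many subtrees follow
        child, pos = get_tree(buf, pos)
        children.append(child)
    return (typ, children), pos

# 1 + 2 * (-3), written as nested (operator, [arguments]) pairs
expr = (OP_PLUS, [1, (OP_TIMES, [2, -3])])
packets = put_tree(expr, bytearray())
```

Because every node carries its own header, the receiver needs no prior knowledge of the tree's shape; the price is one header per node, which is exactly the overhead examined in the matrix example later in this section.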

A data packet is a block of data unencumbered by node packet headers; it may hold a single data item or a collection of possibly heterogeneous data items.
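By contrast, a data packet carries only raw values. A minimal Python sketch, assuming big-endian 32-bit integers and IEEE 754 doubles (the helper names are hypothetical): the receiver can decode such a block only if it already knows the element types and their count, since nothing in the bytes records them.

```python
import struct

def put_int32_block(values):
    """Headerless data packet: consecutive big-endian int32 values."""
    return struct.pack(f">{len(values)}i", *values)

def get_int32_block(buf, count):
    """The element type and count must be known in advance."""
    return list(struct.unpack_from(f">{count}i", buf))

# A heterogeneous data packet: an int32 followed by an IEEE 754 double.
MIXED = struct.Struct(">id")

block = put_int32_block([7, -1, 100000])
mixed = MIXED.pack(3, 2.5)
```

Each integer costs 4 bytes here, with no per-item header.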

Annotations carry additional information that is either supplementary, and can safely be ignored by the receiver, or essential to the proper decoding of the data. Each annotation is tagged so that the receiver always knows whether it can safely ignore the annotation's content.
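The receiver's decision rule can be sketched in a few lines of Python. The `Annotation` class, its boolean `required` flag, and the annotation names below are hypothetical; how MP actually encodes the ignorable/required tag is not shown here.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    name: str
    required: bool   # hypothetical flag: True if decoding depends on this annotation
    content: object

def filter_annotations(annotations, understood):
    """Return the annotations the receiver will act on.

    Unknown supplementary annotations are silently dropped; an unknown
    required annotation means the data cannot be decoded correctly.
    """
    kept = []
    for ann in annotations:
        if ann.name in understood:
            kept.append(ann)
        elif ann.required:
            raise ValueError(f"required annotation {ann.name!r} not understood")
        # otherwise: supplementary and unknown, so it is safe to ignore
    return kept

annots = [
    Annotation("Comment", required=False, content="computed by system X"),
    Annotation("Prototype", required=True, content="int32"),
]
```

A receiver that understands only `Prototype` keeps that annotation and skips `Comment`; a receiver that does not understand `Prototype` must refuse the data rather than guess.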

In a layer above the data exchange portion of the protocol, MP supports collections of definitions for annotations and mathematical symbols (operators and symbolic constants) in dictionaries. Dictionaries address the problem of data integration by defining standardized representation(s) and semantics for mathematical objects. They are identified within packets through a dictionary tag field. Applications that communicate according to definitions provided in dictionaries do not need to have direct knowledge of each other.
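The role of the dictionary tag can be illustrated with a small lookup table: the same operator symbol may mean different things in different dictionaries, and the tag in the packet tells the receiver which definition applies. Everything in this Python sketch (the tags, dictionary names, definitions, and the `resolve` helper) is hypothetical.

```python
# Hypothetical dictionaries: tags, names, and definitions are invented.
DICTIONARIES = {
    0: {"name": "Base", "symbols": {"*": "scalar multiplication", "+": "addition"}},
    1: {"name": "Matrix", "symbols": {"*": "matrix product", "det": "determinant"}},
}

def resolve(dict_tag, symbol):
    """Look up the meaning of `symbol` in the dictionary named by `dict_tag`."""
    dictionary = DICTIONARIES.get(dict_tag)
    if dictionary is None:
        raise LookupError(f"unknown dictionary tag {dict_tag}")
    meaning = dictionary["symbols"].get(symbol)
    if meaning is None:
        raise LookupError(f"{symbol!r} is not defined in dictionary {dictionary['name']!r}")
    return meaning
```

Sender and receiver only need to agree on the dictionaries, not on each other's internals.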

Applications send (``Put'' interface) and receive (``Get'' interface) messages, each containing an MP tree that is typically created by calling routines from the MP Application Programming Interface (API). An application communicates with other applications through an MP link, which is simply an abstraction of an underlying data transport mechanism that is bound to the link when the link is created.
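The separation between the link and its transport can be sketched as a small Python class that accepts any binary stream at construction time. The `Link` class, its `put`/`get` methods, and the 4-byte length prefix used to delimit messages are assumptions for illustration, not MP's API or wire format.

```python
import io
import struct

class Link:
    """Hypothetical sketch of an MP-style link.

    The transport is any binary stream with read/write/flush (a file, a pipe,
    or a socket wrapped with socket.makefile("rwb")), bound when the link is
    created. The link frames each message with a 4-byte length prefix.
    """

    def __init__(self, stream):
        self._stream = stream

    def put(self, message: bytes) -> None:
        self._stream.write(struct.pack(">I", len(message)) + message)
        self._stream.flush()

    def get(self) -> bytes:
        (length,) = struct.unpack(">I", self._read_exactly(4))
        return self._read_exactly(length)

    def _read_exactly(self, n: int) -> bytes:
        data = self._stream.read(n)
        if len(data) != n:
            raise EOFError("transport closed in the middle of a message")
        return data

# Example transport: an in-memory buffer standing in for a file or socket.
channel = io.BytesIO()
Link(channel).put(b"first message")
Link(channel).put(b"second")
channel.seek(0)
reader = Link(channel)
```

Swapping the in-memory buffer for a socket or pipe changes nothing in the code that calls `put` and `get`.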

If MP's major goal were generality, it would best be achieved by defining one standard representation for each basic and structured object and by requiring that each datum be communicated as a node packet (i.e., together with its type and other information). Consequently, a 100x100 integer matrix A that is zero everywhere except for A[1,1] = A[2,1] = A[2,4] = 1 would be communicated as an ``array of array of integer node packets'', as shown in Figure 1.

  
Figure 1: A encoded as a general expression tree (matrix)

Notice that of the 804,004 bytes communicated in total, 404,004 are used for node headers, and that of the 400,000 data bytes communicated, only 12 are non-null: certainly not a very efficient format for communicating A.
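The overhead can be checked with a little arithmetic. This Python sketch assumes a 4-byte node header and 4-byte integers, and counts one header for the outer array, one per row array, and one per entry; these sizes are assumptions, not values taken from MP's specification. Under them, the counts quoted above (404,004 header bytes and 400,000 data bytes) correspond to 100,000 entries arranged as 1000 arrays of 100, whereas a 100x100 matrix would need 40,404 + 40,000 = 80,404 bytes. In either case roughly half of the traffic is headers, and the three non-zero entries account for only 3 × 4 = 12 data bytes.

```python
HEADER_BYTES = 4  # assumed size of one node packet header
INT_BYTES = 4     # assumed size of one 32-bit integer

def dense_tree_size(rows, cols):
    """(header bytes, data bytes, total) for a rows x cols integer matrix sent
    as an array of `rows` arrays of integer node packets."""
    n_headers = 1 + rows + rows * cols  # outer array + one per row + one per entry
    header_bytes = HEADER_BYTES * n_headers
    data_bytes = INT_BYTES * rows * cols
    return header_bytes, data_bytes, header_bytes + data_bytes
```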

On the other hand, if MP's major goal were efficiency, it would best be achieved by providing a large variety of statically defined representations for basic and structured objects. Consequently, the matrix A would be communicated as a ``list of integer data packets'', as shown in Figure 2.

  
Figure 2: A encoded as a statically defined sparse matrix

Notice that encoding A now takes only 48 bytes. However, correct decoding of A presupposes explicit (i.e., static) knowledge that the operator SpIntMat is followed by integer data packets (i.e., headerless integers) specifying the dimensions and non-zero entries of A: certainly not a very general format for communicating A.
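A minimal Python sketch of such a statically defined encoding, assuming a 4-byte node header for SpIntMat followed by headerless 32-bit integers: the two dimensions, then a (row, column, value) triple per non-zero entry, with the entry count kept in the header's argument field. The header layout, the type code, and the helper names are hypothetical. Under these assumptions A takes 4 + 2·4 + 3·3·4 = 48 bytes, matching the figure quoted above, and the decoder works only because it hard-codes the layout.

```python
import struct

# Hypothetical 4-byte node header: (type, dictionary tag, #annotations, #arguments).
HEADER = struct.Struct(">BBBB")
SP_INT_MAT = 0x40  # hypothetical type code standing for the SpIntMat operator
BASE_DICT = 0      # hypothetical dictionary tag

def put_sp_int_mat(rows, cols, nonzeros):
    """One node header, then headerless int32 data:
    rows, cols, and a (row, col, value) triple per non-zero entry."""
    out = bytearray(HEADER.pack(SP_INT_MAT, BASE_DICT, 0, len(nonzeros)))
    out += struct.pack(">ii", rows, cols)
    for (r, c), v in sorted(nonzeros.items()):
        out += struct.pack(">iii", r, c, v)
    return bytes(out)

def get_sp_int_mat(buf):
    """Decoding relies on static knowledge of the SpIntMat layout: nothing in
    the data bytes says they are int32 or how they are grouped."""
    typ, _tag, _n_annots, count = HEADER.unpack_from(buf, 0)
    if typ != SP_INT_MAT:
        raise ValueError("not an SpIntMat node")
    rows, cols = struct.unpack_from(">ii", buf, HEADER.size)
    nonzeros = {}
    for k in range(count):
        r, c, v = struct.unpack_from(">iii", buf, HEADER.size + 8 + 12 * k)
        nonzeros[(r, c)] = v
    return rows, cols, nonzeros

A = {(1, 1): 1, (2, 1): 1, (2, 4): 1}
packet = put_sp_int_mat(100, 100, A)
```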

Although simple, these examples illustrate important aspects of the tension between generality and efficiency within the context of MP. In the next two sections we present our solutions to this problem: a restricted form of negotiation of basic data formats, and powerful mechanisms to dynamically describe the type and structure of homogeneous data using the prototype annotation. The main ideas behind these principles can be found in the first papers about MP [#!kn:GKW1!#,#!mp:jsc1!#], and in [#!sg:BSG1!#] we focused on solutions to the generality versus efficiency problem within the context of communicating polynomial data. The realization that this problem is far more widely applicable and complex than the polynomial case treated in [#!sg:BSG1!#] led to the research and results described in this paper.

