SIP - XMPP interoperability issues

Core

The discussed items are referenced in draft-saintandre-sip-xmpp-core

Address Mapping

We suggest that on the SIP side GRUU (RFC 5627) is a required component for implementing such a gateway. The reason is outlined below.

In XMPP, a given user (bare JID) may be logged in from several devices, each identified by a resource part which, together with the bare JID, forms the full JID. SIP originally had no equivalent way to address a single device, but GRUU (RFC 5627) now provides one. With GRUU a given SIP device can be addressed as follows: sip:;gr=hfje390 (the 'gr' parameter indicates the instance id).

These two concepts map one-to-one:

sip:;gr=hfje390 <--> xmpp:/hfje390

However, XMPP allows Unicode characters in the resource part, whereas SIP URIs must be ASCII, so we propose the following translation procedure:

  • Take the resource part and encode it as UTF-8
  • Hex-encode the result
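
The two steps above can be sketched in Python as follows. This is only an illustration, not text from any draft: the function names encode_resource and decode_resource are hypothetical, and the sketch assumes the resource part has already been normalized according to the XMPP rules.

```python
def encode_resource(resource: str) -> str:
    """Turn an XMPP resource part into an ASCII-only string.

    Step 1: encode the resource as UTF-8 bytes.
    Step 2: hex-encode those bytes.
    """
    return resource.encode("utf-8").hex()


def decode_resource(encoded: str) -> str:
    """Reverse the procedure: hex-decode, then decode the bytes as UTF-8."""
    return bytes.fromhex(encoded).decode("utf-8")
```

Because hex output only contains the characters 0-9 and a-f, the result can be placed in a SIP URI parameter without further escaping, and the mapping is reversible.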

NOTE: The reason for proposing a translation mechanism different from the one defined in draft-saintandre-sip-xmpp-core-01 is related to presence; the Presence section contains a more thorough explanation.

Architectural Assumptions

The XMPP-SIP draft (draft-saintandre-sip-xmpp-core-01) suggests that subdomains are used to indicate whether a translation gateway needs to be used (x2s.domain and s2x.domain). In practice, users already have address books full of URIs. Requiring a specific subdomain to signal that protocol translation is needed forces users to know how the system works internally, which is confusing and is likely to prevent adoption.

Instead, servers should be smart enough to decide whether a request needs to be translated to a different protocol. This is application-specific behavior, so it could be implemented with a static list of domains, DNS lookups or any other means.
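
As an illustration of such a server-side decision, here is a minimal Python sketch. Everything in it is an assumption made for the example: the ProtocolResolver class, the needs_translation function and the optional lookup callback are hypothetical names, and the callback only stands in for whatever dynamic mechanism (for instance a DNS SRV query) a real deployment would use.

```python
from typing import Callable, Iterable, Optional

SIP = "sip"
XMPP = "xmpp"


class ProtocolResolver:
    """Decide which protocol a destination domain speaks (hypothetical helper)."""

    def __init__(
        self,
        xmpp_domains: Iterable[str] = (),
        sip_domains: Iterable[str] = (),
        lookup: Optional[Callable[[str], Optional[str]]] = None,
    ) -> None:
        # Static lists are consulted first; domain names are case-insensitive.
        self._xmpp = {d.lower() for d in xmpp_domains}
        self._sip = {d.lower() for d in sip_domains}
        # Optional fallback, e.g. a function performing DNS lookups.
        self._lookup = lookup

    def protocol_for(self, domain: str) -> Optional[str]:
        domain = domain.lower().rstrip(".")
        if domain in self._xmpp:
            return XMPP
        if domain in self._sip:
            return SIP
        if self._lookup is not None:
            return self._lookup(domain)
        return None  # unknown: let the caller apply its default policy


def needs_translation(resolver: ProtocolResolver, source_protocol: str,
                      destination_domain: str) -> bool:
    """True when the destination is known to use a different protocol."""
    target = resolver.protocol_for(destination_domain)
    return target is not None and target != source_protocol
```

With this approach the user keeps dialing the address already stored in the address book, and the gateway decision stays entirely on the server.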

Chat sessions

The discussed items are referenced in draft-saintandre-sip-xmpp-chat

Chat message reception acknowledgment

XEP-0184 defines a way to indicate to the remote party that a delivery receipt is desired for the current chat message. MSRP has a similar mechanism based on the REPORT chunk type, so the two should probably be correlated in order to increase the reliability of the communication.

However, XEP-0184 suggests that endpoints supporting that XEP may choose not to send the receipt, and there are other reasons why the acknowledgement could be lost. How should the gateway act on the SIP side? A timer set to a reasonable value (and what is a 'reasonable' value?) should probably be used in order to decide if a message is considered as delivered or not. This could be implemented in two ways:

  1. The gateway server sends a REPORT chunk after the timer fires, with a new response code that would basically indicate "I don't know".
  2. The gateway server does not send a REPORT chunk. The SIP endpoint implements the timer and, after it fires, alerts the user that it is not clear whether the remote endpoint got the message.

Since the trend in SIP is to push the intelligence to the endpoints, option 2 seems more reasonable.
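
A possible shape for the endpoint-side timer of option 2 is sketched below in Python. It is only an illustration: the ReceiptTracker class and its method names are hypothetical, and the 30 second default is an arbitrary assumption, since what counts as a 'reasonable' value is left open above.

```python
import time
from typing import Dict, List, Optional


class ReceiptTracker:
    """Remember sent messages until a receipt arrives or a timeout elapses."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout  # seconds; arbitrary default, an assumption
        self._deadlines: Dict[str, float] = {}

    def sent(self, message_id: str, now: Optional[float] = None) -> None:
        """Record that a message was sent and start its timer."""
        now = time.monotonic() if now is None else now
        self._deadlines[message_id] = now + self.timeout

    def acknowledged(self, message_id: str) -> bool:
        """Handle a delivery report; returns False if it arrived too late
        or refers to an unknown message."""
        return self._deadlines.pop(message_id, None) is not None

    def expired(self, now: Optional[float] = None) -> List[str]:
        """Return and forget the messages whose timer has fired.

        The caller would flag these to the user as 'delivery unknown'.
        """
        now = time.monotonic() if now is None else now
        overdue = [m for m, deadline in self._deadlines.items() if deadline <= now]
        for message_id in overdue:
            del self._deadlines[message_id]
        return overdue
```

The endpoint would call expired() periodically (or from a scheduled callback) and mark the returned messages in the user interface.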

Presence

The discussed items are referenced in draft-saintandre-sip-xmpp-presence

SIP PIDF - XMPP stanza translation

There are a number of issues when translating an XMPP stanza to a SIP PIDF document:

Availability state

The draft suggests that the availability state is mapped to the Person element in PIDF. However, there can only be a single Person element per SIP user. In PIDF each device adds its own Tuple element, so placing the availability information at that level would maintain the per-device semantics. In SIPSIMPLE SDK (the core of SylkServer) we implemented an extension for PIDF which maps XMPP availability states to PIDF in a generic way. Example:

    <tuple id="id1234">
        <status>
            <basic>open</basic>
            <extended>away</extended>
        </status>
    </tuple>

(note the 'extended' element)
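
To make the per-device mapping concrete, the following Python sketch builds such a tuple with the standard library. It is not the SIPSIMPLE SDK code: the build_tuple function is hypothetical, XML namespaces are left out just as in the example above, and copying the XMPP 'show' value verbatim into the 'extended' element is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The values XMPP allows inside the <show/> element of a presence stanza.
XMPP_SHOW_VALUES = {"away", "chat", "dnd", "xa"}


def build_tuple(tuple_id: str, available: bool = True,
                show: Optional[str] = None) -> ET.Element:
    """Build one PIDF tuple describing a single device."""
    if show is not None and show not in XMPP_SHOW_VALUES:
        raise ValueError("unknown XMPP show value: %r" % show)
    tuple_el = ET.Element("tuple", {"id": tuple_id})
    status = ET.SubElement(tuple_el, "status")
    # Unavailable presence maps to 'closed', anything else to 'open'.
    ET.SubElement(status, "basic").text = "open" if available else "closed"
    if show is not None:
        # Assumption: the show value is passed through unchanged.
        ET.SubElement(status, "extended").text = show
    return tuple_el
```

For example, ET.tostring(build_tuple("id1234", show="away"), encoding="unicode") serializes to the same tuple as shown above, without the indentation.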

XMPP resource and PIDF tuple ID

The schema for PIDF defines the tuple ID as xs:ID, which constrains the characters it accepts. Thus, a direct mapping between the JID resource and the tuple ID is not possible. The proposed solution:

    tuple id = "ID-" + encoded_resource

The encoded_resource is the result of encoding the JID resource by following the rules stated in the Address Mapping section (UTF-8 encoding followed by hex encoding). An xs:ID value cannot start with a digit, so prepending "ID-" guarantees that the document validates against the schema. SIP endpoints can follow the same principle: the GRUU id is a UUID, so it is not guaranteed to begin with a letter either.
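
The rule can be sketched in Python as follows. The function names are hypothetical, and the regular expression only checks a conservative ASCII subset of what xs:ID allows, which is sufficient here because the generated identifiers contain nothing but "ID-" and hex digits.

```python
import re

# Conservative ASCII subset of the name syntax accepted by xs:ID:
# a letter or underscore first, then letters, digits, '.', '-' or '_'.
_ASCII_ID = re.compile(r"^[A-Za-z_][A-Za-z0-9._-]*$")


def tuple_id_for_resource(resource: str) -> str:
    """Build a PIDF tuple id from an XMPP resource part."""
    encoded_resource = resource.encode("utf-8").hex()
    tuple_id = "ID-" + encoded_resource
    # Sanity check: the result must be usable as an xs:ID value.
    if not _ASCII_ID.match(tuple_id):
        raise ValueError("generated tuple id is not a valid xs:ID")
    return tuple_id


def resource_for_tuple_id(tuple_id: str) -> str:
    """Recover the XMPP resource part from a tuple id built as above."""
    if not tuple_id.startswith("ID-"):
        raise ValueError("tuple id was not generated from a JID resource")
    return bytes.fromhex(tuple_id[3:]).decode("utf-8")
```

Since the encoding is reversible, a gateway can translate in both directions without keeping a lookup table of resources.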

Conferencing

The discussed items are referenced in draft-saintandre-sip-xmpp-groupchat

Content-Type translation

In XMPP, clients may exchange messages in plaintext and also in HTML. When HTML is used, the plaintext version is also present in the stanza. In MSRP, by contrast, there is a single content type per message (unless multipart is being used). If an MSRP endpoint sends HTML content, it is not obvious how the plaintext version should be generated. To make interoperability easier, multipart could be used, or some sort of HTML to plaintext translation could be applied.
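
As a rough idea of what an HTML to plaintext translation could look like, here is a deliberately crude Python sketch based on the standard library HTML parser. The html_to_plaintext function is a hypothetical helper: it simply drops the markup, keeps the text and turns a few block-level tags into line breaks. A real gateway would probably need something more careful.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, str]]) -> None:
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag: str) -> None:
        # End of a block-level element: start a new line.
        if tag in ("p", "div", "li"):
            self.parts.append("\n")

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_plaintext(html: str) -> str:
    """Return a plaintext approximation of an HTML message body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts).strip()
```

The gateway could place the result in the plaintext body of the XMPP stanza and carry the original HTML alongside it.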

Delivering Conference Information to Participants

SIP uses RFC 4575 to deliver information to the participants in a conference room. XEP-0298 defines a way to use RFC 4575 in Jingle conferences as well, which makes interoperability straightforward. This should probably be the preferred way of delivering participant information in Jingle, at least when used in a mixed environment with SIP.