@@ -0,0 +1,42 @@
+# End-to-end encryption using Insertable Streams
+
+**NOTE** E2EE is a work in progress.
+This document describes some of the high-level concepts and outlines the design.
+Please refer to the source code for details.
+
+## Deriving the key from the e2eekey URL hash
+We take the key from the URL hash. Unlike query parameters, the hash is not
+sent to the server, so it is the right place for it. We listen for the
+window.onhashchange event to pick up changes to the e2ee
+key property.
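As a rough illustration of the paragraph above, the following sketch reads the key from the hash and re-reads it on every `hashchange` event. The fragment layout (`#e2eekey=<value>`, optionally followed by other `&`-separated parameters) and the names `parseE2EEKey`, `HashSource` and `watchE2EEKey` are assumptions made for this example, not taken from the source code.

```typescript
// Hypothetical helper: extract the key material from a fragment such as
// "#e2eekey=secret&foo=bar". The fragment layout is an assumption.
function parseE2EEKey(hash: string): string | null {
    const fragment = hash.startsWith("#") ? hash.slice(1) : hash;
    const value = new URLSearchParams(fragment).get("e2eekey");
    return value === null || value === "" ? null : value;
}

// Minimal shape of the object we listen on. In a browser, `window` fits it.
interface HashSource {
    location: { hash: string };
    addEventListener(type: "hashchange", listener: () => void): void;
}

// Reports the current key once, then again whenever the hash changes.
function watchE2EEKey(
    source: HashSource,
    onKey: (key: string | null) => void,
): void {
    onKey(parseE2EEKey(source.location.hash));
    source.addEventListener("hashchange", () => {
        onKey(parseE2EEKey(source.location.hash));
    });
}
```

In a browser this would be wired up as `watchE2EEKey(window, key => { /* re-derive the key */ })`. Keeping the parsing separate from the event wiring makes it easy to exercise without a DOM.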
+
+It is important to note that this key should not get exchanged via the server.
+There needs to be some other means of exchanging it.
+
+From this key we derive a 128-bit key using PBKDF2. We use the room name as the salt in this key derivation. This is a bit weak, but we need to start with information that is the same for all participants, so we cannot yet use a proper random salt.
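A minimal sketch of such a derivation with the WebCrypto API might look as follows. The function name `deriveFrameKey`, the iteration count and the SHA-256 hash are assumptions chosen for illustration; the document only states PBKDF2, a 128-bit output and the room name as salt.

```typescript
// Sketch only: derives a 128-bit AES-GCM key from the shared secret using
// PBKDF2 with the room name as salt. Iteration count and hash function are
// illustrative assumptions, not values taken from the source code.
async function deriveFrameKey(secret: string, roomName: string): Promise<CryptoKey> {
    const encoder = new TextEncoder();
    // Import the raw secret as PBKDF2 key material.
    const material = await crypto.subtle.importKey(
        "raw",
        encoder.encode(secret),
        "PBKDF2",
        false,
        ["deriveKey"],
    );
    // Every participant uses the same secret and salt, so all of them
    // end up with the same AES-GCM key.
    return crypto.subtle.deriveKey(
        {
            name: "PBKDF2",
            salt: encoder.encode(roomName),
            iterations: 100000, // assumption
            hash: "SHA-256", // assumption
        },
        material,
        { name: "AES-GCM", length: 128 },
        false,
        ["encrypt", "decrypt"],
    );
}
```

Because both inputs are identical for everyone in the room, two participants calling `deriveFrameKey` with the same arguments can decrypt each other's frames.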
+
+All participants derive the same key and use it for both encrypting and decrypting. We are working on including the MUC resource of the sender in the derivation in order to switch to per-participant keys, which is the model we want to migrate to in the end.
+
+We plan to rotate the key whenever a participant joins or leaves. However, we need end-to-end encrypted signaling to exchange those keys so we are not doing this yet.
+
+## The encrypted frame
+The derived key is used in the transformations of the Insertable Streams API.
+These transformations use AES-GCM (with a 128-bit key; we could have used
+256 bits, but since the keys are short-lived we decided against it) and the
+WebCrypto API:
+ https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/encrypt
+
+AES-GCM needs a 96-bit initialization vector (IV), which we construct
+from the SSRC, the RTP timestamp and a frame counter. This is similar to
+how the IV is constructed in SRTP with GCM:
+ https://tools.ietf.org/html/rfc7714#section-8.1
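One way to picture the IV construction is the sketch below. The byte layout (three big-endian 32-bit fields in the order SSRC, RTP timestamp, frame counter) and the name `makeIV` are assumptions for illustration; the document only lists the three inputs and the 96-bit total.

```typescript
// Sketch only: packs SSRC, RTP timestamp and a frame counter into a
// 12-byte (96-bit) IV. Field order and widths are assumptions.
function makeIV(ssrc: number, rtpTimestamp: number, frameCounter: number): Uint8Array {
    const buffer = new ArrayBuffer(12);
    const view = new DataView(buffer);
    // DataView writes big-endian by default and keeps only the low
    // 32 bits of each value, so the counter wraps at 2^32.
    view.setUint32(0, ssrc);
    view.setUint32(4, rtpTimestamp);
    view.setUint32(8, frameCounter);
    return new Uint8Array(buffer);
}
```

The property that matters for AES-GCM is that an IV is never reused with the same key, which is why a per-frame counter is mixed in alongside the SSRC and timestamp.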
+
+This IV gets sent along with the packet, adding 12 bytes of overhead. The GCM
+tag length is the default 128 bits, or 16 bytes. For video this overhead is OK, but
+for audio (where the Opus frames are much, much smaller) we are considering shorter
+authentication tags.
+
+We do not encrypt the first few bytes of the packet that form the VP8 header or the Opus
+This allows the decoder to understand the frame a bit more and makes it generate the fun-looking garbage we see in the video. Ideally, this also means the SFU does not know that the content is end-to-end encrypted, so no changes to the SFU are required.
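The following sketch shows one possible shape of the encrypting side. The output layout (clear header, then ciphertext with GCM tag, then the 12-byte IV at the end), the `headerLength` parameter and the name `encryptFrame` are assumptions for illustration; the document does not specify where the IV is placed or how many header bytes are skipped.

```typescript
const IV_LENGTH = 12; // 96-bit IV, as required by AES-GCM

// Sketch only: leaves the first `headerLength` bytes in the clear,
// encrypts the rest with AES-GCM and appends the IV.
// Assumed layout: [ clear header | ciphertext + tag | IV ].
async function encryptFrame(
    key: CryptoKey,
    frame: Uint8Array,
    headerLength: number,
    iv: Uint8Array,
    tagLength = 128, // bits; WebCrypto also accepts shorter tags
): Promise<Uint8Array> {
    const header = frame.subarray(0, headerLength);
    const payload = frame.slice(headerLength);
    const ivBytes = Uint8Array.from(iv); // private copy of the caller's IV
    const ciphertext = new Uint8Array(await crypto.subtle.encrypt(
        { name: "AES-GCM", iv: ivBytes, tagLength },
        key,
        payload,
    ));
    const out = new Uint8Array(header.length + ciphertext.length + IV_LENGTH);
    out.set(header, 0);
    out.set(ciphertext, header.length);
    out.set(ivBytes, header.length + ciphertext.length);
    return out;
}
```

With the default tag length the output is 28 bytes longer than the input: 16 bytes of tag plus 12 bytes of IV.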
+
+Decryption errors are handled by just forwarding the frame to the decoder. In particular that means that when receiving unencrypted video we will display it as is.
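A hedged sketch of this fallback behaviour is shown below. It assumes a frame layout of clear header, then ciphertext with a 16-byte tag, then a trailing 12-byte IV; that layout, the `headerLength` parameter and the name `decryptOrPassThrough` are illustrative assumptions rather than the actual implementation.

```typescript
const IV_BYTES = 12; // trailing IV (assumed position)
const TAG_BYTES = 16; // default 128-bit GCM tag

// Sketch only: tries to decrypt a frame laid out as
// [ clear header | ciphertext + tag | IV ]. On any failure the frame is
// returned unchanged so the decoder still receives it.
async function decryptOrPassThrough(
    key: CryptoKey,
    frame: Uint8Array,
    headerLength: number,
): Promise<Uint8Array> {
    // Too short to contain a tag and an IV: cannot be an encrypted frame.
    if (frame.length < headerLength + TAG_BYTES + IV_BYTES) {
        return frame;
    }
    try {
        const ivStart = frame.length - IV_BYTES;
        const iv = frame.slice(ivStart);
        const ciphertext = frame.slice(headerLength, ivStart);
        const plaintext = new Uint8Array(await crypto.subtle.decrypt(
            { name: "AES-GCM", iv },
            key,
            ciphertext,
        ));
        const out = new Uint8Array(headerLength + plaintext.length);
        out.set(frame.subarray(0, headerLength), 0);
        out.set(plaintext, headerLength);
        return out;
    } catch {
        // Authentication failed (wrong key or unencrypted sender).
        return frame;
    }
}
```

An unencrypted frame fails the GCM authentication check and is therefore handed to the decoder as is, which matches the behaviour described above.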