Versions
========

T(A)ILS 0.5 uses NTP in a way that is described in our [[dedicated
design document|NTP]]. This document describes the solution
implemented in Git, which will be used in post-0.5 T(A)ILS releases.

Rationale
=========

On the one hand, Tor sometimes misbehaves if it detects too large a
clock skew. On the other hand, having the desktop environment display
localized time would be a nice bonus.

That's why we want to *fix the system clock* using some kind of suitable
network protocol when a network interface is brought up and before the
Tor client is told it should connect to the network.

Moreover, we are worried about unauthenticated [[NTP]]. There is
probably a whole range of fingerprinting attacks an attacker could
mount if they could pose as the NTP server and tamper with the user's
time. We therefore want to be able to *authenticate* the servers that
provide us with supposedly accurate time information.

## Why use HTP?

Home-made research [[demonstrated|todo/authenticate_time_servers]]
that NTPv4's server authentication features do not fit our use case
yet. Using them is the long-term goal, but in the meantime we decided to
use [HTP](http://www.vervest.org/htp/).

[HTP](http://www.vervest.org/htp/) is not really a
protocol, but uses a feature of HTTP, aka web traffic. According to the
specification of HTTP/1.1 (RFC 2616), a web server needs to put a
timestamp in its response to a web browser request. Web browsers do
not display the HTTP headers, but these headers contain a timestamp in
Greenwich Mean Time (GMT), accurate to the second.

These timestamps can be used to get a pretty good estimate of the
current time, although not one as accurate as NTP provides.
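As a minimal sketch of the idea (not the code T(A)ILS ships, which is a
Perl script), the following Python snippet turns the value of an HTTP
`Date` header into a clock offset. The function name `clock_offset` and
the sample values are assumptions made for illustration only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset(date_header: str, local_now: datetime) -> float:
    """Return (server time - local time) in seconds.

    date_header is the value of an HTTP Date header, and local_now
    must be a timezone-aware datetime.
    """
    server_time = parsedate_to_datetime(date_header)
    return (server_time - local_now).total_seconds()


if __name__ == "__main__":
    # Sample header value in the format used by HTTP/1.1.
    header = "Sun, 06 Nov 1994 08:49:37 GMT"
    # Pretend the local clock is 30 seconds behind the server.
    local = datetime(1994, 11, 6, 8, 49, 7, tzinfo=timezone.utc)
    print(clock_offset(header, local))
```

Since the header only has one-second resolution, an offset computed
this way cannot be more precise than that.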

Being based on HTTP, HTP can use its ready-made features related to
server authentication, such as X.509 certificates... for the time
being.

## Why use a custom program?

As the rest of this document shows, the upstream HTP has quite a few
drawbacks that make it unfit for our needs. That's why T(A)ILS uses a
custom version of the Perl HTP client, installed as
`/usr/local/sbin/htpdate`. The repository we copied this script from
can be found here:

    git://gaffer.ptitcanardnoir.org/htp.git

For reasons detailed below, this version of htpdate uses wget for all
of its HTTP operations.

Implementation
==============

## Integration into the system

A Network Manager hook runs the whole thing:
`/etc/NetworkManager/dispatcher.d/50-htp.sh`.

## Direct access to the network, DNS resolution

As highlighted above it is important to fix the system time before Tor
connects to the network. This explains why the HTP network
communication has to go in the clear, thanks to an exception in the
firewall configuration.

The HTP client communication is delegated to wget. Without anything
special being done, wget would try to resolve the HTP servers'
hostnames using the system resolver, which itself forwards queries
through the Tor network... chicken and egg.

We could of course ask htpdate to perform IP-based HTTP requests
(`https://IP/`), but this is a pretty uncommon thing to do and would
thus make it too easy, for our taste, to fingerprint T(A)ILS users.
We really want wget to perform name-based HTTP requests:
`https://xxx.domain.tld/`. We therefore need to feed the system
resolver with the required (hostname, IP) pairs *before* running
htpdate.

This is achieved by querying the nameservers provided by the DHCP
server, when possible. Otherwise, we ask OpenDNS. The (hostname, IP)
pairs are written to `/etc/hosts`, then htpdate is run, and finally we
remove the added entries from this file.
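The add-then-remove handling of `/etc/hosts` described above could be
sketched as follows. This is only an illustration in Python working on
the file's text. The `MARKER` comment used to recognize the temporary
entries, the function names and the sample address are all hypothetical
and not taken from the actual hook.

```python
# Hypothetical tag appended to every temporary entry so that it can be
# found again later. The real hook may track its entries differently.
MARKER = "# added-for-htp"


def add_entries(hosts_text, pairs):
    """Return hosts_text with one tagged line per (hostname, ip) pair."""
    lines = hosts_text.splitlines()
    for hostname, ip in pairs:
        lines.append(f"{ip} {hostname} {MARKER}")
    return "".join(line + "\n" for line in lines)


def remove_entries(hosts_text):
    """Return hosts_text without the lines tagged by add_entries."""
    kept = [line for line in hosts_text.splitlines()
            if not line.endswith(MARKER)]
    return "".join(line + "\n" for line in kept)


if __name__ == "__main__":
    original = "127.0.0.1 localhost\n"
    # 192.0.2.10 is a documentation-only address used as a placeholder.
    patched = add_entries(original, [("www.example.org", "192.0.2.10")])
    print(patched, end="")
    # ... htpdate would be run at this point ...
    print(remove_entries(patched), end="")
```

Tagging the temporary lines makes the cleanup step independent of the
order in which entries were added.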

## Fingerprinting of T(A)ILS users

HTP probably isn't widely used, so if a connection can be identified
as HTP, it was very likely made by a T(A)ILS user.

Some preliminary effort has been made to make it harder to identify
connections as HTP. This is only preliminary work and must **NOT** be
considered a guarantee. See the "Limits" section below.

T(A)ILS developers still need to think thoroughly about these
questions: are such fingerprinting possibilities a serious problem?
What kind of efforts and compromises should be made to prevent them?

Fingerprinting of HTP connections can be tried at two locations:

* the servers we send HTTP requests to;
* the Internet connection being used by a T(A)ILS user.

### Fingerprinting via the servers set being used

If a subset of the queried webservers' admins share their logs, they
could run a quite simple correlation search to fingerprint T(A)ILS
users. The pool of servers is therefore chosen in a way that makes it
unlikely for them to share user data. This should be enough protection
against this side of the threat.

On the other hand, anyone who monitors a given Internet connection that
is used to run T(A)ILS would probably infer T(A)ILS usage from the
connection pattern (DNS then HTTP) on which our HTP implementation
currently relies.

### Fingerprinting via unusual HTTP behaviour of the HTP client

The custom HTP client T(A)ILS uses is configured so that it pretends to
be the same user agent as iceweasel+torbutton, thanks to
`/usr/local/bin/getTorbuttonUserAgent`.

It was remarked that the upstream HTP client's "connection pattern" is
pretty suspicious: a web browser loading foo.com/index.html would
complete the whole exchange and download index.html + any referenced
resources, whereas htpdate/htpd drops the connection once it has got
the first HTTP header. That's why our custom HTP client provides a
"full request" mode, which we use: when run this way, the HTTP exchange
is completed and any resources that are normally needed to display the
page are fetched as well: images, CSS, etc.
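To illustrate what fetching "any resources normally needed to display
the page" involves, here is a rough Python sketch that lists the
images, scripts and stylesheets referenced by an HTML page. In T(A)ILS
this work is delegated to wget. The class and function names below are
hypothetical, and the sketch deliberately ignores other kinds of
resources (CSS imports, frames, etc.).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteParser(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only keep <link> elements that point to a stylesheet.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def page_requisites(html, base_url):
    """Return the resource URLs a browser would fetch for this page."""
    parser = RequisiteParser(base_url)
    parser.feed(html)
    return parser.urls


if __name__ == "__main__":
    page = ('<html><head><link rel="stylesheet" href="/s.css"></head>'
            '<body><img src="logo.png"></body></html>')
    for url in page_requisites(page, "https://www.example.org/index.html"):
        print(url)
```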

### Miscellaneous

Using a larger HTP servers pool could help protect users against some
of the threats we described: N servers could be picked at random from
every category defined in the *Servers pool* section.

Even so, the connection pattern would remain quite unique.
Moreover, browser fingerprinting (see our [[iceweasel audit
page|todo/applications_audit/iceweasel]]) makes it easy to tell wget
apart from normal browsers, e.g. because of its lack of JavaScript
support.
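The idea of picking N servers at random from every category, mentioned
at the start of this section, could look like the Python sketch below.
The category names and the `pick_servers` function are hypothetical.
The URLs are the ones listed in the *Servers pool* section, and a
larger pool would simply have more entries per category.

```python
import random

# Category names are made up for this sketch; the URLs come from the
# "Servers pool" section.
POOL = {
    "privacy_friendly": [
        "https://www.torproject.org/",
        "https://mail.riseup.net/",
    ],
    "adversary": ["https://www.google.com/"],
    "neutral": ["https://secure.wikimedia.org/"],
}


def pick_servers(pool, n, rng=random):
    """Pick up to n distinct servers at random from every category."""
    chosen = []
    for urls in pool.values():
        chosen.extend(rng.sample(urls, min(n, len(urls))))
    return chosen


if __name__ == "__main__":
    for url in pick_servers(POOL, 1):
        print(url)
```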

## Servers pool

What sources should be trusted? This is of course also a problem
with NTP.

The HTP pool used by T(A)ILS is based on stable and reliable
webservers that get great amounts of traffic, including:

* two servers whose admins are likely to take great care of their
  visitors' privacy: <https://www.torproject.org/> and
  <https://mail.riseup.net/>
* one server managed by adversaries of the two "trusted" ones,
  in order to prevent identifying data from being shared:
  <https://www.google.com/> (!)
* one more or less "neutral" server: <https://secure.wikimedia.org/>

The web pages in the pool have been selected (quite quickly, this can
be improved for sure) using an additional criterion: weight, including
the resources the page depends on: images, CSS, scripts...

## Authentication of servers

The custom `/usr/local/sbin/htpdate` we use delegates certificate
verification to wget. It implements a "paranoid mode" that is enabled
in T(A)ILS: when one server cannot be reached, e.g. because of a
failed certificate check, this custom version of htpdate considers
that the servers pool is not consistent enough to be trusted and exits.
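The "paranoid mode" rule can be summarized by the following Python
sketch, which is only an illustration of the behaviour described above
and not a port of the Perl code. The names `PoolCheckFailed` and
`paranoid_offsets`, and the convention that a failed server is
represented by `None`, are assumptions.

```python
class PoolCheckFailed(Exception):
    """Raised when at least one server of the pool could not be used."""


def paranoid_offsets(results):
    """Return all clock offsets, or fail if any server failed.

    results maps a server URL to its measured offset in seconds, or to
    None when the request failed (unreachable server, failed
    certificate check, ...).
    """
    failed = [url for url, offset in results.items() if offset is None]
    if failed:
        # A single failure is enough to distrust the whole pool.
        raise PoolCheckFailed("unusable servers: " + ", ".join(failed))
    return list(results.values())


if __name__ == "__main__":
    sample = {
        "https://www.torproject.org/": 2.0,
        "https://mail.riseup.net/": None,  # simulated failure
    }
    try:
        paranoid_offsets(sample)
    except PoolCheckFailed as err:
        print("giving up:", err)
```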

wget is also directed to only use TLSv1 as a "secure" protocol.