Commit 07680603 authored by amnesia

Added HTP design documentation => pending.

parent 24a56af4
Versions
========
T(A)ILS 0.5 uses NTP in a way that is described in our [[dedicated
design document|NTP]]. This document describes the solution
implemented in Git, which will be used in post-0.5 T(A)ILS releases.
Rationale
=========
On the one hand, Tor sometimes freaks out if it detects too large a
clock skew. On the other hand, having the desktop environment display
localized time would be a nice bonus.
That's why we want to *fix the system clock*, using a suitable network
protocol, when a network interface is brought up and before the Tor
client is told it should connect to the network.
Moreover, we are worried about unauthenticated [[NTP]]. There is
probably a whole bunch of fingerprinting attacks an attacker could
mount if they could pose as the NTP server and mess with the user's
time. We therefore want to be able to *authenticate* the servers that
provide us with supposedly accurate time information.
## Why use HTP?
Home-made research [[demonstrated|todo/authenticate_time_servers]]
that NTPv4's server authentication features do not fit our use case
yet. Using them is the long-term goal, but in the meantime we decided
to use [HTP](http://www.clevervest.com/twiki/bin/view/HTP).
[HTP](http://www.clevervest.com/twiki/bin/view/HTP) is not really a
protocol; it piggybacks on a feature of HTTP, i.e. web traffic.
According to the HTTP/1.1 specification (RFC 2616), a web server must
put a timestamp in its response to a web browser's request. Web
browsers do not display the HTTP headers, but these headers contain a
timestamp in Greenwich Mean Time (GMT), accurate to the second.
These timestamps can be used to get a pretty good estimate of the
current time, even if not at the same accuracy level as NTP.
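As a hedged illustration (this is not the actual htpdate code), such a
timestamp can be read from any web server's response with stock wget:

```sh
# wget --server-response prints the server's HTTP headers on stderr;
# keep only the Date header, e.g. "Mon, 19 Apr 2010 12:00:00 GMT".
wget --server-response --output-document=/dev/null \
    https://www.torproject.org/ 2>&1 | sed -n 's/^ *Date: *//p'
```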
Being based on HTTP, HTP can reuse its ready-made server
authentication features, such as X.509 certificates... for the time
being.
## Why use a custom program?
As what follows shows, the upstream HTP client has quite a few
drawbacks that make it unfit for our needs. That's why T(A)ILS ships a
custom version of the Perl HTP client as `/usr/local/sbin/htpdate`.
The repository this script was copied from can be found at:
git://gaffer.ptitcanardnoir.org/htp.git
For reasons detailed below, this version of htpdate uses wget for all
of its HTTP operations.
Implementation
==============
## Integration into the system
A NetworkManager hook runs the whole thing:
`/etc/NetworkManager/dispatcher.d/50-htp.sh`.
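A minimal sketch of what such a hook can look like; the argument
convention is NetworkManager's, while the actual logic and htpdate
options live in the real `50-htp.sh`:

```sh
#!/bin/sh
# NetworkManager dispatcher hooks are run as: <script> <interface> <action>.
ACTION="$2"

case "$ACTION" in
    up)
        # Fix the system clock before Tor is allowed to connect.
        /usr/local/sbin/htpdate || exit 1
        ;;
esac
```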
## Direct access to the network, DNS resolution
As highlighted above, it is important to fix the system time before
Tor connects to the network. This is why the HTP network communication
has to go in the clear, through an exception in the firewall
configuration.
The HTP client's communication is delegated to wget. Without anything
special being done, wget would try to resolve the HTP servers'
hostnames using the system resolver, which itself forwards queries
through the Tor network... a chicken-and-egg problem.
We could of course ask htpdate to perform IP-based HTTP requests
(`https://IP/`), but that is a pretty uncommon thing to do and would
thus make T(A)ILS users far too easy to fingerprint for our taste. We
really want wget to perform name-based HTTP requests, i.e.
`https://xxx.domain.tld/`. The system resolver therefore needs to be
fed the required (hostname, IP) pairs *before* htpdate is run.
This is achieved by querying the nameservers provided by the DHCP
server when possible; otherwise, we ask OpenDNS. The (hostname, IP)
pairs are written to `/etc/hosts`, then htpdate is run, and the added
entries are finally removed from this file.
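A sketch of this sequence for a single hypothetical host, assuming
OpenDNS's public resolver 208.67.222.222 as the fallback nameserver
(the exact commands used by T(A)ILS may differ):

```sh
HTP_HOST=www.torproject.org
# Ask an explicit nameserver instead of the Tor-ified system resolver,
# and keep only the first IPv4 answer.
IP=$(dig +short "$HTP_HOST" A @208.67.222.222 \
    | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1)
echo "$IP $HTP_HOST" >> /etc/hosts
/usr/local/sbin/htpdate    # name-based HTTPS requests now resolve locally
# Finally drop the temporary entry again (matching on the hostname).
sed -i "/[[:space:]]$HTP_HOST\$/d" /etc/hosts
```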
## Fingerprinting of T(A)ILS users
HTP probably isn't widely used, so if a connection can be identified
as HTP, it was very likely made by a T(A)ILS user.
Some preliminary effort was made to make it harder to identify
connections as HTP. This is only preliminary work and must **NOT** be
considered a guarantee. See the "Limits" section below.
T(A)ILS developers still need to think thoroughly about these
questions: are such fingerprinting possibilities a serious problem?
What kind of efforts and compromises should be made to prevent them?
Fingerprinting of HTP connections can be attempted at two locations:
* the servers we send HTTP requests to;
* the Internet connection being used by a T(A)ILS user.
### Fingerprinting via the servers set being used
If a subset of the queried webservers' admins share their logs, a
quite simple correlation search would let them fingerprint T(A)ILS
users. The pool of servers is therefore chosen in a way that makes it
unlikely for its members to share user data. This should be enough of
a protection against this side of the threat.
On the other hand, anyone who monitors a given Internet connection
used to run T(A)ILS could probably infer T(A)ILS usage from the
connection pattern (DNS, then HTTP) on which our HTP implementation
currently relies.
### Fingerprinting via unusual HTTP behaviour of the HTP client
The custom HTP client T(A)ILS uses is configured to present the same
user agent as iceweasel+torbutton, thanks to
`/usr/local/bin/getTorbuttonUserAgent`.
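In wget terms, the spoofing amounts to something like this (assuming,
as in T(A)ILS, that `getTorbuttonUserAgent` prints the desired string):

```sh
# Present the same User-Agent string as iceweasel+torbutton would.
UA=$(/usr/local/bin/getTorbuttonUserAgent)
wget --user-agent="$UA" --output-document=/dev/null https://www.torproject.org/
```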
It was noted that the upstream HTP client's "connection pattern" is
pretty suspicious: a web browser loading foo.com/index.html would
complete the whole exchange and download index.html plus any
referenced resources, whereas htpdate/htpd drops the connection once
it has got the first HTTP header. That's why our custom HTP client
provides a "full request" mode, which we use: when run this way, the
HTTP exchange is completed, and the resources normally needed to
display a page are fetched as well: images, CSS, etc.
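With wget doing the HTTP work, a "full request" maps naturally onto
its `--page-requisites` option; a hedged sketch:

```sh
# Fetch the page plus the images, stylesheets, etc. it references,
# so the exchange looks like a real browser's on the wire.
TMP=$(mktemp -d)
wget --page-requisites --directory-prefix="$TMP" \
    https://www.torproject.org/
rm -rf "$TMP"    # the fetched content itself is of no interest
```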
### Miscellaneous
Using a larger HTP servers pool could help protect users against some
of the threats described above: N servers could be picked at random
from every category defined in the *Servers pool* section, as sketched
below. Still, the connection pattern would remain quite unique.
Moreover, browser fingerprinting (see our [[iceweasel audit
page|todo/applications_audit/iceweasel]]) makes it easy to tell wget
apart from normal browsers, e.g. because of its lack of JavaScript
support.
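As an illustration of the random-pick idea (the URLs are just the ones
listed in the *Servers pool* section below):

```sh
# Hypothetical: pick one server at random from a category of the pool.
pick_one() { shuf --head-count=1; }
TRUSTED=$(printf '%s\n' \
    https://www.torproject.org/ https://mail.riseup.net/ | pick_one)
echo "picked: $TRUSTED"
```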
## Servers pool
What sources should be trusted? This is of course also a problem
with NTP.
The HTP pool used by T(A)ILS is based on stable and reliable
webservers that receive large amounts of traffic, including:
* two servers whose admins are likely to take great care of their
visitors' privacy: <https://www.torproject.org/> and
<https://mail.riseup.net/>
* one server managed by adversaries of the two "trusted" ones,
in order to prevent identifying data from being shared:
<https://www.google.com/> (!)
* one more or less "neutral" server: <https://secure.wikimedia.org/>
The web pages in the pool have been selected (quite quickly; this can
surely be improved) using an additional criterion: weight, including
the resources each page depends on: images, CSS and scripts...
## Authentication of servers
The custom `/usr/local/sbin/htpdate` we use delegates certificate
verification to wget. It implements a "paranoid mode" that is enabled
in T(A)ILS: when one server cannot be reached, e.g. because of a
failed certificate check, this custom version of htpdate considers the
servers pool's consistency not to be trustworthy enough and exits.
wget is also directed to only use TLSv1 as a "secure" protocol.
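In wget terms, the options in play look roughly like this (the CA
bundle path is Debian's usual one and an assumption on our part; the
real invocation lives in the custom htpdate):

```sh
# Certificate verification is wget's default behaviour; restrict the
# protocol to TLSv1 and treat any failure as fatal, paranoid-mode style.
wget --secure-protocol=TLSv1 \
    --ca-certificate=/etc/ssl/certs/ca-certificates.crt \
    --output-document=/dev/null https://mail.riseup.net/ || {
    echo 'htpdate: server unreachable or certificate check failed' >&2
    exit 2
}
```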
Versions
========
T(A)ILS 0.5 implements what this design document describes. Later
versions will switch to using HTP: see our [[HTP design document|HTP]].
Rationale
=========
# Rationale
In general I'm worried about unauthenticated NTP. There probably
is a whole bunch of fingerprinting attacks an attacker could mount if
they could pose as the NTP server and mess with the user's time.
Also see T(A)ILS design documents about [[contribute/design/NTP]] and
[[contribute/design/HTP]].
# Authenticated NTP
As of NTPv4 the Autokey protocol has been implemented which enables
[...]
the time being. Let's (try to) use HTP in the meantime.
# HTP
Our [[design document|contribute/design/HTP]] describes the
implemented solution.
> This has been implemented, here's what is left:
>
> * the [[design documentation|contribute/design/NTP]] should be
> updated [[!taglink todo/documentation]]
> * [[!taglink todo/test]]!
This item is now [[!taglink todo/pending]].