Recover from buggy HTML
On my sid system, check-mirrors.rb fails on one mirror whose web page has invalid HTML:
$ ./check-mirrors.rb --channel ${DIST:?} --allow-multiple --debug --fast tails-amd64-${VERSION:?} --url-prefix https://ftp.nluug.nl/os/Linux/distr/tails/tails/
version: tails-amd64-4.15
fetch: https://tails.boum.org/inc/trace
trace: 1611584352
mirror: https://ftp.nluug.nl/os/Linux/distr/tails/tails/
fetch: https://ftp.nluug.nl/os/Linux/distr/tails/tails//project/trace
trace: 1611584352
fetch: https://ftp.nluug.nl/os/Linux/distr/tails/tails//stable/
Traceback (most recent call last):
7: from ./check-mirrors.rb:447:in `<main>'
6: from ./check-mirrors.rb:447:in `each'
5: from ./check-mirrors.rb:465:in `block in <main>'
4: from ./check-mirrors.rb:200:in `check_versions'
3: from ./check-mirrors.rb:187:in `scan_for_links'
2: from /usr/lib/ruby/vendor_ruby/nokogiri/html.rb:16:in `HTML'
1: from /usr/lib/ruby/vendor_ruby/nokogiri/html/document.rb:215:in `parse'
/usr/lib/ruby/vendor_ruby/nokogiri/html/document.rb:215:in `read_memory': Parser without recover option encountered error or warning: 30:8: ERROR: Opening and ending tag mismatch: font and b (Nokogiri::XML::SyntaxError)
zsh: exit 1 ./check-mirrors.rb --channel ${DIST:?} --allow-multiple --debug --fast
To work around this, enable the Nokogiri RECOVER parser option, whose documentation reads "Recover from errors": https://nokogiri.org/rdoc/Nokogiri/XML/ParseOptions.html