Recover from buggy HTML

intrigeri requested to merge bugfix/recover-from-buggy-html into master

On my sid system, check-mirrors.rb fails on one mirror whose web page has invalid HTML:

$ ./check-mirrors.rb --channel ${DIST:?} --allow-multiple --debug --fast tails-amd64-${VERSION:?} --url-prefix https://ftp.nluug.nl/os/Linux/distr/tails/tails/
version:  tails-amd64-4.15
fetch:    https://tails.boum.org/inc/trace
trace:    1611584352

mirror:   https://ftp.nluug.nl/os/Linux/distr/tails/tails/
fetch:    https://ftp.nluug.nl/os/Linux/distr/tails/tails//project/trace
trace:    1611584352
fetch:    https://ftp.nluug.nl/os/Linux/distr/tails/tails//stable/
Traceback (most recent call last):
	7: from ./check-mirrors.rb:447:in `<main>'
	6: from ./check-mirrors.rb:447:in `each'
	5: from ./check-mirrors.rb:465:in `block in <main>'
	4: from ./check-mirrors.rb:200:in `check_versions'
	3: from ./check-mirrors.rb:187:in `scan_for_links'
	2: from /usr/lib/ruby/vendor_ruby/nokogiri/html.rb:16:in `HTML'
	1: from /usr/lib/ruby/vendor_ruby/nokogiri/html/document.rb:215:in `parse'
/usr/lib/ruby/vendor_ruby/nokogiri/html/document.rb:215:in `read_memory': Parser without recover option encountered error or warning: 30:8: ERROR: Opening and ending tag mismatch: font and b (Nokogiri::XML::SyntaxError)
zsh: exit 1     ./check-mirrors.rb --channel ${DIST:?} --allow-multiple --debug --fast

To work around this, enable the Nokogiri RECOVER parser option, whose documentation reads "Recover from errors": https://nokogiri.org/rdoc/Nokogiri/XML/ParseOptions.html