Fix non-well-formed HTML so it can be processed with the XML Collector
Description
Some HTML pages are not well-formed in terms of strict XML syntax, so to use the XML Collector with these pages, we must fix their syntax before processing them.
There is a library called jsoup that might help.
Acceptance / Success Criteria
None
Activity
Alejandro Galue July 11, 2013 at 4:02 PM
Fixed in revision 96211297ed17f317d05f1c640972d55cacbcf7cc for 1.12.
To process HTML that is not well-formed XML, define the xml-source with a special parameter called pre-parse-html and set its value to true.
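A minimal sketch of such an xml-source definition follows. It assumes pre-parse-html is passed as a `<parameter>` element inside the source's `<request>` block; the URL, the group and object names, and the XPath expressions are hypothetical placeholders, not taken from the original fix.

```xml
<xml-source url="http://{ipaddr}/status.html">
  <request method="GET">
    <!-- Assumed placement: request parameter telling the collector to
         clean up the HTML into well-formed XML before parsing it -->
    <parameter name="pre-parse-html" value="true"/>
  </request>
  <!-- Hypothetical group and object; adapt names and XPath to the target page -->
  <xml-group name="page-stats" resource-type="node" resource-xpath="/html/body">
    <xml-object name="visitors" type="GAUGE" xpath="div[@id='visitors']"/>
  </xml-group>
</xml-source>
```

Without the parameter (or with it set to false), a page that is not well-formed XML would fail to parse, which is the problem described in this issue.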