Open Source HTML Parsers in Java
10 projects
Java HTML parser: turns dirty/ill-formed HTML into well-formed XML. Browser-like tag balancing; custom tag and rule sets; HTML5 support.
Fast real-time Java HTML parser: extraction and transformation; filters, visitors, custom tags; lexer and nested parser modes.
Java library to parse HTML into a stream of tag objects or a searchable tree. Same project as HTML Parser (htmlparser.sourceforge.
Java bridge to Mozilla's HTML parser: raw or dirty HTML in, Java Document out. JNI to Mozilla classes.
Java library for parsing and modifying HTML; preserves unrecognised markup. Form analysis, extraction; available under LGPL, EPL, or Apache license options.
Pure Java HTML DOM parser (HTML 4.01): fast, tag balancing, optional end tags. Part of binhgiang project (VietSpider extractor tools).