HTMLDecoder
Decoder for HTML documents.
Constructor:
__init__()
    Initialize and reset this instance.
Methods:
decode_structured(text, location)
get_image(filename)
handle_charref(name)
handle_data(data)
handle_endtag(tag)
handle_entityref(name)
handle_starttag(tag, case_attrs)
prepare_for_data()
Attributes:
CDATA_CONTENT_ELEMENTS
default_style
    Default style attributes for unstyled text in the HTML document.
entitydefs
font_sizes
    Map HTML font sizes to actual font sizes, in points.
HTMLDecoder.decode_structured(text, location)
HTMLDecoder.get_image(filename)
HTMLDecoder.handle_charref(name)
HTMLDecoder.handle_data(data)
HTMLDecoder.handle_endtag(tag)
HTMLDecoder.handle_entityref(name)
HTMLDecoder.handle_starttag(tag, case_attrs)
HTMLDecoder.prepare_for_data()
HTMLDecoder.default_style = {'margin_bottom': '12pt', 'font_size': 12, 'font_name': 'Times New Roman'}
    Default style attributes for unstyled text in the HTML document.
    Type: dict
HTMLDecoder.font_sizes = {1: 8, 2: 10, 3: 12, 4: 14, 5: 18, 6: 24, 7: 48}
    Map HTML font sizes to actual font sizes, in points.
    Type: dict
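
The two class attributes above can be read as lookup tables. The following is a minimal sketch, not the decoder's actual implementation, of how an HTML font size from 1 to 7 might be resolved to a point size and layered over the default style. The helper name `style_for_font_size` and the clamping of out-of-range sizes are assumptions made for illustration; only the two dictionaries are taken from the listings above.

```python
# Values copied from the attribute listings above.
DEFAULT_STYLE = {
    'font_name': 'Times New Roman',
    'font_size': 12,
    'margin_bottom': '12pt',
}
FONT_SIZES = {1: 8, 2: 10, 3: 12, 4: 14, 5: 18, 6: 24, 7: 48}


def style_for_font_size(html_size):
    """Return a style dict for an HTML font size (hypothetical helper).

    Sizes outside 1..7 are clamped to the nearest valid size. The
    clamping is an assumption, not documented behaviour.
    """
    smallest, largest = min(FONT_SIZES), max(FONT_SIZES)
    clamped = max(smallest, min(largest, int(html_size)))
    style = dict(DEFAULT_STYLE)               # start from the defaults
    style['font_size'] = FONT_SIZES[clamped]  # override the point size
    return style


print(style_for_font_size(5)['font_size'])      # 18
print(style_for_font_size(3) == DEFAULT_STYLE)  # True
```

Size 3 maps to 12 points, the same value as the default `font_size`, which is why the second call returns a dict equal to the defaults.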
Methods
HTMLDecoder.add_element(element)
HTMLDecoder.add_text(text)
HTMLDecoder.check_for_whole_start_tag(i)
HTMLDecoder.clear_cdata_mode()
HTMLDecoder.close()
    Handle any buffered data.
HTMLDecoder.decode(text, location=None)
HTMLDecoder.error(message)
HTMLDecoder.feed(data)
    Feed data to the parser.
    Call this as often as you want, with as little or as much text as you want (may include '\n').
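
The method names in this list (`feed`, `close`, `getpos`, the `handle_*` callbacks) match those of `html.parser.HTMLParser` in the Python standard library. Assuming that is where they are inherited from, the incremental feeding described above can be sketched with a plain `HTMLParser` subclass standing in for `HTMLDecoder`. The `EventRecorder` class is hypothetical and exists only for this illustration.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical stand-in that records parser callbacks in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_data(self, data):
        self.events.append(('data', data))


parser = EventRecorder()
# The input may be split at arbitrary points, even inside a tag.
for chunk in ('<p>Hello, <b', '>world</b', '>!\n</p>'):
    parser.feed(chunk)
parser.close()  # handle any data still buffered

# Collect the start tags and the text, ignoring how the data was chunked.
starts = [tag for kind, tag in parser.events if kind == 'start']
text = ''.join(value for kind, value in parser.events if kind == 'data')
```

Text may arrive in more than one `handle_data` call, so the sketch joins the pieces before using them instead of relying on a particular split.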
HTMLDecoder.get_starttag_text()
    Return full source of start tag: '<...>'.
HTMLDecoder.getpos()
    Return current line number and offset.
HTMLDecoder.goahead(end)
HTMLDecoder.handle_comment(data)
HTMLDecoder.handle_decl(decl)
HTMLDecoder.handle_pi(data)
HTMLDecoder.handle_startendtag(tag, attrs)
HTMLDecoder.parse_bogus_comment(i, report=1)
HTMLDecoder.parse_comment(i, report=1)
HTMLDecoder.parse_declaration(i)
HTMLDecoder.parse_endtag(i)
HTMLDecoder.parse_html_declaration(i)
HTMLDecoder.parse_marked_section(i, report=1)
HTMLDecoder.parse_pi(i)
HTMLDecoder.parse_starttag(i)
HTMLDecoder.pop_style(key)
HTMLDecoder.push_style(key, styles)
HTMLDecoder.reset()
    Reset this instance. Loses all unprocessed data.
HTMLDecoder.set_cdata_mode(elem)
HTMLDecoder.unescape(s)
HTMLDecoder.unknown_decl(data)
HTMLDecoder.updatepos(i, j)
Attributes
HTMLDecoder.CDATA_CONTENT_ELEMENTS = ('script', 'style')
HTMLDecoder.entitydefs = None
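
The value of `CDATA_CONTENT_ELEMENTS` is the same as the attribute of that name on the standard library's `html.parser.HTMLParser`. Assuming it is inherited from there, the sketch below shows what it means in practice for that base class: the content of a `script` or `style` element is delivered as raw text and markup inside it is not parsed as tags. `CdataProbe` is a hypothetical stand-in, not part of `HTMLDecoder`.

```python
from html.parser import HTMLParser


class CdataProbe(HTMLParser):
    """Hypothetical stand-in that records start tags and text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


probe = CdataProbe()
probe.feed('<script>if (a < b) document.write("<b>");</script>')
probe.close()

# Only <script> is reported as a tag; the "<b>" inside it stays in the text.
script_body = ''.join(probe.text)
```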