selectolax.lexbor module¶
LexborHTMLParser¶
- class selectolax.lexbor.LexborHTMLParser(html: str | bytes, is_fragment: bool = False, fragment_tag: str = 'div', fragment_namespace: str = 'html')¶
The lexbor HTML parser.
Use this class to parse raw HTML.
This parser mimics most of the HTMLParser API but does not inherit from it directly.
- Parameters:
- html: str (unicode) or bytes
- any_css_matches(self, tuple selectors)¶
Return True if any of the specified CSS selectors match.
- Parameters:
- selectors: tuple[str]
CSS selectors to evaluate.
- Returns:
- bool
True when at least one selector matches.
- body¶
Return document body.
- Returns:
- LexborNode or None
<body> element when present, otherwise None.
- clone(self)¶
Clone the current document tree.
Use it to make temporary modifications without affecting the original HTML tree. The clone is tied to the current parser instance and is destroyed when the parser instance is destroyed.
- Returns:
- LexborHTMLParser
A parser instance backed by a deep-copied document.
- create_node(self, str tag)¶
Given an HTML tag name, e.g. “div”, create a single empty node for that tag, e.g. “<div></div>”.
- Parameters:
- tag: str
Name of the tag to create.
- Returns:
- LexborNode
Newly created element node.
- Raises:
- SelectolaxError
If the element cannot be created.
Examples
>>> parser = LexborHTMLParser("<div></div>")
>>> new_node = parser.create_node("span")
>>> new_node.tag
'span'
>>> parser.css_first("div").insert_child(new_node)
>>> parser.html
'<html><head></head><body><div><span></span></div></body></html>'
- css(self, str query)¶
Evaluate a CSS selector against the document.
Matches the pattern query against the HTML tree. See the CSS selectors reference.
Special selectors:
parser.css('p:lexbor-contains("awesome" i)') – case-insensitive contains
parser.css('p:lexbor-contains("awesome")') – case-sensitive contains
- Parameters:
- query: str
CSS selector (e.g. "div > :nth-child(2n+1):not(:has(a))").
- Returns:
- selector: list of LexborNode objects
- css_first(self, str query, default=None, strict=False)¶
Same as css but returns only the first match.
- Parameters:
- query: str
- default: Any, default None
Default value to return if there is no match.
- strict: bool, default False
Set to True to require that there is exactly one match in the document.
- Returns:
- selector: LexborNode object
- css_matches(self, str selector)¶
Return True if the document matches the selector at least once.
- Parameters:
- selector: str
CSS selector to test.
- Returns:
- bool
True when a match exists.
- head¶
Return document head.
- Returns:
- LexborNode or None
<head> element when present, otherwise None.
- html¶
Return HTML representation of the page.
- Returns:
- str or None
Serialized HTML of the current document.
- html_pretty(self, Py_ssize_t indent=0, bool skip_ws_nodes=False, bool skip_comment=False, bool raw=False, bool without_closing=False, bool tag_with_ns=False, bool without_text_indent=False, bool full_doctype=False, bool html5test=False)¶
Return pretty-printed HTML representation of the page.
- Parameters:
- indent: int, optional
Initial indentation level passed to Lexbor. Defaults to 0.
- skip_ws_nodes: bool, optional
Skip text nodes that contain only whitespace.
- skip_comment: bool, optional
Exclude HTML comment nodes from the serialized output.
- raw: bool, optional
Serialize text and attribute values without HTML escaping.
- without_closing: bool, optional
Omit closing tags for non-void elements.
- tag_with_ns: bool, optional
Include namespace prefixes in serialized tag names when available.
- without_text_indent: bool, optional
Disable extra indentation added around text and comment content.
- full_doctype: bool, optional
Serialize the full document type declaration when a doctype node is present.
- html5test: bool, optional
Serialize using Lexbor’s HTML5 test formatting mode.
- inner_html¶
LexborHTMLParser.inner_html: str
Return HTML representation of the child nodes.
Works similarly to innerHTML in JavaScript. Unlike the .html property, it does not include the current node. It can also be used to set HTML; see the setter docstring.
- Returns:
- text: str | None
- inner_html_pretty(self, Py_ssize_t indent=0, bool skip_ws_nodes=False, bool skip_comment=False, bool raw=False, bool without_closing=False, bool tag_with_ns=False, bool without_text_indent=False, bool full_doctype=False, bool html5test=False)¶
Return pretty-printed HTML representation of the child nodes.
- Parameters:
- indent: int, optional
Initial indentation level passed to Lexbor. Defaults to 0.
- skip_ws_nodes: bool, optional
Skip text nodes that contain only whitespace.
- skip_comment: bool, optional
Exclude HTML comment nodes from the serialized output.
- raw: bool, optional
Serialize text and attribute values without HTML escaping.
- without_closing: bool, optional
Omit closing tags for non-void elements.
- tag_with_ns: bool, optional
Include namespace prefixes in serialized tag names when available.
- without_text_indent: bool, optional
Disable extra indentation added around text and comment content.
- full_doctype: bool, optional
Serialize the full document type declaration when a doctype node is present.
- html5test: bool, optional
Serialize using Lexbor’s HTML5 test formatting mode.
- merge_text_nodes(self)¶
Iterates over all text nodes and merges those that are adjacent to each other.
This is useful for text extraction. Use it when you need to strip HTML tags and merge “dangling” text.
- Returns:
- None
Examples
>>> tree = LexborHTMLParser("<div><p><strong>J</strong>ohn</p><p>Doe</p></div>")
>>> node = tree.css_first('div')
>>> tree.unwrap_tags(["strong"])
>>> tree.text(deep=True, separator=" ", strip=True)
"J ohn Doe"  # Text extraction produces an extra space because the strong tag was removed.
>>> node.merge_text_nodes()
>>> tree.text(deep=True, separator=" ", strip=True)
"John Doe"
- raw_html¶
raw_html: bytes
- root¶
Return the document root node.
- Returns:
- LexborNode or None
Root of the parsed document, or None if unavailable.
- script_srcs_contain(self, tuple queries)¶
Return True if any script src contains one of the strings.
Caches values on the first call to improve performance.
- Parameters:
- queries: tuple of str
Strings to look for inside src attributes.
- Returns:
- bool
True when a matching source value is found.
- scripts_contain(self, str query)¶
Return True if any script tag contains the given text.
Caches script tags on the first call to improve performance.
- Parameters:
- query: str
Text to search for within script contents.
- Returns:
- bool
True when a matching script tag is found.
- select(self, query=None)¶
Select nodes given a CSS selector.
Works similarly to the css method, but supports chained filtering and extra features.
- Parameters:
- query: str or None
The CSS selector to use when searching for nodes.
- Returns:
- LexborSelector or None
Selector bound to the root node, or None if the document is empty.
- selector¶
Return a lazily created CSS selector helper.
- Returns:
- LexborCSSSelector
Selector instance bound to this parser.
- strip_tags(self, list tags, bool recursive=False)¶
Remove specified tags from the node.
- Parameters:
- tags: list of str
List of tags to remove.
- recursive: bool, default False
Whether to delete all child nodes of the removed tags.
- Returns:
- None
Examples
>>> tree = LexborHTMLParser('<html><head></head><body><script></script><div>Hello world!</div></body></html>')
>>> tags = ['head', 'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes']
>>> tree.strip_tags(tags)
>>> tree.html
'<html><body><div>Hello world!</div></body></html>'
- tags(self, str name)¶
Return all tags that match the provided name.
- Parameters:
- name: str
Tag name to search for (e.g., "div").
- Returns:
- list of LexborNode
Matching elements in document order.
- Raises:
- ValueError
If name is empty or longer than 100 characters.
- SelectolaxError
If Lexbor cannot locate the elements.
- text(self, deep: bool = True, separator: str = '', strip: bool = False, skip_empty: bool = False) → str¶
Returns the text of the node including text of all its child nodes.
- Parameters:
- strip: bool, default False
If True, calls str.strip() on each text part to remove extra whitespace.
- separator: str, default ''
The separator to use when joining text from different nodes.
- deep: bool, default True
If True, includes text from all child nodes.
- skip_empty: bool, optional
Exclude text nodes whose content is only ASCII whitespace (space, tab, newline, form feed or carriage return) when True. Defaults to False.
- Returns:
- text: str
Combined textual content assembled according to the provided options.
- unwrap_tags(self, list tags, delete_empty=False)¶
Unwraps specified tags from the HTML tree.
Works the same as the unwrap method, but applied to a list of tags.
- Parameters:
- tags: list
List of tags to unwrap.
- delete_empty: bool
Whether to delete empty tags.
- Returns:
- None
Examples
>>> tree = LexborHTMLParser('<div><a href="">Hello</a> <i>world</i>!</div>')
>>> tree.body.unwrap_tags(['i','a'])
>>> tree.body.html
'<body><div>Hello world!</div></body>'
LexborNode¶
- class selectolax.lexbor.LexborNode¶
A class that represents an HTML node (element).
- any_css_matches(self, tuple selectors)¶
Returns True if any of the CSS selectors matches the node.
- attributes¶
Get all attributes that belong to the current node.
The value of empty attributes is None.
- Returns:
- attributes: dictionary of all attributes.
Examples
>>> tree = LexborHTMLParser("<div data id='my_id'></div>")
>>> node = tree.css_first('div')
>>> node.attributes
{'data': None, 'id': 'my_id'}
- attrs¶
A dict-like object that is similar to the attributes property, but operates directly on the Node data.
Warning
Use attributes instead if you don’t want to modify Node attributes.
- Returns:
- attributes: Attributes mapping object.
Examples
>>> tree = LexborHTMLParser("<div id='a'></div>")
>>> node = tree.css_first('div')
>>> node.attrs
<div attributes, 1 items>
>>> node.attrs['id']
'a'
>>> node.attrs['foo'] = 'bar'
>>> del node.attrs['id']
>>> node.attributes
{'foo': 'bar'}
>>> node.attrs['id'] = 'new_id'
>>> node.html
'<div foo="bar" id="new_id"></div>'
- child¶
Alias for the first_child property.
Deprecated. Please use first_child instead.
- clone(self) → LexborNode¶
Clone the current node.
Use it to make temporary modifications without affecting the original HTML tree.
The clone is tied to the current parser instance and is destroyed when the parser instance is destroyed.
- comment_content¶
LexborNode.comment_content: str | None
Extract the textual content of an HTML comment node.
- Returns:
- str or None
Comment text with surrounding whitespace removed, or None if the current node is not a comment or the comment markup cannot be parsed.
Examples
>>> parse_fragment("<!-- hello -->")[0].comment_content
'hello'
>>> parse_fragment("<div>not a comment</div>")[0].comment_content is None
True
- css(self, str query)¶
Evaluate a CSS selector against the current node and its child nodes.
Matches the pattern query against the HTML tree. See the CSS selectors reference.
Special selectors:
parser.css('p:lexbor-contains("awesome" i)') – case-insensitive contains
parser.css('p:lexbor-contains("awesome")') – case-sensitive contains
- Parameters:
- query: str
CSS selector (e.g. "div > :nth-child(2n+1):not(:has(a))").
- Returns:
- selector: list of LexborNode objects
- css_first(self, str query, default=None, bool strict=False)¶
Same as css but returns only the first match.
When strict=False, the search stops at the first match, which is faster.
- Parameters:
- query: str
- default: Any, default None
Default value to return if there is no match.
- strict: bool, default False
Set to True to require that there is exactly one match in the document.
- Returns:
- selector: LexborNode object
- css_matches(self, str selector)¶
Returns True if the CSS selector matches the node.
- decompose(self, bool recursive=True)¶
Remove the current node from the tree.
- Parameters:
- recursive: bool, default True
Whether to delete all of its child nodes.
Examples
>>> html = "<div><script>var x = 1;</script><p>Hello</p></div>"
>>> tree = LexborHTMLParser(html)
>>> for tag in tree.css('script'):
...     tag.decompose()
- first_child¶
Return the first child node.
- html¶
Return HTML representation of the current node including all its child nodes.
- Returns:
- text: str
- html_pretty(self, Py_ssize_t indent=0, bool skip_ws_nodes=False, bool skip_comment=False, bool raw=False, bool without_closing=False, bool tag_with_ns=False, bool without_text_indent=False, bool full_doctype=False, bool html5test=False)¶
Return pretty-printed HTML for the current node.
- Parameters:
- indent: int, optional
Initial indentation level passed to Lexbor. Defaults to 0.
- skip_ws_nodes: bool, optional
Skip text nodes that contain only whitespace.
- skip_comment: bool, optional
Exclude HTML comment nodes from the serialized output.
- raw: bool, optional
Serialize text and attribute values without HTML escaping.
- without_closing: bool, optional
Omit closing tags for non-void elements.
- tag_with_ns: bool, optional
Include namespace prefixes in serialized tag names when available.
- without_text_indent: bool, optional
Disable extra indentation added around text and comment content.
- full_doctype: bool, optional
Serialize the full document type declaration when a doctype node is present.
- html5test: bool, optional
Serialize using Lexbor’s HTML5 test formatting mode.
- id¶
Get the id attribute of the node.
Returns None if the id is not set.
- Returns:
- text: str or None
- inner_html¶
LexborNode.inner_html: str | None
Return HTML representation of the child nodes.
Works similarly to innerHTML in JavaScript. Unlike the .html property, it does not include the current node. It can also be used to set HTML; see the setter docstring.
- Returns:
- text: str | None
- inner_html_pretty(self, Py_ssize_t indent=0, bool skip_ws_nodes=False, bool skip_comment=False, bool raw=False, bool without_closing=False, bool tag_with_ns=False, bool without_text_indent=False, bool full_doctype=False, bool html5test=False)¶
Return pretty-printed HTML representation of the child nodes.
- Parameters:
- indent: int, optional
Initial indentation level passed to Lexbor. Defaults to 0.
- skip_ws_nodes: bool, optional
Skip text nodes that contain only whitespace.
- skip_comment: bool, optional
Exclude HTML comment nodes from the serialized output.
- raw: bool, optional
Serialize text and attribute values without HTML escaping.
- without_closing: bool, optional
Omit closing tags for non-void elements.
- tag_with_ns: bool, optional
Include namespace prefixes in serialized tag names when available.
- without_text_indent: bool, optional
Disable extra indentation added around text and comment content.
- full_doctype: bool, optional
Serialize the full document type declaration when a doctype node is present.
- html5test: bool, optional
Serialize using Lexbor’s HTML5 test formatting mode.
- insert_after(self, value)¶
Insert a node after the current Node.
- Parameters:
- value: str, bytes or Node
The text or Node instance to insert after the Node. A string is inserted as plain text, so any HTML tags in it are escaped. To work with HTML, convert it to a Node object and pass that instead. The Node object is not cloned, so all future changes to the passed Node object will also be taken into account.
Examples
>>> tree = LexborHTMLParser('<div>Get <img src="" alt="Laptop"></div>')
>>> img = tree.css_first('img')
>>> img.insert_after(img.attributes.get('alt', ''))
>>> tree.body.child.html
'<div>Get <img src="" alt="Laptop">Laptop</div>'
>>> html_parser = LexborHTMLParser('<div>Get <span alt="Laptop"><img src="/jpg"> <div></div></span></div>')
>>> html_parser2 = LexborHTMLParser('<div>Test</div>')
>>> img_node = html_parser.css_first('img')
>>> img_node.insert_after(html_parser2.body.child)
>>> html_parser.body.child.html
'<div>Get <span alt="Laptop"><img src="/jpg"><div>Test</div> <div></div></span></div>'
- insert_before(self, value)¶
Insert a node before the current Node.
- Parameters:
- value: str, bytes or Node
The text or Node instance to insert before the Node. A string is inserted as plain text, so any HTML tags in it are escaped. To work with HTML, convert it to a Node object and pass that instead. The Node object is not cloned, so all future changes to the passed Node object will also be taken into account.
Examples
>>> tree = LexborHTMLParser('<div>Get <img src="" alt="Laptop"></div>')
>>> img = tree.css_first('img')
>>> img.insert_before(img.attributes.get('alt', ''))
>>> tree.body.child.html
'<div>Get Laptop<img src="" alt="Laptop"></div>'
>>> html_parser = LexborHTMLParser('<div>Get <span alt="Laptop"><img src="/jpg"> <div></div></span></div>')
>>> html_parser2 = LexborHTMLParser('<div>Test</div>')
>>> img_node = html_parser.css_first('img')
>>> img_node.insert_before(html_parser2.body.child)
>>> html_parser.body.child.html
'<div>Get <span alt="Laptop"><div>Test</div><img src="/jpg"> <div></div></span></div>'
- insert_child(self, value)¶
Insert a node inside (at the end of) the current Node.
- Parameters:
- value: str, bytes or Node
The text or Node instance to insert inside the Node. A string is inserted as plain text, so any HTML tags in it are escaped. To work with HTML, convert it to a Node object and pass that instead. The Node object is not cloned, so all future changes to the passed Node object will also be taken into account.
Examples
>>> tree = LexborHTMLParser('<div>Get <img src=""></div>')
>>> div = tree.css_first('div')
>>> div.insert_child('Laptop')
>>> tree.body.child.html
'<div>Get <img src="">Laptop</div>'
>>> html_parser = LexborHTMLParser('<div>Get <span alt="Laptop"> <div>Laptop</div> </span></div>')
>>> html_parser2 = LexborHTMLParser('<div>Test</div>')
>>> span_node = html_parser.css_first('span')
>>> span_node.insert_child(html_parser2.body.child)
>>> html_parser.body.child.html
'<div>Get <span alt="Laptop"> <div>Laptop</div> <div>Test</div> </span></div>'
- is_comment_node¶
LexborNode.is_comment_node: bool
Return True if the node represents a comment node.
- is_document_node¶
LexborNode.is_document_node: bool
Return True if the node represents a document node.
- is_element_node¶
LexborNode.is_element_node: bool
Return True if the node represents an element node.
- is_empty_text_node¶
LexborNode.is_empty_text_node: bool
Check whether the current node is an empty text node.
- Returns:
- bool
True when the node is a text node whose character data consists only of ASCII whitespace characters (space, tab, newline, form feed or carriage return).
- is_text_node¶
LexborNode.is_text_node: bool
Return True if the node represents a text node.
- iter(self, bool include_text=False, bool skip_empty=False)¶
Iterate over direct children of this node.
- Parameters:
- include_text: bool, optional
When True, yield text nodes in addition to element nodes. Defaults to False.
- skip_empty: bool, optional
When include_text is True, ignore text nodes made up solely of ASCII whitespace (space, tab, newline, form feed or carriage return). Defaults to False.
- Yields:
- LexborNode
Child nodes on the same tree level as this node, filtered according to the provided options.
- last_child¶
Return the last child node.
- merge_text_nodes(self)¶
Iterates over all text nodes and merges those that are adjacent to each other.
This is useful for text extraction. Use it when you need to strip HTML tags and merge “dangling” text.
Examples
>>> tree = LexborHTMLParser("<div><p><strong>J</strong>ohn</p><p>Doe</p></div>")
>>> node = tree.css_first('div')
>>> tree.unwrap_tags(["strong"])
>>> tree.text(deep=True, separator=" ", strip=True)
"J ohn Doe"  # Text extraction produces an extra space because the strong tag was removed.
>>> node.merge_text_nodes()
>>> tree.text(deep=True, separator=" ", strip=True)
"John Doe"
- next¶
Return the next node.
- parent¶
Return the parent node.
- parser¶
parser: selectolax.lexbor.LexborHTMLParser
- prev¶
Return the previous node.
- raw_value¶
Return the raw (unparsed, original) value of a node.
Currently, works on text nodes only.
- Returns:
- raw_value: bytes
Examples
>>> html_parser = LexborHTMLParser('<div>&#x3C;test&#x3E;</div>')
>>> selector = html_parser.css_first('div')
>>> selector.child.html
'&lt;test&gt;'
>>> selector.child.raw_value
b'&#x3C;test&#x3E;'
- remove(self, bool recursive=True)¶
An alias for the decompose method.
- replace_with(self, value)¶
Replace current Node with specified value.
- Parameters:
- value: str, bytes or Node
The text or Node instance to replace the Node with. A string is inserted as plain text, so any HTML tags in it are escaped. To work with HTML, convert it to a Node object and pass that instead. The Node object is not cloned, so all future changes to the passed Node object will also be taken into account.
Examples
>>> tree = LexborHTMLParser('<div>Get <img src="" alt="Laptop"></div>')
>>> img = tree.css_first('img')
>>> img.replace_with(img.attributes.get('alt', ''))
>>> tree.body.child.html
'<div>Get Laptop</div>'
>>> html_parser = LexborHTMLParser('<div>Get <span alt="Laptop"><img src="/jpg"> <div></div></span></div>')
>>> html_parser2 = LexborHTMLParser('<div>Test</div>')
>>> img_node = html_parser.css_first('img')
>>> img_node.replace_with(html_parser2.body.child)
>>> html_parser.body.child.html
'<div>Get <span alt="Laptop"><div>Test</div> <div></div></span></div>'
- script_srcs_contain(self, tuple queries)¶
Returns True if any script src attribute contains one of the specified strings.
Caches values on the first call to improve performance.
- Parameters:
- queries: tuple of str
- scripts_contain(self, str query)¶
Returns True if any of the script tags contain specified text.
Caches script tags on the first call to improve performance.
- Parameters:
- query: str
The query to check.
- select(self, query=None)¶
Select nodes given a CSS selector.
Works similarly to the css method, but supports chained filtering and extra features.
- Parameters:
- query: str or None
The CSS selector to use when searching for nodes.
- Returns:
- selector: The LexborSelector class.
- strip_tags(self, list tags, bool recursive=False)¶
Remove specified tags from the HTML tree.
- Parameters:
- tags: list
List of tags to remove.
- recursive: bool, default False
Whether to delete all child nodes of the removed tags.
Examples
>>> tree = LexborHTMLParser('<html><head></head><body><script></script><div>Hello world!</div></body></html>')
>>> tags = ['head', 'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes']
>>> tree.strip_tags(tags)
>>> tree.html
'<html><body><div>Hello world!</div></body></html>'
- tag¶
Return the name of the current tag (e.g. div, p, img).
For non-tag nodes, returns the following names:
-text - text node
-document - document node
-comment - comment node
- Returns:
- text: str
- text(self, bool deep=True, str separator='', bool strip=False, bool skip_empty=False)¶
Return concatenated text from this node.
- Parameters:
- deep: bool, optional
When True (default), include text from all descendant nodes; when False, only include direct children.
- separator: str, optional
String inserted between successive text fragments.
- strip: bool, optional
If True, apply str.strip() to each fragment before joining to remove surrounding whitespace. Defaults to False.
- skip_empty: bool, optional
Exclude text nodes whose content is only ASCII whitespace (space, tab, newline, form feed or carriage return) when True. Defaults to False.
- Returns:
- text: str
Combined textual content assembled according to the provided options.
- text_content¶
Returns the text of the node if it is a text node.
Returns None for other nodes. Unlike the text method, does not include child nodes.
- Returns:
- text: str or None.
- text_lexbor(self)¶
Returns the text of the node including text of all its child nodes.
Uses the built-in method from lexbor.
- traverse(self, bool include_text=False, bool skip_empty=False)¶
Depth-first traversal starting at the current node.
- Parameters:
- include_text: bool, optional
When True, include text nodes in the traversal sequence. Defaults to False.
- skip_empty: bool, optional
Skip text nodes that contain only ASCII whitespace (space, tab, newline, form feed or carriage return) when include_text is True. Defaults to False.
- Yields:
- LexborNode
Nodes encountered in depth-first order beginning with the current node, filtered according to the provided options.
- unwrap(self, bool delete_empty=False)¶
Replace node with whatever is inside this node.
Does nothing if you unwrap the same node a second time.
- Parameters:
- delete_empty: bool, default False
If True, removes empty tags.
Examples
>>> tree = LexborHTMLParser("<div>Hello <i>world</i>!</div>")
>>> tree.css_first('i').unwrap()
>>> tree.html
'<html><head></head><body><div>Hello world!</div></body></html>'
Note: by default, empty tags are ignored; use delete_empty to change this.
- unwrap_tags(self, list tags, bool delete_empty=False)¶
Unwraps specified tags from the HTML tree.
Works the same as the unwrap method, but applied to a list of tags.
- Parameters:
- tags: list
List of tags to unwrap.
- delete_empty: bool, default False
If True, removes empty tags.
Examples
>>> tree = LexborHTMLParser('<div><a href="">Hello</a> <i>world</i>!</div>')
>>> tree.body.unwrap_tags(['i','a'])
>>> tree.body.html
'<body><div>Hello world!</div></body>'
Note: by default, empty tags are ignored; use delete_empty to change this.
Selector¶
- class selectolax.lexbor.LexborSelector(LexborNode node, query)¶
An advanced CSS selector that supports additional operations.
Think of it as a toolkit that mimics some of the features of XPath.
Please note, this is an experimental feature that can change in the future.
- any_attribute_longer_than(self, str attribute, int length, str start=None) → bool¶
Returns True if any matched node has the given attribute longer than the specified length.
Similar to string-length in XPath.
- any_matches¶
LexborSelector.any_matches: bool
Returns True if there are any matches
- any_text_contains(self, str text, bool deep=True, str separator='', bool strip=False) → bool¶
Returns True if any node in the current search scope contains specified text
- attribute_longer_than(self, str attribute, int length, str start=None) → LexborSelector¶
Filter all current matches by attribute length.
Similar to string-length in XPath.
- css(self, str query)¶
Evaluate CSS selector against current scope.
- matches¶
LexborSelector.matches: list
Returns all possible matches
- text_contains(self, str text, bool deep=True, str separator='', bool strip=False) → LexborSelector¶
Filter all current matches given text.