[lxml] lxml.etree.XPathEvalError: Invalid expression for correct XPath expression on large XML file

Holger Joukl Holger.Joukl at LBBW.de
Wed Nov 25 07:42:13 UTC 2015


Hi Dennis,

> I am using Python/lxml to process large (~300MB) XML files and
> extract information with XPath. I stumbled upon a strange error that
> I cannot make any sense of: All XPath expressions using a "where"
> clause (square brackets) fail with the error message
> "lxml.etree.XPathEvalError: Invalid expression" (see stack traces
> below). Happens with different versions, 32-bit and 64-bit, and different
> OSs.

> I cannot reproduce this behaviour with small XML files, and I could
> not find any information about this. Has anyone experienced
> something similar? Can anybody determine some useful information
> from the stack trace?

> Many thanks, Dennis
>
> Tested environments:

> Linux
> =====
> >>> xml =  etree.parse('stammdaten.xml')
> >>> xml.xpath('//foo[@id]')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "lxml.etree.pyx", line 2115, in lxml.etree._ElementTree.xpath (src/lxml/lxml.etree.c:57654)
>   File "xpath.pxi", line 370, in lxml.etree.XPathDocumentEvaluator.__call__ (src/lxml/lxml.etree.c:146564)
>   File "xpath.pxi", line 238, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:144962)
>   File "xpath.pxi", line 224, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:144817)
> lxml.etree.XPathEvalError: Invalid expression

I'm afraid you probably won't get much help unless you can provide
a minimal example that reproduces the error.

I suspect the "small" XML files differ from the "large" ones in such a
way that your XPath predicates (the parts in square brackets) are never
evaluated, so you don't run into the problem there.

E.g. for

>>> etree.XPath('//foo[bar()]')

you won't see a problem with the predicate unless you run it against an
XML document that actually *has* foo elements:


>>> etree.XPath('//foo[bar()]')(etree.fromstring('<root/>'))
[]
>>> etree.XPath('//foo[bar()]')(etree.fromstring('<root><foo/></root>'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xpath.pxi", line 445, in lxml.etree.XPath.__call__ (src/lxml/lxml.etree.c:153576)
  File "xpath.pxi", line 227, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:150914)
  File "xpath.pxi", line 212, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:150713)
lxml.etree.XPathEvalError: Unregistered function
>>>
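
The same check can be put into a small standalone script, which might also
serve as a starting point for a minimal reproducer. This is only a sketch:
the helper name try_xpath is made up for illustration, bar() is deliberately
an unregistered function, and the count(//foo) query is just one possible way
to see whether a document contains matching elements at all without using a
predicate.

```python
from lxml import etree

# bar() is deliberately NOT a registered XPath function.
find = etree.XPath('//foo[bar()]')


def try_xpath(xml_text):
    """Hypothetical helper: return the result, or the error message on failure."""
    root = etree.fromstring(xml_text)
    try:
        return find(root)
    except etree.XPathEvalError as exc:
        return str(exc)


# No foo elements: the predicate is never evaluated, so nothing fails.
no_foo = try_xpath('<root/>')

# One foo element: the predicate is evaluated and the error surfaces.
with_foo = try_xpath('<root><foo/></root>')

# Predicate-free query to check whether a document has foo elements at all.
foo_count = etree.fromstring('<root><foo/></root>').xpath('count(//foo)')

print(no_foo)
print(with_foo)
print(foo_count)
```

Running the predicate-free count against both a "small" and a "large" file
should show whether the small ones ever reach the predicate in the first place.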

Best regards
Holger




