[lxml] xpath versus findall

Tim Arnold Tim.Arnold at sas.com
Tue Aug 9 13:21:12 CDT 2011

I recently ran into some performance problems using xpath on a large file (> 5 MB).
Suppose that
    from lxml import etree
    tree = etree.parse('file.xml')
    xns = {'d': 'http://docbook.org/ns/docbook'}

This takes minutes:
    indexterms = tree.xpath('*//d:indexterm', namespaces=xns)

and this takes a second:
    indexterms = tree.findall('*//d:indexterm', namespaces=xns)
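
A minimal sketch of how the two calls could be timed side by side. It assumes a small synthetic DocBook-like tree built in memory (the build_tree and timed helpers are hypothetical, not part of lxml), so the timings it prints will not necessarily reproduce the gap seen on the real 5 MB file:

```python
import time
from lxml import etree

NS = "http://docbook.org/ns/docbook"
xns = {"d": NS}

def build_tree(n_chapters=200):
    # Hypothetical stand-in for 'file.xml': book > chapter > para > indexterm.
    root = etree.Element("{%s}book" % NS)
    for i in range(n_chapters):
        chapter = etree.SubElement(root, "{%s}chapter" % NS)
        para = etree.SubElement(chapter, "{%s}para" % NS)
        term = etree.SubElement(para, "{%s}indexterm" % NS)
        etree.SubElement(term, "{%s}primary" % NS).text = "term %d" % i
    return etree.ElementTree(root)

def timed(func, *args, **kwargs):
    # Run func once and return (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

tree = build_tree()
via_xpath, t_xpath = timed(tree.xpath, "*//d:indexterm", namespaces=xns)
via_findall, t_findall = timed(tree.findall, "*//d:indexterm", namespaces=xns)

print("xpath:   %d matches in %.6f s" % (len(via_xpath), t_xpath))
print("findall: %d matches in %.6f s" % (len(via_findall), t_findall))
```

To measure the real document, replace build_tree() with etree.parse('file.xml') and repeat each call a few times, since a single run can be noisy.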

I'm trying to extrapolate a rule from this: I should only use xpath when I actually need the extended capabilities it provides, such as complex filtering or selection based on attributes. For simple selection like the above, I should stick to tree.find or tree.findall.

I've also seen that in some cases xpath with filtering is still slower than calling findall and looping over the results to check attribute values.
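
For illustration, the two approaches could look like the sketch below. The inline document and the choice of the significance attribute are assumptions made for the example, not taken from the real file:

```python
from lxml import etree

xns = {"d": "http://docbook.org/ns/docbook"}

# Tiny made-up document; the real file is assumed to be much larger.
xml = (
    '<book xmlns="http://docbook.org/ns/docbook">'
    '<chapter><para>'
    '<indexterm significance="preferred"><primary>xpath</primary></indexterm>'
    '<indexterm><primary>findall</primary></indexterm>'
    '</para></chapter>'
    '<chapter><para>'
    '<indexterm significance="preferred"><primary>lxml</primary></indexterm>'
    '</para></chapter>'
    '</book>'
)
tree = etree.ElementTree(etree.fromstring(xml))

# Option 1: let xpath filter on the attribute with a predicate.
filtered_xpath = tree.xpath(
    '*//d:indexterm[@significance="preferred"]', namespaces=xns)

# Option 2: select everything with findall, then filter in a Python loop.
filtered_loop = [
    term
    for term in tree.findall('*//d:indexterm', namespaces=xns)
    if term.get('significance') == 'preferred'
]

def primaries(terms):
    # Collect the text of each indexterm's <primary> child for comparison.
    return [t.findtext('d:primary', namespaces=xns) for t in terms]

print(primaries(filtered_xpath))
print(primaries(filtered_loop))
```

Both options select the same elements here; which one is faster on a given document is something to measure rather than assume.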

My understanding is that xpath is powerful but expensive, so don't use it unless you have to.

Does anyone understand this better or see it differently?
--Tim Arnold
