[lxml-dev] lxml 2.3 final released

Stefan Behnel stefan_ml at behnel.de
Sun Feb 6 14:02:52 CST 2011

Hi everyone,

I'm happy to announce the long-awaited release of lxml 2.3 final. It is the 
first stable release of the 2.3 series, which officially supports 
Python 3.1.2 and 3.2 (any Python 3 support in 2.2.x should be considered 
accidental).



Binary builds are expected to become available in the near future.

This release was built using Cython 0.14.1. It is recommended (although not 
required) to use at least libxml2 2.7.8 with lxml, which fixes a number of 
important bugs compared to the previous 2.7.x releases.

Updating from the 2.2 series is recommended. It should be relatively easy 
and will be rewarded by a higher resistance to potential crashes. I'm 
posting the complete 2.3 changelog below for your convenience. Please note 
that the 2.2 series will only receive critical bug fixes in the future that 
have not been superseded by changes in the 2.3 series.

If you are interested in commercial support or customisations for the lxml 
package, please contact me directly.

Have fun,


2.3 (2011-02-06)

Features added

* When looking for children, ``lxml.objectify`` takes '{}tag' as
   meaning an empty namespace, as opposed to the parent namespace.
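
As a minimal sketch of the lookup rule above (the namespace URI and element
names are made up for illustration, and lxml is assumed to be installed),
plain attribute access stays in the parent's namespace while '{}tag' asks
for a child without any namespace:

```python
from lxml import objectify

# Hypothetical document: one <item> inherits the default namespace,
# the other one resets it with xmlns="" and so has no namespace at all.
xml = (
    '<root xmlns="http://example.com/ns">'
    '<item>namespaced</item>'
    '<item xmlns="">plain</item>'
    '</root>'
)
root = objectify.fromstring(xml)

# Plain attribute access looks for children in the parent's namespace.
inherited = root.item

# '{}item' explicitly requests the child in the empty namespace.
plain = root["{}item"]

print(inherited.text)
print(plain.text)
```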

Bugs fixed

* When finished reading from a file-like object, the parser
   immediately calls its ``.close()`` method.

* When finished parsing, ``iterparse()`` immediately closes the input
   file.

* Work-around for libxml2 bug that can leave the HTML parser in a
   non-functional state after parsing a severely broken document (fixed
   in libxml2 2.7.8).

* ``marque`` tag in HTML cleanup code is correctly named ``marquee``.

Other changes

* Some public functions in the Cython-level C-API have more explicit
   return types.

2.3beta1 (2010-09-06)

Features added

Bugs fixed

* Crash in newer libxml2 versions when moving elements between
   documents that had attributes on replaced XInclude nodes.

* ``XMLID()`` function was missing the optional ``parser`` and
   ``base_url`` parameters.

* Searching for wildcard tags in ``iterparse()`` was broken in Py3.

* ``lxml.html.open_in_browser()`` didn't work in Python 3 due to the
   use of os.tempnam.  It now takes an optional 'encoding' parameter.

Other changes

2.3alpha2 (2010-07-24)

Features added

Bugs fixed

* Crash in XSLT when generating text-only result documents with a
   stylesheet created in a different thread.

Other changes

* ``repr()`` of Element objects shows the hex ID with leading 0x
   (following ElementTree 1.3).

2.3alpha1 (2010-06-19)

Features added

* Keyword argument ``namespaces`` in ``lxml.cssselect.CSSSelector()``
   to pass a prefix-to-namespace mapping for the selector.

* New function ``lxml.etree.register_namespace(prefix, uri)`` that
   globally registers a namespace prefix for a namespace that newly
   created Elements in that namespace will use automatically.  Follows
   ElementTree 1.3.

* Support 'unicode' string name as encoding parameter in
   ``tostring()``, following ElementTree 1.3.

* Support 'c14n' serialisation method in ``ElementTree.write()`` and
   ``tostring()``, following ElementTree 1.3.

* The ElementPath expression syntax (``el.find*()``) was extended to
   match the upcoming ElementTree 1.3 that will ship in the standard
   library of Python 3.2/2.7.  This includes extended support for
   predicates as well as namespace prefixes (as known from XPath).

* During regular XPath evaluation, various EXSLT functions are
   available within their namespace when using libxslt 1.1.26 or later.

* Support passing a readily configured logger instance into
   ``PyErrorLog``, instead of a logger name.

* On serialisation, the new ``doctype`` parameter can be used to
   override the DOCTYPE (internal subset) of the document.

* New parameter ``output_parent`` to ``XSLTExtension.apply_templates()``
   to append the resulting content directly to an output element.

* ``XSLTExtension.process_children()`` to process the content of the
   XSLT extension element itself.

* ISO-Schematron support based on the de-facto Schematron reference
   'skeleton implementation'.

* XSLT objects now take XPath object as ``__call__`` stylesheet
   parameters.

* Enable path caching in ElementPath (``el.find*()``) to avoid parsing
   overhead.

* Setting the value of a namespaced attribute always uses a prefixed
   namespace instead of the default namespace even if both declare the
   same namespace URI.  This avoids serialisation problems when an
   attribute from a default namespace is set on an element from a
   different namespace.

* XSLT extension elements: support for XSLT context nodes other than
   elements: document root, comments, processing instructions.

* Support for strings (in addition to Elements) in node-sets returned
   by extension functions.

* Forms that lack an ``action`` attribute default to the base URL of
   the document on submit.

* XPath attribute result strings have an ``attrname`` property.

* Namespace URIs get validated against RFC 3986 at the API level
   (required by the XML namespace specification).

* Target parsers show their target object in the ``.target`` property
   (compatible with ElementTree).
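
Several of the features above follow ElementTree 1.3, so they can be tried
out together. The following is only a sketch: the namespace URI, the "ex"
prefix and the element names are made up, and the fallback import to the
standard library's ElementTree is an assumption added so that the snippet
also runs where lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption for portability only: the standard library's ElementTree
    # provides the same three calls used below.
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # hypothetical namespace URI

# Globally register a prefix; newly created Elements in this namespace
# are serialised with it.
etree.register_namespace("ex", NS)

root = etree.Element("{%s}root" % NS)
for item_id, label in (("a", "first"), ("b", "second")):
    item = etree.SubElement(root, "{%s}item" % NS)
    item.set("id", item_id)
    item.text = label

# encoding="unicode" returns a text string instead of encoded bytes.
text = etree.tostring(root, encoding="unicode")

# Extended ElementPath syntax: namespace prefix plus attribute predicate.
match = root.find("ex:item[@id='b']", namespaces={"ex": NS})

print(text)
print(match.text)
```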

Bugs fixed

* API is hardened against invalid proxy instances to prevent crashes
   due to incorrectly instantiated Element instances.

* Prevent crash when instantiating ``CommentBase`` and friends.

* Export ElementTree compatible XML parser class as
   ``XMLTreeBuilder``, as it is called in ET 1.2.

* ObjectifiedDataElements in lxml.objectify were not hashable.  They
   now use the hash value of the underlying Python value (string,
   number, etc.) to which they compare equal.

* Parsing broken fragments in lxml.html could fail if the fragment
   contained an orphaned closing '</div>' tag.

* Using XSLT extension elements around the root of the output document
   crashed.

* ``lxml.cssselect`` did not distinguish between ``x[attr="val"]`` and
   ``x [attr="val"]`` (with a space).  The latter now matches the
   attribute independent of the element.

* Rewriting multiple links inside of HTML text content could end up
   replacing unrelated content as replacements could impact the
   reported position of subsequent matches.  Modifications are now
   simplified by letting the ``iterlinks()`` generator in ``lxml.html``
   return links in reversed order if they appear inside the same text
   node.  Thus, replacements and link-internal modifications no longer
   change the position of links reported afterwards.

* The ``.value`` attribute of ``textarea`` elements in lxml.html did
   not represent the complete raw value (including child tags etc.). It
   now serialises the complete content on read and replaces the
   complete content by a string on write.

* Target parser didn't call ``.close()`` on the target object if
   parsing failed.  Now it is guaranteed that ``.close()`` will be
   called after parsing, regardless of the outcome.
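
The ``.close()`` guarantee for target parsers can be checked with a small
experiment. This is a sketch only: ``CollectingTarget`` is a hypothetical
class written for this example, the input is deliberately broken, and lxml
is assumed to be installed.

```python
from lxml import etree


class CollectingTarget(object):
    """Hypothetical parser target that records start tags."""

    def __init__(self):
        self.tags = []
        self.closed = False

    def start(self, tag, attrib):
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # Called by the parser when parsing ends, successfully or not.
        self.closed = True
        return self.tags


target = CollectingTarget()
parser = etree.XMLParser(target=target)

# The closing tag does not match, so parsing fails.
try:
    etree.fromstring("<root><a></root>", parser)
except etree.XMLSyntaxError:
    failed = True
else:
    failed = False

print(failed, target.closed)
```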

Other changes

* Official support for Python 3.1.2 and later.

* Static MS Windows builds can now download their dependencies
   themselves.

* ``Element.attrib`` no longer uses a cyclic reference back to its
   Element object.  It therefore no longer requires the garbage
   collector to clean up.

* Static builds include libiconv, in addition to libxml2 and libxslt.
