[lxml-dev] UTF-8 not supported

Sergio Monteiro Basto sergio at sergiomb.no-ip.org
Mon May 25 11:11:45 CDT 2009


Hi, 
I usually set the encoding on the parser like this:

hparser = etree.HTMLParser(encoding='utf-8', remove_comments=True)
etree_document = etree.HTML(f, parser=hparser)
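
A minimal sketch of the same idea end to end, assuming Python 3 and a reasonably recent lxml (the sample markup and the variable names html_bytes, with_decl and without_decl are made up for illustration). As far as I know, tostring() leaves out the XML declaration by default when the output encoding is UTF-8 or ASCII, which may be why it does not show up in the file; passing xml_declaration=True asks for it explicitly.

```python
from lxml import etree

# Hypothetical sample input: UTF-8 encoded bytes without any charset hint.
html_bytes = '<html><body><p>ыъъ</p><!-- note --></body></html>'.encode('utf-8')

# Tell the parser which encoding the input uses and drop comments.
hparser = etree.HTMLParser(encoding='utf-8', remove_comments=True)
etree_document = etree.HTML(html_bytes, parser=hparser)

# The text is decoded correctly because the parser knew the encoding.
text = etree_document.findtext('.//p')

# Default serialization to UTF-8: no XML declaration is written.
without_decl = etree.tostring(etree_document, encoding='utf-8')

# Explicitly request the declaration.
with_decl = etree.tostring(etree_document, encoding='utf-8',
                           xml_declaration=True)

print(text)
print(with_decl.decode('utf-8'))
```

Both results are byte strings, so they should be written to a file opened in binary mode ('wb').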


On Mon, 2009-05-25 at 20:01 +0400, Alexander Shigin wrote:
> On Mon, 25/05/2009 at 10:54 -0500, Ovnicraft wrote:
> > Hi folks, when I do this with encoding='iso-8859-1', the XML encoding
> > is written correctly, but when I try the same thing with 'UTF-8', it
> > does not appear in my file. I have version 2.2.
> > How can I encode my file with UTF-8?
> 
> Can you give your code snippet? UTF-8 works fine for me.
> 
> Here is an example:
> In [6]: print etree.tostring(etree.Element(unicode('ыъъ', 'utf-8')), encoding='utf-8')
> <ыъъ/>
> 
> If you can't type Cyrillic letters:
> In [9]: etree.tostring(etree.Element(u'\u044b\u044a\u044a'), encoding='utf-8')
> Out[9]: '<\xd1\x8b\xd1\x8a\xd1\x8a/>'
> 
-- 
Sérgio M. B.