<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/">
  <channel rdf:about="http://blog.gmane.org/gmane.comp.python.lxml.devel">
    <title>gmane.comp.python.lxml.devel</title>
    <link>http://blog.gmane.org/gmane.comp.python.lxml.devel</link>
    <description/>
    <syn:updatePeriod>hourly</syn:updatePeriod>
    <syn:updateFrequency>1</syn:updateFrequency>
    <syn:updateBase>1901-01-01T00:00+00:00</syn:updateBase>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6411"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6410"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6409"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6406"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6405"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6404"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6401"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6398"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6396"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6392"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6391"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6390"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6387"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6386"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6385"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6381"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6380"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6377"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6371"/>
        <rdf:li rdf:resource="http://comments.gmane.org/gmane.comp.python.lxml.devel/6369"/>
      </rdf:Seq>
    </items>
    <image rdf:resource="http://gmane.org/img/gmane-25t.png"/>
    <textinput rdf:resource=""/>
  </channel>
  <image rdf:about="http://gmane.org/img/gmane-25t.png">
    <title>Gmane</title>
    <url>http://gmane.org/img/gmane-25t.png</url>
    <link>http://gmane.org</link>
  </image>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6411">
    <title>xml install problems</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6411</link>
    <description>&lt;pre&gt;Hi,
I am trying to install lxml on CentOS 6.2 with the command:
pip-python install lxml==2.3

it gives me this error:

Downloading/unpacking lxml==2.3
  Running setup.py egg_info for package lxml
    Building lxml version 2.3.
    Building with Cython 0.14.1.
    Using build configuration of libxslt 1.1.26
    Building against libxml2/libxslt in the following directory: /usr/lib64
Installing collected packages: lxml
  Running setup.py install for lxml
    Building lxml version 2.3.
    Building with Cython 0.14.1.
    Using build configuration of libxslt 1.1.26
    Building against libxml2/libxslt in the following directory: /usr/lib64
    building 'lxml.etree' extension
    gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv
-DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/libxml2
-I/usr/include/python2.6 -c src/lxml/lxml.etree.c -o
build/temp.linux-x86_64-2.6/src/lxml/lxml.etree.o -w
    {standard input}: Assembler messages:
    {standard input}:150508: Warning: end of file not at end of a line;
newline inserted
    gcc: Internal error: Killed (program cc1)
    Please submit a full bug report.
    See &amp;lt;http://bugzilla.redhat.com/bugzilla&amp;gt; for instructions.
    error: command 'gcc' failed with exit status 1
    Complete output from command /usr/bin/python -c "import
setuptools;__file__='/root/build/lxml/setup.py';execfile(__file__)" install
--single-version-externally-managed --record
/tmp/pip-g_IvOB-record/install-record.txt:
    Building lxml version 2.3.

Building with Cython 0.14.1.

Using build configuration of libxslt 1.1.26

Building against libxml2/libxslt in the following directory: /usr/lib64

running install

running build

running build_py

running build_ext

building 'lxml.etree' extension

gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
-Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
--param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv
-DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
-fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
-D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/libxml2
-I/usr/include/python2.6 -c src/lxml/lxml.etree.c -o
build/temp.linux-x86_64-2.6/src/lxml/lxml.etree.o -w

{standard input}: Assembler messages:

{standard input}:150508: Warning: end of file not at end of a line; newline
inserted

gcc: Internal error: Killed (program cc1)

Please submit a full bug report.

See &amp;lt;http://bugzilla.redhat.com/bugzilla&amp;gt; for instructions.

error: command 'gcc' failed with exit status 1

----------------------------------------
Command /usr/bin/python -c "import
setuptools;__file__='/root/build/lxml/setup.py';execfile(__file__)" install
--single-version-externally-managed --record
/tmp/pip-g_IvOB-record/install-record.txt failed with error code 1
Storing complete log in /root/.pip/pip.log

I have already installed libxml2-devel and libxslt-devel. Can somebody tell me
what's going on?

Thanks in advance.

Regards

&lt;/pre&gt;</description>
    <dc:creator>William Herry</dc:creator>
    <dc:date>2012-05-25T01:40:46</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6410">
    <title>gpx tostring for xml newbe</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6410</link>
    <description>&lt;pre&gt;I wanted to do some analyzing of Garmin gpx track and waypoint files. So 
I thought of diving into lxml and start with pretty printing a garmin 
file. Garmin writes all info on a single line which is not pretty. When 
split into lines a waypoint looks like:
&amp;lt;wpt lat="52.151006" lon="15.373755"&amp;gt;
&amp;lt;ele&amp;gt;14.742697&amp;lt;/ele&amp;gt;
&amp;lt;time&amp;gt;2012-05-19T20:23:15Z&amp;lt;/time&amp;gt;
&amp;lt;name&amp;gt;Gazon&amp;lt;/name&amp;gt;
&amp;lt;sym&amp;gt;Park&amp;lt;/sym&amp;gt;
&amp;lt;extensions&amp;gt;
&amp;lt;wptx1:WaypointExtension&amp;gt;
&amp;lt;wptx1:Samples&amp;gt;3&amp;lt;/wptx1:Samples&amp;gt;
&amp;lt;/wptx1:WaypointExtension&amp;gt;
&amp;lt;/extensions&amp;gt;
&amp;lt;/wpt&amp;gt;

when I do:

wpt = '{%s}wpt' % xmlns
waypoints = root.findall(wpt)
for point in waypoints:
    print point.get('lon'), point.get('lat')
    print etree.tostring(point, pretty_print=True, with_tail=False)

I get a lot of info from the beginning of the file between "&amp;lt;wpt" and 
"lat=".

&amp;lt;wpt xmlns="http://www.topografix.com/GPX/1/1" 
xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" 
xmlns:wptx1="http://www.garmin.com/xmlschemas/WaypointExtension/v1" 
xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" lat="52.151006" 
lon="15.373755"&amp;gt;
&amp;lt;ele&amp;gt;14.742697&amp;lt;/ele&amp;gt;
&amp;lt;time&amp;gt;2012-05-19T20:23:15Z&amp;lt;/time&amp;gt;
&amp;lt;name&amp;gt;Gazon&amp;lt;/name&amp;gt;
&amp;lt;sym&amp;gt;Park&amp;lt;/sym&amp;gt;
&amp;lt;extensions&amp;gt;
&amp;lt;wptx1:WaypointExtension&amp;gt;
&amp;lt;wptx1:Samples&amp;gt;3&amp;lt;/wptx1:Samples&amp;gt;
&amp;lt;/wptx1:WaypointExtension&amp;gt;
&amp;lt;/extensions&amp;gt;
&amp;lt;/wpt&amp;gt;

Why is that, and how can I avoid it?

Thanks for helping this XML and lxml newbie,
Janwillem

_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml&amp;lt; at &amp;gt;lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml
&lt;/pre&gt;</description>
    <dc:creator>janwillem</dc:creator>
    <dc:date>2012-05-20T12:35:03</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6409">
    <title>Re-announcing SOX: Sequential Output of XML</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6409</link>
    <description>&lt;pre&gt;A correspondent has kindly pointed out that the Creative Commons
BY-NC-ND license does not allow anyone to actually use this
module.  I've changed the license to BY-NC.

John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber

---------- Forwarded message ----------
Date: Thu, 17 May 2012 21:06:05 -0600 (MDT)
From: John W. Shipman &amp;lt;john&amp;lt; at &amp;gt;nmt.edu&amp;gt;
To: lxml-dev &amp;lt;lxml&amp;lt; at &amp;gt;lxml.de&amp;gt;
Cc: John Shipman &amp;lt;john&amp;lt; at &amp;gt;nmt.edu&amp;gt;
Subject: Announcing SOX: Sequential Output of XML

My CGI script that generates a huge XHTML table has been running
into storage limitations because it uses lxml to build the table.

So I built a Python module that applications can use to generate
any XML content as a stream, so you can generate arbitrarily
large XML files with a trivial amount of storage:

     http://www.nmt.edu/tcc/projects/sox/

Features:

- Like Fredrik Lundh's builder.py module, it takes very little
   code to generate a lot of XML.  If you're unfamiliar with this
   little gem, see here:

     http://effbot.org/zone/element-builder.htm

- The module ensures that every start tag has a matching end tag.

Feel free to try this package out (Creative Commons BY-NC-ND
license).  I would greatly appreciate any feedback.

Best regards,
John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
&lt;/pre&gt;</description>
    <dc:creator>John W. Shipman</dc:creator>
    <dc:date>2012-05-18T18:50:45</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6406">
    <title>Announcing SOX: Sequential Output of XML</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6406</link>
    <description>&lt;pre&gt;My CGI script that generates a huge XHTML table has been running
into storage limitations because it uses lxml to build the table.

So I built a Python module that applications can use to generate
any XML content as a stream, so you can generate arbitrarily
large XML files with a trivial amount of storage:

     http://www.nmt.edu/tcc/projects/sox/

Features:

- Like Fredrik Lundh's builder.py module, it takes very little
   code to generate a lot of XML.  If you're unfamiliar with this
   little gem, see here:

     http://effbot.org/zone/element-builder.htm

- The module ensures that every start tag has a matching end tag.

Feel free to try this package out (Creative Commons BY-NC-ND
license).  I would greatly appreciate any feedback.

Best regards,
John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
&lt;/pre&gt;</description>
    <dc:creator>John W. Shipman</dc:creator>
    <dc:date>2012-05-18T03:06:05</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6405">
    <title>lxml.get_include() does not return system libxml paths</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6405</link>
    <description>&lt;pre&gt;Hi Stefan,

we submitted a patch (currently sitting in git trunk) for distributing pxd interface files with lxml (including the static Windows package), to simplify writing Cython code that calls the lxml Cython API.

Unfortunately, we're not quite there yet, because lxml.get_include() still does not return the system include path for libxml (/usr/include/libxml2 or the like), in the cases in which the system ones were used (I think it does work in the case where libxml header files are being shipped within the lxml static package, like on Windows; I can't verify since there is no static Windows package for 2.4 yet).

This means that every user of the lxml Cython API must include a build system that duplicates lxml's own and finds the system libxml include path.

I was trying to get to a point where a lxml Cython API user should only call lxml.get_include() and get a list of pxd/h directories to be used for both cythoning (pxd) and compiling (h) the extension. Do you think it makes sense to get there?

Thanks!
&lt;/pre&gt;</description>
    <dc:creator>Giovanni Bajo</dc:creator>
    <dc:date>2012-05-17T12:59:06</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6404">
    <title>Custom XPath functions and "self" (or ".")</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6404</link>
    <description>&lt;pre&gt;If I create a custom XPath function and use it in my XSL sheet using
`self`, `self::node()`, `.` or `current()` as the nodeset passed in,
for example:

&amp;lt;xsl:if test="custom:my-func(.)"&amp;gt;...

... the list of nodes is empty within the XPath function. If I use the
name of the node referenced by `self` then everything works as
expected and a list of nodes is passed into the function.

I've created a testcase to explain better than I can in an email:
http://pastebin.com/5jkBhRSD

Is this the expected behaviour?

&lt;/pre&gt;</description>
    <dc:creator>Phillip B Oldham</dc:creator>
    <dc:date>2012-05-12T18:21:19</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6401">
    <title>Better error messages from XSL translations?</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6401</link>
    <description>&lt;pre&gt;Is there any way to get better error messages when performing XSL
translations? For instance, a line number or even a specific filename?

&lt;/pre&gt;</description>
    <dc:creator>Phillip B Oldham</dc:creator>
    <dc:date>2012-05-11T18:05:01</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6398">
    <title>XPathFunctionError on second call with user functions</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6398</link>
    <description>&lt;pre&gt;Hi,

When using a custom XPath function (as lxml.cssselect does for 
lower-case) I get a strange XPathFunctionError error when using a 
compiled XPath object more than once.

Here is a minimal test case:


from lxml import etree, html

def _make_lower_case(context, s):
     return s.lower()

ns = etree.FunctionNamespace('http://codespeak.net/lxml/css/')
ns.prefix = 'css'
ns['lower-case'] = _make_lower_case

#doc = html.parse('/home/simon/css3-backgrounds.html')
doc = html.parse('http://www.w3.org/TR/css3-background/')

expr = etree.XPath('//*[css:lower-case(name(.)) = "li"]')
print('first call')
expr(doc)
print('second call')
expr(doc)
print('third call')
expr(doc)


And the output I get with lxml 2.3.4:

first call
second call
Traceback (most recent call last):
   File "xpath_lower.py", line 20, in &amp;lt;module&amp;gt;
     expr(doc)
   File "xpath.pxi", line 466, in lxml.etree.XPath.__call__ 
(src/lxml/lxml.etree.c:119238)
   File "xpath.pxi", line 235, in 
lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:117011)
   File "lxml.etree.pyx", line 280, in 
lxml.etree._ExceptionContext._raise_if_stored (src/lxml/lxml.etree.c:7454)
lxml.etree.XPathFunctionError: XPath function 
'{b'\xd0\x85?\x01'}b'lower-case'' not found


Note that the '\xd0\x85?\x01' string looks random and changes every time 
I run the script.

Is something wrong in my code?


Regards,
&lt;/pre&gt;</description>
    <dc:creator>Simon Sapin</dc:creator>
    <dc:date>2012-05-10T09:59:32</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6396">
    <title>Future of XSL-FO (fwd)</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6396</link>
    <description>&lt;pre&gt;Sorry, sent this one to the old lxml mailing list.

John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber

---------- Forwarded message ----------
Date: Wed, 9 May 2012 11:51:00 -0600 (MDT)
From: John W. Shipman &amp;lt;john&amp;lt; at &amp;gt;nmt.edu&amp;gt;
To: Carlos Araya &amp;lt;carlos.araya&amp;lt; at &amp;gt;gmail.com&amp;gt;
Cc: lxml-dev &amp;lt;lxml-dev&amp;lt; at &amp;gt;codespeak.net&amp;gt;,
     docbook-apps mailing list &amp;lt;docbook-apps&amp;lt; at &amp;gt;lists.oasis-open.org&amp;gt;
Subject: Future of XSL-FO

On Tue, 8 May 2012, Carlos Araya wrote:

+--
| Just curious as to why you think xsl-fo is a dead standard?
+--

A colleague (who does not wish to be quoted) says there's not
much action on the committee anymore.  Although it is still
officially supported, it seems moribund.

I was sort of hoping that the committee was working on adding
critical features not in the current standard that modern book
designers appreciate, e.g., constraints about positioning content
on facing pages, or text floated on both sides of a figure, or a
page model not as primitive as the five-region simple-page-master
model.  Apparently not.

I've heard a few people over the past few years insisting that
the replacement for XSL-FO (within W3C) is to use the "@media
print" feature of the CSS standard.  If I read that correctly,
that means that in order for most ordinary people to get a decent
print rendering of their web page, the following must hold:

   (1) The page author has to write the CSS to define how to
       render it on a fixed page size;

   (2) Their browser supports the "@media print" rule and
       renders it correctly.

Then the user prints it using the browser's print function.

Am I right?  Is this practical?  Do browsers do the right thing?
Will this eventually obviate the need for XSL-FO?  What replaces
the Modular Style Sheets in my DocBook toolchain?

What about the important differences between Web and print
rendering?  Will there be a table of contents?  Will that and
cross-references display page numbers instead of useless,
unclickable underlining?  How does that work in CSS?

One of my pet peeves is Web pages that can't be printed because
they contain program source code with lines way too wide to fit
on the page.  Will CSS render such pages with the lines wrapped
and not truncated?  If they do, will I be able to tell the
hard line breaks from cases where the line got wrapped?

Best regards,
John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
&lt;/pre&gt;</description>
    <dc:creator>John W. Shipman</dc:creator>
    <dc:date>2012-05-09T19:20:58</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6392">
    <title>[lxml-dev] XSL-FO generation tool available</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6392</link>
    <description>&lt;pre&gt;I've just put the finishing touches on a Python module to assist
in the programmatic construction of XSL-FO files.  It's built on
top of lxml:

     http://www.nmt.edu/tcc/projects/fohelpers

If you find this useful or have any comments, I'd like to hear
about it.  I realize XSL-FO is a dead standard, but so long as
DocBook and DITA are around, it'll be supported.  Can anyone
suggest a better route from Python to PDF?

Best regards,
John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
&lt;/pre&gt;</description>
    <dc:creator>John W. Shipman</dc:creator>
    <dc:date>2012-05-08T21:17:43</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6391">
    <title>[lxml-dev] XML too big for memory</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6391</link>
    <description>&lt;pre&gt;I have a CGI application that pulls data from a database and
builds an XHTML table displaying the results.  I'm currently
using lxml.

My problem is that the Web server has limited space, and some
queries generate a table that won't fit in that memory.

Has anyone written something like SAX, only for output?  I
have in mind something that serializes XML to an output file,
writing an element to that file as soon as it is complete.
So the primitives would be something like:

   - Constructor: designate the destination for the output
   - Start an element: writes the start tag, remembers it on
     a stack
   - Emit content
   - End an element: writes the end tag and pops the stack

Surely someone has already written something like this. If
so, please let me know.
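
The primitives listed above could be sketched roughly like this. It is
only an illustration under assumptions: StreamWriter and its method names
are hypothetical, and the standard library's xml.sax.saxutils.XMLGenerator
does the actual tag writing and escaping.

```python
import io
from xml.sax.saxutils import XMLGenerator


class StreamWriter:
    """Hypothetical wrapper: writes each tag as soon as it is known."""

    def __init__(self, out):
        # Constructor: designate the destination for the output.
        self._gen = XMLGenerator(out, encoding="utf-8")
        self._stack = []

    def start(self, tag, **attrs):
        # Start an element: write the start tag, remember it on a stack.
        self._gen.startElement(tag, attrs)
        self._stack.append(tag)

    def text(self, content):
        # Emit content; XMLGenerator escapes markup characters.
        self._gen.characters(content)

    def end(self):
        # End an element: write the end tag and pop the stack.
        self._gen.endElement(self._stack.pop())

    def depth(self):
        # Number of elements that are still open.
        return len(self._stack)


# Usage: a three-row table written straight to the output stream.
out = io.StringIO()
writer = StreamWriter(out)
writer.start("table")
for n in range(3):
    writer.start("tr")
    writer.start("td")
    writer.text(str(n))
    writer.end()
    writer.end()
writer.end()
result = out.getvalue()
```

Only the stack of open tag names is kept in memory, so the size of the
output does not matter. In a CGI script the destination would be the
response stream instead of io.StringIO.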

Best regards,
John Shipman (john&amp;lt; at &amp;gt;nmt.edu), Applications Specialist
New Mexico Tech Computer Center, Speare 146, Socorro, NM 87801
(575) 835-5735, http://www.nmt.edu/~john
   ``Let's go outside and commiserate with nature.''  --Dave Farber
&lt;/pre&gt;</description>
    <dc:creator>John W. Shipman</dc:creator>
    <dc:date>2012-05-08T21:14:31</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6390">
    <title>xslt timeout problem (lxml 2.3.4)</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6390</link>
    <description>&lt;pre&gt;Hello,

I am developing a web application where I use the XSLT component of lxml.
With lxml 2.3.4, a request gets no response until a timeout (infinite loop?).

First, the system info:
--
Python              : sys.version_info(major=2, minor=7, micro=3, 
releaselevel='final', serial=0)
lxml.etree          : (2, 3, 4, 0)
libxml used         : (2, 7, 8)
libxml compiled     : (2, 7, 8)
libxslt used        : (1, 1, 26)
libxslt compiled    : (1, 1, 26)
--

The code
--
...
parser = etree.XMLParser( no_network=False )
xslt_tree = etree.parse( 
StringIO.StringIO(contents.XSLT.generate(template_file)), parser )
transform = etree.XSLT( xslt_tree )
rdf_tree = etree.parse( rdf_file, parser )
...
--

The xslt generated by  
"StringIO.StringIO(contents.XSLT.generate(template_file))" contains url based 
includes as below:
--
&amp;lt;xsl:include href="http://example.org/xslt/xhtml/parts/footer.xsl" /&amp;gt;
--

When I remove all the includes from my xslt the request works well, so my 
guess is that there might be a problem with using url based xsl:include.
Commenting out the "transform = etree.XSLT( xslt_tree )" line also makes the 
request complete (with an error), so I guess it is where something happens.

The code is working fine as it is with lxml 2.3.3.

Is there a problem with this code? (eg, a setting I have to pass to 
etree.XMLParser)

Thanks in advance,
Eric
&lt;/pre&gt;</description>
    <dc:creator>Eric Sagnes</dc:creator>
    <dc:date>2012-05-07T13:03:09</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6387">
    <title>threading + deepcopy(Element) = memory loss</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6387</link>
    <description>&lt;pre&gt;Hello,

for a while we have seen extensive memory usage in our application.
Yesterday a colleague and I were able to find the cause of the memory
loss. A cache is caching etree instances and uses deepcopy() to return a
copy of the etree. The cache is filled and accessed from multiple threads,
hence the need for deepcopy.

The amount of lost memory depends on the number of threads and -- to
some point -- on the number of copy operations in each thread. I've
attached a sample script to reproduce the issue. I've used libxml2's
debug function xmlMemUsed() to verify that the memory isn't lost in
libxml2. I've also tested several thousand deepcopy() ops in one thread
and several thousand etree.parse() ops in several hundred threads. None
have shown a similar memory loss. According to my colleague Dirk Rothe,
there is no visible memory loss on Windows.
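
The access pattern described above can be sketched roughly as follows.
This is only an illustration, not the attached script: TreeCache and
worker are hypothetical names, the lock is an assumption, and the
standard library's xml.etree.ElementTree stands in for lxml.etree, so it
shows the pattern but is not expected to reproduce the memory loss.

```python
import copy
import threading
import xml.etree.ElementTree as ET


class TreeCache:
    """Hypothetical cache: stores parsed trees, hands out deep copies."""

    def __init__(self):
        self._lock = threading.Lock()
        self._trees = {}

    def put(self, key, root):
        with self._lock:
            self._trees[key] = root

    def get(self, key):
        # Each caller gets a private copy so threads never share nodes.
        with self._lock:
            return copy.deepcopy(self._trees[key])


def worker(cache, ops, results):
    # Fetch a copy 'ops' times and record how many children it has.
    for _ in range(ops):
        results.append(len(cache.get("doc")))


# Small stand-in document with ten child elements.
root = ET.Element("doc")
for n in range(10):
    ET.SubElement(root, "item").text = str(n)

cache = TreeCache()
cache.put("doc", root)
results = []
threads = [threading.Thread(target=worker, args=(cache, 100, results))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```

The thread count (5) and copy operations per thread (100) correspond to
the two command-line arguments in the example output below.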

Example output with a 7kb document
----------------------------------
The first number is the RSS of the process in MB after the document has
been parsed. The second number is the RSS after all threads have stopped.

$ python etree_deepcopy.py 50 100
10.875
21.1796875
50 threads, 100 copy ops per thread
$ python etree_deepcopy.py 50 1000
10.875
21.1875
50 threads, 1000 copy ops per thread

$ python etree_deepcopy.py 100 100
10.87109375
30.86328125
100 threads, 100 copy ops per thread
$ python etree_deepcopy.py 100 200
10.875
31.19140625
100 threads, 200 copy ops per thread
$ python etree_deepcopy.py 100 300
10.875
31.2109375
100 threads, 300 copy ops per thread

$ python etree_deepcopy.py 200 100
10.875
40.46484375
200 threads, 100 copy ops per thread


Python:
  2.7.3 (self compiled with UCS-2)
Platform:
  Ubuntu 12.04 AMD64
(2, 3, 4, 0)
(2, 7, 8)
(1, 1, 26)

Christian
&lt;/pre&gt;</description>
    <dc:creator>Christian Heimes</dc:creator>
    <dc:date>2012-05-04T13:00:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6386">
    <title>Problems with top level comments and meta tags</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6386</link>
    <description>&lt;pre&gt;Hi,

while implementing a script that should extract a few things from
HTML pages, I've run into two problems with lxml.html. I'm not sure
if those are actually bugs or if I am doing something wrong, so I
haven't created entries in the bug tracker yet.

Problem 1: Top level comment has no parent

,----
| &amp;gt;&amp;gt;&amp;gt; html = "&amp;lt;!-- comment --&amp;gt;&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;title&amp;gt;foo&amp;lt;/title&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;/html&amp;gt;"
| &amp;gt;&amp;gt;&amp;gt; tree = lxml.html.fromstring(html)
| &amp;gt;&amp;gt;&amp;gt; [tag.drop_tree() for tag in tree.xpath("//comment()")]
| Traceback (most recent call last):
|   File "&amp;lt;stdin&amp;gt;", line 1, in &amp;lt;module&amp;gt;
|   File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 169,
|   in drop_tree
|     assert parent is not None
`----

This method to remove some elements works for any comment or
element, just not if the comment is on top level (which
unfortunately happens on exactly those pages that I'm processing).

Is that behaviour related to the problems mentioned in the FAQ entry
"Why can't I just delete parents or clear the root node in
iterparse()"? I have tried to use tree.remove(tag) as well, but then
I get a ValueError: Element is not a child of this node.

The reason behind removing the comment elements (and others) is that
I would otherwise get them returned by itertext().

Problem 2: lxml.html.clean.Cleaner removes meta content

,----
| &amp;gt;&amp;gt;&amp;gt; html = "&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;meta name=\"keywords\" content=\"foo\"&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;/html&amp;gt;"
| &amp;gt;&amp;gt;&amp;gt; cleaner = lxml.html.clean.Cleaner()
| &amp;gt;&amp;gt;&amp;gt; cleaner.page_structure = False
| &amp;gt;&amp;gt;&amp;gt; cleaner.meta = False
| &amp;gt;&amp;gt;&amp;gt; tree = lxml.html.fromstring(html)
| &amp;gt;&amp;gt;&amp;gt; cleaner(tree)
| &amp;gt;&amp;gt;&amp;gt; lxml.html.tostring(tree)
| '&amp;lt;html&amp;gt;&amp;lt;head&amp;gt;&amp;lt;meta name="keywords"&amp;gt;&amp;lt;/head&amp;gt;&amp;lt;/html&amp;gt;'
`----

To work around the above problem with the comments, I tried to use
the cleaner to get rid of the comments (which works fine), but this
would then kill the content attributes of the meta tags. Is that
intended?

The used version numbers (I've seen that there are slightly newer
versions available, but the changelogs didn't mention anything that
looked like those problems could be affected):

Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree       : (2, 3, 0, 0)
libxml used      : (2, 7, 8)
libxml compiled  : (2, 7, 8)
libxslt used     : (1, 1, 26)
libxslt compiled : (1, 1, 26)



Thanks,
Adalbert

&lt;/pre&gt;</description>
    <dc:creator>Adalbert Michelic</dc:creator>
    <dc:date>2012-05-04T07:37:22</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6385">
    <title>Memory debugging</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6385</link>
    <description>&lt;pre&gt;Hello,

I'd like to suggest a trivial feature for lxml. libxml2 and libxslt
support extended memory debugging when both libraries are compiled with
the '--with-mem-debug' flag [1]. This is useful for debugging how much
memory is actually used by the XML and XSLT code.

In order to show the amount of used memory and memory blocks, the two
functions xmlMemBlocks() [2] and xmlMemUsed() [3] need to be wrapped.
Both functions are available in non-debug builds and return 0 in that case.

It might be worth wrapping the other debug helpers like
xmlMemDisplay(FILE *) [4], too.

Regards,
Christian

Example patch:

diff -ru lxml-2.3.4.orig/src/lxml/lxml.etree.pyx lxml-2.3.4/src/lxml/lxml.etree.pyx
--- lxml-2.3.4.orig/src/lxml/lxml.etree.pyx     2012-03-26 12:28:58.000000000 +0200
+++ lxml-2.3.4/src/lxml/lxml.etree.pyx  2012-05-03 18:33:00.702130662 +0200
@@ -2956,6 +2956,11 @@
     except _TargetParserResult, result_container:
         return result_container.result

+def memoryUsed():
+   return tree.xmlMemUsed()
+
+def memoryBlocks():
+   return tree.xmlMemBlocks()

 ################################################################################
 # Include submodules
diff -ru lxml-2.3.4.orig/src/lxml/tree.pxd lxml-2.3.4/src/lxml/tree.pxd
--- lxml-2.3.4.orig/src/lxml/tree.pxd   2011-09-25 18:58:05.000000000 +0200
+++ lxml-2.3.4/src/lxml/tree.pxd        2012-05-03 17:44:50.323798041 +0200
@@ -322,6 +322,7 @@
 cdef extern from "libxml/xmlmemory.h":
     cdef void* xmlMalloc(size_t size) nogil
     cdef int xmlMemBlocks() nogil
+    cdef int xmlMemUsed() nogil

 cdef extern from "etree_defs.h":
     cdef bint _isElement(xmlNode* node) nogil


[1] http://xmlsoft.org/html/libxml-xmlmemory.html#DEBUG_MEMORY
[2] http://xmlsoft.org/html/libxml-xmlmemory.html#xmlMemBlocks
[3] http://xmlsoft.org/html/libxml-xmlmemory.html#xmlMemUsed
[4] http://xmlsoft.org/html/libxml-xmlmemory.html#xmlMemDisplay
&lt;/pre&gt;</description>
    <dc:creator>Christian Heimes</dc:creator>
    <dc:date>2012-05-03T16:37:46</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6381">
    <title>How to count the number of times some tag occurs in an XML file?</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6381</link>
    <description>&lt;pre&gt;The simplest way is:

counter = 0
for i in tree.iter(tag='sometag'):
    counter += 1

But note that I don't use 'i' here. Is there a cleaner way?
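
One possibly cleaner option, as a sketch: feed the iterator to sum() with a generator expression, so no counter and no named loop variable are needed. The count_tag helper and the small sample tree below are made up for illustration. The import fallback is only there so that the snippet also runs where lxml is not installed, since xml.etree.ElementTree offers the same iter() call.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback for illustration only: the stdlib ElementTree has the same iter() call.
    import xml.etree.ElementTree as etree


def count_tag(tree, tag):
    """Count the elements named `tag` below (and including) `tree`."""
    # iter(tag) yields matching elements lazily; sum() adds 1 for each of them.
    return sum(1 for _ in tree.iter(tag))


# Made-up sample tree: three 'sometag' elements, one of them nested.
root = etree.Element("root")
etree.SubElement(root, "sometag")
other = etree.SubElement(root, "other")
etree.SubElement(other, "sometag")
etree.SubElement(root, "sometag")

print(count_tag(root, "sometag"))
```

With lxml specifically, tree.xpath('count(//sometag)') should give the same number, returned as a float.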

P.
&lt;/pre&gt;</description>
    <dc:creator>Piotr Oh</dc:creator>
    <dc:date>2012-04-27T09:35:13</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6380">
    <title>How to deal with XMLSchema IOError?</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6380</link>
    <description>&lt;pre&gt;Hi

How do I check for errors when loading an XMLSchema file? For example:

schema = etree.XMLSchema(file=polonSchemaFile)

where polonSchemaFile does not exist.

lxml reports:
schema=etree.XMLSchema(file=polonSchemaFile)
  File "xmlschema.pxi", line 105, in lxml.etree.XMLSchema.__init__ (src/lxml\lxml.etree.c:132720)
lxml.etree.XMLSchemaParseError: failed to load external entity "polon-v2/XSD.2.0/plikImp.xsd"

I suppose exception handling does not help here?
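
For what it's worth, the traceback above ends in lxml.etree.XMLSchemaParseError, which is an ordinary Python exception, so a try/except around the constructor should be able to catch it. A minimal sketch, assuming that returning None on failure is acceptable. The load_schema helper and the file name in the usage lines are hypothetical.

```python
import sys

from lxml import etree


def load_schema(path):
    """Return an etree.XMLSchema for `path`, or None if it cannot be loaded."""
    try:
        return etree.XMLSchema(file=path)
    except (etree.XMLSchemaParseError, IOError) as exc:
        # In the traceback above, a missing file shows up as
        # XMLSchemaParseError ("failed to load external entity").
        sys.stderr.write("could not load schema %r: %s\n" % (path, exc))
        return None


# Hypothetical file name that is assumed not to exist.
schema = load_schema("no-such-schema.xsd")
if schema is None:
    print("schema not available, skipping validation")
```

The same except clause should also cover a file that exists but is not a valid XML Schema, since that is reported through the same exception class.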
P.
&lt;/pre&gt;</description>
    <dc:creator>Piotr Oh</dc:creator>
    <dc:date>2012-04-27T07:58:54</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6377">
    <title>compiling a static lxml with Python3 on x86_64 linux</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6377</link>
    <description>&lt;pre&gt;Hello,

I'm in a bit of a peculiar situation.  I'm attempting to compile lxml with
Python3 using statically-linked libraries on x86_64 linux (SLES 10, to be
exact).  I also do not have root access to the machine in question, so I
must compile all the dependencies from source and install them to a
location within my home directory.

I've tried to use the staticlxml recipe (
http://pypi.python.org/pypi/z3c.recipe.staticlxml), but it appears to not
work with buildout 2.0.0a1/Python3.  Using "python3 setup.py build
--static-deps" fails because when the dependencies are built, it does not
automatically include "-fPIC" in the CFLAGS, as is apparently required on
x86_64 linux platforms.

I believe I have successfully compiled all of the dependencies manually
(with -fPIC) and installed them to a location in my home directory.  If I
re-execute the "build --static-deps" command, the script will attempt to
re-download the source executables for the dependencies, which wipes out my
"-fPIC" change.  How do I use the "setup.py" script to statically link
against the libraries in my home directory?

Thanks,
Greg
&lt;/pre&gt;</description>
    <dc:creator>Greg Fischer</dc:creator>
    <dc:date>2012-04-25T22:00:21</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6371">
    <title>lxml running on PyPy</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6371</link>
    <description>&lt;pre&gt;Hi,

here's a little status update regarding lxml on PyPy. I got the basics of
lxml.etree working so far, mostly by patching up Cython and tracking down
bugs in PyPy's cpyext (CPython C-API compatibility) layer. I'm still
getting crashes during error reporting and didn't care much about XPath or
XSLT yet. But given that those do not have much Python interaction per se,
I don't expect major surprises on that front.

The results are very encouraging, given that PyPy lacks support for many of
the tweaks and hacks that are possible in CPython.

Here's a little parser benchmark:

$ python2.7 -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
100 loops, best of 3: 4.61 msec per loop
$ pypy -m timeit -s 'import lxml.etree as et' 'et.parse("hamlet.xml")'
100 loops, best of 3: 5.74 msec per loop

Pretty acceptable. That makes lxml the fastest XML parser that currently
exists for PyPy.

And here's a worst case benchmark for element proxy instantiation and
iteration, likely the most heavily tuned parts of lxml when running in CPython:

$ python2.7 -m timeit -s 'import lxml.etree as et; \
                      t=et.parse("hamlet.xml")'   'list(t.iter())'
100 loops, best of 3: 2.71 msec per loop
$ pypy -m timeit -s 'import lxml.etree as et; \
                      t=et.parse("hamlet.xml")'   'list(t.iter())'
10 loops, best of 3: 28.2 msec per loop

That's about a factor of 10. Sounds huge, but it's actually not bad,
considering the amount of extra work that has to be done for PyPy here.
Certainly doesn't render it unusable, we are still talking milliseconds
after all. And so far, no tuning has gone into it, so it's not
the final word.

I'm pretty optimistic.

BTW, if you're interested in improvements on this front, you can help
get this done faster by using the "donate" button on lxml's project
home page. Any donation will help free up some of my time for this.

Stefan
&lt;/pre&gt;</description>
    <dc:creator>Stefan Behnel</dc:creator>
    <dc:date>2012-04-21T22:46:40</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6369">
    <title>Allocating xml/xslt in shared memory</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6369</link>
    <description>&lt;pre&gt;Hello Stefan,
   Do you think it is possible to allocate XML/XSL in shared memory?
   My WSGI application uses lxml to parse config files (XML) and XSL
stylesheets. Both the configs and the XSL are parsed at application start
and are not modified until the application exits.
   RSS for one process grows by 15 MB right after XML/XSL parsing
(production, 15 modules) or by 35 MB (debug, 25 modules). The number of
modules will only grow over time.
   Currently we have 50 workers, so 15 MB * 50 = 750 MB for XML/XSL
that never changes. I suppose it is not shared across forks because of
Python's incref/decref behaviour, but I may be wrong?

   Do you think it would be possible to use xmlMemSetup (
http://xmlsoft.org/html/libxml-xmlmemory.html#xmlMemSetup ) to set up
libxml2 to use a custom malloc based on mmap?
   How would lxml behave if I did so?
   Are any internal structures of the XSLT object changed when I call
xslt(xml_doc) from Python?

&lt;/pre&gt;</description>
    <dc:creator>Evgeny Turnaev</dc:creator>
    <dc:date>2012-04-19T09:32:43</dc:date>
  </item>
  <item rdf:about="http://comments.gmane.org/gmane.comp.python.lxml.devel/6363">
    <title>cssselect tokenizer and parser</title>
    <link>http://comments.gmane.org/gmane.comp.python.lxml.devel/6363</link>
    <description>&lt;pre&gt;Hi,

I’ve been reviewing the parser in cssselect and fixing a lot of bugs, as 
you can see in recent commits[1].

I’m about to make more significant changes to the tokenizer and the 
parser to fix the underlying problem behind lxml bug #19 [2] and many 
related bugs. In short, the tokenizer omits most whitespace, including 
some that represent a descendant combinator. To compensate, the parser 
just assumes a descendant combinator (sometimes incorrectly) whenever it 
can not find something else that it expected.
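
To illustrate the whitespace problem, here is a minimal sketch. It is not the cssselect tokenizer, and the tokenize function and the token names are made up for this example. It keeps each run of whitespace as an "S" token, so that a descendant combinator stays visible to the parser:

```python
import re


def tokenize(selector):
    """Split a selector into (type, text) pairs, keeping whitespace runs."""
    tokens = []
    # Alternatives, in order: a whitespace run, a name, or any single character.
    for text in re.findall(r"\s+|[-\w]+|.", selector.strip()):
        if text.isspace():
            # Collapse the run, but keep it: it may be a descendant combinator.
            tokens.append(("S", " "))
        elif re.match(r"[-\w]+$", text):
            tokens.append(("IDENT", text))
        else:
            tokens.append(("DELIM", text))
    return tokens


print(tokenize("div .foo"))  # contains an S token between 'div' and '.'
print(tokenize("div.foo"))   # no S token: a single compound selector
```

A parser built on such tokens can emit a descendant combinator only where an S token actually separates two compound selectors, instead of guessing. This sketch ignores escapes, strings, and whitespace around other combinators.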

So I’m going to touch most of this code, but I’m still a bit uneasy with
it. There are many corner cases where it does not match the grammar in
the spec[3]. In particular, the behavior in arguments for functional
pseudo-classes is a bit strange, if not plain wrong.

I would like to write a tokenizer+parser more closely based on the 
grammar. Now, I just happen to have in tinycss[4] a CSS tokenizer that 
already handles all the gory details of backslash-escaping, parsing 
numbers, etc. I actually have two implementations of it: one in pure 
Python for compatibility, and one in Cython for speed. (The tokenizer 
performance is critical when parsing whole stylesheets; perhaps less so 
for comparatively small selectors.) However, tinycss currently only 
supports Python 2.6+.


So, what do you think of this whole situation? Should we re-use the 
tinycss tokenizer? Duplicate its code? Try and fix the buggy one?


[1] https://github.com/SimonSapin/cssselect/commits/master
[2] https://github.com/lxml/lxml/pull/19/files
[3] http://www.w3.org/TR/selectors/#w3cselgrammar
[4] http://packages.python.org/tinycss/


Regards,
&lt;/pre&gt;</description>
    <dc:creator>Simon Sapin</dc:creator>
    <dc:date>2012-04-17T09:43:36</dc:date>
  </item>
  <textinput rdf:about="http://search.gmane.org/?group=$group=gmane.comp.python.lxml.devel">
    <title>Search Engine</title>
    <description>Search the mailing list at Gmane</description>
    <name>query</name>
    <link>http://search.gmane.org/?group=$group=gmane.comp.python.lxml.devel</link>
  </textinput>
</rdf:RDF>

