Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Another way of profiling pyFF's memory usage is just following RES in top or htop for a long-running pyFF process, that has a 60s refresh interval. I normally use this pipeline




Code Block
- when update: 
 - load: 
   - edugain.xml 
- when request: 
 - select: 
 - pipe: 
   - when accept application/samlmetadata+xml application/xml: 
     - first 
     - finalize: 
         cacheDuration: PT12H 
         validUntil: P10D 
     - sign: 
         key: cert/sign.key 
         cert: cert/sign.crt 
     - emit application/samlmetadata+xml 
     - break 
   - when accept application/json: 
     - discojson 
     - emit application/json 
     - break


to feed the edugain feed that has been dowloaded using

...

Code Block
from lxml import etree, objectify
import pickle
# Create pickled datafile
source = open("edugain.xml", "r", encoding="utf-8")
sink = open("edugain.pkl", "w")

t = objectify.parse(source)
p = pickle.dumps(t).decode('latin1')
sink.write(p)

# Read pickled object back in pyFF
def parse_xml
	return pickle.loads(io.encode('latin1'))

In metadata parser:
t = parse_xml(content) #Instead of parse_xml(unicode_stream(content))

Using un/pickling, pyFF starts out using ~800Mb of RES that slowly extends to a steady 1.2-1.5G.

xml.sax etree.ElementTree parser

...

Code Block
import xml.sax
class XML(xml.sax.handler.ContentHandler):
  def __init__(self):
    self.current = etree.Element("root")
    self.nsmap = {}

  def startElement(self, name, attrs):
    attributes = {}
    for key, value in attrs.items():
        key = key.split(':')
        if len(key) == 2 and key[0] == 'xmlns':
            self.nsmap[key[-1]] = value
        else:
            attributes[key[-1]] = value
    name = name.split(':')
    if len(name) == 2:
        name = f"{{{ self.nsmap.get(name[0], name[0]) }}}{ name[-1] }"
    else:
        name = name[-1]
    self.current = etree.SubElement(self.current, name, attributes, nsmap=self.nsmap)

  def endElement(self, name):
    self.current = self.current.getparent()

  def characters(self, data):
    d = data.strip()
    if d:
      self.current.text = d

def parse_xml(io, base_url=None):
    parser = xml.sax.make_parser()
    handler = XML()
    parser.setContentHandler(handler)
    parser.parse(io)
    return etree.ElementTree(handler.current[0])

Using xml.sax parser pyFF starts out using ~800Mb of RES that slowly extends to a steady 1.2-1.5G.