...

Code Block
import xml.sax

from lxml import etree  # getparent() and nsmap= used below are lxml-specific

class XML(xml.sax.handler.ContentHandler):
  def __init__(self):
    self.current = etree.Element("root")
    self.nsmap = { 'xml': 'http://www.w3.org/XML/1998/namespace'}
    self.buffer = ''

  def startElement(self, name, attrs):
    attributes = {}
    for key, value in attrs.items():
        key = key.split(':')
        if len(key) == 2:
            if key[0] == 'xmlns':
                self.nsmap[key[-1]] = value
            else:
                attributes[f"{{{ self.nsmap.get(key[0], key[0]) }}}{ key[-1] }"] = value
        else:
            attributes[key[-1]] = value

    name = name.split(':')
    if len(name) == 2:
        name = f"{{{ self.nsmap.get(name[0], name[0]) }}}{ name[-1] }"
    else:
        name = name[-1]
    self.current = etree.SubElement(self.current, name, attributes, nsmap=self.nsmap)

  def endElement(self, name):
    self.current.text = self.buffer
    self.current.tail = "\n"
    self.current = self.current.getparent()
    self.buffer = ''

  def characters(self, data):
    d = data.strip()
    if d:
        self.buffer += d


def parse_xml(io, base_url=None):
    parser = xml.sax.make_parser()
    handler = XML()
    parser.setContentHandler(handler)
    parser.parse(io)
    return etree.ElementTree(handler.current[0])

...

Code Block
        def process_handler():
            ...

            # Only return request if md is valid?
            valid = True
            log.debug(f"Resource walk")
            for child in request.registry.md.rm.walk():
                log.debug(f"Resource {child.url}")
                valid = valid and child.is_valid()

            if len(request.registry.md.rm) == 0 or not valid:
                log.debug(f"Resource not valid")
                # 500: The server has either erred or is incapable of performing the requested operation.
                raise exc.exception_response(500)
            else:
                log.debug(f"Resource valid")

            return response

Performance-test branch

We incorporated the "store.py" changes in this branch, https://github.com/IdentityPython/pyFF/compare/preformance-tests, to see how they would affect pyFF's memory consumption, but they made little difference. The process ends up using ~1.8 GB of resident memory (RES) after several hours of refreshing the eduGAIN metadata feed every 60 seconds.
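
One way to log a comparable figure from inside the process is sketched below. It is an assumption, not how the number above was obtained: it reports the peak resident set size via the standard `resource` module (Unix only), and the helper name `peak_rss_mib` is hypothetical.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Could be logged after each refresh cycle to watch growth over time.
current_peak = peak_rss_mib()
```

Note that this is a high-water mark, so it shows growth but never shows memory being returned to the system.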

The changes store each entity as its serialized (tostring) form and re-parse it on demand. The idea is that we don't need to keep the whole parsed tree in memory, only the serialized entities.
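
A minimal sketch of the serialize-and-re-parse idea follows. It is not the code from the branch: the class `SerializedEntityStore` and its methods are hypothetical, and it uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


class SerializedEntityStore:
    """Hypothetical store that keeps serialized bytes instead of parsed trees."""

    def __init__(self):
        self._entities = {}  # entityID -> serialized XML bytes

    def update(self, entity):
        # Serialize once; the parsed element is not retained by the store.
        self._entities[entity.get("entityID")] = ET.tostring(entity)

    def lookup(self, entity_id):
        # Re-parse on demand; unknown IDs yield None.
        data = self._entities.get(entity_id)
        return ET.fromstring(data) if data is not None else None

    def __len__(self):
        return len(self._entities)


entity = ET.Element(f"{{{MD_NS}}}EntityDescriptor", entityID="https://idp.example.org")
store = SerializedEntityStore()
store.update(entity)
restored = store.lookup("https://idp.example.org")
```

The trade-off is CPU for memory: every lookup pays for a parse, so this only helps if the parsed trees were what dominated memory use.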

Parked

https://tech.buzzfeed.com/finding-and-fixing-memory-leaks-in-python-413ce4266e7d

Size limitations

We plan to create a controlled mock metadata set containing multiples of the eduGAIN metadata (e.g. 5k, 10k, 20k and 100k entities) to see how pyFF copes with that number of entities and that volume of metadata.
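
As a rough illustration of how such sets could be generated, the sketch below builds an `EntitiesDescriptor` with a chosen number of synthetic entities. The function name `make_mock_metadata` and the `example.org` entity IDs are hypothetical, and the entities are bare placeholders rather than copies of real eduGAIN entries, so they are not schema-valid SAML metadata.

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def make_mock_metadata(n_entities):
    """Return serialized metadata with n_entities placeholder entities."""
    ET.register_namespace("md", MD_NS)
    root = ET.Element(f"{{{MD_NS}}}EntitiesDescriptor")
    for i in range(n_entities):
        # Each entity only carries a unique entityID (hypothetical URL scheme).
        ET.SubElement(
            root,
            f"{{{MD_NS}}}EntityDescriptor",
            entityID=f"https://mock-{i}.example.org/entity",
        )
    return ET.tostring(root, encoding="utf-8")


# Sizes taken from the plan above; maps entity count to serialized size in bytes.
sizes = {n: len(make_mock_metadata(n)) for n in (5_000, 10_000, 20_000, 100_000)}
```

For a realistic test the placeholder entities would need role descriptors, certificates and similar content, since entity size matters as much as entity count.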