Code Block
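import xml.sax

from lxml import etree

XML_NS = "http://www.w3.org/XML/1998/namespace"


class XML(xml.sax.handler.ContentHandler):
    """Build an lxml tree from SAX events, resolving namespace prefixes by hand."""

    def __init__(self):
        super().__init__()
        # Synthetic root; the document element becomes its first child.
        self.current = etree.Element("root")
        # Stack of in-scope prefix -> URI maps (key None = default namespace).
        self.scopes = [{'xml': XML_NS}]
        self.buffer = ''

    def _clark(self, qname, scope, is_attr):
        # "prefix:local" -> "{uri}local"; unprefixed attributes have no namespace.
        prefix, _, local = qname.rpartition(':')
        if prefix:
            uri = scope.get(prefix, prefix)
        else:
            uri = None if is_attr else scope.get(None)
        return f"{{{uri}}}{local}" if uri else local

    def startElement(self, name, attrs):
        # Text before a child element is dropped (mixed content is not supported).
        self.buffer = ''
        # Collect this element's xmlns declarations first so they apply to it.
        declared = {}
        for key, value in attrs.items():
            if key == 'xmlns':
                declared[None] = value
            elif key.startswith('xmlns:'):
                declared[key[len('xmlns:'):]] = value
        scope = {**self.scopes[-1], **declared}
        self.scopes.append(scope)

        attributes = {
            self._clark(key, scope, is_attr=True): value
            for key, value in attrs.items()
            if key != 'xmlns' and not key.startswith('xmlns:')
        }
        tag = self._clark(name, scope, is_attr=False)
        # Empty URIs only undeclare a namespace, so they are not passed to lxml.
        nsmap = {p: u for p, u in declared.items() if u}
        self.current = etree.SubElement(self.current, tag, attributes, nsmap=nsmap)

    def endElement(self, name):
        text = self.buffer.strip()
        if text:
            self.current.text = text
        self.current.tail = "\n"
        self.current = self.current.getparent()
        self.scopes.pop()
        self.buffer = ''

    def characters(self, data):
        # SAX may split text across several calls; strip only once it is complete.
        self.buffer += data


def parse_xml(io, base_url=None):
    # base_url is accepted for compatibility but not used by this parser.
    parser = xml.sax.make_parser()
    handler = XML()
    parser.setContentHandler(handler)
    parser.parse(io)
    return etree.ElementTree(handler.current[0])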


Code Block
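import logging

import pyramid.httpexceptions as exc

log = logging.getLogger(__name__)  # stand-in for pyFF's own logger


def process_handler():
    # ... earlier part of the handler (which builds `response`) is not shown

    # Only return the response if the metadata is valid
    valid = True
    log.debug("Resource walk")
    for child in ...:  # the iterable is elided in these notes
        log.debug(f"Resource {child.url}")
        valid = valid and child.is_valid()

    if len(...) == 0 or not valid:  # the len() argument is elided in these notes
        log.debug("Resource not valid")
        # 500: The server has either erred or is incapable of performing the requested operation.
        raise exc.exception_response(500)
    log.debug("Resource valid")

    return response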
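Setting the elided parts aside, the check in process_handler boils down to one rule: answer only if at least one resource was found and every resource reports itself valid. Below is a standalone sketch of that rule, assuming the collection passed to len() is the same one that is walked. Resource and all_resources_valid are hypothetical names; only url and is_valid() are taken from the snippet.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical stand-in exposing only what the snippet uses."""
    url: str
    valid: bool

    def is_valid(self):
        return self.valid


def all_resources_valid(resources):
    """True only if there is at least one resource and all of them are valid."""
    # all() short-circuits at the first invalid resource.
    return len(resources) > 0 and all(r.is_valid() for r in resources)
```

As with the handler's `valid = valid and child.is_valid()`, is_valid() is no longer called once one resource is invalid; the handler's loop, however, keeps iterating and logging the remaining URLs.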

Performance-test branch

We incorporated the "" changes into this branch to see how they would affect pyFF's memory consumption, but they made little difference: after several hours of refreshing the eduGAIN metadata feed every 60 seconds, pyFF still uses about 1.8 GB of resident memory (RES).
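A RES figure like this can be tracked over time with a small sampler. This is a sketch, not how the figure above was measured (that is not stated here); it assumes Linux, where the VmRSS line of /proc/<pid>/status reports a process's resident set size, the value top shows as RES. The function names are hypothetical.

```python
import time


def parse_vmrss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t 1843200 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found (kernel thread or non-Linux system?)")


def rss_mib(pid="self"):
    """Current resident set size of a process in MiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read()) / 1024


def sample_rss(pid, interval_s=60, samples=10):
    """Yield (elapsed_seconds, rss_mib) pairs, one every interval_s seconds."""
    start = time.monotonic()
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        yield time.monotonic() - start, rss_mib(pid)
```

For example, iterating over sample_rss(pid) with the pid of the running pyFF process and printing each pair gives a simple memory-over-time log to run alongside the refresh loop.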

The changes store each entity as its serialized (tostring) metadata and re-parse it on demand. The idea is that pyFF then only needs to keep the serialized entities in memory rather than the whole parsed tree.
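The idea can be sketched as follows. This is a minimal illustration, not the branch's code: SerializedEntityStore and its methods are hypothetical names, and it uses the standard library's xml.etree.ElementTree so it runs without lxml (lxml's etree.tostring and etree.fromstring play the same roles).

```python
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


class SerializedEntityStore:
    """Hypothetical store that keeps entities as bytes and parses on access."""

    def __init__(self):
        self._entities = {}  # entityID -> serialized EntityDescriptor (bytes)

    def add_from_tree(self, root):
        # Serialize each EntityDescriptor; the caller can then drop the parsed tree.
        for ed in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
            self._entities[ed.get("entityID")] = ET.tostring(ed)

    def get(self, entity_id):
        # Re-parse on demand: each call returns a fresh, independent element.
        data = self._entities.get(entity_id)
        return None if data is None else ET.fromstring(data)

    def __len__(self):
        return len(self._entities)
```

The saving is paid for in CPU time: every get() re-parses the entity, so frequent lookups of the same entity cost a parse each time.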


Size limitations

We plan to create a controlled mock metadata set that replicates the eduGAIN metadata to several sizes (e.g. 5k, 10k, 20k and 100k entities) to see how pyFF copes with that many entities.
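One way to build such sets is to replicate the entities of a real eduGAIN download until the target count is reached. The sketch below assumes a hypothetical make_mock_metadata helper that gives each copy a unique entityID by appending a -mock-<n> suffix (an arbitrary choice); the generated EntitiesDescriptor is not signed.

```python
import copy
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ET.register_namespace("md", MD_NS)  # serialize with the usual md: prefix


def make_mock_metadata(source_root, n_entities):
    """Return an EntitiesDescriptor holding n_entities copies of the source entities."""
    templates = source_root.findall(f"{{{MD_NS}}}EntityDescriptor")
    if not templates:
        raise ValueError("source metadata contains no EntityDescriptor")
    mock = ET.Element(f"{{{MD_NS}}}EntitiesDescriptor")
    for i in range(n_entities):
        # Deep-copy so the templates themselves are never modified.
        ed = copy.deepcopy(templates[i % len(templates)])
        ed.set("entityID", f"{ed.get('entityID')}-mock-{i}")
        mock.append(ed)
    return mock
```

Each size can then be written out with ET.ElementTree(mock).write(path, encoding="utf-8", xml_declaration=True) and fed to pyFF as a local metadata source.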