This page will contain the activity log of the pyFF+ experiments and endeavours.

Memory profiling

This is the minimal import and usage of heapy to print heap information while running Python code.

...

Code Block
import xml.sax

from lxml import etree  # lxml is needed: the handler relies on getparent() and nsmap

class XML(xml.sax.handler.ContentHandler):
  def __init__(self):
    self.current = etree.Element("root")
    self.nsmap = { 'xml': 'http://www.w3.org/XML/1998/namespace'}
    self.buffer = ''

  def startElement(self, name, attrs):
    attributes = {}
    for key, value in attrs.items():
        key = key.split(':')
        if len(key) == 2:  # prefixed attribute, e.g. "xmlns:md" or "xml:lang"
            if key[0] == 'xmlns':
                self.nsmap[key[-1]] = value
            else:
                attributes[f"{{{ self.nsmap.get(key[0], key[0]) }}}{ key[-1] }"] = value
        elif value:
            attributes[key[-1]] = value

    name = name.split(':')
    if len(name) == 2:
        name = f"{{{ self.nsmap.get(name[0], name[0]) }}}{ name[-1] }"
    else:
        name = name[-1]
    self.current = etree.SubElement(self.current, name, attributes, nsmap=self.nsmap)

  def endElement(self, name):
    self.current.text = self.buffer
    self.current.tail = "\n"
    self.current = self.current.getparent()
    self.buffer = ''

  def characters(self, data):
    d = data.strip()
    if d:
        self.buffer += d  # collected here, assigned to .text in endElement


def parse_xml(io, base_url=None):
    parser = xml.sax.make_parser()
    handler = XML()
    parser.setContentHandler(handler)
    parser.parse(io)
    return etree.ElementTree(handler.current[0])

...

Code Block
        def process_handler():
            ...

            # Only return request if md is valid?
            valid = True
            log.debug(f"Resource walk")
            for child in request.registry.md.rm.walk():
                log.debug(f"Resource {child.url}")
                valid = valid and child.is_valid()

            if len(request.registry.md.rm) == 0 or not valid:
                log.debug(f"Resource not valid")
                # 500: The server has either erred or is incapable of performing the requested operation.
                raise exc.exception_response(500)
            else:
                log.debug(f"Resource valid")

            return response

Performance-test branch

The "store.py" changes were incorporated in this branch (https://github.com/IdentityPython/pyFF/compare/preformance-tests) to see how they would affect the memory consumption of pyFF, but they didn't change much. pyFF still ends up using ~1.8 GB of resident memory (RES) after several hours of refreshing the eduGAIN metadata feed every 60 seconds.

The changes store each entity as its serialized (tostring) metadata and re-parse it on demand. The idea is that we don't need to keep the whole parsed tree in memory, only the serialized entities.
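
The serialize-and-re-parse idea can be sketched roughly as below. This is not the actual "store.py" code from the branch: the class SerializedEntityStore and its methods are hypothetical names, and the sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

# SAML 2.0 metadata namespace
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


class SerializedEntityStore:
    """Hypothetical store: keeps entities as serialized bytes, not parsed trees."""

    def __init__(self):
        self._entities = {}  # entityID -> serialized XML (bytes)

    def update(self, entity):
        # Serialize once; the store holds no reference to the parsed element.
        self._entities[entity.get("entityID")] = ET.tostring(entity)

    def lookup(self, entity_id):
        # Re-parse on demand; the caller gets a fresh, short-lived element.
        data = self._entities.get(entity_id)
        return ET.fromstring(data) if data is not None else None

    def __len__(self):
        return len(self._entities)


store = SerializedEntityStore()
entity = ET.Element(
    f"{{{MD_NS}}}EntityDescriptor",
    {"entityID": "https://idp.example.org/idp"},  # made-up entityID
)
store.update(entity)
restored = store.lookup("https://idp.example.org/idp")
print(restored.tag, restored.get("entityID"))
```

The trade-off in this design is extra parsing work on every lookup in exchange for not holding parsed trees between requests.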

Parked

https://tech.buzzfeed.com/finding-and-fixing-memory-leaks-in-python-413ce4266e7d

Size limitations

We plan to create controlled mock metadata sets built from multiples of the eduGAIN metadata (e.g. 5k, 10k, 20k and 100k entities) to see how pyFF copes with that many entities and that much metadata.
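
A minimal sketch of how such mock sets could be generated, assuming stub entities are enough for a first size test. The function build_mock_aggregate and the entityID pattern are hypothetical, and the generated EntityDescriptor elements are empty, so they are not schema-valid SAML metadata. A realistic set would instead copy full eduGAIN entities.

```python
import xml.etree.ElementTree as ET

# SAML 2.0 metadata namespace
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ET.register_namespace("md", MD_NS)


def build_mock_aggregate(count):
    """Return an EntitiesDescriptor holding `count` stub EntityDescriptor children."""
    root = ET.Element(f"{{{MD_NS}}}EntitiesDescriptor", {"Name": f"mock-{count}"})
    for i in range(count):
        # Made-up entityID pattern, unique per entity.
        ET.SubElement(
            root,
            f"{{{MD_NS}}}EntityDescriptor",
            {"entityID": f"https://mock-{i}.example.org/entity"},
        )
    return root


for size in (5_000, 10_000, 20_000, 100_000):
    aggregate = build_mock_aggregate(size)
    serialized = ET.tostring(aggregate, encoding="utf-8")
    print(f"{size} entities: {len(serialized)} bytes")
```

Each serialized aggregate could then be written to a file and served to pyFF as a metadata source.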