...

Panel

title	GN4-3 project team

Name	Organisation	Role
Mihaly	KIFU	PI, Team Member
Michael	LRZ	Scrum Master
Martin	SURF	Team Member
Halil	GRNET	Team Member

Panel

title	Stakeholders

Name	Organisation	Role
Christos	GÉANT	eduTEAMS
Leif	SUNET	pyID/pyFF community

Activity overview

Panel

title	Description

pyFF is widely used in our community to provide Discovery and Metadata Query services. This activity covers operational optimizations of pyFF.

When processing the eduGAIN metadata, pyFF memory usage balloons into the gigabytes, which incurs extra cost when running on procured VMs such as those on AWS. The startup/restart speed, and the service behavior during startup/restart, may also be improved. In particular, the service should never return 5xx errors during a normal startup/shutdown process.

The goal of this project is to optimize pyFF memory consumption and (re-)start behavior.

For the memory consumption, the underlying XML processing library may be swapped, or the memory-intensive part of the processing may be done on a short-lived cheap VM and the resulting in-memory representation serialized, transferred to the production instances and de-serialized.
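
The second option can be illustrated with a minimal sketch. It is not pyFF's actual code or storage format: the function names, the choice of an entityID-to-XML dictionary as the "in-memory representation", and the use of pickle are all assumptions made for illustration, and the standard library parser stands in for whatever XML library pyFF uses. The heavy step (parsing the aggregate) would run on the short-lived VM, and only the load step would run on the production instances.

```python
import io
import pickle
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"  # SAML 2.0 metadata namespace

# Tiny illustrative aggregate; real eduGAIN metadata is far larger.
SAMPLE = b"""<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">
  <md:EntityDescriptor entityID="https://idp.example.org/idp"/>
  <md:EntityDescriptor entityID="https://sp.example.org/sp"/>
</md:EntitiesDescriptor>"""


def build_index(xml_bytes):
    """Heavy step (hypothetical): parse the aggregate once and map
    entityID -> serialized entity XML."""
    index = {}
    root = ET.fromstring(xml_bytes)
    for entity in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
        index[entity.get("entityID")] = ET.tostring(entity)
    return index


def dump_index(index, fileobj):
    """Serialize the index so it can be shipped to production instances."""
    pickle.dump(index, fileobj, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(fileobj):
    """Cheap step: restore the index without re-parsing the aggregate.
    pickle must only be used on data from a trusted source."""
    return pickle.load(fileobj)


# Round trip through an in-memory buffer (a file or object store in practice).
buffer = io.BytesIO()
dump_index(build_index(SAMPLE), buffer)
buffer.seek(0)
restored = load_index(buffer)
```

Whether this saves memory in practice depends on how compact the serialized form is compared with the parsed tree, which is one of the things the investigation would have to measure.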

For the (re-)start behavior, it must be established how pyFF should be configured so that it does not accept queries while its internal database is still incomplete.
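
One common pattern for this is a readiness gate: the instance reports "not ready" on a health-check endpoint until the initial load has finished, and a load balancer only routes queries to instances that report ready. The sketch below shows the idea only. The class and function names are hypothetical, it assumes a load balancer or orchestrator sits in front of the service, and it does not describe an existing pyFF configuration option.

```python
import threading


class ReadinessGate:
    """Tracks whether the metadata store has finished its initial load."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def probe_status(self):
        """HTTP status for a health-check endpoint polled by a load balancer.
        The 503 is seen only by the health checker, not by clients, as long
        as traffic is routed only to instances that answer 200."""
        return 200 if self._ready.is_set() else 503


def load_metadata(store, gate, entities):
    """Hypothetical loader: fill the store completely, then open the gate."""
    for entity_id, xml in entities:
        store[entity_id] = xml
    gate.mark_ready()


gate = ReadinessGate()
store = {}
status_before = gate.probe_status()  # load has not started yet

loader = threading.Thread(
    target=load_metadata,
    args=(store, gate, [("https://idp.example.org/idp", "<EntityDescriptor/>")]),
)
loader.start()
loader.join()
status_after = gate.probe_status()
```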

Panel

title	Activity goals

The goal of the activity is to reduce the memory consumption of the existing pyFF so that metadata can be processed on machines with few resources.

Activity Details

Panel

title	Technical details

The SAML metadata appliance pyFF (https://pyff.io/) is widely used in the GÉANT community. pyFF - short for Python Federation Feeder - is a simple, yet complete SAML metadata aggregator.

The source code is available on GitHub: https://github.com/IdentityPython/pyFF

Although the tool itself is fairly small and most tasks can be performed with few resources, processing SAML metadata requires a lot of memory. For this reason, the memory consumption of pyFF shall be investigated. It may be possible to improve the XML processing so that consumption is reduced. In the best case, pyFF can then run on much smaller servers than before. Among other things, this would make it easier to use external servers, as it could drastically reduce costs.
Furthermore, it is to be examined whether the application can be modularized, so that different parts of the application are encapsulated. This would make it possible to run resource-intensive tasks on different servers. Outsourcing the metadata processing to a low-cost serverless architecture is conceivable. This approach can be pursued with or without the reduction of memory consumption described above.
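
As a starting point for the investigation, peak memory can be compared between two ways of reading the same aggregate: building the whole tree at once versus streaming it and discarding each entity after it is handled. The sketch below is a hedged illustration using only the Python standard library. The helper names and the synthetic sample data are assumptions, and the standard library parser stands in for the XML library pyFF actually uses. Note that tracemalloc only sees allocations made through Python's allocator, so memory held by C extension libraries may need a process-level measure instead.

```python
import io
import tracemalloc
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ENTITY_TAG = f"{{{MD_NS}}}EntityDescriptor"


def make_sample(n):
    """Build a synthetic aggregate with n entities (illustrative data only)."""
    parts = [f'<md:EntitiesDescriptor xmlns:md="{MD_NS}">']
    for i in range(n):
        parts.append(
            f'<md:EntityDescriptor entityID="https://idp{i}.example.org">'
            f"<md:Organization><md:OrganizationName>Org {i}</md:OrganizationName>"
            f"</md:Organization></md:EntityDescriptor>"
        )
    parts.append("</md:EntitiesDescriptor>")
    return "".join(parts).encode()


def count_full(data):
    """Parse the whole document into one tree, then count the entities."""
    root = ET.fromstring(data)
    return sum(1 for _ in root.iter(ENTITY_TAG))


def count_streaming(data):
    """Stream the document and drop each entity's subtree once handled."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == ENTITY_TAG:
            count += 1
            elem.clear()  # frees children, attributes and text of this entity
    return count


def peak_bytes(func, *args):
    """Run func and return (result, peak bytes traced by tracemalloc)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


data = make_sample(200)
full_count, full_peak = peak_bytes(count_full, data)
stream_count, stream_peak = peak_bytes(count_streaming, data)
```

Comparing `full_peak` and `stream_peak` on real eduGAIN metadata, rather than on this synthetic sample, would indicate whether a streaming approach is worth pursuing.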

Panel

title	Business case

Benefits for NRENs using pyFF:

Reduced resources for running pyFF
Reduced cost for hosted servers
Ability to move resource-intensive parts of metadata processing to cloud services

Panel

title	Risks

It might turn out that memory consumption cannot be reduced significantly.

Panel

title	Data protection & Privacy

The activity does not affect data protection or privacy.

Panel

title	Definition of Done (DoD)

The activity is done once:

An investigation of memory consumption is conducted
Potential memory hot spots are identified
If there are hot spots, solutions are planned and implemented
pyFF is split into multiple modules to externalize the metadata processing
The new implementation is committed to the official repository

Panel

title	Sustainability

After the end of the activity, the source code created will become part of the official repository and can then be used by any interested NREN.

Activity Results

Panel

title	Results

tbd

Meetings

Date	Activity	Owner	Minutes
	Kickoff meeting

...
