General Clarifications

How do we combine subcategories into a score?

For example, OS1 has several subcomponents, many of which you may or may not have, to different levels.

I propose a mean or no combined score (Adam Slagell).

DaveK - I agree.

Hannah (meeting 8 July 16) - but what if something essential is marked zero? We could have mandatory and optional sub-requirements.

Adam - Since there is not an audit, I don't believe it makes sense to talk about mandatory vs. optional. The whole guide is informational as I have viewed it. If certain subcomponents are important to you as a consumer of the information, you would look at those individual scores.
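To make the options in this thread concrete, here is a minimal sketch of reporting a mean alongside the individual sub-scores, so that a zero on something essential stays visible. The 0-3 scale, the subcomponent names and the sample values are assumptions for illustration only and are not taken from the spreadsheet.

```python
from statistics import mean

# Hypothetical sub-scores for OS1 on an assumed 0-3 scale;
# None marks a subcomponent that was not assessed.
os1_scores = {
    "authentication": 3,
    "authorisation": 2,
    "access control": 2,
    "confidentiality": 0,
    "integrity": 1,
    "availability": None,
}

def summarise(scores):
    """Return the mean of the assessed sub-scores and the names marked zero."""
    assessed = {name: s for name, s in scores.items() if s is not None}
    combined = mean(list(assessed.values())) if assessed else None
    zeros = sorted(name for name, s in assessed.items() if s == 0)
    return combined, zeros

combined, zeros = summarise(os1_scores)
print(f"OS1 mean: {combined:.2f}")
print("Marked zero:", ", ".join(zeros) or "none")
```

Listing the zero-scored subcomponents next to the mean keeps the guide informational while still letting a reader spot the items they care about.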

Standardize Language

The spreadsheet and SCIv1 document have ambiguities. For example, one refers to service providers and another to service operators.

DaveK - yes - we need to check the whole document for this

Changes made by Sirtfi

Minutes of meeting on 13 May 2016 - Alf also notes that the Refeds Sirtfi activity has changed some of the wording in Incident Response. We should consider merging their changes back into SCI V2.

Meeting on 8 July 16 - Hannah - still try to avoid forking wherever possible, even if we do the full merging later. It was also noted that the scopes of SCIV2 and Sirtfi are different.

Base-level Examples

There are always questions of scope and completeness in filling out this evaluation form. While no implementation or documentation is ever exhaustive or covers every corner case, if there are significant holes then noting the scope that is covered is useful. For example, there may be centrally managed services for an infrastructure, while there is shared infrastructure at the resource providers that follows different policies. Or there may be different policies for different tiers of infrastructure worth noting.

From minutes of 1st June 2016 meeting

Adam suggests that we could view the OS section as more of a "baseline standard". He will send a copy of the XSEDE Baseline Security document.
Eric points out that we need to include post-mortem analysis as a way of learning lessons. Do we expand IR2 or create a new bullet?

Adam- I think a new bullet makes more sense than stuffing it in IR2 IMO.

Operational Security

[OS1]

A security model addressing issues such as authentication, authorisation, access control, confidentiality, integrity and availability, together with compliance mechanisms ensuring its implementation.

Examples of an authentication model might be a Kerberos system or PKI used to identify users. Another piece that may be included in an authentication model is how one federates with other identity providers.

Authorization models might include something like VOMS or a central database to manage allocations and a corresponding process to decide which projects or communities get allocations. Another important process is how PIs authorize who can be on their projects.

Access control models may be as simple as default file permissions for protecting user home directories and shared project spaces on filesystems.

Confidentiality models might describe how job and user details are hidden from the public or other users.

Integrity models may be as simple as providing tools for integrity at rest or in transit (e.g., encrypted GridFTP) that users can use to ensure data integrity. The model does not require these controls to be mandatory.

Examples of compliance mechanisms are top-level security policies, resource provider agreements, and terms of service that allow the organization to enforce policies for entities bypassing the model. These are the foundational organization commitments to compliance, not the technical mechanisms of enforcement themselves.

[OS2]

A process that ensures that security patches in operating system and application software are applied in a timely manner, and that patch application is recorded and communicated to the appropriate contacts.

A simple patch management process might be regular vulnerability scans, with a process to assign tickets to owners, and regular reviews of tickets to ensure that they are resolved within the timelines specified in security policies. Sometimes this may be the responsibility of the distributed infrastructure, but other times it may be the responsibility of service operators. Patch management policies may differ for different classes of resources, too.

Recording and communication could be as simple as assigning tickets to appropriate service operators.
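The ticket review described above could be sketched roughly as follows. The severity classes, the deadlines in days, the ticket fields and the sample tickets are all hypothetical; real values would come from the infrastructure's own security policy and ticketing system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical resolution timelines per severity class (in days).
DEADLINE_DAYS = {"critical": 7, "high": 30, "low": 90}

@dataclass
class PatchTicket:
    ticket_id: str
    owner: str                      # service operator the ticket is assigned to
    severity: str
    opened: date
    resolved: Optional[date] = None

def overdue(tickets, today):
    """Return unresolved tickets that have passed their policy deadline."""
    late = []
    for t in tickets:
        due = t.opened + timedelta(days=DEADLINE_DAYS[t.severity])
        if t.resolved is None and today > due:
            late.append(t)
    return late

tickets = [
    PatchTicket("T-1", "portal-admins", "critical", date(2016, 6, 1)),
    PatchTicket("T-2", "site-a", "high", date(2016, 6, 20)),
    PatchTicket("T-3", "site-b", "critical", date(2016, 6, 1), date(2016, 6, 5)),
]
for t in overdue(tickets, today=date(2016, 7, 8)):
    print(f"{t.ticket_id} overdue, notify {t.owner}")
```

Keeping the owner on each ticket covers the recording and communication part: the same record says what was patched, when, and who was told.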

[OS3]

A process to manage vulnerabilities (including reporting and disclosure) in any software distributed within the infrastructure. This process must be sufficiently dynamic to respond to changing threat environments.

This item differs from the patch management process in that it is about software owned or distributed by the infrastructure to the resource providers. In OS2 we might be talking about an XSS flaw in a central user portal or website for the infrastructure, whereas here we might be talking about accounting or job submission software pushed out to all the service operators.

This process could be as simple as a regular meeting to discuss new vulnerabilities, e.g., the latest OpenSSL flaws, to determine the impact on software distributed by the infrastructure along with an email list to distribute such information to each service operator.

Dave, I don't know what to say about "dynamic".

DaveK - Me neither! I guess that we meant that because of changing threats it may be necessary to modify the process and this should be possible

Meeting 8 July 16 - what about using the words "flexible and adaptive".

Adam - Can you give an example of being flexible to a changing threat environment or a process that is not?

[OS4]

The capability to detect possible intrusions and protect the infrastructure against significant and immediate threats on the infrastructure.

This does not mean the ability is there to detect all kinds of attacks or prevent them. It could be something as simple as detecting brute-force login attempts or compromised accounts and a mechanism to lock out accounts manually or automatically.

Dave, I don't know how useful this is without agreeing on a few required threats and actions. Maybe you should be able to block IPs or networks, detect brute-force attacks, lock out accounts, and detect compromised accounts. I don't know what others count as significant.
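As one illustration of the "simple" end of this requirement, the sketch below counts failed SSH logins per source address from syslog lines of the kind OpenSSH's sshd writes. The threshold of 5, the function name and the sample log lines are assumptions; a real deployment would also want a time window and a link to whatever lockout mechanism exists.

```python
import re
from collections import Counter

# Matches the "Failed password" lines that OpenSSH's sshd writes to syslog.
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

def brute_force_sources(log_lines, threshold=5):
    """Return source addresses with at least `threshold` failed logins."""
    failures = Counter()
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            failures[match.group(2)] += 1
    return {ip: n for ip, n in failures.items() if n >= threshold}

# Made-up sample: six failures from one address, one from another.
sample = [
    f"Jul  8 10:00:0{i} login1 sshd[411]: Failed password for root "
    f"from 203.0.113.9 port 5000{i} ssh2"
    for i in range(6)
] + [
    "Jul  8 10:01:00 login1 sshd[412]: Failed password for invalid user "
    "bob from 198.51.100.4 port 40022 ssh2",
]
print(brute_force_sources(sample))
```

Because it only reads log lines, the same check can run after the event against centralised logs, which fits the log-analysis approach raised in the minutes.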

DaveK - Minutes of 1st June 2016 meeting.

What about IDS? Do we mean host-based or network-based? Best practice would be to implement at least something in this area.
Eli: Can also be done after the event by analysing log files.
Questions like "Can you detect brute-force SSH attacks? Do you have centralised logging? Can you correlate these logs?"
We can put details in the guidance document. It doesn't all have to be done - the main document needs to stay light-weight.

Meeting 8 July 16 - Alf - Good to describe best practices and things that have been found to work. DaveK - main thrust is to gather evidence that an infrastructure has addressed the issue.

Adam - I find this far too broad to be useful. You could monitor syslogs, but have no host-based IDS on endpoints. You could monitor networks, but not host data. You could monitor border traffic, but not internal. You could monitor central services run by the infrastructures, but the service operators at independent organizations vary. You might be able to detect brute-force SSH attempts, but not other scans. I imagine what is considered IDS by CERN vs. EGI is very different, too. I would consider scoping this to particular threats or changing it to something about maintaining the log records necessary to investigate an intrusion.

[OS5]

The capability to regulate the access of authenticated users.

There simply needs to be a technical mechanism to suspend access and terminate existing sessions and jobs in an emergency.

[OS6]

The capability to identify and contact authenticated users, service providers and resource providers.

Identifying users could be as simple as having unique usernames tied to email addresses. Each resource provider should have a security incident contact recorded in a central place, as should the admin for each service. This could simply be a spreadsheet in a shared location.
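A shared spreadsheet of this kind is enough to support lookups during an incident. The sketch below reads a hypothetical CSV export of such a spreadsheet; the column names, entries and addresses are invented for illustration.

```python
import csv
import io

# Hypothetical export of the shared contact spreadsheet.
CONTACTS_CSV = """\
kind,name,security_contact
resource_provider,site-a,security@site-a.example
service,user-portal,portal-admin@infra.example
user,jdoe,jdoe@university.example
"""

def load_contacts(text):
    """Index contacts by (kind, name) so lookups are unambiguous."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["kind"], row["name"]): row["security_contact"] for row in reader}

contacts = load_contacts(CONTACTS_CSV)
print(contacts[("resource_provider", "site-a")])
```

Keying on both kind and name avoids clashes when a service and a resource provider share a name.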

[OS7]

The capability to enforce the implementation of the security policies, including an escalation procedure, and the powers to require actions as deemed necessary to protect resources from or contain the spread of an incident.

Enforcement may just be the ability to remove individuals and resource providers from the infrastructure for violating policies. Resource providers might locally still allow a user even if removed from the infrastructure.

An escalation procedure could simply be a chain of command to escalate noticed policy violations to senior levels of management with the authority to censure violators.

Emergency powers could simply be a way for incident response teams to disable accounts directly or remove authorizations for the infrastructure. Even if they cannot remove all access at a single resource provider, they should be able to remove users from centralized authentication, authorization and access control to limit the spread of an incident. For example, they might revoke certificates and access to a user portal for a user, while the individual resource providers retain control of local credentials to other services. Critically, an infrastructure should be able to contain a compromise within its own infrastructure and prevent it from spreading to other infrastructures, e.g., by revoking certificates or disabling accounts in its identity provider.

Incident Response

[IR1]

Security contact information for all service providers, resource providers and communities together with expected response times for critical situations.

A simple spreadsheet or wiki page with security contacts for the resource providers and the owners/operators of any services suffices.

Dave, what do we mean by communities?

DaveK - A community is a grouping of end-users. Could be a Research Infrastructure, a Virtual Organisation or an application community; often this is the entity to which resources are allocated and access is granted. There is probably a definition in the SCI V1 document glossary - need to check.

Expected incident response times for an infrastructure must be documented and shared, but do not necessarily need formal SLAs, MOUs, charters, etc.

Meeting 8 Jul 16 - Warren - LIGO has a hierarchy of contact points

I guess I am still confused. Let's take XSEDE as an example: anyone can create an account, and if a PI adds them to an allocation they are in our user community. If they are at a US university, I suppose we have security contacts through REN-ISAC and expected response times. If they aren't, then I think we can only be guaranteed a method to contact the user. I would expect OSG and EGI to be similar w.r.t. end users.

[IR2]

A formal Incident Response procedure. This must address: roles and responsibilities, identification and assessment of an incident, minimizing damage, response & recovery strategies, communication tools and procedures.

Do you have answers to the following questions?

Who might be pulled into an incident response activity and what are their responsibilities?
What counts as a real incident? How do you rate the criticality?
How do you contain common kinds of incidents, such as account compromise?
How do you determine when a service can be returned to normal operations or an account restored?
How do you securely communicate with everyone who is investigating and responding to an incident?

[IR3]

The capability to collaborate in the handling of a security incident with affected service and resource providers, communities, and infrastructures.

I don't really know what is here that isn't already covered by procedures and communication channels. If this is about communicating with external infrastructures, then maybe all it is about is having a security point of contact and participating in relevant trust groups –Adam.

DaveK - I think IR2 was aimed at having the procedure to handle incidents inside your infrastructure. IR3 is more about the management backing and the policy and procedures to do the "collaboration" with others

Adam- Well, I think in v2 we might want to state that this is about collaborating with external infrastructures rather than within an organizational boundary like EGI or XSEDE.

[IR4]

Assurance of compliance with information sharing restrictions on incident data obtained during collaborative investigations. If no information sharing guidelines are specified, incident data will only be shared with site-specific security teams on a need to know basis, and will not be redistributed further without prior approval.

A good privacy policy would cover this, but so would an understanding that the security team has some autonomy and shares on a need-to-know basis.

Some explanations from Dave Kelsey (my personal views - recalling the history)

Guidance for SCI version 1