Add support for sensitive data #187

Closed
14 tasks done
grassick opened this issue Apr 24, 2017 · 13 comments

@grassick
Member

grassick commented Apr 24, 2017

In order for mWater to be used in academic environments, we must be able to protect sensitive data such as the names of people in households and the exact locations.

The new "sensitive information" feature would work as follows:

  • Individual questions can be marked as having sensitive data (exact phrasing to be decided)
  • In the case of location questions, the designer of the form can optionally specify a scrambling radius, which sets the size of the circle around the original point within which the redacted location is randomly chosen
  • If no radius is specified, the location is completely redacted
  • When a survey response is submitted, whether it goes to pending state or immediately to final, the data from all sensitive questions is redacted and the original values are stored securely and separately
  • The original (unredacted) values are only available to the administrator of the form, not to managers, approvers or viewers
  • If a response is rejected, the original values are restored to the response so that enumerators are able to correct any errors
  • The unredacted data is available through the dashboards, maps and datagrids, but only to administrators of the form. They must specifically select that they want the unredacted version and it is not included by default in any export
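
The scrambling-radius bullet above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `scrambleLocation` helper, the `LatLng` shape and the injectable `random` parameter are hypothetical and not taken from the mWater codebase, and the meters-to-degrees conversion is a flat-earth approximation that is reasonable for small radii away from the poles.

```typescript
// Hypothetical sketch: pick a random point inside a circle around the original location.
interface LatLng {
  latitude: number;
  longitude: number;
}

// Approximate length of one degree of latitude in meters (assumption: spherical earth).
const METERS_PER_DEGREE_LAT = 111320;

function scrambleLocation(
  point: LatLng,
  radiusMeters: number,
  random: () => number = Math.random
): LatLng {
  // The square root spreads points evenly over the disk's area
  // instead of clustering them near the center.
  const distance = radiusMeters * Math.sqrt(random());
  const angle = 2 * Math.PI * random();
  const north = distance * Math.cos(angle);
  const east = distance * Math.sin(angle);
  const latRad = (point.latitude * Math.PI) / 180;
  return {
    latitude: point.latitude + north / METERS_PER_DEGREE_LAT,
    // A degree of longitude shrinks with the cosine of the latitude.
    longitude: point.longitude + east / (METERS_PER_DEGREE_LAT * Math.cos(latRad)),
  };
}
```

Passing the random source as a parameter keeps the function testable. Whether `Math.random` is an acceptable source for privacy purposes is a separate question that this sketch does not settle.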

Here is the detailed list of what needs to be created:

  • Add sensitiveData field to responses table

  • Add boolean flag to questions in mwater-forms "sensitive" (+ UI in form designer)

  • Add optional sensitiveRadius, which is the distance in meters to scramble coordinates by (+ UI in form designer)

  • When a response is received by the server, if it has transitioned from draft/rejected to any other state, sensitive information should be moved from "data" to "sensitiveData" and a redacted version left in "data". Also, the flag "sensitive" should be set to true on that question's entry in the response's data (response -> data -> questionid).

  • Locations should either be scrambled within a radius or removed when they are copied out to the sensitiveData

  • When a response is received by the server, if it has transitioned the other way (to draft/rejected), the reverse procedure should happen, copying data from sensitiveData back to data

  • MWaterSchemaMap should block access to the sensitiveData field completely unless the user is an administrator of the form, not just the response

  • ResponseCollectionModel should block access to the sensitiveData field completely unless the user is an administrator of the form, not just the response

  • Add sensitiveData to the pgSchema

  • Add a section called "Sensitive Information" to the schema that gets generated by the FormSchemaBuilder that contains the unredacted sensitive information. It should only contain the questions that have been marked as sensitive. This should only be included in the schema if the person creating the schema is an administrator of the form. It should obtain its data from sensitiveData field and the column ids should generally be of the format sensitiveData:SOMEQUESTIONID:value

  • All columns added to the above schema that point into sensitiveData should have the sensitive: true property set (the columns for the original questions do not have it set)

  • Don't include any columns that have sensitive: true set by default in datagrids or in response exporting (both of which use similar code to select all the columns to include)

  • Only display the sensitive option for questions in forms if the overall form has a special option set called sensitiveMode (stored at root of form design). Add an option in the form level options to turn it on

This needs to be tested really thoroughly as it will protect people's privacy.
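
As a rough illustration of the schema items in the list above, the sketch below builds the "Sensitive Information" columns and filters the default column selection. The `Column` and `Question` shapes and both function names are assumptions made for this example; only the `sensitiveData:SOMEQUESTIONID:value` id format and the `sensitive: true` flag come from the list.

```typescript
// Hypothetical shapes; the real FormSchemaBuilder types are not shown in this issue.
interface Question {
  _id: string;
  text: string;
  sensitive?: boolean;
}

interface Column {
  id: string;
  name: string;
  sensitive?: boolean;
}

// Only form administrators get columns that point into sensitiveData.
function buildSensitiveColumns(questions: Question[], isFormAdmin: boolean): Column[] {
  if (!isFormAdmin) {
    return [];
  }
  return questions
    .filter((q) => q.sensitive === true)
    .map((q) => ({
      id: "sensitiveData:" + q._id + ":value",
      name: q.text,
      sensitive: true,
    }));
}

// "Add all columns" and exports would start from this filtered list,
// so sensitive columns have to be picked explicitly.
function defaultColumns(columns: Column[]): Column[] {
  return columns.filter((c) => c.sensitive !== true);
}
```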

Let me know what questions you have, and please start each of the changes in a branch named "sensitive-data" in forms, forms-designer, portal and server.

@grassick
Member Author

grassick commented May 5, 2017

As discussed, this is far more complex at the level of rosters. @rocketboy76 Will any of the sensitive data be in rosters for the initial release?

@rocketboy76
Member

rocketboy76 commented May 5, 2017 via email

@grassick
Member Author

grassick commented May 5, 2017

@broncha So, it will have to include rosters too. Here's how I think it should work:

Rosters store data as an array. E.g. for roster "r1" with questions q1 and q2, it is stored as:

data: { r1: [{ _id: "someid", data: { q1: { value: "xyz" }, q2: { value: "def" } } }, { _id: "anotherid", data: { q1: { value: "abc" }, q2: ... } }] }

We can't keep sensitive data inside there, so we move it also to sensitiveData, making a parallel structure to data. If q2 is sensitive, sensitiveData will look like this:

sensitiveData: { r1: [{ _id: "someid", data: { q2: { value: "def" } } }, { _id: "anotherid", data: { q2: ... } }] }

The only really painful part is line 100 of the MWaterSchemaMap where we make a synthetic table out of the roster values. Not quite sure how to code that yet, but it's possible.
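
A minimal sketch of the parallel roster structure described above, assuming hypothetical names (`splitRosterEntries`, `RosterEntry`) and assuming that the redacted copy keeps only a `sensitive: true` marker for each moved answer; the exact redacted form was not decided in this thread.

```typescript
// Hypothetical types mirroring the roster example: each entry has an _id and a data map.
interface Answer {
  value?: unknown;
  sensitive?: boolean;
}

interface RosterEntry {
  _id: string;
  data: Record<string, Answer>;
}

interface SplitRoster {
  redacted: RosterEntry[]; // stays in data
  sensitive: RosterEntry[]; // goes to sensitiveData, same order and same _ids
}

function splitRosterEntries(entries: RosterEntry[], sensitiveIds: string[]): SplitRoster {
  const redacted: RosterEntry[] = [];
  const sensitive: RosterEntry[] = [];
  for (const entry of entries) {
    const kept: Record<string, Answer> = {};
    const moved: Record<string, Answer> = {};
    for (const questionId of Object.keys(entry.data)) {
      if (sensitiveIds.indexOf(questionId) >= 0) {
        // Move the real answer out and leave only a marker behind (assumption).
        moved[questionId] = entry.data[questionId];
        kept[questionId] = { sensitive: true };
      } else {
        kept[questionId] = entry.data[questionId];
      }
    }
    redacted.push({ _id: entry._id, data: kept });
    sensitive.push({ _id: entry._id, data: moved });
  }
  return { redacted, sensitive };
}
```

Keeping the `_id` on both sides means the restore step can match entries by id rather than by array position.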

@grassick
Member Author

grassick commented May 5, 2017

We also need to start thinking about how to tackle: mWater/mwater-visualization#389

@broncha broncha mentioned this issue May 11, 2017
@grassick
Member Author

grassick commented May 11, 2017

@broncha Ok, some modifications to plans after talking with the (generally very happy) client:

In the user interface, rename "sensitive" to "confidential":

  • Rename in mwater-form-designer (already merged) (Clayton will do)

  • Rename section in FormSchemaBuilder (Rajesh)

  • Change underlying name and column from sensitiveData to confidentialData in forms + server (sorry! search and replace should do it) (Rajesh)

  • Change sensitiveMode and sensitive and sensitiveRadius to confidentialMode, confidential and confidentialRadius everywhere. (Rajesh does forms + server, Clayton does mwater-form-designer)

  • Include confidential flag in schema that FormSchemaBuilder generates in columns that are confidential (Rajesh)

  • Ensure that "add all columns" in mwater-visualization datagrid builder does not include confidential columns by default

  • https://github.com/mWater/mwater-portal/issues/981

  • https://github.com/mWater/mwater-form-designer/issues/167

  • Inform John S when all above are done (latest early afternoon Friday)

@grassick
Member Author

Dies horribly right away when saving draft:

TypeError: Cannot read property '1527150e358149469cf43463591bb3a0' of undefined
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.restoreItem (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:187:44)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.restoreConfidentialData (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:165:8)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.handleConfidentialData (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:118:8)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.beforeInsert (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:95:6)

Dies again when submitting:

TypeError: Cannot read property 'length' of undefined
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.redactItem (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:143:9)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.redactConfidentialData (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:126:8)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.handleConfidentialData (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:120:8)
  at ResponseCollectionModel.module.exports.ResponseCollectionModel.beforeInsert (/home/clayton/dev/mWater/mwater-server/lib/models/ResponseCollectionModel.coffee:95:6)

@grassick
Member Author

Redaction and unredaction should only happen on state transitions. If a response is submitted directly as final, it should be redacted. If a new draft is started, do not attempt to unredact, since there is nothing to restore yet.
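
The transition rule above could be expressed as a small pure function, sketched here. The status names come from this thread (draft, rejected, pending, final); the function and type names are hypothetical, and `undefined` as the previous status is an assumed way to represent a brand-new response.

```typescript
type Status = "draft" | "pending" | "final" | "rejected";
type ConfidentialAction = "redact" | "restore" | "none";

// Draft and rejected responses are editable by the enumerator, so they hold unredacted data.
function isEditable(status: Status): boolean {
  return status === "draft" || status === "rejected";
}

function confidentialAction(previous: Status | undefined, next: Status): ConfidentialAction {
  // A response that does not exist yet has nothing redacted, so treat it like a draft.
  const wasEditable = previous === undefined || isEditable(previous);
  const willBeEditable = isEditable(next);
  if (wasEditable && !willBeEditable) {
    return "redact";
  }
  if (!wasEditable && willBeEditable) {
    return "restore";
  }
  // No change in editability (including a newly started draft): leave the data alone.
  return "none";
}
```

With this shape, `beforeInsert` would call it with `undefined` as the previous status, and `beforeUpdate` with the stored status, though how the real model hooks are wired is not shown here.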

@grassick
Member Author

To reproduce, create a simple survey in the portal with a confidential question, then run the survey using the app.

@grassick
Member Author

Ok, so there are some issues with rosters. I've added failing tests. Trying it now with simpler non-roster test.

@grassick
Member Author

Still crashing badly:

 "TypeError: Cannot read property 'fe39f8564449406bac49382efa294f11' of undefined",
  "  at ResponseCollectionModel.module.exports.ResponseCollectionModel.restoreItem (/home/mwater/mwater-server/lib/models/ResponseCollectionModel.coffee:184:56)",
  "  at ResponseCollectionModel.module.exports.ResponseCollectionModel.restoreItem (/home/mwater/mwater-server/lib/models/ResponseCollectionModel.coffee:176:12)",
  "  at ResponseCollectionModel.module.exports.ResponseCollectionModel.restoreConfidentialData (/home/mwater/mwater-server/lib/models/ResponseCollectionModel.coffee:167:8)",
  "  at ResponseCollectionModel.module.exports.ResponseCollectionModel.handleConfidentialData (/home/mwater/mwater-server/lib/models/ResponseCollectionModel.coffee:118:8)",
  "  at ResponseCollectionModel.module.exports.ResponseCollectionModel.beforeUpdate (/home/mwater/mwater-server/lib/models/ResponseCollectionModel.coffee:108:6)",
  "  at ResponseCollectionModel.module.exports.CollectionModel.update (/home/mwater/mwater-server/lib/models/CollectionModel.coffee:163:6)",
  "  at ResponseCollectionModel.module.exports.CollectionModel.upsert (/home/mwater/mwater-server/lib/models/CollectionModel.coffee:111:8)",

@grassick
Member Author

I added the if guard at the top of restoreConfidentialData:

  restoreConfidentialData: (doc, form) ->
    if not doc.confidentialData
      return

Why are we restoring and redacting at all unless confidentialMode is on? It would really simplify things if we did nothing unless it was on, but then I guess we would need a safeguard so that confidentialMode cannot be turned off while any question has confidential = true. What do you think?

@broncha
Contributor

broncha commented May 15, 2017

Can a form's confidentialMode be changed to false when there are responses already? If yes, then we will need to check that transition as well and restore confidential data.

@grassick
Member Author

@broncha confidentialMode can no longer be turned off once deployed. That being said, let's keep the complete code for now as long as it's stable.
