Scalable Data Architecture

  • Checklist / Questions to ask your Development team

    • Are you collecting the absolute minimum amount of data needed?

    • Have you reviewed and assessed the data to be used?

    • If human-subjects data is involved (e.g., affective data), have the subjects given informed consent, affirmatively opting in with a clear understanding of the data uses to which they consent?

    • How did you confirm the credibility of the data and its source?

    • Is the data source compliant with regulations like GDPR?

    • What is the type of source: static (files) or real-time streaming (sensors)?

    • How many sources are you using?

    • Have you applied the principles of legality, legitimacy, and necessity when collecting and using personal information?

    • Have you strengthened privacy protection for special data subjects such as minors?

    • Have you strengthened technical safeguards to ensure data security and guard against risks such as data leaks?

    • What cloud services are you planning to use for data collection?

    • To what extent do you allow data subjects or others affected by affective computing to control when and how technology is used to infer or influence their emotions or affective state, and whose responsibility is it
      to ensure that they can?
      • Do you give data subjects agency over inferences about their emotions?
      • When could telling data subjects inferences about their own emotion/affect be harmful?

    • Have you assessed bias in the different datasets to be used and taken steps to mitigate it? (Types of bias include, e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables.)
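
One of these issues, imbalanced classes, can be surfaced with a quick check of the label distribution before training. A minimal sketch in Python; the label values are hypothetical:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the most to the least frequent class; 1.0 means balanced."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

# Hypothetical affect labels from an annotation pass
labels = ["neutral"] * 80 + ["happy"] * 15 + ["angry"] * 5
ratio = imbalance_ratio(labels)  # 80 neutral vs 5 angry -> 16.0
```

A ratio far above 1 suggests resampling, reweighting, or collecting more minority-class examples before training.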

    • Have you minimized exposure of personally identifiable information (PII), e.g., by removing sensitive data not relevant to the analysis?
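
One way to enforce this is to whitelist the fields the analysis actually needs and pseudonymize identifiers at ingestion. A sketch, assuming an illustrative field layout and salt:

```python
import hashlib

# Assumed whitelist: only fields the analysis actually needs
ANALYSIS_FIELDS = {"valence", "arousal", "timestamp"}

def minimize_record(record, salt=b"rotate-this-salt"):
    """Drop fields outside the whitelist and replace the raw
    subject identifier with a salted pseudonym."""
    kept = {k: v for k, v in record.items() if k in ANALYSIS_FIELDS}
    kept["subject"] = hashlib.sha256(salt + record["subject_id"].encode()).hexdigest()[:12]
    return kept

raw = {"subject_id": "alice@example.com", "name": "Alice",
       "valence": 0.7, "arousal": 0.4, "timestamp": "2024-01-01T10:00"}
clean = minimize_record(raw)  # name and email do not survive
```

Note that salted hashing is pseudonymization, not anonymization: re-identification risk remains, and the salt must be protected and rotatable.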

    • How did you decide what training data to use?

    • Do you plan to share the data you collect, or output prediction data, with third parties? How is consent transferred?

    • Do we have a plan to protect data (e.g., encryption, access controls for internal users and third parties, access logs)?

    • Do we have a mechanism through which an individual can request that their personal information (e.g., affective data) be removed?

    • Do you have an expiry / termination date associated with all your datasets?
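
A retention check like this can be automated against dataset metadata; the metadata layout below is a hypothetical convention:

```python
from datetime import date

def is_expired(dataset_meta, today=None):
    """True once the dataset's retention/expiry date has passed."""
    today = today or date.today()
    return date.fromisoformat(dataset_meta["expires"]) < today

# Hypothetical dataset metadata record
meta = {"name": "affect_pilot_v1", "expires": "2024-06-30"}
```

Run as a scheduled job, this can flag datasets for deletion or archival once they pass their expiry date.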

    • Does your training data or its labels encode stereotypes (e.g., gender, ethnic)? Bias amplification means that simply leaving datasets as-is because they represent “reality” is not the right approach: the model distorts an already imbalanced perspective, especially with affective data. Do you have a method by which people can correct errors in input data, training data, or output decisions?

    • Describe the training data, including how, when, and why it was collected and sampled.

    • Describe how and when test data about an individual that is used to make a decision is collected or inferred.

    • What recommendations and assessments can be used to verify the origin and quality of the data used in an AI system?

    • How can identifying gaps or discrepancies in the data help you build a more trustworthy model?

    • Disclose the sources of any data used and as much as possible about the specific attributes of the data. Explain how the data was cleaned or otherwise transformed.

    • If an individual has or should have the right to decide whether and when to reveal certain information,
      how does affective computing impact that right?

    • Does this use of affective computing impact an important right or opportunity, like access to jobs, housing or education?

    • Does the data being used or the inference being made, e.g., depression detection, reveal sensitive health information or other information that should be treated with special protection?

    • Was the data collected (e.g., face images) or used in ways that threaten other rights like freedom of assembly, speech, or religion?

    • Have you considered that using emotion or affect data can create risks to privacy even if the software never accesses identifying or identifiable information?

    • Are you using personally identifiable information (PII) in a compliant way, i.e., with informed consent and the right for data subjects to withdraw that consent at a future date (withdrawal being no more difficult than the initial provision of consent)?

    • Does your consent form describe the data retention period, the data-sharing policy (including which third parties the data will be shared with), and the process to protect, secure, and store the data?

    • Have you looked at alternative approaches, such as ‘differential privacy’, which focus on gaining insight from a top-down, aggregate view of the data without needing specific details about every record (i.e., individual) in the dataset?
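
As one concrete illustration (a teaching sketch, not a production mechanism), a count query can be released with epsilon-differential privacy via the Laplace mechanism, sampled here by a standard inverse-CDF transform:

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) via inverse-CDF transform."""
    u = random.random() - 0.5            # uniform in [-0.5, 0.5)
    u = max(u, -0.4999999)               # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(values, predicate, epsilon=1.0):
    """Noisy count of matching records. A count query has sensitivity 1,
    so Laplace noise with scale 1/epsilon gives epsilon-DP."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)

# e.g., how many sessions had high arousal, without exposing any one record
arousal = [0.2, 0.9, 0.8, 0.1, 0.95]
noisy = dp_count(arousal, lambda a: a > 0.7, epsilon=1.0)
```

Smaller epsilon means more noise and stronger privacy; real deployments must also track the privacy budget spent across repeated queries.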

    • Are you using any synthetic data to train models?

    • Who has access to the inferences about the data subject’s emotions?

    2. Tools to be used?

    3. Key Measure Metrics

    • Data Drift evaluation
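
One common drift metric is the Population Stability Index (PSI) between the training-time distribution of a feature and its live distribution. A minimal pure-Python sketch; the commonly cited informal thresholds are <0.1 stable, 0.1-0.25 moderate shift, >0.25 significant drift:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.
    Bins are fixed from the expected (training-time) sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    def proportions(xs):
        counts = [0] * bins
        for x in xs:
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Smooth empty bins so the log stays defined
        return [(c + 0.5) / (len(xs) + 0.5 * bins) for c in counts]
    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train = [i / 100 for i in range(100)]
live = [i / 100 + 0.5 for i in range(100)]   # simulated shifted feature
drift = psi(train, live)                     # well above the 0.25 threshold
```

Running this per feature on a schedule gives an early warning that the model's inputs no longer look like its training data.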

    • Fairness Evaluation
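
A simple starting point for a fairness evaluation is the demographic parity gap: the spread in positive-decision rates across groups. A sketch, with illustrative group labels:

```python
def demographic_parity_gap(decisions, groups):
    """Max minus min positive-decision rate across groups.
    decisions: iterable of 0/1 outcomes; groups: group label per record."""
    stats = {}
    for d, g in zip(decisions, groups):
        n, pos = stats.get(g, (0, 0))
        stats[g] = (n + 1, pos + d)
    rates = {g: pos / n for g, (n, pos) in stats.items()}
    return max(rates.values()) - min(rates.values())

decisions = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(decisions, groups)   # 0.50 - 0.25 = 0.25
```

A gap of 0 indicates parity on this criterion; a full fairness review should also consider others (e.g., equalized odds, calibration), since different criteria can conflict.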

    4. Use Cases

    5. Related Information