As more organizations move to the cloud, protecting PII and meeting compliance requirements have become priorities. The cloud is a reliable place to store huge amounts of data, but managing the storage of sensitive data under compliance regulations remains a challenge. This article discusses Amazon Macie, an AWS service that allows organizations to automatically discover, classify, and protect sensitive data in the cloud.
Personally identifiable information (PII)
PII stands for personally identifiable information: any data that can be used to identify a specific person. The most common forms of PII include Social Security numbers, email addresses, and phone numbers. PII can also refer to digital identifiers, such as biometric data, geolocation, user IDs, and IP addresses.
What counts as PII?
Sensitive PII refers to any information that has “legal, contractual, or ethical requirements for restricted disclosure.” This includes Social Security numbers, passport information, bank details or credit card numbers, and medical records covered under HIPAA.
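Non-sensitive PII is any information that can be found in the public record, such as in a phone book or in an online directory like LinkedIn. Non-sensitive PII includes things like date of birth, address, religion, ethnicity, or a business phone number. Whether an attribute counts as sensitive depends on the applicable regulation: GDPR, for example, treats religion and ethnic origin as special categories of personal data.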
Compliance requirements
Data compliance means that an organization follows the rules and guidelines set by laws, regulations, and policies governing the collection, storage, processing, and sharing of data. This encompasses adherence to data protection regulations such as GDPR and CCPA, as well as industry-specific regulations like HIPAA for healthcare. Organizations that gather sensitive PII from their clients are required by law to adhere to these regulations when collecting, storing, processing, and sharing the data.
Storing and protecting sensitive data in the Cloud
Amazon Simple Storage Service (S3)
Amazon S3 is a storage service where organizations of all sizes and industries store data of virtually any size and type. Its design target of 99.999999999% (11 nines) durability, together with virtually unlimited storage and high availability, makes it an ideal place to store data in the cloud. Amazon S3 stores data in containers called buckets.
AWS Macie
In one line, Amazon Macie is a security service that provides data protection in the cloud.
AWS defines Amazon Macie as a data security service that discovers sensitive data by using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks. If Macie detects a potential issue with the security or privacy of the data, such as a bucket that becomes publicly accessible, it generates a finding to be reviewed and remediated as necessary.
Amazon Macie currently supports only Amazon S3, and it is a regional service. This ensures that the analyzed data doesn’t cross AWS Regional boundaries and stays in its Region.
Amazon Macie helps answer the following questions:
- In my Amazon S3 buckets, what data do I have?
- What is its location?
- Is the data shared and stored publicly or privately?
- What methods can I use to classify data in real-time?
- What personally identifiable information (PII) or protected health information (PHI) could be made public?
- How do I create remediation workflows for my security and compliance requirements?
Amazon Macie Features
- Automate the discovery of sensitive data
- Discover a variety of sensitive data types
- Evaluate and monitor data for security and access control
- Review and analyze findings
- Monitor and process findings with other services and systems
- Centrally manage multiple Macie accounts
AWS Macie in Action
Setup and Environment Configuration
I have created a scenario in which I have a few S3 buckets in my AWS account and have enabled Macie on the account to demonstrate how Amazon Macie works. The experiment is arranged with two anomalies: 1. the buckets are not correctly tagged, and 2. some S3 buckets contain sensitive data.
Services used:
- Amazon Macie: Runs automated sensitive data discovery.
- S3: Used as the storage unit. Amazon Macie runs sensitive data discovery jobs on S3 buckets.
- KMS: Used to encrypt the S3 buckets. Information on encrypted and unencrypted buckets is visible on the Amazon Macie Summary dashboard.
- Amazon EventBridge: Formerly known as CloudWatch Events, it gathers the events from Amazon Macie and triggers a Lambda function.
- AWS Lambda: A function programmed to publish to an SNS topic.
- SNS: Sends the alert to subscribers.
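The EventBridge piece of this setup can be sketched roughly as follows. This is a minimal sketch, not the exact configuration used in the experiment: the rule name, the target ID and the Lambda ARN are hypothetical placeholders, and `matches_pattern` is a simplified local stand-in for EventBridge's own matching, included only so the pattern can be checked without an AWS account.

```python
import json

# Event pattern that matches findings published by Amazon Macie.
MACIE_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
}


def matches_pattern(event: dict, pattern: dict = MACIE_FINDING_PATTERN) -> bool:
    """Simplified local check: each pattern key must list the event's value."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def create_rule(rule_name: str, lambda_arn: str) -> None:
    """Create the rule and attach the Lambda function (needs AWS credentials).

    rule_name and lambda_arn are placeholders supplied by the caller.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    events = boto3.client("events")
    events.put_rule(Name=rule_name, EventPattern=json.dumps(MACIE_FINDING_PATTERN))
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "macie-alert-lambda", "Arn": lambda_arn}],
    )
```

The Lambda function additionally needs a resource-based permission that allows EventBridge to invoke it; that step is left out of the sketch.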
Best practice
To access the discovery results and retain them long term, an S3 bucket can be configured with encryption enabled for this purpose. After the bucket is configured, Macie writes the discovery results to JSON Lines files and adds them to the S3 bucket as GNU Zip (GZ) files. The S3 bucket can then serve as a definitive, long-term repository for all the discovery results. If you don’t configure this kind of repository for your discovery results, Macie stores the results for 90 days.
The detailed process and concept for configuring a repository for discovery results can be found here: Storing and retaining sensitive data discovery results with Amazon Macie
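Since the results land in the bucket as gzip-compressed JSON Lines files, they are easy to process with standard tooling. The sketch below assumes a result file has already been downloaded to a local path (the path is up to the caller) and makes no assumption about the fields inside each record; it simply yields one parsed record per line.

```python
import gzip
import json
from typing import Iterator


def read_discovery_results(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a gzip-compressed JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because the function is a generator, large result files can be streamed record by record instead of being loaded into memory at once.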
AWS Macie components
- Macie Summary dashboard
The Summary dashboard contains information on the total number of buckets in the AWS account, along with information on Sensitive and Non-sensitive data.
The summary dashboard also answers the following questions about the S3 buckets in your AWS account:
- Which buckets are publicly accessible?
- Which buckets are encrypted and using which method?
- Which buckets are shared inside and outside of your account?
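The same questions can also be answered programmatically from the bucket metadata that Macie maintains. The following is a sketch under the assumption that the metadata comes from the `describe_buckets` operation of the boto3 `macie2` client; `summarize_buckets` and `fetch_buckets` are hypothetical helper names, and the summary only looks at a few fields of each bucket entry.

```python
from collections import Counter
from typing import Dict, List


def summarize_buckets(buckets: List[dict]) -> Dict[str, object]:
    """Summarize public access, encryption and sharing for bucket metadata."""
    public = [
        b["bucketName"]
        for b in buckets
        if b.get("publicAccess", {}).get("effectivePermission") == "PUBLIC"
    ]
    encryption = Counter(
        b.get("serverSideEncryption", {}).get("type", "UNKNOWN") for b in buckets
    )
    shared = Counter(b.get("sharedAccess", "UNKNOWN") for b in buckets)
    return {"public": public, "encryption": dict(encryption), "shared": dict(shared)}


def fetch_buckets() -> List[dict]:
    """Collect bucket metadata from Macie (needs AWS credentials and Macie enabled)."""
    import boto3  # imported lazily so summarize_buckets works without boto3

    macie = boto3.client("macie2")
    buckets: List[dict] = []
    kwargs: dict = {}
    while True:
        page = macie.describe_buckets(**kwargs)
        buckets.extend(page.get("buckets", []))
        token = page.get("nextToken")
        if not token:
            return buckets
        kwargs["nextToken"] = token
```

Calling `summarize_buckets(fetch_buckets())` would give a rough, scriptable counterpart to the dashboard view.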
- Macie jobs
A Macie job is a series of automated processing and analysis tasks that Macie performs to detect and report sensitive data in S3 objects. Each job provides detailed reports of the sensitive data that Macie finds and the analysis that Macie performs. Jobs can be scheduled to run daily, weekly, or monthly. Additionally, a job can be run on demand as a “One-time” job.
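A job can also be created through the API instead of the console. The sketch below builds a request for the `create_classification_job` operation of the boto3 `macie2` client; the helper names, the job name, the account ID and the bucket names are hypothetical, and the weekly and monthly schedules are pinned to Monday and the first day of the month purely for illustration.

```python
import uuid
from typing import Dict, Iterable

# Illustrative schedule choices; the day values are assumptions.
SCHEDULES: Dict[str, dict] = {
    "DAILY": {"dailySchedule": {}},
    "WEEKLY": {"weeklySchedule": {"dayOfWeek": "MONDAY"}},
    "MONTHLY": {"monthlySchedule": {"dayOfMonth": 1}},
}


def build_job_request(name: str, account_id: str, bucket_names: Iterable[str],
                      frequency: str = "ONE_TIME") -> dict:
    """Build keyword arguments for macie2 create_classification_job."""
    if frequency != "ONE_TIME" and frequency not in SCHEDULES:
        raise ValueError(f"unsupported frequency: {frequency}")
    request = {
        "clientToken": str(uuid.uuid4()),  # idempotency token
        "name": name,
        "jobType": "ONE_TIME" if frequency == "ONE_TIME" else "SCHEDULED",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }
    if frequency != "ONE_TIME":
        request["scheduleFrequency"] = SCHEDULES[frequency]
    return request


def create_job(request: dict) -> str:
    """Submit the job and return its ID (needs AWS credentials and Macie enabled)."""
    import boto3  # imported lazily so build_job_request works without boto3

    return boto3.client("macie2").create_classification_job(**request)["jobId"]
```

Keeping the request construction separate from the API call makes it possible to review or test the job definition before anything is submitted.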
- Macie findings
Once the Macie job has been triggered, Macie inspects the objects in the S3 buckets to discover sensitive data and other anomalies. An example of sensitive data discovery with Macie is shown below:
Alert management with SNS
As soon as Amazon Macie discovers an anomaly, the event is gathered by Amazon EventBridge, which in turn triggers a Lambda function. The Lambda function is programmed to publish to an SNS topic, which forwards the alert notification to its subscribers.
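A Lambda function for this flow could look roughly like the sketch below. It is not the exact function used in the experiment: the `ALERT_TOPIC_ARN` environment variable, the `format_alert` helper and the optional `sns_client` parameter (added so the handler can be exercised without AWS) are assumptions, and only a handful of fields from the finding are read, each with a fallback in case it is missing.

```python
import json
import os
from typing import Tuple


def format_alert(event: dict) -> Tuple[str, str]:
    """Turn an EventBridge event carrying a Macie finding into (subject, message)."""
    detail = event.get("detail", {})
    severity = detail.get("severity", {}).get("description", "Unknown")
    finding_type = detail.get("type", "Unknown finding type")
    bucket = (
        detail.get("resourcesAffected", {}).get("s3Bucket", {}).get("name", "unknown bucket")
    )
    # SNS subjects are limited to 100 characters.
    subject = f"Macie finding [{severity}]: {finding_type}"[:100]
    message = json.dumps(
        {
            "id": detail.get("id"),
            "type": finding_type,
            "severity": severity,
            "bucket": bucket,
            "title": detail.get("title"),
        },
        indent=2,
    )
    return subject, message


def lambda_handler(event, context, sns_client=None):
    """Publish the formatted finding to the SNS topic named in ALERT_TOPIC_ARN."""
    if sns_client is None:
        import boto3  # available in the Lambda Python runtime

        sns_client = boto3.client("sns")
    subject, message = format_alert(event)
    return sns_client.publish(
        TopicArn=os.environ["ALERT_TOPIC_ARN"],  # hypothetical variable name
        Subject=subject,
        Message=message,
    )
```

The function's execution role would need permission to publish to the topic, and subscribers (for example, an email address) receive the subject and the JSON body as the alert.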
Manage multiple AWS Accounts
Amazon Macie allows central management of automated protection across multiple AWS accounts. With this configuration, a designated Macie administrator can assess and monitor the overall security posture of all of an organization’s S3 buckets and discover sensitive data within them. The detailed description for managing multiple AWS accounts with Amazon Macie can be found here: Managing multiple Amazon Macie accounts
Similar services
Amazon GuardDuty
A few other AWS security services use machine learning and pattern matching to detect anomalies for data stored in the cloud. Amazon GuardDuty is a service that helps detect potential threats in AWS accounts and workloads by continuously monitoring them. It provides security findings that can be used to gain visibility into potential security risks and remediate them effectively. Amazon GuardDuty goes a step further and covers not only S3 buckets but also CloudTrail events, VPC Flow Logs, and DNS query logs.
Amazon GuardDuty monitors for activities such as abnormal API activity, attempts to disable AWS CloudTrail logging, potentially unauthorized deployments and compromised instances, and S3 bucket compromise, and it can also monitor EKS audit logs.
The difference, however, is the use case. Amazon Macie is used to automatically discover, classify, and protect sensitive data, whereas Amazon GuardDuty is used to discover potential threats to the environment by looking into CloudTrail events, VPC Flow Logs, and DNS query logs.
Integration with AWS Security Hub
When Macie is integrated with AWS Security Hub, its findings can be consolidated and prioritized alongside alerts and findings from other AWS security services in a centralized location within the AWS environment. Security Hub is a single platform that helps aggregate, organize, and manage these alerts for improved visibility and effective threat response.
Conclusion
While there are a handful of security services in AWS, it is important to understand what each of them is used for. Amazon Macie helps organizations gain greater visibility and control over their sensitive data in AWS, and helps them meet compliance requirements and reduce the risk of data breaches.