In today's data-driven world, ensuring compliance with data privacy regulations like the General Data Protection Regulation (GDPR) is critical for organizations. One of the key rights under GDPR is Article 17, the "Right to Erasure" or the "Right to be Forgotten." This requires organizations to erase personal data upon request, especially when the data is no longer needed or the individual withdraws consent. For businesses hosting applications in the Azure cloud environment, this means purging user data from various services, including Azure Log Analytics Workspace. In this blog post, we will explore why and how you can purge user data from Azure Log Analytics Workspace to remain GDPR compliant.
The Importance of GDPR Compliance
The GDPR applies to organizations that process the personal data of individuals in the EU, regardless of where the company is based. One of its core principles is that individuals have control over their personal data, including the right to have that data erased when certain conditions are met, such as:
Withdrawal of consent: If a user withdraws consent to the processing of their data, the organization must erase their information.
Data no longer necessary: If the data collected is no longer necessary for the purpose it was originally gathered for.
Data processing is unlawful: If the processing of personal data is in violation of the GDPR.
Legal obligations: If there's a legal requirement to erase the data.
In such cases, failure to erase personal data could result in severe penalties, with fines reaching up to €20 million or 4% of global annual turnover, whichever is higher.
Why Purging Data in Azure Log Analytics is Critical
Azure Log Analytics Workspace is a powerful service that allows organizations to collect and query data from their Azure resources for monitoring, troubleshooting, and performance analysis. However, the logs stored in these workspaces often contain personal information such as user IDs, email addresses, or other identifiers. If a user requests data erasure, you must ensure that these logs are purged accordingly.
Some of the typical user data that could be captured in Log Analytics logs includes:
User email addresses
User IDs or application-specific identifiers
IP addresses
Purging this information is essential for ensuring that your organization respects user privacy and meets GDPR Article 17 requirements.
Automating the Process: A Python Script for Purging User Data
Manually locating and deleting data across various Azure Log Analytics Workspaces can be time-consuming and prone to errors. To streamline this process, you can use an automated approach by developing a Python script that:
Queries logs in Azure Log Analytics Workspace based on user email addresses.
Submits a purge request to delete those logs for GDPR compliance.
Checks the status of ongoing purge requests to ensure successful completion.
import argparse
from datetime import datetime, timezone

import pandas as pd
from azure.core.exceptions import HttpResponseError
from azure.identity import DefaultAzureCredential
from azure.mgmt.loganalytics import LogAnalyticsManagementClient
from azure.monitor.query import LogsQueryClient, LogsQueryStatus


def parse_args():
    """Parse the command line; exactly one of the three modes is required."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--getlogs", action="store_true", help="Get logs")
    group.add_argument("--deletelogs", action="store_true", help="Delete logs")
    group.add_argument("--status", action="store_true", help="Get the status of deletion request")
    parser.add_argument("--subscriptionid", type=str, help="Subscription id")
    parser.add_argument("--workspace", type=str, help="Log Analytics workspace name")
    parser.add_argument("--resourcegroup", type=str, help="Resource group name")
    parser.add_argument("--starttime", type=str, help="Start time in yyyy-mm-dd-hh-mm-ss format (UTC)")
    parser.add_argument("--endtime", type=str, help="End time in yyyy-mm-dd-hh-mm-ss format (UTC)")
    parser.add_argument("--useremail", type=str, help="User email")
    parser.add_argument("--workspaceid", type=str, help="Add comma separated Log Analytics workspace ids")
    parser.add_argument("--purgeid", type=str, help="Add comma separated purge ids")
    return parser.parse_args()


def validate_datetime_format(date_str):
    """Parse a yyyy-mm-dd-hh-mm-ss string; exit with an error if it is malformed."""
    try:
        return datetime.strptime(date_str, '%Y-%m-%d-%H-%M-%S')
    except ValueError as e:
        print(f"Invalid date format for {date_str}: {e}")
        raise SystemExit(1)


def convert_to_iso_format(date):
    """Format a datetime as an ISO 8601 string for the purge filters."""
    return date.strftime('%Y-%m-%dT%H:%M:%S')


def get_credential():
    """Use the ambient Azure login (for example, from 'az login')."""
    return DefaultAzureCredential()

def handle_get_logs(args, credential):
    """Query each workspace for log rows that contain the user's email."""
    if not all([args.starttime, args.endtime, args.workspaceid, args.useremail]):
        print("ERROR: Please provide all required parameters for getting logs.")
        print("Usage: python3 main.py --getlogs --starttime <starttime> --endtime <endtime> --workspaceid <workspaceid> --useremail <useremail>")
        raise SystemExit(1)

    client = LogsQueryClient(credential)
    starttime = validate_datetime_format(args.starttime).replace(tzinfo=timezone.utc)
    endtime = validate_datetime_format(args.endtime).replace(tzinfo=timezone.utc)
    workspaceids = args.workspaceid.split(",")

    # Modify the query based on your log format.
    query = f"""AppEvents | where tostring(Properties["email"]) =~ "{args.useremail}" """
    run_queries(client, workspaceids, query, starttime, endtime)


def run_queries(client, workspaceids, query, starttime, endtime):
    """Run the query against every workspace and print the matching rows."""
    total_rows = 0
    for workspaceid in workspaceids:
        try:
            response = client.query_workspace(workspace_id=workspaceid, query=query, timespan=(starttime, endtime))
            if response.status == LogsQueryStatus.SUCCESS:
                data = response.tables
            else:
                # Partial result: report the error and show the rows that were returned.
                print(response.partial_error)
                data = response.partial_data
            for table in data:
                df = pd.DataFrame(data=table.rows, columns=table.columns)
                print(f"Logs in Workspace ID: {workspaceid}\n")
                # Adjust the displayed columns to match your table's schema.
                print(df.loc[:, ['TimeGenerated', 'Name', 'OperationName']])
                total_rows += len(df)
        except (HttpResponseError, Exception) as err:
            print("Error:", err)
    print(f"\nTotal rows with user data: {total_rows}\n")

def handle_delete_logs(args, credential):
    """Submit a purge request for the user's rows in the given time range."""
    if not all([args.subscriptionid, args.workspace, args.resourcegroup, args.starttime, args.endtime, args.useremail]):
        print("ERROR: Please provide all required parameters for deleting logs.")
        print("Usage: python3 main.py --deletelogs --subscriptionid <subscriptionid> --workspace <workspace> --resourcegroup <resourcegroup> --starttime <starttime> --endtime <endtime> --useremail <useremail>")
        raise SystemExit(1)

    client = LogAnalyticsManagementClient(credential, subscription_id=args.subscriptionid)
    starttime = convert_to_iso_format(validate_datetime_format(args.starttime))
    endtime = convert_to_iso_format(validate_datetime_format(args.endtime))
    try:
        response = client.workspace_purge.purge(
            resource_group_name=args.resourcegroup,
            workspace_name=args.workspace,
            # Modify the filters based on your log format.
            body={
                "filters": [
                    {"column": "TimeGenerated", "operator": ">", "value": starttime},
                    {"column": "TimeGenerated", "operator": "<", "value": endtime},
                    {"column": "Properties", "key": "email", "operator": "=~", "value": args.useremail},
                ],
                "table": "AppEvents",
            },
        )
        response_purgeid = response.operation_id
        append_to_file("purgeid.txt", response_purgeid + "\n")
        print(f"Purge request submitted. Purge ID: {response_purgeid}\n")
    except (HttpResponseError, Exception) as err:
        print("Error:", err)


def append_to_file(filename, text):
    """Append the purge ID to a local file so its status can be checked later."""
    print(f"Saving purge id to {filename} file...")
    with open(filename, "a") as f:
        f.write(text)

def handle_status(args, credential):
    """Print the current status of each purge ID passed on the command line."""
    if not all([args.purgeid, args.subscriptionid, args.workspace, args.resourcegroup]):
        print("ERROR: Please provide all required parameters for checking the status.")
        print("Usage: python3 main.py --status --subscriptionid <subscriptionid> --workspace <workspace> --resourcegroup <resourcegroup> --purgeid <purgeid>")
        raise SystemExit(1)

    client = LogAnalyticsManagementClient(credential, subscription_id=args.subscriptionid)
    purgeids = args.purgeid.split(",")
    for purgeid in purgeids:
        try:
            response = client.workspace_purge.get_purge_status(
                resource_group_name=args.resourcegroup,
                workspace_name=args.workspace,
                purge_id=purgeid,
            )
            print(f"Purge ID: {purgeid}, Status: {response.status}\n")
        except (HttpResponseError, Exception) as err:
            print("Error:", err)


def main():
    print("\n INFO: Run 'az login' before running this script.\n")
    args = parse_args()
    credential = get_credential()
    if args.getlogs:
        handle_get_logs(args, credential)
    elif args.deletelogs:
        handle_delete_logs(args, credential)
    elif args.status:
        handle_status(args, credential)
    print("Operation completed. Exiting...")


if __name__ == "__main__":
    main()

The sections below explain how this script works and how to use it to query logs and purge user data in Azure.
How the Script Works
The provided Python script serves three main purposes:
Get logs: It queries logs based on a specific user's email address.
Delete logs: It submits a purge request to delete logs matching a user’s email.
Check purge status: It checks the status of a previously submitted purge request.
Prerequisites
Before running the script, ensure you have the following:
Azure CLI installed and logged in with appropriate credentials.
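Python 3.10 or higher, with the packages the script imports installed: azure-identity, azure-mgmt-loganalytics, azure-monitor-query, and pandas.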
The script user must have the 'Data Purger' role assigned in the subscription.
Modify the log queries and filters in the script to match the format of your logs.
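When adapting the query and the purge filters to your own log format, it helps to generate both from one description of where the email is stored, so the two cannot drift apart. The following is a minimal sketch, not part of the original script: `LogTarget`, `escape_kql_string`, `build_query`, and `build_purge_body` are hypothetical names, and the defaults assume the same AppEvents layout the script uses. It also escapes quotes and backslashes before embedding the email in the KQL string, which the script's f-string does not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogTarget:
    """Where the user's email is stored (hypothetical helper, not in the script)."""
    table: str = "AppEvents"
    column: str = "Properties"
    key: str = "email"


def escape_kql_string(value: str) -> str:
    """Escape backslashes and double quotes for a double-quoted KQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(target: LogTarget, email: str) -> str:
    """Build the KQL query that --getlogs would run."""
    return (
        f'{target.table} | where tostring({target.column}["{target.key}"]) '
        f'=~ "{escape_kql_string(email)}"'
    )


def build_purge_body(target: LogTarget, email: str, start_iso: str, end_iso: str) -> dict:
    """Build the purge request body that --deletelogs would submit."""
    return {
        "table": target.table,
        "filters": [
            {"column": "TimeGenerated", "operator": ">", "value": start_iso},
            {"column": "TimeGenerated", "operator": "<", "value": end_iso},
            {"column": target.column, "key": target.key, "operator": "=~", "value": email},
        ],
    }
```

With this in place, switching to a different table or property key would mean changing only the `LogTarget` values, for example `LogTarget(table="AppTraces", key="userEmail")` (both values are illustrative).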
Example Usage
Getting Logs for a Specific User: To fetch logs for a user between specific time ranges, you can run:
python3 main.py --getlogs \
  --starttime 2024-10-04-08-00-00 \
  --endtime 2024-10-04-10-00-00 \
  --workspaceid b152f67b-5678-408f-b708-8943264822d6,18328920-8765-4521-9aa5-2fe0aaec1c6b \
  --useremail user@domain.com
Deleting Logs for GDPR Compliance: To delete logs for a specific user and submit a purge request, run:
python3 main.py --deletelogs \
  --starttime 2024-10-04-08-00-00 \
  --endtime 2024-10-04-10-00-00 \
  --subscriptionid 0e121138-80e8-1234-bacc-c7e0b3cd8aa \
  --workspace workspace-name \
  --resourcegroup rg-test \
  --useremail user@domain.com
The purge request returns a unique purge ID for tracking, which the script saves in a file named purgeid.txt.
Checking the Status of a Purge Request: To check the status of a previously submitted purge request, use the following command:
python3 main.py --status \
  --subscriptionid 0e641138-90e8-1234-bacc-c7e0b3c8d8aa \
  --workspace workspace-name \
  --resourcegroup rg-test \
  --purgeid purge-97380f07-1235-4a21-8ff7-cc092303662a
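Purge requests are processed asynchronously, so a single status check often just reports that the request is still pending. If you would rather wait for completion than re-run the command by hand, a small polling loop is one option. The sketch below is an illustration, not part of the original script: `wait_for_purge` is a hypothetical helper, the hourly default interval is an arbitrary choice, and it assumes the service reports the final state as "completed".

```python
import time
from typing import Callable


def wait_for_purge(
    get_status: Callable[[str], str],
    purge_id: str,
    interval_seconds: float = 3600.0,
    max_checks: int = 24,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(purge_id) until it returns 'completed' or max_checks is reached.

    Returns the last status seen, so the caller can tell whether the purge finished.
    """
    status = "unknown"
    for attempt in range(max_checks):
        status = get_status(purge_id)
        if status.lower() == "completed":
            break
        # Do not sleep after the final check.
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    return status
```

To use it with the script's management client, `get_status` could wrap the same call that `handle_status` makes, returning `client.workspace_purge.get_purge_status(...).status` for the given purge ID. Passing the status lookup in as a function keeps the loop easy to test without contacting Azure.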
Key Considerations
Data Retention Policies: Ensure that your data retention policies for logs align with GDPR's "Right to be Forgotten" principles. In some cases, log data might be subject to legal obligations requiring retention, which could affect whether you can fulfill deletion requests.
Access Control: Ensure that only authorized personnel (those with the 'Data Purger' role) can submit data purge requests.
Logging and Auditing: Keep a record of purge requests and their status to demonstrate compliance with data privacy regulations during audits.
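For the audit trail, a plain list of purge IDs may not be enough to show who requested what and when. One possible approach is to append a structured record per request. This is a minimal sketch under stated assumptions: `record_purge_request` and `load_purge_requests` are hypothetical helpers, the JSON Lines file and its field names are illustrative, and the record deliberately leaves out the data subject's email so the audit log does not itself retain the personal data being erased.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_purge_request(path, purge_id, workspace, table, requested_by):
    """Append one purge request as a JSON line and return the stored entry."""
    entry = {
        "purge_id": purge_id,
        "workspace": workspace,
        "table": table,
        # The operator who submitted the purge, not the data subject.
        "requested_by": requested_by,
        "submitted_at": datetime.now(timezone.utc).isoformat(),
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


def load_purge_requests(path):
    """Read all recorded purge requests back as a list of dicts."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In the script, a call to `record_purge_request` could sit next to the existing `append_to_file("purgeid.txt", ...)` call, and the recorded purge IDs could later be fed to the `--status` mode.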
Conclusion
As privacy regulations like GDPR become more critical, ensuring that personal data is properly managed and deleted is paramount. For organizations using Azure Log Analytics Workspaces, implementing an automated solution to handle log queries and data purging is essential to meet compliance requirements efficiently.