Technical Solutions for Security Data Collection in K8s Scenarios
Security data is the foundation of K8s security monitoring. Without data, we are in the position of "a clever woman who cannot cook without rice": any advanced monitoring strategy is out of the question.
Therefore, we first introduce the security data sources in K8s and the technologies used to collect them.
Security data sources
In Kubernetes, the API Server is the entry point for querying and changing cluster resources.
All queries and modifications of cluster state are made by sending requests to the API Server.
Requests to the API Server come from four categories of sources:
Control plane components, such as the Scheduler, the various controllers, and the API Server itself.
Agents on each node, such as the kubelet and kube-proxy.
Other services in the cluster, such as CoreDNS, the Ingress controller, and third-party Operators.
External users, such as operators working through kubectl.
Kubernetes audit logs are structured logs generated by the API Server.
They record every access to the API Server, including the time, the source, the result, the user who initiated the operation, the resource operated on, and request/response details.
Audit logs let you trace changes to cluster state, understand how the cluster is running, troubleshoot exceptions,
and discover potential security and performance risks. They can answer questions such as the following:
Which change events have occurred in the cluster, currently or historically?
Who performed these changes: system components or users, and which ones?
What are the details of an important change event, for example, which parameter of a Pod was modified?
What was the outcome of the event: success or failure?
Where did the user come from: inside or outside the cluster?
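To show how those questions map onto the log itself, here is a minimal sketch that summarizes one audit event. It assumes the API Server writes audit.k8s.io/v1 Event objects as one JSON object per line (the JSON log backend). The fields it reads (`verb`, `user.username`, `sourceIPs`, `objectRef`, `responseStatus.code`, `stage`, `stageTimestamp`) come from that schema. The `classify_requester` helper is hypothetical: it maps common built-in identities onto the four request-source categories listed earlier, and real clusters will need more rules.

```python
import json


def classify_requester(username):
    """Rough mapping of a username to the four request-source categories (hypothetical helper)."""
    if username in ("system:kube-scheduler", "system:kube-controller-manager", "system:apiserver"):
        return "control-plane"
    if username.startswith("system:node:") or username == "system:kube-proxy":
        return "node-agent"
    # Note: controllers running with per-controller service accounts also land here
    # (system:serviceaccount:kube-system:...), so this split is only approximate.
    if username.startswith("system:serviceaccount:"):
        return "in-cluster-service"
    return "external-user"


def summarize_audit_event(line):
    """Answer who / what / result / where for one JSON-encoded audit event."""
    ev = json.loads(line)
    obj = ev.get("objectRef", {})
    user = ev.get("user", {}).get("username", "")
    code = ev.get("responseStatus", {}).get("code")
    return {
        "stage": ev.get("stage"),
        "time": ev.get("stageTimestamp"),
        "who": user,
        "source_category": classify_requester(user),
        "verb": ev.get("verb"),
        # namespace/resource/name, skipping parts that are absent (e.g. cluster-scoped objects)
        "resource": "/".join(p for p in (obj.get("namespace"), obj.get("resource"), obj.get("name")) if p),
        "succeeded": code is not None and 200 <= code < 300,
        "source_ips": ev.get("sourceIPs", []),
    }
```

A single request can produce several audit events, one per stage (for example `RequestReceived` and `ResponseComplete`), so downstream analysis usually keeps only one stage to avoid double counting.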
Events record the state changes that occur in a K8s cluster,
from something as serious as a node failure to something as routine as a Pod starting or being scheduled successfully.
An event records the time, the reporting component, the type (Normal or Warning), the reason, and a detailed message about the state change.
Through events, you can follow the entire life cycle of an application: deployment, scheduling, running, stopping, and so on.
Events also reveal exceptions that are currently happening in the system.
K8s events are stored in etcd and, by default, are kept for only 1 hour (controlled by the API Server's --event-ttl flag).
etcd does not support complex analysis
and offers only very simple filtering, for example by reason, time, or type.
Moreover, events are stored passively in etcd and cannot be actively pushed to other systems,
so they usually have to be viewed manually.
In practice, the demand for events is high. Typical scenarios include:
Real-time alerting on abnormal events such as Failed, Evicted, FailedMount, and FailedScheduling.
Troubleshooting often requires historical data,
so events must be queryable over a much longer time range (days or even months).
Statistics by category, for example computing event trends and comparing them with a previous period (yesterday, last week, or before a release),
so that judgments and decisions can be based on statistical indicators.
Filtering along various dimensions, so that different people can each see the events relevant to them.
Custom subscriptions to events for custom monitoring,
so that events can be integrated with the company's internal deployment and O&M platform.
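The alerting and trend scenarios above can be sketched as follows. This is a minimal illustration, not a product implementation. It assumes events have already been collected (for example by watching the Events API) as dicts carrying the `type` and `reason` fields of core/v1 Event objects. `ALERT_REASONS` simply reuses the reasons named in the text.

```python
from collections import Counter

# Reasons singled out in the text for real-time alerts; extend as needed.
ALERT_REASONS = {"Failed", "Evicted", "FailedMount", "FailedScheduling"}


def alerts(events):
    """Yield Warning events whose reason should trigger a real-time alert."""
    for ev in events:
        if ev.get("type") == "Warning" and ev.get("reason") in ALERT_REASONS:
            yield ev


def reason_trend(current, previous):
    """Per-reason count difference between two periods (e.g. today vs. yesterday)."""
    now = Counter(e["reason"] for e in current)
    before = Counter(e["reason"] for e in previous)
    # Counter returns 0 for missing keys, so reasons seen in only one period still work.
    return {r: now[r] - before[r] for r in sorted(set(now) | set(before))}
```

Both functions only need events that outlive the default 1-hour retention. That is why the long-term storage scenario above is a prerequisite for the statistical one.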
By default, Kubernetes events cover only container-management issues;
they provide no detection for hardware, the operating system, the container runtime, or dependent systems (network, storage, and so on).
NPD (node-problem-detector) is a Kubernetes node-diagnosis tool.
It converts node exceptions into Node events and reports them to the API Server,
which then manages them together with all other events.
NPD supports a variety of exception checks, such as:
Basic service problems: the NTP service is not running.
Hardware problems: damaged CPU, memory, disk, or network card.
Kernel problems: kernel hangs, file system corruption.
Container runtime problems: Docker hangs or fails to start.
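The core idea of NPD's log-based checks is to match node log lines against known problem patterns and turn each hit into a Node event. The sketch below illustrates that idea only; it is not NPD's code. The rule names and regular expressions are illustrative assumptions in the spirit of its kernel monitor, and the event dicts are built locally rather than sent to the API Server.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    reason: str
    pattern: "re.Pattern"


# Illustrative rules; the patterns NPD actually ships may differ.
RULES = [
    Rule("KernelOops", re.compile(r"BUG: unable to handle kernel")),
    Rule("TaskHung", re.compile(r"task \S+:\d+ blocked for more than \d+ seconds")),
    Rule("Ext4Error", re.compile(r"EXT4-fs error")),
]


def to_node_events(node, log_lines):
    """Turn matching kernel log lines into Node event dicts (first matching rule wins)."""
    events = []
    for line in log_lines:
        for rule in RULES:
            if rule.pattern.search(line):
                events.append({
                    "involvedObject": {"kind": "Node", "name": node},
                    "type": "Warning",
                    "reason": rule.reason,
                    "message": line.strip(),
                })
                break
    return events
```

Once these problems are expressed as ordinary Node events, the same pipeline that handles Pod events can alert on and archive them.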
Then, with the open-source event tool Kube-eventer,
cluster events can be exported to DingTalk, SLS, Kafka, and other systems,
with filter conditions at different levels, to achieve real-time event collection, targeted alerting, and asynchronous archiving.
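The pattern behind "targeted alerting plus asynchronous archiving" is a fan-out. Each alert sink receives only events at or above its own level, and every event is also queued for a background archiver. The sketch below shows only this pattern. The `Router` class, its sink table, and the `archiver` loop are hypothetical and are not Kube-eventer's implementation or configuration.

```python
import queue
import threading

# Event levels, lowest first; core/v1 Events are either Normal or Warning.
LEVELS = {"Normal": 0, "Warning": 1}


class Router:
    """Hypothetical fan-out: synchronous level-filtered alerts, asynchronous archive."""

    def __init__(self):
        self.alert_sinks = []          # (min_level, send) pairs
        self.archive = queue.Queue()   # drained by a background archiver thread

    def add_alert_sink(self, min_level, send):
        self.alert_sinks.append((LEVELS[min_level], send))

    def dispatch(self, event):
        level = LEVELS.get(event.get("type"), 0)
        for min_level, send in self.alert_sinks:
            if level >= min_level:
                send(event)            # targeted real-time alert
        self.archive.put(event)        # every event is archived, off the hot path


def archiver(q, store, stop):
    """Drain the queue into `store` until asked to stop and the queue is empty."""
    while not stop.is_set() or not q.empty():
        try:
            store.append(q.get(timeout=0.1))
        except queue.Empty:
            pass
```

Decoupling the archive through a queue means a slow storage backend cannot delay alerts, which is the point of making archiving asynchronous.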
In K8s, an Ingress is only the declaration of an API resource; an Ingress Controller must be installed to implement it.
The Ingress Controller reads Ingress definitions and forwards traffic to the corresponding Services.
There are many Ingress Controller implementations today, and the most commonly used is the Nginx Ingress Controller.
Logging and monitoring are basic functions provided by every Ingress Controller.
Logs generally include the access log, the controller log, and the error log,
while monitoring mainly extracts metrics from the logs and from the controller itself.
Of these data, access logs have the largest volume,
the richest information, and the highest value.
A Layer 7 access log generally includes the URL, source IP, User-Agent, status code, inbound and outbound traffic, response time, and so on.
Forwarding logs such as those of the Ingress Controller also include extra fields, such as the name of the upstream service and its response time.
From this information we can derive many analyses, such as:
PV and UV of website visits;
Geographical and device distribution of visitors;
Error rate of website visits;
Response latency of backend services;
Distribution of requests across URLs.
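A minimal sketch of several of these analyses follows. It assumes a simplified access-log format resembling the leading fields of the ingress-nginx default (client IP, time, request line, status, bytes, referer, User-Agent, request length, request time, and the upstream name in brackets). Adjust the regular expression to whatever `log-format` your controller actually uses. Two further simplifications apply: UV is approximated by distinct client IPs, and only 5xx responses count as errors.

```python
import re
from collections import Counter

# Assumed, simplified access-log format (not guaranteed to match your controller's).
LINE = re.compile(
    r'(?P<ip>\S+) - \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "[^"]*" "(?P<ua>[^"]*)" (?P<req_len>\d+) '
    r'(?P<req_time>[\d.]+) \[(?P<upstream>[^\]]*)\]'
)


def analyze(lines):
    """PV, approximate UV, 5xx error rate, mean request time, and top URL paths."""
    recs = [m.groupdict() for m in map(LINE.match, lines) if m]  # unparsable lines are skipped
    n = len(recs)
    return {
        "pv": n,
        "uv": len({r["ip"] for r in recs}),
        "error_rate": sum(int(r["status"]) >= 500 for r in recs) / n if n else 0.0,
        "avg_latency_s": sum(float(r["req_time"]) for r in recs) / n if n else 0.0,
        "top_urls": Counter(r["url"].split("?")[0] for r in recs).most_common(5),
    }
```

Behind a cloud load balancer, the client IP in the log may be the balancer's address unless real-IP forwarding is configured. In that case the UV figure here would be meaningless.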
The CIS Kubernetes Benchmark, published by CIS (Center for Internet Security), is a set of security configuration recommendations for building secure and reliable Kubernetes clusters.
K8s users can harden their clusters against these recommendations.
However, comparing a cluster against every rule by hand is clearly impractical,
so inspection tools are usually used.
security-inspector is a multi-dimensional scanning tool for K8s workload configuration.
Its inspection report shows scan results for health checks, images, network, resources, security, and more.
Other open-source projects are also options: kube-bench checks a cluster against the CIS Benchmark rules, and kube-hunter probes a cluster for security weaknesses.
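To make "automated inspection" concrete, here is a minimal sketch of workload-configuration checks over a Pod spec given as a dict, for example parsed from `kubectl get pod -o json`. The checks are hypothetical examples of the kind of rule such tools apply: host namespaces, privileged mode, privilege escalation, non-root, resource limits, and image pinning. They are not the actual rule set of security-inspector, kube-bench, or the CIS Benchmark.

```python
def inspect_pod(pod):
    """Return human-readable findings for a Pod manifest (illustrative checks only)."""
    spec = pod.get("spec", {})
    findings = []
    for flag in ("hostNetwork", "hostPID", "hostIPC"):
        if spec.get(flag):
            findings.append(f"pod uses {flag}")
    # A container-level runAsNonRoot overrides the Pod-level one.
    pod_non_root = spec.get("securityContext", {}).get("runAsNonRoot")
    for c in spec.get("containers", []):
        name, sc = c.get("name", "?"), c.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"{name}: privileged container")
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: allowPrivilegeEscalation not disabled")
        if not sc.get("runAsNonRoot", pod_non_root):
            findings.append(f"{name}: runAsNonRoot not set")
        if not c.get("resources", {}).get("limits"):
            findings.append(f"{name}: no resource limits")
        image = c.get("image", "")
        last = image.rsplit("/", 1)[-1]  # ignore registry host:port when looking for a tag
        if "@" not in image and (":" not in last or last.endswith(":latest")):
            findings.append(f"{name}: image not pinned ({image})")
    return findings
```

Running checks like these in CI or on a schedule turns a long benchmark document into a repeatable report, which is what the inspection tools above automate at much greater depth.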
Falco is an open-source cloud-native runtime security project for monitoring abnormal runtime activity of applications on Kubernetes.
Falco observes file changes, network activity, the process table, and other data at the kernel level to detect suspicious behavior,
and it can send alerts through pluggable outputs.
Falco can easily detect anomalies such as:
A shell running inside a container;
A server process spawning a child process of an unexpected type;
Reads of sensitive files (such as /etc/shadow);
Non-device files being written under /dev;
Standard system binaries (such as ls) making outbound network connections.
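Falco expresses its rules as YAML conditions over system-call fields. The Python below only mimics the underlying idea: each rule is a predicate over a kernel-level event, and an event raises every rule it satisfies. The event fields (`container`, `evt`, `proc`, `parent`, `path`, `mode`, `is_device`) and the name sets are hypothetical simplifications, not Falco's field names or default lists. The rules mirror the anomalies listed above.

```python
SHELLS = {"bash", "sh", "zsh", "ash"}
SENSITIVE_FILES = {"/etc/shadow", "/etc/sudoers"}
STANDARD_BINARIES = {"ls", "cat", "cp", "mv"}   # illustrative subset
EXPECTED_CHILDREN = {"nginx": {"nginx"}}        # server process -> allowed child names

# (rule name, predicate over a simplified syscall event)
RULES = [
    ("Shell in container",
     lambda e: e.get("container") and e.get("evt") == "execve" and e.get("proc") in SHELLS),
    ("Unexpected child of server process",
     lambda e: e.get("evt") == "execve" and e.get("parent") in EXPECTED_CHILDREN
     and e.get("proc") not in EXPECTED_CHILDREN[e.get("parent")]),
    ("Sensitive file read",
     lambda e: e.get("evt") == "open" and e.get("mode") == "r" and e.get("path") in SENSITIVE_FILES),
    ("Non-device file written under /dev",
     lambda e: e.get("evt") == "open" and e.get("mode") == "w"
     and str(e.get("path", "")).startswith("/dev/") and not e.get("is_device")),
    ("Standard binary made outbound connection",
     lambda e: e.get("evt") == "connect" and e.get("proc") in STANDARD_BINARIES),
]


def match(event):
    """Names of all rules the event triggers, in rule order."""
    return [name for name, cond in RULES if cond(event)]
```

One event can trigger several rules. A shell spawned by nginx inside a container, for instance, hits both the shell rule and the unexpected-child rule.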
Characteristics of K8s security data sources
The sections above list some common data sources in K8s security monitoring scenarios,
and each kind of log has its own characteristics.
Security data comes in many types, from many sources, and in different formats.
In summary, it has the following characteristics:
Security data types include logs, metrics, and events.
Security data may come from files, from standard output or standard error, or even from standard protocols such as Syslog.
Text security data may reside in files inside containers or in files on the host.
Logs about data-plane traffic, such as Ingress access logs, are often very large.
The audit log is essential for cluster security audits.
It must be retained for a long period (Equal Protection 2.0, China's Multi-Level Protection Scheme, requires at least 180 days), and none of it may be lost during collection.
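Because the same stdout/stderr stream is written differently by different runtimes, a collector typically normalizes each line into a common record first. This minimal sketch handles two well-known on-disk formats. The first is Docker's json-file driver (`{"log": ..., "stream": ..., "time": ...}`). The second is the CRI text format used by containerd and CRI-O (`<time> <stdout|stderr> <P|F> <message>`, where `P` marks a partial line). Anything else, such as a host file or a Syslog payload, is kept as raw text. The output record layout is an assumption for illustration.

```python
import json
import re

# CRI log line: "<RFC3339Nano time> <stdout|stderr> <P|F> <message>"
CRI_LINE = re.compile(r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[PF]) (?P<msg>.*)$")


def normalize(line):
    """Map one container log line to {time, stream, message, partial}."""
    if line.startswith("{"):
        try:
            d = json.loads(line)
        except ValueError:
            d = None
        if isinstance(d, dict) and all(k in d for k in ("log", "stream", "time")):
            # Docker json-file: a missing trailing newline means the line was split.
            return {"time": d["time"], "stream": d["stream"],
                    "message": d["log"].rstrip("\n"), "partial": not d["log"].endswith("\n")}
    m = CRI_LINE.match(line)
    if m:
        return {"time": m["time"], "stream": m["stream"],
                "message": m["msg"], "partial": m["tag"] == "P"}
    # Fallback: keep unknown formats as raw text rather than dropping them.
    return {"time": None, "stream": None, "message": line, "partial": False}
```

Keeping unknown lines instead of discarding them matters for audit-grade data, where the requirement above is that nothing be lost during collection.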
To collect security data comprehensively, we need a collector with strong performance, broad ecosystem support, and native K8s support.
The collector needs the following capabilities:
Comprehensive container runtime support, covering Docker, containerd, and other runtimes.
Adaptation to the dynamic nature of containers: K8s provides powerful scale-out and scale-in capabilities, but this makes data collection harder, so the collector must follow containers as they come and go.
Collection from short-lived containers: some security data is produced by Jobs, whose life cycles are short, so the collector must capture data from such containers before they disappear.
Correlation with K8s context: collected data must carry K8s metadata to make subsequent analysis easier.
Powerful data processing that meets the processing requirements of security data without hurting performance, laying the foundation for later analysis scenarios.
Support for cloud services: managed K8s services are becoming increasingly popular, so collection scenarios for cloud services must be supported.
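One cheap source of K8s context is the log path itself. On a node, the kubelet exposes container logs under `/var/log/containers/` with names of the form `<pod>_<namespace>_<container>-<container-id>.log`. The minimal sketch below parses that name and attaches the result to a record. It assumes a 64-hex-character container ID. The `enrich` helper and its `k8s` key are hypothetical naming choices.

```python
import re

# kubelet symlink naming: /var/log/containers/<pod>_<namespace>_<container>-<container-id>.log
LOG_PATH = re.compile(
    r"^/var/log/containers/(?P<pod>[^_]+)_(?P<namespace>[^_]+)_"
    r"(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)


def k8s_context(path):
    """Extract pod, namespace, container, and container ID from a container log path."""
    m = LOG_PATH.match(path)
    return m.groupdict() if m else None


def enrich(record, path):
    """Attach K8s context to a normalized record when the path carries it."""
    ctx = k8s_context(path)
    return {**record, "k8s": ctx} if ctx else record
```

Parsing the path needs no API call, which helps with short-lived Job Pods that may already be gone by the time a metadata lookup runs. Richer context such as labels still has to come from the API Server while the Pod exists.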