Hello, I'm Nizar, an experienced Linux system administrator with strong analytical skills

My Little Approach to Defending Against a Multi-Domain Attack on a Single Server in a Multi-Tenant Environment


In short, a multi-domain attack is a single public IP address targeting multiple domains hosted on one server in a multi-tenant environment, usually with the goal of exhausting the server's memory. This post walks through a small, practical approach to defending against such an attack.

I'm going to share my experience with a multi-domain attack on a single server in a multi-tenant environment, meaning many different websites were running on that one machine. I'll explain the approach I took. Let's start by understanding what a multi-domain attack in a multi-tenant environment actually is.

What Is a Multi-Domain Attack?

A multi-domain attack involves a single public IP address targeting multiple domains hosted on a single server within a multi-tenant environment. The goal of this attack is typically to exhaust the server's memory resources. This is particularly effective against servers utilizing persistent PHP handlers like FastCGI or lsphp (which uses LSAPI).

A Quick Reminder About PHP Handlers

Before delving deeper into multi-domain attacks, let's review the role of PHP handlers. A PHP handler is an Apache module that enables Apache to interpret PHP code after a request has been processed by the Multi-Processing Module (MPM).

Why Does It Cause Out-of-Memory Conditions?

Persistent PHP handlers keep PHP processes running for multiple executions, which can lead to a significant increase in memory usage by the PHP cache.

An Example of a Multi-Domain Attack

34.73.9.170 abcde.com - - [12/Nov/2024:15:21:17 +0700] "GET /cgi-sys/suspendedpage.cgi/test/wp-includes/wlwmanifest.xml HTTP/1.1" 200 5137 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" 1.495 0.000 0.236 0.262 MISS
34.73.9.170 fjgk.com - - [12/Nov/2024:15:21:17 +0700] "GET /cgi-sys/suspendedpage.cgi/wp2/wp-includes/wlwmanifest.xml HTTP/1.1" 200 5137 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" 1.249 0.000 0.249 0.262 MISS
34.73.9.170 asdasd.com - - [12/Nov/2024:15:21:18 +0700] "GET /cgi-sys/suspendedpage.cgi/media/wp-includes/wlwmanifest.xml HTTP/1.1" 200 5137 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" 1.239 0.000 0.300 0.311 MISS
34.73.9.170 dassds.com - - [12/Nov/2024:15:21:18 +0700] "GET /cgi-sys/suspendedpage.cgi/media/wp-includes/wlwmanifest.xml HTTP/1.1" 200 5137 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" 1.387 0.000 0.272 0.276 MISS
34.73.9.170 asdasdo.com - - [12/Nov/2024:15:21:18 +0700] "GET /cgi-sys/suspendedpage.cgi/media/wp-includes/wlwmanifest.xml HTTP/1.1" 200 5137 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" 1.391 0.000 0.216 0.221 MISS

Note: the domains in the example above are dummy domains.

How to Detect a Multi-Domain Attack?

Step One: Pinpointing the IP Address with The Most Requests

The first step in detecting a multi-domain attack is pinpointing the IP address responsible for the most requests to your web server. You can achieve this in a few ways.

One method involves analyzing the access log. By extracting the first column (IP address) from the last 5,000 lines, you can sort the data, identify unique IPs and their corresponding request counts, sort by count, and then retrieve the final entries.

tail -n 5000 /var/log/nginx/access.log | cut -d ' ' -f1 | sort | uniq -c | sort -n | tail

If you only need the single top offender, change the final tail to tail -n 1 (or tail -1). You can also use awk together with sort to achieve the same result.

tail -n 5000 /var/log/nginx/access.log | awk '{print $1}' | sort | uniq -c | sort -n | tail

If you want to get it done as quickly as possible, just use awk.

tail -n 5000 /var/log/nginx/access.log | awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |  sort -n | tail

Step Two: Counting the Domains Accessed by Each IP

The second step involves wrapping the command above in a function, calling it, and iterating over its results in a loop.

Note: I won't show the full implementation in code; I'll describe the process in plain language instead.

For each IP identified in the first step, we create a regular expression pattern of the form ^IP to match that address at the beginning of each access-log line.

Next, we'll extract the last 4,000 or 5,000 lines from the access log, filtering out any irrelevant keywords, tokens, or data. We'll further refine the results by applying filters based on the current date and a specific time range.

Finally, we'll match the filtered lines against the regex pattern for the IP address, extract the second column (the domain name), sort the results, and count the unique values. This gives us the number of distinct domains accessed by each specific IP address.
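That said, purely as an illustration (this is a sketch, not my production code), the two steps above might look like this, run against a dummy log in the same format as the sample earlier; the log path and 5000-line window are placeholders:

```shell
#!/bin/sh
# Illustration only: build a dummy log in the sample's format, then count
# the distinct domains (field 2) accessed by each high-traffic IP (field 1).
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
34.73.9.170 abcde.com - - [12/Nov/2024:15:21:17 +0700] "GET /a HTTP/1.1" 200 5137
34.73.9.170 fjgk.com - - [12/Nov/2024:15:21:17 +0700] "GET /b HTTP/1.1" 200 5137
34.73.9.170 asdasd.com - - [12/Nov/2024:15:21:18 +0700] "GET /c HTTP/1.1" 200 5137
198.51.100.7 abcde.com - - [12/Nov/2024:15:21:19 +0700] "GET /d HTTP/1.1" 200 5137
EOF

# Step one: the busiest IPs in the last 5000 lines.
for ip in $(tail -n 5000 "$LOG" | awk '{print $1}' | sort | uniq -c | sort -n | tail | awk '{print $2}'); do
    # Step two: how many distinct domains this IP accessed.
    domains=$(tail -n 5000 "$LOG" | awk -v ip="$ip" '$1 == ip {print $2}' | sort -u | wc -l)
    echo "$ip $domains"
done
rm -f "$LOG"
```

In a real script you would point LOG at your actual access log and feed the resulting counts into the threshold check of step three.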

Step Three: Suspect Detection Using Domain Access Counts

The third step involves defining a threshold for the number of domains accessed by an IP address. This threshold helps us determine if a multi-domain attack is occurring. A common range for this threshold is between 4 and 6 domain accesses.

Next, we compare the domain access count obtained in step two with this threshold. If the count exceeds the threshold, we suspect the corresponding IP address is involved in a multi-domain attack.

Based on this suspicion, we can then take appropriate action. A common practice is to block the IP address to mitigate further potential damage. However, you have the flexibility to decide on the most suitable course of action for your situation.
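As a minimal sketch of this decision (the threshold of 5 and the iptables rule are examples only; csf, firewalld, or nftables would work just as well):

```shell
#!/bin/sh
# Illustration: compare a per-IP domain count (from step two) against a
# threshold and decide whether to block. All values here are dummies.
THRESHOLD=5            # example threshold, from the 4-6 range mentioned above
ip="34.73.9.170"       # dummy suspect IP from the sample log
count=7                # dummy domain count produced by step two

if [ "$count" -gt "$THRESHOLD" ]; then
    # One common action: drop all further traffic from the suspect IP, e.g.
    # iptables -I INPUT -s "$ip" -j DROP
    verdict="block $ip"
else
    verdict="leave $ip alone"
fi
echo "$verdict"
```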

So is this enough to defend against a multi-domain attack? No, there is still one more thing that needs to be done.

Checking Server Hostname Accessibility

After identifying a suspected IP involved in a multi-domain attack, it's crucial to verify that our server hostname remains accessible without timeouts, specifically by requesting a PHP file on it. This check simulates the impact of the multi-domain attack on the persistent PHP handler: by testing the hostname with a PHP path, we can assess whether the attack has degraded the server's ability to process PHP requests.

Why Verify Hostname Accessibility?

Even after blocking the suspicious IP, Apache's child processes might take some time to gracefully restart. This delay can lead to temporary inaccessibility or timeouts. By checking the server hostname's accessibility, we can ensure that the server is fully responsive and functioning as expected.

How Can We Check the Server Hostname Accessibility?

curl is a powerful tool that can be used to assess the accessibility of our server hostname. It offers several options and attributes to help us achieve this.

  • The -w Option for Detailed Information

    The -w option in curl allows us to display specific information on the standard output after a completed transfer.

    According to the man curl documentation:

        -w, --write-out <format>
    
        Make curl display information on stdout after a completed transfer. The format is a string that may contain plain text mixed with
        any number of variables. The format can be specified as a literal "string", or you can have curl read the format from a file with
        "@filename" and to tell curl to read the format from stdin you write "@-".
    
        The variables present in the output format are substituted by the value or text that curl thinks fit,  as  described  below.  All
        variables are specified as %{variable_name} and to output a normal % you just write them as %%. You can output a newline by using
        \n, a carriage return with \r and a tab space with \t.
    
        The output is by default written to standard output, but can be changed with %{stderr} and %output{}.
    
        Output  HTTP  headers  from  the most recent request by using %header{name} where name is the case insensitive name of the header
        (without the trailing colon). The header contents are exactly as sent over the network,  with  leading  and  trailing  whitespace
        trimmed (added in 7.84.0).

    So which -w variables will we use?

    For our purpose, we'll use the following variables:

    • time_total: This variable provides the total time, in seconds, that the entire request process took, from establishing a connection to receiving the response.
    • http_code: This variable represents the numerical HTTP response code.

Timeout Handling with -m

While curl can provide valuable information, it can sometimes take an extended amount of time when dealing with unresponsive web servers. To avoid potential delays, we can set a timeout limit using the -m option. By specifying -m 1, we limit the request to 1 second. If the request exceeds this timeout, curl will exit with a timeout code of 28.

By combining the -w option, the specific variables time_total and http_code, and a timeout limit with -m, we can effectively assess the accessibility of our server hostname.

The full curl command will look like this:

    curl -s -o /dev/null -m 1 -w "time_total: %{time_total}, status_code: %{http_code}\n" "$(hostname)"/x.php -H "Host: $(hostname)"

To determine if a web server restart is necessary, we can establish the following conditions:

  • Non-200 Status Code: If the HTTP status code returned by the curl request is not 200 (indicating a successful response), the web server should be restarted.

  • Time Limit Exceeded: If the time_total returned by the curl request exceeds 2 seconds, it suggests a performance issue, and the web server should be restarted.

  • Timeout: If the curl command exits with code 28, indicating a timeout, the web server should be restarted.

By implementing these conditions, we can proactively address performance issues caused by the multi-domain attack and ensure the stability of our web server.
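Putting the three conditions together, the restart decision might be sketched like this (the probe wiring and the httpd restart command are assumptions; adapt them to your stack):

```shell
#!/bin/sh
# should_restart takes curl's exit code, the http_code, and time_total
# (as produced by -w) and succeeds when any of the three conditions holds.
should_restart() {
    rc=$1; code=$2; t=$3
    [ "$rc" -eq 28 ] && return 0                          # curl timed out
    [ "$code" != "200" ] && return 0                      # non-200 response
    awk -v t="$t" 'BEGIN { exit !(t > 2) }' && return 0   # slower than 2 seconds
    return 1
}

# Wiring it to the probe would look roughly like (x.php is the example probe):
#   out=$(curl -s -o /dev/null -m 1 -w "%{time_total} %{http_code}" \
#         "$(hostname)"/x.php -H "Host: $(hostname)"); rc=$?
#   should_restart "$rc" "${out#* }" "${out% *}" && systemctl restart httpd

# Demonstration with canned values:
should_restart 0 200 0.3 && r1=restart || r1=ok     # healthy server
should_restart 28 000 0.0 && r2=restart || r2=ok    # timed-out probe
echo "$r1 $r2"
```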

Is this sufficient? No. In a multi-domain attack, the web server's child processes trigger the spawning of many PHP processes (in the case of lsphp under Apache), and PHP-FPM workers may not be restarted or terminated either. Both pose a risk of running out of memory, which we will discuss in the next section.

Out of Memory Condition Due to Excessive Spawning of Persistent PHP Processes

Out of Memory Handling in cgroup

In a multi-tenant environment, cgroups are typically used to limit CPU and memory usage for each user ID. When an out-of-memory (OOM) condition occurs, the cgroup triggers the OOM killer to terminate the process of the specific user ID with the highest OOM score. However, in the case of a multi-domain attack involving multiple user IDs, the cgroup may be slow to calculate the scores, resulting in a delayed response from the OOM killer. Consequently, an OOM condition can still occur. But besides cgroups, Linux has other means of handling OOM.
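To see what the cgroup side looks like on a given box, you can read a cgroup's memory limit and OOM-kill counter directly (this assumes the standard cgroup v2 layout under /sys/fs/cgroup; v1 paths differ):

```shell
#!/bin/sh
# Inspect cgroup v2 memory settings for the current process's cgroup.
# memory.events contains an "oom_kill" counter incremented by the OOM killer.
cg=$(awk -F: '$1 == "0" {print $3}' /proc/self/cgroup)
base="/sys/fs/cgroup$cg"
for f in memory.max memory.events; do
    if [ -r "$base/$f" ]; then
        echo "== $f"
        cat "$base/$f"
    fi
done
```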

Checking PHP Persistent Memory Usage

For your information, if you run ps auxf | grep "lsphp\|php", you will not see total memory statistics that include the cache usage of persistent PHP, because ps does not aggregate that data. However, if you are using a CloudLinux plan other than Solo, you can obtain memory usage statistics that include persistent PHP cache usage, particularly for lsphp. This is because the memory usage is aggregated from /proc/[pid]/task/[pid]/statm, which you can verify by tracing the execution of lveps with strace. To calculate the total memory usage, you can use the following command:

lveps -p | grep -v MEM | awk '$1 ~ /^[a-zA-Z0-9]+$/ { sum += $8 } END { print sum }'

You will obtain a number close to the buffer usage shown in top.
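On systems without lveps, a rough approximation of the same idea — summing the resident memory of php/lsphp processes from /proc — might look like this (a sketch only; lveps itself reads per-task statm files, as noted above):

```shell
#!/bin/sh
# Rough sketch: sum the resident set size (in kB) of all processes whose
# name contains "php" (covers php, php-fpm, lsphp), read from
# /proc/<pid>/statm, where field 2 is resident pages.
page_kb=$(( $(getconf PAGESIZE) / 1024 ))
total=0
for d in /proc/[0-9]*; do
    comm=$(cat "$d/comm" 2>/dev/null) || continue   # process may have exited
    case "$comm" in
        *php*)
            rss_pages=$(awk '{print $2}' "$d/statm" 2>/dev/null)
            case "$rss_pages" in
                ''|*[!0-9]*) ;;                      # skip unreadable entries
                *) total=$(( total + rss_pages * page_kb )) ;;
            esac
            ;;
    esac
done
echo "$total kB resident in php/lsphp processes"
```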

Out of Memory Handling by Using Kernel Variable in Linux

In Linux, there are three kernel variables in the virtual memory subsystem related to handling out-of-memory conditions:

  • vm.oom_dump_tasks (Default value is 1) [1]

    Enables a system-wide task dump when the kernel performs OOM-killing.

    • If this is set to zero, this information is suppressed
    • If this is set to non-zero, this information is shown whenever the OOM killer actually kills a memory-hogging task
  • vm.oom_kill_allocating_task (Default value is 0) [2]

    Enables or disables killing the OOM-triggering task in out-of-memory situations. If this is set to zero, the OOM killer will scan through the entire task list and select a task to kill based on heuristics. If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition, which avoids the expensive task-list scan.

  • vm.panic_on_oom (Default value is 0) [3]

    This enables or disables the panic-on-OOM feature. If this is set to zero, the kernel will kill some rogue process via the OOM killer so the system survives. If this is set to one, the kernel panics when out-of-memory happens. This takes precedence over whatever value is used in oom_kill_allocating_task.
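You can check the current values on your own system by reading them straight from /proc (equivalent to sysctl vm.oom_dump_tasks and so on; the defaults cited above are the documented ones and may have been changed locally):

```shell
#!/bin/sh
# Print the three OOM-related kernel variables discussed above.
for v in oom_dump_tasks oom_kill_allocating_task panic_on_oom; do
    echo "vm.$v = $(cat /proc/sys/vm/$v)"
done
```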

Based on the explanation provided, with the default value of vm.panic_on_oom set to 0, the kernel will terminate certain processes that are causing an out-of-memory (OOM) condition using the OOM killer, allowing the system to survive. As we discussed earlier regarding the OOM killer in cgroups, it operates based on OOM scores. The Linux kernel employs the same method, so in the case of a multi-domain attack involving multiple user IDs, it may also experience delays in calculating these scores. This can lead to a delayed response from the OOM killer, and as a result, an OOM condition can still occur.

All of this is based on my personal experience with out-of-memory (OOM) issues during multi-domain attacks in a multi-tenant environment. So, what is the solution?

My solution involves manually terminating processes after detecting a multi-domain attack based on the predefined threshold we established earlier. We need to kill all PHP processes before restarting the web server. You can do this with pkill, which matches running processes by name:

    pkill -9 php

or

    pkill -9 lsphp

In the case of PHP-FPM, you may also need to restart the PHP-FPM service.

That’s my approach, and I will conclude here. However, there is a limitation: curl only accepts non-fractional values for the -m option, which means there will still be a minimum downtime of about one minute. Nevertheless, this is an improvement over the roughly ten minutes of downtime experienced during an out-of-memory (OOM) situation. This approach is still not perfect, so feel free to improve on it yourself.

And finally, I would like to reiterate that this article deliberately omits detailed production code, leaving each of us free to exercise our creativity in developing our own.

[1] "oom_dump_tasks," Sysctl Explorer. [Online]. Available: https://sysctl-explorer.net/vm/oom_dump_tasks/. [Accessed: 12-Nov-2024].

[2] "oom_kill_allocating_task," Sysctl Explorer. [Online]. Available: https://sysctl-explorer.net/vm/oom_kill_allocating_task/. [Accessed: 12-Nov-2024].

[3] "panic_on_oom," Sysctl Explorer. [Online]. Available: https://sysctl-explorer.net/vm/panic_on_oom/. [Accessed: 12-Nov-2024].

Date: November 13th at 7:08am
