Shellcode

Shellcode is executable code intended to be used as a payload for exploiting a software vulnerability. The term includes shell because it originally described an attack that opens a command shell the attacker can use to control the target machine, but any code injected to gain otherwise disallowed access can be called shellcode. For this reason, some consider the name shellcode to be inaccurate.^[1]

An attack commonly injects data consisting of executable code into a process before or as it exploits a vulnerability to gain control. The program counter is then set to the shellcode entry point so that the shellcode runs. Shellcode is often deployed by including it in a file that a vulnerable process downloads and then loads into its memory.

Common wisdom dictates that, for maximum effectiveness, a shellcode payload should be small.^[2] Machine code provides the flexibility needed to achieve this, and shellcode authors favor instructions with short encodings to keep shellcode compact.^[3]^[4]

Types

Local

A local shellcode attack allows an attacker who already has limited access to a computer to gain elevated privileges on it. In some cases, a vulnerability in a higher-privileged process can be exploited by causing an error such as a buffer overflow. If successful, the shellcode gives the attacker access to the machine with the elevated privileges of the targeted process.

Remote

A remote shellcode attack targets a process running on a remote machine – on the same local area network, intranet, or on the internet. If successful, the shellcode provides access to the target machine across the network. The shellcode normally opens a TCP/IP socket connection to allow access to a shell on the target machine.

A remote shellcode attack can be categorized by its behavior. If the shellcode establishes the connection, it is called a reverse shell or connect-back shellcode. If instead the attacker establishes the connection, the shellcode is called a bindshell, because it binds to a certain port on the victim's machine. A bindshell random port variant skips the binding step and listens on a random port.^[a] In a socket-reuse attack, the exploit establishes a connection to the vulnerable process that is not closed before the shellcode runs, so the shellcode can reuse that connection to provide remote access. Socket-reuse shellcode is more elaborate, since it must find out which connection to reuse, and the machine may have many open connections.^[5]

A firewall can detect outgoing connections made by connect-back shellcode as well as incoming connections to bindshells, and therefore offers some protection against an attack. Even if the system is vulnerable, a firewall can prevent the attacker from connecting to the shell created by the shellcode. One reason socket-reuse shellcode is used is that it does not create new connections and is therefore harder to detect and block.

Download and execute

A download and execute shellcode attack downloads and executes malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file from the network and execute it. Nowadays, it is commonly used in drive-by download attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine.

A variation of this attack downloads and loads a library.^[6]^[7] Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.

Staged

When the amount of data that an attacker can inject into the target process is too limited to achieve the desired effect, it may be possible to deploy shellcode in stages that progressively provide more access. The first stage might do nothing more than download the second stage, which then provides the desired access.

Egg-hunt

An egg-hunt shellcode attack is a staged attack used when the attacker can inject shellcode into a process but does not know where in the process it ends up. A second-stage shellcode, generally smaller than the first, is injected into the process to search the process's address space for the first shellcode (the egg) and execute it.^[8]
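The search loop at the heart of an egg hunter can be sketched in Python against a simulated address space. This is a minimal sketch under stated assumptions: a bytes object stands in for process memory, the tag `w00t` and the function name `egg_hunt` are hypothetical, and the egg is prefixed with two copies of the tag so that a hunter scanning real memory does not match the single copy embedded in its own code. A real egg hunter is written in machine code and must also survive touching unmapped pages, which this simulation ignores.

```python
# Hypothetical 4-byte tag; the egg is prefixed with two copies of it.
TAG = b"w00t"


def egg_hunt(memory: bytes, tag: bytes = TAG) -> int:
    """Scan simulated memory byte by byte for the doubled tag.

    Returns the offset of the first byte after the marker (where the
    egg's payload would start), or -1 if the marker is not found.
    """
    marker = tag * 2
    for offset in range(len(memory) - len(marker) + 1):
        if memory[offset:offset + len(marker)] == marker:
            return offset + len(marker)
    return -1


# Simulated address space: filler, a single tag (standing in for the copy
# inside the hunter itself), more filler, then the doubled tag and the egg.
memory = bytes(64) + TAG + bytes(8) + TAG + TAG + b"EGG-BODY" + bytes(16)

start = egg_hunt(memory)
print(start, memory[start:start + 8])  # 84 b'EGG-BODY'
```

The lone tag at offset 64 is skipped because only the doubled marker counts as a match.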

Omelet

An omelet shellcode attack is similar to an egg hunt, but looks for multiple small blocks of data (eggs) and combines them into a larger block (the omelet) that is then executed. It is used when an attacker is limited in the size of each injected block but can inject multiple blocks.^[9]
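The reassembly step can be sketched in Python on simulated memory. The egg layout used here (a 4-byte tag followed by one-byte index, total-count and length fields) is a hypothetical format chosen for illustration, not the layout of any published omelet shellcode, and `make_egg` and `build_omelet` are likewise hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical egg layout for this sketch:
#   TAG (4 bytes) | index (1 byte) | total eggs (1 byte) | length (1 byte) | data
TAG = b"OMLT"
HEADER_LEN = len(TAG) + 3


def make_egg(index: int, total: int, chunk: bytes) -> bytes:
    """Build one egg; each header field must fit in a single byte."""
    return TAG + bytes([index, total, len(chunk)]) + chunk


def build_omelet(memory: bytes) -> Optional[bytes]:
    """Find every egg in simulated memory and join their data in index order.

    Returns None if any egg is missing.
    """
    eggs: Dict[int, bytes] = {}
    total = None
    pos = memory.find(TAG)
    while pos != -1:
        header_end = pos + HEADER_LEN
        if header_end <= len(memory):
            index, count, length = memory[pos + len(TAG):header_end]
            chunk = memory[header_end:header_end + length]
            if len(chunk) == length:  # ignore eggs truncated by the end of memory
                eggs[index] = chunk
                total = count
        pos = memory.find(TAG, pos + 1)
    if total is None or sorted(eggs) != list(range(total)):
        return None
    return b"".join(eggs[i] for i in range(total))


# Eggs scattered out of order between filler bytes.
memory = (bytes(10) + make_egg(1, 3, b"lo, ") + bytes(7)
          + make_egg(2, 3, b"omelet") + bytes(5) + make_egg(0, 3, b"Hel"))
print(build_omelet(memory))  # b'Hello, omelet'
```

Storing an index in each egg is what lets the eggs be found in any order and still be joined correctly.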

Encoding

Shellcode is often written in order to work around the restrictions on the data that a process will allow. General techniques include:

Optimize for size

Optimize the code to decrease its size.

Self-modifying code

Modify its own code before executing it to use byte values that are otherwise restricted.

Encryption

To avoid intrusion detection, encode the shellcode as self-decrypting or polymorphic code.
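Self-decrypting shellcode pairs an encoded body with a small decoder stub that reverses the encoding at runtime. The Python sketch below shows only the data side of the simplest textbook scheme, single-byte XOR, which this article does not specifically name; the function names are hypothetical, and a real decoder stub would be written in machine code. Because XOR with key `k` turns a byte into zero exactly when that byte equals `k`, the sketch also picks a key that keeps unwanted byte values out of the encoded body.

```python
from typing import Optional


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def pick_key(data: bytes, bad: bytes = b"\x00") -> Optional[int]:
    """Return the smallest non-zero key whose output avoids every byte in `bad`."""
    for key in range(1, 256):
        if not any((b ^ key) in bad for b in data):
            return key
    return None


body = bytes.fromhex("B801000000")  # MOV EAX,1 (contains zero bytes)
key = pick_key(body)
encoded = xor_bytes(body, key)
print(key, encoded.hex())           # 2 ba03020202
assert xor_bytes(encoded, key) == body
```

Key 1 is rejected because the body contains the byte 0x01, which would encode to zero.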

Character encoding

An attack that targets a browser might obfuscate shellcode in a JavaScript string using an expanded character encoding.^[10] For example, on the IA-32 architecture, here are two unencoded no-operation (NOP) instructions, as used in a NOP slide:

90             NOP
90             NOP

As encoded:

Percent encoded: unescape("%u9090")
Unicode literal: \u9090
HTML/XML character reference: &#x9090; or &#37008;
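The encodings listed above can be generated mechanically from raw bytes, as in the following Python sketch. It assumes the target engine stores strings as UTF-16 code units on a little-endian machine such as x86, so the first byte of each pair becomes the low byte of a unit; the function names and the choice to pad odd-length input with one NOP byte are assumptions of this sketch.

```python
from typing import List, Tuple


def code_units(code: bytes, pad: int = 0x90) -> List[int]:
    """Pack machine-code bytes into 16-bit code units, low byte first.

    Assumes a little-endian UTF-16 string layout, so the string's bytes in
    memory match `code`. Odd-length input is padded with one `pad` byte.
    """
    if len(code) % 2:
        code += bytes([pad])
    return [code[i] | (code[i + 1] << 8) for i in range(0, len(code), 2)]


def percent_u(code: bytes) -> str:
    """Form accepted by JavaScript's unescape(), e.g. %u9090."""
    return "".join(f"%u{u:04X}" for u in code_units(code))


def unicode_literal(code: bytes) -> str:
    """JavaScript string-literal escapes, e.g. \\u9090."""
    return "".join(f"\\u{u:04X}" for u in code_units(code))


def html_refs(code: bytes) -> Tuple[str, str]:
    """Hexadecimal and decimal HTML/XML character references."""
    units = code_units(code)
    return ("".join(f"&#x{u:04X};" for u in units),
            "".join(f"&#{u};" for u in units))


nops = bytes([0x90, 0x90])
print(percent_u(nops))        # %u9090
print(unicode_literal(nops))  # \u9090
print(html_refs(nops))        # ('&#x9090;', '&#37008;')
```

For two identical NOP bytes the byte order is invisible, but for `33 C0` (XOR EAX,EAX) the sketch produces `%uC033`, with the second byte in the high position.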

Null-free

Shellcode must be written without zero-value bytes when it is intended to be injected into a null-terminated string that the target process copies with the usual algorithm (e.g. strcpy), which stops at the first zero byte, called the null character in common character sets. If the shellcode contained a null byte, the copy would be truncated and the shellcode would not function properly. To produce null-free code from code that contains nulls, one can replace machine instructions that contain zeroes with equivalent instructions that don't. For example, on the IA-32 architecture the instruction that sets register EAX to 1 contains zeroes as part of its 32-bit immediate operand (1 is encoded as 0x00000001, stored little-endian as 01 00 00 00).

B8 01000000    MOV EAX,1

The following instructions accomplish the same goal (EAX containing 1) without embedded zero bytes by first setting EAX to 0, then incrementing EAX to 1:

33C0           XOR EAX,EAX
40             INC EAX
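The effect of a strcpy-style copy on the two instruction sequences above can be modeled in Python. This is a simulation only: `strcpy_like` and `is_null_free` are hypothetical helpers that mimic stopping at the first zero byte, and the byte values are taken directly from the listings above.

```python
MOV_EAX_1 = bytes.fromhex("B801000000")  # MOV EAX,1
XOR_INC = bytes.fromhex("33C040")        # XOR EAX,EAX ; INC EAX


def strcpy_like(src: bytes) -> bytes:
    """Model a strcpy-style copy: stop at (and drop) the first zero byte."""
    end = src.find(b"\x00")
    return src if end == -1 else src[:end]


def is_null_free(code: bytes) -> bool:
    return b"\x00" not in code


for name, code in [("MOV EAX,1", MOV_EAX_1), ("XOR/INC", XOR_INC)]:
    copied = strcpy_like(code)
    print(f"{name}: null-free={is_null_free(code)}, survives copy={copied == code}")
# MOV EAX,1: null-free=False, survives copy=False
# XOR/INC: null-free=True, survives copy=True
```

The MOV version is cut down to its first two bytes (`B8 01`), while the replacement passes through intact and is also two bytes shorter.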

Text

An alphanumeric shellcode consists only of alphanumeric characters (0–9, A–Z and a–z).^[11]^[12] This type of encoding was created by hackers to obfuscate machine code inside what appears to be plain text. It can help the code avoid detection and pass through filters that scrub non-alphanumeric characters from strings.^[b] A similar type of encoding, called printable code, uses all printable characters (alphanumeric plus symbols like !@#%^&*). A similarly restricted variant is ECHOable code, which contains no characters that the ECHO command does not accept. Writing such shellcode requires an in-depth understanding of the instruction set architecture of the target machine. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine,^[14] thereby constituting multi-architecture executable code.
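A filter's view of a candidate byte string can be approximated with the Python sketch below. The definition of "printable" as bytes 0x20 through 0x7E (space through `~`) is an assumption of this sketch, and the helper names are hypothetical. The sample bytes are real IA-32 one-byte instructions: `P` (0x50) is PUSH EAX, `X` (0x58) is POP EAX and `@` (0x40) is INC EAX.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))
PRINTABLE = set(range(0x20, 0x7F))  # assumption: space through '~'


def is_alphanumeric(code: bytes) -> bool:
    return all(b in ALNUM for b in code)


def is_printable(code: bytes) -> bool:
    return all(b in PRINTABLE for b in code)


print(is_alphanumeric(b"PX"), is_printable(b"PX"))  # True True
print(is_alphanumeric(b"@"), is_printable(b"@"))    # False True
print(is_alphanumeric(bytes.fromhex("33C040")))     # False
```

The null-free sequence `33 C0 40` still fails both checks, because 0xC0 is outside the printable range.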

Rix showed in Phrack 57^[11] that it is possible to turn any code into alphanumeric code. Self-modifying code is often used because it allows the code to contain byte values that are otherwise not allowed, by replacing encoded values at runtime. A self-modifying decoder can be created that initially uses only allowed bytes. The main body of the shellcode is encoded, also using only bytes in the allowed range. When the output shellcode runs, the decoder modifies its own code to use the instructions it requires and then decodes the original shellcode. After decoding the shellcode, the decoder transfers control to it. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal English text.^[13]
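The encode-then-decode principle can be illustrated with a deliberately simple scheme in Python. This is not Rix's method: the hypothetical `encode_alnum` maps each byte to two letters in the range `A` to `P` (one letter per 4-bit nibble), which doubles the size, and `decode_alnum` stands in for the work a real self-modifying decoder would do in machine code.

```python
A = ord("A")


def encode_alnum(data: bytes) -> bytes:
    """Encode each byte as two letters 'A'..'P': high nibble, then low nibble."""
    out = bytearray()
    for b in data:
        out.append(A + (b >> 4))
        out.append(A + (b & 0x0F))
    return bytes(out)


def decode_alnum(text: bytes) -> bytes:
    """Reverse encode_alnum; reject input outside the 'A'..'P' alphabet."""
    if len(text) % 2 or any(not A <= c <= A + 15 for c in text):
        raise ValueError("not a valid encoding")
    return bytes(((text[i] - A) << 4) | (text[i + 1] - A)
                 for i in range(0, len(text), 2))


body = bytes.fromhex("33C040")  # XOR EAX,EAX ; INC EAX
encoded = encode_alnum(body)
print(encoded)                  # b'DDMAEA'
assert decode_alnum(encoded) == body
```

Every possible byte value encodes to letters only, so the encoded body passes an alphanumeric filter; practical encoders use denser schemes to limit the size increase.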

Modern software uses Unicode to support internationalization and localization, and input ASCII text is often converted to Unicode before processing. When an ASCII (more generally, Latin-1) character is converted to little-endian UTF-16, a zero byte is inserted after each byte (character) of the original text. Obscou showed in Phrack 61^[12] that it is possible to write shellcode that still runs successfully after this transformation. Programs exist that can automatically encode any shellcode into alphanumeric, UTF-16-proof shellcode, based on the same principle of a small self-modifying decoder that decodes the original shellcode.
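The transformation that UTF-16-proof shellcode has to survive can be reproduced with Python's built-in codecs. The helper name `widen` is hypothetical; it decodes the input as Latin-1 (each byte becomes the code point with the same value) and re-encodes it as little-endian UTF-16.

```python
def widen(data: bytes) -> bytes:
    """Model converting Latin-1 input to little-endian UTF-16.

    Every code point from Latin-1 is below 0x100, so UTF-16LE stores it
    as the original byte followed by 0x00.
    """
    return data.decode("latin-1").encode("utf-16-le")


print(widen(b"ABC"))                           # b'A\x00B\x00C\x00'
print(widen(bytes.fromhex("33C040")).hex(" "))  # 33 00 c0 00 40 00
```

Ordinary machine code such as `33 C0 40` no longer consists of the same instructions once zero bytes are interleaved, which is why the shellcode must be designed with that pattern in mind.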

Compatibility

Shellcode is generally deployed as machine code, since it affords relatively unprotected access to the target process. Because machine code is compatible only within a relatively narrow computing context (processor, operating system and so on), a shellcode fragment has limited compatibility. Also, since a shellcode attack tends to work best when the code is small, and targeting multiple exploits increases its size, the code typically targets only one exploit. Nonetheless, a single shellcode fragment can work for multiple contexts and exploits.^[15]^[16]^[17] Such versatility can be achieved by creating a single fragment that contains an implementation for each context, with common code that branches to the implementation matching the runtime context.

Analysis

As shellcode is generally not executable on its own, studying what it does typically requires loading it into a special process. A common technique is to write a small C program that contains the shellcode as data (i.e. in a byte buffer) and transfers control to the instructions encoded in that data (e.g. by using a function pointer or inline assembly code). Another technique is to use an online tool, such as shellcode_2_exe, to embed the shellcode into a pre-made executable husk, which can then be analyzed in a standard debugger. Specialized shellcode analysis tools also exist, such as the iDefense sclog project (originally released in 2005 in the Malcode Analyst Pack). Sclog is designed to load external shellcode files and execute them within an API logging framework. Emulation-based shellcode analysis tools also exist, such as the sctest application, which is part of the cross-platform libemu package. Another emulation-based tool built around the libemu library is scdbg, which includes a basic debug shell and integrated reporting features.

Notes

^ The bindshell random port is the smallest stable bindshell shellcode for x86_64 available to date.
^ In part, such filters were a response to non-alphanumeric shellcode exploits.

References

^ Foster, James C.; Price, Mike (2005-04-12). Sockets, Shellcode, Porting, & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals. Elsevier Science & Technology Books. ISBN 1-59749-005-9.
^ Anley, Chris; Koziol, Jack (2007). The shellcoder's handbook: discovering and exploiting security holes (2 ed.). Indianapolis, Indiana, US: Wiley. ISBN 978-0-470-19882-7. OCLC 173682537.
^ Foster, James C. (2005). Buffer overflow attacks: detect, exploit, prevent. Rockland, MA, USA: Syngress. ISBN 1-59749-022-9. OCLC 57566682.
^ "Tiny Execve sh - Assembly Language - Linux/x86". GitHub. Retrieved 2021-02-01.
^ BHA (2013-06-06). "Shellcode/Socket-reuse". Retrieved 2013-06-07.
^ SkyLined (2010-01-11). "Download and LoadLibrary shellcode released". Archived from the original on 2010-01-23. Retrieved 2010-01-19.
^ "Download and LoadLibrary shellcode for x86 Windows". 2010-01-11. Retrieved 2010-01-19.
^ Skape (2004-03-09). "Safely Searching Process Virtual Address Space" (PDF). nologin. Retrieved 2009-03-19.
^ SkyLined (2009-03-16). "w32 SEH omelet shellcode". Skypher.com. Archived from the original on 2009-03-23. Retrieved 2009-03-19.
^ "JavaScript large number of unescape patterns detected". Archived from the original on 2015-04-03.
^ ^a ^b rix (2001-08-11). "Writing ia32 alphanumeric shellcodes". Phrack. 0x0b (57). Phrack Inc. #0x0f of 0x12. Archived from the original on 2022-03-08. Retrieved 2022-05-26.
^ ^a ^b obscou (2003-08-13). "Building IA32 'Unicode-Proof' Shellcodes". Phrack. 11 (61). Phrack Inc. #0x0b of 0x0f. Archived from the original on 2022-05-26. Retrieved 2008-02-29.
^ ^a ^b Mason, Joshua; Small, Sam; Monrose, Fabian; MacManus, Greg (November 2009). English Shellcode (PDF). Proceedings of the 16th ACM conference on Computer and Communications Security. New York, NY, USA. pp. 524–533. Archived (PDF) from the original on 2022-05-26. Retrieved 2010-01-10. (10 pages)
^ "Multi-architecture (x86) and 64-bit alphanumeric shellcode explained". Blackhat Academy. Archived from the original on 2012-06-21.
^ eugene (2001-08-11). "Architecture Spanning Shellcode". Phrack. Phrack Inc. #0x0e of 0x12. Archived from the original on 2021-11-09. Retrieved 2008-02-29.
^ nemo (2005-11-13). "OSX - Multi arch shellcode". Full disclosure. Archived from the original on 2022-05-26. Retrieved 2022-05-26.
^ Cha, Sang Kil; Pak, Brian; Brumley, David; Lipton, Richard Jay (2010-10-08) [2010-10-04]. Platform-Independent Programs (PDF). Proceedings of the 17th ACM conference on Computer and Communications Security (CCS'10). Chicago, Illinois, USA: Carnegie Mellon University, Pittsburgh, Pennsylvania, USA / Georgia Institute of Technology, Atlanta, Georgia, USA. pp. 547–558. doi:10.1145/1866307.1866369. ISBN 978-1-4503-0244-9. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. (12 pages)

External links

Shell-Storm Database of shellcodes Multi-Platform.
An introduction to buffer overflows and shellcode
The Basics of Shellcoding (PDF) An overview of x86 shellcoding by Angelo Rosiello
An introduction to shellcode development
Contains x86 and non-x86 shellcode samples and an online interface for automatic shellcode generation and encoding, from the Metasploit Project
a shellcode archive, sorted by Operating system.
Microsoft Windows and Linux shellcode design tutorial going from basic to advanced.
Windows and Linux shellcode tutorial containing step by step examples.
Designing shellcode demystified
ALPHA3 A shellcode encoder that can turn any shellcode into both Unicode and ASCII, uppercase and mixedcase, alphanumeric shellcode.
Writing Small shellcode by Dafydd Stuttard A whitepaper explaining how to make shellcode as small as possible by optimizing both the design and implementation.
Writing IA32 Restricted Instruction Set Shellcode Decoder Loops by SkyLined Archived 2015-04-03 at the Wayback Machine A whitepaper explaining how to create shellcode when the bytes allowed in the shellcode are very restricted.
BETA3 A tool that can encode and decode shellcode using a variety of encodings commonly used in exploits.
Shellcode 2 Exe - Online converter to embed shellcode in exe husk
Sclog - Updated build of the iDefense sclog shellcode analysis tool (Windows)
Libemu - emulation based shellcode analysis library (*nix/Cygwin)
Scdbg - shellcode debugger built around libemu emulation library (*nix/Windows)

