pefile is a multi-platform Python module to read and work with Portable Executable (aka PE) files. Most of the information in the PE header is accessible, as well as all the sections' details and data.
pefile requires some basic understanding of the layout of a PE file. With that understanding, it's possible to explore nearly every feature of the file.
Some of the tasks that pefile makes possible are:
- Modifying and writing back to the PE image
- Header Inspection
- Sections analysis
- Retrieving data
- Warnings for suspicious and malformed values
- Packer detection with PEiD’s signatures
- PEiD signature generation
Please refer to UsageExamples for starting points on how to use pefile.
Latest changes
Version: 1.2.10-63
- Fixed an "index out of range" problem when parsing some unusual import tables
- Fixed struct module's types to work properly on 64bit architectures. As reported by James on the pefile googlegroup, the 'L' type tried to decode 8 bytes into a 64bit long instead of the expected 4 bytes for a dword. 'I' behaves as expected, decoding 4 bytes when pefile runs on both 32bit and 64bit architectures
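The size difference described in the last entry can be checked directly with Python's struct module. The following is a minimal sketch, not pefile's own code, and the sample bytes are made up for illustration. With an explicit byte-order prefix such as '<', both 'I' and 'L' use the standard 4-byte size, whereas in native mode (no prefix) the size of 'L' follows the platform's C long.

```python
import struct

# Four little-endian bytes holding the dword 0x12345678 (illustrative value).
raw = b"\x78\x56\x34\x12"

# 'I' with an explicit little-endian prefix always decodes exactly 4 bytes.
dword = struct.unpack("<I", raw)[0]

# Standard sizes (used when a byte-order prefix is given) are fixed.
standard_I = struct.calcsize("<I")
standard_L = struct.calcsize("<L")

# The native size (no prefix) follows the C compiler's 'unsigned long'
# and may be 4 or 8 bytes depending on the platform.
native_L = struct.calcsize("L")

print(hex(dword), standard_I, standard_L, native_L)
```

On a platform where the native 'L' is 8 bytes, unpacking a 4-byte dword with a bare 'L' format raises struct.error, which is consistent with the problem the changelog entry describes.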
Version: 1.2.10-60
- Besides some small bugfixes in this release I've added functionality to parse the LOAD_CONFIG data directory. Now one can access this structure's fields like, for instance, pe.DIRECTORY_ENTRY_LOAD_CONFIG.struct.SecurityCookie or pe.DIRECTORY_ENTRY_LOAD_CONFIG.struct.SEHandlerTable
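As a rough illustration of the attribute path mentioned above, the sketch below reads the two LOAD_CONFIG fields from an object shaped like a parsed PE instance. The helper name load_config_fields and the stand-in object are hypothetical; only the path pe.DIRECTORY_ENTRY_LOAD_CONFIG.struct.<field> comes from the entry above. The stand-in lets the sketch run without a sample file.

```python
from types import SimpleNamespace


def load_config_fields(pe, names=("SecurityCookie", "SEHandlerTable")):
    """Return selected LOAD_CONFIG fields, or None if the directory is absent.

    `pe` is expected to behave like a parsed pefile.PE instance. This helper
    is hypothetical and not part of pefile.
    """
    entry = getattr(pe, "DIRECTORY_ENTRY_LOAD_CONFIG", None)
    if entry is None:
        return None
    # Fields missing from a particular structure layout are reported as None.
    return {name: getattr(entry.struct, name, None) for name in names}


# Stand-in object with made-up values; with pefile installed one would
# pass a pefile.PE instance for a real file instead.
fake_pe = SimpleNamespace(
    DIRECTORY_ENTRY_LOAD_CONFIG=SimpleNamespace(
        struct=SimpleNamespace(SecurityCookie=0x403000, SEHandlerTable=0x402500)
    )
)
print(load_config_fields(fake_pe))
```

Guarding with getattr avoids an AttributeError for files that have no LOAD_CONFIG directory at all.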
Version: 1.2.10-56
- Fixed bug in contains_offset(). The end of the section's data on disk was being calculated as VirtualAddress + SizeOfRawData instead of the correct: PointerToRawData + SizeOfRawData
- Improved the rendering when dumping the file's contents in textual form. The performance of the operation has greatly improved
- get_data() calls now use a fixed size argument when possible. Improves the speed of those calls in large files. Fix suggested by Paul, barnabas79
- get_memory_mapped_image() can now properly return rebased images. The rebased image data is temporary and will be discarded (won't be saved in the instance). To make the changes permanent, one should call relocate_image() instead
- Improved parsing of import table for PEI-format DLLs generated with MingW
- Added methods to handle the updating of the section's data upon modification of values in the image's data. (Section's and image's data are kept separately)
- generate_checksum() now makes sure it processes the image with all modifications made to it
- The write() method now only returns the file data if no filename is provided, which is a more intuitive behavior
- parse_data_directories() now supports an optional argument to specify which directories to parse. For instance:

    import pefile

    # 'fast_load' makes pefile skip loading the data directories
    pe = pefile.PE(filepath, fast_load=True)

    # The following line tells pefile to process only the resource
    # directory, where the version information is located
    pe.parse_data_directories(
        directories=[pefile.DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_RESOURCE']])
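The contains_offset() fix listed for this version comes down to which field marks the start of a section's data on disk. The sketch below is a simplified stand-alone version of that range check, not pefile's actual method (which may also account for alignment). The function name and the sample values are assumptions for illustration.

```python
def section_contains_offset(pointer_to_raw_data, size_of_raw_data, offset):
    """Return True if a file offset falls inside a section's on-disk data.

    Hypothetical helper: the on-disk range starts at PointerToRawData and
    ends SizeOfRawData bytes later. VirtualAddress plays no part here,
    because it describes the in-memory layout.
    """
    start = pointer_to_raw_data
    end = pointer_to_raw_data + size_of_raw_data
    return start <= offset < end


# Illustrative section: VirtualAddress 0x1000, raw data at 0x400, 0x200 long.
print(section_contains_offset(0x400, 0x200, 0x450))   # inside the raw data
print(section_contains_offset(0x400, 0x200, 0x1100))  # only meaningful as an RVA
```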
Version: 1.2.9.1
- Fixed parsing problem on files specifying a FileAlignment of zero
- Fixed problem parsing the Bound Imports directory when it contained invalid data. In some instances pefile would get caught up trying to make sense of arbitrary data. Now when empty strings are found as module names in the Bound Import structures the parsing is aborted
Version: 1.2.9
- Now it's possible to modify the version information by directly assigning new values to the keys, for instance pe.FileInfo[0].StringTable[0].entries['OriginalFilename'] = 'NewName.exe'
- Other common keys are: LegalCopyright, InternalName, FileVersion, CompanyName, ProductName, ProductVersion, FileDescription, OriginalFilename
- Added __str__() and __repr__() methods to pefile's structures. Now it's possible to navigate through the contents much more comfortably from an interactive Python command line. Just typing the name of a structure or doing a print on it will return all the fields and their contents
- Bugs fixed when parsing the resource information
- Improved parsing of imported symbols. The distinction between imports by ordinal and by name is now much clearer. The ImportData instances have a new attribute, 'import_by_ordinal', indicating whether a symbol is imported by ordinal; in that case the 'ordinal' attribute will contain the ordinal. Otherwise the attribute 'name' will contain the name of the imported symbol.
- Added CheckSum verification and generation methods. verify_checksum() will return True/False indicating whether the value in the file's OptionalHeader CheckSum field contains the real CheckSum of the file. generate_checksum() will calculate the checksum over the file's data. If one modifies fields and writes the changes to disk, it's possible to update the checksum by reloading the modified file and setting the CheckSum field to generate_checksum()'s result.
- Other minor fixes
- Added missing information when parsing import directory entries. Now the RVA of the Hint/Name entries is reported as an attribute named hint_name_table_rva; the hint, if present, is also exposed as the attribute hint
- Fixed a minor bug retrieving the relative virtual address of the Hint/Name entries. Only the lower 16 bits were being fetched as opposed to the 31 that had to be read. It was seldom the case that the entries were farther than 64KiB, but it could have happened. Thanks to Halvar for spotting this one
- Added computation of MD5, SHA-1, SHA-256 and SHA-512 on a per-section basis. The results are always reported when invoking the dump_info() method in the PE instance. SHA-256 and SHA-512 are calculated only in Python 2.5 onwards which includes them in the hashlib module. The SectionStructure instances now sport the following methods: get_hash_sha1(), get_hash_sha256(), get_hash_sha512(), get_hash_md5()
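The import-related entries above (import by ordinal versus by name, and the 31-bit Hint/Name RVA) can be illustrated by decoding a single 32-bit import thunk value. This is a minimal sketch for PE32 files under the usual format rules, not pefile's implementation. The function name decode_thunk32 is hypothetical, and the keys of the returned dict merely borrow the attribute names mentioned above.

```python
IMAGE_ORDINAL_FLAG32 = 0x80000000  # high bit set: import by ordinal


def decode_thunk32(thunk):
    """Decode a 32-bit import thunk value into a small dict.

    Hypothetical helper for PE32 files; PE32+ uses a 64-bit thunk with the
    flag in bit 63 and is not handled here.
    """
    if thunk & IMAGE_ORDINAL_FLAG32:
        # Import by ordinal: the ordinal lives in the low 16 bits.
        return {"import_by_ordinal": True, "ordinal": thunk & 0xFFFF}
    # Import by name: the low 31 bits are the RVA of the Hint/Name entry.
    return {"import_by_ordinal": False,
            "hint_name_table_rva": thunk & 0x7FFFFFFF}


print(decode_thunk32(0x80000010))  # ordinal import
print(decode_thunk32(0x00012345))  # name import, RVA beyond 64KiB
```

The second call shows why masking with only 16 bits was a bug: an RVA such as 0x12345 would have been truncated to 0x2345.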
Version: 1.2.8
- As suggested by Jim Clausing, added computation of MD5, SHA-1, SHA-256 and SHA-512 on a per-section basis. The results are always reported when invoking the dump_info() method in the PE instance. SHA-256 and SHA-512 are calculated only in Python 2.5 onwards which includes them in the hashlib module. The SectionStructure instances now sport the following methods: get_hash_sha1(), get_hash_sha256(), get_hash_sha512(), get_hash_md5()
- Faster entropy calculation by Gergely Erdelyi
- Added some intelligence handling unicode strings in the resources information. Strings in the resources seem to always be Pascal style, added support for those
- Changed some loops iterating using range() to use xrange() instead. It will make the code more robust/faster whenever invalid large numbers of elements are specified in different arrays
- As per c1de0x's suggestion, added set_data() method to SectionStructure
- Added get_entropy() method to SectionStructure. Now it's only calculated on demand or when doing a dump_info()
- c1de0x pointed out a redundant length check in __unpack_data__ and __unpack__. Now the exception raised by the latter is caught by the former and a warning added if a structure can't be parsed because of missing data
- Fixed bug parsing export directory. Warning messages are added if it's found to be invalid
- Fixed bug parsing the IAT. Some broken samples could crash pefile. The invalid IAT is now reported in the warnings
- New method: relocate_image(new_ImageBase) will apply the relocation information, if any, to the image
- get_memory_mapped_image() now supports an additional keyword argument, ImageBase. By specifying an address it will return the data relocated (if the PE contains relocation information) as if the image had been loaded at the new ImageBase
- Added full family of bytes/word/dword/qword manipulation methods (needed by the relocation functionality):
- get_data_from_dword(dword), get_dword_from_data(data, offset), get_dword_at_rva(rva), get_dword_from_offset(offset), set_dword_at_rva(rva, dword), set_dword_at_offset(offset, dword)
- get_data_from_word(word), get_word_from_data(data, offset), get_word_at_rva(rva), get_word_from_offset(offset), set_word_at_rva(rva, word), set_word_at_offset(offset, word)
- get_data_from_qword(qword), get_qword_from_data(data, offset), get_qword_at_rva(rva), get_qword_from_offset(offset), set_qword_at_rva(rva, qword), set_qword_at_offset(offset, qword)
- set_bytes_at_rva(rva, data), set_bytes_at_offset(offset, data)
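To show why relocation needs dword read and write primitives, here is a minimal sketch that rebases one 32-bit address stored in a buffer. The helpers read_dword, write_dword and rebase_dword are hypothetical and take plain byte offsets; they are not the pefile methods listed above, whose argument conventions may differ. The addresses are made up for illustration.

```python
import struct


def read_dword(buf, byte_offset):
    """Read a little-endian 32-bit value at a byte offset."""
    return struct.unpack_from("<I", buf, byte_offset)[0]


def write_dword(buf, byte_offset, value):
    """Write a little-endian 32-bit value, wrapping to 32 bits."""
    struct.pack_into("<I", buf, byte_offset, value & 0xFFFFFFFF)


def rebase_dword(buf, byte_offset, old_base, new_base):
    """Apply a HIGHLOW-style fixup: add (new_base - old_base) to the dword."""
    delta = new_base - old_base
    write_dword(buf, byte_offset, read_dword(buf, byte_offset) + delta)


# An 8-byte stand-in image with one absolute address stored at offset 4.
image = bytearray(8)
write_dword(image, 4, 0x00401000)
rebase_dword(image, 4, old_base=0x00400000, new_base=0x10000000)
print(hex(read_dword(image, 4)))
```

A full relocation pass would repeat this fixup for every entry in the relocation directory. The sketch covers a single entry only.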
Projects and products using pefile
- Exe Dump Utility, a web-based pefile
- VirusTotal
- bbfreeze
- pyemu: download, whitepaper
- Offensive Computing
- Immunity Debugger 1.1
Additional resources
Posters depicting the PE file format:
- Portable Executable Format shows the full view of the headers and structures defined by the Portable Executable format
- Portable Executable Format. A File Walkthrough shows a walkthrough over the raw view of an executable file with the PE format fields laid out over the corresponding areas
A PDF file that I put together depicting the PE file format, hosted on OpenRCE. (The poster just mentioned is based on this.)
The following links provide extended information on the PE format and its structures.
- An In-Depth Look into the Win32 Portable Executable File Format
- An In-Depth Look into the Win32 Portable Executable File Format, Part 2
- Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
- The Portable Executable File Format
- Portable Executable File Format
- Get icons from Exe or DLL the PE way
- Tutorial 6: Import Table
- Solar Eclipse's Tiny PE page