pefile is a multi-platform Python module for reading and working with Portable Executable (PE) files. Most of the information in the PE header is accessible, as are all the sections, their details and their data.
pefile requires some basic understanding of the layout of a PE file. Armed with that, it's possible to explore nearly every feature of the file.
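As a rough illustration of the layout in question, the following sketch does not use pefile at all. It follows the e_lfanew field at offset 0x3C of the DOS header to the PE signature and unpacks the first two fields of the COFF file header that comes right after it. The helper name read_coff_header and the synthetic buffer are assumptions made for this example only; real files are better handed to pefile.

```python
import struct


def read_coff_header(data):
    """Return (machine, number_of_sections) from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: 32-bit little-endian file offset of the PE signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The COFF file header starts right after the 4-byte signature
    machine, number_of_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    return machine, number_of_sections


# Synthetic buffer for illustration only: a 64-byte DOS header stub,
# the PE signature, and a 20-byte COFF file header.
dos = bytearray(0x40)
dos[:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x40)
coff = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
sample = bytes(dos) + b"PE\x00\x00" + coff

machine, sections = read_coff_header(sample)
print(hex(machine), sections)
```

pefile performs this kind of unpacking for every header and directory and exposes the results as attributes, so manual struct work like this is rarely needed.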
Some of the tasks that pefile makes possible are:
- Modifying and writing back to the PE image
- Header inspection
- Section analysis
- Retrieving data
- Warnings for suspicious and malformed values
- Packer detection with PEiD’s signatures
- PEiD signature generation
Please refer to UsageExamples for starting points on how to use pefile.
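A minimal sketch of header and section inspection might look like the following. The helper summarize is hypothetical, and the attribute names it reads (OPTIONAL_HEADER, sections, Name, get_warnings()) are assumed to match the pefile version installed; the helper accepts any object exposing them, which keeps it easy to try without a sample file.

```python
def summarize(pe):
    """Collect a few header fields, section names and warnings from a parsed PE object."""
    return {
        "entry_point": pe.OPTIONAL_HEADER.AddressOfEntryPoint,
        "image_base": pe.OPTIONAL_HEADER.ImageBase,
        "sections": [
            # Section names are assumed to be NUL-padded byte strings
            s.Name.rstrip(b"\x00").decode("ascii", "replace")
            for s in pe.sections
        ],
        "warnings": list(pe.get_warnings()),
    }


# Typical use (needs the third-party pefile package and a real file;
# "sample.exe" is a placeholder path):
#
#   import pefile
#   pe = pefile.PE("sample.exe")
#   print(summarize(pe))
```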
Latest changes
- Version bumped up to: 1.2.9.1
- Fixed parsing problem on files specifying a FileAlignment of zero
- Fixed problem parsing the Bound Imports directory when it contained invalid data. In some instances pefile would get caught up trying to make sense of arbitrary data. Now, when empty strings are found as module names in the Bound Import structures, the parsing is aborted
- Version bumped up to: 1.2.9
- Now it's possible to modify the version information by directly assigning new values to the keys, for instance pe.FileInfo[0].StringTable[0].entries['OriginalFilename'] = 'NewName.exe'
- Other common keys are: LegalCopyright, InternalName, FileVersion, CompanyName, ProductName, ProductVersion, FileDescription, OriginalFilename
- Added __str__() and __repr__() methods to pefile's structures. Now it's possible to navigate through the contents much more comfortably from an interactive Python command line. Just typing the name of a structure or doing a print on it will return all the fields and their contents
- Bugs fixed when parsing the resource information
- Improved parsing of imported symbols. The distinction between imports by ordinal and by name is now much clearer. The ImportData instances have a new attribute, 'import_by_ordinal', indicating whether a symbol is imported by ordinal; in that case the 'ordinal' attribute will contain the ordinal. Otherwise the attribute 'name' will contain the name of the imported symbol.
- Added CheckSum verification and generation methods. verify_checksum() will return True/False indicating whether the value in the file's OptionalHeader CheckSum field matches the real CheckSum of the file. generate_checksum() will calculate the checksum over the file's data. If one modifies fields and writes the changes to disk, it's possible to update the checksum by reloading the modified file and setting the CheckSum field to generate_checksum()'s result.
- Other minor fixes
- Added missing information when parsing import directory entries. Now the RVA of the Hint/Name entries is reported as an attribute named hint_name_table_rva; the hint, if present, is likewise exposed as the attribute hint
- Fixed a minor bug retrieving the relative virtual address of the Hint/Name entries. Only the lower 16 bits were being fetched as opposed to the 31 that had to be read. It was seldom the case that the entries were farther than 64KiB, but it could have happened. Thanks to Halvar for spotting this one
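The import and checksum entries for version 1.2.9 can be sketched as two small helpers. Both names (describe_import, refresh_checksum) are hypothetical; they rely only on the attributes and methods named in the entries above (import_by_ordinal, ordinal, name, generate_checksum(), the CheckSum field) and assume these behave as described.

```python
def describe_import(imp):
    """Return a readable label for one ImportData-like object."""
    if imp.import_by_ordinal:
        return "ordinal %d" % imp.ordinal
    name = imp.name
    # Names may be byte strings, depending on the pefile and Python versions
    if isinstance(name, bytes):
        name = name.decode("ascii", "replace")
    return name


def refresh_checksum(pe):
    """Store a freshly generated checksum in the optional header and return it."""
    pe.OPTIONAL_HEADER.CheckSum = pe.generate_checksum()
    return pe.OPTIONAL_HEADER.CheckSum


# Typical use (needs the third-party pefile package; the paths are placeholders):
#
#   import pefile
#   pe = pefile.PE("modified.exe")
#   for entry in pe.DIRECTORY_ENTRY_IMPORT:
#       for imp in entry.imports:
#           print(entry.dll, describe_import(imp))
#   refresh_checksum(pe)
#   pe.write("modified_with_checksum.exe")
```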
- Bumped version number to 1.2.8
- As suggested by Jim Clausing. Added computation of MD5, SHA-1, SHA-256 and SHA-512 on a per-section basis. The results are always reported when invoking the dump_info() method in the PE instance. SHA-256 and SHA-512 are calculated only in Python 2.5 onwards which includes them in the hashlib module. The SectionStructure instances now sport the following methods: get_hash_sha1(), get_hash_sha256(), get_hash_sha512(), get_hash_md5()
- Faster entropy calculation by Gergely Erdelyi
- Added some intelligence handling unicode strings in the resources information. Strings in the resources seem to always be Pascal style, added support for those
- Changed some loops iterating using range() to use xrange() instead. It will make the code more robust/faster whenever invalid large numbers of elements are specified in different arrays
- As per c1de0x suggestion, added set_data() method to SectionStructure
- Added get_entropy() method to SectionStructure. Now it's only calculated on demand or when doing a dump_info()
- c1de0x pointed out a redundant length check in __unpack_data__ and __unpack__. Now the exception raised by the latter is caught by the former and a warning added if a structure can't be parsed because of missing data
- Fixed bug parsing export directory. Warning messages are added if it's found to be invalid
- Fixed bug parsing the IAT. Some broken samples could crash pefile. The invalid IAT is now reported in the warnings
- New method: relocate_image(new_ImageBase) will apply the relocation information, if any, to the image
- get_memory_mapped_image() now supports an additional keyword argument, ImageBase. When an address is specified, the returned data is relocated (if the PE contains relocation information) as if the image had been loaded at the new ImageBase
- Added full family of bytes/word/dword/qword manipulation methods (needed by the relocation functionality):
- get_data_from_dword(dword), get_dword_from_data(data, offset), get_dword_at_rva(rva), get_dword_from_offset(offset), set_dword_at_rva(rva, dword), set_dword_at_offset(offset, dword)
- get_data_from_word(word), get_word_from_data(data, offset), get_word_at_rva(rva), get_word_from_offset(offset), set_word_at_rva(rva, word), set_word_at_offset(offset, word)
- get_data_from_qword(qword), get_qword_from_data(data, offset), get_qword_at_rva(rva), get_qword_from_offset(offset), set_qword_at_rva(rva, qword), set_qword_at_offset(offset, qword)
- set_bytes_at_rva(rva, data), set_bytes_at_offset(offset, data)
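To illustrate why relocation needs dword-level read and write helpers, here is a small standalone sketch built on the struct module. It is not pefile's implementation, and the function names (pack_dword, read_dword, write_dword, rebase_dword) are hypothetical and deliberately differ from the pefile methods listed above. It shows the basic idea for a single 32-bit absolute address: read the value, add the difference between the new and old image base, and write it back.

```python
import struct


def pack_dword(value):
    """Encode an unsigned 32-bit value as 4 little-endian bytes."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def read_dword(buf, offset):
    """Read an unsigned 32-bit little-endian value at a byte offset."""
    return struct.unpack_from("<I", buf, offset)[0]


def write_dword(buf, offset, value):
    """Overwrite 4 bytes of a mutable buffer with the encoded value."""
    buf[offset:offset + 4] = pack_dword(value)


def rebase_dword(buf, offset, old_base, new_base):
    """Adjust one absolute 32-bit address by the image base delta."""
    delta = new_base - old_base
    write_dword(buf, offset, read_dword(buf, offset) + delta)


# An 8-byte toy image holding one absolute address at offset 4
image = bytearray(8)
write_dword(image, 4, 0x00401000)
rebase_dword(image, 4, 0x00400000, 0x10000000)
print(hex(read_dword(image, 4)))  # 0x10001000
```

With pefile itself, relocate_image(new_ImageBase) applies this kind of adjustment to every entry in the relocation information, so a helper like rebase_dword is only useful for understanding what happens to each patched location.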
Projects and products using pefile
- Exe Dump Utility, a web-based pefile
- VirusTotal
- bbfreeze
- pyemu: download, whitepaper
- Offensive Computing
- Immunity Debugger 1.1
Additional resources
Posters depicting the PE file format:
- Portable Executable Format: shows the full view of the headers and structures defined by the Portable Executable format
- Portable Executable Format. A File Walkthrough: shows a walkthrough over the raw view of an executable file with the PE format fields laid out over the corresponding areas
A PDF file that I put together depicting the PE file format, hosted on OpenRCE. The poster just mentioned is based on it.
The following links provide extended information on the PE format and its structures.
- An In-Depth Look into the Win32 Portable Executable File Format
- An In-Depth Look into the Win32 Portable Executable File Format, Part 2
- Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
- The Portable Executable File Format
- Portable Executable File Format
- Get icons from Exe or DLL the PE way
- Tutorial 6: Import Table
- Solar Eclipse's Tiny PE page
