fileformat  
The NK2 file format explained.
Featured, Phase-Implementation
Updated Mar 2, 2010 by havard.g...@gmail.com

Introduction

Microsoft Outlook stores its autocomplete data in a specially crafted binary file (NK2). The format is not documented, presumably because it is regarded as an internal data structure of the program. Still, the contents of that file are often valuable and definitely worth keeping.

This project's Python library, _nk2parser.py_, successfully pulls names and email addresses out of the file. Here's how.

Details

Contents

At the time of writing, the author is aware of the following resources within the file:

  • Display name: name, or e-mail address if no name was supplied
  • Delivery Method: SMTP or Exchange
  • Address: SMTP address or MS addressing

According to internet sources(1), there's more:

  • Record type: SMTP, Exchange, Fax, or MAPIPDL (Distribution List)
  • Alias (and smtp email address or GAL reference)
  • Exchange x500 address
  • x509 PKI (Public Key Infrastructure)
  • Various other data which may be populated from the Active Directory

There is no reason to expect this to be untrue, but this program doesn't care about any of that.

Unraveling the stuff

1. Luckily, the most important data is also the easiest to pull out. The email addresses are stored as ASCII strings, so just running strings on the file gets you all the addresses.

2. To get the names associated with the addresses, you have to dig deeper.

The following is an attempt to describe what works for debunk2.
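The strings approach from step 1 can be sketched in Python as well. This is a minimal illustration, not code from nk2parser.py: the helper names ascii_strings and looks_like_address are hypothetical, the minimum run length of 4 mirrors the usual default of strings, and the sample bytes are synthetic rather than taken from a real NK2 file.

```python
import re

# Runs of 4 or more printable ASCII bytes, similar to what strings(1) reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def ascii_strings(data):
    """Return every run of printable ASCII bytes in data, decoded to str."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]

def looks_like_address(s):
    """Crude filter: keep strings with text on both sides of an '@'."""
    local, at, domain = s.partition("@")
    return bool(local and at and domain)

# Synthetic bytes for illustration only; not a real NK2 file.
sample = b"\x00\x01\x02alice@example.com\x00\xff\x03junk\x00bob@example.org\x00"
addresses = [s for s in ascii_strings(sample) if looks_like_address(s)]
```

The filter is deliberately loose: it only weeds out strings that cannot possibly be addresses, and anything stricter would need real address validation.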

Record separators

Looking at the NK2 file in a hex viewer, or anything else that renders unprintable bytes in a form a human can interpret, is time-consuming, exhausting, and not something anyone should have to do when there is plenty of green left in the world. It is also the only way of examining the blasted thing.

Since there is so much green yet to be experienced, your author took some sharp turns and elbow-greased some dirty code into chopping the NK2 file into records.

Thus, debunk2 splits at the following byte strings to make sense out of the NK2 files:

sep1 = '\x04H\xfe\x13' # record separator
sep2 = '\x00\xdd\x01\x0fT\x02\x00\x00\x01'  # record separator

After the file is split at both these byte sequences, the program runs through all records. A lot of the records are duplicates (they aren't really, but since we're only after the name and email addresses, they're redundant).

This isn't perfect, but it gets the job done.
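As a rough sketch of the splitting step, the two separators can be combined into one pattern so the file is chopped in a single pass. This is written for Python 3 bytes and is not lifted from debunk2: the names SEP1, SEP2, RECORD_SEP and split_records are hypothetical, and dropping empty chunks is an added convenience.

```python
import re

# The two separators from above, written as Python 3 bytes literals.
SEP1 = b"\x04H\xfe\x13"
SEP2 = b"\x00\xdd\x01\x0fT\x02\x00\x00\x01"

# re.escape keeps the separator bytes from being read as regex syntax.
RECORD_SEP = re.compile(re.escape(SEP1) + b"|" + re.escape(SEP2))

def split_records(data):
    """Split raw NK2 bytes at either separator and drop empty chunks."""
    return [chunk for chunk in RECORD_SEP.split(data) if chunk]

# Synthetic input for illustration only; not a real NK2 file.
demo = b"AAA" + SEP1 + b"BBB" + SEP2 + b"CCC"
records = split_records(demo)
```

Filtering out the redundant records is left out here, since that depends on comparing the extracted name and address rather than the raw bytes.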

NUL-byte separated strings

Most record fields are separated by triple NUL-bytes (ASCII 0). Some fields are separated by single NUL bytes.

  NUL = '\x00'
  # SPLIT1: split the record (y) into fields at triple NULs
  split1 = [x.replace(NUL, '') for x in y.split(NUL*3)]
  if split1[1] != 'SMTP': # SPLIT1 failed
    # SPLIT2: split again, this time using single NULs as the delimiter
    split2 = [x.replace(NUL, '') for x in y.split(NUL*1)]
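For experimenting with the fallback logic outside the parser, here is a self-contained Python 3 sketch that operates on bytes. The function name split_fields is hypothetical, the length check and the dropping of empty pieces in the fallback are additions not found in the snippet above, and the test records are synthetic.

```python
NUL = b"\x00"

def split_fields(record):
    """Split one record into fields, trying triple NULs before single NULs."""
    fields = [part.replace(NUL, b"") for part in record.split(NUL * 3)]
    # Same check as the snippet above: the second field should be 'SMTP'.
    if len(fields) > 1 and fields[1] == b"SMTP":
        return fields
    # Fallback: split at single NULs; empty pieces are dropped (an addition).
    return [part for part in record.split(NUL) if part]

# Synthetic records for illustration only.
triple = b"Al\x00\x00\x00SMTP\x00\x00\x00a@b"
single = b"Al\x00SMTP\x00a@b"
fields_triple = split_fields(triple)
fields_single = split_fields(single)
```

Like the original, this treats any record whose second field is not SMTP as a failed split, so Exchange records always end up in the single-NUL branch.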

Most field strings are sprinkled with NUL-bytes (every second byte, actually). This is how nk2parser.py does it:

  def strp(self, s):
    "Return string stripped of NUL bytes"
    return s.replace(NUL, '')
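A NUL in every second byte is what ASCII-range text looks like when it is encoded as UTF-16LE. Assuming that is what is going on here (an assumption, not something this page establishes), decoding the field is a possible alternative to stripping NULs, and it would also preserve non-ASCII names. The function name decode_field is hypothetical and is not part of nk2parser.py.

```python
def decode_field(raw):
    """Decode a field as UTF-16LE; fall back to stripping NULs if that fails."""
    try:
        return raw.decode("utf-16-le")
    except UnicodeDecodeError:
        # For example, an odd number of bytes left over from a rough split.
        return raw.replace(b"\x00", b"").decode("latin-1")

# Synthetic field for illustration: a name containing a non-ASCII letter.
name = decode_field("Zo\u00eb".encode("utf-16-le"))
```

Stripping NULs gives the same result only for characters below U+0100; anything beyond that range would be mangled, whereas decoding keeps it intact.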

SMTP addresses

SMTP addresses are prefixed with "SMTP:".

File location

Little help, please?

References

Comment by daveh...@gmail.com, Dec 19, 2007

Probably this is a dual-byte character encoding, so that you can store non-ascii characters

http://en.wikipedia.org/wiki/DBCS

Comment by project member havard.g...@gmail.com, Mar 2, 2010

davehowe,

Thanks for the tip and the link. It seems you are quite right.

Sorry it took so long to reply.

Comment by scawisb...@gmail.com, Apr 19, 2011

... and another year passed :) look here, MS finally opens up: http://msdn.microsoft.com/en-us/library/ff625291.aspx

