MergekeysUserDocs  
mergekeys - merges two sorted flat files with some different columns
Updated Feb 4, 2010 by jeremy.h...@gmail.com

Usage

mergekeys <options> file1 file2

Input files must be sorted by key fields.

If -a and -b are not specified, the first line of each file will be examined to determine common fields. In this case, all key fields must precede all mergeable fields. A header line in each file is required in either case.
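As a minimal sketch of the automatic mode described above: the file names (left.txt, right.txt, merged.txt) and their contents are hypothetical, invented only for illustration. Both files carry a header line, share a leading key field (ID), and are sorted on it. Only options listed in the table below are used, and the merge is skipped if mergekeys is not installed.

```shell
# Hypothetical sample inputs: each has a header line, the shared key
# field (ID) comes first, and rows are sorted by that key.
printf 'ID|Name\n1|apple\n2|banana\n3|cherry\n' > left.txt
printf 'ID|Price\n1|0.50\n3|2.00\n4|1.25\n' > right.txt

# With no -a/-b given, mergekeys examines the header lines to find the
# common field(s). -i requests an inner join, -d sets the delimiter,
# and -o names the output file.
if command -v mergekeys >/dev/null 2>&1; then
    mergekeys -i -d '|' -o merged.txt left.txt right.txt
else
    echo "mergekeys not found on PATH; skipping merge" >&2
fi
```

Replacing -i with -l or -r would request a left or right outer join instead, and -D would supply the value for merge fields left unmatched by an outer join.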

Options

Short Opt  Long Opt        Argument            Description
-h         --help                              print this message and exit
-V         --version                           print version info and exit
-v         --verbose                           print verbose messages during execution
-a         --left-keys     <left_keys>         list of key fields in left-hand file
-A         --left-labels   <left_key_labels>   list of key labels in left-hand file
-b         --right-keys    <right_keys>        list of key fields in right-hand file
-B         --right-labels  <right_key_labels>  list of key labels in right-hand file
-i         --inner                             inner join - i.e. drop lines that do not have a match in both files
-r         --right                             right outer join - i.e. drop lines in the first file that do not have a match in the second file
-l         --left                              left outer join - i.e. drop lines in the second file that do not have a match in the first file
-D         --default       <merge_default>     for outer joins, the default value to put in unmatched merge fields
-d         --delim         <delim>             delimiting string for both input files
-o         --outfile       <outfile>           name of file for output



Comment by adulau, Jul 16, 2008

How would you merge two datasets a and b (example below) with crush-tools, keeping for each key only the larger of the two values (expected result in dataset c)?

a:

key1 12
key2 13
key3 14

b:

key1 10
key2 9
key3 20

c:

key1 12
key2 13
key3 20

Comment by project member jeremy.h...@gmail.com, Jan 27, 2009

adulau: [wiki:DeltaforceUserDocs deltaforce] might do what you want, but only if the larger values all come from the same file. More likely, you would need to label the value fields differently in each file so that they end up in separate columns in the merged output, and then use [wiki:CalcfieldUserDocs calcfield] to create a field holding the larger value:

$ echo -e "Field-From-A|Field-From-B\n10|13" |
    calcfield  -d '|' -c 'Max' \
        -e '[Field-From-A] > [Field-From-B] ? [Field-From-A] : [Field-From-B]'
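
The same idea (put the two values side by side, then keep the larger one) can be sketched with standard tools. This is not the crush-tools method: join(1) and awk stand in for mergekeys and calcfield, the file names a.txt, b.txt and c.txt are hypothetical, and the data is assumed to hold one key/value pair per line, sorted by key.

```shell
# Sample data from the question, one key/value pair per line, sorted by key.
printf 'key1 12\nkey2 13\nkey3 14\n' > a.txt
printf 'key1 10\nkey2 9\nkey3 20\n' > b.txt

# join(1) puts the two values in separate columns (key, value from a,
# value from b); awk then keeps the numerically larger value per key.
join a.txt b.txt |
    awk '{ print $1, ($2 + 0 > $3 + 0 ? $2 : $3) }' > c.txt

cat c.txt
```

For the sample data, c.txt ends up holding key1 12, key2 13 and key3 20, which matches the expected dataset c.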
