Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column Post: 302799171

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicates based on the two key columns

Hi All, I need to fetch unique records based on a key column (i.e., the first column) and also need to get the records that have the max value in column 2, in sorted order... and the duplicates have to be stored in another output file. Input : Input.txt 1234,0,x 1234,1,y 5678,10,z 9999,10,k...

2. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Hi, I am unable to search the duplicates in a file based on the 1st,2nd,4th,5th columns in a file and also remove the duplicates in the same file. Source filename: Filename.csv "1","ccc","information","5000","temp","concept","new" "1","ddd","information","6000","temp","concept","new"...

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ...

4. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have CSV files with 20 columns. I want to find the duplicates in such a file based on column1, column10, column4, column6, column8 and column2. If those columns have the same values, the record should be treated as a duplicate. Can anyone help me find the duplicates? Thanks in advance. ...

5. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All , I have a requirement where I need to remove duplicates from a fixed width file which has multiple key columns .Also , need to capture the duplicate records into another file . File has 8 columns. Key columns are col1 and col2. Col1 has the length of 8 col 2 has the length of 3. ...

6. Shell Programming and Scripting

Remove the time from the date column

Hi, I have file named file1.txt with below contents cat file1.txt 1/29/2014 0:00,706886 1/30/2014 0:00,791265 1/31/2014 0:00,987087 2/1/2014 0:00,1098572 2/2/2014 0:00,572477 2/3/2014 0:00,701715 I want to display as below 1/29/2014,706886 1/30/2014,791265 1/31/2014,987087...

7. UNIX for Dummies Questions & Answers

Display latest record from file based on multiple columns combination

I have requirement to print latest record from file based on multiple columns combination. EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/16/2015 100004 03/16/2015 100005 001 EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/18/2015 140003 03/18/2015 140004 001 EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/18/2015 220006...

8. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have /tmp dir with filename as: 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker 010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker 010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker...

9. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, I need help. I am learning AWK and am stuck on one issue. First point: I want to sum up the values of columns 7, 9, 11, 13 and 15 if rows in column 5 are duplicates. No action is to be taken for rows where the value in column 5 is unique. Second point : For...

10. UNIX for Beginners Questions & Answers

Remove duplicates in a dataframe (table) keeping all the different cells of just one of the columns

Hello all, I need to filter a dataframe composed of several columns of data to remove the duplicates according to one of the columns. I did it with pandas. In the meantime, I need the last column, which contains all different data (not redundant), to be conserved in the output like this: A ...

dict(3erl)						     Erlang Module Definition							dict(3erl)

NAME

       dict - Key-Value Dictionary

DESCRIPTION

       Dict implements a Key-Value dictionary. The representation of a dictionary is not defined.

       This module provides exactly the same interface as the module orddict. One difference is that while this module considers two keys as
       different if they do not match (=:=), orddict considers two keys as different if and only if they do not compare equal (==).
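
The difference between matching (=:=) and comparing equal (==) can be seen with the integer 1 and the float 1.0. The following is a minimal sketch; the module name key_compare_demo and the function sizes/0 are hypothetical and only serve as an illustration.

```erlang
-module(key_compare_demo).
-export([sizes/0]).

%% Store the integer 1 and the float 1.0 as keys in both containers.
%% 1 =:= 1.0 is false (the terms do not match), but 1 == 1.0 is true.
sizes() ->
    D = dict:store(1.0, b, dict:store(1, a, dict:new())),
    O = orddict:store(1.0, b, orddict:store(1, a, orddict:new())),
    %% dict keeps two separate entries; orddict replaces the first one.
    {dict:size(D), orddict:size(O)}.
```

Calling key_compare_demo:sizes() should return {2,1}: two keys in the dict, one in the orddict.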

DATA TYPES

       dictionary()
	 as returned by new/0

EXPORTS

       append(Key, Value, Dict1) -> Dict2

	      Types  Key = Value = term()
		     Dict1 = Dict2 = dictionary()

	      This function appends a new Value to the current list of values associated with Key. An exception is generated if the initial value
	      associated with Key is not a list of values.

       append_list(Key, ValList, Dict1) -> Dict2

	      Types  ValList = [Value]
		     Key = Value = term()
		     Dict1 = Dict2 = dictionary()

	      This function appends a list of values ValList to the current list of values associated with Key. An exception is generated if the
	      initial value associated with Key is not a list of values.

       erase(Key, Dict1) -> Dict2

	      Types  Key = term()
		     Dict1 = Dict2 = dictionary()

	      This function erases all items with a given key from a dictionary.

       fetch(Key, Dict) -> Value

	      Types  Key = Value = term()
		     Dict = dictionary()

	      This function returns the value associated with Key in the dictionary Dict. fetch assumes that the Key is present in the dictionary
	      and an exception is generated if Key is not in the dictionary.

       fetch_keys(Dict) -> Keys

	      Types  Dict = dictionary()
		     Keys = [term()]

	      This function returns a list of all keys in the dictionary.

       filter(Pred, Dict1) -> Dict2

	      Types  Pred = fun(Key, Value) -> bool()
		     Key = Value = term()
		     Dict1 = Dict2 = dictionary()

	      Dict2 is a dictionary of all keys and values in Dict1 for which Pred(Key, Value) is true.
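
As an illustration of filter/2, the sketch below keeps only the entries whose value is at least 10. The module name filter_demo, the function large_only/0 and the sample data are hypothetical.

```erlang
-module(filter_demo).
-export([large_only/0]).

%% Keep only the entries for which the predicate returns true.
large_only() ->
    D1 = dict:from_list([{a, 1}, {b, 20}, {c, 300}]),
    D2 = dict:filter(fun(_Key, Value) -> Value >= 10 end, D1),
    %% to_list/1 has no defined order, so sort for a stable result.
    lists:sort(dict:to_list(D2)).
```

filter_demo:large_only() should return [{b,20},{c,300}].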

       find(Key, Dict) -> {ok, Value} | error

	      Types  Key = Value = term()
		     Dict = dictionary()

	      This function searches for a key in a dictionary. Returns {ok, Value} where Value is the value associated with Key, or error if the
	      key is not present in the dictionary.
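
Because find/2 returns error instead of raising an exception, it is convenient when a key may be absent. The sketch below wraps it in a helper that falls back to the atom undefined; the module lookup_demo and the helper lookup/2 are hypothetical names, not part of dict.

```erlang
-module(lookup_demo).
-export([lookup/2, demo/0]).

%% lookup/2 is a hypothetical helper: it returns the stored value,
%% or the atom undefined when the key is not in the dictionary.
lookup(Key, Dict) ->
    case dict:find(Key, Dict) of
        {ok, Value} -> Value;
        error -> undefined
    end.

demo() ->
    D = dict:from_list([{host, "localhost"}]),
    {lookup(host, D), lookup(port, D)}.
```

lookup_demo:demo() should return {"localhost",undefined}.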

       fold(Fun, Acc0, Dict) -> Acc1

	      Types  Fun = fun(Key, Value, AccIn) -> AccOut
		     Key = Value = term()
		     Acc0 = Acc1 = AccIn = AccOut = term()
		     Dict = dictionary()

	      Calls Fun on successive keys and values of Dict together with an extra argument Acc (short for accumulator). Fun must return a new
	      accumulator which is passed to the next call. Acc0 is returned if the dictionary is empty. The evaluation order is undefined.
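
A minimal sketch of fold/3 that sums all values. Addition is used because its result does not depend on the undefined evaluation order. The module name fold_demo and the function total/1 are hypothetical.

```erlang
-module(fold_demo).
-export([total/1]).

%% Sum all values in the dictionary, starting from the accumulator 0.
%% For an empty dictionary the initial accumulator is returned unchanged.
total(Dict) ->
    dict:fold(fun(_Key, Value, Acc) -> Value + Acc end, 0, Dict).
```

For example, fold_demo:total(dict:from_list([{a,1},{b,2},{c,3}])) should return 6.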

       from_list(List) -> Dict

	      Types  List = [{Key, Value}]
		     Dict = dictionary()

	      This function converts the Key-Value list List to a dictionary.

       is_key(Key, Dict) -> bool()

	      Types  Key = term()
		     Dict = dictionary()

	      This function tests if Key is contained in the dictionary Dict.

       map(Fun, Dict1) -> Dict2

	      Types  Fun = fun(Key, Value1) -> Value2
		     Key = Value1 = Value2 = term()
		     Dict1 = Dict2 = dictionary()

	      map calls Fun on successive keys and values of Dict1 to return a new value for each key. The evaluation order is undefined.
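
As a short illustration of map/2, the sketch below doubles every value while leaving the keys unchanged. The module name map_demo and the function doubled/0 are hypothetical.

```erlang
-module(map_demo).
-export([doubled/0]).

%% Replace each value with twice its original value.
doubled() ->
    D1 = dict:from_list([{a, 1}, {b, 2}]),
    D2 = dict:map(fun(_Key, Value) -> Value * 2 end, D1),
    %% Sort the list form, since to_list/1 has no defined order.
    lists:sort(dict:to_list(D2)).
```

map_demo:doubled() should return [{a,2},{b,4}].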

       merge(Fun, Dict1, Dict2) -> Dict3

	      Types  Fun = fun(Key, Value1, Value2) -> Value
		     Key = Value1 = Value2 = Value = term()
		     Dict1 = Dict2 = Dict3 = dictionary()

	      merge merges two dictionaries, Dict1 and Dict2, to create a new dictionary. All the Key-Value pairs from both dictionaries are
	      included in the new dictionary. If a key occurs in both dictionaries then Fun is called with the key and both values to return a new
	      value. merge could be defined as:

	      merge(Fun, D1, D2) ->
		  fold(fun (K, V1, D) ->
			       update(K, fun (V2) -> Fun(K, V1, V2) end, V1, D)
		       end, D2, D1).

	      but is faster.
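
A usage sketch for merge/3, assuming that values for keys present in both dictionaries should be added together. The module name merge_demo, the function merged/0 and the sample data are hypothetical.

```erlang
-module(merge_demo).
-export([merged/0]).

%% Merge two dictionaries; for keys present in both, add the values.
merged() ->
    D1 = dict:from_list([{apples, 3}, {pears, 1}]),
    D2 = dict:from_list([{apples, 2}, {plums, 5}]),
    D3 = dict:merge(fun(_Key, V1, V2) -> V1 + V2 end, D1, D2),
    lists:sort(dict:to_list(D3)).
```

merge_demo:merged() should return [{apples,5},{pears,1},{plums,5}]. Fun is only called for apples, the one key that occurs in both inputs.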

       new() -> dictionary()

	      This function creates a new dictionary.

       size(Dict) -> int()

	      Types  Dict = dictionary()

	      Returns the number of elements in a Dict.

       store(Key, Value, Dict1) -> Dict2

	      Types  Key = Value = term()
		     Dict1 = Dict2 = dictionary()

	      This function stores a Key-Value pair in a dictionary. If the Key already exists in Dict1, the associated value is replaced by
	      Value.

       to_list(Dict) -> List

	      Types  Dict = dictionary()
		     List = [{Key, Value}]

	      This function converts the dictionary to a list representation.

       update(Key, Fun, Dict1) -> Dict2

	      Types  Key = term()
		     Fun = fun(Value1) -> Value2
		     Value1 = Value2 = term()
		     Dict1 = Dict2 = dictionary()

	      Update a value in a dictionary by calling Fun on the value to get a new value. An exception is generated if Key is not present in
	      the dictionary.

       update(Key, Fun, Initial, Dict1) -> Dict2

	      Types  Key = Initial = term()
		     Fun = fun(Value1) -> Value2
		     Value1 = Value2 = term()
		     Dict1 = Dict2 = dictionary()

	      Update a value in a dictionary by calling Fun on the value to get a new value. If Key is not present in the dictionary then Initial
	      will be stored as the first value. For example append/3 could be defined as:

	      append(Key, Val, D) ->
		  update(Key, fun (Old) -> Old ++ [Val] end, [Val], D).

       update_counter(Key, Increment, Dict1) -> Dict2

	      Types  Key = term()
		     Increment = number()
		     Dict1 = Dict2 = dictionary()

	      Add Increment to the value associated with Key and store this value. If Key is not present in the dictionary then Increment will be
	      stored as the first value.

	      This could be defined as:

	      update_counter(Key, Incr, D) ->
		  update(Key, fun (Old) -> Old + Incr end, Incr, D).

	      but is faster.
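
A common use of update_counter/3 is counting occurrences. The sketch below counts how often each item appears in a list; the module name counter_demo and the function count/1 are hypothetical.

```erlang
-module(counter_demo).
-export([count/1]).

%% Build a dictionary mapping each item to its number of occurrences.
%% The first occurrence stores 1; later ones add 1 to the stored count.
count(Items) ->
    lists:foldl(fun(Item, D) -> dict:update_counter(Item, 1, D) end,
                dict:new(), Items).
```

For example, lists:sort(dict:to_list(counter_demo:count([a,b,a]))) should return [{a,2},{b,1}].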

NOTES

       The functions append and append_list are included so we can store keyed values in a list accumulator. For example:

       > D0 = dict:new(),
	 D1 = dict:store(files, [], D0),
	 D2 = dict:append(files, f1, D1),
	 D3 = dict:append(files, f2, D2),
	 D4 = dict:append(files, f3, D3),
	 dict:fetch(files, D4).
       [f1,f2,f3]

       This saves the trouble of first fetching a keyed value, appending a new value to the list of stored values, and storing the result.

       The function fetch should be used if the key is known to be in the dictionary, otherwise find.

SEE ALSO

       gb_trees(3erl), orddict(3erl)

Ericsson AB							   stdlib 1.17.3							dict(3erl)
