Remove Duplicates on multiple Key Columns and get the Latest Record from Date/Time Column Post: 302797725

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Remove duplicates based on the two key columns

Hi All, I need to fetch unique records based on a key column (i.e., column 1), keeping the records that have the maximum value in column 2, in sorted order, and the duplicates have to be stored in another output file. Input: Input.txt 1234,0,x 1234,1,y 5678,10,z 9999,10,k...

2. Shell Programming and Scripting

Search based on 1,2,4,5 columns and remove duplicates in the same file.

Hi, I am unable to search the duplicates in a file based on the 1st,2nd,4th,5th columns in a file and also remove the duplicates in the same file. Source filename: Filename.csv "1","ccc","information","5000","temp","concept","new" "1","ddd","information","6000","temp","concept","new"...

3. Shell Programming and Scripting

need to remove duplicates based on key in first column and pattern in last column

Given a file such as this I need to remove the duplicates. 00060011 PAUL BOWSTEIN ad_waq3_921_20100826_010517.txt 00060011 PAUL BOWSTEIN ad_waq3_921_20100827_010528.txt 0624-01 RUT CORPORATION ad_sade3_10_20100827_010528.txt 0624-01 RUT CORPORATION ...

4. Shell Programming and Scripting

finding duplicates in csv based on key columns

Hi team, I have CSV files with 20 columns. I want to find the duplicates in a file based on column1, column10, column4, column6, column8 and column2: if those columns have the same values, the record should be treated as a duplicate. Can anyone help me find the duplicates? Thanks in advance. ...

5. Shell Programming and Scripting

Removing duplicates in fixed width file which has multiple key columns

Hi All, I have a requirement where I need to remove duplicates from a fixed-width file which has multiple key columns. I also need to capture the duplicate records in another file. The file has 8 columns. The key columns are col1 and col2. Col1 has a length of 8; col2 has a length of 3. ...

6. Shell Programming and Scripting

Remove the time from the date column

Hi, I have file named file1.txt with below contents cat file1.txt 1/29/2014 0:00,706886 1/30/2014 0:00,791265 1/31/2014 0:00,987087 2/1/2014 0:00,1098572 2/2/2014 0:00,572477 2/3/2014 0:00,701715 I want to display as below 1/29/2014,706886 1/30/2014,791265 1/31/2014,987087...

7. UNIX for Dummies Questions & Answers

Display latest record from file based on multiple columns combination

I have requirement to print latest record from file based on multiple columns combination. EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/16/2015 100004 03/16/2015 100005 001 EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/18/2015 140003 03/18/2015 140004 001 EWAPE EW1SLE0000 EW1SOMU01 ABORTED 03/18/2015 220006...

8. UNIX for Beginners Questions & Answers

Sort and remove duplicates in directory based on first 5 columns:

I have /tmp dir with filename as: 010020001_S-FOR-Sort-SYEXC_20160229_2212101.marker 010020001_S-FOR-Sort-SYEXC_20160229_2212102.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212104.marker 010020001-S-XOR-Sort-SYEXC_20160229_2212105.marker 010020001_S-ZOR-Sort-SYEXC_20160229_2212106.marker...

9. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, please bear with me, I need help. I am learning awk and am stuck on one issue. First point: I want to sum the values of columns 7, 9, 11, 13 and 15 if rows in column 5 are duplicates. No action is to be taken for rows where the value in column 5 is unique. Second point: For...

10. UNIX for Beginners Questions & Answers

Remove duplicates in a dataframe (table) keeping all the different cells of just one of the columns

Hello all, I need to filter a dataframe composed of several columns of data to remove the duplicates according to one of the columns. I did it with pandas. In the meantime, I need the last column, which contains all the different (non-redundant) data, to be preserved in the output like this: A ...

gb_trees

gb_trees(3erl)						     Erlang Module Definition						    gb_trees(3erl)

NAME

       gb_trees - General Balanced Trees

DESCRIPTION

       An  efficient implementation of Prof. Arne Andersson's General Balanced Trees. These have no storage overhead compared to unbalanced binary
       trees, and their performance is in general better than AVL trees.

       This module considers two keys as different if and only if they do not compare equal ( == ).
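
A short sketch of what comparing with == means in practice, assuming the usual Erlang term comparison in which the integer 1 and the float 1.0 compare equal (while 1 =:= 1.0 is false). The module name gb_key_eq_demo is hypothetical; only the gb_trees calls come from this page.

```erlang
-module(gb_key_eq_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    T = gb_trees:insert(1, int_key, gb_trees:empty()),
    %% 1 == 1.0 holds, so the float key finds the node stored under the integer key.
    {gb_trees:lookup(1.0, T), gb_trees:is_defined(1.0, T)}.
```

Callers who need 1 and 1.0 to be distinct keys would have to encode them differently before storing them.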

DATA STRUCTURE

       Data structure:

       - {Size, Tree}, where `Tree' is composed of nodes of the form:
	 - {Key, Value, Smaller, Bigger}, and the "empty tree" node:
	 - nil.

       There is no attempt to balance trees after deletions. Since deletions do not increase the height of a tree, this should be OK.

       The original balance condition h(T) <= ceil(c * log(|T|)) has been changed to the similar (but not quite equivalent) condition
       2 ^ h(T) <= |T| ^ c. This should also be OK.

       Performance is comparable to the AVL trees in the Erlang book (and faster in general due to less overhead); the difference is that deletion
       works for these trees, but not for the book's trees. Behaviour is logarithmic (as it should be).

DATA TYPES

       gb_tree() = a GB tree

EXPORTS

       balance(Tree1) -> Tree2

	      Types  Tree1 = Tree2 = gb_tree()

	      Rebalances Tree1. Note that this is rarely necessary, but may be motivated when a large number of nodes have been deleted from the
	      tree without further insertions. Rebalancing could then be forced in order to minimise lookup times, since deletion alone does not
	      rebalance the tree.
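
A minimal sketch of the situation described above: most nodes are deleted and the tree is then rebalanced explicitly. The module name gb_balance_demo and the sizes used are illustrative assumptions; the example only checks that the contents are unchanged and makes no claim about the speed-up.

```erlang
-module(gb_balance_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    T0 = gb_trees:from_orddict([{N, N} || N <- lists:seq(1, 1000)]),
    %% Delete 990 of the 1000 keys; delete/2 does not rebalance.
    T1 = lists:foldl(fun gb_trees:delete/2, T0, lists:seq(1, 990)),
    %% Force a rebalance before a lookup-heavy phase.
    T2 = gb_trees:balance(T1),
    {gb_trees:size(T2), gb_trees:keys(T1) =:= gb_trees:keys(T2)}.
```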

       delete(Key, Tree1) -> Tree2

	      Types  Key = term()
		     Tree1 = Tree2 = gb_tree()

	      Removes the node with key Key from Tree1; returns the new tree. Assumes that the key is present in the tree, crashes otherwise.

       delete_any(Key, Tree1) -> Tree2

	      Types  Key = term()
		     Tree1 = Tree2 = gb_tree()

	      Removes the node with key Key from Tree1 if the key is present in the tree, otherwise does nothing; returns the new tree.
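
The difference between delete/2 and delete_any/2 can be sketched as follows. The module name gb_delete_demo and the sample keys are hypothetical; the gb_trees functions are the ones documented on this page.

```erlang
-module(gb_delete_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    T0 = gb_trees:from_orddict([{a, 1}, {b, 2}]),
    %% delete/2 requires the key to be present.
    T1 = gb_trees:delete(a, T0),
    %% delete_any/2 tolerates a missing key and leaves the contents unchanged.
    T2 = gb_trees:delete_any(zzz, T1),
    %% When presence is uncertain, delete/2 can be guarded with is_defined/2.
    T3 = case gb_trees:is_defined(zzz, T2) of
             true  -> gb_trees:delete(zzz, T2);
             false -> T2
         end,
    gb_trees:to_list(T3).
```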

       empty() -> Tree

	      Types  Tree = gb_tree()

	      Returns a new empty tree.

       enter(Key, Val, Tree1) -> Tree2

	      Types  Key = Val = term()
		     Tree1 = Tree2 = gb_tree()

	      Inserts Key with value Val into Tree1 if the key is not present in the tree, otherwise updates Key to value Val in Tree1. Returns
	      the new tree.

       from_orddict(List) -> Tree

	      Types  List = [{Key, Val}]
		     Key = Val = term()
		     Tree = gb_tree()

	      Turns an ordered list List of key-value tuples into a tree. The list must not contain duplicate keys.
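
Because from_orddict/1 expects an ordered list without duplicate keys, unsorted input needs to be normalised first. One possible sketch, assuming orddict:from_list/1 from stdlib is an acceptable way to sort the pairs and collapse duplicate keys; the module name gb_from_list_demo is hypothetical.

```erlang
-module(gb_from_list_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    Unsorted = [{c, 3}, {a, 1}, {b, 2}],
    %% orddict:from_list/1 yields a key-ordered list with unique keys.
    T = gb_trees:from_orddict(orddict:from_list(Unsorted)),
    gb_trees:keys(T).
```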

       get(Key, Tree) -> Val

	      Types  Key = Val = term()
		     Tree = gb_tree()

	      Retrieves the value stored with Key in Tree. Assumes that the key is present in the tree, crashes otherwise.

       lookup(Key, Tree) -> {value, Val} | none

	      Types  Key = Val = term()
		     Tree = gb_tree()

	      Looks up Key in Tree; returns {value, Val}, or none if Key is not present.
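
A small sketch contrasting get/2 with lookup/2. The helper fetch/3, which returns a default for a missing key, and the module name gb_read_demo are hypothetical and not part of gb_trees.

```erlang
-module(gb_read_demo).
-export([demo/0, fetch/3]).

%% Hypothetical helper: returns Default instead of crashing on a missing key.
fetch(Key, Tree, Default) ->
    case gb_trees:lookup(Key, Tree) of
        {value, Val} -> Val;
        none         -> Default
    end.

demo() ->
    T = gb_trees:from_orddict([{1, one}, {2, two}]),
    %% get/2 is fine here because key 1 is known to be present.
    {gb_trees:get(1, T), fetch(3, T, undefined)}.
```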

       insert(Key, Val, Tree1) -> Tree2

	      Types  Key = Val = term()
		     Tree1 = Tree2 = gb_tree()

	      Inserts Key with value Val into Tree1; returns the new tree. Assumes that the key is not present in the tree, crashes otherwise.

       is_defined(Key, Tree) -> bool()

	      Types  Key = term()
		     Tree = gb_tree()

	      Returns true if Key is present in Tree, otherwise false.

       is_empty(Tree) -> bool()

	      Types  Tree = gb_tree()

	      Returns true if Tree is an empty tree, and false otherwise.

       iterator(Tree) -> Iter

	      Types  Tree = gb_tree()
		     Iter = term()

	      Returns an iterator that can be used for traversing the entries of Tree; see next/1. The implementation of this is very efficient;
	      traversing the whole tree using next/1 is only slightly slower than getting the list of all elements using to_list/1 and traversing
	      that. The main advantage of the iterator approach is that it does not require the complete list of all elements to be built in
	      memory at one time.

       keys(Tree) -> [Key]

	      Types  Tree = gb_tree()
		     Key = term()

	      Returns the keys in Tree as an ordered list.

       largest(Tree) -> {Key, Val}

	      Types  Tree = gb_tree()
		     Key = Val = term()

	      Returns {Key, Val}, where Key is the largest key in Tree, and Val is the value associated with this key. Assumes that the tree is
	      nonempty.

       map(Function, Tree1) -> Tree2

	      Types  Function = fun(K, V1) -> V2
		     Tree1 = Tree2 = gb_tree()

	      Maps the function Function(K, V1) -> V2 over all key-value pairs of the tree Tree1 and returns a new tree Tree2 with the same set of
	      keys as Tree1 and the new set of values V2.
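
For illustration, a sketch of map/2 that scales every value while leaving the keys untouched. The module name gb_map_demo and the sample data are hypothetical.

```erlang
-module(gb_map_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    T1 = gb_trees:from_orddict([{a, 1}, {b, 2}]),
    %% The fun receives the key and the old value and returns the new value.
    T2 = gb_trees:map(fun(_K, V) -> V * 10 end, T1),
    gb_trees:to_list(T2).
```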

       next(Iter1) -> {Key, Val, Iter2} | none

	      Types  Iter1 = Iter2 = Key = Val = term()

	      Returns {Key, Val, Iter2} where Key is the smallest key referred to by the iterator Iter1, and Iter2 is the new iterator to be used
	      for traversing the remaining nodes, or the atom none if no nodes remain.
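
The iterator/1 and next/1 pair can be combined into a fold over the entries in ascending key order, without building the full list first. The helper fold/3 and the module name gb_iter_demo are hypothetical, written only for this sketch.

```erlang
-module(gb_iter_demo).
-export([demo/0, fold/3]).

%% Hypothetical helper: folds Fun(Key, Val, Acc) over Tree in key order.
fold(Fun, Acc, Tree) ->
    walk(Fun, Acc, gb_trees:next(gb_trees:iterator(Tree))).

%% next/1 returns none when no nodes remain.
walk(_Fun, Acc, none) ->
    Acc;
walk(Fun, Acc, {Key, Val, Iter}) ->
    walk(Fun, Fun(Key, Val, Acc), gb_trees:next(Iter)).

demo() ->
    T = gb_trees:from_orddict([{a, 1}, {b, 2}, {c, 3}]),
    %% Sum all values.
    fold(fun(_K, V, Sum) -> Sum + V end, 0, T).
```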

       size(Tree) -> int()

	      Types  Tree = gb_tree()

	      Returns the number of nodes in Tree.

       smallest(Tree) -> {Key, Val}

	      Types  Tree = gb_tree()
		     Key = Val = term()

	      Returns {Key, Val}, where Key is the smallest key in Tree, and Val is the value associated with this key. Assumes that the tree is
	      nonempty.

       take_largest(Tree1) -> {Key, Val, Tree2}

	      Types  Tree1 = Tree2 = gb_tree()
		     Key = Val = term()

	      Returns {Key, Val, Tree2}, where Key is the largest key in Tree1, Val is the value associated with this key, and Tree2 is this
	      tree with the corresponding node deleted. Assumes that the tree is nonempty.

       take_smallest(Tree1) -> {Key, Val, Tree2}

	      Types  Tree1 = Tree2 = gb_tree()
		     Key = Val = term()

	      Returns {Key, Val, Tree2}, where Key is the smallest key in Tree1, Val is the value associated with this key, and Tree2 is this
	      tree with the corresponding node deleted. Assumes that the tree is nonempty.
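
One use of take_smallest/1 is as a simple priority queue in which the key is the priority. The sketch below drains such a queue in key order; the helper drain/1, the module name gb_pq_demo and the sample priorities are hypothetical. The is_empty/1 check is needed because take_smallest/1 assumes a nonempty tree.

```erlang
-module(gb_pq_demo).
-export([demo/0, drain/1]).

%% Hypothetical helper: returns all values ordered by ascending key.
drain(Tree) ->
    case gb_trees:is_empty(Tree) of
        true ->
            [];
        false ->
            {_Key, Val, Rest} = gb_trees:take_smallest(Tree),
            [Val | drain(Rest)]
    end.

demo() ->
    T0 = gb_trees:empty(),
    T1 = gb_trees:insert(3, low, T0),
    T2 = gb_trees:insert(1, high, T1),
    T3 = gb_trees:insert(2, mid, T2),
    drain(T3).
```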

       to_list(Tree) -> [{Key, Val}]

	      Types  Tree = gb_tree()
		     Key = Val = term()

	      Converts a tree into an ordered list of key-value tuples.

       update(Key, Val, Tree1) -> Tree2

	      Types  Key = Val = term()
		     Tree1 = Tree2 = gb_tree()

	      Updates Key to value Val in Tree1; returns the new tree. Assumes that the key is present in the tree.
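
The three write operations differ only in what they assume about the key: insert/3 expects it to be absent, update/3 expects it to be present, and enter/3 accepts either case. A minimal sketch, with a hypothetical module name gb_write_demo; the failed insert/3 is caught only to make the crash visible.

```erlang
-module(gb_write_demo).
-export([demo/0]).

%% Hypothetical demo module, not part of stdlib.
demo() ->
    T0 = gb_trees:empty(),
    T1 = gb_trees:insert(k, 1, T0),   %% k must be absent
    T2 = gb_trees:update(k, 2, T1),   %% k must be present
    T3 = gb_trees:enter(k, 3, T2),    %% k present: value is overwritten
    T4 = gb_trees:enter(j, 0, T3),    %% j absent: pair is inserted
    %% insert/3 on an existing key crashes; catch the error to report it.
    Dup = try gb_trees:insert(k, 9, T4) of
              _ -> inserted
          catch
              error:_ -> key_exists
          end,
    {gb_trees:to_list(T4), Dup}.
```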

       values(Tree) -> [Val]

	      Types  Tree = gb_tree()
		     Val = term()

	      Returns the values in Tree as an ordered list, sorted by their corresponding keys. Duplicates are not removed.

SEE ALSO

       gb_sets(3erl), dict(3erl)

Ericsson AB							   stdlib 1.17.3						    gb_trees(3erl)
