Help - manipulate data by columns and repeated

07-19-2016


Hello, and good afternoon, everyone.
I'm new to the forum and would like to ask for your help with processing some data. I hope my English is clear.

I have a file (Dato01.txt) with the following structure:

Code:

# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52

I need to move the newer repeated entries (same name in the first column) to a separate file and leave only the oldest entry for each name in the original file. It should look something like this:

(Dato01.txt)

Code:

# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11

(Dato02.txt)

Code:

# Col1  -  Col2 - Col3 - Col4
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:26:52

I tried it with for, uniq, and grep but cannot find the right combination. If someone can help me, thank you very much.


kelevra
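The tools mentioned in the question can at least identify which names repeat. uniq only compares adjacent lines, so the names have to be extracted and sorted first. A minimal sketch, assuming a POSIX shell; the mktemp scratch file and the variable name `repeated` are only for illustration, and this lists repeated names rather than splitting the file:

```shell
#!/bin/sh
# Sketch: list the names that occur more than once in the sample file.
# It only reports repeated names; it does not split the file.
tmp=$(mktemp)    # scratch copy of the sample data (illustrative)
cat > "$tmp" <<'EOF'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
EOF

# Skip the header (NR > 1) and print column 1, sort so equal names become
# adjacent, then uniq -d prints one copy of each name that appears twice or more.
repeated=$(awk 'NR > 1 {print $1}' "$tmp" | sort | uniq -d)
printf '%s\n' "$repeated"
rm -f "$tmp"
```

For the sample data this prints Andrea and Carolina; the answers below go further and split the lines themselves.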


07-19-2016


Handling dates is one of the most difficult tasks, especially with non-numeric month values. Fortunately, my sort (GNU coreutils 8.25) offers the

Quote:

-M, --month-sort
compare (unknown) < 'JAN' < ... < 'DEC'

option. If yours does too, try

Code:

# LC_ALL=C: English month names for -M, and the "#" header line sorts first
LC_ALL=C sort -k1,1 -k3M Dato01.txt | awk 'T[$1] {print > "Dato03.txt"} !T[$1]++ {print > "Dato02.txt"}'

RudiC
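A sketch for trying the sort | awk one-liner on the posted sample without touching real files. It assumes a sort that supports -M (as in GNU coreutils) and a POSIX awk; the mktemp scratch directory and the LC_ALL=C prefix are assumptions added for this sketch (the C locale makes -M expect English month names such as "Jun" and makes the "#" header line sort first). Note that the one-liner writes the oldest line per name to Dato02.txt and the later repeats to Dato03.txt, both in name order, and leaves Dato01.txt unchanged.

```shell
#!/bin/sh
# Sketch: run the sort | awk one-liner on the sample data in a scratch directory.
# Assumes a sort that supports -M (e.g. GNU coreutils) and a POSIX awk.
dir=$(mktemp -d)
cat > "$dir/Dato01.txt" <<'EOF'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
EOF

(
    cd "$dir" || exit 1
    # Name order first, then month; LC_ALL=C (an assumption) gives English month
    # names for -M and byte order, so the "#" header sorts first.
    LC_ALL=C sort -k1,1 -k3M Dato01.txt |
        awk 'T[$1] {print > "Dato03.txt"} !T[$1]++ {print > "Dato02.txt"}'
)

echo "== oldest line per name (Dato02.txt) =="; cat "$dir/Dato02.txt"
echo "== later repeats (Dato03.txt) ==";        cat "$dir/Dato03.txt"
```

Within one name the month keys are equal, so sort falls back to comparing whole lines, which orders the fixed-width timestamps correctly for this sample.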


07-19-2016


Code:

infile=Dato01.txt
outfile=Dato02.txt

[[ ! -f $infile.bk ]] && cp "$infile" "$infile.bk"

ex "$infile" <<EDIT
$(awk '
  NR==1 {print ":" NR " w! " outfile; next;}
  {if (a[$1]++) r[c++]=NR}
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile;
     for (i=c-1; i>=0; i--) print ":" r[i] " d";
     print ":wq!";
  }
' outfile="$outfile" "$infile")
EDIT


rdrtx1
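To see what rdrtx1's script asks ex to do without modifying anything, you can run only the awk part and print the generated ex commands. A sketch under these assumptions: the sample data lives in a mktemp scratch file, the command list is captured in a hypothetical shell variable named `script`, and the first write is shown as `w!` so that an existing Dato02.txt would be overwritten on a rerun (with a plain `w`, ex refuses to overwrite an existing file).

```shell
#!/bin/sh
# Sketch: print the ex commands generated by the awk part of the script,
# without running ex, so no file is modified.
tmp=$(mktemp)    # scratch copy of the sample data (illustrative)
cat > "$tmp" <<'EOF'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
EOF

script=$(awk '
  NR==1 {print ":" NR " w! " outfile; next}      # header: (re)create the repeats file
  {if (a[$1]++) r[c++]=NR}                       # remember line numbers of repeated names
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile   # append each repeat, top-down
     for (i=c-1; i>=0; i--) print ":" r[i] " d"            # delete bottom-up so numbers stay valid
     print ":wq!"                                          # save the input file and quit
  }
' outfile=Dato02.txt "$tmp")
printf '%s\n' "$script"
rm -f "$tmp"
```

For the sample, lines 4 and 7 are the repeats: they are appended to Dato02.txt in order, then deleted as 7 before 4, because deleting line 4 first would shift the old line 7 up to line 6.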


07-25-2016


Thank you very much, rdrtx1; the script does exactly what I need. Would it be too much to ask you to explain it a little? Someone else may have the same question, and comments would be a great help. Again, many thanks for your help.

kelevra


07-25-2016


Code:

infile=Dato01.txt                                            # set input file name variable
outfile=Dato02.txt                                           # set second file (repeat lines) name variable

[[ ! -f $infile.bk ]] && cp "$infile" "$infile.bk"           # back up input file (only if no backup exists yet)

# No comments may follow the closing ")" or the EDIT line below: they would become
# ex commands, and EDIT must stand alone on its line to end the here-document.
ex "$infile" <<EDIT                                          # run the line editor ex on the input file (ex script built by awk)
$(awk '                                                      # use awk to build commands for ex
  NR==1 {print ":" NR " w! " outfile; next;}                 # write first (header) line to second file, replacing it (done by ex)
  {if (a[$1]++) r[c++]=NR}                                   # a[$1]++ is 0 on a first sighting: save line numbers of repeats
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile;    # append repeat lines to second file (done by ex)
     for (i=c-1; i>=0; i--) print ":" r[i] " d";             # delete repeat lines bottom-up so line numbers stay valid (done by ex)
     print ":wq!";                                           # write input file and quit (done by ex)
  }                                                          # end of awk program
' outfile="$outfile" "$infile")
EDIT

rdrtx1
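The comment about saving line numbers of repeats relies on a common awk idiom: `a[$1]++` yields the count seen so far before incrementing it, so it is 0 (false) the first time a name appears and nonzero (true) afterwards. A small sketch that prints that value for each line; the names here are made up for illustration:

```shell
#!/bin/sh
# Sketch: show what a[$1]++ evaluates to on each line.
# Post-increment returns the old value: 0 on a first sighting, then 1, 2, ...
out=$(printf '%s\n' Ana Bob Ana Ana | awk '{print $1, a[$1]++}')
printf '%s\n' "$out"
```

The same idiom on its own, `awk '!a[$1]++' file`, prints only the first line for each name.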


07-25-2016


Here is another way to do what rdrtx1's script does, using just awk to create two output files and cp to copy the updated version of the input file back over the input file when it is done. Of course, both of these suggestions depend on the entries in your input file always being in increasing time order (as in your sample data):

Code:

#!/bin/ksh
# We can't use awk to overwrite the input file directly, so we create a
# temporary output file with the lines from the input file that are to be kept
# and a duplicate output file with the lines for names that appear two or more
# times in the input file.
#
# When awk completes, if it was successful, we'll copy the temporary output file
# back to the input file.  Otherwise, the input file will not be changed.

InFile="Dato01.txt"		# Name the input file.
DupFile="Dato02.txt"		# Name the output file for duplicates.
TempFile="$InFile.$$"		# Name the temporary output file.

trap 'rm -f "$TempFile"' EXIT	# When the script completes, remove the temp file.

awk -v new="$TempFile" -v dup="$DupFile" '
NR == 1 {
	# Copy the header line from the input file to both output files.
	print > new
	print > dup
	next
}
{	if($1 in seen) {
		# We have seen this person before.  Copy this line to the
		# duplicates file.
		print > dup
	} else {
		# We have not seen this person before.  Copy this line to the
		# temporary file (which will replace the input file when we are
		# done).
		print > new

		# Note that we have seen this person.
		seen[$1]
	}
}' "$InFile" > "$TempFile" && cp "$TempFile" "$InFile"

This was written and tested using a Korn shell, but it should work with any shell that uses basic Bourne shell syntax (including ash, bash, dash, ksh, zsh, and several others; but not csh and its derivatives).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

Don Cragun
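Both suggestions keep the first line seen for each name, so they assume the file is already in increasing time order. If that might not hold, the data lines could be put into time order first. A sketch, assuming a sort with -M (as in GNU coreutils), a header on line 1, and timestamps of the form "Mon DD HH:MM:SS" in fields 3 to 5 with no year (so entries that cross a year boundary are not handled); the shuffled sample, the mktemp file, and the variable name `sorted` are illustrative:

```shell
#!/bin/sh
# Sketch: put the data lines into increasing time order, keeping the header first.
# Assumes: sort with -M (e.g. GNU coreutils), header on line 1, and times as
# "Mon DD HH:MM:SS" in fields 3-5 with no year (year boundaries are NOT handled).
tmp=$(mktemp)    # scratch file with the sample lines shuffled (illustrative)
cat > "$tmp" <<'EOF'
# Col1  -  Col2 - Col3 - Col4
Andrea   started Jun 22 05:26:52
Carolina started Jun 22 05:23:37
Patricia started Jun 22 05:22:58
Ana      started Jun 22 05:26:11
Carolina started Jun 22 05:23:03
Andrea   started Jun 22 05:25:52
EOF

sorted=$(
    sed -n 1p "$tmp"                            # header line, unchanged
    sed 1d "$tmp" |
        LC_ALL=C sort -b -k3,3M -k4,4n -k5,5    # month, then day, then HH:MM:SS
)
printf '%s\n' "$sorted"
rm -f "$tmp"
```

The result can then be saved as Dato01.txt and processed by either script; sort is used for this step because -M already understands month names.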
