Help - manipulate data by columns and repeated


 
# 1  
Old 07-19-2016

Hello, good afternoon everyone.
I'm new to the forum and would like to ask for your help with handling some data. I hope my English is clear.

I have a file (Dato01.txt) that contains the following structure.

Code:
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52

I need to move the newer repeated entries (repeated by name, column 1) into a second file and leave only the oldest entry for each name in the original file. The result should look something like this:

(Dato01.txt)
Code:
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11

(Dato02.txt)
Code:
# Col1  -  Col2 - Col3 - Col4
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:26:52

I tried it with "for, uniq, grep" but could not find the right formula. If someone can help me, thank you very much.




Moderator's Comments:
Mod Comment Please use code tags as required by forum rules!

Last edited by RudiC; 07-19-2016 at 04:45 PM.. Reason: Added code tags
# 2  
Old 07-19-2016
Handling dates is one of the most difficult tasks, especially with non-numeric month values. Fortunately, my sort (GNU coreutils 8.25) offers the
Quote:
-M, --month-sort
compare (unknown) < 'JAN' < ... < 'DEC'
option. If yours does too, try

Code:
sort -k1,1 -k3M Dato01.txt | awk 'T[$1] {print > "Dato03.txt"} !T[$1]++ {print > "Dato02.txt"}'
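A minimal sketch along the same lines, in case the file names and the header line matter: the one-liner above sends first occurrences to Dato02.txt and repeats to Dato03.txt, whereas the request was for repeats in Dato02.txt and the oldest entries left in Dato01.txt. The variant below assumes a sort that supports -M (GNU and BSD sort do; it is not POSIX), keeps the header out of the sort, and uses Dato01.new as a hypothetical scratch file name. It builds its own copy of the sample data in a temporary directory so it can be tried safely.

```shell
#!/bin/sh
# Work on a scratch copy of the sample data.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > Dato01.txt <<'DATA'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
DATA

{
  # Pass the header through unsorted, then sort the data lines by
  # name, month (-M is an assumption about your sort), day and time.
  head -n 1 Dato01.txt
  tail -n +2 Dato01.txt | LC_ALL=C sort -k1,1 -k3,3M -k4,4n -k5,5
} | awk '
  NR == 1    { print > "Dato01.new"; print > "Dato02.txt"; next }  # header goes to both files
  seen[$1]++ { print > "Dato02.txt"; next }                        # name already seen: newer repeat
             { print > "Dato01.new" }                              # first (oldest) entry for this name
' && mv Dato01.new Dato01.txt   # Dato01.new is a hypothetical scratch name
```

Because the data are sorted by name first, Dato01.txt ends up in alphabetical order by name rather than in its original order.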

# 3  
Old 07-19-2016
Code:
infile=Dato01.txt
outfile=Dato02.txt

[[ ! -f $infile.bk ]] && cp $infile $infile.bk

ex $infile <<EDIT
$(awk '
  NR==1 {print ":" NR " w " outfile; next;}
  {if (a[$1]++) r[c++]=NR}
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile;
     for (i=c-1; i>=0; i--) print ":" r[i] " d";
     print ":wq!";
  }
' outfile=$outfile $infile)
EDIT

# 4  
Old 07-25-2016
Thank you very much rdrtx1, the script does exactly what I need. Would it be too much to ask you to explain it a little? Someone else may have the same question, and comments would be a great help. Again, many thanks for helping with my question.
# 5  
Old 07-25-2016
Code:
infile=Dato01.txt                                            # set input file name variable
outfile=Dato02.txt                                           # set second file (repeat lines) name variable

[[ ! -f $infile.bk ]] && cp $infile $infile.bk               # backup input file

ex $infile <<EDIT                                            # invoke inline editor (ex) for input file (ex script built by awk)
$(awk '                                                      # use awk to build commands for ex
  NR==1 {print ":" NR " w " outfile; next;}                  # write first line to second file (done by ex)
  {if (a[$1]++) r[c++]=NR}                                   # build repeat lines array
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile;    # write repeat lines to second file (done by ex)
     for (i=c-1; i>=0; i--) print ":" r[i] " d";             # delete repeat lines from input file (done by ex)
     print ":wq!";                                           # write input file (done by ex)
  }
  # The closing quote on the next line ends the awk program.
  # EDIT, alone on its line, then ends the ex script.
' outfile=$outfile $infile)
EDIT
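
To see what ex actually receives, the awk part can be run on its own. This is just a sketch that recreates the sample data in a temporary directory and prints the generated commands instead of feeding them to ex; the ex_cmds variable name is only for illustration.

```shell
#!/bin/sh
# Work on a scratch copy of the sample data.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > Dato01.txt <<'DATA'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
DATA

outfile=Dato02.txt

# Same awk program as in the script, but its output is captured
# (ex_cmds is an illustrative name) rather than handed to ex.
ex_cmds=$(awk '
  NR==1 {print ":" NR " w " outfile; next}
  {if (a[$1]++) r[c++]=NR}
  END {
     for (i=0; i<c; i++) print ":" r[i] " w >> " outfile
     for (i=c-1; i>=0; i--) print ":" r[i] " d"
     print ":wq!"
  }
' outfile="$outfile" Dato01.txt)

printf '%s\n' "$ex_cmds"
```

For the sample data the commands come out as ":1 w Dato02.txt", ":4 w >> Dato02.txt", ":7 w >> Dato02.txt", ":7 d", ":4 d" and ":wq!". The deletes run from the bottom up so that removing line 7 does not shift line 4 before it is deleted.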

# 6  
Old 07-25-2016
Here is another way to do what rdrtx1 was doing, using just awk to create two output files and cp to copy the updated version back over the input file when it is done. Of course, both of these suggestions depend on the entries in your input file always being in increasing time order (as in your sample data):
Code:
#!/bin/ksh
# We can't use awk to overwrite the input file directly, so we create a
# temporary output file with the lines from the input file that are to be kept
# and a duplicate output file with the lines for names that appear two or more
# times in the input file.
#
# When awk completes, if it was successful, we'll copy the temporary output file
# back to the input file.  Otherwise, the input file will not be changed.

InFile="Dato01.txt"		# Name the input file.
DupFile="Dato02.txt"		# Name the output file for duplicates.
TempFile="$InFile.$$"		# Name the temporary output file.

trap 'rm -f "$TempFile"' EXIT	# When the script completes, remove the temp file.

awk -v new="$TempFile" -v dup="$DupFile" '
NR == 1 {
	# Copy the header line from the input file to both output files.
	print > new
	print > dup
	next
}
{	if($1 in seen) {
		# We have seen this person before.  Copy this line to the
		# duplicates file.
		print > dup
	} else {
		# We have not seen this person before.  Copy this line to the
		# temporary file (which will replace the input file when we are
		# done).
		print > new

		# Note that we have seen this person.
		seen[$1]
	}
}' "$InFile" && cp "$TempFile" "$InFile"

This was written and tested using a Korn shell, but this should work with any shell that uses basic Bourne shell syntax (including ash, bash, dash, ksh, zsh, and several others; but not csh and its derivatives).

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
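
Since both approaches rely on the input already being in increasing time order, a quick check before running either one may be worthwhile. This is only a sketch: in_time_order is a made-up helper name, it assumes a sort that supports -M and -s (GNU and BSD sort do), and because the data carry no year it cannot detect a wrap from December to January.

```shell
#!/bin/sh
# in_time_order FILE (hypothetical helper): succeed if the lines after the
# header are ordered by month (field 3), day (field 4) and time (field 5).
in_time_order() {
	# -c only checks the order; -s stops sort from comparing whole lines
	# when two entries carry the same timestamp.
	tail -n +2 "$1" |
	LC_ALL=C sort -c -s -k3,3M -k4,4n -k5,5 2>/dev/null
}

# Demonstration on a scratch copy of the sample data.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
cat > Dato01.txt <<'DATA'
# Col1  -  Col2 - Col3 - Col4
Patricia started Jun 22 05:22:58
Carolina started Jun 22 05:23:03
Carolina started Jun 22 05:23:37
Andrea   started Jun 22 05:25:52
Ana      started Jun 22 05:26:11
Andrea   started Jun 22 05:26:52
DATA

if in_time_order Dato01.txt; then
	echo "Dato01.txt is in time order"
else
	echo "Dato01.txt is NOT in time order; sort it first"
fi
```

If the check fails, sorting the data lines with the same keys before running either script should restore the assumption.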