Read values in each col starting 3rd row. Print occurrence value.


 
# 1  
Old 04-10-2016

Hello Friends,

Hope all are doing fine.

Here is a tricky issue.

My input file looks like this:
Code:
07 10 14 20 21
03 15 27 30 32
01 10 11 19 30
02 06 14 15 17
01 06 20 25 29


Logic:
1. Print an additional column of "0-0-0-0-0" for the first and second rows.
2. Read the first column of the third row, which is 1. Look for this value in all columns of the first and second rows. 1 is not present in either row, so print a value of 2 for it.
3. Then read the second column of the third row, which is 10. 10 is in the first row but not in the second, so it skipped only the second row. Print a value of 1.
4. Then read the third column of the third row, which is 11. 11 does not appear in the first or second rows, so print a value of 2.
5. 19 does not appear in the first or second rows, so its value is 2.
6. 30 did not appear in the first row, but it appears in the second row, which is the IMMEDIATE row above the row being read (row 3). Since no rows were skipped, print 0 for this one.

So far the output will be

Code:
07 10 14 20 21 0-0-0-0-0
03 15 27 30 32 0-0-0-0-0
01 10 11 19 30 2-1-2-2-0
02 06 14 15 17
01 06 20 25 29

Logic Continued:
7. Now read the first column of the fourth row, which is 2. Look for this value anywhere in the three rows above. 2 is not present in any of them, so print a value of 3 for it.
8. Now read the second column of the 4th row, which is 6. 6 is also not present in any of the three rows above, so its value is also 3.
9. Read the third column of the 4th row, which is 14. 14 is present in the first row but not in the second or third rows, so it skipped two rows. Print a value of 2 for this.
10. Read the fourth column of the 4th row, which is 15. It is not present in the first row. Fine. It is present in the second row and not in the third row, so it skipped one IMMEDIATE row, the third. We don't care about the first row here. All that matters is the number of rows a value skipped since it last appeared in the input file. So the value for 15 is 1.
11. Read the last column of the 4th row, which is 17. It is not present in any of the three rows above, so print 3. Basically, if a value is not present in any row above the current row, we HYPOTHESIZE that this value was PRESENT immediately before the first line of the input. That is why we print 3 for the values that are not seen in any of the rows above row number 4.

So far, the output looks like this

Code:
07 10 14 20 21 0-0-0-0-0
03 15 27 30 32 0-0-0-0-0
01 10 11 19 30 2-1-2-2-0
02 06 14 15 17 3-3-2-1-3
01 06 20 25 29

Logic Continued:
12. Now the last row's first value, which is 01. This is present in the third row and skipped the 4th, so its value is 1. Once you find a value in a row above the current row, you DON'T have to look any further up, because you have already seen that value.
13. Second column of the last row, which is 06. This is present in the 4th row, so the value is zero. DO NOT check any lines above, because the value has already been encountered.
14. Third column of the last row, which is 20. It is present in the first row but not in the second, third or fourth rows, so it skipped three rows. Print a value of 3 for this.
15. Fourth column of the last row, which is 25. This is not present anywhere. Remember our hypothesis: this value occurred before the first line. So we print 4 for this.
16. Fifth column of the last row, which is 29. Present nowhere, so print a value of 4.

Here is the final output

Code:
07 10 14 20 21 0-0-0-0-0
03 15 27 30 32 0-0-0-0-0
01 10 11 19 30 2-1-2-2-0
02 06 14 15 17 3-3-2-1-3
01 06 20 25 29 1-0-3-4-4
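
The steps above can be read literally as an upward scan: for each value, walk up from the row just above and count rows until the value turns up, or until the rows run out. Here is a minimal sketch of that reading, assuming whitespace-separated values that are compared as strings. The function name skip_scan and the seen array are made up for illustration and are not part of any posted solution.

```shell
# Hypothetical helper: append the skip-count column by scanning upward.
skip_scan() {
  awk '
  {
      out = ""
      for (i = 1; i <= NF; i++) {
          skipped = 0
          if (NR > 2) {
              # Walk upward from the row just above; stop at the first hit.
              for (r = NR - 1; r >= 1; r--) {
                  if ((r, $i) in seen)
                      break
                  skipped++
              }
          }
          sep = (i > 1) ? "-" : ""
          out = out sep skipped
      }
      # Register this row only after scoring it, so a row never matches itself.
      for (i = 1; i <= NF; i++)
          seen[NR, $i] = 1
      print $0, out
  }' "$@"
}
```

Called as skip_scan file (or with the data on standard input), this should reproduce the final output shown above for the five sample rows. A value that is never found runs the scan to the top, which gives the same number as the "present before the first line" hypothesis.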

I would also like the frequency of each unique number in the output column, like this:

Code:
0=12times
1=3times
2=4times
3=4times
4=2times

Please ask if you have any questions or if anything is unclear.

P.S:
a. My columns are always 5.
b. My input file always has 25 records only.
c. A bonus of 5000 bits will be awarded to the best working solution.

Thank You!

Last edited by jacobs.smith; 04-10-2016 at 04:03 PM.. Reason: code tags format
# 2  
Old 04-10-2016
Why is the column added to row 2 always filled with 0-0-0-0-0? Why aren't entries in that row set to 1 if the number in a given column in row 2 is not present in row 1? In the given example, why shouldn't the last field in the output for row 2 be 1-1-1-1-1?

Other than being an interesting puzzle, does this problem address some real-world issue?
# 3  
Old 04-10-2016
In the secondary output:
Code:
0=12times
1=3times
2=4times
3=4times
4=2times

where do these numbers come from?

If you're counting the number of times a digit appears in the input, 0 occurs 13 times (not 12 times) in your sample input. If you're counting the number of times a value appears in your sample input, 0 (or 00) does not appear at all???

All of your input values are two digit strings. Are we supposed to treat 01 and 1 as the same value or as distinct values? If they are the same, is 010 to be treated as an octal value (decimal 8) or as a decimal value (10)?
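
To make the difference concrete, here is a small sketch of how the choice of array key changes the counts in awk. The function count_keys and its "string"/"number" switch are hypothetical names used only for this demonstration. Used directly as a subscript, a field keeps its text, so 01 and 1 are distinct keys; adding 0 forces a numeric conversion first, so they collapse into one. How a value such as 010 converts is worth checking on the awk in use, so it is left out of the example.

```shell
# Hypothetical demo: count values with string keys or with numeric keys.
count_keys() {
  awk -v mode="$1" '
  {
      for (i = 1; i <= NF; i++) {
          # "number" mode converts the field to a number before using it as a key.
          key = (mode == "number") ? $i + 0 : $i
          c[key]++
      }
  }
  END {
      for (k in c)
          printf("%s=%d\n", k, c[k]) | "sort -n"
      close("sort -n")
  }'
}
```

For the input line "01 1 10 10", count_keys string reports three keys (01, 1 and 10), while count_keys number reports two (1 and 10).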
# 4  
Old 04-10-2016
Assuming that data values are strings (not numbers that need to be converted to a canonical format), and that you want a count of the number of times a string appears in your input file, the following awk script seems to come close to what you said you wanted:
Code:
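awk '
{	for(i = 1; i <= NF; i++) {
		c[$i]++		# count every appearance of this input value
		# lr[$i] is the last row that held this value (0 if never seen),
		# so NR - lr[$i] - 1 is the number of rows it skipped.
		if(NR > 2)
			lf = ((i > 1) ? lf "-" : "") (NR - lr[$i] - 1)
		else	lf = (i > 1) ? lf "-0" : "0"	# rows 1 and 2 always get zeros
	}
	# Update the last-seen rows only after the whole row has been scored.
	for(i = 1; i <= NF; i++)
		lr[$i] = NR
	print $0, lf
}
END {	cmd = "sort -t="
	printf("\n%d rows containing %d columns processed.\n\n", NR, NF)
	# Send the per-value counts through sort so they print in order.
	for(i in c)
		printf("%s=%dtime%s\n", i, c[i], (c[i] == 1) ? "" : "s") | cmd
	close(cmd)
}' file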

producing the following output from your sample data:
Code:
07 10 14 20 21 0-0-0-0-0
03 15 27 30 32 0-0-0-0-0
01 10 11 19 30 2-1-2-2-0
02 06 14 15 17 3-3-2-1-3
01 06 20 25 29 1-0-3-4-4

5 rows containing 5 columns processed.

01=2times
02=1time
03=1time
06=2times
07=1time
10=2times
11=1time
14=2times
15=2times
17=1time
19=1time
20=2times
21=1time
25=1time
27=1time
29=1time
30=2times
32=1time

(although if I were specifying the output format, I'd put spaces around the equal signs and before the "time" in the secondary output).

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
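
The counts in the original request (0=12times and so on) look like a tally of the numbers in the added column rather than of the input values. If that reading is right, something along these lines might produce them. This is only a sketch: skip_freq is a made-up name, and it assumes the added column is the last field of each row and has the form n-n-n-n-n.

```shell
# Hypothetical helper: tally the numbers found in the added skip-count column.
skip_freq() {
  awk '
  # Only lines whose last field looks like "2-1-2-2-0" carry skip counts.
  $NF ~ /^[0-9]+(-[0-9]+)+$/ {
      n = split($NF, s, "-")
      for (i = 1; i <= n; i++)
          freq[s[i]]++
  }
  END {
      for (v in freq)
          printf("%d=%dtimes\n", v, freq[v]) | "sort -n"
      close("sort -n")
  }' "$@"
}
```

Because it ignores every line without such a field, the output of the script above can be piped straight into skip_freq, and the summary and per-value lines are skipped. It always prints "times" to match the format in the request.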
# 5  
Old 04-11-2016
Don, Thank you.

You are really a Don!!!!

Your counting at the end is much more comprehensive than what I had thought.

It's all biology related. Definitely a real-world problem. Thank you.

I will be sending you the bonus right away.

Thanks a lot once again.