Proper Column wise matching


 
# 1  
Old 08-11-2016

My code below works fine as long as no column contains a pipe in its content. If a value does contain a pipe, the rest of that value shifts into the next column.

I want the code to work even when a column contains a pipe that is not a delimiter.

NOTE: A pipe that is part of the content rather than a delimiter is escaped with a backslash (\|).

Code:
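#set -x
awk '
# Header row: remember it and how many columns it has.
NR == 1 { ncols = NF; t = $0; next }
{
   # Column 1 counts as filled unless it is 0.
   if ($1 != 0) c[1]++
   # Other columns count as filled unless they are NA, null or empty.
   for (i = 2; i <= NF; i++) if ($i != "NA" && $i != "null" && $i != "") c[i]++
}
END {
   print t
   rows = NR - 1            # data rows, excluding the header
   r = ""
   for (i = 1; i <= ncols; i++) {
      p = (c[i] / rows) * 100
      r = (i == 1) ? "" p : r OFS p
   }
   print r
}
' FS="|" OFS="|" "$1"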

# 2  
Old 08-11-2016
Not sure I understand what you are up to. How about a decent input sample, the desired result, and the logic connecting them?

To ignore escaped delimiters, replace them with a token up front, work on the modified file, and then reverse the replacement.
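A minimal sketch of that hint, done in-stream inside awk so that no modified copy of the file has to be written. It assumes the escape sequence is exactly \| and that the byte \001 (an arbitrary placeholder chosen here) never occurs in the data. The function name split_escaped is made up for illustration.

```shell
# Hypothetical helper: print every field of each pipe-delimited input line,
# treating \| as literal content instead of as a delimiter.
split_escaped() {
    awk -F'|' '
    BEGIN { tok = "\001" }               # placeholder, assumed absent from the data
    {
        gsub(/\\\|/, tok)                # hide escaped pipes; awk re-splits $0 on "|"
        for (i = 1; i <= NF; i++) {
            f = $i
            gsub(tok, "\\\\|", f)        # reverse the replacement inside the field
            print i ": " f
        }
    }'
}

# Try it on the problem row from the sample data.
printf '%s\n' '100441566|CHICK-\|FIL-A|Chick||TP' | split_escaped
```

Because gsub() modifies $0, awk splits the record again on the real delimiters only, so the escaped pipe stays inside field 2.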
# 3  
Old 08-12-2016
Code:
[sdp@blr-qe101 .nikhil]$ sh filler.sh c10.txt 
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
100|100|100|100
[sdp@blr-qe101 .nikhil]$ sh filler.sh 10.txt 
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
100|100|100|100

Code:
cat 10.txt 
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
076679010|WALMART|Walmart|TP
2242937867|PUBLIX SUPER MARKETS INC|Publix Super Markets|TP
100441566|CHICK-FIL-A|Chick|jacke|TP
1000549208|BURLINGTON - BURLINGTON COAT FACTORY|Burlington Coat Factory|TP
1000146040284|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000146428873|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000539406|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
10005847326|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
100056070|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP



Code:
[sdp@blr-qe101 .nikhil]$ cat c10.txt  
unique_bank_transaction_id|merchant name_GT|MERCHANT_NAME_TDE|output
076679010|WALMART|Walmart|TP
2242937867|PUBLIX SUPER MARKETS INC|Publix Super Markets|TP
100441566|CHICK-\|FIL-A|Chick||TP
1000549208|BURLINGTON - BURLINGTON COAT FACTORY|Burlington Coat Factory|TP
1000146040284|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000146428873|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
1000539406|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
10005847326|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP
100056070|ABERCROMBIE & FITCH|Abercrombie & Fitch|TP




Moderator's Comments:
Please use CODE tags CORRECTLY!
And, by the way, do you countercheck what you posted?


---------- Post updated 08-12-16 at 01:57 PM ---------- Previous update was 08-11-16 at 06:48 PM ----------

Can anyone please help? In the data above, look at the 100441566 row of c10.txt (CHICK-\|FIL-A): it contains an extra pipe.

My question is this: a pipe preceded by a backslash (\|) should be ignored rather than treated as the start of the next column.

Last edited by RudiC; 08-11-2016 at 11:09 AM.. Reason: Changed CODE tags.
# 4  
Old 08-12-2016
Did you try the hint given?
# 5  
Old 08-12-2016
Rudi,

It is a huge file of about 8 GB, and the problem is that we are short on disk space, so I can't try it.
# 6  
Old 08-12-2016
Uhm, what is the problem with trying it on a short example like the one you have already given? This is about the principle of the solution, not about processing an 8 GB file...
# 7  
Old 08-17-2016
Zaxxon,

I'll try implementing it if you give the solution for a small file as well.
Please help.
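One possible way to apply the token hint to the script from post #1, sketched here under assumptions rather than given as a tested fix for the real data: the escape is exactly \|, the byte \001 never appears in the file, and the column count is taken from the header. The function name fill_rate is hypothetical. Everything happens while the file is read, so no second copy of an 8 GB file is needed.

```shell
# Hypothetical rewrite of filler.sh as a function: percentage of filled
# values per column, with \| treated as content rather than a delimiter.
# Usage: fill_rate c10.txt
fill_rate() {
    awk -F'|' -v OFS='|' '
    BEGIN   { tok = "\001" }             # placeholder, assumed absent from the data
    NR == 1 { header = $0 }              # keep the header exactly as written
            { gsub(/\\\|/, tok) }        # hide escaped pipes; $0 is re-split on "|"
    NR == 1 { ncols = NF; next }
    {
        if ($1 != 0) c[1]++
        for (i = 2; i <= ncols; i++)
            if ($i != "NA" && $i != "null" && $i != "") c[i]++
    }
    END {
        rows = NR - 1                    # data rows, excluding the header
        if (rows < 1) exit               # nothing to report for an empty file
        print header
        r = ""
        for (i = 1; i <= ncols; i++) {
            p = (c[i] / rows) * 100
            r = (i == 1) ? "" p : r OFS p
        }
        print r
    }' "$1"
}
```

With this version the CHICK row in c10.txt keeps CHICK-\|FIL-A in column 2, so its fourth field is the empty one and the last column is no longer reported as 100.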