Please help !!!!Problem with data file


 
# 22  
Old 04-07-2007
Sorry vgersh99,

It's working. Thank you very much.
# 23  
Old 04-07-2007
Quote:
Originally Posted by vgersh99
to join 'broken' lines....
nawk -f ds.awk myBigFile.txt > myBigFile_new.txt

ds.awk
Code:
BEGIN {
  FS=OFS="|"

  FLD_max=12
}
# NF is the number of fields in the current record.
# If the current record has fewer fields than FLD_max, it is a fragment:
# append it to "rec" with a separator [rec OFS $0] when "rec" is NOT empty,
# or start "rec" with it [$0] when "rec" IS empty. Add the fragment's field
# count [NF] to the running number of fields [fld], then skip to the next
# record [next].
NF < FLD_max {rec=(rec != "") ? rec OFS $0 : $0; fld+=NF;next }

# If the running number of fields [fld] has reached FLD_max: output "rec",
# then reset "rec" to empty and the running number of fields to "0".

fld >= FLD_max { print rec; rec=""; fld=0 }

# Any record that gets this far (it was not skipped by "next" above) is
# printed as is: "1" is shorthand for "print $0".
1

Hi vgersh99,

I was facing the same problem and used to fix it by hand in UltraEdit. Your code was really helpful, and I thank you for that.

I have another problem, though. Every day I get a file with some 55,000 records, and in some records the last column's data breaks onto a new line:

9aaa230|Apr 4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx
444xxxxxyyya


The last column's data ends up on a new line, and I get the same error message (not enough vartext fields). Your program removes the data on the new line. Is it possible to append it to the end of the last field of the previous record instead, like this?

9aaa230|Apr 4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx444xxxxxyyya

Thanks again,
Kumar
# 24  
Old 04-07-2007
That's what the code is supposed to do for ALL the records, except for the LAST improperly formatted 'record'.

Actually, your requirement (based on your sample input and the desired output) is different from the OP's: the OP wanted to concatenate lines, creating a NEW field. Given your sample record, the OP's desired output would be:
Code:
9aaa230|Apr  4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx|444xxxxxyyya

You, on the other hand, just want to concatenate the lines.

Here's a new version of the script accounting for the LAST record/line for OP's requirement:
Code:
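BEGIN {
  FS=OFS="|"

  FLD_max=12

  # "cat 1>&2" copies whatever is piped to it onto standard error
  stderr="cat 1>&2"
}
# fragment: report it on stderr, then accumulate it as before
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec OFS $0 : $0; fld+=NF;next }
fld >= FLD_max { print rec; rec=""; fld=0 }
1
END {
  # flush a joined record still pending at end of input, if it is complete
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}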
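A quick way to see what the END block changes is to feed both versions a file whose only record is the broken sample from above. This is just a sketch: plain `awk` stands in for `nawk`, the temporary directory and `broken.txt` are made-up names, and the stderr report is left out to keep it short.

```shell
work=$(mktemp -d)

# Input: the sample record from the thread, split across two lines.
printf '%s\n' \
  '9aaa230|Apr 4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx' \
  '444xxxxxyyya' > "$work/broken.txt"

# Main rules shared by both versions.
rules='BEGIN { FS=OFS="|"; FLD_max=12 }
NF < FLD_max { rec=(rec != "") ? rec OFS $0 : $0; fld+=NF; next }
fld >= FLD_max { print rec; rec=""; fld=0 }
1'

# The extra END rule from the new version.
flush='END { if (rec != "" && split(rec, a, FS) >= FLD_max) print rec }'

without_end=$(awk "$rules" "$work/broken.txt")
with_end=$(awk "$rules
$flush" "$work/broken.txt")

echo "without END: [$without_end]"
echo "with END:    [$with_end]"
rm -rf "$work"
```

Without the END rule nothing is printed, because the joined record is only written out when a later complete line arrives. With it, the two lines come out as one 12-field record.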

Here's the modified code for your/Kumar's requirement:
Code:
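BEGIN {
  FS=OFS="|"

  FLD_max=12

  # "cat 1>&2" copies whatever is piped to it onto standard error
  stderr="cat 1>&2"
}
# fragment: report it, then glue it onto "rec" WITHOUT adding a separator
NF < FLD_max {printf("Bad record: [%d] :: [%s]\n", FNR, $0) | stderr; rec=(rec != "") ? rec $0 : $0; fld+=NF;next }
fld >= FLD_max { print rec; rec=""; fld=0 }
1
END {
  if (rec != "" && split(rec, a, FS) >= FLD_max ) print rec
}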
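One thing to watch with Kumar's sample: the first line of the broken record already contains every `|` separator, so a test on NF alone cannot tell it apart from a complete record. Only the tail line looks short. A different way to handle that is to hold each complete line back until the next line shows whether a tail follows. The sketch below is not the script above; it assumes complete records have 11 fields (counted from the sample) and that a tail never contains enough `|` to look complete. The names `held` and `have` are made up for this example, and the third input line simply repeats the sample record.

```shell
input=$(mktemp)
cat > "$input" <<'EOF'
9aaa230|Apr 4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx
444xxxxxyyya
9aaa230|Apr 4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx
EOF

joined=$(awk '
BEGIN { FS = "|"; FLD_max = 11 }   # 11 is an assumption taken from the sample

# Short line: treat it as the tail of the held record and glue it on
# without adding a separator.
NF < FLD_max { held = held $0; have = 1; next }

# Complete line: the held record cannot grow any further, so print it
# and hold the new line instead.
{ if (have) print held; held = $0; have = 1 }

# Print whatever is still held at end of input.
END { if (have) print held }
' "$input")

printf '%s\n' "$joined"
rm -f "$input"
```

The first output line ends in `xxx444xxxxxyyya`, with no extra pipe, and the second is the repeated record unchanged.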


Last edited by vgersh99; 04-07-2007 at 05:04 PM..
# 25  
Old 04-07-2007

I don't want to add an additional pipe. I want the data on the new line to be appended to the last column of the previous record.

The output should look like this:

Code:
9aaa230|Apr  4 2007 11:59:41:903PM|xxxxxxxxx|xxxxxxx|xx|xx|xx|xxxx| |xx|xxx444xxxxxyyya

# 26  
Old 04-07-2007
Kumar,
read my last modified post!

Quote:
Originally Posted by Kumar
I don't want to add an additional pipe. I want the data on the new line to be appended to the last column of the previous record.
If that's the case, then the number of fields will be different from the FLD_max value of '12'....
# 27  
Old 04-07-2007


I get an error when I run the code you posted for my requirement:
> vi new.txt

"new.txt" 11 lines, 902 characters
9429732|Apr 4 2007 12:51:09:063AM| |CREDCHK |0|421|58|2592|PAR| |
9427428|Apr 4 2007 12:00:00:066AM|7736752|NETEVENT |3146628|937|307|5035| |TP|TP06173598941
9427429|Apr 4 2007 12:00:02:560AM|7736744|NETEVENT |3083574|940|765|1304| |TP|TP06173600979
9427430|Apr 4 2007 12:00:03:613AM|7736759|NETEVENT |3146568|781|307|531| |TP|TP06173582
254
9427431|Apr 4 2007 12:00:04:430AM|7736668|NETEVENT |6000177|712|899|2080| |TP|TP0547906557
9427432|Apr 4 2007 12:00:04:580AM|7736747|NETEVENT |1039574|716|957|2806| |TP|TP06173875607
9427433|Apr 4 2007 12:00:07:723AM|7736751|NETEVENT |1039980|646|596|6982| |TP|TP06173873938
9427434|Apr 4 2007 12:00:07:920AM|2799783|NETEVENT |3018155|510|648|4964| |MD|MD0130328
9427435|Apr 4 2007 12:00:08:290AM|2799781|NETEVENT |3022569|713|248|2027| |MD|MD0125661
9427436|Apr 4 2007 12:00:08:616AM|2799782|NETEVENT |3077955|757|345|1839| |MD|MD015546
"new.txt" 11 lines, 902 characters

> nawk -f ds1.awk new.txt > new1.txt
nawk: syntax error at source line 2
context is
<<< >>>
nawk: bailing out at source line 2
>
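The empty context (`<<< >>>`) might mean that nawk stopped on a character that does not display, for example something picked up while copying the script from the web page. That is only a guess; nothing in the thread confirms it. A rough way to check a script file for such bytes is sketched below, with a made-up temporary file that has a stray carriage return on line 2 standing in for ds1.awk:

```shell
# Hypothetical reproduction: an awk script saved with a stray carriage
# return at the end of line 2.
script=$(mktemp)
printf 'BEGIN {\n  FS=OFS="|"\r\n}\n' > "$script"

# Delete tabs, newlines and printable ASCII; whatever is left is suspect.
stray=$(LC_ALL=C tr -d '\11\12\40-\176' < "$script" | wc -c | tr -d ' ')
echo "unexpected bytes: $stray"

# Dump the file with escapes (the carriage return shows as \r) to locate them.
od -c "$script" | head

rm -f "$script"
```

If the count is not zero for the real ds1.awk, retyping line 2 by hand would be the simplest thing to try.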
# 28  
Old 04-07-2007
Quote:
Originally Posted by vgersh99
Kumar,
read my last modified post!


If that's the case, then the number of fields will be different from the FLD_max value of '12'....
I did change it to 10...
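A quick field count on lines copied from the new.txt listing suggests that neither 12 nor 10 matches this data. A minimal sketch, using plain `awk` in place of `nawk` (an assumption about what is installed):

```shell
# One complete record, then the broken record with its tail on its own line.
counts=$(printf '%s\n' \
  '9427429|Apr 4 2007 12:00:02:560AM|7736744|NETEVENT |3083574|940|765|1304| |TP|TP06173600979' \
  '9427430|Apr 4 2007 12:00:03:613AM|7736759|NETEVENT |3146568|781|307|531| |TP|TP06173582' \
  '254' |
  awk -F'|' '{ print NF }')

printf '%s\n' "$counts"
```

The complete record and the first half of the broken one both have 11 fields, and only the tail `254` has 1. So with FLD_max=12 every line is treated as a fragment, and with FLD_max=10 only the tail is.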