Help with file processing


 
# 1  
Old 06-23-2015

I have a fixed-width file coming from a source system. Each record should be 786 characters long.

Some records in the file are shorter or longer than that, and the process fails because of the bad record lengths.

Is there a way to spit out the bad records (length <> 786) into a bad file and process the good records?

I am on Linux.
# 2  
Old 06-23-2015
Is the fixed width in bytes or in characters?
What character set are you using?
What shell are you using?
What operating system are you using?

On systems with a version of awk that conforms to the standards:
Code:
awk 'length($0) != 786' file

will print all lines in file that do not contain 786 characters (not counting the terminating <newline> character). If you want the count to include the <newline> character, change the 786 in that script to 785.

On some systems, the awk length() function incorrectly counts bytes instead of counting characters. If the character set you're using contains multi-byte characters (such as UTF-8), the number of bytes in a line may vary even though the number of characters is constant.
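For the original question (bad records into one file, good records into another), the same length test can drive both outputs in a single pass. This is only a minimal sketch: the function name split_records, its argument order, and the file names in the usage line are made up for illustration, and it inherits the byte-versus-character caveat described above.

```shell
# Hypothetical helper: split_records INPUT GOOD BAD [WIDTH]
# Lines of exactly WIDTH characters (default 786, newline not counted)
# go to GOOD; every other line goes to BAD.
split_records() {
    # Create (or empty) both outputs so each exists even if nothing is written to it.
    : > "$2"
    : > "$3"
    # Note: awk -v interprets backslash escapes in the assigned values,
    # so this assumes the file names contain no backslashes.
    awk -v width="${4:-786}" -v good="$2" -v bad="$3" '
        { print > (length($0) == width ? good : bad) }
    ' "$1"
}
```

Usage would look like: split_records file good_records bad_records 786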
# 3  
Old 06-23-2015
Please see my responses below.

Is the fixed width in bytes or in characters?
    Characters

What character set are you using?
    UTF-8

What shell are you using?
    KSH

What operating system are you using?
    Linux
# 4  
Old 06-23-2015
Code:
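# -CSD makes perl decode the input as UTF-8, so length counts characters rather than bytes.
# Count the trailing newline as one of the 786 characters.
perl -CSD -ne 'length != 786 and print' file_to_process > bad_lines_file

# Exclude the trailing newline from the 786 characters (-l strips it on input and adds it back on output).
perl -CSD -nle 'length != 786 and print' file_to_process > bad_lines_file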

# 5  
Old 06-24-2015
Quote:
Originally Posted by Don Cragun
[..]

On some systems, the awk length() function incorrectly counts bytes instead of counting characters. If the character set you're using contains multi-byte characters (such as UTF-8), the number of bytes in a line may vary even though the number of characters is constant.

Incredibly, both BSD awk and mawk suffer from this, which is clearly a bug (mawk is supposed to be SUS v2 conformant).

gawk and /usr/xpg4/bin/awk on Solaris function correctly, though (nawk does not, but it is not POSIX compliant, so I would not expect it to).
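To see which camp a given awk falls into, one rough probe is to feed it a single two-byte UTF-8 character and look at what length() reports. The function name awk_length_unit is hypothetical, and the result depends on the locale in effect: in the C/POSIX locale even a conforming awk is expected to report 2, so run it under the UTF-8 locale the data will be processed in.

```shell
# Hypothetical helper: awk_length_unit [AWK]
# Reports whether the given awk (default: awk) measures a two-byte
# UTF-8 character as one character or as two bytes in the current locale.
awk_length_unit() {
    # \303\251 is the UTF-8 encoding of "é": one character, two bytes.
    case $(printf '\303\251\n' | "${1:-awk}" '{ print length($0) }') in
        1) echo characters ;;
        2) echo bytes ;;
        *) echo unknown ;;
    esac
}
```

For example, awk_length_unit mawk and awk_length_unit gawk can be compared side by side on a system that has both installed.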
# 6  
Old 06-24-2015
Quote:
Originally Posted by Scrutinizer
Incredibly, both BSD awk and mawk suffer from this, which is clearly a bug (mawk is supposed to be SUS v2 conformant).

gawk and /usr/xpg4/bin/awk on Solaris function correctly, though (nawk does not, but it is not POSIX compliant, so I would not expect it to).

Mac OS X picks up most of its utilities from BSD, and most of the OS X utilities do adhere to the requirements of the POSIX standards even in places where BSD utilities sometimes fail. The awk utility, however, fails on at least two counts:
  1. length() should count characters, but instead it counts bytes, and
  2. awk -v variable=value 'script' and awk -vvariable=value 'script' should be treated exactly the same, but the first form works and the second form gives a syntax error.
At least both bash and ksh on OS X correctly count characters (not bytes) with ${#variable}.
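Building on that, if the local awk cannot be trusted to count characters, the split can be done in the shell itself with ${#variable}. This is a sketch under assumptions: the function name split_by_shell_length and its arguments are invented for illustration, it relies on a shell (such as the bash and ksh mentioned above) whose ${#variable} counts characters in the current locale, and a read loop like this will be much slower than awk or perl on large files.

```shell
# Hypothetical helper: split_by_shell_length INPUT GOOD BAD [WIDTH]
# Lines of exactly WIDTH characters (default 786) are appended to GOOD,
# all other lines to BAD.
split_by_shell_length() {
    width=${4:-786}
    # Start with empty output files.
    : > "$2"
    : > "$3"
    # IFS= and -r keep leading/trailing blanks and backslashes intact;
    # the "|| [ -n ... ]" clause also picks up a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -eq "$width" ]; then
            printf '%s\n' "$line" >> "$2"
        else
            printf '%s\n' "$line" >> "$3"
        fi
    done < "$1"
}
```

Usage would look like: split_by_shell_length file good_records bad_records 786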