Clense Junk Data File - Using Shell or awk or sed
Post 302075869 by jim mcnamara on Wednesday 7th of June 2006 10:00:45 AM
Quote:
Move this step to after step 3, the pipe-count check.
- Step 1: Replace all pipes '|' within the file with a space ' '.
Code:
 tr -s '|' ' ' < oldfile > newfile

Quote:
- Step 2: Remove special characters and junk data within the file. The tricky part is that we do not have a defined set of special / junk characters. The solution would be to remove any character that cannot be typed on a keyboard.

Remove any character NOT IN [ A-Z, a-z, 0-9, `, ~, !, @, #, $, %, &, *, (, ), _, -, +, =, ., ", ', :, ;, {, }, [, ], <, >, ?, /, \, |, comma, space ]
Code:
# In the C locale, [:print:] is the space character through ~, so every other byte is deleted
LC_ALL=C sed 's/[^[:print:]]//g' filename > newfile

Quote:
- Step 3: Check the count of pipes on each line of the data to make sure we have the correct number. I should receive 4 pipes on each line, which means that if there are fewer we need to keep appending the following lines (concatenating the lines below). This field is basically a memo where the user would have typed a small paragraph that needs to be joined into a single line.
Not sure about this step....
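One possible way to approach step 3, offered as a rough sketch rather than a tested solution for the real data: let awk accumulate physical lines until the pending record contains the expected number of pipes. The function name `join_records`, the default of 4 pipes, and the choice to join the pieces with a single space are all assumptions made for illustration.

```shell
#!/bin/sh
# join_records FILE [PIPES]  (hypothetical helper, not from the original thread)
# Concatenates consecutive lines until the pending record holds at least
# PIPES pipe characters (default 4), then prints it as one line.
join_records() {
    awk -v want="${2:-4}" '
    {
        # Append the current line to the pending record (assumed separator: one space).
        rec = (rec == "" ? $0 : rec " " $0)
        tmp = rec
        # gsub returns the number of replacements, which is the pipe count.
        if (gsub(/[|]/, "", tmp) >= want) {
            print rec
            rec = ""
        }
    }
    END {
        # Print an incomplete trailing record instead of silently dropping it.
        if (rec != "") print rec
    }' "$1"
}
```

Usage would be something like `join_records oldfile > newfile`. A record that ends up with more than 4 pipes is printed as-is, so it is still worth checking the output afterwards.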
Quote:
- Step 4: Replace all zzz with a pipe '|'.
Code:
sed 's/zzz/|/g' oldfile > newfile
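Once all the steps have run, it may be worth confirming that every line really has the expected number of pipes. The following is only a sketch under assumptions: the function name `check_pipes` and its output format are made up here, and the expected count defaults to the 4 pipes mentioned in step 3.

```shell
#!/bin/sh
# check_pipes FILE [PIPES]  (hypothetical helper, not from the original thread)
# Reports every line whose pipe count differs from PIPES (default 4).
# Exit status is 1 if any such line was found, 0 otherwise.
check_pipes() {
    awk -F'|' -v want="${2:-4}" '
    {
        # With "|" as the field separator, pipes = fields - 1 (0 for an empty line).
        pipes = (NF > 0 ? NF - 1 : 0)
        if (pipes != want) {
            printf "line %d: %d pipes\n", NR, pipes
            bad = 1
        }
    }
    END { exit bad + 0 }' "$1"
}
```

For example, `check_pipes newfile || echo "newfile still needs cleanup"` would flag a file that is not yet ready to load.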

 
