Cleanse Junk Data File - Using Shell or awk or sed
Posted by rimss in Shell Programming and Scripting on Wednesday 7th of June 2006 12:12:17 AM (post 302075799)

Hello Shell Gurus, I need help solving this puzzle. We have a junk data file that needs to be fed into the database, and the file has to be cleansed by a shell script first. I am not an expert, so I need help with this.

Here is what I need to do to the input file:

- Step 1: Replace every pipe '|' in the file with a space ' '.

- Step 2: Remove special characters and junk data from the file. The tricky part is that we do not have a defined set of special / junk characters. The solution would be to remove any character that cannot be typed with a keystroke on the keyboard.

Remove any character NOT IN [ A-Z, a-z, 0-9, `, ~, !, @, #, $, %, &, *, (, ), _, -, +, =, ., ", ', :, ;, {, }, [, ], <, >, ?, /, \, |, comma, space ]

NOTE: Basically, remove any special character that is not a keyboard character.

- Step 3: Check the count of pipes on each line of the data to make sure we have the correct number. I should receive 4 pipes on each line. If a line has fewer, we need to keep appending the following lines to it (concatenate the lines below) until the count is reached. The broken field is basically a memo in which the user has typed a small paragraph, and it needs to be joined into a single line.

- Step 4: Replace every zzz with a pipe '|'.
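Steps 1 to 4 could be sketched roughly as below. This is only one possible approach, not a tested production script. It rests on several assumptions that are not stated in the post: the function name `cleanse` is made up, "keyboard characters" is taken to mean printable ASCII (which also keeps `^` and drops tabs and carriage returns), the delimiter counted in step 3 is `zzz` (because step 1 has already removed the real pipes), joined lines are separated by one space, and blank lines are skipped.

```shell
#!/bin/sh
# Hypothetical helper: reads the raw file on stdin, writes cleansed records to stdout.
cleanse() {
    # Step 1: turn every literal pipe into a space.
    LC_ALL=C tr '|' ' ' |
    # Step 2: delete every byte that is neither printable ASCII nor a newline
    # (assumption: "keyboard characters" means printable ASCII).
    LC_ALL=C tr -cd '[:print:]\n' |
    awk -v need=4 '
        /^ *$/ { next }                          # skip blank lines (assumption)
        {
            # Step 3: append the line to the record being assembled.
            buf = (buf == "") ? $0 : buf " " $0
            tmp = buf
            # gsub returns the number of substitutions, so this counts "zzz".
            if (gsub(/zzz/, "", tmp) >= need) {
                gsub(/zzz/, "|", buf)            # Step 4: zzz becomes a pipe
                print buf
                buf = ""
            }
        }
        # Emit a trailing incomplete record so that the QA step can flag it.
        END { if (buf != "") { gsub(/zzz/, "|", buf); print buf } }
    '
}
```

It would be called as `cleanse < input.dat > clean.dat`, where both file names are placeholders. A record that already carries more than 4 `zzz` markers is printed as-is and left for the QA step to catch.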


Note: Below are QA steps to be embedded in the script after cleansing. They are only meant to produce an error log file that can be used to identify and fix records manually.

- Step 5: Check whether the length of the 2nd field is > 50 and the length of the 3rd field is > 200. If yes, write the line number and the record to the error log file.

- Step 6: Check the number of fields (pipes) in each line. If it is not equal to 4, write the line number and the record to the same error log.
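The QA checks in steps 5 and 6 could look something like the following sketch. The function name `qa_check`, the log message format and the file arguments are all made up for illustration. It also assumes that "4 fields" means 4 pipes per line (matching step 3, so awk sees 5 fields with the last one empty) and that either length limit being exceeded is worth logging.

```shell
#!/bin/sh
# Hypothetical helper: $1 = cleansed file, $2 = error log to (over)write.
qa_check() {
    awk -F'|' -v pipes=4 -v max2=50 -v max3=200 '
        # Step 6: a line with N pipes splits into N + 1 fields.
        NF - 1 != pipes   { printf "%d: wrong pipe count (%d): %s\n", NR, NF - 1, $0 }
        # Step 5: length limits on the 2nd and 3rd fields.
        length($2) > max2 { printf "%d: field 2 too long (%d): %s\n", NR, length($2), $0 }
        length($3) > max3 { printf "%d: field 3 too long (%d): %s\n", NR, length($3), $0 }
    ' "$1" > "$2"
}
```

For example, `qa_check clean.dat errors.log` (placeholder names) would leave an empty log when every record passes. A record that fails several checks gets one log line per failed check.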

Sample Broken Lines and data
-----------------------------


467zzzComputer|MonitorzzzPurchase Prise $150
Best Price $100
Cheapest Price $75
highest price $200zzzTzzz


The correct record would look like this:
467|Computer Monitor|Purchase Prise $150 Best Price $100 Cheapest Price $75 highest price $200|T|

Note: The broken lines are fixed. The '|' is replaced with a space where it read Computer|Monitor, the memo field is converted into a single line, and every zzz is replaced with a pipe.

Thanks
 
