Shell script Help - Data cleansing
Post 302970504 by pdathu on Thursday 7th of April 2016 09:42:09 AM

Hello community, I am getting log files from a system, and I need to clean the data and store it as txt files for reporting purposes. Since these files are generated on a Unix box, we have to write a shell script to handle the data cleansing.

The sample file data looks like this:

Code:
InsertTime:201604070523 DocID:101
#headers 
'DocID: 101    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aaaaa' '1' '1' 'Slow' '8gkahinka.01'
'aaaaa' '1' '0' 'Slow' '7nlafnjbaflnbja.01'

#blocked '' '' 
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070523 DocID:102
#headers 
'DocID: 102    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'Rank' 'Check Name' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aa' '1' 'xyz' '8gkahinka.01'
'aax' '1' 'none' '7nlafnjbaflnbja.01'

#blocked '' '' 
#rule 'Rule_Axf'
#doc '102'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070750 DocID:101
#headers 
'DocID: 101    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'xxxx' '1' '1' 'Slow' 'bjkkacka.01'
'yyyy' '1' '0' 'Slow' 'jiafjklas.001'

#blocked '' '' 
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#ISS-DLIS-DIAGS

This is the raw data, and I need to write a shell script to cleanse it.
1. A row starting with # is like a comment and should be ignored, except for #columns and #rows.
2. #columns gives the column names, and #rows marks the start of the actual data.
3. Unwanted data is highlighted in red and useful data in black.
4. The header of the output file is always all the #columns names found in the data, along with InsertTime and DocID.
5. Assign each value to its matching header column, and add the InsertTime and DocID values too.
6. The data delimiter in the output file is |.

The desired output is:

Code:
InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc ID|Rank|Check Name
201604070523|101|aaaaa|1|1|Slow|8gkahinka.01||
201604070523|101|aaaaa|1|0|Slow|7nlafnjbaflnbja.01||
201604070523|102|aa||||8gkahinka.01|1|xyz
201604070523|102|aax||||7nlafnjbaflnbja.01|1|none
201604070750|101|xxxx|1|1|Slow|bjkkacka.01||
201604070750|101|yyyy|1|0|Slow|jiafjklas.001||
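
For reference, the rules above could be sketched with awk roughly as follows. This is only a sketch under some assumptions, not a tested production script: the function name cleanse is made up for illustration, every data row is assumed to sit between a #rows line and the next blank or # line, values are assumed to be single-quoted with no embedded quotes, and all rows are held in memory because the full header is only known once every #columns line has been seen.

```shell
# cleanse: hypothetical helper name; reads raw log file(s), writes |-delimited text.
cleanse() {
    awk -v q="'" '
    # Split the single-quoted tokens of s into out[1..n] and return n.
    function quoted(s, out,    n, pat) {
        n = 0
        pat = q "[^" q "]*" q
        while (match(s, pat)) {
            out[++n] = substr(s, RSTART + 1, RLENGTH - 2)
            s = substr(s, RSTART + RLENGTH)
        }
        return n
    }
    # Start of a record: remember InsertTime and DocID for the rows that follow.
    /^InsertTime:/ {
        split($1, a, ":"); ins = a[2]
        split($2, b, ":"); doc = b[2]
        inrows = 0
        next
    }
    # Column names of the current record; new names are appended to the header.
    /^#columns/ {
        ncur = quoted($0, cur)
        for (i = 1; i <= ncur; i++)
            if (!(cur[i] in seen)) { seen[cur[i]] = 1; order[++ncol] = cur[i] }
        next
    }
    /^#rows/ { inrows = 1; next }
    # Any other # line, or a blank line, ends the data rows.
    /^#/ || /^[ \t]*$/ { inrows = 0; next }
    inrows {
        nv = quoted($0, v)
        nrow++
        meta[nrow] = ins "|" doc
        for (i = 1; i <= nv && i <= ncur; i++) val[nrow, cur[i]] = v[i]
    }
    END {
        line = "InsertTime|DocID"
        for (c = 1; c <= ncol; c++) line = line "|" order[c]
        print line
        for (r = 1; r <= nrow; r++) {
            line = meta[r]
            for (c = 1; c <= ncol; c++) line = line "|" val[r, order[c]]
            print line
        }
    }' "$@"
}
```

It would be called as cleanse raw.log > report.txt, where raw.log and report.txt are placeholder file names. Columns appear in the header in the order they are first seen across all #columns lines, and a column that a record does not have is left empty for its rows.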


Last edited by RudiC; 04-07-2016 at 11:11 AM.. Reason: Added code tags
 
