Shell script Help - Data cleansing Post: 302970504

Posted by pdathu on Thursday 7th of April 2016 09:42:09 AM

Hello community, I am getting log files from a system, and I need to clean the data and store it as txt files for reporting purposes. Since these files are generated on a Unix box, we have to write a shell script to handle the data cleansing.

Here is what the sample file data looks like:

Code:

InsertTime:201604070523 DocID:101
#headers 
'DocID: 101    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aaaaa' '1' '1' 'Slow' '8gkahinka.01'
'aaaaa' '1' '0' 'Slow' '7nlafnjbaflnbja.01'

#blocked '' '' 
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070523 DocID:102
#headers 
'DocID: 102    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'Rank' 'Check Name' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'aa' '1' 'xyz' '8gkahinka.01'
'aax' '1' 'none' '7nlafnjbaflnbja.01'

#blocked '' '' 
#rule 'Rule_Axf'
#doc '102'
#station_type ' '
#queue_duration '1.09673e-05'
#process_duration '4.61456'
#ISS-DLIS-DIAGS
InsertTime:201604070750 DocID:101
#headers 
'DocID: 101    MOVEABLE TOOLS:   2 QTY:    0     HELD TOOLS:   0 QTY:    0     BLOCKED TOOLS:   0 QTY:    0'
#columns  'TargetDoc' 'GRank' 'LRank' 'Priority' 'Loc ID'
#widths 12 3 3 12 25
#types 'STRING' 'INTEGER' 'INTEGER' 'STRING' 'STRING'
#rows
'xxxx' '1' '1' 'Slow' 'bjkkacka.01'
'yyyy' '1' '0' 'Slow' 'jiafjklas.001'

#blocked '' '' 
#rule 'Rule_Abcd'
#doc '101'
#station_type ' '
#queue_duration '1.09673e-05'
#ISS-DLIS-DIAGS

This is the raw data, and I need to write a shell script to cleanse it.
1. A row starting with # is like a comment and should be ignored, except for #columns.
2. #columns gives the column names and #rows gives the actual data.
3. Unwanted data is highlighted in red and useful data in black.
4. The header of the output file is always all the column names from the #columns lines in the data, along with InsertTime and DocID.
5. Assign the values as per the header, and add the InsertTime and DocID values too.
6. The data delimiter in the output file is |.

Please find the desired output:

Code:

InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc ID|Rank|Check Name
201604070523|101|aaaaa|1|1|Slow|8gkahinka.01||
201604070523|101|aaaaa|1|0|Slow|7nlafnjbaflnbja.01||
201604070523|102|aa||||8gkahinka.01|1|xyz
201604070523|102|aax||||7nlafnjbaflnbja.01|1|none
201604070750|101|xxxx|1|1|Slow|bjkkacka.01||
201604070750|101|yyyy|1|0|Slow|jiafjklas.001||
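
One possible way to get from the sample input to this output is a single awk pass that buffers the rows and prints everything at the end, since the full header is only known once every #columns line has been seen. This is a minimal sketch, not a tested production script. The function name cleanse_log and the file names in the usage note are made up for illustration. It assumes that values never contain a single quote and that the whole log fits in memory.

```shell
#!/bin/sh
# cleanse_log (hypothetical name): turn the raw log into a |-delimited report.
# Assumes quoted values never contain a single quote (octal \047 below).
cleanse_log() {
    awk '
    # Start of a record: remember its InsertTime and DocID.
    /^InsertTime:/ {
        split($1, a, ":"); ins = a[2]
        split($2, b, ":"); doc = b[2]
        inrows = 0
        next
    }
    # Column names of the current record; the quoted names are the
    # even-numbered pieces after splitting the line on single quotes.
    /^#columns/ {
        n = split($0, q, "\047")
        ncur = 0
        for (i = 2; i <= n; i += 2) {
            cur[++ncur] = q[i]
            # Keep a global list of names in order of first appearance.
            if (!(q[i] in seen)) { seen[q[i]] = 1; order[++ncol] = q[i] }
        }
        inrows = 0
        next
    }
    /^#rows/ { inrows = 1; next }   # data lines follow
    /^#/     { inrows = 0; next }   # any other # line is ignored
    # A quoted line inside a #rows section is a data row.
    inrows && /^\047/ {
        n = split($0, q, "\047")
        nrow++
        rins[nrow] = ins
        rdoc[nrow] = doc
        k = 0
        for (i = 2; i <= n; i += 2) {
            k++
            if (k <= ncur) val[nrow, cur[k]] = q[i]
        }
    }
    END {
        line = "InsertTime|DocID"
        for (c = 1; c <= ncol; c++) line = line "|" order[c]
        print line
        for (r = 1; r <= nrow; r++) {
            line = rins[r] "|" rdoc[r]
            # Columns missing from a record come out as empty fields.
            for (c = 1; c <= ncol; c++) line = line "|" val[r, order[c]]
            print line
        }
    }' "$@"
}
```

It could be used as cleanse_log raw.log > report.txt. Splitting on the quote character rather than on whitespace is what keeps names such as 'Loc ID' and 'Check Name' in one piece. For very large logs, two passes over the file (one to collect the column names, one to print) would avoid holding all rows in memory.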

Last edited by RudiC; 04-07-2016 at 11:11 AM.. Reason: Added code tags

LEARN ABOUT DEBIAN

cmdtest

CMDTEST(1)						      General Commands Manual							CMDTEST(1)

NAME

       cmdtest - blackbox testing of Unix command line tools

SYNOPSIS

       cmdtest	  [-c=COMMAND]	 [--command=COMMAND]   [--config=FILE]	 [--dump-config]   [--dump-memory-profile=METHOD]   [--dump-setting-names]
       [--generate-manpage=TEMPLATE]  [-h]  [--help]  [-k]  [--keep]   [--list-config-files]   [--log=FILE]   [--log-keep=N]   [--log-level=LEVEL]
       [--log-max=SIZE] [--no-default-configs] [--output=FILE] [-t=TEST] [--test=TEST] [--timings] [--version] [FILE]...

DESCRIPTION

       cmdtest black box tests Unix command line tools.  Given some test scripts, their inputs, and expected outputs, it verifies that the command
       line produces the expected output.  If not, it reports problems, and shows the differences.

       Each test case foo consists of the following files:

       foo.script
	      a script to run the test (this is required)

       foo.stdin
	      the file fed to standard input

       foo.stdout
	      the expected output to the standard output

       foo.stderr
	      the expected output to the standard error

       foo.exit
	      the expected exit code

       foo.setup
	      a shell script to run before the test

       foo.teardown
	      a shell script to run after test

       Usually, a single test is not enough. All tests are put into the same directory, and they may share some setup and teardown code:

       setup-once
	      a shell script to run once, before any tests

       setup  a shell script to run before each test

       teardown
	      a shell script to run after each test

       teardown-once
	      a shell script to run once, after all tests

       cmdtest is given the name of the directory with all the tests, or several such directories, and it does the following:

       o execute setup-once

       o for each test case (unique prefix foo):

	      -- execute setup

	      -- execute foo.setup

	      -- execute the command, by running foo.script, and redirecting standard input to come from foo.stdin, and capturing standard output
		and error and exit codes

	      -- execute foo.teardown

	      -- execute teardown

	      -- report result of test: does exit code match foo.exit, standard output match foo.stdout, and standard error match foo.stderr?

       o execute teardown-once

       Except for foo.script, all of these files are optional.	If a setup or teardown script is missing, it is simply not executed.  If one of
       the standard input, output, or error files is missing, it is treated as if it were empty.  If the exit code file is missing, it is treated
       as if it specified an exit code of zero.

       The shell scripts may use the following environment variables:

       DATADIR
	      a temporary directory where files may be created by the test

       TESTNAME
	      name of the current test (will be empty for setup-once and teardown-once)

       SRCDIR directory from which cmdtest was launched

OPTIONS

       -c, --command=COMMAND
	      ignored for backwards compatibility

       --config=FILE
	      add FILE to config files

       --dump-config
	      write out the entire current configuration

       --dump-memory-profile=METHOD
	      make memory profiling dumps using METHOD, which is one of: none, simple, meliae, or heapy (default: simple)

       --dump-setting-names
	      write out all names of settings and quit

       --generate-manpage=TEMPLATE
	      fill in manual page TEMPLATE

       -h, --help
	      show this help message and exit

       -k, --keep
	      keep temporary data on failure

       --list-config-files
	      list all possible config files

       --log=FILE
	      write log entries to FILE (default is to not write log files at all); use "syslog" to log to system log

       --log-keep=N
	      keep last N logs (10)

       --log-level=LEVEL
	      log at LEVEL, one of debug, info, warning, error, critical, fatal (default: debug)

       --log-max=SIZE
	      rotate logs larger than SIZE, zero for never (default: 0)

       --no-default-configs
	      clear list of configuration files to read

       --output=FILE
	      write output to FILE, instead of standard output

       -t, --test=TEST
	      run only TEST (can be given many times)

       --timings
	      report how long each test takes

       --version
	      show program's version number and exit

EXAMPLE

       To test that the echo(1) command outputs the expected string, create a file called echo-tests/hello.script containing the following
       content:

	      #!/bin/sh
	      echo hello, world

       Also create the file echo-tests/hello.stdout containing:

	      hello, world

       Then you can run the tests:

	      $ cmdtest echo-tests
	      test 1/1
	      1/1 tests OK, 0 failures

       If you change the stdout file to be something else, cmdtest will report the differences:

	      $ cmdtest echo-tests
	      FAIL: hello: stdout diff:
	      --- echo-tests/hello.stdout   2011-09-11 19:14:47 +0100
	      +++ echo-tests/hello.stdout-actual 2011-09-11 19:14:49 +0100
	      @@ -1 +1 @@
	      -something else
	      +hello, world

	      test 1/1
	      0/1 tests OK, 1 failures

       Furthermore, the echo-tests directory will contain the actual output files, and diffs from the expected files.  If one of the actual output
       files is actually correct, you can actually rename it to be the expected file.  Actually, that's a very convenient way of creating the
       expected output files: you run the test, fixing things, until you've manually checked the actual output is correct, then you rename the file.

SEE ALSO

       cliapp(5).

																	CMDTEST(1)
