Shell Programming and Scripting: Shell script Help - Data cleansing
Post 302970536 by RudiC on Thursday 7th of April 2016 04:36:03 PM
Phew, what an ordeal! Far from elegant, utterly clumsy, but it does what is requested:
Code:
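awk '
# Pass 1 (FNR == NR) collects every column name from the "#col..." lines;
# pass 2 prints the merged header and the realigned data rows.
# Appending ARGV[1] again makes awk read the same file twice, so exactly
# one input file is expected.
BEGIN           {ARGV[ARGC++] = ARGV[1]
                 HD[++HDCNT]  = "InsertTime"    # fixed leading output columns
                 HD[++HDCNT]  = "DocID"
                }

# Column header line: the names are enclosed in single quotes (\047), so
# splitting on the quote character puts one name in every even field.
/^#col/         {FS = "\047"
                 $0 = $0
                 for (i=2; i<NF; i+=2)  {gsub (" ", "_", $i)                # "Loc ID" -> "Loc_ID"
                                         if (FNR == NR && !($i in X))  {X[$i]               # first time seen:
                                                                        HD[++HDCNT] = $i    # add to header
                                                                       }
                                        }
                 $1 = $1                # rebuild $0 with OFS, dropping the quotes even if no name had a blank
                 FS = " "
                 $0 = $0                # re-split on blanks: $2..$NF are the column names
                 if (FNR != NR) {delete COL             # pass 2: input position -> output column
                                 COL[1] = 1             # InsertTime
                                 COL[2] = 2             # DocID
                                 for (i=2; i<=NF; i++)
                                         for (j=3; j<=HDCNT; j++)
                                                 if ($i == HD[j])       {COL[i+1] = j
                                                                         break
                                                                        }
                                }
                }

FNR == NR       {next                   # pass 1 only collects column names
                }

FNR == 1        {for (i=1; i<HDCNT; i++) printf "%s|", HD[i]    # merged header, printed once
                 printf "%s%s", HD[HDCNT], ORS
                }

                {gsub (/\047/, "")      # drop any remaining single quotes
                }

/^#/ || /^ *$/ ||
/^DocID/        {next                   # skip comments, blank lines and DocID lines
                }

/^Inse/         {gsub (/(^| )[^ :]*:/, " ")     # "InsertTime:X DocID:Y" -> " X Y"
                 INS = $1
                 DOC = $2
                 next
                }

# Data line: prepend InsertTime and DocID, then move every value to the
# output column that COL assigned to its position.
                {$1 = INS OFS DOC OFS $1
                 n = split ($0, T)
                 $0 = ""
                 OFS = "|"
                 for (i=1; i<=n; i++)   if (i in COL) $(COL[i]) = T[i]  # ignore values without a known column
                 $HDCNT = $HDCNT        # pad with empty fields up to the last column
                 print
                 OFS = " "
                }
' file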
InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc_ID|Rank|Check_Name
201604070523|101|aaaaa|1|1|Slow|8gkahinka.01||
201604070523|101|aaaaa|1|0|Slow|7nlafnjbaflnbja.01||
201604070523|102|aa||||8gkahinka.01|1|xyz
201604070523|102|aax||||7nlafnjbaflnbja.01|1|none
201604070750|101|xxxx|1|1|Slow|bjkkacka.01||
201604070750|101|yyyy|1|0|Slow|jiafjklas.001||

We're running through the file twice, and I had to replace the spaces in the column headers with underscores to avoid additional field splits. Anybody out there want to try to pimp it up?
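To see the two tricks from that paragraph in isolation, here is a minimal sketch, not part of the original answer. list_columns is a hypothetical helper name, and the header layout it expects (a line starting with #col followed by single-quoted names) is inferred from the script rather than taken from the original input. It queues the file a second time through ARGV, collects the names in the first pass (FNR == NR) with blanks turned into underscores, and prints the merged header as soon as the second pass starts. It uses split() on the quote character instead of reassigning FS, which leaves the record itself untouched.

```shell
# Two-pass sketch: list_columns is a hypothetical helper; the "#col..." line
# layout with single-quoted column names is inferred from the script above.
list_columns() {
    awk '
    BEGIN { ARGV[ARGC++] = ARGV[1] }          # queue the same file for a second pass

    FNR == NR && /^#col/ {                    # pass 1: collect column names
        n = split($0, part, "\047")           # split on the single-quote character
        for (i = 2; i < n; i += 2) {          # even pieces are the quoted names
            name = part[i]
            gsub(/ /, "_", name)              # "Loc ID" -> "Loc_ID"
            if (!(name in seen)) { seen[name]; hd[++cnt] = name }
        }
        next
    }
    FNR == NR { next }                        # pass 1 ignores all other lines

    FNR == 1 {                                # pass 2 begins: the header is complete
        out = "InsertTime|DocID"
        for (i = 1; i <= cnt; i++) out = out "|" hd[i]
        print out
        exit                                  # the demo stops after the header
    }
    ' "$1"
}
```

Usage: list_columns file. The header can only be printed once every #col line has been seen, which is exactly why the full script needs the first pass.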

Last edited by RudiC; 04-07-2016 at 05:56 PM.
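On the question of pimping it up: one alternative, sketched here and not RudiC's code, reads the file only once. Because the complete header is not known until the last #col line has been seen, it buffers every row in memory and prints everything in the END block, trading the second read for memory proportional to the file size. merge_columns is a hypothetical name, and the input layout (InsertTime:/DocID: lines, #col lines with single-quoted names, blank-separated data rows) is inferred from the two-pass script, not from the original input file.

```shell
# Single-pass sketch: buffer rows, print at END once all columns are known.
# merge_columns is a hypothetical helper; the input layout (InsertTime:/DocID:
# lines, "#col..." lines with single-quoted names, blank-separated data rows)
# is inferred from the two-pass script, not from the original input file.
merge_columns() {
    awk '
    /^#col/ {                                  # new header block
        n = split($0, part, "\047")            # even pieces are the quoted names
        ncol = 0
        for (i = 2; i < n; i += 2) {
            name = part[i]
            gsub(/ /, "_", name)               # "Loc ID" -> "Loc_ID"
            if (!(name in idx)) { idx[name] = ++cnt; hd[cnt] = name }
            map[++ncol] = idx[name]            # data position -> output column
        }
        next
    }
    /^#/ || /^ *$/ || /^DocID/ { next }       # comments, blank lines, DocID lines
    /^Inse/ {                                  # "InsertTime:X DocID:Y"
        line = $0
        gsub(/(^| )[^ :]*:/, " ", line)        # -> " X Y"
        split(line, kv)
        ins = kv[1]; doc = kv[2]
        next
    }
    {                                          # data row: remember it for END
        gsub(/\047/, "")
        rows++
        val[rows, 0] = ins "|" doc
        for (k = 1; k <= NF && k <= ncol; k++) val[rows, map[k]] = $k
    }
    END {
        out = "InsertTime|DocID"
        for (c = 1; c <= cnt; c++) out = out "|" hd[c]
        print out
        for (r = 1; r <= rows; r++) {
            out = val[r, 0]
            for (c = 1; c <= cnt; c++) out = out "|" val[r, c]
            print out
        }
    }
    ' "$1"
}
```

For very large files the two-pass approach keeps memory use flat, so it may still be the better trade-off there.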

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pipe data to shell script

Sorry about the noobish question, but how do I capture data that's piped to my script? For instance, ls -al | myscript.sh. How do I access the output of ls -al in myscript.sh? (3 Replies)
Discussion started by: tomjones07

2. Shell Programming and Scripting

Getting remote data through shell script

Hi, I need to get the details (File System status & Memory status) of a remote server. I am executing a shell script in ksh and preparing the report. Pls help. Regards, armohans. (1 Reply)
Discussion started by: armohans

3. UNIX for Dummies Questions & Answers

cleansing file in unix

Hi Experts, Our requirement is to cleanse a specific formatted file in unix. For example : File pattern is : Job name.......................................... \\\\Jobs\Amey ABC PQRS ABCD XYZ Job name.......................................... WEQ RED AAA Desired Result: (2 Replies)
Discussion started by: Amey Joshi

4. Shell Programming and Scripting

reformat data with a shell script

Can anyone help me with a shell script that can do the following: I have a data in fasta format (first line is the header, followed by a sequence of characters). >ALLLY GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTC GAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGC... (5 Replies)
Discussion started by: manishabh

5. Shell Programming and Scripting

Help with cleansing data

I have a file with 27 fields separated by pipes. Field 17 is defined as numeric, but the incoming data might contain characters and other miscellaneous data like (@,!,~,#,%,^,&,*,(,)). I have to make sure that the column strictly contains numeric data and if it contains any of the... (2 Replies)
Discussion started by: dsravan

6. UNIX for Dummies Questions & Answers

Data Importing using shell script

Hi All, I have a pipe-delimited .csv file. I am using Excel's data import option to import the data from the pipe-delimited file into .xls. I want to make this happen using a shell script. Please let me know how I can do this. Regards, Deepti (2 Replies)
Discussion started by: gaur.deepti

7. UNIX for Advanced & Expert Users

Convert column data to row data using shell script

Hi, I want to convert a 3-column data to 3-row data using shell script. Any suggestion in this regard is highly appreciated. Thanks. (4 Replies)
Discussion started by: sktkpl

8. Shell Programming and Scripting

Need a shell script to clean data

Hi, I'd appreciate it if anyone could throw me some hints. I have a file format like this: old(1): PRCNCP 1 old(2): PRSKU ... (6 Replies)
Discussion started by: netbanker

9. UNIX for Dummies Questions & Answers

Shell script to read lines in a text file and filter user data

sxsaaas (3 Replies)
Discussion started by: VikrantD

10. Shell Programming and Scripting

Shell script to correct the data

Hi, I have the data below in my flat file. I would like to remove the quotes and commas as necessary from the data. Below are the details I would like to have in my output. Could anybody help me by providing the Unix shell script for this? Input : ABC,ABC,10/15/2012,"47,936,164.567 ","1,036,997.453... (2 Replies)
Discussion started by: sonu_pal
IGAWK(1)                     Utility Commands                     IGAWK(1)

NAME
       igawk - gawk with include files

SYNOPSIS
       igawk [ all gawk options ] -f program-file [ -- ] file ...

       igawk [ all gawk options ] [ -- ] program-text file ...

DESCRIPTION
       Igawk is a simple shell script that adds the ability to have "include
       files" to gawk(1).

       AWK programs for igawk are the same as for gawk, except that, in
       addition, you may have lines like

              @include getopt.awk

       in your program to include the file getopt.awk from either the current
       directory or one of the other directories in the search path.

OPTIONS
       See gawk(1) for a full description of the AWK language and the options
       that gawk supports.

EXAMPLES
       cat << EOF > test.awk
       @include getopt.awk

       BEGIN {
           while (getopt(ARGC, ARGV, "am:q") != -1)
               ...
       }
       EOF

       igawk -f test.awk

SEE ALSO
       gawk(1)

       Effective AWK Programming, Edition 1.0, published by the Free Software
       Foundation, 1995.

AUTHOR
       Arnold Robbins (arnold@skeeve.com).

ATTRIBUTES
       See attributes(5) for descriptions of the following attributes:

       +---------------------+-----------------+
       | ATTRIBUTE TYPE      | ATTRIBUTE VALUE |
       +---------------------+-----------------+
       | Availability        | SUNWgawk        |
       +---------------------+-----------------+
       | Interface Stability | Volatile        |
       +---------------------+-----------------+

NOTES
       Source for gawk is available on http://opensolaris.org.

Free Software Foundation          Nov 3 1999                      IGAWK(1)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.