Sponsored Content
Top Forums Shell Programming and Scripting Shell script Help - Data cleansing Post 302970539 by RudiC on Thursday 7th of April 2016 05:32:20 PM
Old 04-07-2016
Well, one pass only, and slightly prettier:
Code:
awk '
BEGIN           {HD[++HDCNT]  = "InsertTime"
                 HD[++HDCNT]  = "DocID"
                }

/^#col/         {FS = "\047"
                 $0 = $0
                 for (i=2; i<NF; i+=2)  {gsub (" ", "_", $i)
                                         if (!($i in X))        {X[$i]
                                                                 HD[++HDCNT] = $i
                                                                }
                                        }
                 FS = " "
                 $0 = $0
                 RW[++RCNT] = $0
                }

/^#/       ||
/^ *$/     ||
/^.DocID/       {next
                }

/^Inse/         {gsub (/(^| )[^ :]*:/, " ")
                 INS = $1
                 DOC = $2
                 next
                }

                {gsub (/\047/, "")
                 LN[RCNT,++LCNT[RCNT]] = INS FS DOC FS $0
                }


END             {for (i=1; i<HDCNT; i++) printf "%s|", HD[i]
                 printf "%s%s", HD[HDCNT], ORS
                 OFS = "|"
                 for (r=1; r<=RCNT; r++)
                        {delete COL 
                         COL[1] = 1
                         COL[2] = 2
                         n = split (RW[r], T)
                         for (i=2; i<=n; i++) for (j=3; j<=HDCNT; j++) if (T[i] == HD[j]) COL[i+1] = j
                         for (l=1; l<=LCNT[r]; l++)
                                {n = split (LN[r,l], T)
                                 $0 = ""
                                 for (i=1; i<=n; i++) $(COL[i]) = T[i]
                                 $HDCNT = $HDCNT
                                 print
                                }
                        }
                }
'  file
InsertTime|DocID|TargetDoc|GRank|LRank|Priority|Loc_ID|Rank|Check_Name
201604070523|101|aaaaa|1|1|Slow|8gkahinka.01||
201604070523|101|aaaaa|1|0|Slow|7nlafnjbaflnbja.01||
201604070523|102|aa||||8gkahinka.01|1|xyz
201604070523|102|aax||||7nlafnjbaflnbja.01|1|none
201604070750|101|xxxx|1|1|Slow|bjkkacka.01||
201604070750|101|yyyy|1|0|Slow|jiafjklas.001||


Last edited by RudiC; 04-07-2016 at 06:52 PM..
This User Gave Thanks to RudiC For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Pipe data to shell script

Sorry about the noobish question but... How do I capture data thats piped to my script? For instance, ls -al | myscript.sh How do I access the output from ls -al in myscript.sh? (3 Replies)
Discussion started by: tomjones07
3 Replies

2. Shell Programming and Scripting

Getting remote data through shell script

Hi, I need to get the details (File System status & Memory status) of a remote server. I am executing a shell script in ksh and preparing the report. Pls help. Regards, armohans. (1 Reply)
Discussion started by: armohans
1 Replies

3. UNIX for Dummies Questions & Answers

cleansing file in unix

Hi Experts, Our requirement is to cleanse a specific formatted file in unix. For example : File pattern is : Job name.......................................... \\\\Jobs\Amey ABC PQRS ABCD XYZ Job name.......................................... WEQ RED AAA Desired Result: (2 Replies)
Discussion started by: Amey Joshi
2 Replies

4. Shell Programming and Scripting

reformat data with a shell script

Can anyone help me with a shell script that can do the following: I have a data in fasta format (first line is the header, followed by a sequence of characters). >ALLLY GGCCCCTCGAGCCTCGAACCGGAACCTCCAAATCCGAGACGCTCTGCTTATGAGGACCTC GAAATATGCCGGCCAGTGAAAAAATCTTGTGGCTTTGAGGGCTTTTGGTTGGCCAGGGGC... (5 Replies)
Discussion started by: manishabh
5 Replies

5. Shell Programming and Scripting

Help with cleansing data

I have a file with 27 fields seperated by pipe. I have a field 17 that is defined as numeric and the data coming in might contain character and other miscellaneous data like (@,!,~,#,%,^,&,*,(,)). I have to make sure that the column strictly contains numeric data and if it contains any of the... (2 Replies)
Discussion started by: dsravan
2 Replies

6. UNIX for Dummies Questions & Answers

Data Importing using shell script

Hi All, I have a .csv file pipe delimter.., I am using excel data import option for importing the data from a pipe delimter file to xls...I want to make this happen using shell script. Please let me know how can I do this using shell script. Regards, Deepti (2 Replies)
Discussion started by: gaur.deepti
2 Replies

7. UNIX for Advanced & Expert Users

Convert column data to row data using shell script

Hi, I want to convert a 3-column data to 3-row data using shell script. Any suggestion in this regard is highly appreciated. Thanks. (4 Replies)
Discussion started by: sktkpl
4 Replies

8. Shell Programming and Scripting

Need a shell script to clean data

Hi, Appreciated if anyone can throw some hint I have a file format like this: old(1): PRCNCP 1 old(2): PRSKU ... (6 Replies)
Discussion started by: netbanker
6 Replies

9. UNIX for Dummies Questions & Answers

Shell script to read lines in a text file and filter user data Shell Programming and Scripting

sxsaaas (3 Replies)
Discussion started by: VikrantD
3 Replies

10. Shell Programming and Scripting

Shell script to correct the data

Hi, I have below data in my flat file.I would like to remove the quotes and comma necessary from the data.Below is the details I would like to have in my output. Could anybody help me providing the Unix shell script for this. Input : ABC,ABC,10/15/2012,"47,936,164.567 ","1,036,997.453... (2 Replies)
Discussion started by: sonu_pal
2 Replies
BEGIN(7)							   SQL Commands 							  BEGIN(7)

NAME
BEGIN - start a transaction block SYNOPSIS
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ] where transaction_mode is one of: ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ | READ COMMITTED | READ UNCOMMITTED } READ WRITE | READ ONLY DESCRIPTION
BEGIN initiates a transaction block, that is, all statements after a BEGIN command will be executed in a single transaction until an explicit COMMIT [commit(7)] or ROLLBACK [rollback(7)] is given. By default (without BEGIN), PostgreSQL executes transactions in ``autocom- mit'' mode, that is, each statement is executed in its own transaction and a commit is implicitly performed at the end of the statement (if execution was successful, otherwise a rollback is done). Statements are executed more quickly in a transaction block, because transaction start/commit requires significant CPU and disk activity. Execution of multiple statements inside a transaction is also useful to ensure consistency when making several related changes: other ses- sions will be unable to see the intermediate states wherein not all the related updates have been done. If the isolation level or read/write mode is specified, the new transaction has those characteristics, as if SET TRANSACTION [set_transac- tion(7)] was executed. PARAMETERS
WORK TRANSACTION Optional key words. They have no effect. Refer to SET TRANSACTION [set_transaction(7)] for information on the meaning of the other parameters to this statement. NOTES
START TRANSACTION [start_transaction(7)] has the same functionality as BEGIN. Use COMMIT [commit(7)] or ROLLBACK [rollback(7)] to terminate a transaction block. Issuing BEGIN when already inside a transaction block will provoke a warning message. The state of the transaction is not affected. To nest transactions within a transaction block, use savepoints (see SAVEPOINT [savepoint(7)]). For reasons of backwards compatibility, the commas between successive transaction_modes can be omitted. EXAMPLES
To begin a transaction block: BEGIN; COMPATIBILITY
BEGIN is a PostgreSQL language extension. It is equivalent to the SQL-standard command START TRANSACTION [start_transaction(7)], whose ref- erence page contains additional compatibility information. Incidentally, the BEGIN key word is used for a different purpose in embedded SQL. You are advised to be careful about the transaction semantics when porting database applications. SEE ALSO
COMMIT [commit(7)], ROLLBACK [rollback(7)], START TRANSACTION [start_transaction(7)], SAVEPOINT [savepoint(7)] SQL - Language Statements 2010-05-14 BEGIN(7)
All times are GMT -4. The time now is 10:41 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy