How to parse parts of 1 column into two separate columns?

Login or Register to Ask a Question and Join Our Community

How to parse parts of 1 column into two separate columns?

Tags

Login to Discuss or Reply to this Discussion in Our Community

Top Forums Shell Programming and Scripting How to parse parts of 1 column into two separate columns?

12-08-2014

Registered User

3, 0

Join Date: Oct 2014

Last Activity: 8 December 2014, 1:53 PM EST

Posts: 3

Thanks Given: 0

Thanked 0 Times in 0 Posts

How to parse parts of 1 column into two separate columns?

I have a shell script that is currently transferring a csv file from a server into a Teradata database table. One of the 30 or so columns is called "destination_url". In that URL there are parameters, and it is possible for those parameters to be repeated because of referring companies copying referred URLs from one another. So, for example one of these urls might look like this:

http://othercompany.com/media/redir.php?prof=403&camp=3&affcode=kw1266732&cid={creative}&networkType={ifContent:content}&url[]=http://www.mycompany.com/search?keywords=shelf&taxonomy=dep32&cid=128513&kid=169896850098703&TRACK=PSGGL

My task is to take the cid value and kid value and pull them out and place them in two columns. The two new columns will be called cid_value and kid_value. I should point out that the source destination_url will remain unchanged. I just need the two values from it.

The problem is that the urls SOMETIMES have more than one cid and/or kid. So usually I need to pull the first instance of the cid and kid. cid and kid will always be numeric. cid will always have 7 digits. kid can have anywhere from 6 to 14 digits.

I know this is possible to do in shell but I have NO idea how to do this. Can someone help? I'm new to shell scripting. Feeling totally lost and without help. I know it will be using sed and awk and I'm trying to read about them but nothing is clicking yet.

Last edited by craigwg; 12-08-2014 at 01:25 PM.. Reason: Clarity

craigwg

View Public Profile for craigwg

Find all posts by craigwg

12-08-2014

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Please use code tags as required by forum rules!

This is quite incomplete a specification. Nothing is said about field separators, filed count per line, whatsoever. No header line is given to add the "cid_value" etc. strings to. The first cid entry has "{creative}" assigned; so what needs to be extracted - the first entry or the one with numbers or the first with numbers?
With quite some guesses & assumptions, I came up with

Code:

awk     'match($0,"cid=[0-9]*\&") {$(NF+1)=substr($0, RSTART+4, RLENGTH-5)}
         match($0,"kid=[0-9]*\&") {$(NF+1)=substr($0, RSTART+4, RLENGTH-5)}
         1
        ' file

Give it a shot and come back with results.

RudiC

View Public Profile for RudiC

Find all posts by RudiC

12-08-2014

Registered User

3, 0

Join Date: Oct 2014

Last Activity: 8 December 2014, 1:53 PM EST

Posts: 3

Thanks Given: 0

Thanked 0 Times in 0 Posts

Thanks. My apologies. I'm trying to post as best I can. New guy! The field seperater is "

Code:

DLE

". There are 65 fields in the source file. The end will have 2 additional (cid_value and kid_value). Destination_url is the 34th column of the given 65 columns.

You are correct, I am looking for the first numerical instance of cid and kid. I posted the example I posted because it shows a little bit of the variability of the Destination_url column.

I am going to play with your example above. I'm not sure how to work that into my existing shell file but I will try my best.

---------- Post updated at 02:28 PM ---------- Previous update was at 02:03 PM ----------

Perhaps this will help. In my shell script there is a use of sed and awk that is taking another column called "adGroup" that is parsing out 4 columns. This has NOTHING to do with my project. But it also shows what I have come up with. Its not working yet, but it isn't erroring out either. I don't understand if I'm using the match command correctly. Nor do I understand what the end ' file does. Does that need to be replaced with my output table, which in this case is called edw_work.unfiltered_table? Either way here is the current code:

Code:

    # The data is the passed to an awk statement that breaks the ad grou:q!:p column (column 2) up into up to 10 columns. See below for more comments.
      sed -e1d $SRC_PATH$SRC_FILE_NAME$NEXTDATE$SRC_FILE_EXT | awk -F"" -v OFS="" '{ for (i=4;i<NF;i+=1) { gsub(",", "", $i) }; print }' | sed 's/\"//g' > $DEST_PATH$DEST_FILE_NAME$NEXTDATE$DEST_FILE_EXT 2>> $LOG_FULL

	  
	  #Craig's awk statement to take the destination_URL column and pull the correct cid and kid from it and insert it into new columns (cid_value and kid_value)that will be directly after the destination_url column
	  awk     
		'match($34,"cid=[0-9]*\&") {$(NF+35)=substr($0, RSTART+4, RLENGTH-6)}
         match($34,"kid=[0-9]*\&") {$(NF+36)=substr($0, RSTART+4, RLENGTH-14)}
         1
        ' file

craigwg

View Public Profile for craigwg

Find all posts by craigwg

12-08-2014

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Don't use NF+35 nor 36. NF+1 will add one field at the end as requested, and then again. There may be awk versions that do not allow this type of assignments. If the field count is known and constant, you can use constant values as well, i.e. NF+1 and NF+2

RudiC

View Public Profile for RudiC

Find all posts by RudiC

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Improve awk code that has three separate parts

I have a very inefficient awk below that I need some help improving. Basically, there are three parts, that ideally, could be combined into one search and one output file. Thank you :). Part 1: Check if the user inputted string contains + or - in it and if it does the input is writting to a...

2. Shell Programming and Scripting

Separate a hash variable into 2 parts in Perl

Dear Perl users/experts, Could somebody help me how to solve my problem, I have a hash variable that I want to convert into dot file (graphviz). I know how to convert it to dot file but I need some modification on the output of the hash variable before convert it to dot file. Eeach key of...

3. Shell Programming and Scripting

Request: How to Parse dynamic SQL query to pad extra columns to match the fixed number of columns

Hello All, I have a requirement in which i will be given a sql query as input in a file with dynamic number of columns. For example some times i will get 5 columns, some times 8 columns etc up to 20 columns. So my requirement is to generate a output query which will have 20 columns all the...

4. UNIX for Dummies Questions & Answers

How to generate one long column by merging two separate two columns in a single file?

Dear all, I have a simple question. I have a file like below (separated by tab): col1 col2 col3 col4 col5 col6 col7 21 66745 rs1234 21 rs5678 23334 0.89 21 66745 rs2334 21 rs9978 23334 0.89 21 66745 ...

5. Shell Programming and Scripting

Need help with awk statement to break nth column in csv file into 3 separate columns

Hello Members, I have a csv file in the format below. Need help with awk statement to break nth column into 3 separate columns and export the changes to new file. input file --> file.csv cat file.csv|less "product/fruit/mango","location/asia/india","type/alphonso" need output in...

6. Shell Programming and Scripting

Print every 5 4th column values as separate row with different first column

Hi, I have the following file, chr1 100 200 20 chr1 201 300 22 chr1 220 345 23 chr1 230 456 33.5 chr1 243 567 90 chr1 345 600 20 chr1 430 619 21.78 chr1 870 910 112.3 chr1 914 920 12 chr1 930 999 13 My output would be peak1 20 22 23 33.5 90 peak2 20 21.78 112.3 12 13 Here the...

7. UNIX for Advanced & Expert Users

Parse key-value pair into separate rows

Hi, I'm getting key-value pairs in a string as follows - 0000xm7zcNDIkP888vRqGv93xA7:176n00qql||9700005405552747,9700005405717924,9700005405733788|unidentified,unidentified,unidentified I need output as follows - row1:...

8. UNIX for Advanced & Expert Users

Parse (delimited string) key-value pairs in a column into separate lines

Hi experts, e.g. i/p data looks like 0000xm7zcNDIkP888vRqGv93xA7:176n00qql||9700005405552747,9700005405717924,9700005405733788|unidentified,unidentified,unidentified|| o/p data should like - row1: 0000xm7zcNDIkP888vRqGv93xA7:176n00qql||9700005405552747|unidentified ...

9. Shell Programming and Scripting

Using bash to separate files files based on parts of a filename

Hey guys, Sorry for the basic question but I have a lot of files that I want to separate into groups based on filenames which I can then cat together. Eg I have: (a_b_c.txt) WB34_2_SLA8.txt WB34_1_SLA8.txt WB34_1_DB10.txt WB34_2_DB10.txt WB34_1_SLA8.txt WB34_2_SLA8.txt 77_1_SLA8.txt...

10. Shell Programming and Scripting

Extracting columns from a matrix and storing each column in a separate file

Hi All, I have a huge matrix file consisting some some millions rows and 6000 columns. The contents are just floating point numbers in the matrix. I want to extract each column (i.e. 6000 of them) and store each column in a separate file. For example, 1.dat will consist of elements from column...

Login or Register to Ask a Question