How to use regex on particular column (Removing comma from particular column)?


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting How to use regex on particular column (Removing comma from particular column)?
# 1  
Old 09-23-2014
How to use regex on particular column (Removing comma from particular column)?

Hi,

I have pipe separated file which contains some data having comma(,) in it. I want to remove the comma(,) only from particular column without changing data in other columns.

Below is the sample data file, I want to remove the comma(,) only from 5th column.

Code:
$ cat file1
ABC | DEF, HIJ| LMN| opq,| 12,345|XYZ
PQR,123 | LMN| RWX|pqr| 6,12|LMN
13,67| PQR,123 | LMN| RWX|7,89|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|9,00|P
123,23|45,789|12,567.04|78,765|6,90|9,00,875

I tried using sed as below but, I am not able to figure out why the used command is removing comma(,) from only 5th column for the 1st 4 lines and not removing any comma(,) from last line.

Code:
$ sed 's/\(.*\),\(.*\)/\1\2/g' file1
ABC | DEF, HIJ| LMN| opq,| 12345|XYZ
PQR,123 | LMN| RWX|pqr| 612|LMN
13,67| PQR,123 | LMN| RWX|789|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|900|P
123,23|45,789|12,567.04|78,765|6,90|9,00875
$

I also tried using awk as below but, it is not removing comma(,). To be frank, I did not understand why I have to use "\\1" in awk, because in sed as used above I can use \1 for representing groups. It would be helpful if somebody can explain the difference in using regex in sed and awk for grouping data.


$ awk -F"|" '{print gensub(/\(.*\),\(.*\)/,"\\1""\\2",g,$5);}' file1
12,345
6,12
7,89
9,00
6,90


I am thinking whether we could use sed inside awk for particular column, but not sure how to do it.

Expected output is as follows:
Code:
ABC | DEF, HIJ| LMN| opq,| 12345|XYZ
PQR,123 | LMN| RWX|pqr| 612|LMN
13,67| PQR,123 | LMN| RWX|789|XY
LMN,SDF|ABC,S|OPQ|SLM,SD|900|P
123,23|45,789|12,567.04|78,765|690|9,00,875

Thanks in advance for any help on this.
Moderator's Comments:
Mod Comment Please use CODE tags (not ICODE tags) for multi-line displays.

Last edited by Don Cragun; 09-23-2014 at 07:36 AM.. Reason: Fix ICODE tags.
# 2  
Old 09-23-2014
The sed utility uses BREs (Basic Regular Expressions); awk uses EREs (Extended ...). But, you're making your RE much too complex for what you're trying to do in awk.

If you want to get rid of the 1st comma in the 5th field, use:
Code:
awk -F"|" '{sub(/,/,"",$5)}1' file1

and, if you want to get rid of all commas in the 5th field, use:
Code:
awk -F"|" '{gsub(/,/,"",$5)}1' file1

# 3  
Old 09-23-2014
Quote:
Originally Posted by Don Cragun
The sed utility uses BREs (Basic Regular Expressions); awk uses EREs (Extended ...). But, you're making your RE much too complex for what you're trying to do in awk.

If you want to get rid of the 1st comma in the 5th field, use:
Code:
awk -F"|" '{sub(/,/,"",$5)}1' file1

and, if you want to get rid of all commas in the 5th field, use:
Code:
awk -F"|" '{gsub(/,/,"",$5)}1' file1

Hello Don,

Above code will give the requested output but it wouldn't have OFS (Output field seprator) as |. So just adding the same to your suggestion.

Code:
awk -F"|" '{gsub(/\,/,X,$5)} 1' OFS="|"  Input_file ## For removing all commas in 5th field
OR
awk -F"|" '{sub(/\,/,X,$5)} 1' OFS="|"  Input_file  ## For removing 1st comma in 5th field

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 4  
Old 09-23-2014
Thanks Don. Solution was much simpler than I thought.

---------- Post updated at 05:36 PM ---------- Previous update was at 05:31 PM ----------

Thanks Ravinder. Can you please advise what X refers to in your code. It removes the comma, but bit of confused why it is used instead of "".
# 5  
Old 09-23-2014
Quote:
Originally Posted by Prathmesh
Thanks Don. Solution was much simpler than I thought.

---------- Post updated at 05:36 PM ---------- Previous update was at 05:31 PM ----------

Thanks Ravinder. Can you please advise what X refers to in your code. It removes the comma, but bit of confused why it is used instead of "".
Hello Prathmesh,

It is a variable whose value is NULL so I am simply replaing the comma with a variable whose value is NULL.Smilie

We can also say as follows.

Quote:
gsub(a, b [, c]) Each substring matching the regular expression a in the string c, substitute the string b, and return the number of substitutions. If c is not supplied, use $0.
Thanks,
R. Singh
# 6  
Old 09-23-2014
Thanks Ravinder.
# 7  
Old 09-23-2014
Quote:
Originally Posted by RavinderSingh13
Hello Don,

Above code will give the requested output but it wouldn't have OFS (Output field seprator) as |. So just adding the same to your suggestion.

Code:
awk -F"|" '{gsub(/\,/,X,$5)} 1' OFS="|"  Input_file ## For removing all commas in 5th field
OR
awk -F"|" '{sub(/\,/,X,$5)} 1' OFS="|"  Input_file  ## For removing 1st comma in 5th field

Thanks,
R. Singh
Thanks Ravinder,
I obviously need to get more sleep. Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Bring values in the second column into single line (comma sep) for uniq value in the first column

I want to bring values in the second column into single line for uniq value in the first column. My input jvm01, Web 2.0 Feature Pack Library jvm01, IBM WebSphere JAX-RS jvm01, Custom01 Shared Library jvm02, Web 2.0 Feature Pack Library jvm02, IBM WebSphere JAX-RS jvm03, Web 2.0 Feature... (10 Replies)
Discussion started by: kchinnam
10 Replies

2. Shell Programming and Scripting

Difference of the same column when two other column matches and one column differs less than 1 hour

This is my input file : # cat list 20130430121600, cucm, location,76,2 20130430121600,cucm1,location1,76,4 20130430122000,cucm,location,80,8 20130430122000,cucm1,location1,90,8 20130430140000,cucm1,location1,87,11 20130430140000, cucm,location,67,9 This is the required output ... (1 Reply)
Discussion started by: Lakshmikumari
1 Replies

3. Shell Programming and Scripting

Insert comma in place of column

Hi all, I have a file in which I have to insert commna between entries of 2 column and createa new file separated by commas not a columns if input is FHIT Adenosine Monotungstate Not Available CS Trifluoroacetonyl Coenzyme A Not Available Theo expected output is ... (5 Replies)
Discussion started by: manigrover
5 Replies

4. Shell Programming and Scripting

Replace pipe <|> with comma <,> in a column

Hi All Gurus, I need to replace a pipe <|> with a comma <,> in a few columns with pipe delimited file. The column name are fixed for the replacement of comma <,>. For below example, Col3, Col6 and Col8 are columns need to replace with comma <,> if any pipe encountered. example:... (14 Replies)
Discussion started by: agathaeleanor
14 Replies

5. Shell Programming and Scripting

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2

Match column 3 in file1 to column 1 in file 2 and replace with column 2 from file2 file 1 sample SNDK 80004C101 AT XLNX 983919101 BB NETL 64118B100 BS AMD 007903107 CC KLAC 482480100 DC TER 880770102 KATS ATHR 04743P108 KATS... (7 Replies)
Discussion started by: rydz00
7 Replies

6. Shell Programming and Scripting

rearrange the column names with comma as column delimiter

Hi, I am new to shell scripting, i have requirement can any one help me out in this regrads, in directory i have file like invoice1.txt, invoice2.txt in each file i have fixed number of columns, 62 in number but they are randomly arranged.like for first file invoice1.txt can have columns... (5 Replies)
Discussion started by: madhav62
5 Replies

7. Shell Programming and Scripting

Changing one column of delimited file column to fixed width column

Hi, Iam new to unix. I have one input file . Input file : ID1~Name1~Place1 ID2~Name2~Place2 ID3~Name3~Place3 I need output such that only first column should change to fixed width column of 15 characters of length. Output File: ID1<<12 spaces>>Name1~Place1 ID2<<12... (5 Replies)
Discussion started by: manneni prakash
5 Replies

8. Shell Programming and Scripting

select particular column and creat a new file with comma

Hi all, I am very new in programming. Can anyone please help me in the matter below? I have one raw file like: gi|77|ref|NC_002971.3| Coxiella burnetii RSA 493, complete genome 6371 ATCGTGGTTGTGGTTCAT 5032 P 2 12 gi|71|ref|NC_005773.3| Pseudomonas syringae pv.... (4 Replies)
Discussion started by: iammitra
4 Replies

9. UNIX Desktop Questions & Answers

Unix Comma Separated to Excel Column

I would like to copy 2 parts of a csv file from Unix to an XL sheet. However to save time I do not want to format the column ever time I cut and paste into XL(Text2Column). I've used awk -F, '{Print $1, $2....}'. Is there a script or code that can automatically format the csv for XL columns? ... (3 Replies)
Discussion started by: ravzter
3 Replies

10. UNIX for Dummies Questions & Answers

Removing comma after 3rd column

I have 10,000 lines to remove the commas after the 3rd column. Any help will be appreciated! Removing commas after the 3rd column Input aaaaa,bbbbb,cccccc,ddddddd aaaaa,bbbbb,cccccc aaaaa,bbbbb,cccccc,ddddddd,eeeeee aaaaa,bbbbb,cccccc,ddddddd,eeeeee,fffffff........(limit of comma is not... (13 Replies)
Discussion started by: buddyme
13 Replies
Login or Register to Ask a Question