A cleaner way to rearrange column

11-28-2016


Hello,

I have some tab delimited text data,

Code:

index   name    chg_p   chg_m
1       name,1  1       0
2       name,2  1       1
3       name,3  1       0
4       name,4  1       0
5       name,5  1       1

I need to duplicate the "index" column, call it "id" and insert it after the "name" column. Then I need to print all of the remaining columns. For the example above it would look like,

Code:

index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

I have run something in awk like,

Code:

awk -F'\t' 'BEGIN{OFS="\t"} {print $1, $2, $1, $3, $4}' input > output

This appears to work but there are some issues. First, I end up with two columns named "index", so I have to add an additional command to find the second one and change it to "id". Second, I don't see a clever way to print from column 3 to the end like you would with cut -f3-. The only thing I have got to work is to hard code columns $3-$50 (there are that many columns in this particular file).

Is there a way to do this with fewer steps and more elegant code? What I have now is a bit kludgy, even for me.

Thanks,

LMHmedchem


11-28-2016


Hello LMHmedchem,

Could you please try the following and let us know if it helps?

Code:

awk 'NR==1{$2=$2 "\t" "id";print;next} {$2=$2 OFS $1} 1' OFS="\t"  Input_file

Output will be as follows.

Code:

index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

Thanks,
R. Singh
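One caveat, assuming names may contain spaces (the sample data does not): the command above sets only OFS, so awk splits the input on any run of blanks or tabs. A name such as "name 1" would then be split into two fields. A minimal sketch of the same command with the input separator also set to a tab; the function name insert_id_tabs is only an illustration, not part of the post:

```shell
# insert_id_tabs: same logic as the one-liner above, but with the input
# field separator set to a tab as well (-F'\t'), so blanks inside a name
# do not start a new field. Reads the named files, or stdin if none given.
# (insert_id_tabs is a hypothetical name used only for this sketch.)
insert_id_tabs() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { $2 = $2 OFS "id"; print; next }   # header row: add the "id" label
        { $2 = $2 OFS $1 }                          # data rows: copy column 1
        1                                           # print the rebuilt record
    ' "$@"
}

# a name containing a space stays in one column
printf 'index\tname\tchg_p\n1\tname 1\t1\n' | insert_id_tabs
```

With the default separator the same input line would come out with "name" and "1" in separate columns.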


11-28-2016


In addition to what RavinderSingh13 suggested, you could also try:

Code:

awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 OFS (NR > 1 ? $1 : "id") } 1' input > output

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
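Neither one-liner has to list the trailing columns: assigning to $2 makes awk rebuild the record with OFS between the fields, so everything after column 2 is printed unchanged. For cases where the columns really do have to be picked one by one, the open-ended range the original post asks about (like cut -f3-) can be written as a loop up to NF. A rough sketch, with the function name print_reordered made up for the example:

```shell
# print_reordered: print columns 1, 2, a copy of column 1 (labelled "id" in
# the header row), and then every column from 3 to the end.
# Reads tab-delimited text on stdin and writes to stdout.
# (print_reordered is a hypothetical name used only for this sketch.)
print_reordered() {
    awk 'BEGIN { FS = OFS = "\t" }
         {
             out = $1 OFS $2 OFS (NR == 1 ? "id" : $1)
             for (i = 3; i <= NF; i++)   # remaining columns, however many there are
                 out = out OFS $i
             print out
         }'
}

# usage with two lines in the format of the sample data
printf 'index\tname\tchg_p\tchg_m\n1\tname,1\t1\t0\n' | print_reordered
```

The loop is more typing than the field-assignment trick, but it makes it easy to skip or reorder columns as well.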


Don Cragun


11-28-2016


Hi.

I'll skip the usual posting of the entire demonstration script.

Code:

Input:
a  b  c  d  e  f  g  h  i
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9

-----
 Results for sequence "recut 1,2,1,4,6-":
a       b       a       d       f       g       h       i
1,1     1,2     1,1     1,4     1,6     1,7     1,8     1,9
2,1     2,2     2,1     2,4     2,6     2,7     2,8     2,9
3,1     3,2     3,1     3,4     3,6     3,7     3,8     3,9

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
recut - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
csvtool - ( /usr/bin/csvtool, 2014-08-06 )

Missing textutils:
http://www1.cuni.cz/~obo/textutils/ verified (2016.08)

The recut command allows dash-notation, replication of fields, etc. The details for recut are:

Code:

recut   Process fields like cut, allow repetitions and re-ordering. (what)
Path    : ~/bin/recut
Version : - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
Length  : 56 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Modules : (for perl codes)
 Getopt::Long   2.42

See the webpage noted above for 80 or so similar commands.

The command csvtool can also do this:

Code:

 Results for sequence "csvtool -t " " -u " " col 1,2,1,4-5,6-":
a b a d e f g h i
1,1 1,2 1,1 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,1 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,1 3,4 3,5 3,6 3,7 3,8 3,9

Best wishes ... cheers, drl


11-28-2016


Thank you for the suggestions.

I have tried the code posted by both RavinderSingh13 and Don Cragun and both work with the examples I have tested on openSUSE 13.2 x86_64.

There doesn't seem to be any difference in performance.

I haven't been able to try the csvtool and recut suggestions posted by drl because I don't have either tool installed. Both of these tools appear to have an elegant and intuitive syntax for this kind of thing. It looks like csvtool is available in the default repositories but recut is not. I generally lean towards using awk and sed because they are always available. It looks like you need all of texlive to get csvtool, which is too bad because texlive is about 1500 packages.

---------- Post updated at 01:30 PM ---------- Previous update was at 12:23 PM ----------

It looks like I also need to add another column to the end of the file. The column would be named "target" and would have the same value for every row. The row values would be floating point numbers like 0.0 or 1.0.

For example, add a column "target" to the end of this:

Code:

index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

so that it looks like this:

Code:

index   name    id      chg_p   chg_m   target
1       name,1  1       1       0       1.0
2       name,2  2       1       1       1.0
3       name,3  3       1       0       1.0
4       name,4  4       1       0       1.0
5       name,5  5       1       1       1.0

The only way I can think of to do this is to generate a second file with the new column and then paste the files together.

Code:

# get the number of lines in the file that the column will be added to
lines_in_file=$(wc -l < "$temp_output")

# add header to new file
echo "target" > temp_output2

# add a dummy target value for each data row (every line except the header)
for (( c=1; c<$lines_in_file; c++ ))
do
   echo "1.0" >> temp_output2
done
# no extra newline is needed here: each echo already ends its line, and a
# trailing blank line would make paste emit an extra, nearly empty row

# combine the files
paste "$temp_output" temp_output2 > output_file

This more or less works, but are there any suggestions for a better way?

LMHmedchem


11-28-2016


I presume you are using awk anyhow, as proposed by RavinderSingh13 and Don Cragun? How about adapting their code, like this:

Code:

awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 OFS (!(NR-1)?"id":$1); $(NF+1) = !(NR-1)?"target":1.3 } 1'  file
index	name	id	chg_p	chg_m	target
1	name,1	1	1	0	1.3
2	name,2	2	1	1	1.3
3	name,3	3	1	0	1.3
4	name,4	4	1	0	1.3
5	name,5	5	1	1	1.3


RudiC


11-28-2016


Hi.

Here are additional results with tools available from repositories or from the net:
Code snippet:

Code:

pl " Results for sequence \"arrange -f 1 2 1 4-5 7\":"
arrange -f '1 2 1 4-5 7' $FILE |
align
pe " (\"arrange\" is a work in progress)"

pl " Results for sequence \"csvtool -t \" \" -u \" \" col 1,2,1,4-5,6-\":"
csvtool -t " " -u " " col 1,2,1,4-5,6- $FILE

pl " Results for spit - create fields of data:"
spit -r 3 -v "2.71828"

pl " Results for \"csvfix -f 1,2,1,4-5\":"
csvfix read_dsv -s " " -f 1,2,1,4:5 $FILE |
tee f1 |
csvfix put -v "1.0" |
tee f2 |
csvfix write_dsv -s " " |
tee f3

pl " Results for add_field, part of CRUSH toolset:"
add_field -l "target" -v "3.14" -A "i" -d " " $FILE

pl " Results for add_field, part of CRUSH toolset:"
add_field -l "target" -v "3.14" -f 100 -d " " $FILE

producing:
-----

Code:

 Results for sequence "arrange -f 1 2 1 4-5 7":
a   b   a   d   e   g
1,1 1,2 1,1 1,4 1,5 1,7
2,1 2,2 2,1 2,4 2,5 2,7
3,1 3,2 3,1 3,4 3,5 3,7
 ("arrange" is a work in progress)

-----
 Results for sequence "csvtool -t " " -u " " col 1,2,1,4-5,6-":
a b a d e f g h i
1,1 1,2 1,1 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,1 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,1 3,4 3,5 3,6 3,7 3,8 3,9

-----
 Results for spit - create fields of data:
2.71828 
2.71828 
2.71828 

-----
 Results for "csvfix -f 1,2,1,4-5":
a b a d e 1.0
1,1 1,2 1,1 1,4 1,5 1.0
2,1 2,2 2,1 2,4 2,5 1.0
3,1 3,2 3,1 3,4 3,5 1.0

-----
 Results for add_field, part of CRUSH toolset:
a b c d e f g h i target
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 3.14
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9 3.14
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9 3.14

-----
 Results for add_field, part of CRUSH toolset:
a b c d e f g h i target
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 3.14
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9 3.14
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9 3.14

recut   Process fields like cut, allow repetitions and re-ordering. (what)
Path    : ~/bin/recut
Version : - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
Length  : 56 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Home    : http://www1.cuni.cz/~obo/textutils/
Modules : (for perl codes)
 Getopt::Long   2.42

arrange Arrange fields, like cut, but in user-specified order. (what)
Path    : ~/bin/arrange
Version : 1.15
Length  : 355 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Modules : (for perl codes)
 warnings       1.23
 strict 1.08
 Carp   1.3301
 Getopt::Euclid 0.4.5

spit    Generate, create structured data sequence, -r rows of -v values in -c columns. (what)
Path    : ~/bin/spit
Version : - ( local: RepRev 1.10, ~/bin/spit, 2015-08-13 )
Length  : 80 lines
Type    : awk or perl script, ASCII text
Shebang : #!/usr/bin/env bash

csvtool tool for performing manipulations on CSV files from sh... (man)
Path    : /usr/bin/csvtool
Version : - ( /usr/bin/csvtool, 2014-08-06 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help
Home    : https://github.com/Chris00/ocaml-csv

csvfix  Manipulate csv files, file:///home/dennisl/src/csvfix/csvfix16/csvfix.html?Introduction.html (local html) (doc)
Path    : ~/executable/csvfix
Version : - ( local: ~/executable/csvfix, 2014-05-17 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Home    : http://neilb.bitbucket.org/csvfix/

add_field       adds a field to a string-delimited log file. (man)
Path    : /usr/local/bin/add_field
Version : - ( local: /usr/local/bin/add_field, 2016-11-28 )
Length  : 273 lines
Type    : a /usr/bin/perl -w script, ASCII text executable
Shebang : #!/usr/bin/perl -w
Help    : probably available with [     ]-h,--help
Home    : https://github.com/google/crush-tools
Modules : (for perl codes)
 strict 1.08
 Getopt::Long   2.42
 Carp   1.3301

Of interest may be csvfix which allows a sub-command put to create a new field.

The local spit creates fields of values (or location in a matrix).

The CRUSH toolset includes add_field with a name and value.

The details above show the homes for the toolsets.

Best wishes ... cheers, drl
