A cleaner way to rearrange column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting A cleaner way to rearrange column
# 1  
Old 11-28-2016
A cleaner way to rearrange column

Hello,

I have some tab delimited text data,
Code:
index   name    chg_p   chg_m
1       name,1  1       0
2       name,2  1       1
3       name,3  1       0
4       name,4  1       0
5       name,5  1       1

I need to duplicate the "index" column, call it "id" and insert it after the "name" column. Then I need to print all of the remaining columns. For the example above it would look like,

Code:
index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

I have run something in awk like,
Code:
awk -F'\t' 'BEGIN{OFS="\t"} {print $1, $2, $1, $3, $4}' input > output

This appears to work but there are some issues. First, I end up with two columns named "index", so I have to add an additional command to find the second one and change it to "id". The second is that I don't see a clever way to print from column 3 to the end like you would with cut 3-. The only thing I have got to work is to hard code columns $3-$50 (there are that many columns in this particular file).

Is there a way to do this with fewer steps an more elegant code. What I have now is a bit kludgy, even for me.

Thanks,

LMHmedchem
# 2  
Old 11-28-2016
Hello LMHmedchem,

Could you please try following and let us know if this helps you.
Code:
awk 'NR==1{$2=$2 "\t" "id";print;next} {$2=$2 OFS $1} 1' OFS="\t"  Input_file

Output will be as follows.
Code:
index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 11-28-2016
In addition to what RavinderSingh13 suggested, you could also try:
Code:
awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 OFS (NR > 1 ? $1 : "id") } 1' input > output

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This User Gave Thanks to Don Cragun For This Post:
# 4  
Old 11-28-2016
Hi.

I'll skip the usual posting of the entire demonstration script.
Code:
Input:
a  b  c  d  e  f  g  h  i
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9

-----
 Results for sequence "recut 1,2,1,4,6-":
a       b       a       d       f       g       h       i
1,1     1,2     1,1     1,4     1,6     1,7     1,8     1,9
2,1     2,2     2,1     2,4     2,6     2,7     2,8     2,9
3,1     3,2     3,1     3,4     3,6     3,7     3,8     3,9

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.6 (jessie) 
bash GNU bash 4.3.30
recut - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
csvtool - ( /usr/bin/csvtool, 2014-08-06 )

Missing textutils:
http://www1.cuni.cz/~obo/textutils/ verified (2016.08)

The recut command allows dash-notation, replication of fields, etc. The details for recut are:
Code:
recut   Process fields like cut, allow repetitions and re-ordering. (what)
Path    : ~/bin/recut
Version : - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
Length  : 56 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Modules : (for perl codes)
 Getopt::Long   2.42

See the webpage noted above for 80 or so similar commands.

The command csvtool can also do this:
Code:
 Results for sequence "csvtool -t " " -u " " col 1,2,1,4-5,6-":
a b a d e f g h i
1,1 1,2 1,1 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,1 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,1 3,4 3,5 3,6 3,7 3,8 3,9

Best wishes ... cheers, drl

Last edited by drl; 11-28-2016 at 08:03 AM..
This User Gave Thanks to drl For This Post:
# 5  
Old 11-28-2016
Thank you for the suggestions.

I have tried the code posted by both RavinderSingh13 and Don Cragun and both work with the examples I have tested on openSuse 13.2 x86_64.

There doesn't seem to be any difference in performance.

I haven't been able to try the csvtool and recut suggestions posted by drl because I don't have either tool installed. Both of these tools appear to have an elegant and intuitive syntax for this kind of thing. It looks like csvtool is available in the default repositories but recut is not. I generally lean towards using awk and sed because they are always available. It looks like you need all of textlive to get csvtool, which is too bad because textlive is about 1500 packages.


---------- Post updated at 01:30 PM ---------- Previous update was at 12:23 PM ----------

It looks like I also need to add another column to the end of the file. The column would be named "target" and would have the same value for every row. The row values would be floating point numbers like 0.0 or 1.0.

For example, add a column "target" to the end
Code:
index   name    id      chg_p   chg_m
1       name,1  1       1       0
2       name,2  2       1       1
3       name,3  3       1       0
4       name,4  4       1       0
5       name,5  5       1       1

To look like this
Code:
index   name    id      chg_p   chg_m   target
1       name,1  1       1       0       1.0
2       name,2  2       1       1       1.0
3       name,3  3       1       0       1.0
4       name,4  4       1       0       1.0
5       name,5  5       1       1       1.0


The only way I can think of to do this is to generate a second file with the new column and then paste the files together.
Code:
# get the number of lines in file that column will be added to
lines_in_file=$(wc -l < "$temp_output")

# add header to new file
echo "target" > temp_output2

# add a dummy target value for each data row
for (( c=1; c<$lines_in_file; c++ ))
do
   echo "1.0" >> temp_output2
done
# add newline at end
echo >> temp_output2

# combine the files
paste  $temp_output  temp_output2 > output_file

This more or less works, but are there any suggestions for a better way?

LMHmedchem

Last edited by LMHmedchem; 11-28-2016 at 02:51 PM..
# 6  
Old 11-28-2016
I presume you are using awk anyhow, as proposed by RavinderSingh13 and Don Cragun? How about adapting their code, like
Code:
awk 'BEGIN { FS = OFS = "\t" } { $2 = $2 OFS (!(NR-1)?"id":$1); $(NF+1) = !(NR-1)?"target":1.3 } 1'  file
index	name	id	chg_p	chg_m	target
1	name,1	1	1	0	1.3
2	name,2	2	1	1	1.3
3	name,3	3	1	0	1.3
4	name,4	4	1	0	1.3
5	name,5	5	1	1	1.3


Last edited by RudiC; 11-28-2016 at 04:21 PM..
This User Gave Thanks to RudiC For This Post:
# 7  
Old 11-28-2016
Hi.

Here are additional results with tools available from repositories or from the net:
Code snippet:
Code:
pl " Results for sequence \"arrange -f 1 2 1 4-5 7\":"
arrange -f '1 2 1 4-5 7' $FILE |
align
pe " (\"arrange\" is a work in progress)"

pl " Results for sequence \"csvtool -t \" \" -u \" \" col 1,2,1,4-5,6-\":"
csvtool -t " " -u " " col 1,2,1,4-5,6- $FILE

pl " Results for spit - create fields of data:"
spit -r 3 -v "2.71828"

pl " Results for \"csvfix -f 1,2,1,4-5\":"
csvfix read_dsv -s " " -f 1,2,1,4:5 $FILE |
tee f1 |
csvfix put -v "1.0" |
tee f2 |
csvfix write_dsv -s " " |
tee f3

pl " Results for add_field, part of CRUSH toolset:"
add_field -l "target" -v "3.14" -A "i" -d " " $FILE

pl " Results for add_field, part of CRUSH toolset:"
add_field -l "target" -v "3.14" -f 100 -d " " $FILE

producing:
-----
Code:
 Results for sequence "arrange -f 1 2 1 4-5 7":
a   b   a   d   e   g
1,1 1,2 1,1 1,4 1,5 1,7
2,1 2,2 2,1 2,4 2,5 2,7
3,1 3,2 3,1 3,4 3,5 3,7
 ("arrange" is a work in progress)

-----
 Results for sequence "csvtool -t " " -u " " col 1,2,1,4-5,6-":
a b a d e f g h i
1,1 1,2 1,1 1,4 1,5 1,6 1,7 1,8 1,9
2,1 2,2 2,1 2,4 2,5 2,6 2,7 2,8 2,9
3,1 3,2 3,1 3,4 3,5 3,6 3,7 3,8 3,9

-----
 Results for spit - create fields of data:
2.71828 
2.71828 
2.71828 

-----
 Results for "csvfix -f 1,2,1,4-5":
a b a d e 1.0
1,1 1,2 1,1 1,4 1,5 1.0
2,1 2,2 2,1 2,4 2,5 1.0
3,1 3,2 3,1 3,4 3,5 1.0

-----
 Results for add_field, part of CRUSH toolset:
a b c d e f g h i target
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 3.14
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9 3.14
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9 3.14

-----
 Results for add_field, part of CRUSH toolset:
a b c d e f g h i target
1,1 1,2 1,3 1,4 1,5 1,6 1,7 1,8 1,9 3.14
2,1 2,2 2,3 2,4 2,5 2,6 2,7 2,8 2,9 3.14
3,1 3,2 3,3 3,4 3,5 3,6 3,7 3,8 3,9 3.14

recut   Process fields like cut, allow repetitions and re-ordering. (what)
Path    : ~/bin/recut
Version : - ( local: RepRev 1.1, ~/bin/recut, 2010-06-10 )
Length  : 56 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Home    : http://www1.cuni.cz/~obo/textutils/
Modules : (for perl codes)
 Getopt::Long   2.42

arrange Arrange fields, like cut, but in user-specified order. (what)
Path    : ~/bin/arrange
Version : 1.15
Length  : 355 lines
Type    : Perl script, ASCII text executable
Shebang : #!/usr/bin/perl
Modules : (for perl codes)
 warnings       1.23
 strict 1.08
 Carp   1.3301
 Getopt::Euclid 0.4.5

spit    Generate, create structured data sequence, -r rows of -v values in -c columns. (what)
Path    : ~/bin/spit
Version : - ( local: RepRev 1.10, ~/bin/spit, 2015-08-13 )
Length  : 80 lines
Type    : awk or perl script, ASCII text
Shebang : #!/usr/bin/env bash

csvtool tool for performing manipulations on CSV files from sh... (man)
Path    : /usr/bin/csvtool
Version : - ( /usr/bin/csvtool, 2014-08-06 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help
Home    : https://github.com/Chris00/ocaml-csv

csvfix  Manipulate csv files, file:///home/dennisl/src/csvfix/csvfix16/csvfix.html?Introduction.html (local html) (doc)
Path    : ~/executable/csvfix
Version : - ( local: ~/executable/csvfix, 2014-05-17 )
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Home    : http://neilb.bitbucket.org/csvfix/

add_field       adds a field to a string-delimited log file. (man)
Path    : /usr/local/bin/add_field
Version : - ( local: /usr/local/bin/add_field, 2016-11-28 )
Length  : 273 lines
Type    : a /usr/bin/perl -w script, ASCII text executable
Shebang : #!/usr/bin/perl -w
Help    : probably available with [     ]-h,--help
Home    : https://github.com/google/crush-tools
Modules : (for perl codes)
 strict 1.08
 Getopt::Long   2.42
 Carp   1.3301

Of interest may be csvfix which allows a sub-command put to create a new field.

The local spit creates fields of values (or location in a matrix).

The CRUSH toolset includes add_field with a name and value.

The details above show the homes for the toolsets.

Best wishes ... cheers, drl
This User Gave Thanks to drl For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

9 More Discussions You Might Find Interesting

1. War Stories

Data Centre meets Vacuum Cleaner

Hi Folks, I have just spent a couple of days resolving some problems at the remote DR data centre, sorting out the problems caused by the over zealous use of a Vacuum cleaner of all things. We have a backup server a SUN V480R with a Storedge 3510 and expansion attached which suffered a... (6 Replies)
Discussion started by: gull04
6 Replies

2. Shell Programming and Scripting

Maybe a cleaner way to generate a file?

greetings, to be clear, i have a solution but i'm wondering if anyone has a cleaner way to accomplish the following: the variable: LSB_MCPU_HOSTS='t70c7n120 16 t70c7n121 16 t70c7n122 16 t70c7n123 16 t70c7n124 16 t70c7n125 16 t70c7n126 16 t70c7n127 16 t70c7n128 16 t70c7n129 16 t70c7n130 16... (2 Replies)
Discussion started by: crimso
2 Replies

3. Shell Programming and Scripting

Cleaner way to use shell variable in awk /X/,/Y/ syntax?

$ cat data Do NOT print me START_MARKER Print Me END_MARKER Do NOT print me $ cat awk.sh start=START_MARKER end=END_MARKER echo; echo Is this ugly syntax the only way? awk '/'"$start"'/,/'"$end"'/ { print }' data echo; echo Is there some modification of this that would work? awk... (2 Replies)
Discussion started by: hanson44
2 Replies

4. Shell Programming and Scripting

Cleaner method for this if-then statement?

I have a script that runs once per month. It performs a certain task ONLY if the month is January, April, July, or October. MONTH=`date +%m` if || || || ; then do something else do a different thing fi Is there a neater way of doing it than my four separate "or" comparisons? That... (2 Replies)
Discussion started by: lupin..the..3rd
2 Replies

5. Shell Programming and Scripting

Grabbing the newest file, cleaner method?

Greetings, I'm doing a process whereby I need to search for all filenames containing a given bit of text and grab the newest file from what may be 20 results. In a script I'm writing, i've got a monster line to do the sort as follows: find /opt/work/reports/input -name "*$searchtarget*" |... (4 Replies)
Discussion started by: Karunamon
4 Replies

6. Programming

How to simplify this perl script to a cleaner simpler look?

my $branch_email_e = $FORM{r_Branch}; my $hostbranch_email_e = $FORM{r_Host_Branch}; my $branch_email_f = $FORM{r_Direction_generale}; my $hostbranch_email_f = $FORM{r_Direction_generale_daccueil}; my $branch_realname_e = ''; my $branch_realname_f = ''; ... (4 Replies)
Discussion started by: callyvan
4 Replies

7. Shell Programming and Scripting

rearrange the column names with comma as column delimiter

Hi, I am new to shell scripting, i have requirement can any one help me out in this regrads, in directory i have file like invoice1.txt, invoice2.txt in each file i have fixed number of columns, 62 in number but they are randomly arranged.like for first file invoice1.txt can have columns... (5 Replies)
Discussion started by: madhav62
5 Replies

8. Shell Programming and Scripting

script to rearrange data.

Hello. I have data in the following format (the spaces at the beginning of lines are included): 1 2 2 0.39621 0.00000 1 2 2 0.00000+-0.0000 * 1 2 ... (5 Replies)
Discussion started by: andersgs
5 Replies

9. Shell Programming and Scripting

rearrange a file

hi! in awk, i have a file like this: Trace1: WRIT,Trace2: BLAN,Trace3: BLAN, -47.2120018005371,,,39815.4809027778 -46.3009986877441,,,39815.4809027778 -46.277000427246,,,39815.4809143519 -46.7389984130859,,,39815.4809259259 -46.3460006713867,,,39815.4809259259... (10 Replies)
Discussion started by: riderman
10 Replies
Login or Register to Ask a Question