As I see it, this is a CSV manipulation problem, so a simple CSV-aware tool seems appropriate. The dataset is converted to CSV format, and the header names (one per line in a separate file) are joined into a comma-separated string. The named columns are then extracted in that order, and the result is converted from CSV to TAB-separated format, which the OP required.
Collecting these steps in a script and using the dataset from ripat:
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate extraction of fields, csvtool.
# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
em() { pe "$*" >&2 ; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }   # Redefine db as a no-op to switch debug output off.
C=$HOME/bin/context && [ -f $C ] && $C dixf sed pass-fail
pe
dixf csvtool
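FILE=${1-data1}
E=expected-output.txt
H=headers
pl " Input data file $FILE:"
head "$FILE"
pl " Input file converted to csv:"
# Collapse each run of whitespace into a single comma.
sed -r 's/\s+/,/g' "$FILE" |
tee t1
pl " Header name file $H, name list:"
head "$H"
# Join the header names, one per line in $H, into a comma-separated list.
h1=$( paste -s -d, "$H" )
pe " Header list = $h1"
pl " Expected output:"
cat "$E"
pl " Results, extract columns, convert csv to TAB-spaced:"
# Select columns by header name, in the order given, then turn commas into TABs.
csvtool namedcol "$h1" t1 |
tee t2 |
sed -r 's/,/\t/g' |
tee f1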
pl " Verify results if possible:"
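C=$HOME/bin/pass-fail
# Use if/else so that a failed comparison is not reported as "cannot be verified".
if [ -f "$C" ]; then "$C"; else ( pe; pe " Results cannot be verified." ) >&2; fi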
exit 0
producing:
Code:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution : Debian 8.6 (jessie)
bash GNU bash 4.3.30
dixf (local) 1.21
sed (GNU sed) 4.2.2
pass-fail (local) 1.9
csvtool tool for performing manipulations on CSV files from sh... (man)
Path : /usr/bin/csvtool
Version : - ( /usr/bin/csvtool, 2014-08-06 )
Type : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help : probably available with --help
Home : https://github.com/Chris00/ocaml-csv
-----
Input data file data1:
H1 H2 H3 H4
01 02 03 04
11 12 13 14
21 22 23 24
31 32 33 34
-----
Input file converted to csv:
H1,H2,H3,H4
01,02,03,04
11,12,13,14
21,22,23,24
31,32,33,34
-----
Header name file headers, name list:
H3
H4
H1
H2
Header list = H3,H4,H1,H2
-----
Expected output:
H3 H4 H1 H2
03 04 01 02
13 14 11 12
23 24 21 22
33 34 31 32
-----
Results, extract columns, convert csv to TAB-spaced:
H3 H4 H1 H2
03 04 01 02
13 14 11 12
23 24 21 22
33 34 31 32
-----
Verify results if possible:
-----
Comparison of 5 created lines with 5 lines of desired results:
Succeeded -- files (computed) f1 and (standard) expected-output.txt have same content.
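The verification step relies on pass-fail, which is a local utility and will not exist on other systems. Where it is missing, a byte-for-byte comparison with cmp is one possible substitute. The following is only a sketch: verify_output is a hypothetical helper name, and it assumes that the expected-output file is itself TAB-separated, since cmp does not ignore whitespace differences.

```shell
#!/usr/bin/env bash
# verify_output: hypothetical stand-in for the local pass-fail utility.
# $1 = computed file, $2 = file holding the desired results.
verify_output() {
  if cmp -s "$1" "$2"; then
    printf 'Succeeded -- files %s and %s have same content.\n' "$1" "$2"
  else
    printf 'Failed -- files %s and %s differ.\n' "$1" "$2" >&2
    return 1
  fi
}
```

In the script this would be called as `verify_output f1 expected-output.txt`; running `diff f1 expected-output.txt` afterwards shows where the files differ.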
The csvtool command can be found in the Debian repository or on GitHub, as noted.
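If csvtool cannot be installed, the same reordering can be approximated with awk alone. This is a different technique from the one used in the script, not a description of how csvtool works: it splits on whitespace, so it is not CSV-aware and would not cope with quoted or embedded separators. The function name reorder_cols is hypothetical.

```shell
#!/usr/bin/env bash
# reorder_cols: hypothetical awk-only fallback for the csvtool step.
# $1 = comma-separated header names in the desired order
# $2 = whitespace-separated data file whose first line holds the header names
reorder_cols() {
  awk -v want="$1" '
    BEGIN { OFS = "\t"; n = split(want, names, ",") }
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) pos[$i] = i
      # Stop early if a requested name is not in the header line.
      for (k = 1; k <= n; k++)
        if (!(names[k] in pos)) {
          print "unknown column: " names[k] > "/dev/stderr"
          exit 1
        }
    }
    {
      # Emit the requested columns in the requested order, TAB-separated.
      line = $(pos[names[1]])
      for (k = 2; k <= n; k++) line = line OFS $(pos[names[k]])
      print line
    }
  ' "$2"
}
```

With the files from the script above, `reorder_cols "$(paste -s -d, headers)" data1 > f1` would write the TAB-separated result directly, without the intermediate csv files t1 and t2.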
@LMHmedchem: with 300 posts, you should know that posting data samples, expected output, and your computing environment will help make replies easier and more likely to be applicable to your situation. Please do that in your future posts.