Problem in extraction when space is a field delimiter


 
# 1  
Old 08-09-2010
Problem in extraction when space is a field delimiter

I have more than 1000 files to parse. Each file contains a few lines (the number varies), followed by a header line with the column names (SPOT, NAME, etc.) and then the values for those columns.
**Example file:
Code:
sdgafh
dfhaadfha
sfgaf dhah jkthdj
SPOT  NAME  GENE_NAME CH_MEAN   CHDN_MED  CH2B_MEAN
1         IYPR1         abc                      1.5                 3                      4.5
2         IYPR9         def                       3.6                                         6.3
3         IYPR11        ghi                      2.6                 4                      2.8
4         IYPR13        jkl                       1.6                                         6.7
5         IYPR19        mno                    2.5                 7                      4.3

. . . . . .
. . . . . .
Problems:

1) I need to remove all lines before the header line (for each file).
2) I need to extract two columns, say the 2nd (NAME) and the 5th (CHDN_MED). Since the 2nd and 4th rows have no value in the 5th column, the output should contain a blank for each of them. Instead I get 6.3 and 6.7 respectively, taken from the 6th column (I am trying with awk).

The desired output file is :
Code:
IYPR1      3
IYPR9   
IYPR11    4
IYPR13   
IYPR19    7
.                 .
.                 .

**The actual files have an .xls extension, and each file has more than 50 columns and 9,000 rows.

Please advise

Thanks

Last edited by Scott; 08-09-2010 at 10:03 AM.. Reason: Please use code tags
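
The behaviour described in problem 2 follows from awk's default field splitting: with the default FS (a single space), any run of blanks and tabs counts as one separator, so an empty cell simply disappears and the following values shift left. A minimal sketch of the effect, using a made-up three-cell line and assuming the cells are separated by single tab characters (the sample above was pasted with spaces, so this is only a guess about the real files):

```shell
# Hypothetical line with an empty middle cell; cells separated by single tabs.
line=$(printf 'IYPR9\t\t6.3')

# Default FS: the two adjacent tabs collapse into one separator.
default_second=$(printf '%s\n' "$line" | awk '{ print NF ":" $2 }')

# FS set to a tab: every tab separates, so the empty cell is kept as field 2.
tab_second=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF ":" $2 }')

echo "default FS: $default_second"
echo "tab FS:     $tab_second"
```

With the default separator the line has two fields and the second one is 6.3; with a tab separator it has three fields and the second one is empty.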
# 2  
Old 08-09-2010
Quote:
Originally Posted by AshwaniSharma09
**the actual files are with .xls extension and each file has more than 50 columns and 9,000 rows.
Does that mean the files are in Excel format?
# 3  
Old 08-09-2010
This script handles one of your files. You can change which columns are printed (here: 2 and 5 -> RTP="2 5").
I assume that the number of columns is always six and that only column 5 can be empty. If not, we need to extend the case statement.

You need to run the script with the filename as first parameter:

Code:
    ./script.sh file.txt

Code:
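#!/bin/bash

# Columns to print (1-based); here NAME and CHDN_MED.
RTP="2 5"

# Read each line into array A, split on whitespace.
while read -r -a A ; do
    case ${#A[@]} in
        # Only five fields: assume column 5 was empty and move column 6 back into place.
        5) A[5]=${A[4]} ; A[4]="" ;;
        *) : ;;
    esac
    L=""
    for n in ${RTP} ; do
        L="${L} ${A[n-1]}"
    done
    echo "${L}"
# Feed the loop everything from the header line (SPOT) to the end of the file.
done < <(awk '/SPOT/ {f=1} f' "${1}")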

# 4  
Old 08-09-2010
Assuming the format is a flat text file:
Code:
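awk '
# After the header: print NAME, plus CHDN_MED only when all six fields are present.
f{printf("%s%s\n", $2, NF>5? "\t" $5:"")}
# Header line found: start printing from the next line.
/^SPOT/{f=1}
' file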

# 5  
Old 08-10-2010
Franklin52


Thanks, guys, for the quick replies.
@Franklin52: Yes, all the input files are Excel files. Your script works fine with a ".txt" file, but not with an Excel file, I'm afraid.

@elbrand: Any value in any column can be empty.
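
Since any cell can be empty, counting fields with NF is not enough; the separator itself has to preserve empty cells. A sketch, assuming the input is really tab-delimited text (something this thread never confirms, although files with an .xls extension are sometimes plain tab-separated text rather than binary workbooks). The function name extract_cols is made up for illustration. It skips everything before the header, locates the two wanted columns by name, and prints them separated by a tab:

```shell
# extract_cols FILE COL1 COL2
# Assumes FILE is tab-delimited text whose header line starts with "SPOT".
extract_cols() {
    awk -F'\t' -v c1="$2" -v c2="$3" '
        # Strip a trailing carriage return in case the file has Windows line endings.
        { sub(/\r$/, "") }
        # Header line: remember which field numbers hold the wanted columns.
        !found && /^SPOT/ {
            for (i = 1; i <= NF; i++) {
                if ($i == c1) a = i
                if ($i == c2) b = i
            }
            found = 1
            next
        }
        # Data lines: print the two cells; an empty cell stays empty.
        found && a && b { print $a "\t" $b }
    ' "$1"
}
```

Usage would look like `extract_cols 10042.xls NAME CHDN_MED > 10042.out`. Selecting by column name rather than by number avoids counting through more than 50 columns by hand.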

# 6  
Old 08-10-2010
You could save the files in CSV format and process those instead.

Regards
# 7  
Old 08-10-2010
I created an artificial ".txt" file with the same content as in my first post, and your script works fine on it. But when I convert the ".xls" file to ".txt" or ".csv", it does not work: it again takes values from other columns. This is what I did:

mv 10042.xls 10042.csv
OR
mv 10042.xls 10042.txt

Also, the above command is for one file. I need a way to run it on all files.
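
Note that mv only renames a file; it does not change its contents, so 10042.csv still holds exactly what 10042.xls held. Running `file 10042.xls` should show whether it is plain text or a binary Excel workbook. If it turns out to be tab-delimited text (an assumption, not something established in this thread), no conversion is needed and a loop can process every file in a directory. A minimal sketch; the function name process_all and the .out suffix are made up for illustration:

```shell
# process_all DIR: for every *.xls in DIR, write NAME and CHDN_MED
# (fields 2 and 5) to a matching .out file.
# Assumes tab-delimited text with a header line starting with "SPOT".
process_all() {
    for f in "$1"/*.xls; do
        [ -e "$f" ] || continue              # nothing matched: skip the literal pattern
        awk -F'\t' '
            seen { print $2 "\t" $5 }        # data lines after the header
            /^SPOT/ { seen = 1 }             # header found: print from the next line on
        ' "$f" > "${f%.xls}.out"
    done
}
```

Calling `process_all .` would handle every .xls file in the current directory. If the files are binary workbooks instead, they would first have to be exported to text from a spreadsheet program, which this sketch does not cover.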