Extract column to a new file


 
# 1  
Old 03-06-2011

Hi All,

I am using the command below to extract lines from a file:
Code:
grep -E "^.{20}5004" filename.rtf >> 5004

This returns every line in which the text 5004 appears immediately after the first 20 characters, i.e. in columns 21-24.

The file filename.rtf contains millions of rows. The four characters starting at column 21 repeat across many rows. For example:

Code:
000002              5004                        050083344101
000003              60646064                    910107702201
000004              50045004                    911106235001
000005              66076607                    911103471201

I need a command that writes each row to a new file named after those four characters. For example:

File 5004 should contain:
Code:
000002              5004                        050083344101
000004              50045004                    911106235001

Similarly, file 6064 should contain:
Code:
000003              60646064                    910107702201

I worked on the incomplete solution below:
Code:
for i in `cut -c21-24 filename.rtf`;do grep -E "^.{20}$i" filename.rtf >> terminal/$i; done

It's an infinite loop. I need it to break at EOF. Please help.
# 2  
Old 03-07-2011
Quote:
Originally Posted by hsehdar
The file filename.rtf contains millions of rows. The four characters starting at column 21 repeat across many rows. For example:

Code:
000002              5004                        050083344101
000003              60646064                    910107702201
000004              50045004                    911106235001
000005              66076607                    911103471201

I need a command that writes each row to a new file named after those four characters.
Based on your data sample, you can try:
Code:
awk '{print $0 > substr($0,21,4)}' file

# 3  
Old 03-07-2011
Quote:
Originally Posted by hsehdar
Code:
for i in `cut -c21-24 filename.rtf`;do grep -E "^.{20}$i"  filename.rtf >> terminal/$i; done

It's an infinite loop. I need it to break at EOF. Please help.

There's no infinite loop that I can see. It's just a very, very inefficient approach: the loop reads the entire file once per line. With millions of lines, that is on the order of a trillion lines to read (short scale, 10^12). The result is probably also incorrect: a key that occurs on several lines is grepped once per occurrence, so each matching line is appended to its file multiple times.
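
To illustrate the duplicate-match problem, here is a minimal sketch that keeps the grep approach but visits each distinct key only once. The `demo` directory, the `sample.txt` file name and the `sort -u` step are assumptions made for illustration and are not from the original post. It still rereads the input once per distinct key, so it is only reasonable when there are few keys.

```shell
# Build a small sample in the thread's layout: the key sits in columns 21-24.
mkdir -p demo/out
{
    printf '%-20s%-28s%s\n' 000002 5004     050083344101
    printf '%-20s%-28s%s\n' 000003 60646064 910107702201
    printf '%-20s%-28s%s\n' 000004 50045004 911106235001
    printf '%-20s%-28s%s\n' 000005 66076607 911103471201
} > demo/sample.txt

# sort -u reduces the keys to one entry each, so every matching line
# is written to its output file exactly once. Using > instead of >>
# also keeps reruns from appending to earlier output.
cut -c21-24 demo/sample.txt | sort -u | while IFS= read -r key; do
    # Note: the key is used as a regular expression; that is fine for digits.
    grep -E "^.{20}$key" demo/sample.txt > "demo/out/$key"
done
```

With the sample above, demo/out/5004 ends up with two lines, and demo/out/6064 and demo/out/6607 with one line each.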

---------- Post updated at 11:35 PM ---------- Previous update was at 11:26 PM ----------

Quote:
Originally Posted by danmero
Based on your data sample, you can try:
Code:
awk '{print $0 > substr($0,21,4)}' file

Depending on how many distinct values that 4-character string takes and on the account's resource limits, it may be necessary to append with >> and close() each file after writing. Just a heads-up to the OP in case the open file descriptor limit is hit.
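
A rough way to judge whether the limit matters is to compare the number of distinct keys with the shell's descriptor limit. The sketch below is only an estimate under stated assumptions: the `demo/sample.txt` file is made up for illustration, and `ulimit -n` is assumed to be available (it is in bash, ksh and dash, but is not guaranteed everywhere). awk also needs a few descriptors for its input and standard streams, so the real headroom is slightly smaller.

```shell
# Build a small sample in the thread's layout: the key sits in columns 21-24.
mkdir -p demo
{
    printf '%-20s%-28s%s\n' 000002 5004     050083344101
    printf '%-20s%-28s%s\n' 000003 60646064 910107702201
    printf '%-20s%-28s%s\n' 000004 50045004 911106235001
    printf '%-20s%-28s%s\n' 000005 66076607 911103471201
} > demo/sample.txt

# Distinct keys = number of output files awk would keep open at once
# when it writes with plain > and never calls close().
keys=$(cut -c21-24 demo/sample.txt | sort -u | wc -l | tr -d ' ')

# Soft limit on open file descriptors; may print "unlimited".
limit=$(ulimit -n)

echo "distinct keys: $keys, descriptor limit: $limit"
if [ "$limit" != "unlimited" ] && [ "$keys" -ge "$limit" ]; then
    echo "too many keys: close() each output file after writing"
else
    echo "plain > redirection should fit within the limit"
fi
```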

Regards,
Alister
# 4  
Old 03-07-2011
Thanks danmero for the command. It works.

Thanks alister for correcting me in the for loop.
# 5  
Old 03-07-2011
Quote:
Originally Posted by alister
Depending on how many distinct values that 4-character string takes and on the account's resource limits, it may be necessary to append with >> and close() each file after writing. Just a heads-up to the OP in case the open file descriptor limit is hit.
Right, let's try this one:
Code:
awk '{file=substr($0,21,4);print >> file;close(file)}' file
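
Two things are worth keeping in mind with the >>/close form: it appends to any output files left over from an earlier run, and it reopens a file for every input line. A possible variant, sketched below, clears old output first and closes a file only when the key changes, so at most one output file is open at a time. The `demo` paths and the `dir` and `prev` variables are hypothetical names chosen for this illustration.

```shell
# Build a small sample in the thread's layout: the key sits in columns 21-24.
mkdir -p demo/split
{
    printf '%-20s%-28s%s\n' 000002 5004     050083344101
    printf '%-20s%-28s%s\n' 000003 60646064 910107702201
    printf '%-20s%-28s%s\n' 000004 50045004 911106235001
    printf '%-20s%-28s%s\n' 000005 66076607 911103471201
} > demo/sample.txt

# Start clean, because >> would otherwise append to earlier output.
rm -f demo/split/*

awk -v dir=demo/split '
{
    out = dir "/" substr($0, 21, 4)   # output file named after the key
    if (out != prev) {                # key changed since the previous line
        if (prev != "") close(prev)   # release the previous file
        prev = out
    }
    print >> out                      # append, so a reopened file keeps its lines
}' demo/sample.txt
```

If the input happens to be grouped by key, each output file is opened only once; if not, the result is still the same, just with more open and close calls.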
