seds to extract fields based on positions


 
# 1  
Old 03-02-2011

Hi
My file has a series of rows up to 160 characters in length.

There are 7 columns for each row.

In each row, column 1 starts at position 4
column 2 starts at position 12
column 3 starts at position 43
column 4 starts at position 82
column 5 starts at position 86
column 6 starts at position 90 and
column 7 starts at position 90, and is variable in length

Are there sed commands which will allow me to strip out columns 2, 3 & 7, leaving a new file with just columns 1, 4, 5, and 6 ?

Many thanks

---------- Post updated at 12:09 PM ---------- Previous update was at 12:06 PM ----------

I just noticed an error in my original post: column 7 starts at position 98.
# 2  
Old 03-02-2011
It could be done. Post your sample data and the expected output format to get an idea.
# 3  
Old 03-02-2011
seds to extract fields based on character positions

Thanks Michael

from input :
Code:
   8022098 abcdefgh                       abcdefgh DRYING AREAS                  COM CMP 04/02/11 Job has actual completion date
   8028237 050 abcdefg LANE               ETN- RADIATOR HANGING OFF WALL, LEAKIN        CMP CMP 04/02/11 Invalid status change
   8046881 027 abcdefg ROAD               BUILDING SERVICES - REPAIR REQUIRED FO         CNN CNO 04/02/11 Invalid status change
   8117944 027 abcdef STREET              JNR-HEAVY WATER INGRESS AT BACK BEDROO        CNO ALO 04/02/11 Invalid status change

output should be :
Code:
8022098,COM,CMP,04/02/11
8028237,CMP,CMP,04/02/11
8046881,CNN,CNO,04/02/11
8117944,CNO,ALO,04/02/11

Many thanks again

---------- Post updated at 12:36 PM ---------- Previous update was at 12:31 PM ----------

I have noticed that the input data has got squashed up in the display.
I can confirm, however, that the character positions in my earlier post are accurate !

Last edited by Franklin52; 03-02-2011 at 07:53 AM.. Reason: Please use code tags
# 4  
Old 03-02-2011
By the time you'd noted that the spaces were truncated, I'd already written a Perl-based solution, as the issue seemed complex enough to warrant one when the fields didn't look fixed-length. It keys on the date field rather than on character positions, so it should work regardless of the spacing used.
Code:
~/$ cat test.dat
8022098 abcdefgh abcdefgh DRYING AREAS COM CMP 04/02/11 Job has actual completion date
8028237 050 abcdefg LANE ETN- RADIATOR HANGING OFF WALL, LEAKIN CMP CMP 04/02/11 Invalid status change
8046881 027 abcdefg ROAD BUILDING SERVICES - REPAIR REQUIRED FO CNN CNO 04/02/11 Invalid status change
8117944 027 abcdef STREET JNR-HEAVY WATER INGRESS AT BACK BEDROO CNO ALO 04/02/11 Invalid status change
~/$ perl -e 'while(<>){@record=split(" ",$_); for ($i=0;$i <= $#record;$i++){$date_index=$i if $record[$i] =~ m{^\d{2}/\d{2}/\d{2}$};}print "$record[0],$record[$date_index - 2],$record[$date_index - 1],$record[$date_index]\n";}' test.dat
8022098,COM,CMP,04/02/11
8028237,CMP,CMP,04/02/11
8046881,CNN,CNO,04/02/11
8117944,CNO,ALO,04/02/11
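
For anyone without Perl to hand, the same date-anchoring idea can probably be sketched in awk. This is only a sketch under assumptions: the function name extract_by_date is made up for illustration, the date is taken to be the last field of the form dd/dd/dd on the line, and columns 4 and 5 are assumed to be non-blank single words. awk's default field splitting ignores leading blanks, so the three leading spaces in the real file should not matter.

```shell
# extract_by_date: hypothetical helper name, not from the thread.
# For each line, find the last field that looks like dd/dd/dd and print
# the first field, the two fields before the date, and the date itself.
extract_by_date() {
    awk '{
        d = 0
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/) d = i
        if (d >= 3)    # skip lines with no usable date field
            print $1 "," $(d-2) "," $(d-1) "," $d
    }' "$@"
}

# Two rows from the thread: one with leading spaces, one without.
awk_result=$(printf '%s\n' \
    '   8022098 abcdefgh abcdefgh DRYING AREAS COM CMP 04/02/11 Job has actual completion date' \
    '8117944 027 abcdef STREET JNR-HEAVY WATER INGRESS AT BACK BEDROO CNO ALO 04/02/11 Invalid status change' \
    | extract_by_date)
echo "$awk_result"
```

If the free text in column 7 ever contains something that looks like a date, the last match wins and the output would be wrong, so the position-based approach is safer when the positions really are fixed.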

# 5  
Old 03-02-2011
Or the sed/tr solution:
Code:
echo "   8022098 abcdefgh                       abcdefgh DRYING AREAS                  COM CMP 04/02/11 Job has actual completion date" | sed 's/^.\{3\}\(.\{8\}\).\{70\}\(.\{16\}\).*/\1\2/' | tr ' ' ','

This assumes single spaces between the elements you keep and no spaces inside them, since tr turns every space into a comma.
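
A possible variant, in case the kept columns ever contain something other than single separating spaces: capture each kept column separately and write the commas in the sed replacement instead of piping through tr. The widths below are derived from the start positions in post #1 (3 leading characters, 8 for column 1, 70 for columns 2 and 3, 4 each for columns 4 and 5, 8 for column 6). The names make_record and extract_fixed, and the printf-built sample record, are only there for illustration and are not from the thread.

```shell
# make_record: hypothetical helper that pads each column to its fixed width,
# so the positions from post #1 hold (4, 12, 43, 82, 86, 90, 98).
make_record() {
    printf '   %-8s%-31s%-39s%-4s%-4s%-8s%s\n' "$1" "$2" "$3" "$4" "$5" "$6" " $7"
}

# extract_fixed: hypothetical helper name.
#   .{3}    skip positions 1-3
#   (.{8})  column 1, positions 4-11
#   .{70}   skip columns 2 and 3, positions 12-81
#   (.{4})  column 4, positions 82-85
#   (.{4})  column 5, positions 86-89
#   (.{8})  column 6, positions 90-97
#   .*      drop column 7
# The second expression removes the padding spaces left before each comma.
extract_fixed() {
    sed -e 's/^.\{3\}\(.\{8\}\).\{70\}\(.\{4\}\)\(.\{4\}\)\(.\{8\}\).*/\1,\2,\3,\4/' \
        -e 's/ *,/,/g'
}

sed_result=$(make_record 8022098 'abcdefgh' 'abcdefgh DRYING AREAS' COM CMP 04/02/11 \
    'Job has actual completion date' | extract_fixed)
echo "$sed_result"
```

Lines shorter than 97 characters will not match the first expression and pass through unchanged, so it may be worth checking the output for such rows.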
# 6  
Old 03-02-2011

Hi everyone
I went with Dahus' solution and it works great.
Many thanks for all the contributions.
It is nice to know that there are so many warm-hearted and genuine colleagues who are willing to help novices.
# 7  
Old 03-02-2011
Quote:
Originally Posted by malts18
It is nice to know that there are so many warm-hearted and genuine colleagues who are willing to help novices.
I hope you understood why it works and how you can enhance the sed expression.
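
One possible enhancement, offered as a sketch rather than as what the poster had in mind: name the start position of each kept column directly with awk's substr, instead of counting how many characters to skip between captures. The start positions (4, 82, 86, 90) come from post #1; the widths (7, 3, 3, 8) are inferred from the sample rows and are an assumption, as are the function name by_position and the printf-built sample record.

```shell
# by_position: hypothetical helper name, not from the thread.
# substr(string, start, length) uses 1-based positions, matching post #1.
by_position() {
    awk '{
        col1 = substr($0, 4, 7)     # job number
        col4 = substr($0, 82, 3)    # first status code
        col5 = substr($0, 86, 3)    # second status code
        col6 = substr($0, 90, 8)    # date
        print col1 "," col4 "," col5 "," col6
    }' "$@"
}

# Made-up padding widths reproduce the column positions from post #1.
sample=$(printf '   %-8s%-31s%-39s%-4s%-4s%-8s %s' \
    8117944 '027 abcdef STREET' 'JNR-HEAVY WATER INGRESS AT BACK BEDROO' \
    CNO ALO 04/02/11 'Invalid status change')
pos_result=$(printf '%s\n' "$sample" | by_position)
echo "$pos_result"
```

If a column moves, only the one start position needs changing, which is harder to get wrong than recalculating the skip counts in the sed expression.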