How to target certain delimiter to split text file?


 
# 1  
Old 06-24-2015

Hi, all.

I have an input file. I would like to generate 3 types of output files.

Input:
Code:
LG10_PM_map_19_LEnd_1000560
LG10_PM_map_6-1_27101856
LG10_PM_map_71_REnd_20597718
LG12_PM_map_5_chr_118419232
LG13_PM_map_121_24341052
LG14_PM_1a_456799
LG1_MM_scf_5a_opt_abc_9029993

Output_file_1 (replace the last occurrence of the delimiter "_" with a tab):
Code:
LG10_PM_map_19_LEnd	1000560
LG10_PM_map_6-1	27101856
LG10_PM_map_71_REnd	20597718
LG12_PM_map_5_chr	118419232
LG13_PM_map_121	24341052
LG14_PM_1a	456799
LG1_MM_scf_5a_opt_abc	9029993

Output_file_2 (replace the first occurrence of the delimiter "_" with a tab):
Code:
LG10	PM_map_19_LEnd_1000560
LG10	PM_map_6-1_27101856
LG10	PM_map_71_REnd_20597718
LG12	PM_map_5_chr_118419232
LG13	PM_map_121_24341052
LG14	PM_1a_456799
LG1	MM_scf_5a_opt_abc_9029993

Output_file_3 (replace the second occurrence of the delimiter "_" with a tab):
Code:
LG10_PM	map_19_LEnd_1000560
LG10_PM	map_6-1_27101856
LG10_PM	map_71_REnd_20597718
LG12_PM	map_5_chr_118419232
LG13_PM	map_121_24341052
LG14_PM	1a_456799
LG1_MM	scf_5a_opt_abc_9029993


Thanks in advance.
# 2  
Old 06-24-2015
Is this a homework assignment?
# 3  
Old 06-24-2015
And, if I may be so bold as to add, what have you tried so far?

bakunin
# 4  
Old 06-24-2015
I have tried a few commands, but each output file takes several separate steps.

To generate the first output file:
Code:
cat input | rev | cut -d"_" -f1 | rev > last_field  #this generates file containing the last field
cat input | rev | cut -d"_" -f2- | rev > without_last_field #this generates file containing all fields except the last one
paste -d"\t" without_last_field last_field > output_1

To generate the second output file:
Code:
cat input | cut -d"_" -f1 > first_field  #this generates file containing the first field
cat input | cut -d"_" -f2- > without_first_field #this generates file containing all fields except the first one
paste -d"\t" first_field without_first_field > output_2

To generate the third output file:
Code:
cat input | cut -d"_" -f1,2 > first_second_field  #this generates file containing the first and second field
cat input | cut -d"_" -f3- > without_first_second_field #this generates file containing all fields except the first and second field
paste -d"\t" first_second_field without_first_second_field > output_3

Are there any improved one-liner commands to generate the above output files?

Thanks.

---------- Post updated at 04:48 AM ---------- Previous update was at 04:46 AM ----------

Quote:
Originally Posted by Don Cragun
Is this a homework assignment?
This is not an assignment. I am learning Linux by myself. I thought that I might face a similar situation in the future. I have come up with a few solutions, but they are rather complicated.

Last edited by huiyee1; 06-24-2015 at 06:59 AM..
# 5  
Old 06-24-2015
Quote:
Originally Posted by huiyee1
This is not an assignment. I am learning Linux by myself. I thought that I might face a similar situation in the future. I have come up with a few solutions, but they are rather complicated.
This is OK. We want to help people help themselves. This is why we ask what they have done, even if it didn't work, so we can show them where they have gone wrong.

Further, we have a special forum for "Homework and Coursework" because we help students as well. The difference is that special rules apply there and we (try to) help in a different way, so that the student gets the most education out of our help. This was the background of Don Cragun's and my questions.

Quote:
Originally Posted by huiyee1
To generate the first output file:
Code:
cat input | rev | cut -d"_" -f1 | rev > last_field  #this generates file containing the last field

Notice that you usually do not need "cat" to generate a stream. If you look at the man page of "rev" you will notice (this is taken from an AIX man page, yours might look slightly different):

Code:
rev Command

Purpose

       Reverses characters in each line of a file.

Syntax

       rev [ File ... ]

This means the following two lines do the same, but the second one uses one command ("cat") less, which is why it is preferable:

Code:
cat /path/to/file | rev
rev /path/to/file

When you look up "useless use of cat" on the internet you will find many more examples for the same error, because it is a very common one, which made it part of the "UNIX culture".
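
Applied to the first pipeline from the question, this could look like the sketch below. It assumes the data is in a file called "input" in the current directory, as in the original commands; the two sample lines and the file name "last_field_with_cat" are only there so the snippet can be run and compared as-is.

```shell
# Sample data, only so the snippet is self-contained (file name "input" as in the question).
printf '%s\n' 'LG10_PM_map_19_LEnd_1000560' 'LG14_PM_1a_456799' > input

# With cat: one extra process just to feed the pipeline.
cat input | rev | cut -d"_" -f1 | rev > last_field_with_cat

# Without cat: rev opens the file itself.
rev input | cut -d"_" -f1 | rev > last_field
```

Both files end up with the same content, the last field of every line.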


Quote:
Originally Posted by huiyee1
Is there any improved one-liner commands to generate the above output files?
As a matter of fact there are: you might want to learn a bit of sed (see "man sed" for help) and look around here in the forum. Here is a link to an introductory article:

Regular expression introduction

sed ("stream editor") is a non-interactive text editor or, looking at it differently, a programmable text manipulation program. The most basic procedure for this is to look out for some pattern in a text and then manipulate it (delete or add parts, etc.).

Here is a simple sed program:

Code:
sed 's/abc/def/' /path/to/input > /path/to/output

It takes a file "/path/to/input", executes the program "s/abc/def/" on it and writes the result to file "/path/to/output". The program itself does a "substitution" ("s") of a fixed string "abc" by a fixed string "def". This replacement is done in every line once - for the first occurrence of "abc". It is possible to replace every occurrence instead by adding a "g" (global) to the end of the command:

Code:
sed 's/abc/def/g' /path/to/input > /path/to/output
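
To see the difference between the two forms without touching any file, here is a small sketch that feeds a made-up test line to sed on standard input. The variable names "first" and "all" are only illustrative.

```shell
# One line containing "abc" three times.
first=$(printf 'abc abc abc\n' | sed 's/abc/def/')    # without g: only the first "abc" changes
all=$(printf 'abc abc abc\n' | sed 's/abc/def/g')     # with g: every "abc" changes

printf '%s\n' "$first" "$all"
```

The first line printed is "def abc abc", the second is "def def def".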

It should be easy to see how you could do the text manipulation you have in mind with such a substitution, provided you craft the search and substitution patterns correctly. Since your intention is to learn UNIX I won't tell you outright what the solution is. You might want to try it yourself. If you have further questions feel free to ask.

I hope this helps.

bakunin
# 6  
Old 06-24-2015
On top of what bakunin said, you could use shell's parameter expansion (e.g. "remove matching pattern") to achieve the goals.
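
A minimal sketch of the parameter expansion idea, using one sample line from the question. The variable names ("line", "rest", "out1" to "out3") are made up for illustration; "%" and "%%" remove the shortest and longest matching suffix, "#" and "##" the shortest and longest matching prefix.

```shell
# One sample line from the question; in practice it would be read from the file.
line='LG1_MM_scf_5a_opt_abc_9029993'

# Last delimiter: ${line%_*} drops the shortest suffix matching "_*",
# ${line##*_} drops the longest prefix matching "*_".
out1=$(printf '%s\t%s' "${line%_*}" "${line##*_}")

# First delimiter: ${line%%_*} keeps what precedes the first "_",
# ${line#*_} keeps what follows it.
out2=$(printf '%s\t%s' "${line%%_*}" "${line#*_}")

# Second delimiter: strip the first field, then split the rest at its first "_".
rest=${line#*_}
out3=$(printf '%s_%s\t%s' "${line%%_*}" "${rest%%_*}" "${rest#*_}")

printf '%s\n' "$out1" "$out2" "$out3"
```

For a whole file the same expansions could go inside a "while IFS= read -r line" loop reading from the input file, with each result redirected to its own output file.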
And, yes, there is a sed one-liner to produce all three output files (at least with GNU sed).
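
One possible shape of such a one-liner, not necessarily the one meant above. It assumes GNU sed (the "\t" in the replacement is a GNU extension) and reuses the file names "input", "output_1", "output_2" and "output_3" from the question; the three sample lines are written to "input" only so the sketch runs as-is.

```shell
# Sample data from the question, written to "input" so the snippet is self-contained.
printf '%s\n' \
  'LG10_PM_map_19_LEnd_1000560' \
  'LG14_PM_1a_456799' \
  'LG1_MM_scf_5a_opt_abc_9029993' > input

# h saves the original line in the hold space, g restores it before the next substitution.
# Each s command writes its result to a file through the w flag;
# the numeric flag 2 selects the second occurrence of "_".
sed -n \
  -e 'h;s/_\([^_]*\)$/\t\1/w output_1' \
  -e 'g;s/_/\t/w output_2' \
  -e 'g;s/_/\t/2w output_3' input
```

Each expression sits in its own -e because the file name after w runs to the end of the expression. Note that w only writes lines on which the substitution actually happened, so a line without the required number of "_" would be missing from the corresponding output file.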