How to target a certain delimiter to split a text file?
Hi, all.
I have an input file. I would like to generate 3 types of output files.
Input:
Output_file_1 (replace the last occurrence of delimiter with tab):
Output_file_2 (replace the first occurrence of delimiter with tab):
Output_file_3 (replace the second occurrence of delimiter with tab):
I have tried a few commands, but each output file needs its own separate command.
To generate the first output file:
To generate the second output file:
To generate the third output file:
Are there any improved one-liner commands to generate the above output files?
Thanks.
Quote:
Originally Posted by Don Cragun
Is this a homework assignment?
This is not an assignment. I am learning Linux by myself. I thought that I might face a similar situation in the future. I have come up with a few solutions, but they are rather complicated.
This is OK. We want to help people help themselves. This is why we ask what they have done, even if it didn't work, so we can show them where they have gone wrong.
Further, we have a special forum for "Homework and Coursework" because we help students as well. The difference is that special rules apply there and we (try to) help in a different way, so that the student gets the most education out of our help. This was the background of Don Cragun's and my questions.
Quote:
Originally Posted by huiyee1
To generate the first output file:
Notice that you usually do not need "cat" to generate a stream. If you look at the man page of "rev" you will notice (this is taken from an AIX man page, yours might look slightly different):
This means the following two lines do the same, but the second one uses one command ("cat") less, which is why it is preferable:
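A minimal sketch of such a pair, assuming a hypothetical one-line input file created in a temporary directory (the file name and its content are made up for illustration and are not the original example):

```shell
# Hypothetical sample file; name and content are assumptions for this sketch.
workdir=$(mktemp -d)
printf 'abc\n' > "$workdir/input"

# Variant 1: "cat" feeds the file to "rev" through a pipe (one extra process).
with_cat=$(cat "$workdir/input" | rev)

# Variant 2: "rev" opens the file itself, no "cat" needed.
without_cat=$(rev "$workdir/input")

printf '%s\n%s\n' "$with_cat" "$without_cat"
```

Both variants print the reversed line; the second simply starts one process less.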
When you look up "useless use of cat" on the internet you will find many more examples for the same error, because it is a very common one, which made it part of the "UNIX culture".
Quote:
Originally Posted by huiyee1
Are there any improved one-liner commands to generate the above output files?
As a matter of fact there are: you might want to learn a bit of sed (see "man sed" for help) and look around here in the forum. Here is a link to an introductory article:
sed ("stream editor") is a non-interactive text editor or, looking at it differently, a programmable text manipulation program. Its most basic use is to look for some pattern in a text and then manipulate it (delete or add parts, etc.).
Here is a simple sed program:
It takes a file "/path/to/input", executes the program "s/abc/def/" on it and writes the result to file "/path/to/output". The program itself does a "substitution" ("s") of a fixed string "abc" by a fixed string "def". This replacement is done in every line once - for the first occurrence of "abc". It is possible to replace every occurrence instead by adding a "g" (global) to the end of the command:
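Both forms can be tried on a throwaway file. The following is a minimal sketch, assuming a hypothetical two-line input in a temporary directory instead of the "/path/to/input" and "/path/to/output" placeholders:

```shell
# Hypothetical sample input; the content is an assumption for this sketch.
sed_dir=$(mktemp -d)
printf 'abc abc\nxyz\n' > "$sed_dir/input"

# Replace only the first "abc" on each line.
sed 's/abc/def/' "$sed_dir/input" > "$sed_dir/output_first"

# Replace every "abc" on each line (note the trailing "g").
sed 's/abc/def/g' "$sed_dir/input" > "$sed_dir/output_all"

cat "$sed_dir/output_first"   # first line becomes: def abc
cat "$sed_dir/output_all"     # first line becomes: def def
```

Lines that do not contain "abc" (here "xyz") pass through unchanged in both cases.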
It should be easy to see how you could do the text manipulation you have in mind with such a substitution, provided that you craft the search and substitution patterns correctly. Since your intention is to learn UNIX, I won't tell you the solution outright. You might want to try it yourself. If you have further questions, feel free to ask.
On top of what bakunin said, you could use shell's parameter expansion (e.g. "remove matching pattern") to achieve the goals.
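As a rough sketch of the parameter expansion idea: the thread's input sample is not shown, so the line below and its ";" delimiter are assumptions made purely for illustration.

```shell
# Hypothetical input line; the ";" delimiter is an assumption.
line='a;b;c;d'
tab=$(printf '\t')

# Last delimiter: "%;*" strips the shortest suffix starting at a ";",
# "##*;" strips the longest prefix ending in a ";".
last="${line%;*}${tab}${line##*;}"

# First delimiter: "%%;*" strips the longest suffix, "#*;" the shortest prefix.
first="${line%%;*}${tab}${line#*;}"

# Second delimiter: cut off the first field, then split the rest at its first ";".
rest=${line#*;}
second="${line%%;*};${rest%%;*}${tab}${rest#*;}"

printf '%s\n' "$last" "$first" "$second"
```

These expansions are POSIX shell features, so no external command is needed per line; for a whole file they would sit inside a "while read" loop.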
And, yes, there is a sed one-liner to produce all three output files (at least with GNU sed).
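One way such a single sed call could look is sketched below. It is not necessarily the solution the poster had in mind, and it assumes a hypothetical ";" delimiter and made-up file names, since the original input is not shown. It relies on the "w" flag of the "s" command, which writes the changed line to a file, and on "h"/"g" to restore the untouched line between substitutions.

```shell
# Hypothetical sample data and file names; the ";" delimiter is an assumption.
out_dir=$(mktemp -d)
printf 'a;b;c;d\ne;f;g;h\n' > "$out_dir/input_file"
tab=$(printf '\t')

# h saves the original line, g restores it before the next substitution.
# Each "w file" must end its -e expression, because the file name runs
# to the end of the line.
sed -n \
  -e 'h' \
  -e "s/\(.*\);/\1$tab/w $out_dir/output_file_1" \
  -e 'g' \
  -e "s/;/$tab/w $out_dir/output_file_2" \
  -e 'g' \
  -e "s/;/$tab/2w $out_dir/output_file_3" \
  "$out_dir/input_file"

cat "$out_dir/output_file_1" "$out_dir/output_file_2" "$out_dir/output_file_3"
```

One caveat of this sketch: "w" only writes lines on which the substitution succeeded, so a line with fewer than two delimiters would be missing from the third file.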
I have a large semicolon-delimited file with thousands of columns and many thousands of lines. It looks like:
ID1;ID2;ID3;ID4;A_1;B_1;C_1;A_2;B_2;C_2;A_3;B_3;C_3
AA;ax;ay;az;01;02;03;04;05;06;07;08;09
BB;bx;by;bz;03;05;33;44;15;26;27;08;09
I want to split this table into multiple files:
... (1 Reply)
Hi,
I have received a file which is 20 GB. We would like to split the file into 4 equal parts and process it to avoid memory issues.
If the record delimiter is unix new line, I could use split command either with option l or b.
The problem is that the line terminator is |##|
How to use... (5 Replies)
I have a text file with entries like
1186
5556
90844
7873
7722
12
7890.6
78.52
6679
3455
9867
1127
5642
..N so many records like this.
I want to split this file into multiple files like cluster1.txt, cluster2.txt, cluster3.txt, ..... clusterN.txt. (4 Replies)
Hi,
I have a file which has many URLs delimited by spaces. Now I want to move them to separate files, each one holding 10 URLs.
http://3276.e-printphoto.co.uk/guardian http://abdera.apache.org/ http://abdera.apache.org/docs/api/index.html
I have used the below code to arrange... (6 Replies)
Hi,
I have a No Delimiter variable length text file with following schema -
Column Name Data length
Firstname 5
Lastname 5
age 3
phoneno1 10
phoneno2 10
phoneno3 10
sample data - ... (16 Replies)
Hello,
I want to split a big file into smaller ones with certain "counts". I am aware this type of job has been asked quite often, but I posted again when I came to csplit, which may be simpler to solve the problem.
Input file (fasta format):
>seq1
agtcagtc
agtcagtc
ag
>seq2
agtcagtcagtc... (8 Replies)
I’m new to Linux scripting and not sure how to filter out bad records from huge flat files (over 1.3GB each). The delimiter is a semicolon “;”
Here is the sample of 5 lines in the file:
Name1;phone1;address1;city1;state1;zipcode1
Name2;phone2;address2;city2;state2;zipcode2;comment... (7 Replies)
I'm writing a KSH script to read a simple text file and add a delimiter. I've written the following script but it runs very slowly. I initially used the cut command to substring the input record, then switched to this version using awk to substring... both run too slowly. Any ideas how to make this more... (2 Replies)
Hi All,
I am new to unix scripting, please help me in solving this assignment..
I have a scenario, as follows:
1. i have a text file(read1.txt) with the following data
sairam,123
kamal,122
etc..
2. I have to write a unix... (6 Replies)