I have about a million tables stored as .txt files in a directory. The contents of the files look like this:
Other examples:
My aim is to convert the text in each file to tabular form. The output that I expect is:
Note that the above is an example with just two columns; tables can have more than two columns, like the one below:
I have written a bash script, which does the following in a for loop for each file:
1. Count the number of times CONTINUE occurs in the file:
2. Save the text above each CONTINUE in separate temp files using this command, which does not seem to work:
3. Delete DONE from the temp files using this command:
4. Paste the contents into another file, in the same order as the columns occur in the main file, which I am also finding difficult. I think I need another for loop here:
5. Delete all blank lines from output_file
I suspect the above process will be too slow on millions of files. I also wrote a Python program, but it does not give the desired output either. I am using Linux and Bash.
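The five steps above could probably be collapsed into a single pass per file with no temp files. The sample files are not shown here, so the following is only a sketch under assumed conditions: each cell sits on its own line, DONE lines are markers to drop, blank lines are ignored, and each CONTINUE line closes one column. The name `columns_to_table` and the tab delimiter are hypothetical choices, not something from the original scripts.

```python
from itertools import zip_longest


def columns_to_table(text, col_sep="CONTINUE", cell_end="DONE", delimiter="\t"):
    """Paste marker-separated columns side by side (assumed file layout)."""
    columns = [[]]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == cell_end:
            continue  # steps 3 and 5: drop DONE markers and blank lines
        if line == col_sep:
            columns.append([])  # step 2: a CONTINUE line starts the next column
            continue
        columns[-1].append(line)
    # A leading or trailing CONTINUE leaves an empty column; ignore those.
    columns = [col for col in columns if col]
    # Step 4: paste columns in their original order, padding short ones.
    rows = zip_longest(*columns, fillvalue="")
    return "\n".join(delimiter.join(row) for row in rows)


sample = "Name\nDONE\nAlice\nDONE\nCONTINUE\nCity\nDONE\nParis\nDONE\nCONTINUE\n"
print(columns_to_table(sample))
```

Columns of unequal length are padded with empty cells rather than truncated, which seems safer for noisy scraped data, but that is a design guess.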
Unfortunately, the structure of your files is not clear to me. How would the first CONTINUE in file2 influence the desired output? And what should the output for file3 be?
I agree with RudiC that the leading CONTINUE line makes no sense in your 1st sample input file. The following code silently ignores that line. Maybe something like:
will give you something close to what you want???
If your sample input files are named 1.txt, 2.txt, and 3.txt, respectively, running the above code produces the output:
and changes the contents of 1.txt to:
2.txt to:
and 3.txt to:
If someone else would like to try this code on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
This
would work on several files, and it would distribute the multiple "columns" differently... Remove the quotes around the redirection operator if you are happy with the result and want to produce new result files.
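For running a conversion over a whole directory and writing new result files instead of overwriting the inputs, a minimal sketch could look like this. The function name `convert_directory`, the separate output directory, and the `transform` callable are all assumptions for illustration. Decoding with `errors="replace"` is a guess at how to survive the noisy, possibly non-ASCII input.

```python
import os
import tempfile


def convert_directory(src_dir, dst_dir, transform, suffix=".txt"):
    """Write transform(text) for every matching file in src_dir into dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    count = 0
    # os.scandir yields entries lazily, which helps with very large directories.
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not (entry.is_file() and entry.name.endswith(suffix)):
                continue
            with open(entry.path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            with open(os.path.join(dst_dir, entry.name), "w", encoding="utf-8") as out:
                out.write(transform(text))
            count += 1
    return count


# Demo on a throwaway directory; str.upper stands in for the real conversion.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "in"), os.path.join(tmp, "out")
    os.makedirs(src)
    with open(os.path.join(src, "1.txt"), "w", encoding="utf-8") as fh:
        fh.write("a\nb\n")
    print(convert_directory(src, dst, str.upper))
```

Keeping the inputs untouched makes it easy to compare a few results by hand before committing to a run over every file.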
The table data was obtained from web pages, which are very noisy. In the first table there are two columns, and the first few strings could not be obtained in either column. This might be because they contain non-ASCII or non-English characters. The second table, i.e. the second example above, consists of three columns separated by CONTINUE. Its first column might contain non-ASCII characters or something else, but the HTML structure told the web page parsing program that there are three columns. I understand that this is a bit tricky. The trick is to treat each CONTINUE as the separator after one column in a file. The output that I expect from file3 is: