As a first step, I'd extract everything within the <body>...</body> tags (the "content" in a narrower sense) and put that into one document.
Here is a script that should do that. Note that it might fail: to fully "understand" HTML as a language it would have to use a recursive parser, which is too much effort for a casual solution here. It should give you a starting point, though.
Note that you might want to refine the parameter handling in particular; right now it does the bare minimum. The same goes for error handling (mistyped file names, ...).
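A minimal sketch of such a script, not the original poster's code: the function name `join_bodies` and the bare <html>/<body> skeleton it prints are assumptions made for illustration. It treats each file as plain text, assumes one opening and one closing body tag per file, drops blank lines inside the body, and skips unreadable files with a warning.

```shell
#!/bin/sh
# join_bodies (hypothetical name): print the <body> content of every HTML
# file given as an argument, wrapped in one minimal HTML skeleton.
# Assumption: each file has exactly one <body ...> and one </body> tag.
# This is plain text matching, not a real HTML parser.
join_bodies() {
    printf '%s\n' '<html>' '<body>'
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            # Bare-minimum error handling: warn and move on.
            printf 'join_bodies: cannot read %s\n' "$f" >&2
            continue
        fi
        awk '
            # Drop everything up to and including the opening <body ...> tag.
            !inbody && match(tolower($0), /<body[^>]*>/) {
                $0 = substr($0, RSTART + RLENGTH)
                inbody = 1
            }
            # Drop the closing tag and whatever follows it, then stop reading.
            inbody && (p = index(tolower($0), "</body>")) {
                $0 = substr($0, 1, p - 1)
                if ($0 != "") print
                exit
            }
            # Print non-empty lines that lie inside the body.
            inbody && $0 != "" { print }
        ' "$f"
    done
    printf '%s\n' '</body>' '</html>'
}
```

Called as `join_bodies page1.html page2.html > combined.html` (file names hypothetical), it writes the joined content to one document. Tag matching is case-insensitive, but a body tag inside a comment or script would confuse it, which is the limitation mentioned above.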
textutil
TEXTUTIL(1) BSD General Commands Manual TEXTUTIL(1)
NAME
textutil -- text utility
SYNOPSIS
textutil [command_option] [other_options] file ...
DESCRIPTION
textutil can be used to manipulate text files of various formats, using the mechanisms provided by the Cocoa text system.
The first argument indicates the operation to perform, one of:
-help Show the usage information for the command and exit. This is the default command option if none is specified.
-info Display information about the specified files.
-convert fmt Convert the specified files to the indicated format and write each one back to the file system.
-cat fmt Read the specified files, concatenate them, and write the result out as a single file in the indicated format.
fmt is one of: txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive
There are some additional options for general use:
-extension ext Specify an extension to be used for output files (by default, the extension will be determined from the format).
-output path Specify the file name to be used for the first output file.
-stdin Specify that input should be read from stdin rather than from files.
-stdout Specify that the first output file should go to stdout.
-encoding IANA_name | NSStringEncoding
Specify the encoding to be used for plain text or HTML output files (by default, the output encoding will be UTF-8).
NSStringEncoding refers to one of the numeric values recognized by NSString. IANA_name refers to an IANA character set name
as understood by CFString. The operation will fail if the file cannot be converted to the specified encoding.
-inputencoding IANA_name | NSStringEncoding
Force all plain text input files to be interpreted using the specified encoding (by default, a file's encoding will be determined
from its BOM). The operation will fail if the file cannot be interpreted using the specified encoding.
-format fmt Force all input files to be interpreted using the indicated format (by default, a file's format will be determined from its
contents).
-font font Specify the name of the font to be used for converting plain to rich text.
-fontsize size Specify the size in points of the font to be used for converting plain to rich text.
-- Specify that all further arguments are file names.
There are some additional options for HTML and WebArchive files:
-noload Do not load subsidiary resources.
-nostore Do not write out subsidiary resources.
-baseurl url Specify a base URL to be used for relative URLs.
-timeout t Specify the time in seconds to wait for resources to load.
-textsizemultiplier x
Specify a numeric factor by which to multiply font sizes.
-excludedelements (tag1, tag2, ...)
Specify which HTML elements should not be used in generated HTML (the list should be a single argument, and so will usually
need to be quoted in a shell context).
-prefixspaces n Specify the number of spaces by which to indent nested elements in generated HTML (default is 2).
There are some additional options for treating metadata:
-strip Do not copy metadata from input files to output files.
-title val Specify the title metadata attribute for output files.
-author val Specify the author metadata attribute for output files.
-subject val Specify the subject metadata attribute for output files.
-keywords (val1, val2, ...)
Specify the keywords metadata attribute for output files (the list should be a single argument, and so will usually need to be
quoted in a shell context).
-comment val Specify the comment metadata attribute for output files.
-editor val Specify the editor metadata attribute for output files.
-company val Specify the company metadata attribute for output files.
-creationtime yyyy-mm-ddThh:mm:ssZ
Specify the creation time metadata attribute for output files.
-modificationtime yyyy-mm-ddThh:mm:ssZ
Specify the modification time metadata attribute for output files.
EXAMPLES
textutil -info foo.rtf
displays information about foo.rtf.
textutil -convert html foo.rtf
converts foo.rtf into foo.html.
textutil -convert rtf -font Times -fontsize 10 foo.txt
converts foo.txt into foo.rtf, using Times 10 for the font.
textutil -cat html -title "Several Files" -output index.html *.rtf
loads all RTF files in the current directory, concatenates their contents, and writes the result out as index.html with the HTML title set to
"Several Files".
DIAGNOSTICS
The textutil command exits 0 on success, and 1 on failure.
CAUTIONS
Some options may require a connection to the window server.
HISTORY
The textutil command first appeared in Mac OS X 10.4.
Mac OS X September 9, 2004 Mac OS X