I'm working with a multi-column, tab-separated CSV file. All columns are small except one, which holds all the notes and is an HTML-encoded script. The maximum length for that column is 8192 (a database limitation).
Here is a sample input; you already know the other possibilities of <IMG >
If you look at the 8K byte sample input in your posting (not the abbreviated sample in the quote above), you will note that <IMG never appears, <img never appears, and src= never appears. Also note that your data, cut off by the 8K byte database field length limit, ends in the middle of a SPAN tag.
If you want us to "fix" fields that have been thrown away by truncation when you put your data into a database, how do you expect us to recreate them from the truncated data?
Showing us a sample of data that contains absolutely nothing that you say you want changed doesn't help. What we need is an accurate description of the input that is to be processed. In particular:
What type of system are you using? (I.e., What is the output from the command uname -a.)
What is the value of LINE_MAX on your system? (I.e., What is the output from the command: getconf LINE_MAX.)
What is the maximum length of your HTML code? We know it is more than 8K bytes, but how much more?
Is any of the following ever greater than (LINE_MAX - 2) bytes: (a) the total length of the text in your input file before the start of the first tag in your HTML (i.e., before the 1st < in a line in your TSV [Tab Separated Values] file), (b) the total length of the data from the start of one tag in your HTML to the start of the next tag (e.g., a paragraph of text between tags), or (c) the total length of the text after the start of the last tag in your HTML to the end of your TSV file?
If the HTML field were removed from your TSV, would the remaining content still have lines longer than LINE_MAX bytes?
Are there any tags other than <IMG.*> (with mixed case I, M, and G) that contain the string src=" in your HTML? If so, what are they?
Can the strings <IMG (mixed case) or src= appear in any other fields in your TSV (i.e., not in the field that contains the HTML)?
Which field (or fields) in your TSV contain the HTML code that you want processed?
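As a rough sketch of how the first couple of answers could be gathered (assuming a POSIX shell and awk; input.tsv and the longest_line function are placeholder names made up for illustration, not anything taken from your posting):

```shell
#!/bin/sh
# Hypothetical helper: print the length in bytes of the longest line
# in the file named by $1 (the trailing newline is not counted).
longest_line() {
    # LC_ALL=C makes awk count bytes rather than multibyte characters.
    LC_ALL=C awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$1"
}

# Placeholder file name; pass the real TSV as the first argument.
file=${1:-input.tsv}

# System type and the line-length limit that text utilities must support.
uname -a
echo "LINE_MAX=$(getconf LINE_MAX)"

# Only report on the file if it exists and is readable.
if [ -r "$file" ]; then
    echo "longest line in $file: $(longest_line "$file") bytes"
fi
```

Comparing the last number against LINE_MAX only shows whether the file as a whole has over-long lines. It does not break the lengths down by field or by tag, and some awk implementations have their own limits on line length.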
This is a complex problem made much harder by your refusal to accurately describe your input.
Note that Scrutinizer gave an excellent suggestion for a way of approaching this problem. (It is missing a double quote in the output after src= and it adds an unwanted < at the end of file, but those are trivial details given the other problems that need to be addressed.) I have something based on Scrutinizer's suggestion that may come close to what you need, but until I get answers to the above questions, there is no need to guess at whether or not it might work.
This User Gave Thanks to Don Cragun For This Post:
I was using the following code to replace the path names and it works when it is
echo "$PWD/$f" | sed -e 's/^.*chris\.domain\.com/chris.domain.com/'
In fact, it works great.
However, I tried to incorporate a variable
echo "$PWD/$f" | sed -e... (3 Replies)
Hi,
I need a Unix command that replaces all occurrences of a substring within a string with another substring.
My solution:
string="plalstalplal"
sub1="al"
sub2="mlkl"
echo sed 's/$s2/$s3/g' < s1 > p
I want to know how to read the variables s2 and s3.
Thanks a lot
bye (1 Reply)
Hi Guys,
I need some help writing a shell script to replace the following in a text file
/opt/was/apps/was61
with some other path, e.g.
/usr/blan/blah/blah.
I know that I can do it using sed or perl, but I am having difficulty writing the escape characters for it.
All Help... (3 Replies)
Hi there,
I'd like to replace STRING_ZERO in FILE_ZERO.txt with the value of the i-th VALUE variable by using something like this:
VALUE1=1000
VALUE2=2000
VALUE3=3000
for((i=1;i<=3;i++));
do
sed "s/STRING_ZERO/$VALUE'$i'/" FILE_ZERO.txt >> FILE_NEW.txt;
done
but it doesn't work...
Any help... (9 Replies)
I have the following set of directories:
/dir1/dir2/subdir1
file1
file2
/dir1/dir3/subdir1
file4
file5
/dir1/dir4/subdir1
file6
file7
All of these files have a common string in them, say "STRING1". How can I... (3 Replies)
Help!
I'm trying this command but keep getting illegal syntax etc.
awk '{ sub(/00012345/,"000123456"); print}' >newfile
I don't understand. It works on one unix machine but not another! (4 Replies)
Dear all,
I have a file like the one below. I want to replace all the '.' in the 3rd column with 'NA'. I don't know how to do that. Does anyone have an idea? Thanks a lot!
8 70003200 21.6206
9 70005700 17.5064
10 70002200 .
11 70005100 19.1001
17 70008000 16.1970
32 70012400 26.3465
33... (9 Replies)
Hi All,
I have the following requirement:
I need to read each line in file.txt, replace the string from position 9 to 24 {111111111111111,222222222222222,333333333333333} with the common string "444444444444444", and save the file.
File.txt:
03000003111111111111111 ... (3 Replies)
Hi All,
I am facing an issue... I need to replace some string in a text file while the same file is read by some other user at the same time. The other user is using it in the Read only mode. So I can't create a temporary file and write the content first and then write it back into the original... (2 Replies)
Hi All,
I have many folders in a directory, under which there are many subdirectories containing text files with the word "shyam" in them. I want to change "shyam" to "ram" in all the files in all the directories.
Does sed "s/shyam/ram/g" do it? Can anyone help me with the script?
... (3 Replies)