Help with sed and replacing white spaces with commas


# 1  
03-13-2010

Dear all,

I am in a bit of a quandary. I have 400 text files which I need to edit and output in a very specific way.

Here is a sample text file copied from gedit ... The columns will come out a bit messed up but when I cat <file>, it gives a table with six columns (0-28, tot lob vol, vcsf, scsf, gray, white):

Code:
     tot lob vol        Vcsf           Scsf         gray          white
  0                      0.000000       0.000000       0.000000       0.000000
  1                      0.000000       0.000000       0.000000       0.000000
  2                      0.000000       0.000000       0.000000       0.000000
  3   26355.514477       0.000000    7986.582205   13035.582840    5333.349432
  4   61027.455763       0.000000    7617.445767   26799.098600   26610.911396
  5   16966.834205       0.000000    2342.000090    9508.623764    5116.210351
  6   16675.247439       0.000000    2196.206707    9037.121760    5441.918973
  7   57706.261817       0.000000   16951.324271   24216.177530   16538.760016
  8   91215.992021    2102.113105    7351.708891   41168.535796   40593.634229
  9   56586.444555       0.000000    6690.985687   29733.578183   20161.880685
 10   24258.571350       0.000000    4461.691120   13908.275147    5888.605082
 11   99730.945990    2593.261027   10288.256465   52107.175509   34742.252989
 12    8293.678905     967.819905      55.835764    4209.396188    3060.627049
 13   13760.413773    2139.336947      21.713908    6338.393179    5260.969738
 14   25516.944025       0.000000    7529.556139   10519.871485    7467.516401
 15   30830.647540     451.856088    3489.735233   14415.966999   12473.089220
 16   26945.925979       0.000000    7874.910677   12887.721466    6183.293835
 17   65250.293894       0.000000    7736.355264   29207.274410   28306.664221
 18   17500.375947       0.000000    2880.711810    9932.561970    4687.102167
 19   17848.832473       3.101987    2632.552860    9684.403021    5528.774605
 20   51296.522939       0.000000   14967.086667   21813.171698   14516.264575
 21   92378.203103    3180.570541    8504.614013   40641.198027   40051.820521
 22   58887.084820      14.475939    8116.865654   29876.269579   20879.473649
 23   25229.493242       0.000000    3892.993527   14250.527699    7085.972016
 24   90686.586261    1761.928544   10282.052492   48120.088380   30522.516844
 25   10664.630873    1746.418610     201.629147    5669.398010    3047.185106
 26   12702.636249    2146.574917      26.883886    5388.151200    5141.026246
 27   30967.134963       0.000000    9628.567257   13284.775786    8053.791920
 28   34047.407929     150.963361    4818.419611   15476.846510   13601.178447

If you've read that far, I thank you already.

Basically, I need to do the following:
1. Delete the first four rows
2. Delete the first column (the row numbers 0-28)
3. Paste each row to the end of the previous row, so one file comes out as one super long line
4. Comma-delimit all numbers
5. Add new first field at the beginning of the line containing the file name
6. Write all of this to a single new text file, in which each of the 400 or so input files above becomes its own line produced by steps 1-5.

So, an example output file from just the above file would be:
Code:
file1.txt, 26355.514477, 0.000000, 7986.582205, 13035.582840, 5333.349432, ..., 34047.407929, 150.963361, 4818.419611, 15476.846510, 13601.178447 <carriage return>
file2.txt, etc....

I know zippo about programming, but by pasting together various snippets I've found online I have come up with the following, which does not seem to be working exactly right ... my output is joining some numbers, particularly those from the end of one line and the beginning of another.

Code:
#!/bin/sh

#script file placed in the directory above that which contains all the text files
#script run by typing ../script.sh from within the directory containing the files

files=`ls -l *_stats.txt` 
# all files end in _stats.txt

for file in ${files}
do

  cat ${file} | sed '1,4d'| awk '{print $2, $3, $4, $5, $6}' | sed 's/[:space:]+/,/g' >> output.txt  

#read file, delete lines 1-4, print all but the first column, delete all whitespace
#between numbers including carriage returns and put them all on one line with
#comma-delimiters; then new line with the output of the next file


done


Any help would be most appreciated. It is sad, but I have spent HOURS trying to figure this out and I am pretty sure it would be ridiculously simple for anyone with basic bash scripting knowledge. Please help!

Best wishes,
Anthonz

Last edited by Franklin52; 03-13-2010 at 10:41 AM.. Reason: Please use code tags!
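For reference, the script in the question fails for three separate reasons: ls -l prints a long listing (permissions, owner, size, date) rather than bare file names, so the loop iterates over those words; [:space:] is not a character class on its own and has to be written inside a bracket expression as [[:space:]], while + is an ordinary character in a POSIX basic regular expression; and sed edits one line at a time, so its substitution never sees the newlines between rows. The following is a minimal sketch of the same approach with those points addressed. It assumes the values are separated by blanks as in the sample and that every input file matches *_stats.txt; flatten_file is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of the question's script with its bugs fixed.
# Assumes: files match *_stats.txt, the first 4 lines are dropped, and each
# remaining row is "rownum v1 v2 v3 v4 v5" separated by blanks.
flatten_file() {
  # Delete lines 1-4, keep columns 2-6, join all rows with single spaces,
  # then turn every run of whitespace into ", ".
  sed '1,4d' "$1" |
    awk '{ print $2, $3, $4, $5, $6 }' |
    paste -sd' ' - |
    sed 's/[[:space:]][[:space:]]*/, /g'
}

for file in *_stats.txt; do     # a glob, not ls -l, yields just the names
  [ -e "$file" ] || continue    # skip the literal pattern if nothing matches
  printf '%s, %s\n' "$file" "$(flatten_file "$file")"
done > output.txt
```

Each input file becomes one line of output.txt in the "file1.txt, 26355.514477, 0.000000, ..." form requested above.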
# 2  
03-13-2010
For removing the first 4 rows you can use the sed command as follows
Code:
sed '1,4d' input_file

For removing the first field which contains the numbers (0-28) you can use the sed command as follows
Code:
sed 's/\(^[ ]*[0-9]\{1,2\}\)//g'

Quote:
3. Paste each row to the end of the previous row, so one file comes out as one super long line
I think you want to join all the lines

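Here's a sed solution for this; it keeps appending the next line until it reaches the last one, so all lines end up joined, not just pairs:
Code:
sed -e :a -e '$!N;s/\n/ /;ta'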

Quote:
4. Comma-delimit all numbers
Code:
tr -s ' ' | sed 's/ /,/g'

I use tr -s because the data contains runs of repeated spaces: it squeezes each run into a single space,
and sed then converts each space to a comma.

Last edited by thillai_selvan; 03-13-2010 at 05:07 AM..
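Put together, the steps in this post can be chained into a single pipeline per file. This is a sketch rather than a drop-in answer: it assumes the columns are separated by spaces (not tabs) and that the row number is the first one or two digits on each line; flatten_one is a hypothetical name. A leading and trailing space is trimmed before the final conversion so the result does not start with a comma. Prepending the file name and looping over all files can be done as in the later replies.

```shell
#!/bin/sh
# Sketch: the snippets from this post chained into one pipeline for one file.
# Assumes space-separated columns and a 1-2 digit row number on each line.
flatten_one() {
  sed '1,4d' "$1" |                  # 1. drop the header and rows 0-2
    sed 's/^ *[0-9]\{1,2\}//' |      # 2. drop the row-number column
    sed -e :a -e '$!N;s/\n/ /;ta' |  # 3. join every line into one
    tr -s ' ' |                      #    squeeze runs of spaces to one
    sed 's/^ //;s/ $//;s/ /,/g'      # 4. trim the ends, then space -> comma
}
```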
# 3  
03-13-2010
Use this:

Code:
for file in *_stats.txt
do
        content=`sed '1,4d' "${file}" | sed -r 's/^\s*[0-9]+\s+//' | sed -r 's/\s+/,/g' | paste -sd, -`
        echo "$file,$content" >> output.txt

done

# 4  
03-13-2010
Just modifying your code a little bit...

Code:
files=$(ls -1 *_stats.txt)
for file in ${files}
do
cat ${file} | sed '1,4d' | awk  '{printf ("'${file}'"==pfile?","$2","$3","$4","$5","$6:"\n""'${file}'"","$2","$3","$4","$5","$6)}{pfile="'${file}'"}' >> output.txt
done


Last edited by malcomex999; 03-13-2010 at 06:32 AM..
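A variant of the same awk step, sketched under the same assumption that the values sit in columns 2 to 6. Instead of splicing '${file}' into the program text, which breaks if a file name contains a quote, the name is handed to awk with -v; note that awk still interprets backslash escapes in a -v value. Because each file goes through its own awk process, NR == 1 marks the first row, which also avoids the blank first line and the missing final newline of the version above. flatten_awk is a hypothetical name.

```shell
#!/bin/sh
# Sketch: per-file awk step with the file name passed via -v.
# Assumes columns 2-6 hold the values, as in the question's sample.
flatten_awk() {
  sed '1,4d' "$1" |
    awk -v f="$1" '
      NR == 1 { printf "%s", f }                        # file name first
      { printf ",%s,%s,%s,%s,%s", $2, $3, $4, $5, $6 }  # then this row
      END { print "" }                                  # one newline per file
    '
}

for file in *_stats.txt; do
  [ -e "$file" ] || continue    # skip the literal pattern if nothing matches
  flatten_awk "$file"
done > output.txt
```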
# 5  
03-13-2010
Many thanks for your thoughtful replies. I will give them a try after some sleep and post the results.
cheers
# 6  
03-13-2010
Hi, antonz:

Welcome to the forums. The following generates output identical to your sample desired output.
Code:
for f in *_stats.txt; do
    printf '%s, ' "$f";
    sed '1,4d' "$f" | cut -d' ' -f2- | paste -sd' '  - | sed 's/ /, /g'
done > output.txt

or
Code:
for f in *_stats.txt; do
    printf '%s, ' "$f";
    sed -n '1,4d;s/[^ ]* //;H;5h;${x;s/[ \n]/, /gp;}' "$f"
done > output.txt

Safely prepending the filename using sed itself is problematic (a name containing /, & or \ would first have to be escaped before it could go into an s command), hence the lingering printf.

Cheers,
Alister

Last edited by alister; 03-13-2010 at 12:12 PM..
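The second one-liner packs a lot into one sed script. Spread over several lines with comments it reads as follows; this is a sketch that assumes GNU sed (which accepts \n inside a bracket expression) and, like the original, data lines that begin with the row number followed by a single space. flatten_sed is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sketch: alister's sed program, one command per line with comments.
# Assumes GNU sed and lines of the form "rownum v1 v2 ..." (single spaces).
flatten_sed() {
  sed -n '
    # skip the header and rows 0-2
    1,4d
    # remove the row number and the space after it
    s/[^ ]* //
    # append the line to the hold space, preceded by a newline
    H
    # on the first kept line, overwrite the hold space so it has no leading newline
    5h
    # on the last line: swap the collected lines into the pattern space,
    # turn every space and newline into a comma plus space, and print
    ${
      x
      s/[ \n]/, /gp
    }
  ' "$1"
}

for f in *_stats.txt; do
  [ -e "$f" ] || continue
  printf '%s, ' "$f"
  flatten_sed "$f"
done > output.txt
```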
# 7  
03-13-2010
Another way:
Code:
awk 'FNR < 5{next}
FNR==5{printf("%s%s", f?"\n":"", FILENAME); f=1}
{$1="";printf "%s", $0}
END{printf "\n"}
' ORS=", " OFS=", " *_stats.txt > output.txt

Regards
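The key trick in this program is that assigning to a field, here $1 = "", makes awk rebuild $0 from all the fields joined with OFS. Since the default field splitting ignores leading blanks, the row number is always $1; it becomes an empty first field, every gap between values becomes ", ", and the rebuilt line already starts with the separator that should follow the file name. A small demonstration, with drop_first_field as a hypothetical helper:

```shell
#!/bin/sh
# Sketch: assigning to $1 rebuilds $0 from the fields joined with OFS,
# so the emptied row number leaves the line starting with OFS.
drop_first_field() {
  awk -v OFS=', ' '{ $1 = ""; print }'
}

echo '  3   26355.514477       0.000000' | drop_first_field
# prints: , 26355.514477, 0.000000
```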