OK, I just want to be clear. In your example, it looks like you are pointing the script directly to the files, rather than a filelist. Is this correct?
---------- Post updated at 05:44 PM ---------- Previous update was at 05:24 PM ----------
OK, I fixed the file names, which is something I wanted to do anyway:
Then I generated new file lists, and edited the lists so that they were properly escaped:
Now we have a perfectly formed listfile! Let's concatenate using Bakunin's solution!
OK, clearly I'm missing something. I don't know why it can't find my listfile - OTlistfile.txt I tried it a couple of different ways, with a ./, etc.
Also, I thought resultfile would be created. Do I need to create an empty file for it to work with? You guys gotta remember, I really know nothing. You must explain as to an overgrown child.
A word to Bakunin as to my use of perl as a regex engine:
Possibly, I am killing flies with cannons, but when I began learning regex, I found out, much to my dismay, that every app understood regexes differently. So, I learned it for grep, (which also has the -P flag), and for perl. In other words, I use perl because I know how to write regexes for it, and I'm never clear what other apps will understand.
OK, I just want to be clear. In your example, it looks like you are pointing the script directly to the files, rather than a filelist. Is this correct?
No. alister is creating a list of values which he supposes to correspond to directory- and filenames. This is because of the way you laid out the problem in your previous posts.
First off, how to find your listfile:
The same way you searched for all the other files:
It will not really matter where it is stored. You could use full paths:
Second, here are some general tips, some of them digressing from the problem at hand to some more generalized angle:
Present your problem as concisely as possible.
Your description of the problem (the directory layout, how the files are organised, etc.) changed somewhat over the course of the thread. You didn't contradict yourself directly, but you left out critical information in your first description(s) which you gave out one at a time in your later posts.
Problems in shell scripting - and what you are attempting is shell-scripting, despite your claims it is way above your capabilities - are like any other programming problem mostly depending on a clear and precise definition. Once you have precisely defined what you want to do and how you want it to be done the solution is in most cases obvious and easy to implement. Have a look in the "Shell Programming and Scripting" forum and compare threads with many answers with the ones with few answers. One would expect the threads with many answers to be more interesting, but the opposite is the case: the ones with many answers are the ones which usually go like this:
Q: i need to produce X
A: do THIS
Q: ah, yes, fine, but i need the Xs to be different, more like Ys
A: modify THIS to be THAT to produce Ys
Q: many thanks, but my Ys should have a special quality of Z
A: *sigh* do THAT, but modify the FOO part to BAR
... rinse and repeat ad nauseam
The fifth answer was not at all more "complex" or "hard" to give than the first - it was just the realization of having come up with 4 answers completely unnecessarily that caused the sigh.
So, analyze the problem you have as exactly and meticulously as possible and you will be on the fast lane to programmers ascension. What we do is not an arcane art, but just this skill of defining problems precisely and abstractly, mixed with some common sense - trust me, i'm bakunin! ;-))
Third, you surely want to know how alister's script works (which is, btw., based on a better idea than my own solution, so you should go with it).
Here is the short version of "Introduction to programming logic 101":
The core part is a loop, into which a "here-document" is fed. "Here-documents" are shell constructs which are similar to files but have fixed contents, so that they can be incorporated into scripts directly. Consider the following line:
A file "x" is read by "cat" and its contents are dumped into file "y". What exactly ends up in "y" depends on what was in "x" in first place. But if you want "y" to have a fixed content you could create a here-document replacing the file:
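The original listings were not preserved here, so the following is only a sketch of the idea: the first variant copies a file "x" into "y", the second replaces "x" with a here-document. The scratch directory, the second output name "y2" and the three sample lines are assumptions made for illustration.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Variant 1: "y" receives whatever happens to be in "x".
printf 'line 1\nline 2\nline 3\n' > "$workdir/x"
cat "$workdir/x" > "$workdir/y"

# Variant 2: the same three lines, but fixed inside the script itself.
cat <<EOF > "$workdir/y2"
line 1
line 2
line 3
EOF

# cmp is silent and returns 0 when both files are identical.
cmp "$workdir/y" "$workdir/y2" && echo "same content"
```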
This says: treat everything you read until a line that reads "EOF" as the content of a (virtual) file. We could have the three lines in file "x" and used the above command to the same effect.
So let's see the relevant part of alister's script:
The core part into which this here-document is fed is this:
This takes one line at a time, fills it into a variable named "b" and does whatever is between "do" and "done". Because with every loop the variable content of "b" changes we can use it for our purposes. Let us say we want to surround the name with equal signs. We could do this:
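A minimal sketch of such a loop, assuming a short list of book names fed in as a here-document. The loop is wrapped in a function (hypothetically named "surround") only so it can be reused with other input; the quotes around the echo argument are an addition that does not change the output here.

```shell
# Read one line at a time into "b" and print it between equal signs.
surround() {
    while read b ; do
        echo "== $b =="
    done
}

# Feed a fixed list into the loop via a here-document.
surround <<EOF
Genesis
Exodus
Leviticus
EOF
```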
The command echo == $b == does nothing else than print "==", then the content of variable b (this is what "$b" stands for) and then "==" again. Now, alister does something more sophisticated with "$b", but basically this is it. Let us see what he does:
He is abbreviating here, so it is not that obvious. Let us write it in the long form and it will become clearer:
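The original listing is missing, so this is a sketch of what the long form could look like inside the loop. The abbreviated form shown in the comment is a reconstruction, and the directory names, the function name "list_paths" and the book list are placeholders, not alister's actual values.

```shell
ot="./old"   # placeholder for the Old Testament directory
nt="./new"   # placeholder for the New Testament directory

list_paths() {
    t=$ot
    while read b ; do
        # Abbreviated form (reconstructed): [ -z "$b" ] && { t=$nt; continue; }
        if [ -z "$b" ] ; then
            t=$nt        # blank line reached: switch to the other directory
            continue     # skip the rest of the body, read the next line
        fi
        echo "$t/$b"
    done
}

# The empty line in the middle separates the two groups of names.
list_paths <<EOF
Genesis
Exodus

Matthew
Mark
EOF
```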
The -z "$b" means: if "$b" is empty. This is true exactly one time: when the loop reads in the empty line in the middle of the document. In this case the variable "t" is filled with "$nt" (the contents of variable "nt") and the enclosing while-loop is immediately started over again ("continue").
Now, take stock: what are the various variables filled with:
$ot=path to old testament books, probably "./01_Old Testament"
$nt=path to new testament books, probably "./01_New Testament"
$t=either $ot (at start) or $nt (after the blank line is processed)
OK, on we go. What else does the while-loop do:
First, a variable "i" is set to "1". Then, there is another loop:
I have to explain something about while-loops here: the general form is
This loop will run "<command>" and if this returns 0 (=TRUE) it will run the body of the loop. The same was true when we used:
"read" is a command and it returns TRUE when there is something to read and FALSE if not - this is why the loop stops at the end of the list we feed into.
Inside this loop there is nothing special done, except for incrementing "i" by 1. This is simply counting: 1, 2, 3, 4, 5, .... Every time the command
is issued. Replacing the various variables with their content (see the list above), this is:
(after incrementing i by 1)
etc. At some point, this will give us a filename which doesn't exist. If Genesis has 51 chapters (haven't bothered to look), this would be:
This time, "cat" would return a non-zero return value, meaning "FALSE" and the loop would stop.
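The counting loop can be sketched as follows. The file naming pattern ("Genesis 1.txt", "Genesis 2.txt", ...) and the scratch directory are assumptions, since the real names were not preserved, and the "2>/dev/null" is an addition that merely hides cat's complaint about the first missing file.

```shell
workdir=$(mktemp -d)
t="$workdir/01_Old Testament"        # hypothetical directory layout
mkdir -p "$t"
printf 'chapter one\n' > "$t/Genesis 1.txt"
printf 'chapter two\n' > "$t/Genesis 2.txt"

b="Genesis"
i=1
# cat returns 0 (TRUE) as long as the file exists;
# the first missing file makes it return non-zero and ends the loop.
while cat "$t/$b $i.txt" 2>/dev/null ; do
    i=$((i + 1))
done
echo "stopped at i=$i"
```

With two chapter files present, the loop prints both and stops when it tries the third.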
Now there is one last question left: without redirection "cat" will display the content of the file to the screen (try it!). We haven't used any redirection, so why does the output not land on the screen?
This is why: it is not only possible to redirect individual commands but also whole scripts. Without the last "> bible.txt" the text would indeed land on the screen. You could replace the redirection with a pipeline:
This will send the output to "more", which will display it on screen, but pagewise (hit any key to display another page, CTRL-C to end) or
to filter only for lines with "someword" in them.
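A small sketch of both variants, assuming a trivial loop that prints three words. The scratch directory and the word list are made up for illustration; "bible.txt" and "someword" are taken from the explanation above.

```shell
workdir=$(mktemp -d)

# Redirect everything the loop prints into one file ...
for word in alpha someword beta ; do
    echo "$word"
done > "$workdir/bible.txt"

# ... or feed the whole output of the loop into a filter instead.
matches=$(
    for word in alpha someword beta ; do
        echo "$word"
    done | grep "someword"
)
echo "$matches"
```

The redirection or pipe after "done" applies to the loop as a whole, not to the individual commands inside it.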
I hope this helps.
bakunin
/PS: i refrained from giving you any practical solution because i figured you are here to learn foremost and to solve your problem at hand second. I hope to have served your intentions best in enabling you to understand and write scripts yourself instead of just throwing something miraculously working at your feet.
Once you overcome your reservations i am sure you will find neverending joy in programming the shell. Don't be shy, there may be only a few chosen, but an awful lot are invited. ;-)
Boy, I've avoided scripting because it's a whole other language to learn. I barely have a grasp on the command line. But you guys have tossed me into the deep end - I may as well learn to swim.
Two things occur to me:
1.) Possibly, I have malformed, (if that's the word), text files. It is likely they were created on a Windows machine, so I will check that out, and convert them if necessary. Perhaps that is the problem? Then, I think I have a way to test whether each file ends in a newline.
2.) Forgive my ignorance, but are the variables defined correctly? How does the script know, for instance, that "ot" equals "./01_Old Testament"?
Boy, I've avoided scripting because it's a whole other language to learn. I barely have a grasp on the command line.
Scripting is nothing else than the command line. You could cut-&-paste every program written here to the command line and it would work. On the other hand, you could paste any command line content into a file, make it executable, and you have a script. So, again: you are already scripting, like it or not.
Quote:
Originally Posted by sudon't
1.) Possibly, I have malformed, (if that's the word), text files. It is likely they were created on a Windows machine, so I will check that out, and convert them if necessary. Perhaps that is the problem? Then, I think I have a way to test whether each file ends in a newline.
Yes, this is always a problem. When you edit files intended to be used on a Unix system better stay away from Windows editors, "notepad" foremost. Use an editor under Unix instead, it has an awful abundance of them.
Quote:
Originally Posted by sudon't
2.) Forgive my ignorance, but are the variables defined correctly? How does the script know, for instance, that "ot" equals "./01_Old Testament"?
This is actually a good question, because i left it out in my explanation of the script. Let me correct that error now. When we have a look at the start of the script we see:
The first two lines use special variables which are filled by the shell automatically. When you provide command line arguments to a script the shell will use the variables "1", "2", etc., and fill these with the first, second, third, ... argument. Example:
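As a rough illustration: a shell function receives its arguments the same way a script does, so a function hypothetically named "script" can stand in for a script file of that name.

```shell
# Inside the function (or script), $1, $2, $3 hold the arguments in order.
script() {
    echo "1=$1 2=$2 3=$3"
}

script one two three
```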
Now, inside "script", "$1" would be "one", "$2" would be "two" and "$3" would be "three". Have a look at how alister has suggested to start his script and you will know how these two variables were filled.
A proper description of how alister's script is to be called would be:
The third line fills variable "t" with the first of these two values (you remember, it will be filled with the other upon encountering the empty line).
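The three lines described above presumably looked something like this sketch. The "set --" line only simulates calling the script with two arguments, and the two paths are the guesses from the variable list earlier in this post.

```shell
# Simulate: script "./01_Old Testament" "./01_New Testament"
set -- "./01_Old Testament" "./01_New Testament"

ot=$1      # first argument
nt=$2      # second argument
t=$ot      # start with the first directory; switched to $nt later

echo "t is now: $t"
```

Note the quotes around the paths in the call: without them the blank would split each path into two arguments.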
I think I get it now. The first argument, (the path to the target dir), automatically becomes the first variable, and so on?
Here is the difference - to me - between scripting and the command line. With the CL, you have an application, maybe a flag or two, an argument, a target file. Very simple, and you can pipe that output to another app, etc...
With a script, there are all these strange symbols whose meanings I don't understand, and formatting whose purpose I don't understand. Why is there sometimes a bracket sitting on a line by itself? Why are some lines indented, and some not? I have no clue. Scripting is an area where I'll really have to start at the beginning. Not that I don't want to learn - I just haven't, yet. ; )
Here is where I'm at now. Turns out they were dos files. I checked this by opening a couple in vim. So I used find to pipe into dos2unix:
I again used vi to check a couple of the files, and it no longer says [dos] on the bottom.
Then I wanted to see if all had an EOF newline. I'm not sure how to do this, but I opened a couple files in TextWrangler, showing invisibles, and they do seem to have a last newline. Anyway, the last lines of the files I looked at have that 'capital L laying on its side' symbol, and the cursor will travel one line below that last line. How's that for scientific?
The way I invoked Alister's script is a little different than you show because I packed it away where I thought it should go:
But I think it's correct:
I even used the full paths, but I still end up with an empty file:
It shouldn't be a problem with the eof newline - we would simply end up with some lines stuck together. Did the fact that I changed directories affect the script? It doesn't seem like that should be the case. If I understand correctly, the script expects that input from the CL. I'm not sure how to proceed.
---------- Post updated 10-31-12 at 12:02 AM ---------- Previous update was 10-30-12 at 06:38 PM ----------
I also thought that, since we have proper list files, why not go back and try earlier solutions?
Elixir Sinari's solution:
And so on, all the way through.
And Bakunin's again:
What happened here was that the first time, I had a blinking cursor as if it was working. Finally! - I thought. Then I noticed I wasn't using any processor. I looked, but cat did not seem to be a running process, but I let it go for about eight minutes, then killed it. I ended up with a 4 k empty file named 'resultfile'.
I ran it again, but it stopped itself after a couple moments, leaving me again with a 4 k file.
It occurred to me - why not just pipe find's output directly to cat? It worked with dos2unix, (and other programs), so why not?
But it only printed another file list named OT.txt
I don't know if this is useful, but here is what's in the directory, and where it's at. The two directories containing the files, and the two perfectly ordered and escaped file lists. I tossed the worthless files that were created by various attempts.
And here's a sample of what the filelist - OTfilelist.txt - looks like, in case there is a problem with it:
---------- Post updated at 11:43 AM ---------- Previous update was at 12:02 AM ----------
After much wailing and gnashing of teeth, I was struck with the inspiration that a big problem was with the file names, namely spaces. I replaced all spaces with connecting underlines:
Then I tossed in xargs:
Success! So we do the same with the New Testament directory, then concatenate those two files:
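The commands actually used were not preserved, so this is only a sketch of the difference between piping find into cat and into xargs cat. The directory, the file names and their contents are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/Old_Testament"
printf 'chapter one\n' > "$workdir/Old_Testament/Genesis_01.txt"
printf 'chapter two\n' > "$workdir/Old_Testament/Genesis_02.txt"

# Piped straight into cat, the file NAMES are the data cat copies.
find "$workdir/Old_Testament" -name '*.txt' | sort | cat > "$workdir/names.txt"

# xargs turns the names into arguments, so cat opens the files themselves.
find "$workdir/Old_Testament" -name '*.txt' | sort | xargs cat > "$workdir/OT.txt"

cat "$workdir/OT.txt"
```

xargs splits its input at blanks as well as at newlines, which is why names containing spaces break it and names with underscores do not.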
If it warms up a bit, I will make a burnt offering on the grill.
I thank everyone for their help and guidance and hope I didn't completely wear out my welcome. I really learned a lot!
---------- Post updated at 12:41 PM ---------- Previous update was at 11:43 AM ----------
Quote:
Originally Posted by elixir_sinari
When you added double quotes to $file, was that an attempt to deal with the spaces in the file names?
I think I get it now. The first argument, (the path to the target dir), automatically becomes the first variable, and so on?
Correct.
Quote:
Originally Posted by sudon't
Here is the difference - to me - between scripting and the command line. With the CL, you have an application, maybe a flag or two, an argument, a target file. Very simple, and you can pipe that output to another app, etc...
With a script, there are all these strange symbols whose meanings I don't understand, and formatting whose purpose I don't understand.
Again: all these devices in scripts work (at least in principle) on the command line too, and everything you write on the command line could come from a script. In fact every shell is a "command language" which you either type in by hand or have typed in from a script file. A script file is basically just a file with stored commands you don't have to type again should you need them a second time.
Quote:
Originally Posted by sudon't
Why is there sometimes a bracket sitting on a line by itself?
You mean "{"? It is for command grouping. Basically everything between "{ ... }" works like a single command from the outside. It is like you could write a script, give it a name and then use this script in another script like any other command. This works the same, just inside scripts.
Quote:
Originally Posted by sudon't
Why are some lines indented, and some not?
Simple: to make it easier to read (for humans). If you have a loop (for ... done or while ... done) or a branch (if ... else ... fi or case ... esac) you indent its body so that you can easily see what the body of the construct is. For the shell executing the command it makes no difference at all. You can write:
and the definition of the loop as well as the body immediately will stand out. This:
is syntactically the same, but it is a lot harder to see what is the loop definition and what is its body.
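To illustrate the point with a made-up counting loop (not the listing from the original post), here is the same loop written both ways:

```shell
# Indented: the loop header and its body stand out at a glance.
i=1
while [ "$i" -le 3 ] ; do
    echo "pass $i"
    i=$((i + 1))
done

# The same loop on one line: identical for the shell, harder for people.
j=1 ; while [ "$j" -le 3 ] ; do echo "pass $j" ; j=$((j + 1)) ; done
```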
Quote:
Originally Posted by sudon't
So I used find to pipe into dos2unix:
Very well. There is a "-exec" flag for find, which you could use. It takes a "command template" where the filename "find" has found is represented by "{}". Suppose you have a command
and you want to execute this command with every txt-file in all subdirectories of "/there/too". You would write:
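A sketch of such a call, using "wc -l" as a stand-in for the command (the original example command was not preserved) and a scratch directory in place of "/there/too":

```shell
workdir=$(mktemp -d)     # stands in for "/there/too"
mkdir -p "$workdir/sub"
printf 'a\nb\n' > "$workdir/one.txt"
printf 'c\n'    > "$workdir/sub/two.txt"
printf 'x\n'    > "$workdir/sub/skip.dat"

# Run "wc -l <file>" once for every *.txt file below $workdir.
find "$workdir" -type f -name '*.txt' -exec wc -l {} \;
```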
"find" will find all the filenames and for each filename found that way execute the command up to "\;" (this signifies the end of the template command) with "{}" replaced by the actual filename. You might find this device extremely useful and - i can promise - once you got the hang of it you can easily outperform every filemanager there might be. Unix aficionados are not command line fetishists because they are masochists, but because they can type in seconds what you can do with a mouse in hours.
Quote:
Originally Posted by sudon't
I again used vi to check a couple of the files, and it no longer says [dos] on the bottom.
Then I wanted to see if all had an EOF newline. I'm not sure how to do this,
We have several "vi" resources here, threads where people tried to explain its usage in general or specific aspects of it. "vi" is most times a "love on third sight". First, you think it is overly complicated, the first half year you use it you think it's giving you the third degree and after this time you start missing its features in every other program you type more than two keystrokes in.
Here is how to make the invisible characters visible in "vi":
Open the file. Hit "<ESC>" (maybe repeatedly, it can't hurt) to make sure you are in command mode. Press ":". A line with ":" at the beginning will appear in the last line of the screen. Enter "set list" and press <ENTER>. The line will disappear and you will see the non-printing characters now. Press ":" again and enter "set nolist" to switch that mode off or simply leave the editor. These characters will appear and have the following meaning:
Notice that "^I" and "^M" are ONE character, as you can see when you pass over them with the cursor.
To remove the DOS line ends, you can do the following magic:
Again, from the command mode (<ESC>, you remember, will always take you there) press ":" and type:
1,$s/^M$//
Enter the "^M" by pressing "<CTRL>-<V>" (take the next input character literally) and then either "<CTRL>-<M>" or "<ENTER>". You will notice that the "^M" appears.
You can do this with "list" mode set and you will see the "^M"s at the line ends (ALL line ends!) disappear. The command says: from the first to the last line ("1,$") substitute ("s") a ^M character, followed by a line end ("/^M$/") with nothing ("//").
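For checking many files at once, the same two questions (DOS line ends? final newline?) can also be asked from the shell. This is only a sketch under the assumption of a few made-up sample files; it is not part of the vi procedure above.

```shell
workdir=$(mktemp -d)
printf 'unix line\n'      > "$workdir/good.txt"
printf 'dos line\r\n'     > "$workdir/dos.txt"
printf 'no final newline' > "$workdir/cut.txt"

cr=$(printf '\r')   # a literal carriage return, the "^M" vi shows

for f in "$workdir"/*.txt ; do
    if grep -q "$cr" "$f" ; then
        echo "$f: contains DOS line ends"
    fi
    # $(...) strips a trailing newline, so the result is empty
    # exactly when the last byte of the file is a newline.
    if [ -n "$(tail -c 1 "$f")" ] ; then
        echo "$f: last line is not terminated"
    fi
done
```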
Quote:
The way I invoked Alister's script is a little different than you show because I packed it away where I thought it should go:
Very good. In Unix every directory has its rationale and purpose, and this is even (informally) standardized (we use the word "canonical" for this type of informal standard). Indeed "/usr/local/bin" is the directory where executable files which do not belong to the OS go (OS executables go to "/usr/bin"). "/usr", btw., is nowadays often read as "Unix System Resources", but that is a later reinterpretation: the directory originally held the users' home directories, which is why it is pronounced like "user".
Quote:
If it warms up a bit, I will make a burnt offering on the grill.
LOL! Well, i think christians have burned already enough things in history, so it might even work if it doesn't warm up enough. Bishop Theophilus of Alexandria is quoted to have said, while burning down the library of Alexandria, that the books in there either agree with the bible and are superfluous or disagree with the bible and are rightfully burned. I am not divinely inspired to the same degree as this venerable bishop, but i can suggest a book about (Korn) shell and shell programming. It helps best, btw., if being read instead of being burned:
You will find the read informative as well as very entertaining.
Quote:
When you added double quotes to $file, was that an attempt to deal with the spaces in the file names?
Exactly. This is the common method of dealing with blanks, because blanks are the shell's default way of separating things. If you don't want the blank (or any other) character to do that, you use quoting.
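A quick way to see the effect, assuming a hypothetical file name with one blank in it. "set --" simply loads its arguments into $1, $2, ... so that "$#" can count them.

```shell
file="Genesis 1.txt"

set -- $file        # unquoted: the shell splits the value at the blank
unquoted=$#
echo "unquoted: $unquoted words"

set -- "$file"      # quoted: the name stays in one piece
quoted=$#
echo "quoted: $quoted word"
```

Any command given the unquoted variable would likewise see two separate names, "Genesis" and "1.txt", neither of which exists.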
Oh, could we have just used this with cat in the first place? It's kinda like how I did it in the end, isn't it?
Quote:
Originally Posted by bakunin
I hope this helps.
Indeed it does. This whole exercise has been very helpful to me. And don't worry about any books, my offerings to the gods are usually pork chops or steaks, more medium rare than burnt - I learned what the gods really appreciate, from Homer.
Last edited by sudon't; 11-01-2012 at 01:39 AM..
Reason: grammar