Newbie help - parsing through a file

# 1  
Old 05-09-2017
Hello guys,

I am a newbie to all of this and would like some help with a file I have. It's a ~100 MB CSV file with approximately 30 columns.

What I'd like to do is search through the file and REMOVE any line that contains a certain case-insensitive string in any of its columns.

So my file looks like this:

Code:
1, Mike Smith, 12, Philly
2, John Smith, Right, New York
3, Tommy, $@, Atlanta
4, Nate New, $@, Atlanta

I'd like to search through this file, and remove any line with the word "new" in it, so my final file would look like this:

Code:
1, Mike Smith, 12, Philly
3, Tommy, $@, Atlanta

Moderator's Comments:
Mod Comment edit by bakunin: please use CODE-tags also for data. Thank you.

Last edited by bakunin; 05-10-2017 at 01:41 AM..
# 2  
Old 05-10-2017
Perl
Code:
perl -ne 'print unless /new/i' lokhtar.example

GNU sed
Code:
sed '/new/Id' lokhtar.example

sed
Code:
sed '/[Nn][Ee][Ww]/d' lokhtar.example

GNU awk (IGNORECASE is a gawk extension; other awks ignore it)
Code:
awk 'BEGIN{IGNORECASE=1} ! /new/' lokhtar.example

grep
Code:
grep -iv new lokhtar.example

Ruby
Code:
ruby -pe 'next if /new/i' lokhtar.example
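
All of the commands above print the surviving lines to the terminal and leave the file as it is. A minimal sketch of saving the result instead, assuming hypothetical file names (input.csv, filtered.csv) and using the sample data from the first post:

```shell
# Hypothetical file names in a scratch directory; sample data copied from the first post.
workdir=$(mktemp -d)
cat > "$workdir/input.csv" <<'EOF'
1, Mike Smith, 12, Philly
2, John Smith, Right, New York
3, Tommy, $@, Atlanta
4, Nate New, $@, Atlanta
EOF

# Write the filtered lines to a new file; the original stays untouched.
grep -iv new "$workdir/input.csv" > "$workdir/filtered.csv"

# Replace the original only after checking that the result looks right:
# mv "$workdir/filtered.csv" "$workdir/input.csv"

filtered=$(cat "$workdir/filtered.csv")
echo "$filtered"
```

Never redirect the output straight back onto the input file (grep ... file > file): the shell truncates the file before grep gets to read it.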


Last edited by Aia; 05-10-2017 at 12:46 AM..
# 3  
Old 05-10-2017
Quote:
Originally Posted by Lokhtar
I'd like to search through this file, and remove any line with the word "new" in it, so my final file would look like this:
Aia has already given you a rather extensive collection of ways to tackle this in various scripting languages and text-processing programs. Still, you have fallen into a trap that most newbies don't take into account. In the hope of making you aware of a problem you may have now, or may meet in similar tasks later, here it goes:

Quote:
Originally Posted by Lokhtar
with the word "new" in it
Your problem is the lack of a definition of what constitutes a "word". Take, for example, Aia's grep solution:

Code:
grep -iv new lokhtar.example

This searches for lines containing the character sequence n-e-w (-i makes the match case insensitive) and filters these lines out (-v). Consider the following lines:

Code:
new
bla
Newell

The command will filter out lines 1 and 3, but chances are you only want it to filter out line 1. This is because grep doesn't deal with "words" on an instinctive level like you do; it deals with characters and sequences of characters. If you want it to understand what "word" means, you need to tell it.
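
To see this for yourself, here is a small sketch that feeds those three lines to the same grep command (the variable name kept is only for illustration):

```shell
# The three sample lines from this post, piped into the grep command under discussion.
kept=$(printf 'new\nbla\nNewell\n' | grep -iv new)

# Only "bla" survives, because "Newell" contains the sequence n-e-w too.
echo "$kept"
```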

Here are a few (naive) attempts and why they will not always do what they are supposed to do:

1) We could start by requiring whitespace (blanks or tabs) before and after the word we search for. Instead of "new" we would search for "<blank-or-tab>new<blank-or-tab>". This works in the middle of a line, but fails if the word is the first or last on a line.

2) Look at the following sentences. All contain the word "new", in none of them is it the first or last word, and still the pattern from 1) would fail to recognize them:

Code:
This is new, this is different!
Something new: a word followed by a colon.
Should composites like "new-old" be considered?
Is "new" in quotes still considered the word we look for?

Bottom line: you have to decide for yourself what exactly you consider to be "the word 'new'" before you can construct a matching pattern. Whatever you decide can be phrased as a regular expression, but the decision has to come first.
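
As a rough sketch of one possible decision: the -w option of grep (an extension found in GNU and BSD grep, not something POSIX guarantees) treats "new" as a word only when it is not directly attached to letters, digits or underscores. The comparison below uses the four sentences from above plus one extra line (made up for this example) that should not match:

```shell
# Four sentences from the post, plus one made-up line that must not count as the word "new".
sample='This is new, this is different!
Something new: a word followed by a colon.
Should composites like "new-old" be considered?
Is "new" in quotes still considered the word we look for?
Newell is a name, not the word we look for.'

# Naive pattern from 1): "new" with a blank or tab on both sides.
# grep -c prints the number of matching lines and exits non-zero when that number is 0.
naive_count=$(printf '%s\n' "$sample" | grep -ic '[[:blank:]]new[[:blank:]]' || true)

# -w (assumed to be available): "new" must not touch letters, digits or "_".
word_count=$(printf '%s\n' "$sample" | grep -iwc 'new' || true)

echo "naive pattern matched $naive_count lines, -w matched $word_count lines"
```

The naive pattern matches none of the lines, while -w matches the first four and skips "Newell". Note that -w counts "new-old" and the quoted "new" as hits; if that is not what you want, you need a stricter pattern of your own.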

I hope this helps.

bakunin
# 4  
Old 05-10-2017
Thank you guys - you guys are amazing!! Now I just need an inverse of that too - e.g. remove any lines that do NOT have "new" in them. Any one of those scripting/shell languages will be fine.

As for bakunin's point, that's very helpful - thank you. I want to keep any line with the letters "new" whether it is a whole word or part of a word, e.g. if it's "newton", I'd want that line kept.

Thank you again guys for your amazing help!
# 5  
Old 05-10-2017
Quote:
Originally Posted by Lokhtar
Now I just need an inverse of that too - e.g. remove any lines that do NOT have "new" in them.
This is amazingly easy: just remove the -v option from the grep command:

Code:
grep -iv new  /some/file       # output all lines NOT having "new" in them
grep -i  new  /some/file       # output all lines having "new" in them
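
If you need both results, running grep twice works fine. As an alternative sketch, awk can split the file into two files in a single pass; the file names and the variable names hit and miss are made up for this example, and tolower() is used so that it works in any POSIX awk:

```shell
# Hypothetical file names in a scratch directory; sample data from the first post.
workdir=$(mktemp -d)
printf '%s\n' '1, Mike Smith, 12, Philly' '2, John Smith, Right, New York' \
    '3, Tommy, $@, Atlanta' '4, Nate New, $@, Atlanta' > "$workdir/input.csv"

# Lines containing "new" (in any case) go to one file, all other lines to another.
awk -v hit="$workdir/with_new.csv" -v miss="$workdir/without_new.csv" '
    tolower($0) ~ /new/ { print > hit; next }
                        { print > miss }
' "$workdir/input.csv"

with_count=$(wc -l < "$workdir/with_new.csv" | tr -d ' ')
without_count=$(wc -l < "$workdir/without_new.csv" | tr -d ' ')
echo "$with_count lines contain new, $without_count do not"
```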

In general: look at the man page of commands you are not sure about:

Code:
man grep         # displays the man page for grep

Unlike Windows, where "help" tries to tell you things you don't want to know, using methods you detest, to reach goals you were not after in the first place (the usual modus operandi), the man pages of UNIX systems are for reference: they will not teach you things you don't know, but if you need a detail you can be sure to find it there. This goes especially for command options and what they do.

I hope this helps.

bakunin