Post 302070888 by mskcc on Sunday 9th of April 2006 11:55:34 PM
Old 04-10-2006
extract lines with a given list of identifiers

Hi All,

My question is whether a simple but powerful shell script can extract data from a big data file using a list of identifiers. Until now I have put everything into a database and done a join, which sounds stupid, but it was the only way I knew. For example, my data file looks like this:

GENE13810X GENE7798X 0.982666016
GENE4333X GENE487X 0.981506348
GENE7806X GENE3731X 0.981079102
GENE13020X GENE4755X 0.980102539
GENE7521X GENE3733X 0.979614258
GENE6499X GENE233X 0.979370117
GENE12708X GENE8435X 0.979064941
GENE4114X GENE4113X 0.978820801
GENE10919X GENE10568X 0.978820801
GENE5651X GENE1342X 0.978210449
GENE7657X GENE6004X 0.977905273
NODE9X GENE3712X 0.977783203
GENE12950X NODE22X 0.977783203
NODE19X GENE34X 0.977783203
GENE7642X GENE3768X 0.977539063
GENE10831X GENE8296X 0.977294922
GENE7952X NODE10X 0.977111816
GENE3807X GENE3806X 0.976501465
GENE12393X NODE23X 0.976501465
GENE2694X NODE29X 0.976501465
NODE30X GENE11332X 0.976501465
GENE3703X GENE3702X 0.976257324
GENE9709X GENE5625X 0.976013184
GENE3526X GENE2743X 0.975769043
GENE12776X NODE3X 0.975708008
GENE11770X NODE35X 0.975708008
GENE4542X NODE24X 0.975463867
GENE5074X GENE1267X 0.975280762
GENE14374X GENE8560X 0.975219727
GENE5872X NODE36X 0.974914551
GENE8550X NODE38X 0.974914551

The given list of identifiers, to be matched against the first column, could be:
GENE12708X
GENE4114X
GENE10919X
GENE5651X
GENE7657X
NODE9X
GENE12950X
NODE19X
GENE7642X
GENE10831X
GENE7952X

Thanks
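
One possible approach, sketched here with awk: read the identifier list first, then print only the data lines whose first column appears in that list. The file names (data.txt, ids.txt) and the shortened sample contents are assumptions made for illustration, not part of the original post.

```shell
#!/bin/sh
# Sketch only: the file names and the shortened sample data are assumptions.
workdir=$(mktemp -d)
data="$workdir/data.txt"
ids="$workdir/ids.txt"

# A few lines taken from the data file shown in the post.
cat > "$data" <<'EOF'
GENE6499X GENE233X 0.979370117
GENE12708X GENE8435X 0.979064941
GENE4114X GENE4113X 0.978820801
NODE9X GENE3712X 0.977783203
GENE3807X GENE3806X 0.976501465
EOF

# A few identifiers taken from the list shown in the post.
cat > "$ids" <<'EOF'
GENE12708X
GENE4114X
NODE9X
EOF

# While reading the first file (NR == FNR), remember each identifier.
# For the second file, print lines whose first column was remembered.
result=$(awk 'NR == FNR { wanted[$1]; next } $1 in wanted' "$ids" "$data")
printf '%s\n' "$result"

rm -rf "$workdir"
```

With real files this reduces to the single awk line, with the identifier file given first and the data file second. The NR == FNR test assumes the identifier file is not empty. grep -F -w -f ids.txt data.txt is shorter, but it would also match an identifier that appears in the second column.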
 
