[KLUG Members] Parsing a large file

Bert members@kalamazoolinux.org
Wed, 11 Feb 2004 11:09:17 +0100



I wasn't following this thread, but I see you have it fixed already.
In your original question it wasn't clear how you wanted to divide your 
file. If you want to split it into smaller files with the same number of 
columns, just use split.
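
As a rough illustration of what a row-wise split does: the real split(1) writes each chunk to its own file (for example `split -l 500 bigfile.txt part_`, where the file name and prefix are made up), while the minimal Perl sketch below only groups lines in memory. `chunk_rows` is a hypothetical helper, not part of any tool mentioned in this thread. Every chunk keeps whole lines, so the column count is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: break a list of lines into chunks of at most $size lines.
sub chunk_rows {
    my ($size, @lines) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    while (@lines) {
        # splice removes up to $size lines from the front of the list
        push @chunks, [ splice(@lines, 0, $size) ];
    }
    return @chunks;
}

# Example: five made-up rows split into chunks of two rows each.
my @rows   = map { "row$_\tA\tB\n" } 1 .. 5;
my @chunks = chunk_rows(2, @rows);
printf "%d chunks, last chunk has %d row(s)\n",
    scalar @chunks, scalar @{ $chunks[-1] };
```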

I had a similar problem once and wrote a little Perl script for it. I 
have attached it here; maybe you can use it. I wrote the script to 
rearrange the columns in a file.
The script takes two arguments. The first is a file listing the columns 
you want and the order you want them in, one column number per line.
E.g., if you want columns 34, 45 and 82 ordered as 45, 82, 34 (becoming 
columns 1, 2, 3!), then just put 45, 82 and 34 each on their own line in 
a 'column-file'.
The second argument is the file you want to parse. Output goes to 
standard output.
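
To make the column-file idea concrete, here is a small self-contained sketch of the selection step on a single line. `select_columns` is a hypothetical name and the sample row is made up; column numbers are 1-based, as in the description above. It is an illustration, not the attached script itself.

```perl
use strict;
use warnings;

# Hypothetical helper: pick 1-based columns from one tab-delimited line,
# in the order given, and return them joined by tabs.
sub select_columns {
    my ($line, @wanted) = @_;
    chomp $line;
    my @row = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    # A column that does not exist in this row becomes an empty field.
    return join "\t",
        map { defined $row[$_ - 1] ? $row[$_ - 1] : '' } @wanted;
}

# A made-up row with four columns; ask for columns 3, 4, 1 in that order.
my $line = "a\tb\tc\td\n";
print select_columns($line, 3, 4, 1), "\n";    # prints c, d, a separated by tabs
```

With the script, the same request would be a column file containing 3, 4 and 1 on separate lines, run as `parts.pl columns.txt data.txt > out.txt` (file names are placeholders). Running it several times with different column files is one way to cut a very wide file into two or three narrower ones.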
The output separator has to be a real tab character: a single-quoted 
'\t' is printed as two characters (a backslash and a t). Either type a 
literal tab with vi or write it in double quotes as "\t".
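
The two-character output comes from Perl's quoting rules rather than from print itself: single quotes do not interpret escapes, double quotes do. A quick check, with made-up variable names:

```perl
use strict;
use warnings;

my $single = '\t';    # a backslash followed by t: two characters
my $double = "\t";    # one tab character

printf "single-quoted length: %d\n", length $single;    # 2
printf "double-quoted length: %d\n", length $double;    # 1

# Inside a regex, \t is interpreted by the regex engine and matches a tab,
# so a single-quoted '\t' still works as the input separator for split.
my @fields = split /$single/, "x\ty";
print scalar(@fields), " fields\n";                      # 2 fields
```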

Bert.



Andrew Eidson wrote:

>I still had to do some manipulation in Access, but yes, I am importing the
>final text files into the program I am working with now.
>
>-----Original Message-----
>From: members-admin@kalamazoolinux.org
>[mailto:members-admin@kalamazoolinux.org]On Behalf Of Robert G. Brown
>Sent: Sunday, February 08, 2004 4:11 PM
>To: members@kalamazoolinux.org
>Subject: Re: [KLUG Members] Parsing a large file
>
>
>On Fri, 06 Feb 2004 13:49:53 -0500, "Andrew Eidson" <aeidson@meglink.com>
>wrote:
>
>  
>
>>I am trying to parse a rather cumbersome file (355 columns, over 1000 rows,
>>tab delimited). I have tried importing it into MSSQL but keep getting an
>>error, so does anyone know of any scripts that could split this file into 2
>>or even 3 separate files?
>>    
>>
>
>Andrew,
>  Was this ever resolved? Did you get the file loaded?
>
>						Regards,
>						---> RGB <---
>
>_______________________________________________
>Members mailing list
>Members@kalamazoolinux.org
>
>


Attachment: parts.pl (application/x-perl)

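#!/usr/bin/perl
use strict;
use warnings;

#--------------------------------------------------------
#- BertO. June 2002
#--------------------------------------------------------
#- syntax: parts.pl <column-info> <file>
#--------------------------------------------------------
my @kolommen;

my $columns = shift;
my $file    = shift;
die "syntax: parts.pl <column-info> <file>\n" unless defined $file;

# input separator (used as a regex, where \t matches a tab)
my $scheider  = '\t';

# output separator ("\t" in double quotes is a real tab;
# a single-quoted \t would be printed as two characters)
my $separator = "\t";

#------------------------------------------------
#- read in the columns (1-based, one number per line)
#------------------------------------------------
open my $in, '<', $columns or die "Cannot open $columns: $!\n";
while (<$in>) {
	chomp;                    # drop the newline so the number is clean
	next unless /\S/;         # skip blank lines
	push @kolommen, $_;
}
close $in;

#------------------------------------------------
#- open the file and print the selected columns
#------------------------------------------------
open $in, '<', $file or die "Cannot open $file: $!\n";
while (<$in>) {
	chomp;                    # otherwise the last column carries the newline
	my @row = split /$scheider/, $_, -1;
	# a column missing from a short row is printed as an empty field
	print join($separator,
		map { defined $row[$_ - 1] ? $row[$_ - 1] : '' } @kolommen), "\n";
}
close $in;
#------------------------------------------------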

