[ILUG] [OT] perl and grep
Niall O Broin
niall at linux.ie
Fri Mar 19 09:53:15 GMT 2004
On Friday 19 March 2004, Declan.Grady at nuvotem.com (Declan Grady) wrote:
>Hi,
>I've written a small perl proggy to split up ascii spoolfiles into separate
>ascii files for archiving.
>e.g. a spoolfile containing say 20 order acknowledgements will be split up
>into 20 separate ascii files, and then converted to pdfs, with a simple html
>list linking to them.
>My next problem is that to get the list of spoolfile names, I was using grep,
>since there are lots of files with the same extension...
>grep ACKNOWLEDGMENT spool*sdy | awk 'BEGIN {FS=":"}{print $1}' > filelist
>
>To have this run from within my perl script, do I just use system("grep ..);
>or is there a more clever 'perl way' to do this?
>
>I thought of opening every sdy file and checking for the word ACKNOWLEDGMENT
>in the specific line number where it would appear, but I think this would be
>overkill - maybe not, as I guess grep would open every file anyway?
Yes, grep opens every file too, and you do need to do the same, but it's not
overkill - how else could you examine their contents? But you definitely
shouldn't use system("grep ..) from within perl - perl was designed as a text
processing language, and is effectively a superset of grep and awk. You could
wrap your existing perl code in something like this:
foreach my $spool (@ARGV) {
    open(SPOOL, '<', $spool) or die "Can't open $spool: $!";
    # read entire spool file into one variable, in a block to localise $/
    {
        local $/;    # undef the IRS to read entire file into one variable
        $_ = <SPOOL>;
    }
    if (/ACKNOWLEDGMENT/) {
        # split into an array of lines
        my @lines = split /\n/;
        # your code goes here
    }
    close SPOOL;
}
Note the deliberate use of perl's $_ variable here: it's the default argument
for many perl functions, including pattern matches and split. If I hadn't used
$_ there, I'd have had to create a temporary variable to hold the file
contents, and then search and split that variable. There is definitely an
argument that using a named variable makes the code clearer, but IMO using $_
saves me an otherwise useless temporary variable. It's completely readable to
a perl person anyway :-)
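For comparison, here's a sketch of the same loop written with a named lexical variable instead of $_, and with a lexical filehandle instead of SPOOL. The sub name ack_lines is made up for illustration; it returns the lines of a file (without trailing \n) if the file contains ACKNOWLEDGMENT, and an empty list otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $path (no trailing "\n") if the
# file contains ACKNOWLEDGMENT, or an empty list otherwise.
sub ack_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $contents = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    return () unless defined $contents && $contents =~ /ACKNOWLEDGMENT/;
    return split /\n/, $contents;
}

foreach my $spool (@ARGV) {
    my @lines = ack_lines($spool);
    next unless @lines;    # not an acknowledgement spool
    # your code goes here
}
```

It's a couple of lines longer, and whether the named $contents reads better than $_ is mostly a matter of taste.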
At "# your code goes here" you now have the array @lines, containing all the
lines of the spoolfile. That's where your code takes over and splits them up
into separate files as you mentioned. Bear in mind that the lines in @lines
don't have a trailing \n (the split removed it), whereas if you had done
something like

    @lines = <SPOOL>;

the lines WOULD have a trailing \n (chomp @lines would remove it).
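A small sketch of that difference, using an in-memory filehandle in place of SPOOL so it runs without a spool file:

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";

# split on newlines: the separators are removed
my @from_split = split /\n/, $text;

# reading a filehandle in list context keeps each trailing "\n"
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my @from_readline = <$fh>;
close $fh;

# chomp strips the trailing record separator (the current value of $/)
my @chomped = @from_readline;
chomp @chomped;

# show the newlines in @from_readline as a visible \n
my @visible = map { my $s = $_; $s =~ s/\n/\\n/; $s } @from_readline;

print "split:    ", join('|', @from_split), "\n";
print "readline: ", join('|', @visible), "\n";
print "chomped:  ", join('|', @chomped), "\n";
```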
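You would then run your program like this (the shell expands spool*sdy into
the list of file names, which the script receives in @ARGV):

    split_spools spool*sdy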
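If you still want the standalone file list that your grep | awk pipeline produced, here's a rough pure-perl sketch. The sub name files_containing is hypothetical. Unlike the pipeline, it reports each file once (like grep -l) rather than once per matching line, and it stops reading a file at its first match. If no file names are given on the command line, it falls back to perl's built-in glob to expand spool*sdy itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the names of the files that contain a line
# matching $pattern, each name once, in the order given.
sub files_containing {
    my ($pattern, @files) = @_;
    my @found;
    FILE: foreach my $file (@files) {
        open(my $fh, '<', $file)
            or do { warn "Can't open $file: $!\n"; next FILE };
        while (my $line = <$fh>) {
            if ($line =~ $pattern) {
                push @found, $file;
                last;    # one match is enough, stop reading this file
            }
        }
        close $fh;
    }
    return @found;
}

# Use the names given on the command line, else expand the pattern in perl.
my @spools = @ARGV ? @ARGV : glob('spool*sdy');
print "$_\n" for files_containing(qr/ACKNOWLEDGMENT/, @spools);
```

Saved as, say, list_acks (a made-up name), it would be run as "list_acks spool*sdy > filelist".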
Niall