First, a couple of quick news items:
What are the Top Ten Open Source productivity applications? You might be surprised.
A good open source video editor is just around the corner. The problem is, how far away is that corner? But there is hope on the horizon.
And now, for today's Bash Programming Quiz.
Create a script that processes two input files and outputs a single data stream consisting of alternating lines from the two files. So, if these are the input files:
File_1:
a
b
c
d
e
File_2:
1
2
3
4
5
The resulting file would be:
a
1
b
2
c
3
d
4
e
5
Submissions written in Perl or Awk will be accepted.
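For reference, here is a minimal sketch of the basic task in Python. It assumes both files have the same number of lines, because zip() silently stops at the shorter input; the function name interleave and the script name are illustrative only.

```python
import sys
from itertools import chain


def interleave(lines1, lines2):
    """Return lines1[0], lines2[0], lines1[1], lines2[1], ... as a list.

    zip() pairs line i of each input and stops at the shorter one, so this
    assumes equal-length files; chain.from_iterable flattens the pairs.
    """
    return list(chain.from_iterable(zip(lines1, lines2)))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python interleave.py File_1 File_2
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        sys.stdout.writelines(interleave(f1, f2))
```

With File_1 and File_2 from the example, this prints a, 1, b, 2, c, 3, d, 4, e, 5, one per line.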



Comments
is there a base assumption that the files are of the same length? if not, what do you want done with the excess? placeholders for missing data, stop at the end of the shorter file, or just dump the data from the longer file at the end?
and what, you don't like python? or are you going for the one-liner? ;-)
Posted by: peter | November 13, 2008 1:29 PM
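The three ways of handling unequal lengths that peter lists can be folded into one parameter. This is a minimal Python sketch, assuming each line still carries its trailing newline (so an "empty line" placeholder is "\n"); the function name interleave_uneven and the extra= values are hypothetical, not part of any existing tool.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel meaning "this input has already run out"


def interleave_uneven(lines1, lines2, extra="stop"):
    """Alternate lines from two inputs, handling unequal lengths.

    extra="stop":   stop at the end of the shorter input
    extra="pad":    emit an empty line for each missing line
    extra="append": dump the leftover lines of the longer input at the end
    """
    if extra == "stop":
        return list(chain.from_iterable(zip(lines1, lines2)))
    # zip_longest keeps going until both inputs are exhausted,
    # filling the gaps of the shorter one with the sentinel
    flat = chain.from_iterable(zip_longest(lines1, lines2, fillvalue=_MISSING))
    if extra == "pad":
        return ["\n" if line is _MISSING else line for line in flat]
    if extra == "append":
        # Dropping the sentinels leaves the extra lines in their original order
        return [line for line in flat if line is not _MISSING]
    raise ValueError("extra must be 'stop', 'pad' or 'append'")
```

The "append" case needs no second loop: once the sentinels are filtered out, the leftover lines of the longer file simply follow the last matched pair.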
#!/usr/bin/perl
use strict;
use warnings;
# Expect exactly two file names on the command line
unless (2 == @ARGV) {
    die("Usage: <script_name> file1 file2\n");
}
my ($file1, $file2) = @ARGV;
unless (-e $file1) {
    die("$file1 does not appear to exist. Exiting.\n");
}
unless (-e $file2) {
    die("$file2 does not appear to exist. Exiting.\n");
}
# Lexical filehandles are closed automatically when they go out of scope
open(my $fh1, '<', $file1) or die("Failed to open $file1: $!\n");
open(my $fh2, '<', $file2) or die("Failed to open $file2: $!\n");
# Alternate lines while file1 has data; once file2 runs out,
# the rest of file1 is printed on its own
while (my $line1 = <$fh1>) {
    print $line1;
    my $line2 = <$fh2>;
    if (defined $line2) {
        print $line2;
    }
}
# If file2 was longer, dump whatever is left of it
while (my $line2 = <$fh2>) {
    print $line2;
}
close($fh2);
close($fh1);
Posted by: Benjamin Franz | November 13, 2008 1:37 PM
quick search brought me this answer, and a lesson in redirection
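#!/usr/bin/env bash
# file1 is opened on descriptor 7 and file2 on descriptor 8;
# each pass reads one line from each descriptor
while IFS= read -r f1 <&7
do
    IFS= read -r f2 <&8
    echo "$f1"
    echo "$f2"
done 7<"$1" 8<"$2"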
called with two arguments, file 1 and file 2
see this article for the longer explanation.
thanks for the question, I hadn't known about that capability.
fyi: this script acts in the first manner I asked about: if file one is longer, an empty line is inserted for each missing line of file two until file one is finished; if file one is shorter, output stops when the end of file one is reached
Posted by: peter | November 13, 2008 1:42 PM
What do I win for doing it in lex?
http://www.eskimo.com/~dj3vande/unpublished/merge.l
Don't let the scaffolding at the bottom scare you - the actual lex code is all in line 6 (plus the default action, which being a default doesn't show up in the file).
I haven't tested its behavior with mismatched line counts in the input files.
Posted by: dave | November 13, 2008 1:59 PM
Benjamin: cripes!
peter: you forgot the redirection, it won't work without it.
all y'all: try using 'paste', it's a life saver:
paste -d'\n' file1 file2
Posted by: kevin | November 13, 2008 2:03 PM
Since both kevin and peter have given the two obvious answers, here's a not-as-obvious (and frankly, rather silly) one:
join <(nl -ba f1) <(nl -ba f2) |
sed -e 's/^[1-9][0-9]* //' -e 's/ /\n/'
Posted by: blf | November 13, 2008 3:31 PM
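The idea behind blf's pipeline is a relational join: number every line, join the two files on that number, then throw the number away. Here is a Python sketch of the same idea; the function name is hypothetical, and it assumes the second file fits in memory, since it is held in a dict keyed by line number.

```python
def join_by_line_number(lines1, lines2):
    """Pair lines that share a line number and emit each pair in order.

    Like join's default behaviour, this is an inner join: a line number
    present in only one input (the extra lines of the longer file) is dropped.
    """
    # Number the lines the way `nl -ba` does: 1, 2, 3, ... including blank ones
    numbered2 = {n: line for n, line in enumerate(lines2, start=1)}
    out = []
    for n, line in enumerate(lines1, start=1):
        if n in numbered2:  # keep only numbers that occur in both files
            out.extend([line, numbered2[n]])
    return out
```

Unlike the shell version, nothing here splits lines on whitespace, so lines containing spaces survive intact.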
sorry, I didn't forget the redirects; the HTML ate the bad characters,
and every time I previewed, the HTML entities were stripped by the previewer. sorry!
Posted by: peter | November 13, 2008 5:12 PM
Entries in Lex, as well as in Scheme and Python, will of course be accepted.
Ultimately, we want to develop a bash add-in utility called "shuffle".
Shuffle would probably ignore extra lines by default, unless a parameter forces them to be added to the end. Make sense?
[perhaps: if given only one file, shuffle will select a random location in the file and display the contents of that line ... i.e., it will split the deck....]
Posted by: Greg Laden | November 13, 2008 5:22 PM
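For the one-file "split the deck" mode, picking a random line does not require reading the whole file into memory or even knowing how many lines it has. Below is a minimal Python sketch using reservoir sampling with a reservoir of size one; this is a standard technique swapped in as an assumption about how "shuffle" might do it, and random_line is a hypothetical name.

```python
import random


def random_line(lines, rng=random):
    """Return one line chosen uniformly at random, in a single pass.

    The k-th line replaces the current choice with probability 1/k, which
    leaves every one of n lines chosen with probability 1/n.
    Returns None if the input is empty.
    """
    choice = None
    for k, line in enumerate(lines, start=1):
        if rng.randrange(k) == 0:  # true with probability 1/k
            choice = line
    return choice
```

Passing an open file object as lines streams through the file once; passing a seeded random.Random as rng makes the cut repeatable for testing.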
My thought on programming is to never do something that has already been done. Modules exist that replicate the functionality I would use to write the script. Thus:
#!/usr/bin/perl
use strict;
use warnings;
use Perl6::Slurp;
use List::MoreUtils;
die "usage: script.pl file1 file2" if @ARGV!=2;
my @f1 = slurp $ARGV[0];
my @f2 = slurp $ARGV[1];
print zip @f1, @f2;
So, both files are read into arrays, and the list being printed is $f1[0], $f2[0], $f1[1], $f2[1], ... , $f1[n], $f2[n].
Posted by: C. Chu | November 13, 2008 5:30 PM
Whoops, zip() isn't exported by default, so the call to List::MoreUtils should actually be:
use List::MoreUtils qw(zip);
Posted by: C. Chu | November 13, 2008 5:36 PM
C. Chu: Is there a memory limitation if your script uses arrays, or are they managed in Perl in a way that makes this not matter?
Posted by: Greg Laden | November 13, 2008 5:38 PM
This perl one liner will do it if the files have equal number of lines:
perl -ne 'push(@A,$_);END{$I=0;$T=scalar(@A)/2;while($I file3
I haven't done Perl in forever and I'm positive this can be shortened.
Posted by: charfles | November 13, 2008 5:46 PM
Ugh, it looked fine in the preview!
http://pastebin.com/m3c09a05 for the above code.
Posted by: charfles | November 13, 2008 5:49 PM
My awk-fu is very weak, but here's one that at least works on a simple test:
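#!/usr/bin/awk -f
# Everything runs in BEGIN, so awk never reads the files as normal input
BEGIN {
    while (1) {
        if ((getline < ARGV[1]) <= 0) break   # file1 exhausted or unreadable
        print
        if ((getline < ARGV[2]) <= 0) break   # file2 exhausted or unreadable
        print
    }
    # Output unmatched lines at the end
    # (only one of these loops will ever print anything)
    while ((getline < ARGV[1]) > 0) print
    while ((getline < ARGV[2]) > 0) print
}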
Posted by: dave | November 13, 2008 6:00 PM
Python:
http://pastebin.com/m28d0dbbb
Posted by: charfles | November 13, 2008 6:01 PM
Greg,
No performance penalty, as memory is allocated dynamically in Perl. Of course Perl will be slower than a low-level language like C or Java, though, since you're dynamically typing based on context. Pushing and popping items off the array automatically frees up space in the memory, as well.
The worst that can happen is an "Out of memory!" error, but the only time I've ever had this happen is when I made an array of hashrefs of arrays of hashes of other stuff, and the file I was reading was somewhere like 100MB big (not a joke). I cleaned up a couple bugs and that same data structure worked fine. Long story short, you'll pretty much never get an "Out of memory!" error.
Posted by: C. Chu | November 13, 2008 6:20 PM
s/pushing/unshifting
damn, I need to hit "preview" more often.
Posted by: C. Chu | November 13, 2008 6:23 PM
s/unshifting/shifting
sorry!
Posted by: C. Chu | November 13, 2008 6:26 PM
C. Chu: "[...] and the file I was reading was somewhere like 100MB"
100MB is SMALL. Any data file bigger than 100K has a better than even chance of being bigger than 100M.
You really want lazy reading, and not slurping everything into memory.
Posted by: dave | November 13, 2008 9:27 PM
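Lazy reading, as dave suggests, comes almost for free in Python 3, where file objects and zip() are both iterators. A minimal sketch (the function name is illustrative); note that it relies on Python 3's lazy zip(), whereas Python 2's zip() built a full list.

```python
def interleave_lazily(file1, file2):
    """Yield alternate lines from two open files, one line at a time.

    Only the current pair of lines is held in memory, however large the
    files are. Like zip(), it stops as soon as either file runs out.
    """
    for line1, line2 in zip(file1, file2):
        yield line1
        yield line2
```

Typical use: with open("file1") as f1, open("file2") as f2: sys.stdout.writelines(interleave_lazily(f1, f2)). Because it is a generator, it even works on endless inputs, producing output as it goes.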
#!/usr/bin/env python
# Two-line version, but it stops when the shorter
# file runs out, and it loads both files completely
# into memory.
Posted by: Nick | November 14, 2008 12:00 AM
Here's Python that doesn't load files into memory.
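class Zipper(object):
    """Iterate over two files in lockstep without reading them into memory."""
    def __init__(self, f1, f2):
        self._f1 = f1
        self._f2 = f2
    def __iter__(self):
        return self
    def __next__(self):
        a = self._f1.readline()
        b = self._f2.readline()
        # Stop as soon as either file runs out
        if not a or not b:
            raise StopIteration()
        return a, b
with open("data1.txt") as f1, open("data2.txt") as f2:
    for a, b in Zipper(f1, f2):
        print(a, end="")
        print(b, end="")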
Posted by: Flaky | November 14, 2008 2:13 AM
charfles:
Your perl one liner is great, especially because it is a one liner and is totally incomprehensible and undocumentable and all that great perl stuff. But I get a strange result:
If file1 is 1 through 6 and file2 is a through f, I get this as file 3:
1
a
2
b
3
c
4
d
5
e
6
f
a
What the heck is that a doing down there at the end, and why the extra newlines?
(By the way I ended each line with a newline, including the last one, in the original files)
Posted by: Greg Laden | November 14, 2008 7:33 AM
I have this solution using awk
$ awk 'FNR==NR{ a[FNR]=$0; next }
{
    print $0
    print a[FNR]
}
' file2 file1
http://unstableme.blogspot.com/2008/03/merge-alternate-lines-of-files-bashawk.html
// Jadu
Posted by: Jadu Saikia | January 6, 2009 12:48 PM