Bash Programming Quiz

First, a couple of quick news items:

What are the Top Ten Open Source productivity applications? You might be surprised.

A good open source video editor is just around the corner. The problem is, how far away is that corner? But there is hope on the horizon.

And now, for today's Bash Programming Quiz.

Create a script that processes two input files and outputs a single data stream consisting of alternating lines from the two files. So, if these are the input files:

File_1:

a
b
c
d
e

File_2:

1
2
3
4
5

The resulting file would be:

a
1
b
2
c
3
d
4
e
5

Submissions written in Perl or Awk will be accepted.

Comments

  1. #1 peter
    November 13, 2008

    is there a base assumption that the files are of the same length? if not, what do you want done with the excess? placeholders for missing data, stop at the end of the shorter file, or just dump the data from the longer file at the end?

    and what, you don’t like python? or are you going for the one-liner? ;-)
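
The three options peter lists can be compared side by side. What follows is only a sketch in Python 3, not one of the submissions; the function name `interleave` and the policy names `stop`, `pad` and `append` are made up for illustration, and the inputs are assumed to be lists of lines already in memory.

```python
from itertools import zip_longest


def interleave(first, second, policy="stop", placeholder=""):
    """Interleave two lists of lines under one of three length policies."""
    if policy == "stop":
        # Stop as soon as the shorter input runs out.
        pairs = zip(first, second)
        return [line for pair in pairs for line in pair]
    if policy == "pad":
        # Emit a placeholder wherever one input has no line left.
        pairs = zip_longest(first, second, fillvalue=placeholder)
        return [line for pair in pairs for line in pair]
    if policy == "append":
        # Alternate while both inputs have lines, then dump the leftover tail.
        missing = object()
        pairs = zip_longest(first, second, fillvalue=missing)
        return [line for pair in pairs for line in pair if line is not missing]
    raise ValueError(f"unknown policy: {policy!r}")


letters = ["a", "b", "c"]
digits = ["1", "2"]
print(interleave(letters, digits, "stop"))      # ['a', '1', 'b', '2']
print(interleave(letters, digits, "pad", "-"))  # ['a', '1', 'b', '2', 'c', '-']
print(interleave(letters, digits, "append"))    # ['a', '1', 'b', '2', 'c']
```

With inputs of equal length, as in the quiz example, all three policies give the same output.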

  2. #2 Benjamin Franz
    November 13, 2008

    #!/usr/bin/perl

    use strict;
    use warnings;

    unless (2 == @ARGV) {
        die("Usage: <script_name> file1 file2\n");
    }
    my ($file1, $file2) = @ARGV;
    unless (-e $file1) {
        die("$file1 does not appear to exist. Exiting.\n");
    }
    unless (-e $file2) {
        die("$file2 does not appear to exist. Exiting.\n");
    }
    local *FILE1FH;
    open(FILE1FH, '<', $file1) || die ("Failed to open $file1: $!\n");
    local *FILE2FH;
    open(FILE2FH, '<', $file2) || die ("Failed to open $file2: $!\n");
    # Alternate lines for as long as file 1 has input.
    while(my $line1 = <FILE1FH>) {
        print $line1;
        my $line2 = <FILE2FH>;
        if (defined $line2) {
            print $line2;
        }
    }
    # Print whatever is left of file 2.
    while(my $line2 = <FILE2FH>) {
        print $line2;
    }
    close(FILE2FH);
    close(FILE1FH);

  3. #4 dave
    November 13, 2008

    What do I win for doing it in lex?
    http://www.eskimo.com/~dj3vande/unpublished/merge.l
    Don’t let the scaffolding at the bottom scare you – the actual lex code is all in line 6 (plus the default action, which being a default doesn’t show up in the file).
    I haven’t tested its behavior with mismatched line counts in the input files.

  4. #5 kevin
    November 13, 2008

    Benjamin: cripes!
    peter: you forgot the redirection, it won’t work without it.
    all yall: try using ‘paste’, it’s a life saver:

    paste -d'\n' file1 file2

  5. #6 blf
    November 13, 2008

    Since both kevin and peter have given the two obvious answers, here’s a not-as-obvious (and frankly, rather silly) one:

    join <(nl -ba f1) <(nl -ba f2) |
    sed -e 's/^[1-9][0-9]* //'

  6. #7 peter
    November 13, 2008

    sorry, I didn’t forget the redirects, the html ate the bad characters

    #!/usr/bin/env bash
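    # File 1 is read on descriptor 7, file 2 on descriptor 8.
    while IFS= read -r f1 <&7
    do
        IFS= read -r f2 <&8
        printf '%s\n' "$f1"
        printf '%s\n' "$f2"
    done 7<"$1" 8<"$2"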

    and every time I previewed, the html entities were stripped by the previewer
    sorry!

  7. #8 Greg Laden
    November 13, 2008

    Entries in Lex, as well as Scheme and Python will of course be accepted.

    Ultimately, we want to develop a bash add-in utility called “shuffle”

    Shuffle would probably ignore extra lines by default, unless a parameter forces them to be added to the end. Make sense?

    [perhaps: if given only one file, shuffle will select a random location in the file and display the contents of that line ... i.e., it will split the deck....]

  8. #9 C. Chu
    November 13, 2008

    My thought on programming is to never do something that has already been done. Modules exist that replicate the functionality I would use to write the script. Thus:


    #!/usr/bin/perl

    use strict;
    use warnings;

    use Perl6::Slurp;
    use List::MoreUtils;

    die "usage: script.pl file1 file2" if @ARGV!=2;

    my @f1 = slurp $ARGV[0];
    my @f2 = slurp $ARGV[1];

    print zip @f1, @f2;

    So, both files are read into arrays, and the list being printed is $f1[0], $f2[0], $f1[1], $f2[1], … , $f1[n], $f2[n].

  9. #10 C. Chu
    November 13, 2008

    Whoops, zip() isn’t exported by default, so the call to List::MoreUtils should actually be:

    use List::MoreUtils qw(zip);

  10. #11 Greg Laden
    November 13, 2008

    C. Chu: Is there a memory limitation if your script uses arrays, or are they managed in Perl in a way that this would not matter?

  11. #12 charfles
    November 13, 2008

    This perl one liner will do it if the files have equal number of lines:

    perl -ne 'push(@A,$_);END{$I=0;$T=scalar(@A)/2;while($I< $T){print $A[$I]; print $A[$I+++$T]}};' file1 file2 > file3


    I haven’t done Perl in forever and I’m positive this can be shortened.

  12. #13 charfles
    November 13, 2008

    Ugh, it looked fine in the preview!

    http://pastebin.com/m3c09a05 for the above code.

  13. #14 dave
    November 13, 2008

    My awk-fu is very weak, but here’s one that at least works on a simple test:

    #!/usr/bin/awk -f
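    BEGIN {
      # Alternate lines while both files still have input.
      # getline returns 1 on success, 0 at end of file, -1 on error.
      while(1)
      {
        if((getline line < ARGV[1]) <= 0) break;
        print line;
        if((getline line < ARGV[2]) <= 0) break;
        print line;
      }

      #Output unmatched lines at the end
      # (Only one of these will ever run)
      while((getline line < ARGV[1]) > 0) print line;
      while((getline line < ARGV[2]) > 0) print line;
    }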
    
  14. #15 charfles
    November 13, 2008
  15. #16 C. Chu
    November 13, 2008

    Greg,

    No performance penalty, as memory is allocated dynamically in Perl. Of course Perl will be slower than a low-level language like C or Java, though, since you’re dynamically typing based on context. Pushing and popping items off the array automatically frees up space in the memory, as well.

    The worst that can happen is an “Out of memory!” error, but the only time I’ve ever had this happen is when I made an array of hashrefs of arrays of hashes of other stuff, and the file I was reading was somewhere like 100MB big (not a joke). I cleaned up a couple bugs and that same data structure worked fine. Long story short, you’ll pretty much never get an “Out of memory!” error.

  16. #17 C. Chu
    November 13, 2008

    s/pushing/unshifting

    damn, I need to hit “preview” more often.

  17. #18 C. Chu
    November 13, 2008

    s/unshifting/shifting

    sorry!

  18. #19 dave
    November 13, 2008

    C. Chu: “[...] and the file I was reading was somewhere like 100MB”
    100MB is SMALL. Any data file bigger than 100K has a better than even chance of being bigger than 100M.
    You really want lazy reading, and not slurping everything into memory.
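
dave's point about lazy reading can be sketched in Python 3 as follows. This is an illustration under stated assumptions, not any commenter's code: the function name `interleave_files` and its `keep_extra` parameter are hypothetical. Iterating over an open file yields one line at a time, and both `zip` and `itertools.zip_longest` are lazy in Python 3, so only the current pair of lines is held in memory.

```python
import sys
from itertools import zip_longest


def interleave_files(path1, path2, out=sys.stdout, keep_extra=True):
    """Write alternating lines from two files without slurping either one."""
    with open(path1) as f1, open(path2) as f2:
        if keep_extra:
            # Keep going until both files are exhausted; a sentinel marks
            # the positions where the shorter file has nothing left.
            missing = object()
            for a, b in zip_longest(f1, f2, fillvalue=missing):
                if a is not missing:
                    out.write(a)
                if b is not missing:
                    out.write(b)
        else:
            # Stop at the end of the shorter file.
            for a, b in zip(f1, f2):
                out.write(a)
                out.write(b)
```

Calling `interleave_files("file1", "file2")` writes the merged stream to standard output. The sketch assumes every line ends with a newline; a final line without one would run into the line that follows it.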

  19. #20 Nick
    November 14, 2008

    #!/usr/bin/env python
    # Two-line version, but it stops when the shorter
    # file runs out, and it loads both files completely
    # into memory.

    for a, b in zip(open('file1', 'r'), open('file2')):
        print a, b,
    
  20. #21 Flaky
    November 14, 2008

    Here’s Python that doesn’t load files into memory.

    class Zipper(object):
       def __init__(self,f1,f2):
          self._f1=f1
          self._f2=f2
    
       def __iter__(self):
          return self
    
       def next(self):
          a=self._f1.readline()
          b=self._f2.readline()
          if not a or not b:
             raise StopIteration()
          return a,b
    
    for a,b in Zipper(open("data1.txt"),open("data2.txt")):
          print a,b,
    
  21. #22 Greg Laden
    November 14, 2008

    charfles:

    Your perl one liner is great, especially because it is a one liner and is totally incomprehensible and undocumentable and all that great perl stuff. But I get a strange result.

    If file1 is 1 through 6 and file2 is a through f, I get this as file 3:

    1
    a
    2
    b
    3
    c
    4
    d
    5
    e
    6
    f

    a

    What the heck is that a doing down there at the end, and why the extra newlines?

    (By the way I ended each line with a newline, including the last one, in the original files)

  22. #23 Jadu Saikia
    January 6, 2009

    I have this solution using awk

    $ awk 'FNR==NR{ a[FNR]=$0;next }
    {
    print $0
    print a[FNR]
    }
    ' file2 file1

    http://unstableme.blogspot.com/2008/03/merge-alternate-lines-of-files-bashawk.html

    // Jadu

  23. #24 Tom
    December 14, 2012

    My solution:

    #!/bin/bash

    alpha=( a b c d e )
    number=( 1 2 3 4 5 )

    for (( i = 0; i < "${#alpha[@]}" && i < "${#number[@]}"; i++ ))
    do
    echo "${alpha[i]}"
    echo "${number[i]}"
    done