complex problem

Re: complex problem

Postby Willem » Sat, 25 Sep 2010 23:17:44 GMT


) I have a hundred of files that have predefined columns but unknown number of 
) rows:
<snip>
) While using regular expression to extract the information is easy, and then 
) building the table by associative array is also simple, the main problem 
) falls into padding the previous columns with no records.

Easy.  Do two passes over the data.
First pass finds max number of columns, second pass does all the padding.
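
A minimal sketch of the two-pass idea, assuming the files have already been reduced to [id, column, value] records. The name build_table and the record layout are hypothetical, not from the thread; the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is one record, [ $id, $column, $value ].
sub build_table {
    my @records = @_;

    # Pass 1: find every column and every id present in the data.
    my ( %columns, %ids );
    for my $rec (@records) {
        my ( $id, $column ) = @$rec;
        $ids{$id}         = 1;
        $columns{$column} = 1;
    }
    my @columns = sort keys %columns;

    # Pass 2: walk the data again and place each value in its cell.
    my %cell;
    $cell{ $_->[0] }{ $_->[1] } = $_->[2] for @records;

    # Emit a header row, then one row per id, padding empty cells with 'n/a'.
    my @rows = ( [ 'ID', @columns ] );
    for my $id ( sort keys %ids ) {
        push @rows, [ $id, map { $cell{$id}{$_} // 'n/a' } @columns ];
    }
    return @rows;
}

my @rows = build_table(
    [ 1, 'file1', 'A' ], [ 2, 'file1', 'T' ],
    [ 1, 'file2', 'A' ], [ 3, 'file2', 'T' ],
);
print join( "\t", @$_ ), "\n" for @rows;
```

Because the full column list is known before any row is written, no row ever has to be revisited to add padding.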


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

complex problem

Postby ela » Sun, 26 Sep 2010 11:52:38 GMT

I have a hundred files with predefined columns but an unknown number of 
rows:

for example, a row from file1 is like this (delimited by \t)

chr19   56837617    56841944 
SIGLEC14(NM_001098612,expression:8.88665993183852)  0   -   56837617 
56841944    255,12,12   1   4327    0

what I have to do is to build a big table containing information:

Name\tExpression_file1\tExpression_file2\t ... 
Expression_file99\tExpression_file100\n
NM_001098612\t8.88665993183852\t ...\n
...

While using a regular expression to extract the information is easy, and 
building the table with an associative array is also simple, the main problem 
is padding the earlier columns where there are no records. The current 
column can be padded safely, but the earlier columns require more and more 
look-ups. After trying "for" loops, recursion seems to be the solution, but 
the recursive routine appears more difficult than I previously thought...



Re: complex problem

Postby Xho Jingleheimerschmidt » Sun, 26 Sep 2010 11:53:43 GMT



Word wrap makes this rather difficult to read.


Why is that a problem?  If all the files are passed in on @ARGV, and 
each file of input is turned into a new column in the output, then you 
already know what all the columns in the output are going to be, right 
up front.



I don't understand how recursion could plausibly be useful here.

Anyway, what I often find myself doing is using two hashes.

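my %exp;     # $exp{refseq}{sample} = expression value
my %sample;  # one key per sample, i.e. per output column
while (<>) {
   my ($refseq,$expression,$sample)=parse_however($_);
   $exp{$refseq}{$sample}=$expression;
   $sample{$sample}=undef;   # only the key matters
}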

Now %sample contains an entry for every sample/tissue/file which has at 
least one second-level entry in %exp.

Of course you could have reversed the nesting order of the keys, 
$exp{$sample}{$refseq}, but I assume that would be inconvenient for 
other reasons, or you would have done it already.
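
For illustration only, here is one way parse_however might look for the row format in the original post. This is a hypothetical stand-in, not Xho's code: it assumes the accession and expression sit in the fourth tab-separated field as "NAME(ACCESSION,expression:VALUE)", and it takes the sample name as a second argument (the caller could pass $ARGV).

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse_however(): returns (refseq, expression,
# sample) for a matching row, or an empty list if the row does not match.
sub parse_however {
    my ( $line, $sample ) = @_;
    chomp $line;
    my @field = split /\t/, $line;
    return unless defined $field[3];

    # Fourth field is assumed to look like SIGLEC14(NM_001098612,expression:8.88)
    my ( $refseq, $expression ) =
        $field[3] =~ /\((\w+),expression:([^)]+)\)/
        or return;
    return ( $refseq, $expression, $sample );
}

# The example row from the original post, rebuilt with real tabs.
my $row = join "\t", 'chr19', 56837617, 56841944,
    'SIGLEC14(NM_001098612,expression:8.88665993183852)',
    0, '-', 56837617, 56841944, '255,12,12', 1, 4327, 0;

my ( $refseq, $expression, $sample ) = parse_however( $row, 'file1' );
print "$refseq\t$expression\t$sample\n";
```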


Xho

Re: complex problem

Postby Xho Jingleheimerschmidt » Mon, 27 Sep 2010 07:40:23 GMT



Sorry, but it seems self-evident to me, so I don't see how I can explain 
it.  Maybe I'm not correctly apprehending what the purpose is that you 
have in mind.

Xho

Re: complex problem

Postby ela » Mon, 27 Sep 2010 12:56:49 GMT

> Anyway, what I often find myself doing is using two hashes.

Sorry, but would you mind elaborating on how the second hash serves that 
purpose? I can't quite follow it...




Re: complex problem

Postby Tad McClellan » Tue, 28 Sep 2010 00:44:44 GMT





Please put the subject of your article in the Subject of your article.


-- 
Rest In Peace: 
Jonah McClellan gave his life for his country in a
helicopter crash in Afghanistan on September 21,2010.
Please pray for his wife and three children.

Re: complex problem

Postby Xho Jingleheimerschmidt » Tue, 28 Sep 2010 03:32:48 GMT







OK, so you have two hashes, one of them multi-level.

The multilevel one is $base{$id}{$file}=$nucleotide, so it contains all 
of the data.

The other one is just $file{$file}=(), so it tells you every file, i.e. 
every column that needs to exist in the output, so that you can reserve 
space for them all, even if a given $id doesn't have data for a given $file.

Once you are done reading all the files, you'd probably want something like:

my @file=sort keys %file; # might want a non-default sort method.

to put that data into a more convenient format for using.

Then:
foreach my $id (keys %base) {
   my @output = @{$base{$id}}{@file};  # hash slice: one value (or undef) per file
   defined $_ or $_='n/a' foreach @output;
   print join ("\t", $id, @output), "\n";
};


I've changed the names of the hashes from the previous post, because you 
changed the nature of the data contained in them from your previous example.
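
A self-contained sketch of the two-hash approach, run against the three example files given elsewhere in this thread. The helper name merge_columns and the in-memory file texts are assumptions made for the example; real code would read the files named on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: %$input maps a file label to that file's text.
sub merge_columns {
    my ($input) = @_;
    my ( %base, %file );

    for my $name ( keys %$input ) {
        $file{$name} = 1;    # remember the column even if no row matches
        for my $line ( split /\n/, $input->{$name} ) {
            # Header lines such as "ID character" do not match and are skipped.
            my ( $id, $char ) = $line =~ /^\s*(\d+)\s+(\S+)/ or next;
            $base{$id}{$name} = $char;
        }
    }

    my @file = sort keys %file;
    my @out  = ( join "\t", 'ID', @file );
    for my $id ( sort { $a <=> $b } keys %base ) {
        my @cells = @{ $base{$id} }{@file};    # hash slice; missing => undef
        defined $_ or $_ = 'n/a' for @cells;
        push @out, join "\t", $id, @cells;
    }
    return @out;
}

print "$_\n" for merge_columns( {
    'File 1' => "ID character\n1 A\n2 T\n3 G\n",
    'File 2' => "ID character\n1 A\n3 T\n",
    'File 3' => "ID character\n2 A\n3 T\n4 C\n",
} );
```

The second hash is what makes the padding trivial: the slice asks every row for every known column, and any column the row lacks comes back undefined.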

Xho

Re: complex problem

Postby Xho Jingleheimerschmidt » Tue, 28 Sep 2010 03:33:28 GMT





If he attempted to do that, he would be accused of committing the XYZ 
problem.

Xho

Re: complex problem

Postby ela » Tue, 28 Sep 2010 11:38:06 GMT







Let me give a simple example:

File 1
ID character
1 A
2 T
3 G

File 2
ID character
1 A
3 T

File 3
ID character
2 A
3 T
4 C

processed result
ID   File 1 character   File 2 character   File 3 character
1    A                  A                  n/a
2    T                  n/a                A
3    G                  T                  T
4    n/a                n/a                C




Re: complex problem

Postby RedGrittyBrick » Tue, 28 Sep 2010 18:28:43 GMT







If I were the OP, I'd view being informed of both as a boon.


P.S. Thanks for drawing my attention to the XYZ problem; previously I'd 
only known of the XY problem.

 http://www.**--****.com/ 
-- 
RGB

Re: complex problem

Postby sln » Wed, 29 Sep 2010 01:00:43 GMT







This can be done in more than one way:
  $data[ $id ][ file number ]   id as array index; depends on ids being small integers
or
  $data{ $id }[ file number ]   id as hash key; the keys are not sorted

-sln
--------------
use strict;
use warnings;

my $f1 =<<EO1;
  1  A
  2  T
  3  G
EO1
my $f2 =<<EO2;
  1  A
  3  T
EO2
my $f3 =<<EO3;
  2  A
  3  T
  4  C
EO3

my %data;
my $filecount = 0;

# Put file list in the column order
# of the output wanted
# --------------------------
for my $file (\$f1, \$f2, \$f3)
{
    # $file is a scalar ref, so this opens an in-memory filehandle
    open my $fh, '<', $file or die "can't open $file: $!";

    while ( defined (my $line = <$fh>) )
    {
        my ($id, $char) = parseline ($line);
        next unless defined $id;
        $data{ $id }[ $filecount + 1 ] = $char;
        unless (defined $data{ $id }[ 0 ]) {
            $data{ $id }[ 0 ] = $id;
        }
    }
    close $fh;
    ++$filecount;
}

for my $id (sort keys %data)
{
    for my $count (0 .. $filecount) {
        $data{ $id }[ $count ] = 'n/a'
            unless defined ($data{ $id }[ $count ]);
    }
    print  " '@{$data{ $id }}'\n";
}

sub parseline
{
    return $_[0] =~ /\s*(\S+)\s+(\S+)\s*/;
}
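
The program above uses the hash-keyed layout. For comparison, a rough sketch of the array-indexed alternative, $data[ $id ][ file number ], assuming ids are small non-negative integers. The helper name merge_by_index is hypothetical, and `//` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: each argument is one file's text. Returns one array
# ref per id, with slot 0 holding the id and slot N the value from file N.
sub merge_by_index {
    my @texts = @_;
    my @data;
    my $filecount = 0;

    for my $text (@texts) {
        ++$filecount;
        for my $line ( split /\n/, $text ) {
            my ( $id, $char ) = $line =~ /^\s*(\d+)\s+(\S+)/ or next;
            $data[$id][0]          = $id;
            $data[$id][$filecount] = $char;
        }
    }

    my @rows;
    for my $row ( grep { defined } @data ) {    # skip ids that never occurred
        push @rows, [ map { $row->[$_] // 'n/a' } 0 .. $filecount ];
    }
    return @rows;
}

print "@$_\n"
    for merge_by_index( "1 A\n2 T\n3 G\n", "1 A\n3 T\n", "2 A\n3 T\n4 C\n" );
```

With this layout the rows come out in numeric id order without a sort, at the cost of one unused array slot for every id that never appears.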

