
PHP (and Perl) at Yahoo!

by dws (Chancellor)
on Oct 29, 2002 at 19:45 UTC ( #208849=perlnews )

Yahoo! has to deal with problems of scale that many sites don't. They've grown up with a combination of proprietary and Open Source software, and are shifting more towards Open Source. For a bit of history on their technical underpinnings (and a pitch for PHP), see Michael Radwin's presentation at PHPcon 2002, available on-line at

Of note here is that they have 3 million lines of Perl in their codebase.

Re: PHP (and Perl) at Yahoo!
by Ovid (Cardinal) on Oct 29, 2002 at 20:57 UTC

    Very interesting presentation. From what I could see, Perl might have been a serious contender were it not for two areas that they felt were problematic: sandboxing and TIMTOWTDI.

    The sandboxing issue stems from the fact that Perl is designed to Get Things Done and trusts the programmer to do the right thing in terms of security. Frankly, while I know quite a bit about Web programming security, I don't know as much about "sandboxing" per se. Does the Safe module help with that? At my company, we are developing tools to abstract out many of the dangerous aspects of Perl coding. For example, some of our database work is driven through a database module that forces the developer to use placeholders and automatically rolls back transactions that the developer does not explicitly commit (thus, if the program dies, we don't have "partial" data entered). Unfortunately, these tools are not as mature as they could be.
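
The "roll back unless explicitly committed" half of that idea can be sketched with a small scope guard. This is only an illustration, not the module described above: `TxnGuard` and `FakeHandle` are hypothetical names, and the fake handle stands in for a real database handle so the sketch runs without a database. With DBI, the handle would be a `$dbh` opened with `AutoCommit => 0`, and the placeholder half would come from passing values to `execute` rather than interpolating them into the SQL.

```perl
use strict;
use warnings;

# Hypothetical guard: rolls back when it goes out of scope
# unless commit() was called first.
package TxnGuard;

sub new {
    my ($class, $dbh) = @_;
    return bless { dbh => $dbh, done => 0 }, $class;
}

sub commit {
    my $self = shift;
    $self->{dbh}->commit;
    $self->{done} = 1;
    return 1;
}

sub DESTROY {
    my $self = shift;
    return if $self->{done};
    local $@;    # do not clobber an exception that is already in flight
    eval { $self->{dbh}->rollback };
}

# Stand-in for a database handle (assumption: only commit/rollback are needed).
package FakeHandle;

sub new      { my $class = shift; return bless { log => [] }, $class }
sub commit   { my $self = shift; push @{ $self->{log} }, 'commit' }
sub rollback { my $self = shift; push @{ $self->{log} }, 'rollback' }

package main;

my $dbh = FakeHandle->new;

# This block dies before reaching commit, so the guard rolls back.
eval {
    my $txn = TxnGuard->new($dbh);
    die "something went wrong\n";
    $txn->commit;
};

# This block commits explicitly, so no rollback is issued.
{
    my $txn = TxnGuard->new($dbh);
    $txn->commit;
}

print join( ',', @{ $dbh->{log} } ), "\n";    # prints "rollback,commit"
```

The design choice here is that forgetting to commit is the safe failure mode: a developer has to opt in to making changes permanent.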

    Another tool that I have just started using to deal with these issues is a "Web form filter/untainter". Essentially, to read in form data, I set up a list of fields and regex filters for each field. Only data that is passed through a filter (and simultaneously untainted) gets into my code (I hope to integrate something similar into CGI::Safe). This is just to point out that many dangerous aspects of programming can be mitigated with proper tools and standards. However, I am curious to know how this compares with PHP.
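
A minimal sketch of such a filter, assuming a plain hash of raw form values rather than any particular CGI module. The field names, the patterns, and the `filter_form` function are made up for illustration. Each filter is an anchored regex with one capture group, and only the captured text is passed on, which is also what untaints a value when running under `-T`.

```perl
use strict;
use warnings;

# Hypothetical field spec: allowed field name => anchored regex with one capture.
my %filters = (
    username => qr/\A([A-Za-z0-9_]{1,20})\z/,
    age      => qr/\A(\d{1,3})\z/,
);

# Return only fields that are listed in the spec and match their filter.
# Copying from $1 (not from the raw value) is what untaints under -T.
sub filter_form {
    my ( $raw, $spec ) = @_;
    my %clean;
    for my $field ( keys %$spec ) {
        next unless defined $raw->{$field};
        if ( $raw->{$field} =~ $spec->{$field} ) {
            $clean{$field} = $1;
        }
    }
    return \%clean;
}

my $clean = filter_form(
    { username => 'ovid', age => '42; rm -rf /', extra => 'ignored' },
    \%filters,
);

print join( ', ', map { "$_=$clean->{$_}" } sort keys %$clean ), "\n";
# prints "username=ovid"
```

The `age` value fails its filter and `extra` is not in the spec, so neither reaches the calling code.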

    The TIMTOWTDI issue is problematic, though. With only three Perl programmers here, we've been bitten by this problem. Perl is just so ridiculously flexible that even a small programming department can produce code that varies widely in style, even if it's all high quality.


    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

      Wouldn't untainter functionality belong to Data::FormValidator rather than CGI or a derivative thereof? That's where I'd put it, anyway.

      Makeshifts last the longest.

        That sounds reasonable, but this is part of some slightly different functionality that is designed to work specifically with our current Template Toolkit setup here at work. Data::FormValidator appears nice, but I felt that a small, custom-designed tool that is a drop-in replacement for our code was the way to go. My first chance to use this took a 450 line program and reduced it down to 150 lines. No changes to our templates were required. Further, Data::FormValidator makes no reference to untainting the data.

        Don't get me wrong, Data::FormValidator looks great, but it didn't appear to be quite what I need. Subclassing it might have been an option, but it makes internal function calls rather than method calls, and that can make subclassing a pain because I would have to reimplement the functions rather than inherit them. The code below demonstrates the problem.

        #!/usr/bin/perl -w
        use strict;
        use Data::Dumper;

        package Foo;
        sub new { my $class = shift; bless {}, $class; }
        # Plain function call: _test receives (3), so shift returns 3.
        sub foobared { my $self = shift; $self->{foo} = _test( 3 ); }
        sub _test { shift }

        package Bar;
        @Bar::ISA = 'Foo';
        # Method call: _test receives ($self, 3), so shift returns $self.
        sub foobared { my $self = shift; $self->{foo} = $self->_test( 3 ); }

        package Main;
        my $o = Foo->new;
        $o->foobared;
        print $o->{foo},$/;     # prints 3
        my $o2 = Bar->new;
        $o2->foobared;
        print $o2->{foo};       # prints the object reference, not 3

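        As you can see, mixing regular functions with methods doesn't work. A subclass can't simply call the inherited helper as a method, because a method call passes the object as an extra first argument that the helper doesn't expect.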
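
For contrast, here is a sketch of how the same toy classes behave when the base class calls its helper as a method. This assumes you control the base class, which is not the case with a CPAN module you only want to subclass. The helper takes `$self` explicitly, so a subclass can inherit it or override it.

```perl
use strict;
use warnings;

package Foo;
sub new { my $class = shift; return bless {}, $class }

# The helper is invoked as a method, so method resolution applies.
sub foobared {
    my $self = shift;
    $self->{foo} = $self->_test(3);
    return $self->{foo};
}

# The helper expects the object as its first argument.
sub _test { my ( $self, $n ) = @_; return $n }

package Bar;
our @ISA = ('Foo');

# Overrides only the helper; foobared is inherited unchanged.
sub _test { my ( $self, $n ) = @_; return $n * 2 }

package main;

my $foo = Foo->new;
my $bar = Bar->new;
print $foo->foobared, "\n";    # prints 3
print $bar->foobared, "\n";    # prints 6
```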


Re: PHP (and Perl) at Yahoo!
by grantm (Parson) on Oct 29, 2002 at 20:55 UTC

    Interesting article - it really looked like they were determined to select PHP from the outset. mod_perl clearly won all their benchmarks (except for higher memory usage) but they chose PHP. They were worried about unreadable code (a potential problem they foresaw with Perl) but they chose PHP - and they found code in HTML makes for unreadable and hard to change code.

      I program PHP for a living, and I find it a bit strange that everyone points out that PHP mixes HTML with code and that this is a bad thing. Personally, I never mix code and HTML. There are template engines to do the code separation (Smarty comes to mind, and a Lithuanian phemplate engine). So mixing PHP and HTML rarely happens, especially when you do big websites in OO style (PHP OO support is still very limited, though).

        I agree one hundred percent that abstracting your code out into OO modules with templates to render the HTML is the most sensible way to manage code for big web projects - but if you're going to do that, why use PHP? That's a serious question - I don't understand why someone would want to turn their back on the wealth of pre-written code in CPAN in order to use a language which was 'designed for the web' only to use that language in the exact same way they would have used Perl. In Yahoo's case, the decision is even more impenetrable since they already have a large investment in Perl code.

Re: PHP (and Perl) at Yahoo!
by Mr. Muskrat (Canon) on Oct 29, 2002 at 20:58 UTC

    Why not Perl?
    • Pros
    – FreeBSD support and performance is great
    – huge CPAN library
    – we already use it for offline processing
    • Cons
    – There’s More Than One Way To Do It
    – poor sandboxing, easy to screw up server
    – wasn’t designed as web scripting language

    Say what?

    There's More Than One Way To Do It
    This is a bad thing?

    poor sandboxing, easy to screw up server
    And it's not with PHP?

    wasn’t designed as web scripting language
    True, but Perl has a long history of being used in web applications.

      poor sandboxing, easy to screw up server
      And it's not with PHP?

      Anyone who thinks that you can't screw up a server with PHP obviously hasn't tried many PHP-based message boards...

Re: PHP (and Perl) at Yahoo!
by vek (Prior) on Oct 29, 2002 at 22:01 UTC
    Really interesting article. Cheers for the link, dws. Interesting to see their software/hardware progression from the 1994-1995 years through to 2002.

    -- vek --
Re: PHP (and Perl) at Yahoo!
by ignatz (Vicar) on Oct 30, 2002 at 15:46 UTC
    For me the most interesting line is
    "We customize Open Source software we use
    – often improvements are not sent back
    – many are gross Y!-specific hacks"
    Will Yahoo! start acting like a good open-source neighbor? If they start giving back to the community that could be a real boost to PHP. If not, it becomes an interesting bit of internet trivia, much like the stories I heard about them doing custom work to get Oracle to play well with FreeBSD. I have no idea if it was true, but there was a time when it would have been real handy to have learned from their experiences. The FreeBSD community would have really benefited from a heavy hitter like Yahoo being more than just users of other people's work.
      The FreeBSD community would have really benefited from a heavy hitter like Yahoo being more than just users of other people's work.

      I'm afraid I'm drifting way off-topic here, but I'd like to correct this comment. Yahoo have done lots of good work for the FreeBSD community. Yahoo employ several FreeBSD committers and other people who have contributed to the project. Yahoo host and manage many of the servers that run FreeBSD's CVS repository, Web site, mail, package building, etc. Their staff seem happy to publicly state that their Web servers run on FreeBSD, as in this presentation. I consider them "more than just users of other people's work".

        --/me Stupid foot. I stand corrected.

        My only experience is specifically to do with FreeBSD running Oracle client libraries. At the time it was very frustrating knowing that Yahoo! was able to get it to work but we couldn't (cleanly). Given that it is a commercial product running on an open source OS, that could add to the complexity.

        Node provided for additional downvotes.

