Re: [PHP-DEV] Re: [PHP-CVS] cvs: php4 / TODO From: Ron Chmara (ron <email protected>)
Date: 10/13/00

John Donagher wrote:
> I work for Intacct. We've built an accounting application similar
> to (but better than) Quickbooks. One of the big issues we've been
> running into is performance due to the serial execution path of the
> application.

Breaking non-related code sections into sub-threads. This would be
nice for slow background processes....

> Every time a script is requested, our database abstraction layer
> (written in PHP as well) has to be initialized, the schema layout
> has to be initialized (we have a data structure which represents
> our schema layout for use in automatic query generation), and our
> 60K security mappings file has to be executed.

Oh my. This sounds like an *architecture* problem, OO gone to
the dark side, like trying to write in Java (OOP) when you need
the speed of C. I'm not sure an application can really help you out
too much here, except for maybe moving your file caching from
the disk to some internal buffers.... let me make sure I have
this right:

Rather than using the *needed* code for each page, you are loading
an entire DB abstraction layer, a schema layout, and 60K of
security mappings? That's an insane amount of code to update
a record. Heck, a full load of one lib could cost you, what,
40k per page when you only needed *two lines*?

Any language can be brought to its knees by excessive coding.
Heck, Microsoft makes OSes that are 100 times their needed size,
with insane load times, because of their OO mindset, so rather
than making fast code, they make *easy* code. You could do the
Microsoft thing, and just throw absurd amounts of hardware at
everything.... sometimes it's cheaper than re-writing.

> So what I'm trying to get across is
> that our code isn't unnecessarily bloated, but in reality things that
> need to be initialized only once are instead initialized every time any
> script is requested.

This is up to you, mostly. There are a few different kinds of parameters
that _can_ be migrated to better places (for example, default database
connection strings), but even for a complex accounting application to
manage 400 sites with over 200 inputs per site, we used, oh, 4. The
rest of the code was only loaded as needed, rather than preloaded.
include() is your friend. :-)
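
A minimal sketch of "loaded as needed" inside a single request, assuming a
made-up security map and made-up helper names (build_security_map,
security_map and build_count are illustrative, not anything from the
application being discussed). The expensive setup runs the first time a page
asks for it, and never on pages that don't ask:

```php
<?php
// Hypothetical counter, present only to show how often the setup runs.
function build_count($increment = false) {
    static $n = 0;
    if ($increment) {
        $n++;
    }
    return $n;
}

// Hypothetical stand-in for an expensive setup step
// (schema layout, security mappings, and so on).
function build_security_map() {
    build_count(true);
    return array('invoice.edit' => 'accountant', 'user.add' => 'admin');
}

// Build the map on first use, then reuse it for the rest of the request.
function security_map() {
    static $map = null;
    if ($map === null) {
        $map = build_security_map();
    }
    return $map;
}

$before = build_count();      // nothing has been built yet
$map    = security_map();     // first call builds the map
$role   = $map['invoice.edit'];
$again  = security_map();     // second call reuses it
$after  = build_count();
```

This only saves work within one request; it does nothing to carry state
across requests, which is a separate problem.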

Some thoughts:
1. Better Disk buffers? Stick it in RAM, and it all goes much faster.
2. A shared memory space between what, 16/87/141 applications? (141 httpd
processes?) That might be nice.

> Unfortunately, we've reached the point where we need to look at moving
> data access, sessions, and some of the other heavyweight code into some sort
> of stateful objects, that don't have to be initialized on every script request.
> Since this is new to me, I'm unsure of what the options are. The obvious choice
> is some sort of application server using Java beans for data access and sessions,

HAHAHAHAHAHahhaahahah. And you thought *PHP* was slow.... the problem is that the
code engine is just plain doing far more than required. If you rewrite the same
way in Java, you'll have the same problem. If you think of PHP in a JavaBeans
mindset, where you write tiny, granular blocks, and _only load as needed_, it
will perform much differently than what you're doing now, which is equivalent
to writing in beans and then loading all the blocks, all at once, for each page
hit. You're loading the whole beanery for one cup of coffee. ;-)

Load library pieces as needed, instead, and pass flags to indicate what should
be loaded. Like this:
if ($db_add) { include('dbadd.inc'); }
if ($db_up)  { include('dbup.inc'); }
if ($db_sel) { include('dbsel.inc'); }

Your entire db management load has been reduced to three lines, so you only
load what you need, when you need it. No 40K bloat for every page, for a single
insert. No 200K for some db abstraction layer that sits unused, etc. Worst
case is a page where you add, update, and select all at once, and even then,
that'll be much rarer than folks just doing two functions and then looking
at the results.
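
The same idea as a runnable sketch, under some assumptions: the file names
match the ones above, but their contents (db_add, db_sel) and the load_piece
helper are made up for illustration. The sketch writes tiny stand-in files to
a temp directory so it can run on its own:

```php
<?php
// Stand-in include files, written to a temp directory for the demo only.
$dir = sys_get_temp_dir() . '/lazy_inc_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents("$dir/dbadd.inc", '<?php function db_add() { return "add"; }');
file_put_contents("$dir/dbsel.inc", '<?php function db_sel() { return "select"; }');

// Hypothetical helper: load one piece on demand, at most once per request.
function load_piece($dir, $name) {
    include_once "$dir/$name.inc";
}

// This page only selects, so only dbsel.inc is ever read and parsed.
load_piece($dir, 'dbsel');
$result     = db_sel();
$add_loaded = function_exists('db_add');   // dbadd.inc was never included
```

include_once rather than include keeps a second request for the same piece
from redefining its functions.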

> but I'm curious to see if any other large-scale PHP projects have hit this wall,
> and what they did to get past it.

What we've done to fix speed:
1. Rewrite, based on your frequently accessed site portions.
 a. Rip out/apart the db-abstraction and wasted overhead libraries. Speed is
more important.
 b. rip out excessive error checking. New programmers tend to check a database
search 3-4 times (the connection, the search, the results) and perform similar
pedantry.
 c. rip out function calls if they're only used once per page (why waste the CPU
and RAM?) and use static code, or make a smaller, more specialized function.
 d. eliminate unused variables, excessive loop structures, condense functions
 e. eliminate comments, whitespace lines, etc.
 f. use newer, compiled, commands to replace slower, legacy, code
 g. trim all includes to use _only_ the code needed. (For example, on an 8-page,
46-subcomponent site, high speed dictated that the *full* authorization
include and associated routines were only used once. The rest of the time, only a
third of it was used. So we split the include to load the auth lookup pieces
as needed.)

2. Don't store infrequently changed data in a database. An RDBMS is the wrong
place to store data like security settings, which almost never change.
(Same with loading it as a disk file). You need LDAP, or some other DB engine
optimized for fast data retrieval. Most standard DB engines are balanced
for inserts and retrievals, where LDAP is designed for fastest retrieval.
We're talking up to 10,000 retrievals per minute on a single PII 600...
and you can distribute openldap, for *free*, so you can set up a small cluster.
The drawback? Maybe 600 inserts per minute.

3. Rewrite the db engine to hold more logic. If you have to do more than
one lookup per data set, look at it _REAL HARD_. RDBMS's have stored
procedures for a reason.

4. Worst case, upgrade the hardware and introduce more load balancing.

Note that I didn't really address PHP vs. ASP vs. Javabeans vs. whatever
in the list.... regardless of what approach you take, as a site/project
grows, you have to balance out the fast procedures vs. easy OO, and you
have to re-evaluate your major components, not just one (the middleware).

Even if you go to beans, you'll still have to rewrite to small, dynamically
loaded, pieces, with some of them cached, either in engine or in disk cache....
so there's a rewrite in the future anyway.

I know I may come off as a bit hypercritical of OO, but OO was never
designed with _running_ fast in mind. It's designed to be written
fast, and changed fast. If you want fast running code, you have to
unlearn a bit of the OO philosophy and balance it with some procedural
speed boosting work, and recreate some code management structures (human
structures) to deal with things like multiple-coder inheritance.
Either that, or you have to start coding php in a beans-style, with
lots of loading statements for each page, and rely on disk cache to
load from RAM.

-ronabop

--
Brought to you from iBop the iMac, a MacOS, Win95, Win98, LinuxPPC machine,
which is currently in MacOS land.  Your bopping may vary.
