Wednesday, January 03, 2007

The Java Problem

This is an internal Sun Microsystems memo about the problems the company was facing with Java. It is insightful, but what is really interesting is that the problems listed in the memo are in JDK 1.3 and 1.4, and they were not identified as major problems in the original JDK version.


The Java Problem
Author: Julian S. Taylor
Reviewed by: Steve Talley, Mark Carlson, Henry Knapp, Willy (Waikwan) Hui,
Eugene Krivopaltsev, Peter Madany, Michael Boucher

Executive Summary

While the Java language provides many advantages over C and C++, its
implementation on Solaris presents barriers to the delivery of reliable
applications. These barriers prevent general acceptance of Java for
production software within Sun. A review of the problem indicates that these
issues are not inherent to Java but instead represent implementation
oversights and inconsistencies common to projects which do not communicate
effectively with partners and users.

Within Sun, the institutional mechanism for promoting this sort of
communication between partners is the System Architecture Council codified
in the Software Development Framework (SDF). We propose that the process of
releasing our Java implementation will benefit from conformance with the
SDF.

Introduction

This document details the difficulties that keep our Solaris Java
implementation from being practical for the development of common software
applications. It represents a consensus of several senior engineers within
Sun Microsystems. We believe that our Java implementation is inappropriate
for a large number of categories of software application. We do not believe
these flaws are inherent in the Java platform but that they relate to
difficulties in our Solaris implementation.

We all agree that the Java language offers many advantages over the
alternatives. We would generally prefer to deploy our applications in Java
but the implementation provided for Solaris is inadequate to the task of
producing supportable and reliable products.

Our experience in filing bugs against Java has been to see them rapidly
closed as "will not fix". 22% of accepted non-duplicate bugs against base
Java are closed in this way as opposed to 7% for C++. Key examples include:

4246106 Large virtual memory consumption of JVM
4374713 Anonymous inner classes have incompatible serialization
4380663 Multiple bottlenecks in the JVM
4407856 RMI secure transport provider doesn't timeout SSL sessions
4460368 For jdk1.4, JTable.setCellSelectionEnabled() does not work
4460382 For Jdk1.4, the table editors for JTable do not work.
4433962 JDK1.3 HotSpot JVM crashes Sun Management Center Console
4463644 Calculation of JTable's height is different for jdk1.2 and jdk1.4
4475676 [under jdk1.3.1, new JFrame launch causes jumping]

In personal conversations with Java engineers and managers, it appears that
Solaris is not a priority and the resource issues are not viewed as serious.
Attempts to discuss this have not been productive and the message we hear
routinely from Java engineering is that new features are key and
improvements to the foundation are secondary. This is mentioned only to make
it clear that other avenues for change have been explored but without
success. Here we seek to briefly present the problem and recommend a
solution.

Defining the Java Problem

These are the problems we have observed which we believe indicate the need
for an improved implementation and a modified approach.

1. The support model seems flawed
Since Java is not a self-contained binary, every Java program depends
fundamentally upon the installed Java Runtime Environment (JRE). If that JRE
is broken, correction and relief are required. This sort of relief needs to
happen in a timely manner and needs to fix only the problem without the
likelihood of introducing additional bugs. Java Software does not provide
such relief.

Java packages are released (re-released) every four or five months,
introducing bug fixes and new features and new bugs with each release. These
releases are upgrading packages which remove all trace of the prior
installed packages and cannot be down-graded in the event of an error. The
standard release taxonomy used by the Architecture Review Committees (ARCs)
was developed for use by Solaris and our other mission-critical software
products to help solve these and many other problems.

It is impractical for a project based on Java to correct bugs in the Java
implementation. Java Software corrects bugs only by releasing an entire new
version. For that reason, projects seek to deliver their own copy of Java so
they can maintain it without fear of a future upgrade. Outside vendors, such
as TogetherJ, specify a particular release of Java for their product. The
customer must locate that release and install it. If a future product seeks
to use a different version, that version has to be installed side-by-side
with the prior version or TogetherJ may no longer function.

The ARCs commonly see project submittals requesting permission to ship their
own version of Java. The ARCs have been routinely forbidding projects to do
this even though they are aware of specific cases wherein interfaces or
their underlying behaviors have changed incompatibly across minor releases.
The threat of losing the ability to directly support such a substantial part
of their product has inhibited projects from choosing Java as their
implementation language and caused widely-discussed problems for customers
of projects that have used Java. Consider that the Java language supports
rapid development, simple testing and access to a wide variety of platforms.
Why are the shelves at CompUSA (a Linux-friendly store) not crammed with
W32/Linux/etc offerings written in Java? As it stands, client-side Java
remains primarily a web language partly because the Netscape platform runs
Java 1.1.5 and has not changed for years. It is buggy but very stable.

This indicates that Java must strictly enforce backward compatibility across
minor releases and must adhere to Sun release taxonomy for the
identification of releases. Further, existing releases must support some
sort of remedy akin to a patch so that existing installations can be
corrected through existing methods.

2. The JRE is very large.
The JRE is significantly larger than comparable runtime environments when
considering resident set size (memory dedicated to this specific program).
It has been seen to grow to as much as 900M. This has a drastic effect on
both performance and resource usage. It also means that multiple JREs
present critical resource constraints on the servers for such thin-client
systems as SunRays. Typical resident set requirements for Java2 programs
include:

Program             Resident set
Hello World         9M
SMC Server          38M
SLVM GUI            60M
Component Manager   160M
TogetherJ           300 - 900M

The largest program in that list is TogetherJ. From the standpoint of
resource requirements, TogetherJ does much of what Rational Rose does but
Rational Rose appears to function in less than 250M. Startup time is
affected as well. For example, on an Ultra10 TogetherJ requires 5 minutes to
load and start. SMC, Sun's flagship system admin console, takes between one
and two minutes to reach the point that it can be used.

Some of this problem appears to relate to the JRE. We do not have the time
or money to conduct a serious side-by-side study of Java vs other languages
and are therefore calling upon our personal experiences with Java
development. The fact that these experiences are hard to quantify forces us
to try to support the validity of this concern through existing research.

A study performed by an outside team appears to indicate a rough parity in
performance between Java and a common implementation of another OO language
called Python (see IEEE Computer, October 2000, "An Empirical Comparison of
Seven Programming Languages" by Lutz Prechelt of the University of
Karlsruhe). Both platforms are Object Oriented, support web applications,
serialization, internet connections and native interfaces. The key
difference is that Python is a scripting language. This means there is no
compilation to byte code so the Python runtime environment has to do two
things in addition to what the Java runtime environment does. It has to
perform syntax checks and it must parse the ASCII text provided by the
programmer. Both of those tasks are performed at compile time by Java and so
that capability does not have to be in the JRE.

Given this data, it appears that the JRE can actually be simpler than the
Python RE since Java does at least some of this work at compile time. The
example above of "Hello World" is a good method for getting an idea of the
minimum support code required at runtime. This support code includes garbage
collector, byte code interpreter, exception processor and the like. Hello
World written in Java2 requires 9M for this most basic support
infrastructure. By comparison, this is slightly larger than automountd on
Solaris8. The Python runtime required to execute Hello World is roughly
1.6M.

Further examples of what is possible include the compiling OO languages
Eiffel and Sather which fit their garbage collector, exception processor and
other infrastructure into roughly 400K of resident set. While the Java VM
(as demonstrated above) grows rapidly as more complex code is executed, the
Python VM grows quite slowly. Indeed, an inventory control program written
entirely in Python having a SQL database, a curses UI, and network
connectivity requires only 1.7M of resident set. This seems to indicate that
the resident set requirements of the JRE could be reduced by at least 80%.
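
As a rough check on that figure, the sketch below computes the reduction implied by the two Hello World measurements quoted above (9M for Java2, roughly 1.6M for Python). The class and method names are illustrative only, and the inputs are simply the numbers from this memo, not new measurements. On those inputs the result is a little over 82%.

```java
import java.util.Locale;

// Illustrative only: names are hypothetical, inputs are the figures quoted in the memo.
class ResidentSetReduction {

    // Percentage by which currentMb would shrink if it were reduced to targetMb.
    static double reductionPercent(double currentMb, double targetMb) {
        return (currentMb - targetMb) / currentMb * 100.0;
    }

    public static void main(String[] args) {
        double javaHelloWorldMb = 9.0;    // Java2 Hello World resident set (from the memo)
        double pythonHelloWorldMb = 1.6;  // Python Hello World resident set (from the memo)
        double percent = reductionPercent(javaHelloWorldMb, pythonHelloWorldMb);
        System.out.println(String.format(Locale.ROOT, "Implied reduction: %.1f%%", percent));
    }
}
```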

Imagine what happens if our current implementation of Java were ubiquitous
and all 150 users on a SunRay server were running one and only one Java
program equivalent to Component Manager above. The twenty-four gigabytes of
RAM the server would have to supply exclusively to these users is well
beyond the typical configuration. RAM is cheap but performance is what we
sell. All customers on that SunRay server would see significant performance
degradation even with the maximum amount of RAM installed as all other
processes were forced to reside on swap.

The resident set size required by the JRE makes it impractical to run Java
in an initial Solaris install environment. It is impractical to run it as a
non-terminating daemon. A Java daemon could be started from inetd, run long
enough to do its job and then quit, but the RPC protocol required to pass the
socket port to the daemon is very complex and not Java-friendly. Java
applications cannot be executed at boot time since the loading of the VM
introduces an unacceptable performance degradation. If the Java runtime were
as small as that of Python, it is likely that the Java daemon would become
popular and could provide basic services to applications written in any
number of languages.
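
The SunRay estimate earlier in this section is simple multiplication: 150 users, each running one program with the 160M resident set listed for Component Manager. A minimal sketch of that arithmetic follows. The class and method names are hypothetical, and the conversion assumes 1000M per gigabyte, which is the rounding that yields the memo's twenty-four gigabytes.

```java
// Illustrative only: reproduces the memo's back-of-the-envelope estimate.
class SunRayMemoryEstimate {

    // Total resident set, in megabytes, if every user runs one program of the given size.
    static long totalMegabytes(int users, long residentSetMbPerUser) {
        return users * residentSetMbPerUser;
    }

    public static void main(String[] args) {
        int users = 150;              // users on the SunRay server (from the memo)
        long componentManagerMb = 160; // resident set of Component Manager (from the memo)
        long totalMb = totalMegabytes(users, componentManagerMb);
        // Assumes 1000M per gigabyte, matching the memo's rounding.
        System.out.println("Total: " + totalMb + "M (about " + (totalMb / 1000) + "G)");
    }
}
```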

3. Extensions do not support modularity.

As new extensions are introduced, they are released separately under their
own names and distributed generally. Each one may go through several
revisions as separate modules. At some point, they are then folded into base
Java, tying base Java's version to the versions of dozens of smaller yet
distinct functionalities. These functionalities are then restricted to a
draconian backward-compatibility rule since once folded in, they are no
longer selectable modules. Examples include modules that used to be called
Swing, RTI, IDL, JSSE and JAAS. These are all good things that should be
part of Java. Our concern is that these are not separable modules which can
evolve as requirements change.

The Java system for evolving the interface (deprecation) does not serve
production software very well. Once the interface disappears, the product
just breaks. If the Java base were simpler and the more advanced features
(those most likely to be deprecated) were delivered as versioned modules, it
would be possible for a commercial product to retain its older modules on
the system and survive a large number of Java upgrades.

Production quality programs written in Java, like TogetherJ, indicate a
specific Java version which must be installed before the program is run. If
another program is installed, requiring a higher Java version, the user may
be forced to decide which program stays and which goes away. Alternatively,
the other Java version could be installed to a different base directory but
this requires considerable sophistication on the part of the user,
complicates administration and violates the ARC big rule that common
software must be shared.
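
One way a program can make such a version requirement explicit is to compare the standard java.specification.version system property against the release it was qualified on, and refuse to start otherwise. The sketch below is only an illustration of that idea. The required value "1.4" and the class name are hypothetical, and nothing here is taken from how TogetherJ or any other product named in this memo actually performs the check.

```java
// Illustrative only: a startup guard that pins a program to one Java release.
class RequiredJavaVersion {

    // Hypothetical: the release this program is assumed to have been qualified against.
    static final String REQUIRED_SPEC = "1.4";

    // True when the running specification version is exactly the required one.
    static boolean matches(String runningSpec, String requiredSpec) {
        return requiredSpec.equals(runningSpec);
    }

    public static void main(String[] args) {
        // "java.specification.version" is a standard system property.
        String running = System.getProperty("java.specification.version");
        if (!matches(running, REQUIRED_SPEC)) {
            System.err.println("Qualified against Java " + REQUIRED_SPEC
                    + " but running on Java " + running);
            System.exit(1);
        }
        System.out.println("Running on the qualified Java release " + running);
    }
}
```

An exact-match check like this is what forces side-by-side installations: any program that pins a different release cannot share the same JRE.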

4. It is not backward-compatible across minor releases.

Among the various incompatibilities across minor releases are:
a) In JDK 1.1 Class.fields() returns only public variables. In 1.2,
protected and private variables are returned.
b) Swing table sizing calculation changed from Java 1.3 to 1.4.
c) Swing JFrame launch behavior changed significantly from Java 1.2.2 to
Java 1.3.1.
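
To make item (a) concrete: the memo writes Class.fields(), but the reflection accessors that java.lang.Class exposes are getFields() and getDeclaredFields(). The sketch below shows how those two calls differ in visibility on a current JDK. It does not reproduce the 1.1 versus 1.2 behavior described above, and the Sample class and helper names are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: contrasts the two reflection calls on a made-up class.
class FieldVisibilityDemo {

    // Hypothetical class with one field at each access level.
    static class Sample {
        public int visible;
        protected int guarded;
        private int hidden;
    }

    // Collects field names in sorted order, skipping compiler-generated fields.
    static Set<String> names(Field[] fields) {
        Set<String> out = new TreeSet<>();
        for (Field f : fields) {
            if (!f.isSynthetic()) {
                out.add(f.getName());
            }
        }
        return out;
    }

    // getFields() returns only public fields (including inherited ones).
    static Set<String> publicFieldNames() {
        return names(Sample.class.getFields());
    }

    // getDeclaredFields() returns every field declared by the class, whatever its access level.
    static Set<String> declaredFieldNames() {
        return names(Sample.class.getDeclaredFields());
    }

    public static void main(String[] args) {
        System.out.println("getFields():         " + publicFieldNames());
        System.out.println("getDeclaredFields(): " + declaredFieldNames());
    }
}
```

A program that relied on one call returning only public members would see different results if the set of returned fields widened between releases, which is the kind of break item (a) describes.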

Each of these examples is simple, but they demonstrate the general problem
that people cannot program for a particular release of Java and expect that
their programs will continue to run. This is a serious problem now, but has
the potential to become a show-stopper as technology such as auto-update
advances.

What is perhaps more important is that the perception of Java as an unstable
platform is widespread. This perception is restated with every Java-based
project to come to ARC. Within Sun, Java is not viewed as a satisfactory
language for the construction of commercial applications. This perception
and the record require addressing.

The Java Problem is Recognized Internally

That our Java implementation is perceived as inappropriate for many uses is
supported by internal documents and policies. For example:

1. SOESC AI - 092501.2 Java Dependencies for Deployment

In this document provided to SOESC, John Perry describes the concerns
regarding the Solaris "JVM dependencies for deployment". Following is an
excerpt:
-------
- Large footprint of applications when run on Solaris. A simple application
("hello world" type) has a total footprint of 35-40 megs on Solaris 9 (build
48, using Java 1.4 build 82) on both Intel and Sparc machines. Sparc
machines, by far, have a much higher resident footprint than Intel machines
(~30 megs, compared to ~11 megs). The same program run on a Windows machine
has a footprint of ~5 megs, resident footprint being ~3.5 megs

- Slow start up times prevent Java applications from being started while
Solaris is booting up and during mini-root time. This requires applications
which are written in Java to have some kind of mechanism to start-up after
the OS has been fully started.

- Instability of Native code (JNI) which can cause the entire VM to crash.
-------

2. Teams Are Looking for Options

The CIMOM (supporting WBEM) is a Java daemon. It initially occupies around
40M of RSS but grows from there. In order to address this problem, at least
one Sun Engineer, Peter Madany, has been doing research to determine Java
daemon memory utilization when running on a currently unsupported J2ME VM on
Solaris. In other words, we are looking into demonstrating that resource
exhaustion on Solaris Servers could be avoided by using some of the
techniques used in an edition of Java intended for very small systems.

3. New Projects Explain Why They Are Not Using Java

Quoting from the recently submitted Nile case (SARC/2001/617) now under
review:
-------
These libraries should be commercial implementations and must be in native
platform code (ie not Java or Perl). Native code is a requirement because
one of the core requirements for the proxy is for minimum impact on the
target host. Java has too large a footprint (both memory and disk image) and
may not be installed on the customer's host.
-------

4. ARCs Include the Java Problem in Rejection Reasoning

Quoting from the recently rejected SunMC PMA case (LSARC/2000/457):
-------
The CLI interpreter is implemented in Java, and the overhead of starting a
JVM for each command execution is prohibitive. At least one of the votes to
reject was related to this inappropriate use of Java. The Solaris
implementation of Java is slow and very large. While this project did not
provide a measurement of resident set for their CLI, the minimum RSS for the
JVM is known to be 9MB and the typical RSS for a similar Java program is 30
to 40MB, and takes up to 15 seconds to start. The project team admitted in
the review that this CLI may be used on a daily basis. For such a CLI, the
delays and resource requirements of the Solaris Java implementation are
unacceptable.
-------

5. Customers and Field Engineers Are Noticing the Problem

Following is an excerpt from Kevin Tay's e-mail to three Java aliases
regarding a customer installation of a third-party product written in Java
called Vitria. We see typical very large RSS numbers compared to a WinNT
implementation combined with increased resource usage from Solaris7 to
Solaris8:
-------
Customer said they have something like 450+ container servers and 80+
automator server for the Vitria system. So the estimation for the hardware
RAM is around 9GB for USII machine and 14-15GB for the USIII machine.
Questions:

1. Why are Sun systems using so much more memory?
2. Why is the UltraSPARC III/Solaris 8 system using a lot more memory than an
UltraSPARC II/Solaris 7 system (with every other thing being equal)?
3. How can I reduce the memory utilization of the UltraSPARC III system?
-------
NOTE: The response to this e-mail was to suggest moving to a different build
of Java 1.2.2 since the indicated build on Solaris 8 had a known bug; it
should be noted, however, that the 9GB memory footprint for Solaris7 is
still unusually large.

6. Close Call in Solaris9
Bug ID 4526853 describes a bug in Core Java which used to be an external
module called JSSE. Among other products, PatchPro and PatchManager depend
on the JSSE. As long as the module could be used, the JSSE interface could
be trusted to remain stable despite extensive changes in core Java. Now the
Java architecture makes it impossible to use the module. This bug in core
Java completely disables PatchPro and PatchManager. It was introduced in
build 83 of Java 1.4. It was detected and corrected before the final build
of Solaris9. If it had not been detected before the final build, it would
have shipped with Solaris9 FCS.

For those products that depend upon JSSE and operate on multiple OSs, there
would have been no recourse except to deliver with their product an entire
new Java distribution. This distribution would have to upgrade the existing
Java installation. The fact that various products depend upon specific
versions would mean that such an upgrade would carry the risk of breaking
other Java-based software on the target system.

Correcting the Java Problem

We strongly recommend that management require Java to conform to the
Software Development Framework, especially from the standpoint of ARC review.
We believe that the next release of the Sun Java implementation should be
brought to ARC while still in the prototype phase. Both PSARC and LSARC have
dealt with the Java issues peripherally, recognizing numerous problems but
unable to effect change in the underlying source of the difficulties -
namely Java. By bringing the Sun Java implementation through ARC, these
issues can be resolved.
