WebLogic 10.3 JNDI issue with versioned app during cluster restart


I want to report an issue with WebLogic Server (WLS hereafter) that has cost me quite a bit of peace and sleep over the last week or so. I did try to find help on the internet with all sorts of keyword combinations, but in vain. The only promising hit pointed to an Oracle bug report URL, and that link was broken.😦

Infrastructure:
I have a clustered WLS setup hosting three applications that connect to an Oracle database. HTTP requests are load balanced between the cluster nodes by an iPlanet web server.

  • physical machines – two Solaris machines [say m1, m2]
  • clusters – two clusters [c1, c2], each with two WLS instances. Each node of a cluster resides on a different machine, i.e. c1 is spread across m1 & m2, and so is c2.
  • applications – three Java EE EAR archives [app1, app2, app3], with app1 deployed on c1 and app2 & app3 deployed on c2. Each application is deployed on all nodes of its cluster. These are versioned applications, meaning they define the Weblogic-Application-Version property in the EAR's MANIFEST.MF file.
  • JDBC/JMS – all datasources, queues, connection factories and mail sessions are defined as global resources.

Issue:
Occasionally I do a rolling restart of the clusters to ensure availability: I restart one node while the other is running, and then do the same with the other node. However, while one node in the cluster is down, the JNDI tree on the other node seems to get messed up, and all EJB lookups by the application fail with javax.naming.NameNotFoundException. The JNDI tree returns to normal once both nodes in the cluster are up and running again. I also noticed that only the EJB resources [the stubs and homes] seem to be in an error state, while the JMS/JDBC resources are intact. See the exception trace below.

Exception StackTrace:

####<20-Jan-2011 16:47:23> <Debug> <JNDI> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043376> <BEA-000000> <+++ lookup(com.obfs.ejb.remote.apps.app2, weblogic.jndi.internal.ServerNamingNode) succeeded> 
####<20-Jan-2011 16:47:23> <Debug> <JNDI> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043377> <BEA-000000> <+++ lookup(com.obfs.ejb.remote.apps.app2.ApplicationManagementService, weblogic.jndi.internal.ServerNamingNode) succeeded> 
####<20-Jan-2011 16:47:23> <Debug> <JNDI> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043379> <BEA-000000> <+++ allowExternalAppLookup check failed: ActiveVersionInfo=ActiveVersionInfo[name=com.obfs.ejb.remote.apps.app2.ApplicationManagementService,appName=app1,version=null,adminVersion=null], CurrentApp=app2#app2_r21_d10_m104, CurrentWorkContext={weblogic.app.app2=app2_r21_d10_m104}, JNDIEnv={java.naming.factory.initial=weblogic.jndi.WLInitialContextFactory, java.naming.factory.url.pkgs=weblogic.jndi.factories} 
java.lang.Exception
at weblogic.jndi.internal.VersionHandler.getCurrentVersion(VersionHandler.java:116)
at weblogic.jndi.internal.ServerNamingNode.lookupHere(ServerNamingNode.java:185)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:206)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.WLEventContextImpl.lookup(WLEventContextImpl.java:254)
at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:380)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at com.obfuscated.common.management.CommonServiceLocator.getApplicationService(CommonServiceLocator.java:85)
at com.obfuscated.bootstrap.appcore.web.PPUniversalFilter.doFilter(PPUniversalFilter.java:128)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
> 
####<20-Jan-2011 16:47:23> <Warning> <JNDI> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043380> <BEA-050006> <An attempt was made to look up versioned object "com.obfs.ejb.remote.apps.app2.ApplicationManagementService" from an external client or another application. This can potentially cause in-flight work of the application version not being tracked properly and thus being retired prematurely.> 
####<20-Jan-2011 16:47:23> <Debug> <JNDI> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043380> <BEA-000000> <+++ getActiveVersion failed, info=ActiveVersionInfo[name=com.obfs.ejb.remote.apps.app2.ApplicationManagementService,appName=app1,version=null,adminVersion=null], isAdmin=false> 
####<20-Jan-2011 16:47:23> <Error> <app2_r21_d10_m104> <m1> <nodea> <[ACTIVE] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'> <amitagar> <> <> <1295542043382> <BEA-000000> <CommonServiceLocator> <failed to lookup application service
javax.naming.NameNotFoundException: Unable to resolve 'com.obfs.ejb.remote.apps.app2.ApplicationManagementService'. Possibly previously active version was already unbound.; remaining name ''
at weblogic.jndi.internal.BasicNamingNode.newNameNotFoundException(BasicNamingNode.java:1139)
at weblogic.jndi.internal.VersionHandler.getActiveVersionObject(VersionHandler.java:200)
at weblogic.jndi.internal.VersionHandler.getActiveVersionObjectAndInit(VersionHandler.java:184)
at weblogic.jndi.internal.VersionHandler.getCurrentVersion(VersionHandler.java:125)
at weblogic.jndi.internal.ServerNamingNode.lookupHere(ServerNamingNode.java:185)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:206)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.BasicNamingNode.lookup(BasicNamingNode.java:214)
at weblogic.jndi.internal.WLEventContextImpl.lookup(WLEventContextImpl.java:254)
at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:380)
at javax.naming.InitialContext.lookup(InitialContext.java:392)
at com.obfuscated.common.management.CommonServiceLocator.getApplicationService(CommonServiceLocator.java:85)
at com.obfuscated.bootstrap.appcore.web.PPUniversalFilter.doFilter(PPUniversalFilter.java:128)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:42)
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3496)
at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)
at weblogic.security.service.SecurityManager.runAs(Unknown Source)
at weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2180)
at weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2086)
at weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1406)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:173)
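To confirm on the surviving node that only the EJB bindings are missing while the JMS/JDBC names still resolve, one could probe a handful of JNDI names and record which ones resolve. This is a minimal sketch, not code from the application: `JndiProbe`, its `Lookup` interface and the datasource name `jdbc/AppDataSource` are hypothetical. The EJB name is the one from the trace above. Inside the server you would pass `ctx::lookup` from a `javax.naming.InitialContext`. The `main` method uses a stub instead, so it runs without WebLogic.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

public class JndiProbe {

    /** Minimal lookup abstraction: accepts InitialContext::lookup or a test stub. */
    @FunctionalInterface
    interface Lookup {
        Object lookup(String name) throws NamingException;
    }

    /** Outcome of probing one JNDI name. */
    enum Status { BOUND, NOT_FOUND, ERROR }

    /**
     * Looks up each name and records whether it resolved, was missing
     * (NameNotFoundException), or failed with some other NamingException.
     */
    static Map<String, Status> probe(Lookup ctx, List<String> names) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (String name : names) {
            try {
                ctx.lookup(name);
                result.put(name, Status.BOUND);
            } catch (NameNotFoundException e) {   // subclass, so it must be caught first
                result.put(name, Status.NOT_FOUND);
            } catch (NamingException e) {
                result.put(name, Status.ERROR);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stub standing in for the surviving node: JDBC names resolve, the versioned EJB does not.
        Lookup stub = name -> {
            if (name.startsWith("jdbc/")) return new Object();
            throw new NameNotFoundException("Unable to resolve '" + name + "'");
        };
        List<String> names = Arrays.asList(
                "com.obfs.ejb.remote.apps.app2.ApplicationManagementService",
                "jdbc/AppDataSource"); // hypothetical datasource name
        probe(stub, names).forEach((n, s) -> System.out.println(s + "  " + n));
    }
}
```

A NOT_FOUND result for the EJB name alongside BOUND for the JDBC/JMS names would match the symptom described in the Issue section.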

Investigations:

  1. The clusters use multicast addresses with almost all default settings. I did switch to the Oracle-recommended unicast messaging, but in vain.
  2. All EJBs are SLSBs written in EJB 2.1 syntax but deployed with 3.0 deployment descriptors. I tried forcing clustered deployment for all EJBs using the descriptor tags, even though that is the default. It didn't solve the problem.
  3. Interestingly, it works like a charm on WLS 8.1 servers with code compiled with JDK 1.4. I had just migrated my code to deploy on WLS 10.3, compiled with JDK 6 and with the EJBs deployed using the 3.0 deployment descriptor schemas.
  4. I have ensured that only one [and identical] version of each application is deployed on each node in the cluster, so there is no reason to suspect conflicting versions.
  5. If a user has an active session before a node is restarted, and that node was serving the user's requests, WLS is able to fail over the user's subsequent requests to the active node. However, if the user logs out and logs back in, it fails to find the EJBs.

S/W Versions:
WLS 10.3, JDK 6, iPlanet 6, Oracle 11g, Struts 1.x, Spring 2.x

References:

  1. Programming Requirements and conventions for versioned deployment
  2. Production Redeployment In WLS Clusters
  3. Problem reported by someone else earlier

SOLUTION:
Remove the versioned redeployment: do not version your application EAR archives. This surprises me, because it all works fine with WLS 8.1 and JDK 1.4. I am not sure why it doesn't work after just the WLS/JDK upgrades, with everything else remaining the same.
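Because the fix is to stop versioning the EARs, it can help to check which archives still carry the manifest attribute. Below is a minimal sketch using only `java.util.jar`. The class name `EarVersionCheck` is hypothetical. The demo value `app2_r21_d10_m104` is the version string visible in the log above. It only inspects the manifest. A version assigned at deployment time, for example through the `-appversion` option of `weblogic.Deployer`, would not be detected.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class EarVersionCheck {

    // Manifest attribute that marks an EAR as a versioned WebLogic application.
    static final String VERSION_ATTR = "Weblogic-Application-Version";

    /** Returns the declared application version, or null if the manifest is absent or unversioned. */
    static String versionOf(Manifest manifest) {
        if (manifest == null) return null;   // archive without a manifest
        return manifest.getMainAttributes().getValue(VERSION_ATTR);
    }

    /** Opens an EAR (a JAR-format archive) and reads the version from its manifest. */
    static String versionOf(File ear) throws IOException {
        try (JarFile jar = new JarFile(ear)) {
            return versionOf(jar.getManifest());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            for (String path : args) {
                String v = versionOf(new File(path));
                System.out.println(path + ": " + (v == null ? "unversioned" : "versioned, " + v));
            }
            return;
        }
        // No arguments: demonstrate against an in-memory manifest.
        String text = "Manifest-Version: 1.0\r\n" + VERSION_ATTR + ": app2_r21_d10_m104\r\n\r\n";
        Manifest m = new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("demo manifest version: " + versionOf(m));
    }
}
```

Run it with EAR paths as arguments, e.g. `java EarVersionCheck app1.ear app2.ear app3.ear`. Any archive reported as versioned still has `Weblogic-Application-Version` in its `MANIFEST.MF`.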

I have faced intermittent issues with this WebLogic feature [versioned application redeployment] for a while, and the WLS documentation also advises using it with caution. I would still like to know what caused the problem I faced, though I am a bit more relaxed now that I know how to avoid it, regardless of my preferences.🙂

Comments:

  1. Well done for solving this. Versioned deployment promised a lot a few years ago but unfortunately hasn't lived up to its promises. There are many problems, including MDBs that still receive messages even when deactivated, and so on. Problems like this are tricky to identify, and this shows how well you approached it. Sod versioned deployment, the loss is not that great!

  2. I was able to raise this issue with Oracle support and here is the answer from them.

    For this issue we have bug 8176653. It was found in 10.3.0 and fixed in the later WebLogic Server version 10.3.1. Kindly download the patch using Smart Update, test it and let us know the result.

    Download and install the following WLS patch from SmartUpdate to resolve this issue:

    Patch ID: 855M
    Passcode: ZIGIWA22

    I also found the following reference while searching the Oracle bug database:

    The cause of this problem has been identified in WebLogic Server (WLS) version 10.3.0 as Bug 8176653: versioned EJBs cannot be looked up after one server in the cluster is shut down.

    So the problem lies, as we suspected earlier, in how WLS 10.3.0 manages JNDI for versioned EARs in a clustered deployment. Either remove versioning from your application or upgrade WLS.

  3. We could also specify the property -Dweblogic.jndi.relaxVersionLookup=true (for WLS, or for any Foreign JNDI Providers being used).

    http://download.oracle.com/docs/cd/E17904_01/apirefs.1111/e13941/constant-values.html#weblogic.jndi.WLContext.RELAX_VERSION_LOOKUP

    Specifies the lookup behavior when bindings from the current application version cannot be found. If the property is set to true, the currently active version will be returned, if any. If the property is set to false, a NameNotFoundException will be returned. The default value of the property is false. Users should exercise discretion and ensure that components of different application versions are compatible when setting the property to true.

  4. Hi,
    I have a similar problem with WebLogic 10 MP1, and we are trying a WebLogic cluster for our next release. We have two managed nodes, MAN1 & MAN2, and we haven't clustered the EJBs & JMS (only JDBC is clustered). We have created one JMS server for each managed node, each with its own JMS module.

    MAN1-JMS-SERVER -> DEPLOYED TO MAN1
    MAN2-JMS-SERVER -> DEPLOYED TO MAN2

    MAN1-JMS-MODULE -> DEPLOYED TO MAN1
    MAN2-JMS-MODULE -> DEPLOYED TO MAN2

    MAN1-JMS-MODULE & MAN2-JMS-MODULE have Queue & Topic connection factories and queue/topic definitions. Basically, each managed node publishes the same queue/topic factories and queues/topics.

    In the multicast configuration, when I bring down one managed node, it also messes up the JNDI tree on the other managed node. The queues/topics get removed, which causes NameNotFoundException for them. This makes the managed node that is up & running useless and defeats the whole purpose of clustering the application.

    This problem is not present with unicast, though. It works well there.

    Software versions
    Weblogic 10 MP1, JDK 1.5.0.11, EJB 2.1

    I tried -Dweblogic.jndi.relaxVersionLookup=true in the WebLogic startup scripts, but it still doesn't work with the multicast configuration. Can anyone tell me why this problem exists and how it can be fixed?

  5. I downloaded the latest WebLogic 10.3.5 and I see this issue with both unicast & multicast. In both cases, when I shut down one managed node, the JNDI queue & topic references are also removed from the other managed node, even though it is up & running.

    In WebLogic 10 MP1, this issue is present only with multicast, and unicast works fine.

    In 10.3.5, the cluster-wide JNDI update happens with both unicast & multicast. If I look at the JNDI tree for managed node 1, the EJB's toString output references both MAN1 & MAN2. In 10 MP1, EJB references that list both MAN1 & MAN2 appear only with multicast. With unicast I see only the respective managed node's reference for the EJB, which means cluster-wide updates are not happening in unicast mode.

    Regards,
    Prakash

  6. Hi,

    I have a similar problem on WLS 10.3.6. I have a web application with JMS dependencies (using Spring and Camel).

    I have a JMSServer deployed on a migratable node (with an Exactly-Once policy), and the WAR deployed on the cluster.

    When the running node is shut down, the JMSServer migrates to another node and the web application keeps running.

    When the first node recovers (and restarts), the web app is deployed before the node tries to download the JNDI tree, so a NameNotFoundException occurs.

    Any suggestions?

    Thanks in advance.
    Martin
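Two suggestions from the comments above can be sketched in Java. The first builds a JNDI environment that sets the `weblogic.jndi.relaxVersionLookup` property from comment 3. The second retries a lookup that fails with NameNotFoundException while a restarted node may still be receiving the JNDI tree, as in comment 6. This is a hedged sketch, not a confirmed fix. The class and method names are hypothetical. The property is passed as the string "true", on the assumption that WebLogic reads it like other string-valued environment entries. Comment 4 also reports that the property did not help in one setup. A retry only covers a short, transient gap and would not help with the persistent failure described in the post while a node is down.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NameNotFoundException;
import javax.naming.NamingException;

public class VersionTolerantLookup {

    // Value of weblogic.jndi.WLContext.RELAX_VERSION_LOOKUP, written as a literal
    // so this sketch compiles without WebLogic classes on the classpath.
    static final String RELAX_VERSION_LOOKUP = "weblogic.jndi.relaxVersionLookup";

    /** Minimal lookup abstraction: accepts InitialContext::lookup or a test stub. */
    @FunctionalInterface
    interface Lookup {
        Object lookup(String name) throws NamingException;
    }

    /** Environment for new InitialContext(env); providerUrl (e.g. a t3:// URL) may be null. */
    static Hashtable<String, Object> environment(String providerUrl, boolean relaxVersionLookup) {
        Hashtable<String, Object> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        if (providerUrl != null) {           // Hashtable rejects null values
            env.put(Context.PROVIDER_URL, providerUrl);
        }
        if (relaxVersionLookup) {
            env.put(RELAX_VERSION_LOOKUP, "true");
        }
        return env;
    }

    /**
     * Retries a lookup that fails with NameNotFoundException, sleeping between attempts.
     * Any other NamingException is propagated immediately.
     */
    static Object lookupWithRetry(Lookup ctx, String name, int maxAttempts, long delayMillis)
            throws NamingException, InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        NameNotFoundException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return ctx.lookup(name);
            } catch (NameNotFoundException e) {
                last = e;
                if (attempt < maxAttempts) Thread.sleep(delayMillis);
            }
        }
        throw last;                          // all attempts failed with NameNotFoundException
    }

    public static void main(String[] args) throws Exception {
        // Stub that fails twice before the binding "appears", mimicking a JNDI tree still syncing.
        int[] calls = {0};
        Lookup stub = name -> {
            if (++calls[0] < 3) throw new NameNotFoundException("Unable to resolve '" + name + "'");
            return "home-for-" + name;
        };
        Object home = lookupWithRetry(stub,
                "com.obfs.ejb.remote.apps.app2.ApplicationManagementService", 5, 100);
        System.out.println(home + " after " + calls[0] + " attempts");
        System.out.println(environment("t3://m1:7001", true)); // placeholder host and port
    }
}
```

In real code the environment would be passed to `new InitialContext(env)` and `ctx::lookup` handed to `lookupWithRetry`. The `t3://m1:7001` URL in `main` is a placeholder.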
