hunterhacker / jdom

Java manipulation of XML made easy

AbstractReaderXSDFactory: SchemaFactory needlessly managed as a thread-global

chrludwig opened this issue.

I have a test class that runs multiple scenarios for a validating SAXBuilder, some with a Catalog-based resolver and some without. During development, when I ran the test cases individually, everything was green. But when I run all tests in one go, some tests fail reproducibly.

I eventually tracked this down to the fact that AbstractReaderXSDFactory caches the SchemaFactory returned by the SchemaFactoryProvider in a static ThreadLocal. When several of my test cases run one after another in the same thread, they all get the same SchemaFactory, even though I passed a different SchemaFactoryProvider to each constructor.
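
The effect can be reproduced without JDOM. The following is a simplified, hypothetical reduction of the caching pattern, not the JDOM source: a java.util.function.Supplier<SchemaFactory> stands in for SchemaFactoryProvider, and the class name ThreadCachedFactoryLookup is made up for illustration. The first factory stored in the static ThreadLocal is returned for every later call on that thread, whichever provider is passed in.

```java
import java.util.function.Supplier;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical reduction of the problematic pattern; not the actual JDOM code.
// Supplier<SchemaFactory> stands in for JDOM's SchemaFactoryProvider.
final class ThreadCachedFactoryLookup {
    // One slot per thread, shared by every caller that runs on that thread.
    private static final ThreadLocal<SchemaFactory> CACHE = new ThreadLocal<>();

    private ThreadCachedFactoryLookup() {}

    static SchemaFactory factoryFor(Supplier<SchemaFactory> provider) {
        SchemaFactory cached = CACHE.get();
        if (cached == null) {
            // Only the first provider seen on this thread is ever consulted.
            cached = provider.get();
            CACHE.set(cached);
        }
        return cached;
    }
}
```

Calling factoryFor(() -> a) and then factoryFor(() -> b) on the same thread returns a both times, which matches the symptom of tests passing alone and failing together.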

The rationale given in the source code is as follows:

	/**
	 * Use a Thread-Local system to manage SchemaFactory. SchemaFactory is not
	 * thread-safe, so we need some mechanism to isolate it, and thread-local is
	 * a logical way because it only creates an instance when needed in each
	 * thread, and they die when the thread dies. Does not need any
	 * synchronisation either.
	 */
	private static final ThreadLocal<SchemaFactory> schemafactl = new ThreadLocal<SchemaFactory>();

I don't think this thread-local is needed, though. It is accessed only in the private static method getSchemaFromSource(SchemaFactoryProvider sfp, Source ... sources), which is called (directly or indirectly) only from the constructors, and each constructor call runs entirely within a single thread. The caller can easily make sure that different threads use different SchemaFactory instances.
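
A minimal sketch of the direction argued for here, again with Supplier<SchemaFactory> as a hypothetical stand-in for SchemaFactoryProvider and a made-up class name: the provider is consulted on every call and the factory does not outlive the call, so there is no shared state and each caller decides whether its provider reuses instances. The compiled javax.xml.validation.Schema is documented as thread-safe, so it can be kept and shared after construction.

```java
import java.util.function.Supplier;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical provider-respecting compile step: no static or thread-local state.
final class ProviderRespectingCompiler {
    private ProviderRespectingCompiler() {}

    static Schema compile(Supplier<SchemaFactory> provider, Source... sources) {
        // Ask the caller's provider every time; it may return a fresh or a reused factory.
        SchemaFactory factory = provider.get();
        try {
            // The resulting Schema is thread-safe, unlike the SchemaFactory that built it.
            return factory.newSchema(sources);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Could not compile schema sources", e);
        }
    }
}
```

With this shape, two builders constructed one after another on the same thread each get a schema compiled by their own provider's factory.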

The current implementation makes it impossible to use validating SAXBuilders with different schema factories whenever they may end up running in the same thread.

Looking through the code, I agree that there's a problem there. My concern is what motivated adding the ThreadLocal in the first place. I presume it was a performance concern, but you're right: the sfp passed in can be written to reuse factories where possible.

Regardless, your use case does show the bug. I have to consider the implications of possible fixes. I presume you're not using XMLReaderXSDFactory but instead have a custom subclass of AbstractReaderXSDFactory? (I hope so, because that is a "rare" thing to do, and it would make me more comfortable with the options for a fix.)

If you do, it makes sense for me to move the ThreadLocal logic into the concrete SchemaFactoryProvider implementation here. I believe that addresses the original performance concern while confining the thread-local logic to a single factory provider.
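
A sketch of what such a provider could look like, assuming the provider contract is a single method returning a SchemaFactory (modelled here as Supplier<SchemaFactory>; the class name is hypothetical). Because the ThreadLocal is an instance field rather than a static one, the per-thread reuse belongs to this provider alone, and any other provider passed to another builder is still honoured.

```java
import java.util.function.Supplier;
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical provider that keeps per-thread factory reuse to itself.
final class PerThreadSchemaFactoryProvider implements Supplier<SchemaFactory> {
    // Instance field: the cache is scoped to this provider, not to every caller on the thread.
    private final ThreadLocal<SchemaFactory> perThread =
            ThreadLocal.withInitial(() -> SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI));

    @Override
    public SchemaFactory get() {
        // SchemaFactory is not thread-safe, so each thread lazily gets its own instance.
        return perThread.get();
    }
}
```

Repeated calls on one thread return the same factory, a second thread gets its own, and a separate provider instance never sees this provider's cache.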

Thoughts? (I believe this would essentially implement your suggested conceptual fix.)

I want to mitigate XXE also when reading schemas, so I want to make sure that both the SAXParser and the SchemaFactory used by the XMLReaderJDOMFactory are configured as recommended in the OWASP XXE Prevention Cheat Sheet. XMLReaderXSDFactory does not give you sufficient control over these components, which is why I extended AbstractReaderXSDFactory.
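
A sketch of that hardening with plain JAXP, using the settings the OWASP XXE Prevention Cheat Sheet lists for SchemaFactory and SAXParserFactory. The helper class name is hypothetical. The XMLConstants properties and the SAX feature URIs are standard, while the two apache.org feature URIs are specific to Xerces (including the JDK's built-in copy) and may be rejected by other parser implementations.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper; settings follow the OWASP XXE Prevention Cheat Sheet.
final class HardenedFactories {
    private HardenedFactories() {}

    static SchemaFactory newSchemaFactory() throws SAXNotRecognizedException, SAXNotSupportedException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Empty string = no protocol allowed for external DTDs or external schema documents.
        sf.setProperty(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        sf.setProperty(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        return sf;
    }

    static SAXParserFactory newSaxParserFactory()
            throws ParserConfigurationException, SAXNotRecognizedException, SAXNotSupportedException {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setXIncludeAware(false);
        spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Xerces-specific: reject any DOCTYPE, which rules out entity declarations entirely.
        spf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth in case the DOCTYPE ban is ever lifted.
        spf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        spf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        spf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return spf;
    }
}
```

An inline schema still compiles with the hardened SchemaFactory, while a document carrying a DOCTYPE is rejected by a parser from the hardened SAXParserFactory.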

In the meantime I realized it is sufficient for me to extend AbstractReaderSchemaFactory directly. I only need a constructor that takes the schemas as Sources in addition to the parser and the schema factory. So all that was left to implement was the equivalent of AbstractReaderXSDFactory#getSchemaFromSource(SchemaFactoryProvider sfp, Source ... sources), which is basically return schemaFactory.newSchema(sources); wrapped in a try ... catch block.
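
A plain-JAXP sketch of such a constructor: it takes a parser factory, a schema factory, and the schemas as Sources, and compiling the schema is just newSchema inside a try/catch. It does not extend the JDOM class; the class name, the IllegalArgumentException wrapping, and the newXMLReader helper are illustrative assumptions. The compiled Schema is attached with SAXParserFactory.setSchema, so every parser created afterwards validates against it.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical JAXP-only analogue; it does not extend any JDOM class.
final class SourceSchemaReaderFactory {
    private final SAXParserFactory parserFactory;

    SourceSchemaReaderFactory(SAXParserFactory parserFactory, SchemaFactory schemaFactory, Source... sources) {
        // The schema factory is used only here, on the constructing thread, so no thread-local is needed.
        Schema schema = compile(schemaFactory, sources);
        this.parserFactory = parserFactory;
        // Note: this configures the caller's parser factory in place.
        this.parserFactory.setNamespaceAware(true);
        this.parserFactory.setSchema(schema);
    }

    // Equivalent of "return schemaFactory.newSchema(sources);" wrapped in try/catch.
    private static Schema compile(SchemaFactory schemaFactory, Source... sources) {
        try {
            return schemaFactory.newSchema(sources);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Unable to compile the schema sources", e);
        }
    }

    // SAXParserFactory is not thread-safe; call this from one thread or add synchronization.
    XMLReader newXMLReader() throws ParserConfigurationException, SAXException {
        return parserFactory.newSAXParser().getXMLReader();
    }
}
```

Validation errors reach the reader's ErrorHandler, so a caller that wants invalid documents to fail must install a handler that rethrows from error(...).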

With this in place, all my tests pass.