wenyuzhao / mmtk-core

Memory Management ToolKit

Home Page: https://www.mmtk.io

LXR Crash on HBase Launch

kk2049 opened this issue

Hello,

I am currently conducting performance testing on LXR with HBase and have encountered some issues. The commit ID of mmtk-core I am using is 333ffb8ad9f9089fcad4338eae0ca9bf81d4955e, which is the latest commit of branch lxr-2023-07-09. Below is the reproduction process for the identified bug.

wget https://archive.apache.org/dist/hbase/2.4.14/hbase-2.4.14-bin.tar.gz
tar -zxvf hbase-2.4.14-bin.tar.gz

export MMTK_PLAN=Immix 
export TRACE_THRESHOLD2=10 
export LOCK_FREE_BLOCKS=32 
export MAX_SURVIVAL_MB=256 
export SURVIVAL_PREDICTOR_WEIGHTED=1
export JAVA_HOME="MY-LXR-JDK-HOME"
export HBASE_OPTS="-Xms1650m -Xmx1650m  -XX:+UseThirdPartyHeap "

bash ./hbase-2.4.14/bin/start-hbase.sh

After the script finishes, HBase does not start correctly. Examining the logs under ./hbase-2.4.14/log shows the following error messages.

2024-03-01 02:05:27,911 ERROR [main] impl.MetricsSourceAdapter: Error getting metrics from source JvmMetrics
java.lang.IllegalArgumentException: committed = 9223372036854775807 should be < max = 1730150400
    at java.management/java.lang.management.MemoryUsage.<init>(MemoryUsage.java:166)
    at java.management/sun.management.MemoryImpl.getMemoryUsage0(Native Method)
    at java.management/sun.management.MemoryImpl.getHeapMemoryUsage(MemoryImpl.java:71)
    at org.apache.hadoop.metrics2.source.JvmMetrics.getMemoryUsage(JvmMetrics.java:146)
    at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:138)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
    at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:329)
    at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
    at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
    at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:66)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:223)
    at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
    at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:233)
    at org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:111)
    at org.apache.hadoop.metrics2.source.JvmMetrics$Singleton.init(JvmMetrics.java:61)
    at org.apache.hadoop.metrics2.source.JvmMetrics.initSingleton(JvmMetrics.java:120)
    at org.apache.hadoop.hbase.metrics.BaseSourceImpl$DefaultMetricsSystemInitializer.init(BaseSourceImpl.java:54)
    at org.apache.hadoop.hbase.metrics.BaseSourceImpl.<init>(BaseSourceImpl.java:111)
    at org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:42)
    at org.apache.hadoop.hbase.io.MetricsIOSourceImpl.<init>(MetricsIOSourceImpl.java:37)
    at org.apache.hadoop.hbase.regionserver.MetricsRegionServerSourceFactoryImpl.createIO(MetricsRegionServerSourceFactoryImpl.java:95)
    at org.apache.hadoop.hbase.io.MetricsIO.<init>(MetricsIO.java:32)
    at org.apache.hadoop.hbase.io.hfile.HFile.<clinit>(HFile.java:172)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:591)
    at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:422)
    at org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.<init>(HMasterCommandLine.java:318)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:122)
    at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:222)
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:169)
    at org.apache.hadoop.hbase.LocalHBaseCluster.<init>(LocalHBaseCluster.java:112)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:240)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:145)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:144)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2934)

Thank you for your attention to this matter. Let me know if you need any further information.

Hi @kk2049, thanks for reporting the error. I can confirm that the error exists even with the latest MMTk. Our OpenJDK binding does not yet fully support some interfaces, e.g., inspecting heap usage.

Another issue I noticed is that you were probably running Immix, not LXR. Immix is a fully stop-the-world, single-generation, full-heap GC that differs from LXR in many ways. At the time we were conducting our artifact evaluation, we hadn't had enough time to clean up our codebase sufficiently. The version we submitted for evaluation was directly piggybacked on the Immix GC. Since May 2022, LXR has undergone significant refactoring, one of which (b00a332) was to move the code to a separate GC plan, no longer piggybacking on Immix. You should use export MMTK_PLAN=LXR now to enable it correctly.

If you are building JDK with MMTk/LXR on your own, please remove any additional cargo features. Also, for precise performance measurement, MMTk by default counts all the malloc-ed metadata as part of the Java heap size. This differs from OpenJDK GCs. Please enable the Rust feature no_meta_counting in mmtk-core to disable this behavior, for a fair comparison with OpenJDK GCs.
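
For illustration, here is a minimal sketch of turning that feature on when building mmtk-core standalone; the exact way features are threaded through the mmtk-openjdk binding build may differ in your setup, so treat this as an assumption rather than the exact build recipe:

# Sketch: enable mmtk-core's no_meta_counting feature so malloc-ed metadata
# is not counted as part of the Java heap (matching OpenJDK GC accounting).
cd mmtk-core
cargo build --release --features no_meta_counting
# When building through the mmtk-openjdk binding, the same feature would
# instead need to be enabled on the binding's mmtk dependency.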

Also, the runtime GC parameters TRACE_THRESHOLD2, LOCK_FREE_BLOCKS, MAX_SURVIVAL_MB, SURVIVAL_PREDICTOR_WEIGHTED should be removed. These parameters have changed significantly over the past year. Their current default values are relatively optimized and should just work.

Another issue related to the JVM args: Based on what I found in your AE repo, -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 did not work with MMTk in July 2023. LXR would always use all the hardware threads for both concurrent and stop-the-world GC. Using export MMTK_THREADS=8 would set both parallel and concurrent GC threads to 8.

However, the latest LXR in the LXR branch has fixed this issue and should now respect the numbers specified in -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2.
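
Putting the above together, a minimal sketch of what the corrected launch environment might look like (the heap size and thread count are placeholders carried over from your original script, not tuning recommendations):

# Sketch: use the LXR plan and drop the obsolete tuning variables.
export MMTK_PLAN=LXR
unset TRACE_THRESHOLD2 LOCK_FREE_BLOCKS MAX_SURVIVAL_MB SURVIVAL_PREDICTOR_WEIGHTED
# On the July 2023 builds, cap GC threads with MMTK_THREADS; newer LXR builds
# respect -XX:ParallelGCThreads / -XX:ConcGCThreads instead.
export MMTK_THREADS=8
export JAVA_HOME="MY-LXR-JDK-HOME"
export HBASE_OPTS="-Xms1650m -Xmx1650m -XX:+UseThirdPartyHeap"

bash ./hbase-2.4.14/bin/start-hbase.sh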

Re-submitted as an upstream issue: mmtk/mmtk-openjdk#270

Thank you for providing valuable information. We are currently in the process of re-evaluating the performance of LXR based on your feedback. However, it appears that the performance we are observing still differs from what is documented in the performance history at https://github.com/wenyuzhao/mmtk-core/blob/lxr-2023-07-09/performance-history/history.md. Could you kindly share the parameters used to reproduce the data mentioned in the performance history?

Additionally, are there any recommended stable versions of LXR that you believe would be more suitable for our testing environment? We greatly appreciate your guidance!

We use an automation tool called running-ng for precise and reproducible performance evaluation. I use this tool to periodically monitor LXR performance changes over time, to check whether it still matches the claims we made in the paper.

For example, the config for the latest result is stored in these two files. Some notable details:

  • The configs of the G1 and LXR runs. common3, hs, and tph are modifiers that add JVM args or env vars to the run. This is the list of all the modifiers, but we only enable some of them. Expanding them will show all the JVM args and env vars we used for this evaluation.
  • The git commits and rust features for building LXR
  • The per-benchmark min heap sizes. This evaluation used ~2x (precisely, 1.989x) of the min heap size as the actual heap size.
  • For each config, we ran 40 invocations; each invocation runs the same workload in a loop 5 times ("5 iterations"). We only collect the results from the last iteration.
  • The machine spec used to run this evaluation: Ryzen 9 5950X, 64GB RAM, Ubuntu 22.04. This is the same machine we used for the paper evaluation.

Here are two lusearch logs produced by this evaluation. Each file contains the 40 invocations of LXR or G1. If you manually expand the modifiers in runbms.yml, they should match the first line of these two log files. The numbers in the log files should also match the graphs here: http://squirrel.anu.edu.au/plotty-public/wenyuz/v8/p/CaGxpz

The whole evaluation is a relatively automated process. You should be able to reproduce similar results if you use the same parameters, the same commits, and roughly the same machine specs.

You can also try using running-ng directly to run the benchmark. You need to update runbms.yml in the following places:

  • L514 to point to your jdk-mmtk-lxr build
  • L506 change to remote_host:
  • Fix the paths at L275-L288, L423-L426, and L614 so they point to your local copies. You should be able to find these files here and here

With all these fixed, use the following command to run the benchmark:

running runbms -p lxr-xput-regression ./logs runbms.yml 12 7
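
As a rough end-to-end sketch (assuming running-ng is installed from PyPI as its documentation describes, and runbms.yml has already been edited as above):

# Sketch: install running-ng and launch the regression config.
pip3 install running-ng
running runbms -p lxr-xput-regression ./logs runbms.yml 12 7
# Per-invocation logs are written under ./logs (exact layout may vary by version).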

"if there are any recommended stable versions of LXR"

LXR is not yet fully stable, but the above evaluation run had the best performance results, so probably just use those commits for your evaluation. You can also use the latest LXR commits; the performance should be slightly better than this evaluation (I haven't uploaded the results yet).

Sorry, one thing I forgot to mention: for a fair comparison with ZGC, you may want to disable compressed pointers. However, I'm dealing with some production issues right now, and I believe one of my workarounds only works when compressed oops are enabled; otherwise it will crash. Recent versions of LXR probably can't run without compressed pointers (but lxr-2023-07-09 probably can).

Hi @kk2049, I've recently taken a look at the AE repo again, particularly focusing on the script used for running the dacapo benchmarks. I noticed a few potential areas for refinement in the methodology that you're currently employing.

Firstly, a single benchmark invocation cannot lead to a stable result, and hence a strong conclusion; there can be noise across different runs. It would be better to run more invocations, perhaps ~40 or more. Please note that this is different from the -n 6 argument passed to dacapo, which is used for warmup. The overall result should be the average or geomean over all invocations, with error bars.
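
As an illustrative sketch only (paths, heap size, and the invocation count are placeholders, not the script from the AE repo), one way to run many independent invocations and aggregate only the final in-process iteration of each:

# Sketch: 40 independent JVM invocations of a DaCapo benchmark.
# -n 5 runs 5 in-process iterations per invocation; the earlier iterations act
# as warmup and only the timing of the final iteration should be aggregated.
mkdir -p logs
for i in $(seq 1 40); do
  $JAVA_HOME/bin/java -Xms2382m -Xmx2382m -jar $DACAPO_PATH h2 -n 5 \
    > logs/h2.run$i.log 2>&1
done
# Then compute the average or geomean of the final-iteration times, with error bars.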

Secondly, it's not accurate to compare two JDK configs with more than one variable changed (see the sketch after this list). For instance:

  • L31 vs L37: ZGC doesn't use compressed pointers, so there are two variables in this comparison. It’s not possible to attribute any performance difference to either the GC algorithm change or the compressed pointers.
  • L31 vs L47: Similarly, directly comparing two GCs on two different JDKs might lead to skewed conclusions. JDK21 has numerous compiler optimizations that significantly enhance overall performance, so GenZ on JDK21 cannot be directly compared against G1 on JDK11 with compressed oops on.
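
For example, a minimal sketch of holding compressed oops fixed so that the GC algorithm is the only variable (flags are illustrative; the rest of the command line is elided):

# Sketch: same JDK, same heap, compressed oops disabled in both runs, so any
# difference can be attributed to the GC algorithm rather than pointer compression.
java -XX:+UseG1GC -XX:-UseCompressedOops -Xms2382m -Xmx2382m -jar $DACAPO_PATH h2
java -XX:+UseZGC  -XX:-UseCompressedOops -Xms2382m -Xmx2382m -jar $DACAPO_PATH h2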

Lastly, it would be better to use the latest stable release, dacapo-23.11-chopin. The version you were using, dacapo-evaluation-git-b00bfa9, is a very early snapshot (~2 years old), and many issues have been resolved since then in the stable release.

We also adopt the above methodology for all our evaluations. I hope this is helpful both for your evaluation and for reproducing our results.

Thank you for your previous guidance! @wenyuzhao

We are working on addressing some of the shortcomings you mentioned. As you may know, we have a modified version of h2 that includes an additional parameter to limit the instantaneous request emission rate to a certain value. Currently, we are conducting tests on the response-time curves of both LXR and G1. The launch command for LXR has been adjusted based on your instructions, and the LXR version has been updated according to your recommendation. Could you please assess whether the data we have obtained appears reasonable? Are we employing LXR in the correct manner?

Your guidance is highly valued and appreciated!

Launch script

# LXR
MMTK_PLAN=LXR numactl -C 0-7 -m0 $LXR_JDK/jdk/bin/java -XX:+UseThirdPartyHeap -XX:MaxMetaspaceSize=1024m -XX:MetaspaceSize=1024m -Xmx2382m -Xms2382m -Xlog:gc=info -jar $DACAPO_PATH h2 --throttle $Throughput

# G1
numactl -C 0-7 -m0 $JDK11/java -XX:MaxMetaspaceSize=1024m -XX:MetaspaceSize=1024m -Xmx2382m -Xms2382m  -XX:ParallelGCThreads=8 -XX:ConcGCThreads=2 -XX:InitiatingHeapOccupancyPercent=65 -XX:+UseG1GC -Xlog:gc=info -jar $DACAPO_PATH h2 --throttle $Throughput

RT Curve

G1
Throughput    metered-p99
1970.96       4319.8
3882.24       3985
5738.16       2650.8
7535.32       3333.8
9205.94       5249.6
10656.96      5100.2
11903.62      5932
12901.96      6491
13811.54      8696.4
14546.1       11424.8
15194.12      12427.6
15818.56      15096.2
16246.32      18284.6

LXR
Throughput    metered-p99
1966.96       3581.2
3864.72       4457.6
5683.22       1686.4
7421.88       7354.2
8957.3        7377.8
10240.92      6580.4
11316.18      5844.2
12263.82      12238.4
12922.76      10192.8
13581.58      20549.4
14144.34      13480.6
14696.84      13267.8
15098.82      17917.2

LXR version

commits:
    mmtk-core: 4ab99bb8270
    mmtk-openjdk: cdbc8decffa
    openjdk: 7caf8f7d

Looks like LXR is enabled correctly. FYI, this commit (4ab99bb) requires enabling the mmtk/opt cargo feature; otherwise it will have performance issues.
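
A minimal sketch of where that feature flag would go, assuming the binding's Rust crate lives in the mmtk/ directory of mmtk-openjdk as in upstream (the full JDK build normally drives this cargo build itself):

# Sketch: build the binding's Rust component with the opt feature of the
# mmtk-core dependency enabled.
cd mmtk-openjdk/mmtk
cargo build --release --features mmtk/opt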

I'm not sure how h2 throttling is enabled. h2 in DaCapo is a fixed-workload benchmark, and I'm not sure whether scaling throughput like this works well; it may affect the metered latency calculation(?)