google / ghost-userspace

Errors when reproducing experiments (also when running userspace agents)

liborui opened this issue · comments

Description

Logs

I am trying to reproduce the experiments and then build something new on top of ghOSt.
However, I ran into the error below.
(Before running this command, I compiled ghost-userspace with bazel build -c opt ...)

sudo ./bazel-bin/experiments/scripts/centralized_queuing.par cfs   # in the root of ghost-userspace. 
# I use sudo because without it the Python script stops at: Running CFS experiments... mount: only root can use "--options" option

The output is:

Running CFS experiments...
mount: /dev/cgroup/cpu: cgroup already mounted on /sys/fs/cgroup/systemd.
mount: /dev/cgroup/memory: cgroup already mounted on /sys/fs/cgroup/systemd.
Output Directory: /tmp/ghost_data/2022-04-26 10:22:56                                                                                                                                                                                                                                   
{"throughputs": [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000, 150000, 160000, 170000, 180000, 190000, 200000, 210000, 220000, 230000, 240000, 250000, 260000, 270000, 280000, 290000, 300000, 310000, 320000, 330000, 340000, 350000, 360000, 370000, 380000, 390000, 400000, 410000, 420000, 430000, 440000, 450000, 451000, 452000, 453000, 454000, 455000, 456000, 457000, 458000, 459000, 460000, 461000, 462000, 463000, 464000, 465000, 466000, 467000, 468000, 469000, 470000, 471000, 472000, 473000, 474000, 475000, 476000, 477000, 478000, 479000, 480000], "output_prefix": "/tmp/ghost_data/2022-04-26 10:22:56", "binaries": {"rocksdb": "/dev/shm/rocksdb", "antagonist": "/dev/shm/antagonist", "ghost": "/dev/shm/agent_shinjuku"}, "rocksdb": {"print_format": "csv", "print_distribution": false, "print_ns": false, "print_get": true, "print_range": true, "rocksdb_db_path": "/dev/shm/orch_db", "throughput": 20000, "range_query_ratio": 0.0, "load_generator_cpu": 10, "cfs_dispatcher_cpu": 11, "num_workers": 6, "worker_cpus": [12, 13, 14, 15, 16, 17], "cfs_wait_type": "spin", "ghost_wait_type": "prio_table", "get_duration": "10us", "range_duration": "5000us", "get_exponential_mean": "1us", "batch": 1, "experiment_duration": "15s", "discard_duration": "2s", "scheduler": "cfs", "ghost_qos": 2}, "antagonist": null, "ghost": null}
Running experiment for throughput = 10000 req/s:
['/dev/shm/rocksdb', '--print_format', 'csv', '--noprint_distribution', '', '--noprint_ns', '', '--print_get', '', '--print_range', '', '--rocksdb_db_path', '/dev/shm/orch_db', '--throughput', '20000', '--range_query_ratio', '0.0', '--load_generator_cpu', '10', '--cfs_dispatcher_cpu', '11', '--num_workers', '6', '--worker_cpus', '12,13,14,15,16,17', '--cfs_wait_type', 'spin', '--ghost_wait_type', 'prio_table', '--get_duration', '10us', '--range_duration', '5000us', '--get_exponential_mean', '1us', '--batch', '1', '--experiment_duration', '15s', '--discard_duration', '2s', '--scheduler', 'cfs', '--ghost_qos', '2', '--throughput', '10000']
experiments/rocksdb/cfs_orchestrator.cc:95(23984) CHECK FAILED: ghost::Ghost::SchedSetAffinity( ghost::Gtid::Current(), ghost::MachineTopology()->ToCpuList( std::vector<int>{options().load_generator_cpu})) == 0 [-1 != 0]
errno: 22 [Invalid argument]
PID 23984 Backtrace:
[0] 0x564e0ac5e487 : ghost_test::CfsOrchestrator::LoadGenerator()
[1] 0x564e0ac8561e : ghost_test::ExperimentThreadPool::ThreadMain()
[2] 0x564e0ac8756b : std::_Function_handler<>::_M_invoke()
[3] 0x7fe9aeb77de4 : (unknown)

I also tried this command from the root of ghost-userspace:

sudo bazel run fifo_agent

and it fails with:

Extracting Bazel installation...                                                                                                                                                                                                                                                        
Starting local Bazel server and connecting to it...                                                                                                                                                                                                                                     
ERROR: Skipping 'fifo_agent': no such target '//:fifo_agent': target 'fifo_agent' not declared in package '' defined by /home/emnets/ghost-userspace/BUILD                                                                                                                              
WARNING: Target pattern parsing failed.                                                                                                                                                                                                                                                 
ERROR: no such target '//:fifo_agent': target 'fifo_agent' not declared in package '' defined by /home/emnets/ghost-userspace/BUILD                                                                                                                                                     
INFO: Elapsed time: 10.222s                                                                                                                                                                                                                                                             
INFO: 0 processes.                                                                                                                                                                                                                                                                      
FAILED: Build did NOT complete successfully (1 packages loaded)                                                                                                                                                                                                                         
FAILED: Build did NOT complete successfully (1 packages loaded)

Env Info

And here is my environment version info:

lsb_release -a 
# LSB Version:	core-11.1.0ubuntu2-noarch:security-11.1.0ubuntu2-noarch 
# Distributor ID:	Ubuntu 
# Description:	Ubuntu 20.04.2 LTS 
# Release:	20.04 
# Codename:	focal
uname -mrs
# Linux 5.11.0+ x86_64

ghost-kernel hash: 5da05ec77890217e85947ff3573e1480579687d2
ghost-userspace hash: 79ecaeb

P.S. I am using a virtual machine to reproduce ghOSt, and I wonder whether that matters.
The virtual machine runs on VMware Workstation with 8 GB of memory and 8 processors (each processor has one core).

Suggestion

My colleagues and I appreciate the paper and this open-source ghOSt project.
However, I ran into many difficulties when running the experiments and reproducing the results, because the README does not cover this.
I had to go through the (closed) issues to find the scattered commands for running the experiments.
It would be great if you could update the README with more detailed steps. I could help out if you need it.

Hi Borui,

Thanks for opening this issue. The CFS and ghOSt experiments affine threads to logical cores 10, 11, 12, and so on (this is controlled by _FIRST_CPU in options.py). ghost::Ghost::SchedSetAffinity() calls sched_setaffinity(), which is failing with EINVAL. This error generally means that those logical cores do not exist in your system, and this makes sense given that you mentioned you have 8 logical cores in your machine. The experiment parameters in the Python files need to be changed for your machine -- I would imagine that setting _FIRST_CPU to 0 in options.py would fix your issue.
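
As a rough illustration of the mismatch described above, here is a minimal Python sketch. It assumes the layout visible in the log (load generator on the first CPU, CFS dispatcher on the next one, workers on the CPUs after that). The function names `experiment_cpus` and `missing_cpus` are hypothetical and are not part of ghost-userspace.

```python
def experiment_cpus(first_cpu, num_workers):
    """CPU ids the experiment would use, assuming the layout seen in the log:
    load generator, then CFS dispatcher, then one CPU per worker."""
    return list(range(first_cpu, first_cpu + 2 + num_workers))


def missing_cpus(required, available):
    """CPU ids that the experiment needs but the machine does not have."""
    return sorted(set(required) - set(available))


# An 8-CPU machine exposes logical cores 0..7.
vm_cpus = range(8)

# First CPU 10 (as in the log): every required core is missing.
print(missing_cpus(experiment_cpus(10, 6), vm_cpus))

# First CPU 0: cores 0..7 are all present, so nothing is missing.
print(missing_cpus(experiment_cpus(0, 6), vm_cpus))
```

On a real machine, `os.sched_getaffinity(0)` (Linux only) or `range(os.cpu_count())` could stand in for `vm_cpus`.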

Your run command for fifo_agent is failing because there is no fifo_agent target. We used to have a fifo_agent target, but we renamed it in 7ab27ed. We have fifo_per_cpu_agent (a ghOSt scheduler with per-CPU ghOSt agents that each have their own FIFO runqueue) and fifo_centralized_agent (a ghOSt scheduler with a global ghOSt agent that has a single FIFO runqueue for the entire machine).

Thanks for the suggestion about a README with instructions. We definitely want to create this along with several extensive tutorials, though we do not have a set timeline for these yet. If you start writing a README/tutorials and want to push your work, we would be more than happy to accept it.

Please let me know if you have additional questions.

commented

I have the same problem. I changed _FIRST_CPU to 0 in options.py, but it still does not work:
seu@ubuntu:~$ cd ghost-userspace/
seu@ubuntu:~/ghost-userspace$ sudo su
[sudo] password for seu:
root@ubuntu:/home/seu/ghost-userspace# sudo ./bazel-bin/experiments/scripts/centralized_queuing.par cfs
Running CFS experiments...
Output Directory: /tmp/ghost_data/2022-12-13 17:03:08
{"throughputs": [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 110000, 120000, 130000, 140000, 150000, 160000, 170000, 180000, 190000, 200000, 210000, 220000, 230000, 240000, 250000, 260000, 270000, 280000, 290000, 300000, 310000, 320000, 330000, 340000, 350000, 360000, 370000, 380000, 390000, 400000, 410000, 420000, 430000, 440000, 450000, 451000, 452000, 453000, 454000, 455000, 456000, 457000, 458000, 459000, 460000, 461000, 462000, 463000, 464000, 465000, 466000, 467000, 468000, 469000, 470000, 471000, 472000, 473000, 474000, 475000, 476000, 477000, 478000, 479000, 480000], "output_prefix": "/tmp/ghost_data/2022-12-13 17:03:08", "binaries": {"rocksdb": "/dev/shm/rocksdb", "antagonist": "/dev/shm/antagonist", "ghost": "/dev/shm/agent_shinjuku"}, "rocksdb": {"print_format": "csv", "print_distribution": false, "print_ns": false, "print_get": true, "print_range": true, "rocksdb_db_path": "/dev/shm/orch_db", "throughput": 20000, "range_query_ratio": 0.0, "load_generator_cpu": 10, "cfs_dispatcher_cpu": 11, "num_workers": 6, "worker_cpus": [12, 13, 14, 15, 16, 17], "cfs_wait_type": "spin", "ghost_wait_type": "prio_table", "get_duration": "10us", "range_duration": "5000us", "get_exponential_mean": "1us", "batch": 1, "experiment_duration": "15s", "discard_duration": "2s", "scheduler": "cfs", "ghost_qos": 2}, "antagonist": null, "ghost": null}
Running experiment for throughput = 10000 req/s:
['/dev/shm/rocksdb', '--print_format', 'csv', '--noprint_distribution', '', '--noprint_ns', '', '--print_get', '', '--print_range', '', '--rocksdb_db_path', '/dev/shm/orch_db', '--throughput', '20000', '--range_query_ratio', '0.0', '--load_generator_cpu', '10', '--cfs_dispatcher_cpu', '11', '--num_workers', '6', '--worker_cpus', '12,13,14,15,16,17', '--cfs_wait_type', 'spin', '--ghost_wait_type', 'prio_table', '--get_duration', '10us', '--range_duration', '5000us', '--get_exponential_mean', '1us', '--batch', '1', '--experiment_duration', '15s', '--discard_duration', '2s', '--scheduler', 'cfs', '--ghost_qos', '2', '--throughput', '10000']
experiments/rocksdb/cfs_orchestrator.cc:87(2045) CHECK FAILED: ghost::GhostHelper()->SchedSetAffinity( ghost::Gtid::Current(), ghost::MachineTopology()->ToCpuList( std::vector{options().load_generator_cpu})) == 0 [-1 != 0]
errno: 22 [Invalid argument]
PID 2045 Backtrace:
[0] 0x55c7f5f521b9 : ghost_test::CfsOrchestrator::LoadGenerator()
[1] 0x55c7f5f792ee : ghost_test::ExperimentThreadPool::ThreadMain()
[2] 0x55c7f5f7b23b : std::_Function_handler<>::_M_invoke()
[3] 0x55c7f5f7e23d : std::thread::_State_impl<>::_M_run()
[4] 0x7efc43502de4 : (unknown)

Hello,

The error output indicates that the failure happens on line 87 in cfs_orchestrator.cc. This line affines the load generator thread to a logical core. According to the command line in your log, the load generator is still set to core 10, which means your system needs to have at least 11 cores for this to work. Also, the CFS dispatcher is affined to core 11 and the worker threads are affined to cores 12-17.
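
The command line printed just before the CHECK failure shows which cores the binary was actually asked to use, so it is a quick way to tell whether an edit to options.py took effect. The sketch below reads those flags from an argument list like the one in the log. The flag names come from the log; the helper `min_cores_per_flag` is hypothetical.

```python
CPU_FLAGS = ("--load_generator_cpu", "--cfs_dispatcher_cpu", "--worker_cpus")


def min_cores_per_flag(argv):
    """Map each CPU flag to the minimum number of logical cores it requires,
    i.e. the highest CPU id it names plus one (CPU ids start at 0)."""
    needed = {}
    # Walk over (flag, value) pairs of adjacent arguments.
    for flag, value in zip(argv, argv[1:]):
        if flag in CPU_FLAGS:
            highest = max(int(cpu) for cpu in value.split(","))
            needed[flag] = highest + 1
    return needed


# Shortened copy of the argument list from the log.
argv = [
    "/dev/shm/rocksdb",
    "--load_generator_cpu", "10",
    "--cfs_dispatcher_cpu", "11",
    "--num_workers", "6",
    "--worker_cpus", "12,13,14,15,16,17",
]
print(min_cores_per_flag(argv))
```

For the values in the log, the load generator alone needs 11 cores, and the worker list pushes the total to 18.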

I assume you want to affine the threads to cores with lower IDs. Are you sure that you are setting _FIRST_CPU to 0 in options.py?

Thanks,
Jack

commented

Oh, thanks for replying so quickly.
Yes, I am sure I set _FIRST_CPU to 0 in options.py,
as you can see in the picture:
[image: 1671079920841]
Sorry for the delay.
Is there anything else that is not configured correctly?
Thank you very much!

commented

Additional note:
I also set NUM_ROCKSDB_WORKERS = 5 in options.py, but the output still reports "num_workers": 6,
so I guess the change did not take effect.
I then rebooted, but it still does not work.

commented

I think I found the solution to the problem:

  1. Change _FIRST_CPU to 1.
  2. Recompile; after that it works.
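
The second step matters because the experiment is launched from a built artifact (the .par under bazel-bin), so an edit to options.py is only picked up after a rebuild. As a rough sketch, a timestamp comparison can flag this situation. The helper `is_stale` is hypothetical, the location of options.py under experiments/scripts is an assumption, and modification times are only a heuristic; rerunning the bazel build is the reliable fix.

```python
import os


def is_stale(source_path, artifact_path):
    """Return True if the built artifact is missing or older than the source.

    Heuristic only: it compares file modification times, nothing more.
    """
    if not os.path.exists(artifact_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(artifact_path)


if __name__ == "__main__":
    # Paths relative to the ghost-userspace root; the options.py location is assumed.
    source = "experiments/scripts/options.py"
    artifact = "bazel-bin/experiments/scripts/centralized_queuing.par"
    if os.path.exists(source) and is_stale(source, artifact):
        print("options.py is newer than the .par; rebuild before running.")
```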
commented

Thank you very much!