TUD-OS / NUL

NOVA userland

host82573 driver broken

tfc opened this issue

Hi there,

When using the "host82573" host driver, I get the following strange behaviour:
1.) The application requesting the MAC address later gets the correct MAC address back.
2.) The MAC address of the hardware is set to a wrong value, so when I reset the machine, PXE boot won't work with this wrong MAC address. I have to power the machine off and on again to restore the real MAC address.

Line 532 in nul/julian/host/host82573.cc writes a zero back to the register it has obtained the MAC address from:

    // We assume that the other RA registers have been disabled by the
    // software reset. (The spec says so.)
    //_hwreg[RAL0] = _mac.raw & 0xFFFFFFFFU;
    //_hwreg[RAH0] = 1ULL<<31 | _mac.raw >> 32;
    // XXX Disable all address filtering. We use promiscuous mode.
    _hwreg[RAH0] = 0;

Is it legitimate to just comment this line out and use the commented-out lines above instead?
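
For reference, the two commented-out lines in the snippet split the 48-bit MAC across RAL0 (low 32 bits) and RAH0 (high 16 bits), with bit 31 of RAH0 as the address-valid flag. Below is a minimal C++ sketch of that split. It assumes the MAC is held as an integer with its first octet in the lowest byte; the names RaPair, pack_mac, unpack_mac and mac_from_octets are hypothetical helpers for illustration and are not taken from host82573.cc.

```cpp
#include <cstdint>

// Hypothetical illustration type: the two 32-bit receive-address registers.
struct RaPair {
    uint32_t ral;  // low 32 bits of the MAC
    uint32_t rah;  // bits 0..15: high 16 bits of the MAC, bit 31: address valid
};

constexpr uint32_t RAH_AV = 1u << 31;  // "address valid" flag, as in 1ULL<<31 above

// Assumption: the 48-bit value stores the first MAC octet in its lowest byte.
constexpr uint64_t mac_from_octets(uint8_t o0, uint8_t o1, uint8_t o2,
                                   uint8_t o3, uint8_t o4, uint8_t o5) {
    return static_cast<uint64_t>(o0)       | static_cast<uint64_t>(o1) << 8  |
           static_cast<uint64_t>(o2) << 16 | static_cast<uint64_t>(o3) << 24 |
           static_cast<uint64_t>(o4) << 32 | static_cast<uint64_t>(o5) << 40;
}

// Mirrors the commented-out assignments: RAL gets the low word,
// RAH gets the remaining 16 bits plus the valid flag.
constexpr RaPair pack_mac(uint64_t raw, bool valid) {
    return RaPair{
        static_cast<uint32_t>(raw & 0xFFFFFFFFu),
        static_cast<uint32_t>(((raw >> 32) & 0xFFFFu) | (valid ? RAH_AV : 0u))};
}

// Inverse direction: rebuild the 48-bit value, ignoring the flag bits.
constexpr uint64_t unpack_mac(const RaPair &r) {
    return (static_cast<uint64_t>(r.rah & 0xFFFFu) << 32) | r.ral;
}

// Compile-time check using the MAC that appears in the log output.
static_assert(unpack_mac(pack_mac(
                  mac_from_octets(0x00, 0x1c, 0xc0, 0xb2, 0x55, 0x50), true)) ==
                  mac_from_octets(0x00, 0x1c, 0xc0, 0xb2, 0x55, 0x50),
              "pack/unpack must round-trip");
```

Under that byte-order assumption, the MAC 00:1c:c0:b2:55:50 from the log maps to RAL0 = 0xb2c01c00 and RAH0 = 0x80005055. Writing 0 to RAH0 clears both the valid bit and the last two octets of the stored address.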

Furthermore, the "test_ip" application doesn't work:

nul/alexb/apps/ip_test/main.cc

if (!nul_ip_config(IP_NUL_VERSION, &arg) || arg != 0x4) return false;

This code expects version 0x4, but 0x5 is returned. When I change the expected value in the if clause to 0x5, everything seems to run perfectly until a packet is sent:

(5) Hello
(5) Region  count 1
(5)        0 virt 80000000 end 84000000 size  4000000 phys ba800000
(5) success - request timer attach
(5) success - request network attach
(5) success - got mac 00:1c:c0:b2:55:50
(5)         - sys_now unimpl.
(5) success - creating udp port 5555
(5) success - creating tcp port 7777
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) [tcp]   - no connection via port 7777, sending packet failed
(5) update  - got ip=192.168.0.6 mask=255.255.128.0 gw=192.168.0.1
(5) [tcp]   - trying to connect 192.168.0.6:49153 -> 127.0.0.1:7777 - err=0
(5) success - connecting from 49153 to 7777 tcp port
(5) [tcp]   - no connection via port 49153, sending packet failed
(5) [tcp]   - no connection via port 49153, sending packet failed
(5) [tcp]   - no connection via port 49153, sending packet failed

Is this test_ip application still supposed to work with the current version?

test_ip has not been updated for a while; I will have a look in the next few days.

Regarding host82573.cc: If you do not need your network device to be set to promiscuous mode, then you can safely comment out the _hwreg[RAH0] = 0; line.

What NIC are you using? Can you post lspci -vvvnn output (as root) from this box?

Btw: it sounds like the NIC is not properly reset when you reboot the box. It should read back RAH/L0 from ROM.

After reset, if the NVM is present, the first register (Receive Address register 0) is loaded from the IA field in the NVM

So we are back to the point that the NIC is not properly reset when you reboot.

I think we have two ICH10 boxes. I can check whether they have the same issue next week.

Mmhh... I re-read the relevant parts of the spec and the Linux e1000e driver. It shouldn't be necessary to set RAL/RAH for our use case. I'll come up with a patch.

@tfc Can you give the above version a spin and report back?

@tfc: I updated the test; please try it. You can send some data to the UDP or TCP port; see the commit message.

Okay, I just applied all the patches. The problem with the MAC address is solved. Thank you very much.

Patches regarding ip_test:
This works great in QEMU! But I'm afraid I get the following error messages on real hardware:

(5) Hello
(5) Region  count 1
(5)        0 virt 80000000 end 84000000 size  4000000 phys ba800000
(5) success - request timer attach
(5) success - request network attach
(5) success - got mac 00:1c:c0:b2:55:50
(5)         - sys_now unimpl.
(5) success - creating udp port 5555
(5) success - creating tcp port 7777
82573 c8: Link is UP.
(5) update  - got ip=192.168.0.6 mask=255.255.128.0 gw=192.168.0.1
(5) [tcp]   - trying to connect 192.168.0.6:49153 -> 127.0.0.1:7777 - err=0
(5) success - connecting from 49153 to 7777 tcp port
(5) [tcp]   - warning - err -10, port 49153, pcb_open=0x0, pcb_listing=0x8001a36c
(5) failure - connection established

I'll start digging through the code now, looking for the exact differences in execution flow between QEMU and real hardware, but maybe someone has an idea...

To simplify things for a start, you can comment out everything after line 138 up to line 158 (the part where the test sends packets to itself). That part is not required to receive packets via the wire.

diff --git a/alexb/apps/ip_test/main.cc b/alexb/apps/ip_test/main.cc
index f27910e..957910e 100644
--- a/alexb/apps/ip_test/main.cc
+++ b/alexb/apps/ip_test/main.cc
@@ -134,7 +134,8 @@ class TestIP : public NovaProgram, public ProgramConsole
             Logging::printf("failed  - starting timer\n");

           //dump ip addr if we got one
-          if (nul_ip_config(IP_IPADDR_DUMP, NULL)) {
+          if (nul_ip_config(IP_IPADDR_DUMP, NULL)) { }
+/*

             conn.port = 7777;
             conn.addr = (1 << 24) | 127; //127.0.0.1
@@ -155,6 +156,7 @@ class TestIP : public NovaProgram, public ProgramConsole
             void const * data;
           } arg = { conn.port, 7, "blabla" };
           nul_ip_config(IP_TCP_SEND, &arg);
+*/
         }

         while (netconsumer->has_data()) {

The MAC address patch has been merged to master.

alex-ab:
OK, this works great now, thank you!
The only remaining difference between running it on QEMU and on a real machine is that 127.0.0.1 doesn't work on real hardware. It works in QEMU...

It seems that lwIP needs some special convincing to handle the loopback device: http://www.nongnu.org/lwip/loopif_8c.html

I am not sure whether the current code handles this. If this is the problem, QEMU is buggy in reflecting 127.0.0.0/8 traffic back to its VM.

You're right, the lwIP port doesn't use the loopback device at all. So I will comment out that code in the test.
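
For context, the test builds its loopback target as `(1 << 24) | 127`, which places the first octet of 127.0.0.1 in the lowest byte of the 32-bit address. Below is a minimal C++ sketch of that encoding together with a check for the 127.0.0.0/8 block. The helper names ipv4_addr and is_loopback are hypothetical and not part of the NUL or lwIP sources, and whether the port should intercept such traffic is not settled in this thread.

```cpp
#include <cstdint>

// Hypothetical helper: build an address the way the test does,
// with the first dotted octet in the lowest byte of the 32-bit value.
constexpr uint32_t ipv4_addr(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return static_cast<uint32_t>(a)       | static_cast<uint32_t>(b) << 8 |
           static_cast<uint32_t>(c) << 16 | static_cast<uint32_t>(d) << 24;
}

// Hypothetical helper: true for any address in 127.0.0.0/8. With the layout
// above, the first octet sits in the lowest byte, so only that byte matters.
constexpr bool is_loopback(uint32_t addr) {
    return (addr & 0xFFu) == 127u;
}

// The expression from the test, conn.addr = (1 << 24) | 127, is 127.0.0.1.
static_assert(ipv4_addr(127, 0, 0, 1) == ((1u << 24) | 127u),
              "test encoding matches 127.0.0.1");
static_assert(is_loopback(ipv4_addr(127, 0, 0, 1)), "127.0.0.1 is loopback");
static_assert(!is_loopback(ipv4_addr(192, 168, 0, 6)), "192.168.0.6 is not");
```

Without a loopback interface, a stack has no local path for such an address, so the packet is treated like any other outgoing traffic. That would be consistent with the connection only succeeding where something outside the guest reflects it back.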