ariya / phantomjs

Scriptable Headless Browser

Home Page: http://phantomjs.org

Get amount of transferred bytes

ariya opened this issue · comments

marcon...@gmail.com commented:

There is no way to reliably get the amount of transferred bytes for a request.

bodySize is not available for responses with stage == end, and the Content-Length header is not very reliable; in particular it seems to be unset for responses with Content-Encoding: gzip.

I guess the bodySize has to be summed up for each chunk of data received and made available in a response with stage == end.

Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #156.
🌟   9 people had starred this issue at the time of migration.

ariya.hi...@gmail.com commented:

Metadata Updates

  • Label(s) removed:
    • Type-Defect
  • Label(s) added:
    • Type-Enhancement
  • Milestone updated: FutureRelease (was: ---)
  • Status updated: Accepted

marceldu...@gmail.com commented:

Any reason why the Content-Length header isn't available for Content-Encoding: gzip responses?

While issue 158 is still open there's no way to get both compressed and uncompressed sizes of gzip responses, so the netsniff.js example that generates a HAR file is a bit misleading:

https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js#L51

According to HAR spec (http://www.softwareishard.com/blog/har-12-spec/#content), content.size:
"... should be equal to response.bodySize if there is no compression and bigger when the content has been compressed."
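To make the quoted relationship concrete, here is a minimal sketch. The entry object is a made-up minimal HAR fragment, and compressionSaved is an illustrative helper name, not part of any API:

```javascript
// Sketch of the HAR 1.2 relationship quoted above: content.size is the
// uncompressed length, response.bodySize the bytes on the wire, and
// their difference is what the spec calls "compression".
function compressionSaved(entry) {
    var size = entry.response.content.size; // uncompressed length
    var bodySize = entry.response.bodySize; // transferred (possibly compressed)
    if (size < 0 || bodySize < 0) {
        return 0; // -1 means "unknown" in HAR
    }
    return size - bodySize; // 0 when no compression was applied
}
```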

marceldu...@gmail.com commented:

Another gzip/raw content issue:

By running:
phantomjs netsniff.js http://search.yahoo.com

The generated HAR shows that the main HTML response headers contain Content-Encoding: gzip and the bodySize is 12726.

However, running curl with compression gives a different result:

curl search.yahoo.com -H "Accept-Encoding:gzip" | wc -c
4328

And without compression the size is similar to what phantomjs is returning:

curl search.yahoo.com | wc -c
12120

I see this was migrated to 'feature enhancement', but I think this should be considered a bug. Anyone using the HAR output from netsniff.js is seeing uncompressed bytes only, and is getting an inaccurate representation of actual bytes transferred.

Is this data not easily accessible from QT?

+1 on this, any suggestion where the extra bytes are coming from?

For me all byte sizes on CSS/JS files are shown significantly smaller than they are in reality (according to Chrome Dev Tools and Firebug). Compared to the gzipped file sizes they are also shown too small.

Image sizes are all shown correctly. Anybody else having that kind of problem?

Seems to still be an issue in 2.0. I get the impression Qt/Webkit changes might be needed?

I believe if you are talking to a chunking server, Content-Length is not set; instead the size of each chunk is passed before the data itself, and when a size of zero is returned the resource is complete. That may explain why Content-Length is not present sometimes.
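The framing described above can be illustrated with a toy parser. In HTTP/1.1 chunked transfer coding each chunk is "&lt;hex size&gt;\r\n&lt;data&gt;\r\n", terminated by a zero-size chunk, so the server never needs a Content-Length header. sumChunkedBody is an illustrative name, and this sketch ignores chunk extensions and trailers:

```javascript
// Toy parser for a chunked body: walk "<hex size>\r\n<data>\r\n" frames
// and total the payload bytes until the terminating zero-size chunk.
function sumChunkedBody(raw) {
    var total = 0;
    var pos = 0;
    for (;;) {
        var lineEnd = raw.indexOf('\r\n', pos);
        var size = parseInt(raw.slice(pos, lineEnd), 16);
        if (size === 0) {
            return total; // zero-size chunk marks the end of the body
        }
        total += size;
        pos = lineEnd + 2 + size + 2; // skip size line, data, trailing CRLF
    }
}
```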

Looking at networkaccessmanager.cpp, NetworkAccessManager::handleStarted() sets bodySize to reply->size(); NetworkAccessManager::handleFinished() does not set bodySize, so presumably it is left as is and holds the size of the content (when not chunking) or of the first chunk.

QNetworkReply has a downloadProgress signal which returns bytesReceived and bytesTotal. Perhaps that could be used.

NetworkAccessManager::handleFinished could set the bodySize to the content-length where it is available.

It's a pity there does not appear to be a signal for each chunk (unless downloadProgress provides that), as it would then be possible to determine the downloaded size correctly by simply adding the chunk size to bodySize.

I did some more research and it appears QT must be removing the Content-Length header when gzip is used. I did the same request via telnet and via phantomjs; note chunking is not in use.

telnet response:-

Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
Set-Cookie: ASP.NET_SessionId=onq34pudvbwazeh04ksylpfs; path=/; HttpOnly
X-AspNetMvc-Version: 4.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
X-Frame-Options: SAMEORIGIN
Date: Wed, 29 Apr 2015 14:54:17 GMT
Content-Length: 22767

phantom response:-

Cache-Control = private
Content-Type = text/html; charset=utf-8
Content-Encoding = gzip
Vary = Accept-Encoding
Server = Microsoft-IIS/8.0
Set-Cookie = ASP.NET_SessionId=e02yvkniwvolblo31qyt42ia; path=/; HttpOnly
X-AspNetMvc-Version = 4.0
X-AspNet-Version = 4.0.30319
X-Powered-By = ASP.NET
X-Frame-Options = SAMEORIGIN
Date = Wed, 29 Apr 2015 14:51:42 GMT

It would appear QT is for some reason removing the header.

In networkaccessmanager.cpp, change

data["bodySize"] = reply->size();

to

data["bodySize"] = reply->header(QNetworkRequest::ContentLengthHeader);

This means that when Content-Length is passed, bodySize is correct.
It won't work (but then neither does the current code) when chunking is in use or Content-Length is not passed by QT, such as when gzip is used. Disabling gzip in the second case works around that issue.

From what I can see, size() is just the size of the QByteArray...

For gzip, you need to set the Accept-Encoding header yourself to accept gzip, as there is a bug in QT:

https://bugreports.qt.io/browse/QTBUG-41840

Content-Length is then returned; unfortunately you then run across bug https://forum.qt.io/topic/2308/content-encoding-gzip-with-qt-webkit/9 and the content is not decompressed.

Great digging work @djberriman! Go on!
Thousands of people are supporting you!

👍 👍 👍

QT does indeed specifically remove the Content-Length header on gzipped data:

void QHttpNetworkReplyPrivate::removeAutoDecompressHeader()
{
    // The header "Content-Encoding = gzip" is retained.
    // Content-Length is removed since the actual one send by the server is for compressed data
    QByteArray name("content-length");
    QList<QPair<QByteArray, QByteArray> >::Iterator it = fields.begin(),
                                                   end = fields.end();
    while (it != end) {
        if (qstricmp(name.constData(), it->first.constData()) == 0) {
            fields.erase(it);
            break;
        }
        ++it;
    }
}

From what I can see from the QT source code, it may well be worth using the QNetworkReply downloadProgress signal, which returns bytesReceived and bytesTotal. I believe this will also mean chunked data will work correctly, as it will fire for each chunk.

I appear to have a fix for this; not sure how to submit it, so I will work on that in a moment.

Basically phantomjs is not trapping one of the emits from QT, so the size returned is that of the first read. We need to add another stage, as well as 'start' and 'end', which I have called 'data'. If you cater for this in your onResourceReceived function and add up the res.bodySize returned each time it is triggered for a particular resource ('end' will return 0), then you will have the true size of the content. This should, I believe, work regardless of Content-Length being passed, gzip, or chunking. Do not rely on Content-Length.

Replace handleStarted() in networkaccessmanager.cpp with the following code.

void NetworkAccessManager::handleStarted()
{
    // Pointer types restored here; the original post's asterisks were
    // eaten by markdown formatting.
    QNetworkReply *reply = qobject_cast<QNetworkReply *>(sender());
    if (!reply)
        return;

    QVariantList headers;
    foreach (QByteArray headerName, reply->rawHeaderList()) {
        QVariantMap header;
        header["name"] = QString::fromUtf8(headerName);
        header["value"] = QString::fromUtf8(reply->rawHeader(headerName));
        headers += header;
    }

    QVariantMap data;
    if (!m_started.contains(reply)) {
        m_started += reply;
        data["stage"] = "start";
    } else {
        data["stage"] = "data";
    }
    data["id"] = m_ids.value(reply);
    data["url"] = reply->url().toEncoded().data();
    data["status"] = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute);
    data["statusText"] = reply->attribute(QNetworkRequest::HttpReasonPhraseAttribute);
    data["contentType"] = reply->header(QNetworkRequest::ContentTypeHeader);
    data["bodySize"] = reply->size();
    data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
    data["headers"] = headers;
    data["time"] = QDateTime::currentDateTime();

    emit resourceReceived(data);
}

Just be aware the total size returned appears to be the uncompressed size, not the Content-Length, when gzip is being used; I ran a test allowing gzip and one not allowing gzip and got the same results.

@djberriman Any thoughts on getting the gzip sizes?

@djberriman Thanks so much for this fix, this is exactly what I need for my project.

Can anyone give a general idea of the changes that should be made to the onResourceReceived function, especially in the context of the netsniff.js example (https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js)? I've built phantomjs with this fix but I'm a little unsure how to implement it in a script. Thanks!

EDIT: I seem to have solved my issue. For anyone else with as little phantomjs experience as I have who finds this thread, in the above example, you can change

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

to

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'data') {
        page.resources[res.id].startReply.bodySize += res.bodySize;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

And it should work with @djberriman's change.

@ariya @djberriman what's the resolution on this one?

@tufandevrim Just waiting for @ariya to put it in the main line

@ariya @djberriman ... was this finally merged in 2.1.1? The fix looks good to me.

Has this been solved? Thanks.

The onResourceReceived function should read more like:-

if (res.stage == 'start') {
    urlRequestedBytes[res.id] = res.bodySize;
} else {
    if (res.bodySize != undefined) {
        urlRequestedBytes[res.id] += res.bodySize;
    }
}

During my testing I found both 'data' and 'end' could return a size depending on whether chunking is in use, and that it can also be returned as undefined. To get the correct size in all cases you need to add up the value returned in bodySize at each of 'start', 'data' and 'end'.
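That rule (sum bodySize from every stage, guarding against undefined) can be packaged as a small self-contained helper. The callback shape mirrors PhantomJS's onResourceReceived payload as described in this thread; pageTotal is an illustrative name, not part of the API:

```javascript
// Accumulate per-resource transferred bytes across 'start', 'data'
// and 'end' notifications; any stage may report undefined bodySize.
var urlRequestedBytes = {};

function onResourceReceived(res) {
    if (res.stage === 'start') {
        urlRequestedBytes[res.id] = res.bodySize || 0;
    } else if (res.bodySize !== undefined) {
        urlRequestedBytes[res.id] += res.bodySize;
    }
}

// Total bytes across all resources seen so far.
function pageTotal() {
    return Object.keys(urlRequestedBytes).reduce(function (sum, id) {
        return sum + urlRequestedBytes[id];
    }, 0);
}
```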

Just a quick update on Content-Length with encoded responses (gzip). The lack of a Content-Length header was due to a feature of QT whereby it physically removed the header if the content was compressed. Following proof of the bug/feature and some discussion, the code that does this will now be removed from QT, which means Content-Length will always be passed if returned by the server (chunking servers, for instance, don't return a length).

@djberriman for a gzipped response you will probably have no Content-Length header, as the content will be streamed, which you can verify by checking for the header "Transfer-Encoding: chunked".
If the content has already been gzipped before (cache, disk, ...), the server will set the Content-Length header, as it knows the length of the gzip archive.

@djberriman with regards to the content length: the current version of QT will emit the Content-Length header through 'downloadMetaData', but I'm not convinced the value of the Content-Length header is really the best thing to use if you actually want the amount of bytes transferred; it omits the size of the headers, which, if you have a lot of cookies, can be significant, especially across all the requests required to render a web page.

It seems like using downloadProgress, which you mentioned earlier, might be a better approach, depending on your use case. Better yet would be if the QT library had something like reply->bytes_transferred. Based on the documentation of downloadProgress [1], it does seem like that is the best approach. Though I think QT removing the Content-Length header is kind of dumb too.

[1] http://doc.qt.io/qt-5/qnetworkreply.html#downloadProgress
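To illustrate the header-overhead point above, here is a rough sketch that estimates the on-the-wire cost of a header block from name/value pairs. headerBytes is a hypothetical helper, and the arithmetic only approximates the framing ("Name: value\r\n" per header plus the blank line ending the block, ignoring the status line):

```javascript
// Estimate bytes consumed by a header block: each header costs its name,
// the ": " separator, its value and a CRLF; the block ends with a CRLF.
function headerBytes(headers) {
    return headers.reduce(function (sum, h) {
        return sum + h.name.length + 2 + h.value.length + 2;
    }, 0) + 2;
}
```

Large cookies multiply this across every request on a page, which is why Content-Length alone undercounts the transfer.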

Strangely, I'm in need of the Content-Length header only. What's the state of play on this? Has this been resolved in a later version of Phantom? I'm using 2.1.1.