danieleteti / delphimvcframework

DMVCFramework (for short) is a popular and powerful framework for WEB API in Delphi. Supports RESTful and JSON-RPC WEB APIs development.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Performance issue and optimization ideas - TMVCJsonDataObjectsSerializer.JsonObjectToDataSet

MPannier opened this issue · comments

Hello,

I have a JSON array with 7500 elements which I load into a TFDMemTable with 38 fields by using this function FDMemTable1.LoadFromJSONArrayString.
In my case, this takes about 14 sec. I did some research to speed up the data load. I found some optimizations for FDMemTable (BeginBatch/EndBatch and AutoCalcFields := False;).
Now the "load" takes about 12 sec. It is faster but not fast enough.

I have done a test with a classic for i := 1 to 8000 loop to insert some hard coded fake data into my FDMemTable. I would like to know which part is the bottleneck. This takes less than 200 ms!
That means my FDMemTable and also the 8000 records are not the problem.

After that, I have tested the JSON parsing. I've done something like this:

lJsonArray := TJDOJsonArray.Parse(Res.BodyAsString) as TJDOJsonArray;
try
for var J := 0 to Pred(lJsonArray.Count) do
begin
var lJsonObj : TJsonObject := lJsonArray.Items[J].ObjectValue;
DataSet.Append;
DataSetField1.AsInteger := lJsonObj.I['Field1'];
DataSetField2.AsWideString := lJsonObj.S['Field2'];
...
DataSet.Post;
end;

This takes only 200 ms (with all 38 fields). Compared to 12 sec, it is really fast.
Why is LoadFromJSONArrayString slower?

I found the reason in TMVCJsonDataObjectsSerializer.JsonObjectToDataSet. For a new test, I have uncommented the following line:

//lName := GetNameAs(ADataSet.Owner, Field.Name, Field.FieldName);

//if (IsIgnoredAttribute(AIgnoredFields, lName)) or (IsIgnoredComponent(ADataSet.Owner, Field.Name)) then
//continue;

//lName := TMVCSerializerHelper.ApplyNameCase(GetNameCase(ADataSet, ANameCase), lName { Field.FieldName } );

lName := Field.FieldName; //uncomment the previous lines and use only this one

Now the data load takes only 200 ms.
The problem-functions are GetNameAs, IsIgnoredComponent and GetNameCase. All of these functions use RTTI to get some information from TField.
I know we can't remove this functions, but can we speed up these functions?
I have some ideas, but I'm not sure what the best one is.
First a question: Can TField have the attributes MVCNameAsAttribute, MVCDoNotSerializeAttribute and MVCNameCaseAttribute? Maybe we can remove one of these functions?
Is it possible to check for all attributes at the same time? Then the RTTI stuff is done only one time, not three times.
Is caching an option? (field name and a list of attributes)
Is an additional parameter an option? Something like "IgnoreNameCase" or "FieldNameAsIs"?
I have no experience with RTTI. Maybe the RTTI code can be optimized?

What do You think? Any other suggestions?

Yes, unfortunately this method is called for each item (n) in the json array (from procedure TMVCJsonDataObjectsSerializer.JsonArrayToDataSet)

And the method you referenced calls those RTTI methods for each field (n) so this becomes an O(nm) performance calculations (8,000 rows x 38).

My initial thoughts would be to either build a RTTI structure (via a call to a nested proc) and then pass that structure to the method that converts each element of the array to a dataset row.
Others may have better ideas.

Yes guys, that part of code has been under my radar for too much time... I've to give it some attention.
@MPannier can you provide a reproducible and selfcontained example so that we can have a reference?
I'll try to think some solution in this week.

I have made a simple example (console application) with latest DMVC units and Delphi 11.
It seems to be faster than my real project (Delphi 10.4 and DMVC 3_1_0). But the RTTI stuff is still slower than without RTTI.
Example.zip

@MPannier the repo version contains many improvemenets over the 3.2.1-carbon so sound good that its faster than the latest released version. We can however still use the test you wrote to fine tuning it and, perhaps, put this test in the test suite as performance test.

Happy to help with testing

Hi guys, do you found some areas to improve with a reasonable gain? The current version is quite fast in the @MPannier test.

I will do a new test with my real world project this week and give You feedback.

I wrote a simple VCL app to create and populate a 3 column dataset with 20k records, serialise them out to a JSON string and then read them back.

The deserialisation with the current code base is no faster than the codebase we have in our production code (from Sept 2021). We're seeing around 10 seconds to deserialise the JSON, the root of the issue is the repeated lookup of the dataset field using RTTI. 20k x 3 is 60k lookups where 3 would suffice.

ProjectJSON.zip

To improve performance you'd need to providing an overload to
procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject; const ADataSet: TDataSet; const AIgnoredFields: TMVCIgnoredList; const ANameCase: TMVCNameCase);
so that it looks more like
procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject; const ADataSet: TDataSet; const ADataSetFields: TMVCDataSetFields);

And do the setup of the TMVCDataSetFields before calling this new method once for each record.

Hello,

I have done the tests with the latest version of DMVC Framework from yesterday in my real project. And it was still slow. It needs 15000 milliseconds. My Dataset contains about 7800 records and has 41 columns.
If I comment the lines with GetNameAs, IsIgnoredAttribute und TMVCSerializerHelper.ApplyNameCase in MVCFramework.Serializer.JsonDataObjects.pas → TMVCJsonDataObjectsSerializer.JsonObjectToDataSet
It is much faster. It needs only 300 - 400 milliseconds.

//      lName := GetNameAs(ADataSet.Owner, Field.Name, Field.FieldName);
//
//      if (IsIgnoredAttribute(AIgnoredFields, lName)) or (IsIgnoredComponent(ADataSet.Owner, Field.Name)) then
//        continue;
//
//      lName := TMVCSerializerHelper.ApplyNameCase(GetNameCase(ADataSet, ANameCase), lName { Field.FieldName } );
lName := Field.FieldName;

But why is the test project much faster than my real world project? I have debugged the project and find out, that within the test project the function TMVCAbstractSerializer.GetNameAs gets nil at the line ObjFld := ObjType.GetField(AComponentName);
In my real world project not. Why? I don't know. But this will make the deserialization faster. I decide to make the test project as similar as possible as my real project.
That means I have added a DataModule with a FDMemTable and 40 Fields at Design Time. And now the test project is also slow :-) Please try and verify this statement.

As @fastbike wrote and as I have thought, the problem is the repeated RTTI call for each column and for each record. And there are 3 procedures which calls RTTI. In my case 7800 * 41 * 3.

How can we solve this performance problem?
One easy solution could be to made only one call to RTTI (combine the calls to GetNameAs, IsIgnoredAttribute, IsIgnoredComponent and GetNameCase).
Another idea is some kind of raw-parameter or an ignore-NameCase or ignore-MVCAttributes. In my case I don't need this functionality. See my code sample above. lName := Field.FieldName; works for me.
A universal solution could be some kind of cache (of RTTI calls). In DeserializeDataSet You could read all RTTI stuff of all fields and in JsonObjectToDataSet this cache could be used.

In attachment there is the new test project.
Example2.zip
image

I've made a small change to use the TMVCDataSetFields list, which is populated by calling GetDataSetFields in the TMVCJsonDataObjectsSerializer.JsonArrayToDataSet method , before enumerating the json array.

Time has now fallen from ~ 10 seconds to ~115 milliseconds. Two orders of magnitude faster which is a massive win !

The only gotcha at this stage is that I commented out the lines handling nested datasets to get the code to compile. Either we take a hit on having to make a separate calls to GetDataSetFields for each record that is a nested dataset or figure out another way of handling what could possibly be a recursive nested dataset scenario.

procedure TMVCJsonDataObjectsSerializer.JsonArrayToDataSet(const AJsonArray: TJDOJsonArray; const ADataSet: TDataSet;
  const AIgnoredFields: TMVCIgnoredList; const ANameCase: TMVCNameCase);
var
  I: Integer;
  Fields:  TMVCDataSetFields;
begin
  Fields := GetDataSetFields(ADataSet, AIgnoredFields, ANameCase);
  for I := 0 to Pred(AJsonArray.Count) do
  begin
    ADataSet.Append;
    JsonObjectToDataSet(AJsonArray.Items[I].ObjectValue, ADataSet, Fields);// AIgnoredFields, ANameCase);
    ADataSet.Post;
  end;
end;

and

procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject;
  const ADataSet: TDataSet;   const ADataSetFields: TMVCDataSetFields);
var
  J: Integer;
  Field: TField;
  lName: string;
  SS: TStringStream;
  SM: TMemoryStream;
  NestedDataSet: TDataSet;
begin
  if (ADataSet.State in [dsInsert, dsEdit]) then
  begin
    for J := 0 to ADataSetFields.Count - 1 do
    begin
      lName := ADataSetFields[J].FieldName;
      Field := ADataSet.Fields[ADataSetFields[J].I];
      // case GetNameCase(ADataSet, ANameCase) of
      // ncLowerCase:
      // name := LowerCase(Field.FieldName);
      // ncUpperCase:
      // name := UpperCase(Field.FieldName);
      // end;
      if not AJsonObject.Contains(lName) then
        continue;
      if (AJsonObject[lName].Typ = jdtObject) and (AJsonObject.Values[lName].ObjectValue = nil) then
      // Nullable Type
      begin
        Field.Clear;
        continue;
      end;
      case Field.DataType of
// etc

I'd also suggest renaming the record property - for clarity - from "I" to "Index" if that does not break too many things

I can confirm. The change from @fastbike works much faster.
This is what I mean with RTTI cache.

Sounds Good! We need to find a way to get this improvements mantaining the same functionality level. I'll give a look ASAP. Thanks guys.

Hi guys. I did some changes based on your suggestion. That caused some refactoring about methods visibility, paramaters, types and so on. However, it should be OK. All the unit tests (currently 429) passed OK. Please, check it on your side, and thanks for your support.