greedo / python-xbrl

xbrl parser written in Python :bulb:

Home Page: https://pypi.python.org/pypi/python-xbrl

Parsing the first matching value

artemk93 opened this issue

I am using the Arelle app to open the XML files and double-check the output of the code below.

from xbrl import XBRLParser, GAAP, GAAPSerializer

xbrl_parser = XBRLParser()
xbrl = xbrl_parser.parse('aapl-20150627.xml')
gaap_obj = xbrl_parser.parseGAAP(xbrl, doc_date='20150627', context='current', ignore_errors = 0)
serializer = GAAPSerializer()
result = serializer.dump(gaap_obj)

print result

Output:

MarshalResult(data={u'liabilities': 65285.0, u'net_cash_flows_financing_continuing': 0.0, u'revenue': 0.0, u'income_tax_expense_benefit': 3796.0, u'common_shares_authorized': 0.0, u'income_from_equity_investments': 0.0, u'preferred_stock_dividends': 0.0, u'redeemable_noncontrolling_interest': 0.0, u'extraordary_items_gain_loss': 0.0, u'temporary_equity': 0.0, u'costs_and_expenses': 0.0, u'non_current_assets': 4081.0, u'net_cash_flows_discontinued': 0.0, u'net_cash_flows_investing_discontinued': 0.0, u'liabilities_and_equity': 273151.0, u'other_operating_income': 0.0, u'operating_income_loss': 0.0, u'income_before_equity_investments': 0.0, u'net_income_parent': 0.0, u'equity': 0.0, u'income_loss': 14083.0, u'cost_of_revenue': 0.0, u'operating_expenses': 5598.0, u'noncurrent_liabilities': 0.0, u'current_liabilities': 0.0, u'net_cash_flows_investing': 0.0, u'stockholders_equity': 125677.0, u'net_income_loss': 10677.0, u'net_cash_flows_investing_continuing': 0.0, u'nonoperating_income_loss': 0.0, u'common_shares_outstanding': 0.0, u'net_cash_flows_financing': 0.0, u'net_income_shareholders': 0.0, u'comprehensive_income': 9065.0, u'equity_attributable_interest': 0.0, u'commitments_and_contingencies': 0.0, u'comprehensive_income_parent': 9065.0, u'net_cash_flows_operating_discontinued': 0.0, u'comprehensive_income_interest': 0.0, u'other_comprehensive_income': 0.0, u'equity_attributable_parent': 0.0, u'assets': 3991.0, u'common_shares_issued': 0.0, u'gross_profit': 19681.0, u'net_cash_flows_operating_continuing': 0.0, u'current_assets': 0.0, u'interest_and_debt_expense': 0.0, u'net_income_loss_noncontrolling': 0.0, u'net_cash_flows_operating': 0.0}, errors={})

The problem is that every value is taken from the first matching tag in the XML file. For example, liabilities = 65285.0 is actually the value of us-gaap:LiabilitiesCurrent, which comes before us-gaap:Liabilities.
Likewise, assets = 3991.0 is actually the value of us-gaap:FiniteLivedIntangibleAssetsAccumulatedAmortization, which comes before us-gaap:Assets = 273 151 000 000.

I believe this can be solved by slightly changing the part of parseGAAP() in xbrl.py where xbrl.find_all is called for every value (assets, current_assets, etc.).

In xbrl.py, inside parseGAAP(), anchoring the pattern with $ seems to solve the problem:
liabilities = xbrl.find_all(name=re.compile("(us-gaap:liabilities$)", re.IGNORECASE | re.MULTILINE))
The same applies to assets or any other tag name.
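
To illustrate why the anchor matters, here is a minimal sketch that uses only the standard `re` module. It relies on the fact that BeautifulSoup filters tag names with the pattern's `search()` method, so an unanchored pattern also matches longer names that merely contain the text. The unanchored pattern below is an assumption about what the library effectively does, not a copy of its source, and the third tag name is added purely for illustration.

```python
import re

# Tag names in document order, as described in this issue.
# The third name is an illustrative addition, not taken from the filing above.
tag_names = [
    "us-gaap:LiabilitiesCurrent",
    "us-gaap:Liabilities",
    "us-gaap:LiabilitiesAndStockholdersEquity",
]

# Assumed unanchored pattern: matches any name containing "us-gaap:liabilities".
loose = re.compile(r"(us-gaap:liabilities)", re.IGNORECASE | re.MULTILINE)

# Pattern proposed in this issue: "$" requires the name to end right there.
anchored = re.compile(r"(us-gaap:liabilities$)", re.IGNORECASE | re.MULTILINE)


def matching(pattern, names):
    """Return the names the pattern matches via search(), in original order."""
    return [name for name in names if pattern.search(name)]


loose_hits = matching(loose, tag_names)
anchored_hits = matching(anchored, tag_names)

print(loose_hits)     # every name containing the text, LiabilitiesCurrent first
print(anchored_hits)  # only the exact us-gaap:Liabilities tag
```

With the unanchored pattern, the first hit is us-gaap:LiabilitiesCurrent, which would explain the wrong liabilities value above. The anchored pattern leaves only us-gaap:Liabilities.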

I've been working with the code, changing it slightly to look for the values that I want (I hope that is OK). If it helps, I can post my changes here, along with the output that I get.

@artemk93 if you think the changes would be valuable to other people, go ahead and submit a PR with what you have and we can work on it. Thanks!

Thanks for the parser, @greedo.

I am planning to follow in @artemk93's footsteps and make piecemeal changes, but wanted to check whether there has been any update on your and @artemk93's work. It does not look like he/she ever actually submitted a PR.

No progress on it yet, @artemk93, and I would gladly welcome your contributions.