Wanalyzer is a simple Python module providing tools to parse and analyze exported WhatsApp conversations (via the Export Chat
feature of WhatsApp).
To install Wanalyzer, you can simply use pip install analyser
or use the setup.py
python script.
And import it to your project with import wanalyzer as wa
.
You can choose between:
- Using the provided conversation parser and send the file path as
str
to theConversation
constructor. - Do the parsing part yourself and send a
List[Message]
to theConversation
constructor (can be useful if your input formatting is not exactly the same as the WhatsApp exported chat feature).
Message dataclass description
Attribute | Type | Description |
---|---|---|
date | datetime |
Time (precise with secs) of the message |
author | str |
Author of the message |
content | str |
Content of the message |
msg_type | MessageType |
Type of the message |
Message.MessageType
can be TEXT
, STICKER
, AUDIO
, IMAGE
, VIDEO
, or FILE
.
You are able to create your own filters of type Callable[[Message], bool]
: It have to be a function taking the Message
as parameter, and return false
if it have to be removed from the analysed conversation.
Some built-in filters as already available:
filter__min_size
: Remove all messages less than the given size.filter__from_authors
: Remove all messages that are not from the given authors.filter__contains_texts
: Remove all messages not containing any of the given texts.filter__not_contains_texts
: Remove all messages containing any of the given texts.
Wanalyzer provide both member based, and time based statistics tools.
Member Statistics
With the Conversation.get_members_stats()
method, you can get a List[MemberStat]
describing the statistics of each member of the conversation.
The MemberStat
dataclass gives informations about the messages sent by the member.
MemberStat dataclass description
Attribute | Type |
---|---|
member_name | str |
member_conv | str |
messages | List[Message] |
msg_count_by_type | Dict[MessageType, int] |
The msg_count_by_type
attribute provide the messages count of each type.
Time scale splitting
With the Conversation.get_split_by(scale: TimeScale)
method, you can get a Dict[datetime, Conversation]
dictionary that split the original conversation into smaller ones, taking the messages sorted by the given TimeScale
: YEAR
, MONTH
, DAY
, HOUR
, MINUTE
or SECOND
.
Pull requests are opened to add any kind of features or improvements in the code.
Please make sure that all the regression tests are passing (using test.sh
) and to clean the repository (using clean.sh
) before submitting your pull request.