kaniska / kaniska.github.io

Goal

  • Design, develop and implement data-driven products using the right mix of skills and technologies for Data Engineers, Data Scientists and Data Analysts.

Favorite Guides

  • [Building a Data Science team]
  • [ML Problem Framework]
  • [Business Problem Framework]

Key Professional Projects

Healthcare Insurance & Retail Innovations

  • Responsible for AI-driven Transformative Innovation for the largest Healthcare and Pharmacy Service Provider.
  • Conceptualized and initiated the development of products like Enterprise Medical Insights Discovery through Advanced NLP and Knowledge Graph.
  • Spearheaded the extensive Due-diligence and Business-Impact analysis, which eventually led to Multi-disciplinary initiatives for Prior-Authorization analysis, Claims analysis, and Clinical Notes, Prescription and Provider content analysis.
  • Designed and initiated development of integrated, orchestrated ML Workflow and ML Ops solutions from scratch in the domains of Care Management, Member Analytics, Pharmacy Business and Next-Best-Action (NBA) Campaigns to reduce medical costs and personalize recommendations.
  • Leading multiple teams to deliver results through extreme collaboration with stakeholders, a deep understanding of the business and market, and new growth strategies.

eCommerce Retail: Improve Customer Experience by creating value from eCommerce data

  • Conceptualized and implemented products in the domain of Marketing Decision Science to combine billions of signals from User Engagement data, User Intents, Product feed and Site page content using Recommendations, Clustering, Classification and advanced NLP techniques.
  • Built systems to crunch petabytes of data to support $4B revenue.
  • Developed a Customer-centric, Intent-driven Business model for optimizing Search marketing and Search Engine performance.
  • Designed and developed an Experimentation platform (SEO Split Tests).

IIOT: Smart City Platform for Sensity Systems (Verizon IOT & Smart Communities)

IIOT: Award-winning Real-time Location Tracking (RTLS) product 'Where' for the startup Enlighted Inc

IIOT & Networking: Industry's first Network Insights Graph

Cloud Analytics: Founding member of the product DCBA (Dell Cloud Business Applications) & Kitenga incubated by Dell Silicon Valley RnD Center

Development Tools: Design and development of the JackBe Unified Mashup Protocol

http://mdc.jackbe.com/prestodocs/v3.2/mashup-studio/emml-editor.html

Development Tools: Conceptualization and development of a Graphical Model Transformation Framework, which was adopted as the Business Object Modeling Tool by Tibco BPM

https://docs.tibco.com/pub/business-studio-bpm-edition/4.1.0/doc/html/GUID-C7186329-3F2F-4FA5-85A8-C356519618D6.html
http://media.tibco.com/flash/soa/tibco_soa_preso.html

Created Award-winning Tibco Rules Engine Language Editor and State Machine Diagrammer

Core contributor and owner of these tools which generated massive revenues.

https://docs.tibco.com/pub/businessevents-express/5.2.1/doc/html/GUID-791378AB-73F6-417D-BFBE-AA64192D95AE.html
https://docs.tibco.com/pub/businessevents-enterprise/5.5.0/doc/html/GUID-C6901441-911D-4257-891C-D8BEF9F1CD0E.html

Healthcare Startup initiatives

  • Product Idea - Pill Reminder
  • Product Idea - Drugs Digital Ads
  • Product Idea - Pain Management Drugs Search - Web Analytics and Phone App

Personal Projects, Lessons Learnt, Open Source, Articles

Mentoring Startups

-- Spotle.AI: Board Member -- involved in a non-profit initiative to advise the company, teach Deep Learning and help students with ML projects

-- PrivaClave: Advisor, honorary CTO -- Mentoring the development of Graph DSL and Content Rule mining

-- AIForMankind -- Lead a group of volunteers to implement projects for social good, such as Wildfire Detection -- 2nd prize winner for 'Wildfire Detection' hackathon -- Launched Covid19 Hackathons -- Image Classification and Object Detection Techniques -- NLP Techniques

Personal RnD

Machine Learning Series

-- Code, papers, blogs -- Open Source Contributions -- Project COVITA -- Covid-19 Text Analysis for Social Good -- Covid-19 Technical Research -- Project Skylab -- Early detection of Wildfire -- Wildfire detection and mitigation using deep-learning -- Project Spark-NLP -- Algorithm Performance Analysis -- Notes on Selecting ML Algorithms -- Developed a Weka-based tool to compare ML algorithm performance -- Contributed to Open Source ‘randomized optimization’ library -- Algo Analysis Papers -- Supervised Algo -- Unsupervised Algo -- RL Algo -- Randomized Optimizations Algo -- Regression Modeling in R (Logistic & Linear Regression) -- Exploratory Analytics

-- Online Shopping Experience -- Opportunities for innovation in eCommerce -- Data Science of Selling Product -- Computational Investments -- Algorithmic Strategies -- Predictive Trading -- Code using Python, H2O ML lib

-- Medical Data Analysis -- MedBots (Drugs Data Analysis and Search) -- Drugs Data Mining Article -- Drugs Data Mining Code -- Lab Setup for HealthCare Data Analysis (Spark-ML, python, scala) -- Mortality Prediction -- Patients Disease Clustering -- Analyze FHIR -- Medical Entity extraction using Spark NLP -- Knowledge-Based AI techniques -- AI Agent teaching human -- AutoGrader Agent -- Machines Solving Psychometric Test -- Robotics & IOT
-- Location Tracking, Kalman Filters, GBR in Robotics -- Building a Smarter Globe -- Object Tracking in Constrained Environment -- Image Detection and Sentiment Analysis -- Comprehensive guidelines for Image Analysis & Object Detection -- Wildfire Object detection -- MNIST data analysis (Lin Reg - R)
-- Image Detection and Sentiment Analysis (Deep Learning) - Python-Keras -- Course Recommendation -- Apriori Analysis for Course Recommendation -- Apriori code example

-- NLP -- Comprehensive research on NLP techniques -- Flu outbreak prediction using Twitter texts -- Twitter / FB text analysis for Entity Recognition (Gate API) -- Product Review content analysis (Stanford NLP) and topic clustering -- Product Category and Movie Genre Analysis (R - Document Term Matrix) -- Network Vocabulary matching (TensorFlow library for term analysis) -- Drugs data extraction (Solr Rank) and boost rank of drugs attributes -- Drugs Annotation using UIMA analysis library -- Text message analysis (near term matching, dis-max parser) -- Misc -- Time Series Anomaly -- Graph Analysis -- Graph Query and Domain Vocabulary -- TensorFlow Application -- Product Recommendation, Review Data Analysis -- Movies Gross Prediction (Linear Regression)

Learning from Conferences, Hacking

-- Latest Tools for Machine Learning Workflow Orchestration, Data Labelling, Model Deployment, Tuning, Tracking & Automation -- Making models explainable, interpretable and ethical -- Building Machine Learning Pipelines (2016) -- Machine Learning Platform & Infrastructure (2016) -- ML-2015 ~ Data Science Use Cases

Data Streaming & ETL Series

-- Evolution of Streaming Technology -- Streaming ETL-Part 3 ~ Apache Beam, Spark Streaming, Kafka Streams, MapR Streams -- Apache-Edgent-IOT-Analytics -- What's a Streaming Data Management System? -- Streaming ETL-Part 2 ~ Develop a fault-tolerant, scalable messaging and streaming solution (Kafka, Samza, SQL-Stream, DataTorrent) -- Build a Micro-batch Streaming and Ingestion System -- Streaming ETL-Part 1 ~ Real-time ETL on Hadoop

Data Modeling, Persistence Series

-- Data Modeling best practices-Part 2 -- Statistical Data Analysis using Time Series DB -- Graph Modeling and Query -- Fast Data Transfer using Parquet, Arrow -- Gremlin Domain Specific Traversal -- Data Modeling Reference-Part 1 -- Lightweight Persistence Service for Dynamic Entities -- Using high-speed cache

Big Data Analytics

-- Big Data Analytics - core concepts & key technologies -- MS CS Course ~ Big Data Project for HealthCare Informatics -- Hands-On Spark -- [Analytics Part-3 ~ Ad-hoc Big Data Crunching using Couchbase, Spark SQL, Solr-Spark] -- Continuous Automated Actionable Analytics -- Advantages of ColumnStore -- Optimize Query for old-fashioned Data-warehousing -- Analytics Part-4 ~ Thinking beyond Hadoop: build Fault-tolerant High-throughput low-memory Real-time analytics -- Analytics Part-5 ~ Ad-hoc Analytics on Big Data: Key Technologies -- Analytics Part-2 ~ Approaches for querying current and historical data -- Git repo ~ real-time analytics app using NodeJs-Mongo-Redis-Socket.io -- Git repo ~ real-time rules execution using logstash -- Git repo ~ real-time triggers using ES Percolator -- Git repo ~ Spark Analytics -- Git repo ~ crawlers (FB, Twitter, LinkedIn) -- Data Analysis using R

Data Integration

-- Data Integration Tricks -- Application Integration Platform as a Service

Data Search

-- Building an Enterprise Search Application using SOLR and CASSANDRA -- SOLR Ranking and Query Boosting -- Build a Search App using MongoDB, GitHub repo

Data-driven Enterprise Analytics System

-- Step-by-step approach to build a multi-tenant SaaS analytics system (2012) -- Multi-Tenancy in SaaS Modules -- ETL for SaaS System -- RnD Code ~ build a simple data ingestion framework

Hadoop Series

-- Big Data Analysis in AWS EMR -- Programming with Cloudera Hadoop 4 (Ubuntu) -- Git repo ~ Spring-Hadoop App

Development & Architecture Best Practice

-- Coding Tricks to build robust systems -- Java Reloaded -- Designing better API -- Taming Server-side threads -- Best Practices for developing Scalable System -- Learning from JDK Source and Language Specs

Performance Tuning Series

-- Tuning Titan Graph -- Tuning Kafka -- Tuning ElasticSearch -- Tuning Hadoop Stack -- Challenges in streaming large data set -- Tuning HBase -- Tuning JVM -- Tuning MySQL

Build QuickFire Apps

-- Create a Spark App -- Build a Dynamic Workflow using MEAN/MERN -- JSON Search App using MongoDB -- Product Overview Dashboard -- Medical Data Analysis using FHIR

Security

-- Secure Access to MongoDB using bcrypt -- Enforce Security checks on public API -- Securing Application Access

Infrastructure, Fault-Tolerance

-- Building MicroService -- [Run Apps using Mesos & Docker] -- MongoDB Fault-Tolerance Strategy -- git repo: Failover tests, Split-brain scenario -- Setup LAMJ

UI

-- ReactJs -- Learning MEAN Stack

Miscellaneous

-- Bigdata Reference (2012) -- Building a Domain-Specific Graphical Modeling Tool -- Building a Startup Environment -- Key Features of Ad Serving Technology -- Coding with CloudFoundry -- Riding Semantic Wave

Blog posts

-- Blogs (http://wp.me/p2fcIs-3R)
