bigcode-project / bigcode-analysis

Repository for analysis and experiments in the BigCode project.

BigCode Analysis

This repository collects the analysis done in the BigCode project. You can find analyses of datasets, models, architecture choices, and more.

Contents

  • Data analysis: the data_analysis folder provides code for:

    • Near deduplication
    • Python data analysis:
      • Natural language distribution in comments/docstrings
      • Data decontamination for HumanEval and MBPP benchmarks
      • Percentage of files that can be successfully compiled
      • Percentage of configuration and test files
      • Exploration of UniMax sampling on The Stack
    • Notebooks with early data and model loss analysis
  • Multi-Query Attention experiments: for details, please refer to multi_query_experiments/README.md
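
The near-deduplication code itself lives in data_analysis and is not reproduced here. As a minimal sketch of the MinHash idea that near-deduplication pipelines commonly rely on, the snippet below estimates the Jaccard similarity between the 5-token shingle sets of two files. All function names, the shingle size, and the use of seeded blake2b hashes in place of random permutations are illustrative assumptions, not the repository's settings.

```python
import hashlib
import re


def shingles(text: str, n: int = 5) -> set[str]:
    """Return the set of n-token word shingles of `text` (hypothetical helper)."""
    tokens = re.findall(r"\w+", text)
    if len(tokens) < n:
        return {" ".join(tokens)}
    return {" ".join(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def minhash_signature(items: set[str], num_perm: int = 128) -> list[int]:
    """For each of `num_perm` seeded hash functions, keep the minimum hash over `items`."""
    signature = []
    for seed in range(num_perm):
        hashes = (
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big"
            )
            for item in items
        )
        signature.append(min(hashes))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Share of signature slots where both files have the same minimum hash."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Ground-truth Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


doc_a = "def add(a, b):\n    return a + b\n# simple helper to add two numbers\n"
doc_b = "def add(a, b):\n    return a + b\n# simple helper to add two values\n"
sh_a, sh_b = shingles(doc_a), shingles(doc_b)
print(exact_jaccard(sh_a, sh_b))  # 8 shared shingles out of 10 distinct -> 0.8
print(estimated_jaccard(minhash_signature(sh_a), minhash_signature(sh_b)))
```

Pairs whose estimate exceeds a chosen threshold are treated as near-duplicates. At dataset scale, pipelines typically add locality-sensitive hashing (splitting each signature into bands) so that only candidate pairs are compared instead of all pairs.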
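
This README does not describe the decontamination rule used for HumanEval and MBPP. One simple approach is to drop any training file that contains a benchmark prompt or solution verbatim. The sketch below assumes that approach with whitespace normalization; the function names and the snippet text are hypothetical, and the repository's actual matching may differ.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace runs so re-indentation or line wrapping does not hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def is_contaminated(file_text: str, benchmark_snippets: list[str]) -> bool:
    """True if any non-empty benchmark snippet occurs verbatim in the normalized file."""
    haystack = normalize(file_text)
    return any(normalize(s) in haystack for s in benchmark_snippets if s.strip())


# Hypothetical stand-in for a benchmark docstring; not actual HumanEval/MBPP text.
snippets = ['"""Check whether any two numbers are closer than t."""']

leaked = 'def has_close(xs, t):\n    """Check whether any two numbers\n    are closer than t."""\n    ...\n'
clean = "def mean(xs):\n    return sum(xs) / len(xs)\n"

print(is_contaminated(leaked, snippets))  # True, despite the line break inside the docstring
print(is_contaminated(clean, snippets))   # False
```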
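
How "successfully compiled" is measured is not stated here. A minimal sketch, assuming it means that a Python file passes the built-in `compile()` under the running Python 3 interpreter (the helper names are hypothetical), could compute the rate like this:

```python
def compiles(source: str) -> bool:
    """True if `source` is valid syntax for the running Python 3 interpreter.

    compile() only parses and byte-compiles; it never executes the file.
    """
    try:
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError):  # ValueError: null bytes on Python < 3.12
        return False
    return True


def compile_rate(sources: list[str]) -> float:
    """Fraction of sources that compile; 0.0 for an empty list."""
    if not sources:
        return 0.0
    return sum(compiles(s) for s in sources) / len(sources)


files = [
    "x = 1\n",
    "print 'hello'\n",        # Python 2 syntax: fails under Python 3
    "def f(:\n    pass\n",    # plain syntax error
    "import os\n",            # compiles; the import is not run
]
print(compile_rate(files))  # 0.5
```

Under this definition, Python 2-only files count as failures, which is worth keeping in mind when interpreting the percentage.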
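
UniMax sampling (Chung et al., 2023) allocates a token budget as uniformly as possible across languages while capping how many epochs any single language may be repeated. A minimal sketch of that allocation rule follows. The function name and the toy corpus sizes are hypothetical and do not reflect the notebook's code or The Stack's statistics.

```python
def unimax(sizes: dict[str, float], budget: float, max_epochs: float) -> dict[str, float]:
    """Sampling probabilities per language under the UniMax allocation rule.

    sizes: tokens available per language; budget: total tokens to train on;
    max_epochs: maximum number of passes allowed over any one language.
    """
    remaining = budget
    allocation: dict[str, float] = {}
    ordered = sorted(sizes, key=sizes.get)  # smallest corpus first
    for i, lang in enumerate(ordered):
        uniform_share = remaining / (len(ordered) - i)  # split leftover budget evenly
        allocation[lang] = min(uniform_share, sizes[lang] * max_epochs)  # epoch cap
        remaining -= allocation[lang]
    total = sum(allocation.values())
    return {lang: amount / total for lang, amount in allocation.items()}


# Hypothetical corpus sizes (in tokens), not statistics of The Stack.
probs = unimax({"a": 10, "b": 100, "c": 1000}, budget=900, max_epochs=2)
print(probs)
```

In this toy run, language "a" is capped at 2 epochs (20 tokens), "b" at 200 tokens, and "c" absorbs the remaining 680, so small languages are upsampled without being repeated more than `max_epochs` times.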
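
In multi-query attention, all query heads share a single key/value head, so the key/value cache kept during autoregressive decoding shrinks by a factor equal to the number of heads. The arithmetic below illustrates this with hypothetical model dimensions; the numbers are not taken from the experiments in multi_query_experiments, and the helper name is illustrative.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_value: int = 2,  # 2 bytes for fp16/bf16
) -> int:
    """Memory for cached keys and values during autoregressive decoding."""
    # Leading factor 2: one key tensor and one value tensor per layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


# Hypothetical model: 24 layers, 16 query heads of size 128, 2048-token context, batch 8.
mha = kv_cache_bytes(24, n_kv_heads=16, head_dim=128, seq_len=2048, batch_size=8)
mqa = kv_cache_bytes(24, n_kv_heads=1, head_dim=128, seq_len=2048, batch_size=8)
print(mha / 2**30, mqa / 2**20)  # prints 3.0 192.0 (GiB for multi-head, MiB for multi-query)
```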

About


License: Apache License 2.0


Languages

Jupyter Notebook 98.3%, Python 1.7%, Shell 0.0%