esp8266 / Arduino

ESP8266 core for Arduino

PWM and tone() incompatibility

davidevertuani opened this issue · comments

Basic Infos

Hardware

Hardware: NodeMCU ESP8266
Core Version: 2.4.0

Description

Hi. I'm having trouble using tone() and analogWrite() together on the NodeMCU ESP8266.
Calling only tone() works fine. However, calling analogWrite() on another pin, setting it back to 0, and then calling tone() (on a different pin) only produces an annoying noise.

Sketch

void setup() {
  int pin = 12;
  pinMode(pin,OUTPUT);
  pinMode(14,OUTPUT);
  
  tone(pin,440);
  delay(1000);
  noTone(pin);
  //the tone is emitted correctly
  
  delay(1000);
  
  analogWrite(14,200);  
  delay(1000);
  analogWrite(14,0);
  
  tone(pin,440);
  delay(1000);
  noTone(pin);  
  //no tone is emitted
}

void loop() {
}

I have the same problem too.

Hardware: NodeMCU V1.0 (but I chose "Generic ESP8266 Module" in Arduino IDE, v1.8.5)
Core: 00000000
SDK: 2.2.1(cfd48f3)
CPU Freq: 160

I tested with core 2.3.0 and it works!

Hardware: NodeMCU V1.0 (but I chose "Generic ESP8266 Module" in Arduino IDE, v1.8.5)
Core: 2_3_0
SDK: 1.5.3(aec24ac9)
CPU Freq: 160

I have the same problem too. With core version 2.4.1 for NodeMCU 1.0 (ESP-12 Module), after "tone" on one pin and then "analogWrite" on another pin, "tone" doesn't work anymore. With core version 2.3.0 the behavior is different, but also not usable: after "tone" on one pin, "analogWrite" works correctly on another pin, but after "noTone", "analogWrite" doesn't work anymore. I don't know how to flash the "Generic ESP8266 Module" for my NodeMCU. All four flash modes caused an upload error.

Personally, in my current project I prefer tone(), so I have removed all analogWrite lines from my code.
I will wait for a solid solution and then insert those lines again.

I don't know how to flash the "Generic ESP8266 Module" for my NodeMCU.

I am on fairly recent Git core, and assuming you are using Arduino IDE like me, this is how I flash my NodeMCU V1.0.

[Screenshot: Arduino IDE board settings used for flashing]

I also have a NodeMCU V3 Lolin; I'm not sure whether it is a clone or genuine. I can flash it with the above settings, but then it won't run. After simply changing the flash mode from DIO to QIO, it works.

Thank you, maybe I had forgotten the Reset Method: "nodemcu". But the behavior is the same as with NodeMCU v1 and core version 2.3. After using tone, then analogWrite, and then tone once more, analogWrite doesn't work anymore. The best solution would be to write my own tone and analogWrite functions that work together. But if I used timer1 for that, the serial connection probably wouldn't work anymore.

I wrote my own tone and noTone functions, named itone and noItone. itone works with frequencies starting at 1 Hz. But with core 2.4.1, itone shows the same behavior as tone after analogWrite has been used. analogWrite is very aggressive: once it has been called even a single time, timer1 can no longer be used for other purposes, because analogWrite always grabs timer1 back. This means that if I need timer1, and I need it for precise measurements, then analogWrite must not be used.

These are my itone and noItone functions:

/*
    ESP8266 example for tone via timer1
    Hardware: NodeMCU
    2018
    tone using Timer
*/

#define ITONE_CYCLE 2500000 // half of a 1 second cycle in 1/16 CPU clock ticks (1/5 µs) for 80 MHz; the ISR toggles the pin once per half period
// #define ITONE_CYCLE 5000000  // 1 second cycle in 1/16 CPU clock (1/5 µs) for 160 MHz
 
//=======================================================================
//       itone, noItone
//=======================================================================
byte _itonepin = D0;
byte _itoneval = HIGH;

void ICACHE_RAM_ATTR _onItoneTimerISR(){
  _itoneval ^= 1;
  digitalWrite(_itonepin,_itoneval);  //Toggle LED Pin
}
void itone(byte pin,unsigned long frequency) {
  timer1_detachInterrupt();
  timer1_attachInterrupt(_onItoneTimerISR);
  timer1_enable(TIM_DIV16, TIM_EDGE, TIM_LOOP);
  _itoneval = HIGH;
  _itonepin = pin;
  pinMode(pin,OUTPUT);
  digitalWrite(pin,HIGH);
  timer1_write(ITONE_CYCLE/frequency);
}

void noItone() {
  timer1_detachInterrupt();
}

//=======================================================================
//                               Setup
// we test the serial connection during itone running
// result: it still works
//=======================================================================
void setup()
{
  Serial.begin(115200);
  itone(D0,1); // LED blinks with 1 Hz
}
//=======================================================================
//                MAIN LOOP
//=======================================================================
void loop()
{
}

In the same way, an analogWrite function could also be written that is compatible with a modified itone function.

Oh, and the serial connection still works while timer1 is used for itone.
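The reload arithmetic behind itone can be checked on a PC. The following is only a minimal sketch, not part of the original code: the helper name `itoneReloadTicks` and both constants are hypothetical, and it assumes that timer1 with TIM_DIV16 runs at 5 ticks per microsecond and that the maximum reload value is 8388607, as stated in the comments of the core's Arduino.h.

```cpp
#include <cstdint>

// Assumptions (not from the thread): 80 MHz / 16 = 5,000,000 timer ticks per
// second, and a maximum timer1 reload value of 8388607.
constexpr uint32_t kTimerTicksPerSecond = 5000000UL;
constexpr uint32_t kTimerMaxTicks = 8388607UL;

// The ISR toggles the pin once per interrupt, so one full period of the tone
// needs two interrupts. Returns 0 if the frequency cannot be represented.
uint32_t itoneReloadTicks(uint32_t frequencyHz) {
  if (frequencyHz == 0) return 0;
  uint32_t ticks = (kTimerTicksPerSecond / 2) / frequencyHz;
  if (ticks == 0 || ticks > kTimerMaxTicks) return 0;
  return ticks;
}
```

For 1 Hz this gives 2500000 ticks per half period, which matches ITONE_CYCLE in the sketch above.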

I use the Arduino IDE on my Raspberry Pi 3, and port /dev/ttyUSB0 works without any problems. You should try this command without the NodeMCU connected and then again with the NodeMCU connected:

ls /dev/tty*

Then you can compare whether the serial connection was detected and under which name.

Yes, you see a lot of entries beginning with /dev/tty. You should try this with and without the NodeMCU connected, and then compare whether there is one more entry with the NodeMCU connected.

Was there one more /dev/tty*? The Raspberry Pi 3 is my favorite desktop PC and I don't like to use its GPIO ports. What if I destroyed my Raspberry? But if something happened to the NodeMCU, that wouldn't be a problem.

Please @AlfonsMittelmeyer @santhosh000 stop, this is an esp8266 issue not a Raspberry support forum.

I don't want to talk about the Raspberry specifically, because it's only my favorite Linux PC. I was talking about connecting the NodeMCU to a Linux PC.

@santhosh000: could your Linux PC detect an additional device /dev/tty*?

@santhosh000: what you should provide is a detailed error description, including PC hardware and OS, for example: Dell PC OPTIPLEX 760, Linux Ubuntu 16.04 LTS. I tried it with this computer and got /dev/ttyUSB0 when using ls /dev/tty*. The cause could be a defective NodeMCU, a defective USB cable, or a PC or Linux system that can't recognize the connection. So try another USB cable, check whether you inserted it correctly into the USB connector, and try other Linux systems, Windows systems and other PCs. If nothing works, then the NodeMCU seems to be defective.

@santhosh000: can't you tell us about your Linux PC? The NodeMCU isn't recognized by all Linux systems, especially older versions. For example, I have an old Android TV stick MK808 B, which I had flashed with an old Linux, Picuntu. It doesn't recognize the NodeMCU.

Maybe you have to install a USB to UART Bridge driver:

https://www.silabs.com/products/development-tools/software/usb-to-uart-bridge-vcp-drivers

read also:

https://cityos-air.readme.io/docs/1-usb-drivers-for-nodemcu-v10

@santhosh000 @AlfonsMittelmeyer This is not a support forum. This is an issue tracker for issues in the core libs. In addition, this specific issue is about tone and pwm incompatibility, and has nothing to do with usb or tty or rpi. Please discuss at a general forum like esp8266.com or stackoverflow, and stop hijacking.

@santhosh000: we should end this discussion about your NodeMCU here, because this thread is about the incompatibility of PWM and tone. I don't want to install Debian for further research. Because Raspbian is derived from Debian, I would think that the latest versions of Debian should also recognize a NodeMCU. So most probably your NodeMCU is defective. You could check this by installing Ubuntu 16.04 LTS, because this Linux version worked. If that doesn't help, send back your NodeMCU and exchange it for one that works.

Back to the problem of the incompatibility between PWM and tone. It's clear that there can't be a solution if PWM and tone are handled in different libraries. Tone has to be integrated into PWM if we want to use both.

I have just begun to work with the NodeMCU and ESP8266 and wonder where I can find good documentation. About timer1 I found only this via Google:
https://circuits4you.com/2018/01/02/esp8266-timer-ticker-example/

and the header file on github:
https://github.com/esp8266/Arduino/blob/master/cores/esp8266/Arduino.h

Then I wanted to measure time accurately, not in microseconds but in CPU clocks. Via Google I found assembler code for this:
http://sub.nanona.fi/esp8266/timing-and-ticks.html

So I had enough information to write my own PWM function, into which I could integrate a tone function in the next step. But there was still a problem: I couldn't find any information about the range of values for which timer1_write works. Seemingly, for values below 28 an overflow takes place, which results in a pause of nearly one second.
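Since CCOUNT is a free-running 32-bit cycle counter, the time left until a deadline can be computed so that it stays correct across counter wraparound. This is only a minimal host-side sketch with hypothetical helper names; the minimum step is passed in as a parameter, because the threshold of roughly 28 is only an observation and not a documented limit.

```cpp
#include <cstdint>

// Hypothetical helper: cycles left until `deadline`, given the current CCOUNT
// value `now`. Unsigned subtraction wraps modulo 2^32, so reinterpreting the
// result as signed yields a negative number once the deadline has passed.
int32_t cyclesUntil(uint32_t deadline, uint32_t now) {
  return static_cast<int32_t>(deadline - now);
}

// Only arm the timer if the remaining wait is at least `minStep` cycles;
// otherwise the caller should handle the event immediately instead of
// writing a value to the timer that is too small.
bool worthArmingTimer(uint32_t deadline, uint32_t now, int32_t minStep) {
  return cyclesUntil(deadline, now) >= minStep;
}
```

With this kind of check, a deadline that lies just behind a counter overflow is still seen as a small positive wait rather than a huge one.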

Sorry, I am a newbie with the NodeMCU. Does somebody know where I could find useful documentation?

I think it's not a big problem to use analogWrite and tone together.

Just write your own analogWrite function, into which you can integrate your own tone function.

This is such an analogWrite function. It differs from the original analogWrite function in two ways:

1.) digitalWrite will not stop this analogWrite. It has to be stopped by analogWrite(pin,0) for LOW or analogWrite(pin,1024) for HIGH, unless we also write our own digitalWrite function that takes this into account.

2.) It doesn't grab timer1 back if we detach this timer for other purposes.

What do you think about this analogWrite:

/*
    ESP8266 example for own analogWrite via timer1 and CCOUNT
    Hardware: NodeMCU
    2018
*/

#define CPU_CLOCK 80 // Mhz

//=======================================================================
//                               ianalogWrite
//=======================================================================

#define analogWrite ianalogWrite
#define PWM_CYCLE 1024
#define AWRITE_PINCOUNT 21
#define AWRITE_START_PIN 19
#define AWRITE_STOP_PIN 20
#define TIMER1_MIN_STEP 30

enum {
  PWM_CYCLE_TICKS = PWM_CYCLE * CPU_CLOCK,
};

struct {
    uint32_t value;
    byte nextpin;
    
} awriteData[AWRITE_PINCOUNT];


static inline uint32_t asm_ccount(void) {
    uint32_t r;
    asm volatile ("rsr %0, ccount" : "=r"(r));
    return r;
}

bool _iawritestate = true;

void ICACHE_RAM_ATTR _onIawriteTimerISR(){
  static uint32_t startcount;
  static byte current_pin = AWRITE_START_PIN;
  static uint32_t current_count;
  static uint32_t next_time;

  while(true) {
    
    if(_iawritestate) {
      startcount = asm_ccount();
      current_pin = AWRITE_START_PIN;
      for(int ipin = AWRITE_START_PIN; awriteData[ipin].nextpin != AWRITE_STOP_PIN;) {
        ipin = awriteData[ipin].nextpin;
        digitalWrite(ipin,HIGH);
      }
      _iawritestate = false;
    }
    else {
      digitalWrite(current_pin,LOW);
    }
    uint32_t nextduty =  awriteData[awriteData[current_pin].nextpin].value;
    if(nextduty == PWM_CYCLE_TICKS) {
      _iawritestate = true;
    }
    else {
      current_pin = awriteData[current_pin].nextpin;
    }
    current_count = asm_ccount();
    next_time = startcount + nextduty - current_count;
    if(!(next_time & 0x80000000 || next_time < TIMER1_MIN_STEP)) {
      timer1_write(next_time);
      return;
    }
  }
}

void init_ianalogWrite() {
  awriteData[AWRITE_START_PIN].value = 0;
  awriteData[AWRITE_START_PIN].nextpin = AWRITE_STOP_PIN;
  awriteData[AWRITE_STOP_PIN].nextpin = AWRITE_STOP_PIN;
  awriteData[AWRITE_STOP_PIN].value = PWM_CYCLE_TICKS;
  _iawritestate = true;
  timer1_detachInterrupt();
  timer1_attachInterrupt(_onIawriteTimerISR);
  timer1_enable(TIM_DIV1, TIM_EDGE, TIM_SINGLE);
  timer1_write(TIMER1_MIN_STEP);
}

void ianalogWrite(byte pin,unsigned int value) {
  int ipin = AWRITE_START_PIN;
  byte nextpin;
  pinMode(pin,OUTPUT);

  // if the pin is already in PWM, then unregister
  while(awriteData[ipin].nextpin != pin && awriteData[ipin].nextpin != AWRITE_STOP_PIN)
    ipin = awriteData[ipin].nextpin;
  if(awriteData[ipin].nextpin == pin) {
    noInterrupts();
    awriteData[ipin].nextpin = awriteData[pin].nextpin;
    interrupts();
  }

  // if value not 0 and not 1024, then register the pin
  if(value & 0x3FF /* 1023 */ ) {
    ipin = AWRITE_START_PIN;
    nextpin = awriteData[ipin].nextpin;
    value *= CPU_CLOCK;
    while(value > awriteData[nextpin].value) {
      ipin = nextpin;
      nextpin = awriteData[nextpin].nextpin;
    }
    noInterrupts();
    awriteData[pin].value = value;
    awriteData[pin].nextpin = nextpin;
    awriteData[ipin].nextpin = pin;
    interrupts();
  }
  else {
    digitalWrite(pin,value ? HIGH : LOW);
  }
}

//=======================================================================
//                               Setup
//=======================================================================

int startval = 0;
void setup()
{
  Serial.begin(115200);
  while(!Serial);
  init_ianalogWrite();
}
//=======================================================================
//                MAIN LOOP
//=======================================================================
void loop()
{
  analogWrite(D2,1024-startval);
  analogWrite(D1,startval);
  digitalWrite(D1,LOW); // to test, whether it's the own analogWrite
  delay(20);
  startval = (startval + 8) & 0x3FF;
}

Oh, I forgot to mention that I used a two-color LED connected to D1 and D2. Normal LEDs would also be fine.
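For reference, the duty arithmetic used by the sketch above can be written out separately. This is only a sketch for checking the numbers on a PC; the function names are hypothetical, and `cpuMHz` stands for the CPU_CLOCK define (CPU cycles per microsecond).

```cpp
#include <cstdint>

constexpr uint32_t kPwmSteps = 1024;  // PWM_CYCLE in the sketch

// CPU cycles the pin stays HIGH for a given duty value (0..1024).
constexpr uint32_t dutyToCycles(uint32_t duty, uint32_t cpuMHz) {
  return duty * cpuMHz;
}

// Length of one full PWM period in CPU cycles (PWM_CYCLE_TICKS in the sketch).
constexpr uint32_t periodCycles(uint32_t cpuMHz) {
  return kPwmSteps * cpuMHz;
}

// Resulting PWM frequency in Hz, rounded down.
constexpr uint32_t pwmFrequencyHz(uint32_t cpuMHz) {
  return (cpuMHz * UINT32_C(1000000)) / periodCycles(cpuMHz);
}
```

At 80 MHz one period is 81920 cycles, that is 1.024 ms, so the PWM runs at a little under 1 kHz.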

I noticed a small bug in my algorithm when testing with this code:

void loop()
{

  for(float i = 0.5; i < 1024; i *= 1.03 ) {
    analogWrite(D2,i);
    delay(10);
  }
  for(float i = 1024; i >= 0.5; i /= 1.03 ) {
    analogWrite(D2,i);
    delay(10);
  }
}

I saw a short flicker when 1024 was reached. It's clear that changes shouldn't be made during a cycle. Changes should be prepared so that they take effect exactly at the beginning of the next cycle. So I should change my algorithm.
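One common way to make changes take effect only at a cycle boundary is double buffering: the writer prepares the new data in a spare buffer and publishes it, and the reader switches buffers only when a new cycle starts. The following is a minimal host-side sketch of that idea with hypothetical names (`DutyList`, `DoubleBuffer`, `publish`, `beginCycle`); it is not the code from this thread.

```cpp
#include <cstdint>

struct DutyList {
  uint32_t values[4];
  int len;
};

struct DoubleBuffer {
  DutyList buffers[2] = {};
  int publishedIndex = 0;         // buffer most recently filled by the writer
  int activeIndex = 0;            // buffer the reader is currently using
  volatile bool changed = false;  // set by the writer, cleared by the reader
};

// Writer side: fill the spare buffer, then publish it. A real writer must
// wait until `changed` is false again before publishing a second time,
// otherwise it could overwrite the buffer the reader still uses.
void publish(DoubleBuffer& db, const DutyList& next) {
  int spare = db.publishedIndex ^ 1;
  db.buffers[spare] = next;
  db.publishedIndex = spare;
  db.changed = true;
}

// Reader side: call this only at the start of a PWM cycle.
void beginCycle(DoubleBuffer& db) {
  if (db.changed) {
    db.activeIndex = db.publishedIndex;
    db.changed = false;
  }
}
```

Until `beginCycle` runs, the reader keeps working with the old list, so a running cycle is never disturbed by a half-finished update.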

Now this algorithm seems to work perfectly. Would anybody like to integrate tone?

/*
    ESP8266 example for own analogWrite via timer1 and CCOUNT
    Hardware: NodeMCU
    2018
*/

#define CPU_CLOCK 80 // Mhz
#define analogWrite ianalogWrite

//=======================================================================
//                               CCOUNT
//=======================================================================

static inline uint32_t asm_ccount(void) {
    uint32_t r;
    asm volatile ("rsr %0, ccount" : "=r"(r));
    return r;
}

//=======================================================================
//                               ianalogWrite
//=======================================================================

#define TIMER1_MIN_STEP 30
#define AWRITELEN 11
#define PWM_CYCLE 1024

enum {
  PWM_CYCLE_TICKS = PWM_CYCLE * CPU_CLOCK,
};

typedef struct {
    uint32_t value;
    byte pin;
} awriteElement;

typedef struct {
  awriteElement data[AWRITELEN];
  int len  = 0;
} awriteData;

awriteData awriteDataX[2];
int awriteDataIndex = 0;
bool _iawritestate = true;
volatile bool awriteDataChanged = false;

void ICACHE_RAM_ATTR _onIawriteTimerISR(){
  static int datalen;
  static awriteElement * data;
  static int index;
  static uint32_t startcount;
  static uint32_t current_count;
  static uint32_t next_duty;
  static int32_t next_time;
  
  while(true) {

    if(_iawritestate) {
      _iawritestate = false;
      if(awriteDataChanged) {
        awriteDataChanged = false;
        datalen = awriteDataX[awriteDataIndex].len;
        data = awriteDataX[awriteDataIndex].data;
      }
      startcount = asm_ccount();
      for(int i = 0; i < datalen; i++) {
        digitalWrite(data[i].pin,HIGH);
      }
      index = 0;
    }
    else {
      digitalWrite(data[index++].pin,LOW);
    }
    
    if(index == datalen) {
      _iawritestate = true;
      next_duty = PWM_CYCLE_TICKS;
    }
    else {
      next_duty = data[index].value;
    }

    current_count = asm_ccount();
    next_time = startcount + next_duty - current_count;
    if(next_time >= TIMER1_MIN_STEP) {
      timer1_write(next_time);
      return;
    }
  }
}

void init_ianalogWrite() {
  awriteDataIndex = 0;
  awriteDataX[0].len = 0;
  awriteDataX[1].len = 0;
  _iawritestate = true;
  awriteDataChanged = true;

  timer1_detachInterrupt();
  timer1_attachInterrupt(_onIawriteTimerISR);
  timer1_enable(TIM_DIV1, TIM_EDGE, TIM_SINGLE);
  timer1_write(TIMER1_MIN_STEP);
  while(awriteDataChanged);
}

void ianalogWrite(byte pin,unsigned int value) {
  pinMode(pin,OUTPUT);
  int len0 = awriteDataX[awriteDataIndex].len;
  awriteElement * data0 = awriteDataX[awriteDataIndex].data;
  if(!(value & 0x3FF))
  {
    if(len0) {
      bool contains = false;
      for(int i = 0; i < len0; ++i) {
        if(data0[i].pin == pin) {
          contains = true;
          break;
        }
      }
      if(contains) {
        awriteDataIndex ^= 1;
        awriteElement * data1 = awriteDataX[awriteDataIndex].data;
        for(int i = 0; i < len0; ++i) {
          if(data0[i].pin != pin)
            *data1++ = data0[i];
        }
        awriteDataX[awriteDataIndex].len = --len0;
        awriteDataChanged = true;
        while(awriteDataChanged);
      }
    }
    digitalWrite(pin,value ? HIGH : LOW);
  }
  else {
    awriteDataIndex ^= 1;
    awriteElement * data1 = awriteDataX[awriteDataIndex].data;
    int len1 = 0;
    int i = 0;
    value *= CPU_CLOCK;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin) {
        if(data0[i].value < value)
          data1[len1++] = data0[i];
        else
          break;
      }
    }
    data1[len1].pin=pin;
    data1[len1++].value=value;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin)
        data1[len1++] = data0[i];
    }
    awriteDataX[awriteDataIndex].len = len1;
    awriteDataChanged = true;
    while(awriteDataChanged);
  }
}

//=======================================================================
//                               Setup
//=======================================================================

void setup()
{
  Serial.begin(115200);
  while(!Serial);
  init_ianalogWrite();
}
//=======================================================================
//                MAIN LOOP
//=======================================================================
void loop()
{
  for(float i = 1; i < 1024; i *= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
  for(float i = 1024; i >= 1; i /= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
}

@minida28: Hi, I did it. PWM at pin D2 (use an LED) and sound at pin D3 (Oh Susanna). Here is the solution:

/*
    ESP8266 example for own analogWrite and own tone via timer1 and CCOUNT
    PWM at D2, tone at D3
    Hardware: NodeMCU
    2018 by Alfons Mittelmeyer
*/

#include <Ticker.h>  //Ticker Library

#define CPU_CLOCK 80 // Mhz
#define analogWrite ianalogWrite
#define tone itone
#define noTone noItone


//=======================================================================
//                               oh susanna
//=======================================================================

const char * keys = "cdefgah ";
int frequencies[] = {262,294,330,349,392,440,494,0};
const char * ohsusanna = "cdeggagecdeedcd cdeggagecdeeddc ffaa aggecd cdeggagecdeeddc ";

Ticker playticker;

void pause_susannah(int index) {
  noTone(D3);
  playticker.once_ms(100,play_ohsusanna,index);
  
}

void play_ohsusanna(int index) {
  char tchar = ohsusanna[index];
  int frequency = 0;
  for(int i = 0; keys[i]; i++) { // stop at the end of keys instead of reading past it
    if(keys[i] == tchar) {
      frequency = frequencies[i];
      break;
    }
  }
  if(frequency)
    tone(D3,frequency);
  if(!ohsusanna[++index])
    index = 0;
  playticker.once_ms(400,pause_susannah,index);
}
//=======================================================================
//                               CCOUNT
//=======================================================================

static inline uint32_t asm_ccount(void) {
    uint32_t r;
    asm volatile ("rsr %0, ccount" : "=r"(r));
    return r;
}

//=======================================================================
//                               itone
//=======================================================================

struct {
  bool on = false;
  byte pin;
  byte value;
  uint32_t half_periode;
  uint32_t next_ccount;
} _itone;

void itone(byte pin, unsigned int frequency) {
  uint32_t half_periode = (CPU_CLOCK * 500000) / frequency;
  noInterrupts();
  _itone.pin = pin;
  _itone.value = HIGH;
  _itone.half_periode = half_periode;
  _itone.next_ccount = asm_ccount() + CPU_CLOCK;
  _itone.on = true;
  interrupts();
}

void noItone(byte pin) {
  noInterrupts();
  _itone.on = false;
  interrupts();
}

//=======================================================================
//                               ianalogWrite
//=======================================================================

#define TIMER1_MIN_STEP 30
#define AWRITELEN 11
#define PWM_CYCLE 1024

enum {
  PWM_CYCLE_TICKS = PWM_CYCLE * CPU_CLOCK,
};

typedef struct {
    uint32_t value;
    byte pin;
} awriteElement;

typedef struct {
  awriteElement data[AWRITELEN];
  int len  = 0;
} awriteData;

awriteData awriteDataX[2];
int awriteDataIndex = 0;
bool _iawritestate = true;
volatile bool awriteDataChanged = false;

void ICACHE_RAM_ATTR _onIawriteTimerISR(){
  static int datalen;
  static awriteElement * data;
  static int index;
  static uint32_t startcount;
  static uint32_t current_count;
  static uint32_t next_duty;
  static int32_t next_time;
  static int32_t itone_time;
  static bool  isTone = false;
  
  while(true) {

    if(_iawritestate) {
      _iawritestate = false;
      if(awriteDataChanged) {
        awriteDataChanged = false;
        datalen = awriteDataX[awriteDataIndex].len;
        data = awriteDataX[awriteDataIndex].data;
      }
      startcount = asm_ccount();
      for(int i = 0; i < datalen; i++) {
        digitalWrite(data[i].pin,HIGH);
      }
      index = 0;
    }
    else {
      if(isTone) {
        isTone = false;
        digitalWrite(_itone.pin,_itone.value);
        _itone.next_ccount += _itone.half_periode;
        _itone.value ^= 1;
      }
      else
        digitalWrite(data[index++].pin,LOW);
    }
    
    if(index == datalen) {
      _iawritestate = true;
      next_duty = PWM_CYCLE_TICKS;
    }
    else {
      next_duty = data[index].value;
    }

    current_count = asm_ccount();
    next_time = startcount + next_duty - current_count;
    if(_itone.on) {
      itone_time = _itone.next_ccount - current_count;
      if(itone_time < next_time) {
        _iawritestate = false;
        next_time = itone_time;
        isTone = true;
      }
    }
    if(next_time >= TIMER1_MIN_STEP) {
      timer1_write(next_time);
      return;
    }
  }
}

void init_ianalogWrite() {
  awriteDataIndex = 0;
  awriteDataX[0].len = 0;
  awriteDataX[1].len = 0;
  _iawritestate = true;
  awriteDataChanged = true;

  timer1_detachInterrupt();
  timer1_attachInterrupt(_onIawriteTimerISR);
  timer1_enable(TIM_DIV1, TIM_EDGE, TIM_SINGLE);
  timer1_write(TIMER1_MIN_STEP);
  while(awriteDataChanged);
}

void ianalogWrite(byte pin,unsigned int value) {
  pinMode(pin,OUTPUT);
  int len0 = awriteDataX[awriteDataIndex].len;
  awriteElement * data0 = awriteDataX[awriteDataIndex].data;
  if(!(value & 0x3FF))
  {
    if(len0) {
      bool contains = false;
      for(int i = 0; i < len0; ++i) {
        if(data0[i].pin == pin) {
          contains = true;
          break;
        }
      }
      if(contains) {
        awriteDataIndex ^= 1;
        awriteElement * data1 = awriteDataX[awriteDataIndex].data;
        for(int i = 0; i < len0; ++i) {
          if(data0[i].pin != pin)
            *data1++ = data0[i];
        }
        awriteDataX[awriteDataIndex].len = --len0;
        awriteDataChanged = true;
        while(awriteDataChanged);
      }
    }
    digitalWrite(pin,value ? HIGH : LOW);
  }
  else {
    awriteDataIndex ^= 1;
    awriteElement * data1 = awriteDataX[awriteDataIndex].data;
    int len1 = 0;
    int i = 0;
    value *= CPU_CLOCK;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin) {
        if(data0[i].value < value)
          data1[len1++] = data0[i];
        else
          break;
      }
    }
    data1[len1].pin=pin;
    data1[len1++].value=value;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin)
        data1[len1++] = data0[i];
    }
    awriteDataX[awriteDataIndex].len = len1;
    awriteDataChanged = true;
    while(awriteDataChanged);
  }
}

//=======================================================================
//                               Setup
//=======================================================================

void setup()
{
  Serial.begin(115200);
  while(!Serial);
  init_ianalogWrite();
  pinMode(D3,OUTPUT);
  play_ohsusanna(0);
}
//=======================================================================
//                MAIN LOOP
//=======================================================================
void loop()
{
  for(float i = 1; i < 1024; i *= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
  for(float i = 1024; i >= 1; i /= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
}

Of course, duo-tones, triads and chords would also be possible.
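The scheduling idea in the sketch above, one timer that serves whichever comes first, the next PWM edge or the next tone toggle, can be reduced to two small functions. This is a simplified host-side sketch with hypothetical names, not a replacement for the ISR; `cpuMHz` again stands for CPU_CLOCK.

```cpp
#include <cstdint>

// Half period of a square wave in CPU cycles (cpuMHz = cycles per microsecond).
uint32_t halfPeriodCycles(uint32_t cpuMHz, uint32_t frequencyHz) {
  if (frequencyHz == 0) return 0;  // no tone
  return (cpuMHz * UINT32_C(500000)) / frequencyHz;
}

enum class NextEvent { Pwm, Tone };

// Picks whichever deadline comes first relative to `now`. The signed
// differences keep the comparison correct across CCOUNT wraparound.
NextEvent pickNextEvent(uint32_t now, uint32_t pwmDeadline,
                        uint32_t toneDeadline, bool toneOn) {
  if (!toneOn) return NextEvent::Pwm;
  int32_t untilPwm = static_cast<int32_t>(pwmDeadline - now);
  int32_t untilTone = static_cast<int32_t>(toneDeadline - now);
  return (untilTone < untilPwm) ? NextEvent::Tone : NextEvent::Pwm;
}
```

With more tone channels the same comparison would simply run over all active channels, which is how chords could be scheduled from a single timer.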

Oh, I forgot the duration, which I should also implement. And about music for several voices: it's clear that this cannot be done with only one pin for tone, because we have only HIGH or LOW. But if we used several tone channels on different pins and connected these pins via resistors, then this should work. Wouldn't you think so?
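Why resistors should work can be seen from a simple model: with equal resistors joined at one node, the node voltage is proportional to the number of pins that are currently HIGH, so three channels give four distinct levels instead of two. The sketch below assumes an ideal, unloaded output, where the node voltage is exactly the average of the pin voltages; a real headphone load lowers the levels but keeps them proportional. The function name is hypothetical.

```cpp
#include <cstddef>

// Voltage at the common node of `count` equal resistors, each driven by a pin
// that is either at 0 V or at `supplyVolts`. Assumes no load on the node.
double mixedVoltage(const bool* pinHigh, std::size_t count, double supplyVolts) {
  if (count == 0) return 0.0;
  std::size_t highCount = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (pinHigh[i]) ++highCount;
  }
  return supplyVolts * static_cast<double>(highCount) / static_cast<double>(count);
}
```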

PWM and tone work very well together. Is there no further interest? I tested PWM together with music for several voices. It works very well. Of course, square waves are not the nicest sound.

Here is an excerpt of my program:

//=======================================================================
//                          Oh Susanna
//=======================================================================

// melody
float voice1[] = {
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,d4,0.25,SOUND_CLOCK,E4,0.75, 
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,E4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,
  G4,0.5,G4,0.5,SOUND_CLOCK,H4,0.25,H4,0.5,H4,0.25,SOUND_CLOCK,A4,0.25,A4,0.25,Fis4,0.25,d4,0.25,SOUND_CLOCK,E4,0.75,
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,E4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,
  SOUND_END
  };

// accompaniment 1
float voice2[] = {
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,Cis3,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,Cis3,0.25,A3,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,H3,0.5,H3,0.5,SOUND_CLOCK,d4,0.25,d4,0.5,d4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,PAUSE,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,Cis3,0.25,A3,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,
  SOUND_END
};

// accompaniment 2
float voice3[] = {
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,G3,0.25,G3,0.25,
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,PAUSE,0.25,G3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,Fis3,0.25,
  PAUSE,0.25,SOUND_CLOCK,G3,0.5,G3,0.5,SOUND_CLOCK,G3,0.25,G3,0.5,G3,0.25,SOUND_CLOCK,Fis3,1.0,SOUND_CLOCK,PAUSE,0.25,G3,0.25,G3,0.25,
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,PAUSE,0.25,G3,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,
  SOUND_END
};

//=======================================================================
//                               Setup
//=======================================================================

void setup()
{
  Serial.begin(115200);
  while(!Serial);

  pinMode(D5,INPUT_PULLUP); // start with the headphones unconnected, because the NodeMCU doesn't boot with too many pins connected to ground (despite the resistors)
  while(digitalRead(D5)) // waiting for connection of headphones
    delay(10);
  delay(2000); // then 2 seconds pause

  init_ianalogWrite();    // initializing the PWM and tone extensions
  
  tonechannels(D5,D6,D7); // registering pins D5, D6 and D7 as tone channels 0, 1 and 2
  sound_voice(0,voice1);  // registering melody for tone channel 0
  sound_voice(1,voice2);  // registering melody for tone channel 1
  sound_voice(2,voice3);  // registering melody for tone channel 2
  sound_start();          // start playing music at pins D5, D6 and D7
}
//=======================================================================
//                MAIN LOOP
//=======================================================================

// testing PWM with music playing
void loop()
{

  for(float i = 1; i < 1024; i *= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
  for(float i = 1024; i >= 1; i /= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
}

@AlfonsMittelmeyer I have interest, not because I want to use it, but because I want to improve the core implementation.
Your example code is essentially a custom implementation. The question for you is: can you come up with a replacement of the current code in the core libs, without changing the exposed arduino api?

I could come up with a replacement for the current core libs, but it might take a lot of time. The question is also whether this would only be for Arduino, or whether deeper layers behind Arduino are involved, such as the ESP8266 Non-OS SDK from Espressif. There is a hardware timer API and there are PWM-related APIs. Normally I would think that these two APIs must be integrated. If the integration were done only at the Arduino core level and somebody used such deeper layers, the problem would persist, because these layers don't work together. But it wouldn't be a problem to offer such APIs at the Arduino level. I also think that this hardware timer API has poor accuracy. If I integrated a hardware timer API into my PWM ISR, a hardware timer could have a response time of only two microseconds for a CPU at 80 MHz, or one microsecond for a CPU at 160 MHz.

The problem is that I don't yet know which dependencies exist in the Arduino core libraries, because I have just begun to explore the ESP8266, Arduino and the NodeMCU. If there are different APIs and libraries for tone, hardware timers and PWM, then these APIs would have to be integrated and reorganized so that everything works together. I doubt that replacing parts of some core libs, as they exist now, would be sufficient.

I can say more about this later, when I have a better overview.

But first I present the new version of the code, with PWM at pin D2 and sound at pins D5, D6 and D7. I connected these pins via 1 kOhm resistors to a headphone. The NodeMCU didn't boot when the headphone was connected, so I booted first and then connected the headphone. Now the music is a proper Oh Susanna with melody and accompaniment.

/*
    ESP8266 example for own analogWrite and own tone via timer1 and CCOUNT
    PWM at D2, tones at D5, D6 and D7
    Hardware: NodeMCU
    2018 by Alfons Mittelmeyer
*/


#include <Ticker.h>  //Ticker Library

#define CPU_CLOCK 80 // Mhz
#define analogWrite ianalogWrite
#define tone itone
#define noTone noItone
#define TONEMAXCHANNELS 4

//=======================================================================
//                               CCOUNT
//=======================================================================

static inline uint32_t asm_ccount(void) {
    uint32_t r;
    asm volatile ("rsr %0, ccount" : "=r"(r));
    return r;
}

//=======================================================================
//                               tonechannels
//=======================================================================

Ticker _itoneTickers[TONEMAXCHANNELS];


int _itonescount = 0;

struct _itone {
  bool on = false;
  bool start = true;
  byte pin;
  byte value;
  uint32_t half_periode;
  uint32_t next_ccount;
};

struct _itone _itones[TONEMAXCHANNELS];

void _itoneChannelDisable(int channel) {
  noInterrupts();
  _itones[channel].on = false;
  interrupts();
}

void tonechannels() {
  _itonescount = 0;
}
void tonechannels(byte pin) {
  pinMode(pin,OUTPUT);
  _itones[0].on = false;
  _itones[0].pin = pin;
  _itonescount = 1;
}

void tonechannels(byte pin1, byte pin2) {
  pinMode(pin2,OUTPUT);
  tonechannels(pin1);
  _itones[1].on = false;
  _itones[1].pin = pin2;
  _itonescount = 2;
}

void tonechannels(byte pin1, byte pin2, byte pin3) {
  pinMode(pin3,OUTPUT);
  tonechannels(pin1,pin2);
  _itones[2].on = false;
  _itones[2].pin = pin3;
  _itonescount = 3;
}

void tonechannels(byte pin1,byte pin2,byte pin3,byte pin4) {
  pinMode(pin4,OUTPUT);
  tonechannels(pin1,pin2,pin3);
  _itones[3].on = false;
  _itones[3].pin = pin4;
  _itonescount = 4;
}

void channelTone(byte id, float frequency, unsigned int duration) {

  uint32_t half_periode = (CPU_CLOCK * 500000) / frequency + 0.5;
  _itoneTickers[id].detach();
  noInterrupts();
  _itones[id].start = true;
  _itones[id].value = HIGH;
  _itones[id].half_periode = half_periode;
  _itones[id].next_ccount = asm_ccount() + CPU_CLOCK;
  _itones[id].on = true;
  interrupts();
  if(duration)
    _itoneTickers[id].once_ms(duration,_itoneChannelDisable,(int) id);

}

void channelTone(byte id, float frequency) {
  channelTone(id,frequency,0);
}

void channelNoTone(byte id) {
  noInterrupts();
  _itones[id].on = false;
  interrupts();
  _itoneTickers[id].detach();
}

//=======================================================================
//                               itone
//=======================================================================

void _itoneDisable(int channel) {
  noInterrupts();
  _itones[channel].on = false;
  _itonescount = _itonescount < 2 ? 0 : _itonescount; // reset only in single-channel mode
  interrupts();
}

void itone(byte pin, unsigned int frequency,unsigned int duration) {
  pinMode(pin,OUTPUT);
  uint32_t half_periode = (CPU_CLOCK * 500000) / frequency;
  _itoneTickers[0].detach();
  noInterrupts();
  _itones[0].pin = pin;
  _itones[0].start = true;
  _itones[0].value = HIGH;
  _itones[0].half_periode = half_periode;
  _itones[0].next_ccount = asm_ccount() + CPU_CLOCK;
  _itones[0].on = true;
  _itonescount = _itonescount ? _itonescount : 1;
  interrupts();
  if(duration)
    _itoneTickers[0].once_ms(duration,_itoneDisable,0);
  
}

void itone(byte pin, unsigned int frequency) {
  itone(pin,frequency,0);
}

void noItone(byte pin) {
  noInterrupts();
  _itones[0].on = false;
  _itonescount = _itonescount < 2 ? 0 : _itonescount; // reset only in single-channel mode
  interrupts();
  _itoneTickers[0].detach();
}

//=======================================================================
//                               ianalogWrite
//=======================================================================

#define TIMER1_MIN_STEP 30
#define AWRITELEN 11
#define PWM_CYCLE 1024

enum {
  PWM_CYCLE_TICKS = PWM_CYCLE * CPU_CLOCK,
};

typedef struct {
    uint32_t value;
    byte pin;
} awriteElement;

typedef struct {
  awriteElement data[AWRITELEN];
  int len  = 0;
} awriteData;

awriteData awriteDataX[2];
int awriteDataIndex = 0;
bool _iawritestate = true;
volatile bool awriteDataChanged = false;

void ICACHE_RAM_ATTR _onIawriteTimerISR(){
  static int datalen;
  static awriteElement * data;
  static int index;
  static uint32_t startcount;
  static uint32_t current_count;
  static uint32_t next_duty;
  static int32_t next_time;
  static int32_t itone_time;
  static bool  isTone = false;
  static int tonechannel = 0;
  int itime;
  
  while(true) {

    if(_iawritestate) {
      _iawritestate = false;
      if(awriteDataChanged) {
        awriteDataChanged = false;
        datalen = awriteDataX[awriteDataIndex].len;
        data = awriteDataX[awriteDataIndex].data;
      }
      startcount = asm_ccount();
      for(int i = 0; i < datalen; i++) {
        digitalWrite(data[i].pin,HIGH);
      }
      index = 0;
    }
    else {
      if(isTone) {
        isTone = false;
        digitalWrite(_itones[tonechannel].pin,_itones[tonechannel].value);
        if(_itones[tonechannel].start) {
          _itones[tonechannel].start = false;
          _itones[tonechannel].next_ccount = asm_ccount();
        }
        _itones[tonechannel].next_ccount += _itones[tonechannel].half_periode;
        _itones[tonechannel].value ^= 1;
      }
      else
        digitalWrite(data[index++].pin,LOW);
    }
    
    if(index == datalen) {
      _iawritestate = true;
      next_duty = PWM_CYCLE_TICKS;
    }
    else {
      next_duty = data[index].value;
    }

    current_count = asm_ccount();
    next_time = startcount + next_duty - current_count;

    itone_time = 0x7FFFFFFF;
    if(_itonescount) {
      for(int i = 0; i < _itonescount; ++i) {
        if(_itones[i].on) {
          itime = _itones[i].next_ccount - current_count;
          if(itime <= itone_time) {
            itone_time = itime;
            tonechannel = i;
          }
        }
      }
      if(itone_time < next_time) {
        _iawritestate = false;
        next_time = itone_time;
        isTone = true;
      }
    }
    next_time -= asm_ccount() - current_count;
    if(next_time >= TIMER1_MIN_STEP) {
      timer1_write(next_time);
      return;
    }
  }
}

void init_ianalogWrite() {
  awriteDataIndex = 0;
  awriteDataX[0].len = 0;
  awriteDataX[1].len = 0;
  _iawritestate = true;
  awriteDataChanged = true;

  timer1_detachInterrupt();
  timer1_attachInterrupt(_onIawriteTimerISR);
  timer1_enable(TIM_DIV1, TIM_EDGE, TIM_SINGLE);
  timer1_write(TIMER1_MIN_STEP);
  while(awriteDataChanged);
}

void ianalogWrite(byte pin,unsigned int value) {
  pinMode(pin,OUTPUT);
  int len0 = awriteDataX[awriteDataIndex].len;
  awriteElement * data0 = awriteDataX[awriteDataIndex].data;
  if(!(value & 0x3FF))
  {
    if(len0) {
      bool contains = false;
      for(int i = 0; i < len0; ++i) {
        if(data0[i].pin == pin) {
          contains = true;
          break;
        }
      }
      if(contains) {
        awriteDataIndex ^= 1;
        awriteElement * data1 = awriteDataX[awriteDataIndex].data;
        for(int i = 0; i < len0; ++i) {
          if(data0[i].pin != pin)
            *data1++ = data0[i];
        }
        awriteDataX[awriteDataIndex].len = --len0;
        awriteDataChanged = true;
        while(awriteDataChanged);
      }
    }
    digitalWrite(pin,value ? HIGH : LOW);
  }
  else {
    awriteDataIndex ^= 1;
    awriteElement * data1 = awriteDataX[awriteDataIndex].data;
    int len1 = 0;
    int i = 0;
    value *= CPU_CLOCK;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin) {
        if(data0[i].value < value)
          data1[len1++] = data0[i];
        else
          break;
      }
    }
    data1[len1].pin=pin;
    data1[len1++].value=value;
    for(;i < len0; ++i) {
      if(data0[i].pin != pin)
        data1[len1++] = data0[i];
    }
    awriteDataX[awriteDataIndex].len = len1;
    awriteDataChanged = true;
    while(awriteDataChanged);
  }
}

//=======================================================================
//                             Sound Machine
//=======================================================================

#define SOUND_CLOCK_CYCLE 2000

#define  C3 130.813
#define  Cis3 138.591
#define  d3 146.832
#define  Dis3 155.563
#define  E3 164.814
#define  F3 174.614
#define  Fis3 184.997
#define  G3 195.998
#define  Gis3 207.652
#define  A3 220.0
#define  B3 233.082
#define  H3 246.942
#define  C4 261.626
#define  Cis4 277.183
#define  d4 293.665
#define  Dis4 311.127
#define  E4 329.628
#define  F4 349.228
#define  Fis4 369.994
#define  G4 391.995
#define  Gis4 415.305
#define  A4 440.0
#define  B4 466.164
#define  H4 493.883
#define  C5 523.251
#define  Cis5 554.365
#define  d5 587.330
#define  Dis5 622.254
#define  E5 659.255
#define PAUSE 0
#define SOUND_CLOCK -1
#define SOUND_END -2

float * sound_voices[TONEMAXCHANNELS];
Ticker sound_tickers[TONEMAXCHANNELS];
int sound_indices[TONEMAXCHANNELS];
int sound_voicecount = 0;
  

void play_voice(int channel) {
  float note;
  float nextnote;
  unsigned int duration;

  bool isClock = false;
  float * data = sound_voices[channel];
  note = data[sound_indices[channel]++];
  if(note == SOUND_CLOCK) {
    note = data[sound_indices[channel]++];
    if(!channel)
      isClock = true;    
  }
  if(note == SOUND_END) {
    sound_tickers[0].once_ms(2000,sound_start);
  }
  else {
    duration = data[sound_indices[channel]++] * SOUND_CLOCK_CYCLE;
    nextnote = data[sound_indices[channel]];
    if(!channel || nextnote >= 0)
      sound_tickers[channel].once_ms(duration,play_voice,channel);
    if(note > 0)
      channelTone(channel,note,duration-20);
    if(isClock) {
      for(int i = 1; i < sound_voicecount; ++i)
        play_voice(i);
    }
  }
}    

  
void sound_start() {
  for(int i = 0; i < sound_voicecount; ++i) {
    sound_indices[i] = 0;
    play_voice(i);      
  }
}

void sound_voice(int id, float * voicedata) {
  sound_voicecount = max(sound_voicecount,id+1);
  sound_voices[id] = voicedata;
}

//=======================================================================
//                          Oh Susanna
//=======================================================================

// melody
float voice1[] = {
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,d4,0.25,SOUND_CLOCK,E4,0.75, 
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,E4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,
  G4,0.5,G4,0.5,SOUND_CLOCK,H4,0.25,H4,0.5,H4,0.25,SOUND_CLOCK,A4,0.25,A4,0.25,Fis4,0.25,d4,0.25,SOUND_CLOCK,E4,0.75,
  d4,0.125,E4,0.125,SOUND_CLOCK,Fis4,0.25,A4,0.25,A4,0.375,H4,0.125,SOUND_CLOCK,A4,0.25,Fis4,0.25,d4,0.375,E4,0.125,SOUND_CLOCK,Fis4,0.25,Fis4,0.25,E4,0.25,E4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,
  SOUND_END
  };

// accompaniment 1
float voice2[] = {
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,Cis3,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,Cis3,0.25,A3,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,H3,0.5,H3,0.5,SOUND_CLOCK,d4,0.25,d4,0.5,d4,0.25,SOUND_CLOCK,d4,1.0,SOUND_CLOCK,PAUSE,0.25,A3,0.25,A3,0.25,
  PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,PAUSE,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,Cis3,0.25,A3,0.25,SOUND_CLOCK,d3,0.25,A3,0.25,A3,0.25,PAUSE,0.25,SOUND_CLOCK,
  SOUND_END
};

// accompaniment 2
float voice3[] = {
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,G3,0.25,G3,0.25,
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,PAUSE,0.25,G3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,Fis3,0.25,
  PAUSE,0.25,SOUND_CLOCK,G3,0.5,G3,0.5,SOUND_CLOCK,G3,0.25,G3,0.5,G3,0.25,SOUND_CLOCK,Fis3,1.0,SOUND_CLOCK,PAUSE,0.25,G3,0.25,G3,0.25,
  PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.5,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,PAUSE,0.25,G3,0.25,SOUND_CLOCK,PAUSE,0.25,Fis3,0.25,Fis3,0.25,PAUSE,0.25,SOUND_CLOCK,
  SOUND_END
};

//=======================================================================
//                               Setup
//=======================================================================

void setup()
{
  Serial.begin(115200);
  while(!Serial);

  pinMode(D5,INPUT_PULLUP); // start with the headphones unconnected, because the NodeMCU doesn't boot with too many pins connected to ground (in spite of the resistors)
  while(digitalRead(D5)) // waiting for connection of headphones
    delay(10);
  delay(2000); // then 2 seconds pause

  init_ianalogWrite();    // initializing the PWM and tone extensions
  
  tonechannels(D5,D6,D7); // registering pins D5, D6 and D7 as tone channels 0, 1 and 2
  sound_voice(0,voice1);  // registering voice 1 (melody) for tone channel 0
  sound_voice(1,voice2);  // registering voice 2 (accompaniment 1) for tone channel 1
  sound_voice(2,voice3);  // registering voice 3 (accompaniment 2) for tone channel 2
  sound_start();          // start playing music at pins D5, D6 and D7
}
//=======================================================================
//                MAIN LOOP
//=======================================================================

// testing PWM with music playing
void loop()
{
  for(float i = 1; i < 1024; i *= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
  for(float i = 1024; i >= 1; i /= 1.05 ) {
    analogWrite(D2,i);
    delay(10);
  }
}
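
The sketch above derives the tone half period in CPU cycles from `(CPU_CLOCK * 500000) / frequency` and compares CCOUNT values by subtraction. The following host-side sketch isolates just that arithmetic so it can be checked on a PC. The function names `halfPeriodCycles` and `cyclesUntil` are made up for illustration and are not part of the sketch or of the ESP8266 core.

```cpp
#include <cassert>
#include <cstdint>

// Half period of a square wave in CPU cycles, rounded to the nearest cycle.
// cpuMhz: CPU clock in MHz (80 or 160 on the ESP8266).
uint32_t halfPeriodCycles(uint32_t cpuMhz, double frequencyHz) {
    assert(frequencyHz > 0.0);
    return static_cast<uint32_t>(cpuMhz * 500000.0 / frequencyHz + 0.5);
}

// Signed distance from `now` to `deadline` on a 32-bit counter that wraps around.
// Positive: the deadline is still ahead. Negative: it has already passed.
int32_t cyclesUntil(uint32_t deadline, uint32_t now) {
    return static_cast<int32_t>(deadline - now);
}
```

Because the difference is taken in unsigned arithmetic and only then read as a signed value, the comparison stays correct when the cycle counter overflows, as long as the two values are less than 2^31 cycles apart.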

@minida28: I think it's simpler to do than I thought yesterday. We need not think of all the other possibilities, like fast ADC reads, for now. We could just integrate tone and PWM now and add more functionality later. Could you tell me where I can find these core libraries?

Tone should be easy to integrate, because tone is very simple. Writing my own PWM function was less simple for me, since I had not seen the original in the core libraries.

The other question is about safety. What is PWM used for? Is it used for steering model aircraft or other very critical devices? If an orchestra is playing, the PWM pulse might come one or two microseconds late, and then a very expensive device is destroyed. Could this be a case that has to be considered?

If there were a requirement for absolutely exact PWM, nothing should be integrated into the current PWM. We could only offer a second PWM without such strict requirements.

Hi @AlfonsMittelmeyer

Great work there. I am sorry I have not had a chance to study and test your code yet.

@minida28: I think it's simpler to do than I thought yesterday. We need not think of all the other possibilities, like fast ADC reads, for now. We could just integrate tone and PWM now and add more functionality later. Could you tell me where I can find these core libraries?

I have very limited knowledge of the core libraries and of programming, so I am afraid I would point you in the wrong direction. Maybe you meant to mention @devyte?

What is PWM used for? Is it used for steering model aircraft

Could be; I believe I have read somewhere that someone did that using PWM. Personally, in my current project (just a hobby project, a digital clock using a 16x128 LED matrix), I want to use PWM to control the brightness. It will also have an alarm, so I need tone() to drive the buzzer.

Hi @AlfonsMittelmeyer, did you perhaps overlook the core folder, with files like Tone.cpp?

@Juppit: thank you. I thought about implementing the same behaviour. But my idea that the structure of the core libraries would have to be changed was wrong. It would be sufficient if tone and PWM used a shared source file. I also saw that a case like mine was seemingly foreseen in Tone.cpp. Here you see a case 0 that is currently not supported. So we could implement this case 0.

    // set up the interrupt frequency
    switch (tone_timers[_index]) {
      case 0:
        // Not currently supported
        break;

      case 1:
        timer1_disable();
        timer1_isr_init();
        timer1_attachInterrupt(t1IntHandler);
        timer1_enable(TIM_DIV1, TIM_EDGE, TIM_LOOP);
        timer1_write((clockCyclesPerMicrosecond() * 500000) / frequency);
        break;
    }
  }
}

void disableTimer(uint8_t _index) {
  tone_pins[_index] = 255;

  switch (tone_timers[_index]) {
    case 0:
      // Not currently supported
      break;

    case 1:
      timer1_disable();
      break;
  }
}

I now have an idea for making PWM and tone compatible. We can do this with no change to PWM and only a very small change to tone. PWM uses timer1; we needn't change this. Tone also uses timer1; we change this by using timer2 instead. How can we do this? For timer1 we don't use the hardware timer1 but a virtual timer1, and for timer2 we do the same. More than one hardware timer doesn't bring any advantage, because we can't borrow additional time from a parallel universe. Because there is only one time dimension that we can use, one hardware timer is fully sufficient.
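
The idea of several virtual timers sharing one hardware timer can be sketched on a PC as follows. This is only an illustration under assumptions: `VirtualTimer`, `earliestTimer` and `ticksToLoad` are hypothetical names, not functions of the ESP8266 core or of the module discussed in this thread. The hardware timer is always loaded with the distance to the earliest pending deadline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// One software ("virtual") timer; all of them share a single hardware timer.
struct VirtualTimer {
    bool     active;
    uint32_t deadline;  // absolute time in timer ticks, may wrap around
};

// Index of the active timer whose deadline comes first, or -1 if none is active.
// Deadlines are compared relative to `now`, so counter wrap-around is harmless.
int earliestTimer(const VirtualTimer* timers, size_t count, uint32_t now) {
    assert(timers != nullptr || count == 0);
    int best = -1;
    int32_t bestDelta = 0;
    for (size_t i = 0; i < count; ++i) {
        if (!timers[i].active) continue;
        int32_t delta = static_cast<int32_t>(timers[i].deadline - now);
        if (best < 0 || delta < bestDelta) {
            best = static_cast<int>(i);
            bestDelta = delta;
        }
    }
    return best;
}

// Ticks to load into the single hardware timer so that it fires for timer `t`.
// Deadlines that are overdue or too close are clamped to `minStep`,
// so the interrupt still triggers.
uint32_t ticksToLoad(const VirtualTimer& t, uint32_t now, uint32_t minStep) {
    int32_t delta = static_cast<int32_t>(t.deadline - now);
    return delta < static_cast<int32_t>(minStep) ? minStep
                                                 : static_cast<uint32_t>(delta);
}
```

In a real ISR, the handler of the selected virtual timer would run first, and the hardware timer would then be reloaded with the result of `ticksToLoad` for the next earliest timer.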

The question is whether the Arduino PWM only uses timer1, or also functions of the Espressif PWM API or of ETS. The answer is that it does:

void pwm_start_timer()
{
    timer1_disable();
    ETS_FRC_TIMER1_INTR_ATTACH(NULL, NULL);
    ETS_FRC_TIMER1_NMI_INTR_ATTACH(pwm_timer_isr);
    timer1_enable(TIM_DIV1, TIM_EDGE, TIM_SINGLE);
    timer1_write(1);
}

This means changing timer.c isn't a solution. The interrupt service routine of timer.c and that of PWM must be the same if tone and PWM are both to work.

What would this mean? The timer1 functions would have to be integrated into PWM instead of being a library of their own. I don't know whether others would like this idea.

I'm facing the same problem. I'm trying to build an adaptive LCD backlight with some sound, using the analogWrite() and tone() functions. analogWrite() controls the brightness well until tone() is used. Then analogWrite() resets to 0 and doesn't work anymore, while tone() continues working as usual. I hope that the respected AlfonsMittelmeyer will find a simple and beautiful solution. Thanks pal for the good job!

@X-Stas-EEE: then you are using core version 2.3.0. Core version 2.4 kills the tone instead, and also the normal timer1 interrupt.

I would like to know why on earth a timer1 NMI is used for PWM since core version 2.4.0. An NMI sounds like a good idea, because it can't be masked. But I tested this yesterday: an NMI isn't a good idea for small PWM values. On the ESP8266, a timer1 NMI has poor accuracy for short times and doesn't work properly for time spans below 10 µs. With normal timer1 interrupts, time spans of 4 µs also work properly if only one timer is used.

Yesterday I implemented a module for four shared timers. One timer is for PWM, one for tone, one could replace timer1 of the timer1 module, and with the fourth I would like to implement new features. These four timers are enough, because I have a further idea: why shouldn't these timers have children? Without implementing children, I could present a solution for PWM and tone working together maybe tomorrow. Oh, not tomorrow, because tomorrow I don't have time; so either today or the day after tomorrow.

With children, multi-channel tone could also be used, and there would be an unlimited number of further timer interrupts. But this would take about two or three days more.

Maybe multi-channel tone and an unlimited number of timer interrupts needn't be implemented as a core module. Should these features be implemented as a library?

@AlfonsMittelmeyer Good stuff! Will wait for the solution. You're awesome! Thanks!

@AlfonsMittelmeyer

Should these features be implemented as a library?

Yes, please!

OK, then the core module for tone will have only one tone channel, using TIMER_SHARED_TONE. An additional library will offer multi-channel tone by using the same timer for child timers. The same library will also offer child timers for each of the four primary shared timers, and maybe some other basic functionality for creating and measuring time-dependent signals.

Sounds good!

Is there something special to know about compiling the core?
The core files are on my Raspberry Pi 3 in this directory:

/home/pi/.arduino15/packages/esp8266/hardware/esp8266/2.4.1/cores/esp8266

I put my module "timer_shared.cpp" and the header file "timer_shared.h" in there.

This is the content of my header file:

#include "c_types.h"
//=======================================================================
//                        defines 
//=======================================================================

#define TIMER_SHARED_PWM 0
#define TIMER_SHARED_TONE 1
#define TIMER_SHARED_TIMER1 2
#define TIMER_SHARED_USR 3

//=======================================================================
//                  function prototypes: interface functions
//=======================================================================

void shared_timer_isr_init(void);
void shared_timer_nmi_isr_init(void);

void timer_shared_attachParamInterrupt(uint8_t shared_timer, void (* userFunc)(void *), void * parameters );
void inline timer_shared_attachInterrupt(uint8_t shared_timer, void (* userFunc)(void));

void timer_shared_enable(uint8_t shared_timer, uint8_t divider, uint8_t int_type, uint8_t reload);

void ICACHE_RAM_ATTR timer_shared_stop_inisr(uint8_t shared_timer);
void timer_shared_stop_nonisr(uint8_t shared_timer);
void timer_shared_disable_nonisr(uint8_t shared_timer);
#define timer_shared_detachInterrupt timer_shared_disable_nonisr

void ICACHE_RAM_ATTR timer_shared_write_inisr(uint8_t shared_timer , uint32_t ticks);
void timer_shared_write_nonisr(uint8_t shared_timer , uint32_t ticks);
void timer_shared_reload(uint8_t shared_timer,uint32_t ticks);
void timer_shared_write_reload_nonisr(uint8_t shared_timer , uint32_t ticks, uint32_t reload_ticks);

uint32_t timer_shared_read(uint8_t shared_timer);
uint8_t timer_shared_enabled(uint8_t shared_timer);

My module compiles without problems. But I would like to call functions of my module from the module "core_esp8266_wiring_pwm.c". Of course I included my header file. But when I compile, I get error messages like this:

arduino.ar(core_esp8266_wiring_pwm.c.o):(.iram.text+0x14): undefined reference to `timer_shared_write_inisr'

What must I do so that the references to my functions are found?

I compiled via Arduino IDE.

@AlfonsMittelmeyer you forgot about C++ name mangling :) Your new core files are .cpp, but you're trying to call your new functions from a .c file. You have to guard the declarations of your new functions with
#ifdef __cplusplus
extern "C" {
#endif

...

#ifdef __cplusplus
}
#endif
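
As a rough illustration of how the guard wraps the prototypes, here is a single-file sketch. The function `demo_timer_us_to_ticks` and the header name in the comments are hypothetical; they only stand in for the real prototypes of the shared timer module.

```cpp
#include <assert.h>
#include <stdint.h>

// --- what would live in the header (here: a hypothetical demo_timer.h) ---
#ifdef __cplusplus
extern "C" {
#endif

// Declared with C linkage, so a .c file and a .cpp file agree on the symbol name.
uint32_t demo_timer_us_to_ticks(uint32_t cpu_mhz, uint32_t microseconds);

#ifdef __cplusplus
}
#endif

// --- what would live in the .cpp file that includes the header ---
// The definition keeps the C linkage of the declaration above.
uint32_t demo_timer_us_to_ticks(uint32_t cpu_mhz, uint32_t microseconds) {
    assert(cpu_mhz > 0);
    return cpu_mhz * microseconds;
}
```

The `#ifdef __cplusplus` part matters as soon as the same header is also included from a .c file, because a C compiler does not understand `extern "C"`.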

I got it, it compiled. I have to use extern "C", and I shouldn't use inline in a declaration meant for external use.

OK, I should also use #ifdef __cplusplus. In the Arduino IDE this isn't necessary, because .cpp files are compiled by the C++ compiler and .c files by the C compiler. I will do this later, after testing.

Or, because I don't use classes, I could also turn it into a C file after some work: put local variables at the beginning of a function, don't initialize structures at declaration time, maybe don't use line comments, and don't use inline.

The extern "C" part was a bit tricky for me. The changes to tone and PWM so that they use my shared timers were very simple. Now tone and PWM both work. But I'm going to bed now and wish you a good night.

@AlfonsMittelmeyer Good job! Looking forward to trying your lib!

@X-Stas-EEE this wasn't a lib but a core module!

I should also measure the performance, and maybe I should disable other interrupts in non-NMI mode, because the normal timer1 ISR does this too.

But not today.

@AlfonsMittelmeyer Wow! You're a rockstar! Many thanks for your job!

@X-Stas-EEE Thank you very much for your kind words. But there is still a bug. I thought that the timer would stop when it reaches zero, but that was wrong. The timer keeps running, so I have to disable the edge interrupt. I also want to implement exact timing. So I will present a good version tomorrow. I hope it can become very good.

No, it's some other bug, which I have to find.

@AlfonsMittelmeyer It's ok. We will wait patiently. I believe that your work is very important for the community.

@X-Stas-EEE sorry that I didn't finish it today. But it looks like I can do it tomorrow. After a night wasted on hunting a nasty bug, I felt very tired. It's very difficult to find a bug where there isn't one. When I was about to give up, because the behaviour was so crazy that I had run out of ideas and was too tired to think, I found it. Wrong ideas are a hindrance to finding something out; without any ideas, it's very easy. I had stared for a long time at the values I printed via Serial.print, and they were crazy. Why not let an LED toggle on each measurement? It didn't toggle correctly. Why look only at values of about 3000 ticks and below? Why not 80 million ticks? And then the LED blinked. Oh goodness: in my test sketch I had set TIM_LOOP instead of TIM_SINGLE for the timer interrupt. I would never have thought of this. So giving up thinking is a good way to find bugs. After correcting this, everything looked very stable, and then I made good progress. Some features, like no jitter even for small time spans and tuning for 160 MHz, I will do tomorrow.

No jitter for small values isn't so easy. I need a delay. But how do I get a delay of half the time that is needed for storing a value into a variable? Does somebody know? I would be delighted if I didn't have to spend much time on investigation.

@AlfonsMittelmeyer it sounds like you're making progress. Take your time, better slow and stable than fast and buggy :)
BTW, if you make a PR with your wip, others can give you feedback as you develop, including help you with testing.

@AlfonsMittelmeyer Looks like you did a great job. Thank you very much! I think you should take a rest. Take care of yourself. No problem is worth the health.
@devyte I think it could help.

@devyte What's a PR and how do I make one?

A PR is a pull request. Please look me up in gitter to discuss.

Thank you very much. I will do this later. Now I need to find a solution for a delay in ticks. For larger values I can use the timer counter, which gives a resolution of 1 tick. But for smaller values I have to use a time-consuming loop. Currently I have a resolution of only 6 ticks for small values.

Here are the ticks I set and the ticks I measured:

set measured

70: 69
71: 69
72: 75

There is a jump of 6 ticks from 69 to 75. I can adjust the jump to take place at 73 instead of 72. For more resolution I need a special delay function with a resolution of one tick. The compiler optimized all my attempts away.

Such attempts always delivered 75 ticks, independent of the parameter:

static volatile uint16_t __attribute__((optimize("O0"))) delay_ticks(uint16_t ticks) {
  while(ticks--)
    asm_ccount();
}

A write operation to a read-only register in assembler could be helpful. If the compiler doesn't know that it does nothing, then maybe it won't be optimized away. This could give a resolution of maybe 3 ticks. For a resolution of one tick, I have to investigate how to use the GNU assembler, and the instruction set and registers of the ESP8266 CPU.

I now think that just one instruction that will not be optimized away would be sufficient even for one-tick resolution, if I combine it with preceding switch cases.
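
The combination of a fixed-cost loop with switch cases for the remainder comes down to a division with remainder. The host-side sketch below shows only this bookkeeping, not the cycle-exact delay itself. `DelayPlan` and `planDelay` are hypothetical names, and the overhead and loop cost are assumed values that would have to be measured on the target.

```cpp
#include <cassert>
#include <cstdint>

struct DelayPlan {
    uint32_t loopIterations;  // passes through the fixed-cost loop
    uint32_t singleTicks;     // leftover ticks, each burned by one single-cycle instruction
};

// Split `ticks` into whole loop iterations plus a remainder.
// `overhead` is the fixed call cost in ticks and `loopCost` the cost of one loop pass;
// both are assumptions here and depend on the compiled code.
DelayPlan planDelay(uint32_t ticks, uint32_t overhead, uint32_t loopCost) {
    assert(loopCost > 0);
    uint32_t usable = ticks > overhead ? ticks - overhead : 0;
    return DelayPlan{usable / loopCost, usable % loopCost};
}
```

With a loop cost of 4 ticks, the remainder is 0 to 3, which is what a switch with four cases of zero to three single-cycle instructions would cover.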

Sorry, I got something else wrong. Maybe the compiler didn't optimize it away. I just didn't see the output and mistook another output for it.

It's only those who do nothing that make no mistakes © Joseph Conrad

It's difficult to get a constant execution time. In former times, code was in ROM or in RAM and had a predictable execution time. But now the execution time also depends on whether the code is in the cache.

This function would be one part of the time delay function:

static volatile void delay_ticks_divided(uint16_t volatile ticks) {
  while(ticks--);
}

But I don't know how to get a constant execution time. For a parameter value of 1000 I got execution times from 6000 to 18000 ticks. I would like a constant execution time that is an integer multiple of the parameter value. For 1000, something like 6000 would be good, but not something like 7346. Does somebody know how to get such a constant execution time for this function?

Maybe what you need is delayMicroseconds()?

@AlfonsMittelmeyer As part of my much delayed ESP8266-eggduino for Easter, I've been writing a new timer1-driven waveform generator which might give you some ideas on how to proceed.

https://github.com/earlephilhower/Arduino/blob/new_timer1_irq/cores/esp8266/core_esp8266_timer1.c

Since a 3d printer (well, 2.5d in the egg case) needs to produce multiple tone()s, analogWrites(), servo signals, and stepper motor control motions, it's set up as a generic, IRQ driven waveform generator driven off of "when is the next edge I need to generate?"

What may help your use is that it uses both Timer1 and the ESP cycle counter in a feedback loop.

Timer1 is loaded with the # of ticks (- epsilon) until the next edge of the multiple waveforms, to ensure the IRQ gets called. But because there's much jitter in when the IRQ actually gets called, I use the ESP's internal cycle counter register to base all the waveform logic off of. That way even if the IRQ is delayed it's taken account of for all other waveforms and not additive.
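
The feedback idea described here can be sketched on a PC like this. It is not the code from the linked branch, only a minimal illustration under assumptions, with hypothetical names `Waveform` and `serviceWaveform` and a cycle count passed in as a plain parameter instead of being read from the CPU register.

```cpp
#include <cassert>
#include <cstdint>

struct Waveform {
    uint32_t nextEdge;    // absolute cycle count of the next edge
    uint32_t halfPeriod;  // cycles between edges
    bool     level;       // current output level
};

// Called from the (simulated) timer interrupt with the current cycle count.
// Toggles the output for every edge that is due and returns the cycles until the next one.
// Edges advance from their scheduled time, not from `now`, so a late interrupt adds no drift.
uint32_t serviceWaveform(Waveform& w, uint32_t now) {
    assert(w.halfPeriod > 0);
    while (static_cast<int32_t>(w.nextEdge - now) <= 0) {
        w.level = !w.level;
        w.nextEdge += w.halfPeriod;
    }
    return w.nextEdge - now;
}
```

If the interrupt arrives 30 cycles late, the returned value is 30 cycles shorter than the half period, so the following edge still lands on its scheduled time.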

Also, I'm not sure I understand why you're worried about tick accuracy. 1 tick @ div-by-1 == 1 ESP8266 instruction = 12.5ns or 6.25ns, no? That's unreasonable to expect outside of fixed-function HW like the AVRs which, IIRC, can actually toggle an IO bit in HW off of their internal timers. What frequency are you trying to generate? At 1khz you've got 0.5ms/edge, so if you can keep accuracy to w/in 1% that's 5us = +/-400 or 800 instructions. @ 0.1% accuracy = 1us => not gonna happen with IRQs, there's no time for any user code.
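
The 1 kHz estimate can be reproduced with a few lines of arithmetic. The helper name `toleranceCycles` is made up for this sketch, and it assumes roughly one instruction per CPU cycle, as in the estimate above.

```cpp
#include <cassert>
#include <cstdint>

// CPU cycles that fit into the allowed timing error of one edge.
// frequencyHz: output frequency, tolerancePercent: allowed error relative to one
// edge interval (half a period), cpuMhz: CPU clock in MHz.
double toleranceCycles(double frequencyHz, double tolerancePercent, uint32_t cpuMhz) {
    assert(frequencyHz > 0.0);
    double edgeMicros = 500000.0 / frequencyHz;                  // half period in microseconds
    double errorMicros = edgeMicros * tolerancePercent / 100.0;  // allowed error in microseconds
    return errorMicros * cpuMhz;                                 // cycles per microsecond == MHz
}
```

For 1 kHz and 1 % this gives 5 µs, which is 400 cycles at 80 MHz and 800 cycles at 160 MHz.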

@devyte No, thanks, microseconds are not what I would like to have. Currently I have 6 ticks, which is about µs/13. It would be nice to have µs/80, or µs/160 with a 160 MHz CPU clock. Of course this exactness would not be necessary; I only wonder whether it can be done. Yesterday I tried to find a solution in C/C++, but couldn't. Often a while loop consumes 320 additional ticks, and I have no idea why or how to avoid this. So now I will try it in assembler.

I don't quite understand the problem you're describing, but here are some thoughts:

  • What if you decorate that function with ICACHE_RAM_ATTR?
  • Have you seen the function bodies of millis()/micros()?

@devyte: The problem with problems is that they are difficult to understand, because they have some complexity.

I found a solution, which should only be used in NMI handlers, because it doesn't compensate for time consumed by interruptions like interrupts or tickers. My function delay_ticks works correctly for values >= 4.

If you would like to understand what the problem with while statements is, try replacing my assembler routine "delay_ticks_resolution4" with a while loop. I tried a lot without any success. A while loop works correctly if we don't also try to adjust for the remaining ticks. If other statements are involved, the compiler produces something with unusable timing behaviour. Instead of 6 ticks per loop pass it could be 10; I also got 20. But most disturbing is an additional delay of about 320 ticks that the compiled code produces. I don't know why; maybe it pushes a lot of register contents onto the stack?

I got the same behaviour, an additional delay of 70 or 130 ticks, with an assembler routine that uses the ccount register. Maybe I should try to implement the switch cases in assembler as well. Then such a delay function could also be used outside of NMI handlers. Of course, if an interrupt lasts longer than the delay, the delay function cannot stop during the interruption; it can only readjust in the case that the delay outlasts the interrupt.

Here is my code for the delay_ticks function:

// ===========================================================
//               Measuring time by CCOUNT register
// -----------------------------------------------------------
static inline volatile int32_t asm_ccount(void) {
    int32_t r;
    asm volatile ("rsr %0, ccount" : "=r"(r));
    return r;
}
// ===========================================================



// ===========================================================
//               delay_ticks for use during nmi interrupts
// -----------------------------------------------------------

// scaled loop (ticks divided by 4) as assembler routine, because while causes problems

static inline volatile void delay_ticks_resolution4(int32_t ticks) {
    asm volatile (
      "    srai %0 , %0 , 2; "        //  ticks >>= 2; // division by 4, because the loop consumes 4 ticks, arithmetic shift right for not changing the sign flag
      "1:                   ;"        //  do {
      "    addi %0 , %0 , -1;"        //    ticks -= 1;
      "    bgez %0, 1b;      "        //  } while( ticks >= 0 );
      
      : "+r"(ticks)); // ticks is register %0; it is read and modified, so it is declared as an in/out operand
}

// idle operation (no operation) for time consumption used in switch case

static inline volatile void noop() {
    asm volatile ( "nop;" );
}

static inline volatile void delay_ticks(int32_t volatile ticks) {

  ticks -= 4; // correction for additional execution time of this function and of delay_ticks_resolution4

  switch(ticks & 3) { // cases for consumption of ticks % 4
    case 3:
      noop();
    case 2:
      noop();
    case 1:
      noop();
    case 0:
      delay_ticks_resolution4(ticks); // loop in assembler for consumption of (ticks / 4) * 4
  }
}
// ===========================================================



// ===========================================================
//            variables used for measurement
// -----------------------------------------------------------
volatile int32_t time_0; // time measurement before delay
volatile int32_t time_1; // time measurement after delay
volatile bool isr_ready; // flag for polling whether timer isr had executed
// ===========================================================


// ===========================================================
//      nmi timer 1 isr containing time measurement for delay ticks
// -----------------------------------------------------------
static void ICACHE_RAM_ATTR timer_nmi_isr() {
  
  time_0 = asm_ccount(); // time measurement before delay - consumes 10 ticks: 1 tick for time measurement, 9 ticks for storing in variable

// -----------------------------------------------------------
  delay_ticks(10); // test also with higher values, delay_ticks works correct for ticks >= 4
// -----------------------------------------------------------

  time_1 = asm_ccount(); // time measure after delay

  isr_ready = true; // set flag, that nmi did execute
}
// ===========================================================


// ===========================================================
//        SETUP
// -----------------------------------------------------------

void setup() {
  Serial.begin(115200);

  // initialize timer1 nmi interrupt
  ETS_FRC_TIMER1_NMI_INTR_ATTACH(timer_nmi_isr);
  T1C = 0x80;
  TEIE |= TEIE1;
}
// ===========================================================



// ===========================================================
//    LOOP
// -----------------------------------------------------------
void loop() {
  if(Serial.available()) { // wait for user send serial

    while(Serial.available()) // read user input
      Serial.read();

    isr_ready = false; // clear flag
    T1L = 1; // trigger timer nmi
    while(!isr_ready) {} // wait until timer1 nmi did execute

    Serial.println(time_1 - time_0 - 10); // subtract also 10 ticks for time measurement before delay
  }
// ===========================================================
}

@earlephilhower

You wrote:

What may help your use is that it uses both Timer1 and the ESP cycle counter in a feedback loop.

This I do too. The functions, which replace T1L = ticks, have an integrated ccount measurement.

For PWM I implemented precedence over other shared timers when the time span is small. For this there is a separate loop for small time spans with very little overhead. And for PWM there is also a separate trigger function, which executes very fast but lacks some features: it supports time spans only up to 0x7FFFFF ticks instead of up to 0x7FFFFFFF:

#define timer_shared_write_pwm(ticks) timer_shared_write_pwm_ccount(timer_shared_asm_ccount(),(ticks));
void ICACHE_RAM_ATTR timer_shared_write_pwm_ccount(uint32_t par_ccount, uint32_t ticks);

Here you see that "timer_shared_write_pwm(ticks)" is a define, which takes the ccount measurement before the real trigger function is called. I couldn't use inline for the time measurement, because other modules need C binding. Offering an inline version for modules compiled as C++ doesn't make sense either, because this would result in a different epsilon, which cannot be distinguished without wasting time on flag storage.

You wrote further:

Timer1 is loaded with the # of ticks (- epsilon) until the next edge of the multiple waveforms, to ensure the IRQ gets called.

This is correct, but not sufficient. There is a delay between the counter expiring and the ISR being called, maybe 130 ticks for NMI timer1 interrupts. This means short time spans like analogWrite(1) or analogWrite(2) cannot be handled by triggering the next IRQ; they must be handled by a loop that consumes time. I measured that PWM is currently very inaccurate for small values and also for large values. Because it wasn't a precise measurement done with pulseIn, I can only estimate it. I think the PWM cycles are 3 µs too long, which results in an error for small values that shouldn't be tolerated, and that pin LOW is often missing for a value of 1020 and completely missing for values in the range 1021 .. 1023.

You wrote further:

But because there's much jitter in when the IRQ actually gets called, I use the ESP's internal cycle counter register to base all the waveform logic off of. That way even if the IRQ is delayed it's taken account of for all other waveforms and not additive.

I measured that there isn't any jitter when using an NMI interrupt and only one shared timer. Of course, using more than one shared timer would result in some jitter when shared timers overlap. In the case of TIM_LOOP, which fits tone, the jitter isn't additive. In the case of TIM_SINGLE, the user ISR callback for the shared timer has to handle itself whether the jitter shall be additive or not.

You wrote:

What frequency are you trying to generate? At 1khz you've got 0.5ms/edge, so if you can keep accuracy to w/in 1% that's 5us = +/-400 or 800 instructions. @ 0.1% accuracy = 1us => not gonna happen with IRQs, there's no time for any user code.

Oh, it's simple. The default PWM frequency is 1 kHz, with analogWrite values from 1 to 1024 and 1024 for the full cycle. Value 1 means 80000 / 1024 ticks (at 80 MHz CPU frequency). This means steps of about 0.98 µs, or 78 CPU ticks, for analogWrite(1). I implemented it so that, if ticks are triggered at the very end of the user ISR, these are the ticks until the very beginning of the next user ISR callback. If they should instead be the ticks from one trigger to the next trigger, this has to be implemented in the user callback by taking its execution time into account. My implementation allows these 78 ticks for PWM. But for a time span from trigger to trigger, with further ticks subtracted in PWM, the accuracy will not be fully sufficient for analogWrite(1). analogWrite(2) shouldn't be a problem. PWM would be very exact with my shared timer implementation. It could be more exact if the PWM ISR itself considered its own execution time. But then the accuracy for analogWrite(1) will not be enough at 80 MHz; 160 MHz might do it.
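The step arithmetic above can be checked with a small host-side sketch. The helper names ticks_per_step and step_ns are hypothetical and not part of the ESP8266 core; the numbers (80 MHz or 160 MHz CPU clock, 1 kHz PWM, range 1024) are the ones quoted in the post.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: whole CPU ticks available for one analogWrite step.
// One PWM period lasts cpu_hz / pwm_hz ticks and is split into `range` steps.
// Integer division truncates, so 78.125 becomes 78.
constexpr uint32_t ticks_per_step(uint32_t cpu_hz, uint32_t pwm_hz, uint32_t range) {
    return static_cast<uint32_t>(
        cpu_hz / (static_cast<uint64_t>(pwm_hz) * range));
}

// Hypothetical helper: duration of one analogWrite step in nanoseconds.
constexpr double step_ns(uint32_t pwm_hz, uint32_t range) {
    return 1e9 / (static_cast<double>(pwm_hz) * static_cast<double>(range));
}
```

At 80 MHz this gives 78 ticks per step, and at 160 MHz it gives 156, which is why the higher clock leaves more headroom for analogWrite(1).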

I saw that the CPU offers loops which are performed by the processor without consuming time. Such loops are ideal for signal processing. But what a pity: the GNU assembler doesn't know this loop option. A loop done in assembler by decrementing and branching costs 4 ticks per iteration. The C/C++ compiler could waste an unspecified time of 6 to 20 ticks per iteration, depending on compiler internals such as the code length of the function or the number of registers used for local variables. And then there are also unmotivated delays, like 320 ticks, only because I had used a while when I tried to implement a delay_ticks function.

After having noticed such unpredictable timing behaviour of C++, I now understand why some institutions use Ada instead.

I saw, that the CPU offers loops, which are performed by the processor without consuming time

CPU configuration used in ESP8266 doesn't have zero-overhead loops. The one used in ESP32 does.

@igrr: thank you. Then I don't need to look further for it. The implementation with switch cases is a great hack, which works for 4 cases, but not for seven. More cases produce more overhead and different execution times. It's pure luck that it worked in this case, and it should be implemented properly in assembler. Such switch cases implemented in assembler would also be usable for taking ccount into account. An assembler routine which considers ccount would be this:

static inline volatile int32_t asm_delay_ticks(int32_t ticks) {

    int32_t expire_ccount;
    int32_t current_ccount;
    
    asm volatile (
    "   rsr %2, ccount;" // current_ccount = register_ccount;
    "   add %1,%2,%3;"   // expire_ccount = current_ccount + ticks;
    "1:              ;"  // do {
    "   rsr %2, ccount;" //   current_ccount = register_ccount;
    "   sub %0,%1,%2;"   //   ticks = expire_ccount - current_ccount;
    "   bgei %0, 6, 1b;" // } while(ticks >= 6);
    
    // "=&r": current_ccount is written before the input ticks is read, so it must not share a register with it
    : "=r"(ticks),"=r"(expire_ccount),"=&r"(current_ccount): "r"(ticks));

    return ticks;
}

Completed by adding the switch cases, and by not entering the loop when only a few ticks remain, this would be the ideal delay_ticks function.
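The loop above relies on the subtraction expire_ccount - current_ccount still giving the right answer when the 32-bit cycle counter wraps around. A minimal host-side sketch of that arithmetic follows; expire_after and remaining_ticks are hypothetical names, and the counter is passed in as a plain number instead of being read from the ccount register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: deadline in counter units. Unsigned addition wraps
// modulo 2^32, exactly like the hardware cycle counter does.
inline uint32_t expire_after(uint32_t current_ccount, uint32_t ticks) {
    return current_ccount + ticks;
}

// Hypothetical helper: ticks left until the deadline. Interpreting the
// unsigned difference as signed yields a small positive value before the
// deadline and a negative value after it, even across a counter wrap.
// This only holds for delays shorter than 2^31 ticks.
inline int32_t remaining_ticks(uint32_t expire_ccount, uint32_t current_ccount) {
    return static_cast<int32_t>(expire_ccount - current_ccount);
}
```

Comparing the raw counter values directly (current < expire) would fail near the wrap point, which is why the difference is compared instead.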

Today I was ill, but tomorrow I will try to complete the shared timer module.

Sorry, my solution with the switch cases, which I presented before, is no solution. It only works for constant values. I thought this over: how could such a function work without any overhead, as it did? This is strictly impossible for a real function that is compiled for variable values.

If we have this function:

static inline volatile void delay_ticks_switch(int32_t ticks) {

  switch(ticks) {
    case 3:
      noop();
    case 2:
      noop();
    case 1:
      noop();
    case 0:
      noop();
  }
}

And if we call this function by:

delay_ticks_switch(3);

What do you think the compiler would generate?

It simply compiles:

noop();
noop();
noop();

For more cases it would compile something different, and for a variable value something else again. So this hack simply isn't useful. A real implementation of a delay_ticks function in assembler would probably have an overhead of 5 ticks. And such a time-consuming switch case may be implemented in this way:

static inline volatile void asm_switch_delay(int32_t ticks) {
    
    asm (
    "   beqi %0, 0, 2f;"
    "   beqi %0, 1, 2f;"
    "   beqi %0, 2, 2f;"
    "   beqi %0, 3, 2f;"
    "   beqi %0, 4, 2f;"
    "   beqi %0, 5, 2f;"
    "2:               ;"  : : "r"(ticks));
}

I got the delay_ticks function, which is needed:

static inline volatile void delay_ticks(int32_t ticks) {

    int32_t expire_ccount;
    int32_t current_ccount;
    
    asm volatile (

    "   bgei %3, 12, 0f;" // >= 12 // if ticks >= 12 then goto label 0
    "   blti %3,  7, 3f;" // <= 6 ( < 7 ) // in the following cases goto label 3 - the end
    "   beqi %3,  7, 3f;" // == 7
    "   beqi %3,  8, 3f;" // == 8
    "   blti %3, 10, 3f;" // == 9 ( < 10)
    "   beqi %3, 10, 3f;" // == 10
    "   blti %3, 12, 3f;" // == 11 ( < 12 )
    
    "0:                ;"
    "   addi %0, %3, -12;" // ticks -= 12; // adjustment for time consumption for ticks in the range 12 .. 17
    "   blti %0, 6,  2f;"  // <= 17 ( < 18) // if ticks < 18 then goto label 2
    "   addi %0, %0, -2;"  // ticks -= 2 // adjustment for loop, considering the time for he following 2 instructions and the time consumption of the cases after the loop (label 2, overhead 4 ticks)
    
    "   rsr %2, ccount;"    // current_ccount = register_ccount;
    "   add %1, %2 ,%0;"    // expire_ccount = current_ccount + ticks;
    "1:             ;"      // do {
    "   rsr  %2, ccount;"   // current_ccount = register_ccount;
    "   sub  %0, %1, %2;"   //   ticks = expire_ccount - current_ccount;
    "   bgei %0, 6, 1b;"    // } while(ticks >= 6);

    "2:               ;"
    "   blti %0, 1, 3f;" // remaining ticks <= 0 ( < 1 ), instead of == 0, to be sure, that nothing wrong could happen
    "   beqi %0, 1, 3f;"
    "   beqi %0, 2, 3f;"
    "   beqi %0, 3, 3f;"
    "   beqi %0, 4, 3f;"
    "   beqi %0, 5, 3f;"
    
    "3:                ;" 
    
    : "=r"(ticks),"=r"(expire_ccount),"=r"(current_ccount): "r"(ticks));

}

It has an internal overhead of 6 ticks, which is subtracted. So it works correctly for values >= 6 if the parameter is passed in a local variable (register) or as a constant value < 2048. For values below 6, the execution time will be 6 ticks. For constant values >= 2048 the compiler produces an additional overhead of 6 ticks for parameter passing. Clearly, passing a value from a variable in RAM also produces an overhead, which has to be considered.

Wasn't this a nice idea? This never could have been done in C/C++ without assembler.

With a better measurement via a register (local variable) I saw that the overhead was only 5 ticks, and by further optimising I now have an inline version for exact delays >= 4 ticks and an ICACHE_RAM_ATTR version for exact delays >= 11 ticks.

Now enough has been done about this, so I should complete my work on the shared timers. Further optimising of the shared timers could be done later, after a pull request and the release of the first version.

@AlfonsMittelmeyer please look me up in gitter to discuss your work, I have several questions. Also, I'm having a bit of trouble keeping up with what you're doing, and I suspect I'll be reviewing later on, so I'd like to have a direct channel open with you for quick back and forth.

@devyte I just visited gitter for the first time. It's time to begin with discussing the details. Yesterday I had the idea about the optimal interface between the shared timer isr and the user callbacks.

First I had implemented a complicated write function, which could replace T1L = ticks. But yesterday I had the optimal idea: return ticks;

And for stopping the user isr callback from inside: return 0; (which applies for tone with duration)

PWM should work jitterless and 80 ticks for analogWrite(1) shouldn't be a problem.

There isn't much to change: simply forget about TEIE and T1L, just return. But for tone I would like to optimize the ISR. digitalWrite with 69 ticks shouldn't be used if this can also be done in a few ticks.

Does somebody know how to do this without causing a bug?
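The callback contract described above (return the ticks until the next call, return 0 to stop) can be sketched on the host as follows. All names here (TimerCallback, ToneState, tone_callback, run_until_stopped) are hypothetical and only illustrate the idea; the real shared timer module is not shown in this thread, and the pin is modelled as a plain bool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical callback type: returns the ticks until the next call,
// or 0 to tell the shared timer to stop calling it.
using TimerCallback = uint32_t (*)(void* ctx);

struct ToneState {
    uint32_t half_period_ticks;  // ticks between two pin toggles
    uint32_t toggles_left;       // remaining toggles (tone with duration)
    bool pin_level;              // stands in for the real GPIO pin
};

// Toggles the pin and asks to be called again after half a period,
// until the requested number of toggles has been produced.
uint32_t tone_callback(void* ctx) {
    ToneState* s = static_cast<ToneState*>(ctx);
    s->pin_level = !s->pin_level;
    if (--s->toggles_left == 0) {
        return 0;  // duration is over: stop this timer
    }
    return s->half_period_ticks;
}

// Host-side stand-in for the shared timer ISR: keeps calling the callback
// and sums up the requested ticks until the callback returns 0.
uint64_t run_until_stopped(TimerCallback cb, void* ctx) {
    uint64_t elapsed = 0;
    for (;;) {
        uint32_t next = cb(ctx);
        if (next == 0) break;
        elapsed += next;
    }
    return elapsed;
}
```

With this shape the callback never touches TEIE or T1L itself; re-arming the hardware timer stays inside the shared timer code.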

timer_index |= 8;

The problem is that this variable is also used by an interrupt, which changes other bits. If this interrupt occurred in the middle of this statement, I would undo its changes.

I remember that a former colleague had this problem when using ^=. He searched for the bug for maybe two weeks and couldn't find it. Then he asked me to help him. I told him that he had to wrap the statement in disabling and enabling the interrupt, because this statement is three instructions:

register = variable;
register |= 8;
variable = register;

And after he had done this, the bug was fixed. But now it's an NMI, which cannot be masked. Maybe the CPU has an instruction for |= on memory?

If not, then I have to invest 6 more ticks for reading a flag from another variable. What a pity; currently I could reduce the minimum time for triggering the next call of the PWM ISR to 32 ticks. For doing this, the delay_ticks function was very helpful.
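The lost update described above can be replayed deterministically on the host by writing out the three steps and placing the "interrupt" between them. This is only an illustration with hypothetical names (lost_update_demo, SplitFlags, split_flags_demo). The second function sketches the fallback mentioned above, where the main code and the interrupt each own a separate variable, so that neither overwrites the other's bits.

```cpp
#include <cassert>
#include <cstdint>

// Replays "variable |= 8" as load / modify / store, with a simulated
// interrupt that sets bit 1 between the load and the store.
uint32_t lost_update_demo() {
    uint32_t variable = 0x01;
    uint32_t reg = variable;  // register = variable;
    variable |= 0x02;         // simulated interrupt changes another bit
    reg |= 0x08;              // register |= 8;
    variable = reg;           // variable = register; (interrupt's bit is lost)
    return variable;
}

// Hypothetical alternative: each writer owns its own variable.
struct SplitFlags {
    uint32_t main_bits;  // written only by the main code
    uint32_t isr_bits;   // written only by the interrupt
};

uint32_t split_flags_demo() {
    SplitFlags f{0x01, 0x00};
    uint32_t reg = f.main_bits;  // load
    f.isr_bits |= 0x02;          // simulated interrupt writes its own variable
    reg |= 0x08;                 // modify
    f.main_bits = reg;           // store cannot clobber isr_bits
    return f.main_bits | f.isr_bits;  // readers combine both variables
}
```

The price of the split is the extra read when both variables are combined, which matches the additional ticks mentioned above.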

Hi guys! Any progress here?

Yes, there is progress. earlephilhower presented a solution, and my solution will also be ready in a few days. Sorry that it took so long. Currently there are only very few problems. I test with multichannel tone, with 7 pins for 7 tone channels, and PWM at the same time. Tone with 7 channels works very well in combination with PWM frequencies up to 4 kHz, but not well any more at 10 kHz PWM frequency, and very badly at 30 kHz. Furthermore, 40 kHz PWM frequency doesn't work at all. There is still a little work to do, but it shouldn't take many more days.

Oh, the problem with bad sound in combination with 30 kHz PWM could easily be solved. I had used timers which were configured to be additive to jitter. Changing the parameter to not additive to jitter solved the problem.

Then there is only the problem that frequencies of more than 30 kHz cause a reset. But I want up to 50 kHz. Not for PWM, but for sound sampling with 44.2 kHz. I must do measurements for exact adjustment of the software timers, so that I know which frequencies are possible and how to adjust the timers correctly. This means knowing exactly when the hardware timer (timer 1) shall be triggered and when the loop within the interrupt should be executed.

Some details about the timers: there are High Priority timers, Low Priority timers and one First Priority timer. Low Priority timers don't disturb High Priority timers, because they only use sufficiently large (configurable) gaps between High Priority timers. And the First Priority timer may be configured so that other High Priority timers don't disturb it. This means that the First Priority timer may execute at the exact time (in CPU ticks). So steps as fine as one CPU clock tick (1/80 µs or 1/160 µs) are possible if only one pin is used for PWM and if the lowest analogWrite values don't matter. If more pins are used for PWM, then 2 kHz PWM frequency should be possible without any tick jitter, if pin D0 isn't used. I still have to make a small change to reach this goal.

@AlfonsMittelmeyer sounds fantastic! Thanks!

I could optimize the code. 2 kHz PWM frequency may now also work jitterless for more than one PWM pin (if pin D0 isn't used). But there is a little bug. I clearly saw the difference between analogWrite(pin,1) and analogWrite(pin,2) when I looked at an LED. After this I measured the ticks. I had chosen a PWM frequency of 2 kHz, a PWM range of 1000 and the PWM value 1. So we should expect to measure a difference of 40 ticks from setting the pin HIGH until setting it LOW. But I measured 39 ticks, which is 1 tick short. The same problem occurred when I used 1 kHz: I got 79 ticks instead of 80. I think that I forgot rounding in my algorithm. For the PWM frequency I had used float instead of integer, because I think that this makes more sense.

I found the bug. The problem is this function call:

delayTicks(expire_ccount - asm_ccount());

It works correctly when the code of the function call is located at a 32-bit word boundary. I inserted a no-operation instruction somewhere before it, and then it worked correctly. A pity that this function call always has to be checked after code changes. Maybe I could implement a function which works correctly independent of the code location of its call? Even if it doesn't work for delays >= 4 ticks but only for delays >= 5 ticks.

An interesting question is up to which PWM frequency jitterless PWM is possible even when more pins are used for PWM.

I measured 35 ticks as the minimum. 2 ticks have to be subtracted, which I had added for the measurement. So the minimum is 33 ticks, which would mean a PWM frequency of up to 2.4 kHz.
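The 2.4 kHz figure follows if the smallest PWM step has to last at least those 33 ticks. The sketch below assumes the 80 MHz CPU clock and the PWM range of 1000 used in the earlier measurement; the post does not state these for this particular number, and max_pwm_hz is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: highest PWM frequency at which one step of the PWM
// range still lasts at least min_step_ticks CPU ticks (truncated to whole Hz).
constexpr uint32_t max_pwm_hz(uint32_t cpu_hz, uint32_t min_step_ticks, uint32_t range) {
    return static_cast<uint32_t>(
        cpu_hz / (static_cast<uint64_t>(min_step_ticks) * range));
}
```

Under these assumptions, max_pwm_hz(80000000, 33, 1000) evaluates to 2424 Hz, which rounds to the 2.4 kHz quoted above.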

I found the solution for this problem. Instead of:

delayTicks(expire_ccount - asm_ccount());

I use now:

waitUntilExpire();

This is the following macro:

#define waitUntilExpire() { \
  asm volatile ( \
  "0: rsr  %1, ccount;"    /* label0: current_ccount = ccount; // special CPU clock counter register */ \
  "   sub  %0, %2, %1;"    /*         ticks = expire_ccount - current_ccount; */ \
  "   bgei %0,  7, 0b;"    /*         if(ticks >= 7) goto label0; */ \
\
  "   blti %0,  1, 1f;"    /*         if(ticks <  1) goto label1; */ \
  "   beqi %0,  1, 1f;"    /*         if(ticks == 1) goto label1; */ \
  "   beqi %0,  2, 1f;"    /*         if(ticks == 2) goto label1; */ \
  "   beqi %0,  3, 1f;"    /*         if(ticks == 3) goto label1; */ \
  "   beqi %0,  4, 1f;"    /*         if(ticks == 4) goto label1; */ \
  "   beqi %0,  5, 1f;"    /*         if(ticks == 5) goto label1; */ \
  "   bgei %0,  6, 1f;"    /*         if(ticks >= 6) goto label1; */ \
  "1:                ;"    /* label1:                             */ \
\
  : "=&r"(ticks),"=&r"(current_ccount): "r"(expire_ccount) \
  ); \
}

The correctness is independent of the location, because it doesn't matter whether it takes one tick longer at a different code location. What matters is only that, for one code location, the behaviour is the same for all values.

I thought, that today I could finish my work. But then a bug occurred. This works fine for 10 kHz:

  analogWriteFreq(10000);
  analogWriteRange(1600);

  for(float i = 1; i < 1600; i *= 1.05 ) {
    analogWrite(D1,i);
    delay(10);
  }
  
  for(float i = 1600; i >= 1; i /= 1.05 ) {
    analogWrite(D1,i);
    delay(10);
  }

But it doesn't work well for 20 kHz. At the end the brightness of the LED should be very low. But for 20 kHz at the end the LED brightness jumps to seemingly full brightness. Seems to be a calculation error somewhere.

@X-Stas-EEE , if you're still trying to do multiple analogWrite()s and tone()s, you could try pull request #4640 which also seems to fix an occasional WDT in the existing analogWrite.

@earlephilhower thanks a lot! I'll try it as soon as I have time!

@earlephilhower Many thanks! It works fine!
@AlfonsMittelmeyer Thank you too!

Closing this issue as fixed with #4640 commit.